From yw5aj at virginia.edu Thu Jan 1 13:00:00 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Thu, 1 Jan 2015 13:00:00 -0500 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes Message-ID: Dear all, I am currently using a piece of C code, where one of the input argument of a function is **double. So, in numpy, I tried np.ctypeslib.ndpointer(ctypes.c_double), but obviously this wouldn't work because this is only *double, not **double. Then I tried np.ctypeslib.ndpointer(np.ctypeslib.ndpointer(ctypes.c_double)), but this didn't work either because it says "ArgumentError: argument 4: : array must have data type uint64 ". np.ctypeslib.ndpointer(ctypes.c_double, ndim=2) wound't work too, because **double is not the same with *double[]. Could anyone please give any thoughts to help? Thanks, Shawn -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From sturla.molden at gmail.com Thu Jan 1 13:30:42 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 19:30:42 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: You can pretend double** is an array of dtype np.intp. This is because on all modern systems, double** has the size of void*, and np.intp is an integer with the size of void* (np.intp maps to Py_intptr_t). Now you just need to fill in the adresses. If you have a 2d ndarray in C order, or at least one which is contiguous along the second dimension, you set up the array of double* like so: xpp = (x.__array_interface__['data'][0] + np.arange(x.shape[0])*x.strides[0]).astype(np.intp) (The last cast to np.intp is probably only required on Windows 64 with older NumPy versions, but it never hurts.) Next, make a dtype that corresponds to this Py_intptr_t array: doublepp = np.ctypeslib.ndpointer(dtype=np.intp) Declare your function with doublepp instead of ndpointer with dtype np.double, and pass xpp instead of the 2d array x. Sturla On 01/01/15 19:00, Yuxiang Wang wrote: > Dear all, > > I am currently using a piece of C code, where one of the input > argument of a function is **double. > > So, in numpy, I tried np.ctypeslib.ndpointer(ctypes.c_double), but > obviously this wouldn't work because this is only *double, not > **double. > > Then I tried np.ctypeslib.ndpointer(np.ctypeslib.ndpointer(ctypes.c_double)), > but this didn't work either because it says "ArgumentError: argument > 4: : array must have data type uint64 > ". > > np.ctypeslib.ndpointer(ctypes.c_double, ndim=2) wound't work too, > because **double is not the same with *double[]. > > Could anyone please give any thoughts to help? > > Thanks, > > Shawn > From sturla.molden at gmail.com Thu Jan 1 13:35:58 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 19:35:58 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On 01/01/15 19:30, Sturla Molden wrote: > You can pretend double** is an array of dtype np.intp. This is because > on all modern systems, double** has the size of void*, and np.intp is an > integer with the size of void* (np.intp maps to Py_intptr_t). Well, it also requires that the user space is the lower half of the address space, which is usually true. 
But to be safe against this you should use np.uintp instead of np.intp: xpp = (x.__array_interface__['data'][0] + np.arange(x.shape[0])*x.strides[0]).astype(np.uintp) doublepp = np.ctypeslib.ndpointer(dtype=np.uintp) Sturla From sturla.molden at gmail.com Thu Jan 1 13:55:00 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 19:55:00 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On 01/01/15 19:00, Yuxiang Wang wrote: > Could anyone please give any thoughts to help? Say you want to call "void foobar(int m, int n, double **x)" from dummy.so or dummpy.dll with ctypes. Here is a fully worked out example (no tested, but is will work unless I made a typo): import numpy as np from numpy.ctypeslib import ndpointer import ctypes __all__ = ['foobar'] _doublepp = ndpointer(dtype=np.uintp, ndim=1, order='c') _dll = ctypes.CDLL('dummpy.so') # or dummpy.dll _foobar = _dll.foobar _foobar.argtypes = [ctypes.c_int, ctypes.c_int, _doublepp] _foobar.restype = None def foobar(x): assert(x.flags['C_CONTIGUOUS']) assert(x.ndim == 2) xpp = (x.__array_interface__['data'][0] + np.arange(x.shape[0])*x.strides[0]).astype(np.uintp) m = ctype.c_int(x.shape[0]) n = ctype.c_int(x.shape[1]) _foobar(m,n,xpp) Sturla From njs at pobox.com Thu Jan 1 13:56:36 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 1 Jan 2015 18:56:36 +0000 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On Thu, Jan 1, 2015 at 6:00 PM, Yuxiang Wang wrote: > Dear all, > > I am currently using a piece of C code, where one of the input > argument of a function is **double. As you discovered, Numpy's ctypes utilities are helpful for getting a *double out of an ndarray, but they don't really have anything to do with **double's -- for that you should refer to the plain-old-ctypes documentation: https://docs.python.org/2/library/ctypes.html#ctypes.pointer However, I suspect that this question can't really be answered in a useful way without more information about why exactly the C code wants a **double (instead of a *double) and what it expects to do with it. E.g., is it going to throw away the passed in array and return a new one? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From yw5aj at virginia.edu Thu Jan 1 14:25:54 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Thu, 1 Jan 2015 14:25:54 -0500 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: Great thanks to both Strula and also Nathaniel! @Strula, thanks for your help! And I do think your solution makes total sense. However, the code doesn't work well on my computer. ---------------- // dummy.c #include __declspec(dllexport) void foobar(const int m, const int n, const double **x, double **y) { size_t i, j; y = (** double)malloc(sizeof(double *) * m); for(i=0; i wrote: > On Thu, Jan 1, 2015 at 6:00 PM, Yuxiang Wang wrote: >> Dear all, >> >> I am currently using a piece of C code, where one of the input >> argument of a function is **double. 
> > As you discovered, Numpy's ctypes utilities are helpful for getting a > *double out of an ndarray, but they don't really have anything to do > with **double's -- for that you should refer to the plain-old-ctypes > documentation: https://docs.python.org/2/library/ctypes.html#ctypes.pointer > > However, I suspect that this question can't really be answered in a > useful way without more information about why exactly the C code wants > a **double (instead of a *double) and what it expects to do with it. > E.g., is it going to throw away the passed in array and return a new > one? > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From sturla.molden at gmail.com Thu Jan 1 14:35:28 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 20:35:28 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On 01/01/15 20:25, Yuxiang Wang wrote: > #include > > __declspec(dllexport) void foobar(const int m, const int n, const > double **x, double **y) > { > size_t i, j; > y = (** double)malloc(sizeof(double *) * m); > for(i=0; i y[i] = (*double)calloc(sizeof(double), n); > for(i=0; i for(j=0; j y[i][j] = x[i][j]; > } > Was I doing something wrong here? You are not getting the data back because of the malloc/calloc statements. The numpy array y after calling _foobar is still pointing to its original buffer, not the new memory you allocated. You just created a memory leak. Try this instead: __declspec(dllexport) void foobar(const int m, const int n, const double **x, double **y) { size_t i, j; for(i=0; i References: <20141229221017.GA31208@kudu.in-berlin.de> <20141230230339.GA6317@kudu.in-berlin.de> Message-ID: <20150101193810.GA2824@kudu.in-berlin.de> Hi, * Nathaniel Smith [2014-12-31]: > On Tue, Dec 30, 2014 at 11:03 PM, Valentin Haenel wrote: > > * Eric Moore [2014-12-30]: > >> On Monday, December 29, 2014, Valentin Haenel wrote: > >> > >> > Hi, > >> > > >> > how do I access the kind of the data from cython, i.e. the single > >> > character string: > >> > > >> > 'b' boolean > >> > 'i' (signed) integer > >> > 'u' unsigned integer > >> > 'f' floating-point > >> > 'c' complex-floating point > >> > 'O' (Python) objects > >> > 'S', 'a' (byte-)string > >> > 'U' Unicode > >> > 'V' raw data (void) > >> > > >> > In regular Python I can do: > >> > > >> > In [7]: d = np.dtype('S') > >> > > >> > In [8]: d.kind > >> > Out[8]: 'S' > >> > > >> > Looking at the definition of dtype that comes with cython, I see: > >> > > >> > ctypedef class numpy.dtype [object PyArray_Descr]: > >> > # Use PyDataType_* macros when possible, however there are no macros > >> > # for accessing some of the fields, so some are defined. Please > >> > # ask on cython-dev if you need more. > >> > cdef int type_num > >> > cdef int itemsize "elsize" > >> > cdef char byteorder > >> > cdef object fields > >> > cdef tuple names > >> > > >> > I.e. no kind. > > The problem is just that whoever wrote numpy.pxd was feeling a bit > lazy that day and only filled in the fields they felt were most > important :-). 
There are a bunch of public fields in PyArray_Descr > that are just being left out of the Cython file you quote: > > https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/ndarraytypes.h#L566 > > In particular, there's a 'char kind' field. > > The quick workaround is > > cdef extern from "*": > cdef struct my_numpy_dtype [object PyArray_Descr]: > cdef char kind > # ... whatever other fields you might need > > and then cast to my_numpy_dtype when you need to get at the kind field > from Cython. > > If feeling generous, then submit a PR to Cython adding 'cdef char > kind' to the definition above. If feeling extra generous, it would be > awesome if someone systematically went through and added all the > missing fields that are in the numpy header but not cython -- I've run > into these missing field issues annoyingly often myself, and it's > silly that we should all keep making our own individual workarounds > for numpy.pxd's limitations... Thanks for the suggestions, it got me thinking. So, I actually discovered an additional ugly workaround. Basically it turns out, that my dtype instance does have a 'kind' attribute, but it is a Python str object. Hence I needed to do: ord(dtype_.kind[0]) To cast it to a Cython char... This is because---for reasons I don't understand---when you define a char in cython and you try to assign a python object to it, that object needs to be an integer. Otherwise you get: TypeError: an integer is required During run-time. Using the hack above my code now compiles and the tests all pass. I would guess that it probably won't perform very well due to various python to c back and forth activities. V- PS: none the less I may look into getting some patches into cython as suggested, as the solution above isn't exactly clean code... From sturla.molden at gmail.com Thu Jan 1 14:52:23 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 20:52:23 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On 01/01/15 19:56, Nathaniel Smith wrote: > However, I suspect that this question can't really be answered in a > useful way without more information about why exactly the C code wants > a **double (instead of a *double) and what it expects to do with it. > E.g., is it going to throw away the passed in array and return a new > one? That is an important question. The solution I provided only allows a 2D array to be passed in and possibly modified inplace. It does not allow the C function pass back a freshly allocated array. The problem is of course that the meaning of double** is ambiguous. It could mean a pointer to an array of pointers. But it could also mean a double* passed by reference, in which case the function would modify the pointer instead of the data it points to. Sturla From fomcl at yahoo.com Thu Jan 1 14:57:50 2015 From: fomcl at yahoo.com (Albert-Jan Roskam) Date: Thu, 1 Jan 2015 19:57:50 +0000 (UTC) Subject: [Numpy-discussion] numpy.fromiter in numpypy Message-ID: <645635478.812990.1420142270937.JavaMail.yahoo@jws10729.mail.gq1.yahoo.com> Hi, I would like to use the numpy implementation for Pypy. In particular, I would like to use numpy.fromiter, which is available according to this overview: http://buildbot.pypy.org/numpy-status/latest.html. However, contrary to what this website says, this function is not yet available. Conclusion: the website is wrong. Or am I missing something? 
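(For the moment I work around it with a trivial pure-Python stand-in, roughly like the sketch below. The name fromiter_fallback and the count handling are just my own invention, and since it materializes a list first it has none of the speed or memory benefits of a real fromiter.)

import itertools
import numpy as np

def fromiter_fallback(iterable, dtype, count=-1):
    # crude stand-in for np.fromiter: build a list, then convert
    if count >= 0:
        items = list(itertools.islice(iterable, count))
    else:
        items = list(iterable)
    return np.array(items, dtype=dtype)

The session below shows what the real np.fromiter currently does under PyPy: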
albertjan at debian:~$ sudo pypy $(which pip) install -U git+https://bitbucket.org/pypy/numpy.git albertjan at debian:~$ sudo pypy -c 'import numpy' # sudo: as per the installation instructions albertjan at debian:~$ pypy Python 2.7.8 (f5dcc2477b97386c11e4b67f08a2d00fbd2fce5d, Sep 19 2014, 10:37:41) [PyPy 2.4.0 with GCC 4.8.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>>> import sys >>>> import numpy as np >>>> np.__version__, sys.version ('1.9.0', '2.7.8 (f5dcc2477b97386c11e4b67f08a2d00fbd2fce5d, Sep 19 2014, 10:37:41)\n[PyPy 2.4.0 with GCC 4.8.2]') >>>> np.fromiter >>>> np.fromiter((i for i in range(10)), np.float) Traceback (most recent call last): File "", line 1, in File "/opt/pypy-2.4/site-packages/numpy/core/multiarray.py", line 55, in tmp raise NotImplementedError("%s not implemented yet" % func) NotImplementedError: fromiter not implemented yet The same also applies to numpy.fromfile Thanks in advance and happy 2015. Regards, Albert-Jan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From sturla.molden at gmail.com Thu Jan 1 15:06:34 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 21:06:34 +0100 Subject: [Numpy-discussion] npymath on Windows In-Reply-To: References: Message-ID: On 28/12/14 01:59, Matthew Brett wrote: > As far as I can see, 'acosf' is defined in the msvc runtime library. > I guess that '_acosf' is defined in some mingw runtime library? AFAIK it is a GCC built-in function. When the GCC compiler or linker sees it the binary code will be inlined. https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Other-Builtins.html Sturla From sturla.molden at gmail.com Thu Jan 1 15:34:16 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 21:34:16 +0100 Subject: [Numpy-discussion] npymath on Windows In-Reply-To: References: Message-ID: On 28/12/14 17:17, David Cournapeau wrote: > This is not really supported. You should avoid mixing compilers when > building C extensions using numpy C API. Either all mingw, or all MSVC. That is not really good enough. Even if we build binary wheels with MinGW (see link) the binary npymath library should be useable from MSVC. https://github.com/numpy/numpy/pull/5328 Sturla From ndarray at mac.com Thu Jan 1 16:35:08 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Thu, 1 Jan 2015 16:35:08 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() Message-ID: A discussion [1] is currently underway at GitHub which will benefit from a larger forum. In version 1.9, the diagonal() method was changed to return a read-only (non-contiguous) view into the original array instead of a plain copy. Also, it has been announced [2] that in 1.10 the view will become read/write. A concern has now been raised [3] that this change breaks backward compatibility too much. Consider the following code: x = numy.eye(2) d = x.diagonal() d[0] = 2 In 1.8, this code runs without errors and results in [2, 1] stored in array d. In 1.9, this is an error. With the current plan, in 1.10 this will become valid again, but the result will be different: x[0,0] will be 2 while it is 1 in 1.8. Two alternatives are suggested for discussion: 1. Add copy=True flag to diagonal() method. 2. 
Roll back 1.9 change to diagonal() and introduce an additional diagonal_view() method to return a view. [1] https://github.com/numpy/numpy/pull/5409 [2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.diagonal.html [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 1 17:16:34 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 1 Jan 2015 22:16:34 +0000 Subject: [Numpy-discussion] Access dtype kind from cython In-Reply-To: <20150101193810.GA2824@kudu.in-berlin.de> References: <20141229221017.GA31208@kudu.in-berlin.de> <20141230230339.GA6317@kudu.in-berlin.de> <20150101193810.GA2824@kudu.in-berlin.de> Message-ID: On Thu, Jan 1, 2015 at 7:38 PM, Valentin Haenel wrote: [...] > So, I actually discovered an additional ugly workaround. Basically it > turns out, that my dtype instance does have a 'kind' attribute, but it > is a Python str object. Hence I needed to do: > > ord(dtype_.kind[0]) Your Cython dtype object is simultaneously a C and Python object -- if you ask for a C-level attribute that Cython knows about, then it'll access the C struct field directly; if you ask for anything else, then it'll do a normal Python attribute lookup. > To cast it to a Cython char... This is because---for reasons I don't > understand---when you define a char in cython and you try to assign a > python object to it, that object needs to be an integer. Otherwise you > get: > > TypeError: an integer is required > > During run-time. This is because in C, char is the name for an integer type. It was the 70s, they didn't know any better... > Using the hack above my code now compiles and the tests all pass. I > would guess that it probably won't perform very well due to various > python to c back and forth activities. Eh, there's an excellent chance that it won't matter. Usually this kidn of thing only matters if you're accessing the field from inside a tight inner loop that gets called thousands of times per second. This is one of the nice things about Cython: you can be lazy and write ordinary Python code everywhere outside those loops, as compared to regular extension modules where you have to laboriously write everything in C, even the parts that aren't speed critical. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From yw5aj at virginia.edu Thu Jan 1 21:56:07 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Thu, 1 Jan 2015 21:56:07 -0500 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: 1) @Strula Sorry about my stupid mistake! That piece of code totally gave away how green I am in coding C :) And yes, that piece of code works like a charm now! I am able to run my model. Thanks a million! 2) @Strula and also thanks for your insight on the limitation of the method. Currently I am just passing in 2d ndarray for data input, so I can get away with this method; but it is really important to keep that piece of knowledge in mind. 3) @Nathaniel Could you please give a hint on how should this be done with the ctypes library (only for reading a 2d ndarray)? I noticed that it wouldn't work if I set: _doublepp = ctypes.POINTER(ctypes.POINTER(ctypes.c_double)) xpp = x.ctypes.data_as(ctypes.POINTER(ctypes.POINTER(ctypes.c_double))) Could you please give a hint if possible? 
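(The furthest I got with plain ctypes is the untested sketch below: build the array of row pointers by hand and keep a reference to it so it is not garbage collected. as_doublepp is just a name I made up, it assumes a C-contiguous float64 array, and the wrapped function would then need _doublepp in its argtypes. I am not sure this is the right idiom, though.)

import ctypes
import numpy as np

_doublep = ctypes.POINTER(ctypes.c_double)
_doublepp = ctypes.POINTER(_doublep)

def as_doublepp(x):
    assert x.ndim == 2 and x.flags['C_CONTIGUOUS'] and x.dtype == np.float64
    # one double* per row; x[i] is a contiguous view of row i
    rowptrs = (_doublep * x.shape[0])()
    for i in range(x.shape[0]):
        rowptrs[i] = x[i].ctypes.data_as(_doublep)
    # return rowptrs as well so the caller can keep it (and x) alive
    return ctypes.cast(rowptrs, _doublepp), rowptrs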
(Complete code is attached at the end of this message) 4) I wanted to say that it seems to me, as the project gradually scales up, Cython is easier to deal with, especially when I am using a lot of numpy arrays. If it is even higher dimensional data, it would be verbose while it is really succinct to use Cython. Attached is the complete code. Code #1: From Strula, and it worked: // dummy.c #include __declspec(dllexport) void foobar(const int m, const int n, const double **x, double **y) { size_t i, j; for(i=0; i wrote: > On 01/01/15 19:56, Nathaniel Smith wrote: > >> However, I suspect that this question can't really be answered in a >> useful way without more information about why exactly the C code wants >> a **double (instead of a *double) and what it expects to do with it. > >> E.g., is it going to throw away the passed in array and return a new >> one? > > That is an important question. > > The solution I provided only allows a 2D array to be passed in and > possibly modified inplace. It does not allow the C function pass back a > freshly allocated array. > > The problem is of course that the meaning of double** is ambiguous. It > could mean a pointer to an array of pointers. But it could also mean a > double* passed by reference, in which case the function would modify the > pointer instead of the data it points to. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From charlesr.harris at gmail.com Thu Jan 1 23:13:34 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 1 Jan 2015 21:13:34 -0700 Subject: [Numpy-discussion] numpy developers Message-ID: Hi All, I've invited Alex Griffing onto the team to be a numpy developer. He has been contributing fixes and reviews for a while and it is time to give him more opportunity to contribute. I think he will do well. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Fri Jan 2 01:56:57 2015 From: matti.picus at gmail.com (Matti Picus) Date: Fri, 02 Jan 2015 08:56:57 +0200 Subject: [Numpy-discussion] numpy.fromiter in numpypy In-Reply-To: References: Message-ID: <54A64139.2040506@gmail.com> An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Jan 2 04:17:18 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 2 Jan 2015 10:17:18 +0100 Subject: [Numpy-discussion] numpy developers In-Reply-To: References: Message-ID: On Fri, Jan 2, 2015 at 5:13 AM, Charles R Harris wrote: > Hi All, > > I've invited Alex Griffing onto the team to be a numpy developer. He has > been contributing fixes and reviews for a while and it is time to give him > more opportunity to contribute. I think he will do well. > +1 Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From simlangen at gmail.com Fri Jan 2 06:06:09 2015 From: simlangen at gmail.com (Simen Langseth) Date: Fri, 2 Jan 2015 20:06:09 +0900 Subject: [Numpy-discussion] Extracting required indices from the array of tuples Message-ID: import numpy as np from scipy import signal y = np.array([[2, 1, 2, 3, 2, 0, 1, 0], [2, 1, 2, 3, 2, 0, 1, 0]]) maximas = signal.argrelmax(y, axis=1) print maximas (array([0, 0, 1, 1], dtype=int64), array([3, 6, 3, 6], dtype=int64)) I want to extract only the first maxima of both rows, i.e., [3, 3] using the tuples (maximas). How would you do it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 2 07:29:53 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 2 Jan 2015 04:29:53 -0800 Subject: [Numpy-discussion] Extracting required indices from the array of tuples In-Reply-To: References: Message-ID: On Fri, Jan 2, 2015 at 3:06 AM, Simen Langseth wrote: > import numpy as np > from scipy import signal > > y = np.array([[2, 1, 2, 3, 2, 0, 1, 0], > [2, 1, 2, 3, 2, 0, 1, 0]]) > > maximas = signal.argrelmax(y, axis=1) > > print maximas > > (array([0, 0, 1, 1], dtype=int64), array([3, 6, 3, 6], dtype=int64)) > > > I want to extract only the first maxima of both rows, i.e., [3, 3] using > the tuples (maximas). How would you do it? > > Something like this should work: >>> rows, cols = maximas >>> first_in_row = np.concatenate(([True], rows[:-1] != rows[1:])) >>> rows = rows[first_in_row] >>> cols = cols[first_in_row] >>> y[rows, cols] array([3, 3]) Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Fri Jan 2 07:36:39 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Fri, 2 Jan 2015 13:36:39 +0100 Subject: [Numpy-discussion] npymath on Windows In-Reply-To: References: Message-ID: Hi, without further testing; this approach may help: (1) create a shared library with all symbols from libnpymath.a: $ gcc -shared -o libnpymath.dll -Wl,--whole-archive libnpymath.a -Wl,--no-whole-archive -lm (2) create a def file: gendef libnpymath.dll There are now two files created by mings-w64 tools: libnpymath.dll, libnpymath.def (3) create import libs for MSVC: first open a new command Window with the VC command prompt: > lib /machine:i386 /def:libnpymath.def (for 64bit use: /machine:X64) Microsoft (R) Library Manager Version 9.00.30729.01 Copyright (C) Microsoft Corporation. All rights reserved. Creating library libnpymath.lib and object libnpymath.exp libnpymath.dll, libnpymath.lib and libnpymath.exp should be sufficient for MSVC. libnpymath.dll has to be deployed. -- carlkl 2015-01-01 21:34 GMT+01:00 Sturla Molden : > On 28/12/14 17:17, David Cournapeau wrote: > > > This is not really supported. You should avoid mixing compilers when > > building C extensions using numpy C API. Either all mingw, or all MSVC. > > That is not really good enough. Even if we build binary wheels with > MinGW (see link) the binary npymath library should be useable from MSVC. > > https://github.com/numpy/numpy/pull/5328 > > > Sturla > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From simlangen at gmail.com Fri Jan 2 07:39:18 2015 From: simlangen at gmail.com (Simen Langseth) Date: Fri, 2 Jan 2015 21:39:18 +0900 Subject: [Numpy-discussion] Extracting required indices from the array of tuples In-Reply-To: References: Message-ID: Dear Jaime: Thank you so much. Your codes are always great. By the way, I have been waiting for several hours to get satisfactory answer at: http://codereview.stackexchange.com/questions/75457/faster-way-of-using-interp1d-in-2d-array?noredirect=1#comment137329_75457 http://stackoverflow.com/questions/27735832/faster-way-of-using-interp1d-in-2d-array Please help me there if you have time. Simen On Fri, Jan 2, 2015 at 9:29 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Fri, Jan 2, 2015 at 3:06 AM, Simen Langseth > wrote: > >> import numpy as np >> from scipy import signal >> >> y = np.array([[2, 1, 2, 3, 2, 0, 1, 0], >> [2, 1, 2, 3, 2, 0, 1, 0]]) >> >> maximas = signal.argrelmax(y, axis=1) >> >> print maximas >> >> (array([0, 0, 1, 1], dtype=int64), array([3, 6, 3, 6], dtype=int64)) >> >> >> I want to extract only the first maxima of both rows, i.e., [3, 3] using >> the tuples (maximas). How would you do it? >> >> Something like this should work: > > >>> rows, cols = maximas > >>> first_in_row = np.concatenate(([True], rows[:-1] != rows[1:])) > >>> rows = rows[first_in_row] > >>> cols = cols[first_in_row] > >>> y[rows, cols] > array([3, 3]) > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Fri Jan 2 08:22:48 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 2 Jan 2015 13:22:48 +0000 (UTC) Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes References: Message-ID: <964675802441893669.982718sturla.molden-gmail.com@news.gmane.org> Yuxiang Wang wrote: > 4) I wanted to say that it seems to me, as the project gradually > scales up, Cython is easier to deal with, especially when I am using a > lot of numpy arrays. If it is even higher dimensional data, it would > be verbose while it is really succinct to use Cython. The easiest way to speed up NumPy code is to use Numba which is an LLVM based JIT compiler. Numba will often give you performance comparable to C for free. All you have to do is to add the @numba.jit decorator to your Python function and possibly include some type hints. If all you want is to speed up NumPy code, just use Numba and it will take care of it in at least 9 of 10 cases. Numexpr is also a JIT compiler which can speed up Numpy code, but it does not give as dramatic results as Numba. Cython is easier to work with than ctypes, particularly when the problem scales up. If you use typed memoryviews in Cython you can also avoid having to work with pointer arithmetics. Cython is mainly a competitior to using the Python C API manually for C extension modules to Python. Cython also allows you to wrap external C and C++ code, and e.g. use Python and C++ objects together. The drawback is that you need to learn the Cython language as well as Python and C and C++ and know how they differ. 
Cython also have many of the same hidden dangers as C++, due to the possibility of exceptions being raised between C statements. But because Cython is the most flexible tool for writing C extensions to Python you will in the long run do yourself a favor by learning to use it. ctypes is good when you have a DLL, possibly form a foreign source, and you just want to use it without any build step. CFFI is easier to work with than ctypes and has the same usecase. It can parse C headers and does not require you to define the C API with Python statements like ctypes do. Generally I would say it is alway better to use CFFI than ctypes. ctypes is also currently an orphan, albeit in the Python standard library, while CFFI is actively maintained. Numba will also JIT compile ctypes and CFFI calls to remove the extra overhead. This is good to know if you need to call a C function in a tight loop. In that case Numba can JIT compile away the Python as well as the ctypes/CFFI overhead. Fortran 90/95 is also underrated. It is easier to work with than C, and gives similar results performance wise. You can call Fortran with f2py, ctypes, CFFI, or Cython (use fwrap). Generally I would say that it is better for a student to learn C than Fortran if you have to choose, because C is also useful for other things than numerical computing. But if you want fast and robust numerical code, it is easier to get good results with Fortran than C or Cython. Sturla From sturla.molden at gmail.com Fri Jan 2 08:35:08 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 2 Jan 2015 13:35:08 +0000 (UTC) Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes References: Message-ID: <1048628219441898315.286638sturla.molden-gmail.com@news.gmane.org> Yuxiang Wang wrote: > 1) @Strula Sorry about my stupid mistake! That piece of code totally > gave away how green I am in coding C :) Don't worry. C is a high-level assember. It will bite you again and again, it happens to everyone. Those who say they have never made a stupid mistake while coding in C are lying. Sturla From charlesr.harris at gmail.com Fri Jan 2 21:04:57 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Jan 2015 19:04:57 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that Message-ID: Hi All, The diag, diagonal, and ravel functions have recently been changed to preserve subtypes. However, this causes lots of backward compatibility problems for matrix users, in particular, scipy.sparse. One possibility for fixing this is to special case matrix and so that these functions continue to return 1-d arrays for matrix instances. This is kind of ugly as `a..ravel` will still return a matrix when a is a matrix, an ugly inconsistency. This may be a case where practicality beats beauty. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgasmith at icloud.com Fri Jan 2 21:45:53 2015 From: dgasmith at icloud.com (Daniel Smith) Date: Fri, 02 Jan 2015 20:45:53 -0600 Subject: [Numpy-discussion] Optimizing multi-tensor contractions in numpy In-Reply-To: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> References: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> Message-ID: <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> Hello everyone, I have been working on a chunk of code that basically sets out to provide a single function that can take an arbitrary einsum expression and computes it in the most optimal way. 
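To make the idea concrete with plain numpy (a toy example only, not opt_einsum's interface): a three-operand contraction can be evaluated pairwise through a BLAS-backed intermediate instead of in a single pass over all indices:

import numpy as np

n = 30
a, b, c = (np.random.rand(n, n) for _ in range(3))

# one shot: einsum itself loops over all four indices (roughly O(n**4))
d1 = np.einsum('ij,jk,kl->il', a, b, c)

# pairwise: build the intermediate a.b and let dot/BLAS do the work (O(n**3))
d2 = np.dot(np.dot(a, b), c)

assert np.allclose(d1, d2)

opt_einsum automates this kind of factorization for arbitrary expressions.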
While np.einsum can compute arbitrary expressions, there are two drawbacks to using pure einsum: einsum does not consider building intermediate arrays for possible reductions in overall rank and is not currently capable of using a vendor BLAS. I have been working on a project that aims to solve both issues simultaneously: https://github.com/dgasmith/opt_einsum This program first builds the optimal way to contract the tensors together, or using my own nomenclature a ?path.? This path is then iterated over and uses tensordot when possible and einsum for everything else. In test cases the worst case scenario adds a 20 microsecond overhead penalty and, in the best case scenario, it can reduce the overall rank of the tensor. The primary (if somewhat exaggerated) example is a 5-term N^8 index transformation that can be reduced to N^5; even when N is very small (N=10) there is a 2,000 fold speed increase over pure einsum or, if using tensordot, a 2,400 fold speed increase. This is somewhat similar to the new np.linalg.multi_dot function. If you are interested in this function please head over to the github repo and check it out. I believe the README is starting to become self-explanatory, but feel free to email me with any questions. This originally started because I was looking into using numpy to rapidly prototype quantum chemistry codes. The results of which can be found here: https://github.com/dgasmith/psi4numpy As such, I am very interested in implementing this into numpy. While I think opt_einsum is in a pretty good place, there is still quite a bit to do (see outstanding issues in the README). Even if this is not something that would fit into numpy I would still be very interested in your comments. Thank you for your time, -Daniel Smith -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 2 22:21:49 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Jan 2015 20:21:49 -0700 Subject: [Numpy-discussion] Optimizing multi-tensor contractions in numpy In-Reply-To: <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> References: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> Message-ID: On Fri, Jan 2, 2015 at 7:45 PM, Daniel Smith wrote: > Hello everyone, > > I have been working on a chunk of code that basically sets out to provide > a single function that can take an arbitrary einsum expression and computes > it in the most optimal way. While np.einsum can compute arbitrary > expressions, there are two drawbacks to using pure einsum: einsum does not > consider building intermediate arrays for possible reductions in overall > rank and is not currently capable of using a vendor BLAS. I have been > working on a project that aims to solve both issues simultaneously: > > https://github.com/dgasmith/opt_einsum > > This program first builds the optimal way to contract the tensors > together, or using my own nomenclature a ?path.? This path is then iterated > over and uses tensordot when possible and einsum for everything else. In > test cases the worst case scenario adds a 20 microsecond overhead penalty > and, in the best case scenario, it can reduce the overall rank of the > tensor. The primary (if somewhat exaggerated) example is a 5-term N^8 index > transformation that can be reduced to N^5; even when N is very small (N=10) > there is a 2,000 fold speed increase over pure einsum or, if using > tensordot, a 2,400 fold speed increase. 
This is somewhat similar to the new > np.linalg.multi_dot function. > > If you are interested in this function please head over to the github repo > and check it out. I believe the README is starting to become > self-explanatory, but feel free to email me with any questions. > > This originally started because I was looking into using numpy to rapidly > prototype quantum chemistry codes. The results of which can be found here: > https://github.com/dgasmith/psi4numpy > > As such, I am very interested in implementing this into numpy. While I > think opt_einsum is in a pretty good place, there is still quite a bit to > do (see outstanding issues in the README). Even if this is not something > that would fit into numpy I would still be very interested in your comments. > > Sounds interesting. I wouldn't be opposed to including an optimized einsum in numpy, there has even been some mention of using blas. Note that cblas is used in multiarray in the current development branch, so that might be useful. I also looked into using einsum to implement the '@' operator coming in Python 3.5, but there was a rather large fixed overhead involved in parsing the input string, and the multiplications were much slower than the numpy dot operator. If those times could be reduced einsum might become a real possibility. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Sat Jan 3 10:05:17 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 03 Jan 2015 10:05:17 -0500 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: <54A8052D.8020804@gmail.com> Would this really be practicality beating purity? It would be nice to have know the principle governing this. For example, is there a way to convincingly group these as array operations vs matrix operations? Personally I am puzzled by preserving subtype of `diagonal` and very especially of `ravel`. Has anyone requested this? (I can see the argument for `diag`.) Alan Isaac On 1/2/2015 9:04 PM, Charles R Harris wrote: > The diag, diagonal, and ravel functions have recently been changed to preserve subtypes. However, this causes lots of backward compatibility problems > for matrix users, in particular, scipy.sparse. One possibility for fixing this is to special case matrix and so that these functions continue to > return 1-d arrays for matrix instances. This is kind of ugly as `a..ravel` will still return a matrix when a is a matrix, an ugly inconsistency. This > may be a case where practicality beats beauty. From charlesr.harris at gmail.com Sat Jan 3 10:32:09 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Jan 2015 08:32:09 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: <54A8052D.8020804@gmail.com> References: <54A8052D.8020804@gmail.com> Message-ID: On Sat, Jan 3, 2015 at 8:05 AM, Alan G Isaac wrote: > Would this really be practicality beating purity? > It would be nice to have know the principle governing this. > For example, is there a way to convincingly group these as > array operations vs matrix operations? > > Personally I am puzzled by preserving subtype of `diagonal` and > very especially of `ravel`. Has anyone requested this? > (I can see the argument for `diag`.) 
> > Alan Isaac > In [1]: from astropy import units as u In [2]: a = eye(2) * u.m In [3]: a Out[3]: In [4]: diagonal(a) Out[4]: In [5]: diag(a) Out[5]: In [6]: ravel(a) Out[6]: None of those examples keep the units without the recent changes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sat Jan 3 10:49:28 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Sat, 3 Jan 2015 17:49:28 +0200 Subject: [Numpy-discussion] Add a function to broadcast arrays to a given shape to numpy's stride_tricks? In-Reply-To: References: Message-ID: Here is an update on a new function for broadcasting arrays to a given shape (now named np.broadcast_to). I have a pull request up for review, which has received some feedback now: https://github.com/numpy/numpy/pull/5371 There is still at least one design decision to settle: should we expose "broadcast_shape" in the public API? In the current implementation, it is exposed as a public function in numpy.lib.tride_tricks (like as_strided), but it is not exported into the main numpy namespace. The alternatives would be to either make it a private function (_broadcast_shape) or expose it publicly (np.broadcast_shape). Please do speak if you have any thoughts to share on the implementation, either here or in the pull request. Best, Stephan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jan 3 11:57:45 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 3 Jan 2015 16:57:45 +0000 Subject: [Numpy-discussion] Optimizing multi-tensor contractions in numpy In-Reply-To: <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> References: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> Message-ID: On 3 Jan 2015 02:46, "Daniel Smith" wrote: > > Hello everyone, > > I have been working on a chunk of code that basically sets out to provide a single function that can take an arbitrary einsum expression and computes it in the most optimal way. While np.einsum can compute arbitrary expressions, there are two drawbacks to using pure einsum: einsum does not consider building intermediate arrays for possible reductions in overall rank and is not currently capable of using a vendor BLAS. I have been working on a project that aims to solve both issues simultaneously: > >https://github.com/dgasmith/opt_einsum > > This program first builds the optimal way to contract the tensors together, or using my own nomenclature a ?path.? This path is then iterated over and uses tensordot when possible and einsum for everything else. In test cases the worst case scenario adds a 20 microsecond overhead penalty and, in the best case scenario, it can reduce the overall rank of the tensor. The primary (if somewhat exaggerated) example is a 5-term N^8 index transformation that can be reduced to N^5; even when N is very small (N=10) there is a 2,000 fold speed increase over pure einsum or, if using tensordot, a 2,400 fold speed increase. This is somewhat similar to the new np.linalg.multi_dot function. This sounds super awesome. Who *wouldn't* want a free 2,000x speed increase? And I especially like your test suite. I'd also be interested in hearing more about the memory requirements of this approach. How does the temporary memory required typically scale with the size of the input arrays? Is there an upper bound on the temporary memory required? > As such, I am very interested in implementing this into numpy. 
While I think opt_einsum is in a pretty good place, there is still quite a bit to do (see outstanding issues in the README). Even if this is not something that would fit into numpy I would still be very interested in your comments. We would definitely be interested in integrating this functionality into numpy. After all, half the point of having an interface like einsum is that it provides a clean boundary where we can swap in complicated, sophisticated machinery without users having to care. No one wants to curate their own pile of custom optimized libraries. :-) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From yw5aj at virginia.edu Sat Jan 3 12:51:03 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Sat, 3 Jan 2015 12:51:03 -0500 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: <964675802441893669.982718sturla.molden-gmail.com@news.gmane.org> References: <964675802441893669.982718sturla.molden-gmail.com@news.gmane.org> Message-ID: Hi Sturla, First of all, my apologies to have spelled your name wrong for the past year - I just realized it! Thanks to Eric Firing who pointed this out to me. Thank you Sturla for bearing with me! And then - thank you for pointing out Numba! I tried to use it years ago, but ended up Cython eventually because the loop-jitting constraint (http://numba.pydata.org/numba-doc/0.16.0/arrays.html#loop-jitting-constraints) was too strict by that time. After seeing your email, I went to the latest version and saw that it has been greatly relaxed! I really look forward to using it for any new projects. Most of the time, loop-jitting is all I need. Lastly, your comments on Fortran 90/95 is convincing me to move away from Fortran 77. I am writing a small section of code that was called by some other legacy code written in F77. I heard that as long as I compile it correctly, it will interface with the legacy code with no problem. I'll definitely give it a try! Thanks again for all the help Sturla, Shawn On Fri, Jan 2, 2015 at 8:22 AM, Sturla Molden wrote: > Yuxiang Wang wrote: > >> 4) I wanted to say that it seems to me, as the project gradually >> scales up, Cython is easier to deal with, especially when I am using a >> lot of numpy arrays. If it is even higher dimensional data, it would >> be verbose while it is really succinct to use Cython. > > The easiest way to speed up NumPy code is to use Numba which is an LLVM > based JIT compiler. Numba will often give you performance comparable to C > for free. All you have to do is to add the @numba.jit decorator to your > Python function and possibly include some type hints. If all you want is to > speed up NumPy code, just use Numba and it will take care of it in at least > 9 of 10 cases. > > Numexpr is also a JIT compiler which can speed up Numpy code, but it does > not give as dramatic results as Numba. > > Cython is easier to work with than ctypes, particularly when the problem > scales up. If you use typed memoryviews in Cython you can also avoid having > to work with pointer arithmetics. Cython is mainly a competitior to using > the Python C API manually for C extension modules to Python. Cython also > allows you to wrap external C and C++ code, and e.g. use Python and C++ > objects together. The drawback is that you need to learn the Cython > language as well as Python and C and C++ and know how they differ. Cython > also have many of the same hidden dangers as C++, due to the possibility of > exceptions being raised between C statements. 
But because Cython is the > most flexible tool for writing C extensions to Python you will in the long > run do yourself a favor by learning to use it. > > ctypes is good when you have a DLL, possibly form a foreign source, and you > just want to use it without any build step. CFFI is easier to work with > than ctypes and has the same usecase. It can parse C headers and does not > require you to define the C API with Python statements like ctypes do. > Generally I would say it is alway better to use CFFI than ctypes. ctypes is > also currently an orphan, albeit in the Python standard library, while CFFI > is actively maintained. > > Numba will also JIT compile ctypes and CFFI calls to remove the extra > overhead. This is good to know if you need to call a C function in a tight > loop. In that case Numba can JIT compile away the Python as well as the > ctypes/CFFI overhead. > > Fortran 90/95 is also underrated. It is easier to work with than C, and > gives similar results performance wise. You can call Fortran with f2py, > ctypes, CFFI, or Cython (use fwrap). Generally I would say that it is > better for a student to learn C than Fortran if you have to choose, because > C is also useful for other things than numerical computing. But if you want > fast and robust numerical code, it is easier to get good results with > Fortran than C or Cython. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From alan.isaac at gmail.com Sat Jan 3 12:54:27 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 03 Jan 2015 12:54:27 -0500 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: <54A8052D.8020804@gmail.com> Message-ID: <54A82CD3.8060504@gmail.com> > On Sat, Jan 3, 2015 at 8:05 AM, Alan G Isaac wrote: >> Would this really be practicality beating purity? >> It would be nice to have know the principle governing this. >> For example, is there a way to convincingly group these as >> array operations vs matrix operations? >> Personally I am puzzled by preserving subtype of >> `diagonal` and >> very especially of `ravel`. Has anyone requested this? >> (I can see the argument for `diag`.) On 1/3/2015 10:32 AM, Charles R Harris wrote: > In [1]: from astropy import units as u > In [2]: a = eye(2) * u.m > In [3]: a > Out[3]: > [ 0., 1.]] m> > In [4]: diagonal(a) > Out[4]: > In [5]: diag(a) > Out[5]: > In [6]: ravel(a) > Out[6]: > None of those examples keep the units without the recent changes. Thanks for a nice example. It seems that the core principle you are proposing is that design considerations generally require that subtypes determine the return types of numpy functions. If that is correct, then it seems matrices should then be subject to this; more special casing of the behavior of matrix objects seems highly undesirable. Cheers, Alan From sturla.molden at gmail.com Sat Jan 3 13:15:27 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 03 Jan 2015 19:15:27 +0100 Subject: [Numpy-discussion] Correct C string handling in the NumPy C API? 
Message-ID: Here is an example: NPY_NO_EXPORT NpyIter_IterNextFunc * NpyIter_GetIterNext(NpyIter *iter, char **errmsg) { npy_uint32 itflags = NIT_ITFLAGS(iter); int ndim = NIT_NDIM(iter); int nop = NIT_NOP(iter); if (NIT_ITERSIZE(iter) < 0) { if (errmsg == NULL) { PyErr_SetString(PyExc_ValueError, "iterator is too large"); } else { *errmsg = "iterator is too large"; } return NULL; } After NpyIter_GetIterNext returns, *errmsg points to a local variable in a returned function. Either I am wrong about C, or this code has undefied behavior... My gutfeeling is that *errmsg = "iterator is too large"; puts the string "iterator is too large" on the stack and points *errmsg to the string. Shouldn't this really be strcpy(*errmsg, "iterator is too large"); and then *errmsg should point to a char buffer allocated before NpyIter_GetIterNext is called? Or will the statement *errmsg = "iterator is too large"; put the string on the stack in the calling C function? Before I open an issue I will ask if my understanding of C is correct or not. I am a bit confused here... Regards, Sturla From njs at pobox.com Sat Jan 3 13:29:13 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 3 Jan 2015 18:29:13 +0000 Subject: [Numpy-discussion] Correct C string handling in the NumPy C API? In-Reply-To: References: Message-ID: 2015-01-03 18:15 GMT+00:00 Sturla Molden : > > Here is an example: > > NPY_NO_EXPORT NpyIter_IterNextFunc * > NpyIter_GetIterNext(NpyIter *iter, char **errmsg) > { > npy_uint32 itflags = NIT_ITFLAGS(iter); > int ndim = NIT_NDIM(iter); > int nop = NIT_NOP(iter); > > if (NIT_ITERSIZE(iter) < 0) { > if (errmsg == NULL) { > PyErr_SetString(PyExc_ValueError, "iterator is too large"); > } > else { > *errmsg = "iterator is too large"; > } > return NULL; > } > > > After NpyIter_GetIterNext returns, *errmsg points to a local variable in > a returned function. > > Either I am wrong about C, or this code has undefied behavior... > > My gutfeeling is that > > *errmsg = "iterator is too large"; > > puts the string "iterator is too large" on the stack and points *errmsg > to the string. No, this code is safe (fortunately!). C string literals have "static storage" (see paragraph 6.4.5.5 in C99), which means that their lifetime is the same as the lifetime of a 'static char[]'. They aren't stack allocated. There's lots more details about this available around the web, e.g.: https://stackoverflow.com/questions/4836534/returning-a-pointer-to-a-literal-or-constant-character-array-string -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From sturla.molden at gmail.com Sat Jan 3 13:32:46 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 3 Jan 2015 18:32:46 +0000 (UTC) Subject: [Numpy-discussion] Correct C string handling in the NumPy C API? References: Message-ID: <877788910442002719.289084sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: > No, this code is safe (fortunately!). C string literals have "static > storage" (see paragraph 6.4.5.5 in C99), which means that their > lifetime is the same as the lifetime of a 'static char[]'. They aren't > stack allocated. Thanks. That explains it. 
Sturla From charlesr.harris at gmail.com Sat Jan 3 13:51:54 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Jan 2015 11:51:54 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: <54A82CD3.8060504@gmail.com> References: <54A8052D.8020804@gmail.com> <54A82CD3.8060504@gmail.com> Message-ID: On Sat, Jan 3, 2015 at 10:54 AM, Alan G Isaac wrote: > > On Sat, Jan 3, 2015 at 8:05 AM, Alan G Isaac wrote: > >> Would this really be practicality beating purity? > >> It would be nice to have know the principle governing this. > >> For example, is there a way to convincingly group these as > >> array operations vs matrix operations? > >> Personally I am puzzled by preserving subtype of > >> `diagonal` and > >> very especially of `ravel`. Has anyone requested this? > >> (I can see the argument for `diag`.) > > > > On 1/3/2015 10:32 AM, Charles R Harris wrote: > > In [1]: from astropy import units as u > > > In [2]: a = eye(2) * u.m > > > In [3]: a > > Out[3]: > > > [ 0., 1.]] m> > > > In [4]: diagonal(a) > > Out[4]: > > > In [5]: diag(a) > > Out[5]: > > > In [6]: ravel(a) > > Out[6]: > > > None of those examples keep the units without the recent changes. > > > > Thanks for a nice example. It seems that the core principle > you are proposing is that design considerations generally > require that subtypes determine the return types of numpy > functions. If that is correct, then it seems matrices should > then be subject to this; more special casing of the behavior > of matrix objects seems highly undesirable I would agree with you, except that the changes breaks code that uses matrices because matrices are always 2-d whereas the previous results were 1-d. If it were a few not widely used projects I'd stick with it, but scipy.sparse is one of the packages that is broken. Numpy/scipy are not released together, and numpy is often used to compile older versions of scipy, so breaking scipy is undesirable. Becaus we are hoping to phase matrices out over time, preserving the old behavior for matrices until we can dispense with them looks to be the easiest solution. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jan 3 13:58:10 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 3 Jan 2015 18:58:10 +0000 (UTC) Subject: [Numpy-discussion] Correct C string handling in the NumPy C API? References: <877788910442002719.289084sturla.molden-gmail.com@news.gmane.org> Message-ID: <723999938442003413.672980sturla.molden-gmail.com@news.gmane.org> Sturla Molden wrote: > Thanks. That explains it. 20 years after learning C I still discover new things... On the other hand, Fortran is Fortran, and seems to be free of these gotchas... Python is better as well. I hate to say it but C++ would also be less confusing here. I would just pass in a reference to a std::string and assign to it, which I know is safe... On the other hand, implicit static storage of string litterals -- which may or may not be modified depending on compiler? Not so obvious without reading all the small details... 
Sturla From dgasmith at icloud.com Sat Jan 3 14:26:43 2015 From: dgasmith at icloud.com (Daniel Smith) Date: Sat, 03 Jan 2015 13:26:43 -0600 Subject: [Numpy-discussion] Optimizing multi-tensor contractions in numpy In-Reply-To: References: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> Message-ID: Hello Nathaniel, > I'd also be interested in hearing more about the memory requirements of this approach. How does the temporary memory required typically scale with the size of the input arrays? Is there an upper bound on the temporary memory required? > Currently the algorithm will not create an array larger than the largest input or output array (maximum_array_size). This gives a maximum upper bound of (number_of_terms/2 + 1) * maximum_array_size. In practice, this rarely goes beyond the maximum_array_size as building large outer products is not usually helpful. The views are also dereferenced after they are used, I believe this should delete the arrays correctly. However, this is one thing I am not sure is being handled in the best way and can use further testing. Figuring out cumulative memory should also be possible for the brute force path algorithm, but I am not sure if this is possible for the faster greedy path algorithm without large changes. Overall this sounds great. If anyone has a suggestion of where this should go I can start working on a PR and we can work out the remaining issues there? -Daniel Smith > On Jan 3, 2015, at 10:57 AM, Nathaniel Smith wrote: > > On 3 Jan 2015 02:46, "Daniel Smith" > wrote: > > > > Hello everyone, > > > > I have been working on a chunk of code that basically sets out to provide a single function that can take an arbitrary einsum expression and computes it in the most optimal way. While np.einsum can compute arbitrary expressions, there are two drawbacks to using pure einsum: einsum does not consider building intermediate arrays for possible reductions in overall rank and is not currently capable of using a vendor BLAS. I have been working on a project that aims to solve both issues simultaneously: > > > >https://github.com/dgasmith/opt_einsum > > > > This program first builds the optimal way to contract the tensors together, or using my own nomenclature a ?path.? This path is then iterated over and uses tensordot when possible and einsum for everything else. In test cases the worst case scenario adds a 20 microsecond overhead penalty and, in the best case scenario, it can reduce the overall rank of the tensor. The primary (if somewhat exaggerated) example is a 5-term N^8 index transformation that can be reduced to N^5; even when N is very small (N=10) there is a 2,000 fold speed increase over pure einsum or, if using tensordot, a 2,400 fold speed increase. This is somewhat similar to the new np.linalg.multi_dot function. > > This sounds super awesome. Who *wouldn't* want a free 2,000x speed increase? And I especially like your test suite. > > I'd also be interested in hearing more about the memory requirements of this approach. How does the temporary memory required typically scale with the size of the input arrays? Is there an upper bound on the temporary memory required? > > > As such, I am very interested in implementing this into numpy. While I think opt_einsum is in a pretty good place, there is still quite a bit to do (see outstanding issues in the README). Even if this is not something that would fit into numpy I would still be very interested in your comments. 
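To make the "path" idea concrete, here is a tiny hand-written illustration (this is not the opt_einsum code itself, and for operands this small the gain is negligible):

import numpy as np

N = 10
A, B, C = (np.random.rand(N, N) for _ in range(3))

# one-shot einsum sees all three operands at once
direct = np.einsum('ij,jk,kl->il', A, B, C)

# a "path" performs the same contraction pairwise, so each step can be
# handed to tensordot (and hence to a vendor BLAS)
tmp = np.tensordot(A, B, axes=(1, 0))         # ij,jk -> ik
stepwise = np.tensordot(tmp, C, axes=(1, 0))  # ik,kl -> il

assert np.allclose(direct, stepwise)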
> > We would definitely be interested in integrating this functionality into numpy. After all, half the point of having an interface like einsum is that it provides a clean boundary where we can swap in complicated, sophisticated machinery without users having to care. No one wants to curate their own pile of custom optimized libraries. :-) > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jan 3 14:49:44 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 3 Jan 2015 19:49:44 +0000 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: On 1 Jan 2015 21:35, "Alexander Belopolsky" wrote: > > A discussion [1] is currently underway at GitHub which will benefit from a larger forum. > > In version 1.9, the diagonal() method was changed to return a read-only (non-contiguous) view into the original array instead of a plain copy. Also, it has been announced [2] that in 1.10 the view will become read/write. > > A concern has now been raised [3] that this change breaks backward compatibility too much. > > Consider the following code: > > x = numy.eye(2) > d = x.diagonal() > d[0] = 2 > > In 1.8, this code runs without errors and results in [2, 1] stored in array d. In 1.9, this is an error. With the current plan, in 1.10 this will become valid again, but the result will be different: x[0,0] will be 2 while it is 1 in 1.8. Further context: In 1.7 and 1.8, the code above works as described, but also issues a visible-by-default warning: >>> np.__version__ '1.7.2' >>> x = np.eye(2) >>> x.diagonal()[0] = 2 __main__:1: FutureWarning: Numpy has detected that you (may be) writing to an array returned by numpy.diagonal or by selecting multiple fields in a record array. This code will likely break in the next numpy release -- see numpy.diagonal or arrays.indexing reference docs for details. The quick fix is to make an explicit copy (e.g., do arr.diagonal().copy() or arr[['f0','f1']].copy()). 1.7 was released in Feb. 2013, ~22 months ago. (I'm not implying this number is particularly large or small, it's just something that I find useful to calculate when thinking about things like this.) The choice of "1.10" as the target for completing this change is more-or-less a strawman and we shouldn't feel bound by it. The schedule was originally written in between the 1.6 and 1.7 releases, when our release process was kinda broken and we had no idea what the future release schedule would look like (1.6 -> 1.7 ultimately ended up being a ~21 month gap). We've already adjusted the schedule for this deprecation once before (see issue #596: The original schedule called for the change to returning a ro-view to happen in 1.8, rather than 1.9 as it actually did). Now that our release frequency is higher, 1.11 might well be a more reasonable target than 1.10. As for the overall question, this is really a bigger question about what strategy we should use in general to balance between conservatism (which is a Good Thing) and making improvements (which is also a Good Thing). The post you cite brings this up explicitly: > [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/ I have huge respect for the problems and pain that Konrad describes in this blog post, but I really can't agree with the argument or the conclusions. 
His conclusion is that when it comes to compatibility breaks, slow-incremental-change is bad, and that we should instead prefer big all-at-once compatibility breaks like the Numeric->Numpy or Py2->Py3 transitions. But when describing his own experiences that he uses to motivate this, he says: *"The two main dependencies of my code, NumPy and Python itself, did sometimes introduce incompatible changes (by design or as consequences of bug fixes) that required changes on my own code base, but they were surprisingly minor and never required more than about a day of work."* i.e., slow-incremental-change has actually worked well in his experience. (And in particular, the np.diagonal issue only comes in as an example to illustrate what he means by the phrase "slow continuous change" -- this particular change hasn't actually broken anything in his code.) OTOH the big problem that motivated his post was that his code is all written against the APIs of the ancient and long-abandoned Numeric project, and he finds the costs of transitioning them to the "new" numpy APIs to be prohibitively expensive, i.e. this big-bang transition broke his code. (It did manage to limp on for some years b/c numpy used to contain some compatibility code to emulate the Numeric API, but this doesn't really change the basic situation: there were two implementations of the API he needed -- numpy.numeric and Numeric itself -- and both implementations still exist in the sense that you can download them, but neither is usable because no-one's willing to maintain them anymore.) Maybe I'm missing something, but his data seems to be pi radians off from his conclusion. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Jan 3 15:39:20 2015 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 3 Jan 2015 15:39:20 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: Wasn't all of this discussed way back when the deprecation plan was made? This was known to happen and was entirely the intent, right? What new argument is there to deviate from the plan? As for that particular blog post, I remember reading it back when it was posted. I, again, sympathize with the author's plight, but I pointed out that the reason for some of the changes he noted was because they could cause bugs, which would mean that results could be wrong. Reproducibility is nigh useless without a test suite to ensure the component parts are reproducible on their own. OTOH, there is an argument for slow, carefully-considered changes to APIs (which I think the diagonal() changes were). As an example of a potentially poor change is in matplotlib. We are starting to move to using properties, away from get/setters(). In my upcoming book, I ran into a problem where I needed to use an Artist's get_axes() or use its property "axes", but there will only be one release of matplotlib where both of them will be valid. I was faced with either using the get_axes() and have my code obsolete sometime in the summer, use the propery, and have my code invalid for all but the most recent version of matplotlib, or to have some version checking code that would distract from the lesson at hand. I now think that a single release cycle for deprecation of get_axes() was not a wise decision, especially since the old code was merely verbose, not buggy. 
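The kind of version-checking code referred to here would be something along these lines (untested and purely illustrative, which is exactly why it distracts from a book example):

def get_artist_axes(artist):
    # newer matplotlib exposes the parent Axes as a property
    try:
        return artist.axes
    except AttributeError:
        # older releases only provide the explicit getter
        return artist.get_axes()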
To conclude, unless someone can present a *new* argument to deviate from the diagonal() plan that was set a couple years ago, I don't see any reason why the decisions that were agreed upon then are invalid now. The pros-and-cons were weighed, and this particular con was known then and was considered acceptable at that time. Cheers! Ben Root On Sat, Jan 3, 2015 at 2:49 PM, Nathaniel Smith wrote: > On 1 Jan 2015 21:35, "Alexander Belopolsky" wrote: > > > > A discussion [1] is currently underway at GitHub which will benefit from > a larger forum. > > > > In version 1.9, the diagonal() method was changed to return a read-only > (non-contiguous) view into the original array instead of a plain copy. > Also, it has been announced [2] that in 1.10 the view will become > read/write. > > > > A concern has now been raised [3] that this change breaks backward > compatibility too much. > > > > Consider the following code: > > > > x = numy.eye(2) > > d = x.diagonal() > > d[0] = 2 > > > > In 1.8, this code runs without errors and results in [2, 1] stored in > array d. In 1.9, this is an error. With the current plan, in 1.10 this > will become valid again, but the result will be different: x[0,0] will be 2 > while it is 1 in 1.8. > > Further context: > > In 1.7 and 1.8, the code above works as described, but also issues a > visible-by-default warning: > > >>> np.__version__ > '1.7.2' > >>> x = np.eye(2) > >>> x.diagonal()[0] = 2 > __main__:1: FutureWarning: Numpy has detected that you (may be) writing to > an array returned > by numpy.diagonal or by selecting multiple fields in a record > array. This code will likely break in the next numpy release -- > see numpy.diagonal or arrays.indexing reference docs for details. > The quick fix is to make an explicit copy (e.g., do > arr.diagonal().copy() or arr[['f0','f1']].copy()). > > 1.7 was released in Feb. 2013, ~22 months ago. (I'm not implying this > number is particularly large or small, it's just something that I find > useful to calculate when thinking about things like this.) > > The choice of "1.10" as the target for completing this change is > more-or-less a strawman and we shouldn't feel bound by it. The schedule was > originally written in between the 1.6 and 1.7 releases, when our release > process was kinda broken and we had no idea what the future release > schedule would look like (1.6 -> 1.7 ultimately ended up being a ~21 month > gap). We've already adjusted the schedule for this deprecation once before > (see issue #596: The original schedule called for the change to returning a > ro-view to happen in 1.8, rather than 1.9 as it actually did). Now that our > release frequency is higher, 1.11 might well be a more reasonable target > than 1.10. > > As for the overall question, this is really a bigger question about what > strategy we should use in general to balance between conservatism (which is > a Good Thing) and making improvements (which is also a Good Thing). The > post you cite brings this up explicitly: > > > [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/ > > I have huge respect for the problems and pain that Konrad describes in > this blog post, but I really can't agree with the argument or the > conclusions. His conclusion is that when it comes to compatibility breaks, > slow-incremental-change is bad, and that we should instead prefer big > all-at-once compatibility breaks like the Numeric->Numpy or Py2->Py3 > transitions. 
But when describing his own experiences that he uses to > motivate this, he says: > > *"The two main dependencies of my code, NumPy and Python itself, did > sometimes introduce incompatible changes (by design or as consequences of > bug fixes) that required changes on my own code base, but they were > surprisingly minor and never required more than about a day of work."* > > i.e., slow-incremental-change has actually worked well in his experience. > (And in particular, the np.diagonal issue only comes in as an example to > illustrate what he means by the phrase "slow continuous change" -- this > particular change hasn't actually broken anything in his code.) OTOH the > big problem that motivated his post was that his code is all written > against the APIs of the ancient and long-abandoned Numeric project, and he > finds the costs of transitioning them to the "new" numpy APIs to be > prohibitively expensive, i.e. this big-bang transition broke his code. (It > did manage to limp on for some years b/c numpy used to contain some > compatibility code to emulate the Numeric API, but this doesn't really > change the basic situation: there were two implementations of the API he > needed -- numpy.numeric and Numeric itself -- and both implementations > still exist in the sense that you can download them, but neither is usable > because no-one's willing to maintain them anymore.) Maybe I'm missing > something, but his data seems to be pi radians off from his conclusion. > > -n > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maniteja.modesty067 at gmail.com Sat Jan 3 16:44:10 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Sun, 4 Jan 2015 03:14:10 +0530 Subject: [Numpy-discussion] Regarding np.ma.masked_equal behavior Message-ID: Hello friends, This is an issue related to the working of *masked_equal* method. I was thinking if anyone related to an old ticket #1851 , regarding the modification of *masked_equal *function effect on *fill_value *could clarify the situation, since right now, the documentation and implementation conflict. There is an issue raised regarding this #5408 . Cheers*,* N.Maniteja _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Jan 3 16:55:38 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 3 Jan 2015 21:55:38 +0000 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: Hi, On Thu, Jan 1, 2015 at 9:35 PM, Alexander Belopolsky wrote: > A discussion [1] is currently underway at GitHub which will benefit from a > larger forum. > > In version 1.9, the diagonal() method was changed to return a read-only > (non-contiguous) view into the original array instead of a plain copy. > Also, it has been announced [2] that in 1.10 the view will become > read/write. > > A concern has now been raised [3] that this change breaks backward > compatibility too much. > > Consider the following code: > > x = numy.eye(2) > d = x.diagonal() > d[0] = 2 > > In 1.8, this code runs without errors and results in [2, 1] stored in array > d. In 1.9, this is an error. 
With the current plan, in 1.10 this will > become valid again, but the result will be different: x[0,0] will be 2 while > it is 1 in 1.8. > > Two alternatives are suggested for discussion: > > 1. Add copy=True flag to diagonal() method. > 2. Roll back 1.9 change to diagonal() and introduce an additional > diagonal_view() method to return a view. I think this point is a good one, from Konrad Hinsen's blog post: If you get a Python script, say as a reviewer for a submitted article, and see 'import numpy', you don't know which version of numpy the authors had in mind. If that script calls array.diag() and modifies the return value, does it expect to modify a copy or a view? The result is very different, but there is no way to tell. It is possible, even quite probable, that the code would execute fine with both NumPy 1.8 and the upcoming NumPy 1.10, but yield different results. That rules out the current 1.10 plan I think. copy=True as default seems like a nice compact and explicit solution to me. Cheers, Matthew From charlesr.harris at gmail.com Sat Jan 3 17:08:29 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Jan 2015 15:08:29 -0700 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: On Sat, Jan 3, 2015 at 2:55 PM, Matthew Brett wrote: > Hi, > > On Thu, Jan 1, 2015 at 9:35 PM, Alexander Belopolsky > wrote: > > A discussion [1] is currently underway at GitHub which will benefit from > a > > larger forum. > > > > In version 1.9, the diagonal() method was changed to return a read-only > > (non-contiguous) view into the original array instead of a plain copy. > > Also, it has been announced [2] that in 1.10 the view will become > > read/write. > > > > A concern has now been raised [3] that this change breaks backward > > compatibility too much. > > > > Consider the following code: > > > > x = numpy.eye(2) > > d = x.diagonal() > > d[0] = 2 > > > > In 1.8, this code runs without errors and results in [2, 1] stored in > array > > d. In 1.9, this is an error. With the current plan, in 1.10 this will > > become valid again, but the result will be different: x[0,0] will be 2 > while > > it is 1 in 1.8. > > > > Two alternatives are suggested for discussion: > > > > 1. Add copy=True flag to diagonal() method. > > 2. Roll back 1.9 change to diagonal() and introduce an additional > > diagonal_view() method to return a view. > > I think this point is a good one, from Konrad Hinsen's blog post: > > > If you get a Python script, say as a reviewer for a submitted article, > and see 'import numpy', you don't know which version of numpy the > authors had in mind. If that script calls array.diag() and modifies > the return value, does it expect to modify a copy or a view? The > result is very different, but there is no way to tell. It is possible, > even quite probable, that the code would execute fine with both NumPy > 1.8 and the upcoming NumPy 1.10, but yield different results. > > > That rules out the current 1.10 plan I think. > > copy=True as default seems like a nice compact and explicit solution to me. > > Bear in mind that this also affects the C-API via the PyArray_Diagonal function, so the rollback proposal would be 1) Roll back the change to PyArray_Diagonal 2) Introduce a new C-API function PyArray_Diagonal2 that has a 'copy' argument 3) Make PyArray_Diagonal call PyArray_Diagonal2 with 'copy=1' 4) Add a copy argument to the diagonal method.
I'm thinking we should have a rule that functions in the C-API can be refactored or deprecated, but they don't change otherwise. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jan 3 18:54:58 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 04 Jan 2015 00:54:58 +0100 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: On 03/01/15 03:04, Charles R Harris wrote: > The diag, diagonal, and ravel functions have recently been changed to > preserve subtypes. However, this causes lots of backward compatibility > problems for matrix users, in particular, scipy.sparse. One possibility > for fixing this is to special case matrix and so that these functions > continue to return 1-d arrays for matrix instances. This is kind of ugly > as `a..ravel` will still return a matrix when a is a matrix, an ugly > inconsistency. This may be a case where practicality beats beauty. > > Thoughts? What about fixing scipy.sparse? Sturla From charlesr.harris at gmail.com Sat Jan 3 19:28:41 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Jan 2015 17:28:41 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: On Sat, Jan 3, 2015 at 4:54 PM, Sturla Molden wrote: > On 03/01/15 03:04, Charles R Harris wrote: > > > The diag, diagonal, and ravel functions have recently been changed to > > preserve subtypes. However, this causes lots of backward compatibility > > problems for matrix users, in particular, scipy.sparse. One possibility > > for fixing this is to special case matrix and so that these functions > > continue to return 1-d arrays for matrix instances. This is kind of ugly > > as `a..ravel` will still return a matrix when a is a matrix, an ugly > > inconsistency. This may be a case where practicality beats beauty. > > > > Thoughts? > > What about fixing scipy.sparse? > PR already in. The problem is that versions of scipy <=15 will not work with numpy 1.10 if we don't fix this in numpy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jan 4 03:44:32 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 4 Jan 2015 09:44:32 +0100 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: On Sun, Jan 4, 2015 at 1:28 AM, Charles R Harris wrote: > > > On Sat, Jan 3, 2015 at 4:54 PM, Sturla Molden > wrote: > >> On 03/01/15 03:04, Charles R Harris wrote: >> >> > The diag, diagonal, and ravel functions have recently been changed to >> > preserve subtypes. However, this causes lots of backward compatibility >> > problems for matrix users, in particular, scipy.sparse. One possibility >> > for fixing this is to special case matrix and so that these functions >> > continue to return 1-d arrays for matrix instances. This is kind of ugly >> > as `a..ravel` will still return a matrix when a is a matrix, an ugly >> > inconsistency. This may be a case where practicality beats beauty. >> > >> > Thoughts? >> > I think it makes sense to special-case matrix here. Arguable, ravel() is an operation that should return a 1-D array (ndarray or other array-like object). np.matrix doesn't allow 1-D objects, hence can't be returned. 
The method is also documented to return a 1-D array, so maybe the matrix.ravel method is wrong here: In [1]: x = np.matrix(np.eye(3)) In [2]: x.ravel() Out[2]: matrix([[ 1., 0., 0., 0., 1., 0., 0., 0., 1.]]) # 2-D In [3]: print(x.ravel.__doc__) a.ravel([order]) Return a flattened array. Refer to `numpy.ravel` for full documentation. Ralf > >> What about fixing scipy.sparse? >> > > PR already in. The problem is that versions of scipy <=15 will not work > with numpy 1.10 if we don't fix this in numpy. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jan 4 04:46:17 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 4 Jan 2015 10:46:17 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: On Sat, Jan 3, 2015 at 11:08 PM, Charles R Harris wrote: > > > On Sat, Jan 3, 2015 at 2:55 PM, Matthew Brett > wrote: > >> Hi, >> >> On Thu, Jan 1, 2015 at 9:35 PM, Alexander Belopolsky >> wrote: >> > A discussion [1] is currently underway at GitHub which will benefit >> from a >> > larger forum. >> > >> > In version 1.9, the diagonal() method was changed to return a read-only >> > (non-contiguous) view into the original array instead of a plain copy. >> > Also, it has been announced [2] that in 1.10 the view will become >> > read/write. >> > >> > A concern has now been raised [3] that this change breaks backward >> > compatibility too much. >> > >> > Consider the following code: >> > >> > x = numy.eye(2) >> > d = x.diagonal() >> > d[0] = 2 >> > >> > In 1.8, this code runs without errors and results in [2, 1] stored in >> array >> > d. In 1.9, this is an error. With the current plan, in 1.10 this will >> > become valid again, but the result will be different: x[0,0] will be 2 >> while >> > it is 1 in 1.8. >> > >> > Two alternatives are suggested for discussion: >> > >> > 1. Add copy=True flag to diagonal() method. >> > 2. Roll back 1.9 change to diagonal() and introduce an additional >> > diagonal_view() method to return a view. >> >> I think this point is a good one, from Konrad Hinsen's blog post: >> >> >> If you get a Python script, say as a reviewer for a submitted article, >> and see ?import numpy?, you don?t know which version of numpy the >> authors had in mind. If that script calls array.diag() and modifies >> the return value, does it expect to modify a copy or a view? The >> result is very different, but there is no way to tell. It is possible, >> even quite probable, that the code would execute fine with both NumPy >> 1.8 and the upcoming NumPy 1.10, but yield different results. >> >> >> That rules out the current 1.10 plan I think. >> > I think maybe making the change in 1.10 is too quick, but it doesn't rule it out long-term. This issue and the copy=True alternative were extensively discussed when making the change: http://thread.gmane.org/gmane.comp.python.numeric.general/49887/focus=49888 It's not impossible that we made the wrong decision a while back, but rehashing that whole discussion based on an argument that was already brought up back then doesn't sound good to me. > copy=True as default seems like a nice compact and explicit solution to me. 
>> > > Bear in mind that this also affects the C-API via the PyArray_Diagonal > function, so the rollback proposal would be > > 1) Roll back the change to PyArray_Diagonal > 2) Introduce a new C-API function PyArray_Diagonal2 that has a 'copy' > argument > 3) Make PyArray_Diagonal call PyArray_Diagonal2 with 'copy=1' > 4) Add a copy argument to do the diagonal method. > > I'm thinking we should have a rule that functions in the C-API can be > refactored or deprecated, but they don't change otherwise. > Makes sense. It's time to document the policy on deprecations and incompatible changes in more detail I think. We had a few sentences long statement on this on the Trac wiki, IIRC written by Robert Kern, but that's gone now. Do we have anything else written down anywhere? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jan 4 09:56:28 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 4 Jan 2015 07:56:28 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: On Sun, Jan 4, 2015 at 1:44 AM, Ralf Gommers wrote: > > > On Sun, Jan 4, 2015 at 1:28 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 3, 2015 at 4:54 PM, Sturla Molden >> wrote: >> >>> On 03/01/15 03:04, Charles R Harris wrote: >>> >>> > The diag, diagonal, and ravel functions have recently been changed to >>> > preserve subtypes. However, this causes lots of backward compatibility >>> > problems for matrix users, in particular, scipy.sparse. One possibility >>> > for fixing this is to special case matrix and so that these functions >>> > continue to return 1-d arrays for matrix instances. This is kind of >>> ugly >>> > as `a..ravel` will still return a matrix when a is a matrix, an ugly >>> > inconsistency. This may be a case where practicality beats beauty. >>> > >>> > Thoughts? >>> >> > I think it makes sense to special-case matrix here. Arguable, ravel() is > an operation that should return a 1-D array (ndarray or other array-like > object). np.matrix doesn't allow 1-D objects, hence can't be returned. > > The method is also documented to return a 1-D array, so maybe the > matrix.ravel method is wrong here: > > In [1]: x = np.matrix(np.eye(3)) > > In [2]: x.ravel() > Out[2]: matrix([[ 1., 0., 0., 0., 1., 0., 0., 0., 1.]]) # 2-D > > In [3]: print(x.ravel.__doc__) > a.ravel([order]) > > Return a flattened array. > > Refer to `numpy.ravel` for full documentation. > Just to clarify the previous behavior for matrix m. 1) m.diagonal() and m.ravel() both return matrices 2) diagonal(m) and ravel(m) both return 1-D arrays Currently in master, which is incompatible with scipy master 1) m.diagonal() and m.ravel() both return matrices 2) diagonal(m) and ravel(m) both return matrices There is a PR to revert to the previous behavior. Another option is to change m.ravel() to return a 1-D array and leave diagonal(m) returning a matrix. The incompatibilites with diagonal didn't seem to be as troublesome. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
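Spelled out at an interpreter, the distinction under discussion looks roughly like this (which result you actually get for the function form depends on the numpy version, as described above):

>>> import numpy as np
>>> m = np.matrix(np.eye(2))
>>> m.ravel().shape        # method: result stays a matrix, hence 2-d
(1, 4)
>>> np.ravel(m).shape      # function, previous behavior: plain 1-d ndarray
(4,)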
URL: From ralf.gommers at gmail.com Sun Jan 4 11:08:17 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 4 Jan 2015 17:08:17 +0100 Subject: [Numpy-discussion] numpy.fromiter in numpypy In-Reply-To: <645635478.812990.1420142270937.JavaMail.yahoo@jws10729.mail.gq1.yahoo.com> References: <645635478.812990.1420142270937.JavaMail.yahoo@jws10729.mail.gq1.yahoo.com> Message-ID: Hi Albert-Jan, On Thu, Jan 1, 2015 at 8:57 PM, Albert-Jan Roskam wrote: > Hi, > > I would like to use the numpy implementation for Pypy. In particular, I > would like to use numpy.fromiter, which is available according to this > overview: http://buildbot.pypy.org/numpy-status/latest.html. However, > contrary to what this website says, this function is not yet available. > Conclusion: the website is wrong. Or am I missing something? > No idea to be honest. Note that numpypy is developed by the PyPy team and not by the Numpy team. So you may want to ask on the PyPy mailing list: https://mail.python.org/mailman/listinfo/pypy-dev Cheers, Ralf > > albertjan at debian:~$ sudo pypy $(which pip) install -U git+ > https://bitbucket.org/pypy/numpy.git > albertjan at debian:~$ sudo pypy -c 'import numpy' # sudo: as per the > installation instructions > albertjan at debian:~$ pypy > Python 2.7.8 (f5dcc2477b97386c11e4b67f08a2d00fbd2fce5d, Sep 19 2014, > 10:37:41) > [PyPy 2.4.0 with GCC 4.8.2] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>>> import sys > >>>> import numpy as np > >>>> np.__version__, sys.version > ('1.9.0', '2.7.8 (f5dcc2477b97386c11e4b67f08a2d00fbd2fce5d, Sep 19 2014, > 10:37:41)\n[PyPy 2.4.0 with GCC 4.8.2]') > >>>> np.fromiter > > >>>> np.fromiter((i for i in range(10)), np.float) > Traceback (most recent call last): > File "", line 1, in > File "/opt/pypy-2.4/site-packages/numpy/core/multiarray.py", line 55, in > tmp > raise NotImplementedError("%s not implemented yet" % func) > NotImplementedError: fromiter not implemented yet > > The same also applies to numpy.fromfile > > Thanks in advance and happy 2015. > > > > Regards, > > Albert-Jan > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > All right, but apart from the sanitation, the medicine, education, wine, > public order, irrigation, roads, a > > fresh water system, and public health, what have the Romans ever done for > us? > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From konrad.hinsen at fastmail.net Sun Jan 4 11:22:30 2015 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Sun, 04 Jan 2015 17:22:30 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: <54A968C6.1040502@fastmail.net> On 03/01/15 20:49, Nathaniel Smith wrote: > The post you cite brings this up explicitly: > > > [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/ > > I have huge respect for the problems and pain that Konrad describes in > this blog post, but I really can't agree with the argument or the > conclusions. His conclusion is that when it comes to compatibility > breaks, slow-incremental-change is bad, and that we should instead > prefer big all-at-once compatibility breaks like the Numeric->Numpy or > Py2->Py3 transitions. 
But when describing his own experiences that he > uses to motivate this, he says: ... There are two different scenarios to consider here, and perhaps I didn't make that distinction clear enough. One scenario is that of a maintained library or application that depends on NumPy. The other scenario is a set of scripts written for a specific study (let's say a thesis) that is then published for its documentation value. Such scripts are in general not maintained. In the first scenario, gradual API changes work reasonably well, as long as the effort involved in applying the fixes is sufficiently minor that developers can integrate them into their routine maintenance efforts. That is the situation I have described for my own past experience as a library author. It's the second scenario where gradual changes are a real problem. Suppose I have a set of scripts from a thesis published in year X, and I need to understand them in detail in year X+5 for a related scientific project. If the script produces different results with NumPy 1.7 and NumPy 1.10, which result should I assume the author intended? People rarely write down which versions of all dependencies they used. Yes, they should, but it's actually a pain to do this, in particular when you work on multiple machines and don't manage the Python installation yourself. In this rather frequent situation, the published scripts are ambiguous - I can't really know what they did when the author ran them. There is a third scenario where this problem shows up: outdated legacy system installations, which are particularly frequent on clusters and supercomputers. For example, the cluster at my lab runs a CentOS version that is a few years old. CentOS is known for its conservatism, and therefore the default Python installation on that machine is based on Python 2.6 with correspondingly old NumPy versions. People do install recent application libraries there. Suppose someone runs code there that assumes the future semantics for diagonal() - this will silently yield wrong results. In summary, my point of view on breaking changes is: 1) Changes that can make legacy code fail can be introduced gradually. The right compromise between stability and progress must be figured out by the community. 2) Changes that yield different results for unmodified legacy code should never be allowed. 3) The best overall solution for API evolution is a version number visible in client code with a version number change whenever some breaking change is introduced. This guarantees point 2). So if the community decides that it is important to change the behavior of diagonal(), this should be done in one of two ways: a) Deprecate diagonal() and introduce a differently-named method with the new functionality. This will make old code fail rather than produce wrong results. b) Accumulate this change with other such changes and call the new API "numpy2". Konrad. From ben.root at ou.edu Sun Jan 4 14:37:44 2015 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 4 Jan 2015 14:37:44 -0500 Subject: [Numpy-discussion] Regarding np.ma.masked_equal behavior In-Reply-To: References: Message-ID: Personally, I have never depended upon an implicit fill value. I would always handle it explicitly. Off the top of my head, a project that might have really good insight into how fill_value should work is the python-netCDF4 project (so, talk to Jeff Whitaker, I think), and/or the HDF5 people. I know the netCDF4 package, which supports masked arrays, takes advantage of fill_value attributes. Cheers!
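Handling it explicitly means something along these lines (an illustrative sketch only; the fill_value that masked_equal picks by itself is exactly what the ticket is about):

>>> import numpy as np
>>> a = np.ma.masked_equal([1, 2, 3], 2)
>>> a.fill_value = 999      # choose the fill value yourself ...
>>> a.filled()              # ... rather than relying on the implicit one
array([  1, 999,   3])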
Ben Root On Sat, Jan 3, 2015 at 4:44 PM, Maniteja Nandana < maniteja.modesty067 at gmail.com> wrote: > Hello friends, > > This is an issue related to the working of *masked_equal* method. I was > thinking if anyone related to an old ticket #1851 > , regarding the modification > of *masked_equal *function effect on *fill_value *could clarify the > situation, since right now, the documentation and implementation conflict. > There is an issue raised regarding this #5408 > . > > Cheers*,* > N.Maniteja > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sun Jan 4 15:28:41 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 04 Jan 2015 21:28:41 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <54A968C6.1040502@fastmail.net> References: <54A968C6.1040502@fastmail.net> Message-ID: On 04/01/15 17:22, Konrad Hinsen wrote: > There are two different scenarios to consider here, and perhaps I didn't > make that distinction clear enough. One scenario is that of a maintained > library or application that depends on NumPy. The other scenario is a > set of scripts written for a specific study (let's say a thesis) that is > then published for its documentation value. Such scripts are in general > not maintained. > It's the second scenario where gradual changes are a real problem. > Suppose I have a set of scripts from a thesis published in year X, and I > need to understand them in detail in year X+5 for a related scientific > project. If the script produces different results with NumPy 1.7 and > NumPy 1.10, which result should I assume the author intended? A scientific paper or thesis should be written so it is completely reproducible. That would include describing the computer, OS, Python version and NumPy version, as well as C or Fortran compiler. I will happily fail any student who writes a thesis without providing such details, and if I review a research paper for a journal you can be sure I will ask that is corrected. Sturla From sturla.molden at gmail.com Sun Jan 4 15:55:47 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 04 Jan 2015 21:55:47 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: On 03/01/15 20:49, Nathaniel Smith wrote: > i.e., slow-incremental-change has actually worked well in his > experience. (And in particular, the np.diagonal issue only comes in as > an example to illustrate what he means by the phrase "slow continuous > change" -- this particular change hasn't actually broken anything in his > code.) OTOH the big problem that motivated his post was that his code is > all written against the APIs of the ancient and long-abandoned Numeric > project, and he finds the costs of transitioning them to the "new" numpy > APIs to be prohibitively expensive, i.e. this big-bang transition broke > his code. Given that a big-bang transition broke his code everywhere, I don't really see why he wants more of them. The question of reproducible research is orthogonal to this, I think. 
Sturla From valentin at haenel.co Sun Jan 4 15:59:40 2015 From: valentin at haenel.co (Valentin Haenel) Date: Sun, 4 Jan 2015 21:59:40 +0100 Subject: [Numpy-discussion] [ANN] bcolz 0.7.3 Message-ID: <20150104205940.GB9729@kudu.in-berlin.de> ====================== Announcing bcolz 0.7.3 ====================== What's new ========== This release includes the support for pickling persistent carray/ctable objects contributed by Matthew Rocklin. Also, the included version of Blosc is updated to ``v1.5.2``. Lastly, several minor issues and typos have been fixed, please see the release notes for details. ``bcolz`` is a renaming of the ``carray`` project. The new goals for the project are to create simple, yet flexible compressed containers, that can live either on-disk or in-memory, and with some high-performance iterators (like `iter()`, `where()`) for querying them. Together, bcolz and the Blosc compressor, are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots For more detailed info, see the release notes in: https://github.com/Blosc/bcolz/wiki/Release-Notes What it is ========== bcolz provides columnar and compressed data containers. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of column. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and use several cores for doing the computations, so it is blazing fast. Moreover, the carray/ctable containers can be disk-based, and it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Installing ========== bcolz is in the PyPI repository, so installing it is easy:: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt ---- **Enjoy data!** From konrad.hinsen at fastmail.net Sun Jan 4 23:22:13 2015 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Mon, 05 Jan 2015 05:22:13 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: <54AA1175.4050900@fastmail.net> On 04/01/15 21:55, Sturla Molden wrote: > On 03/01/15 20:49, Nathaniel Smith wrote: > >> OTOH the big problem that motivated his post was that his code is >> all written against the APIs of the ancient and long-abandoned Numeric >> project, and he finds the costs of transitioning them to the "new" numpy >> APIs to be prohibitively expensive, i.e. this big-bang transition broke >> his code. > > Given that a big-bang transition broke his code everywhere, I don't > really see why he wants more of them. I am not asking for "big-bang transitions" as such. 
I am asking for breaking changes to go along with a clearly visible and clearly announced change in the API name and/or major version. A change as important as dropping support for an API that has been around for 20 years shouldn't happen as one point in the change list from version 1.8 to 1.9. It can happen in the transition from "numpy" to "numpy2", which ideally should be done in a way that permits users to install both "numpy" and "numpy2" in parallel to ease the transition. There is a tacit convention in computing that "higher" version numbers of a package indicate improvements and extensions but not reduction in functionality. This convention also underlies most of today's package management systems. Major breaking changes violate this tacit convention. > The question of reproducible research is orthogonal to this, I think. Indeed. My blog post addresses two distinct issues, whose common point is that they relate to the evolution of NumPy. Konrad. From konrad.hinsen at fastmail.net Sun Jan 4 23:31:21 2015 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Mon, 05 Jan 2015 05:31:21 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> Message-ID: <54AA1399.4000201@fastmail.net> On 04/01/15 21:28, Sturla Molden wrote: > A scientific paper or thesis should be written so it is completely > reproducible. That would include describing the computer, OS, Python > version and NumPy version, as well as C or Fortran compiler. I completely agree and we should all work towards this goal. But we aren't there yet. Most of the scientific community is just beginning to realize that there is a problem. Anyone writing scientific software for use in today's environment has to take this into account. More importantly, there is not only the technical problem of reproducibility, but also the meta-level problem of human understanding. Scientific communication depends more and more on scripts as the only precise documentation of a computational method. Our programming languages are becoming a major form of scientific notation, alongside traditional mathematics. Humans don't read written text with version numbers in mind. This is a vast problem which can't be solved merely by "fixing" software technology, but it's something to keep in mind nevertheless when writing software. For those interested in this aspect, I have written a much more detailed account in a recent paper: http://dx.doi.org/10.12688/f1000research.3978.2 Konrad. From antony.lee at berkeley.edu Mon Jan 5 02:34:09 2015 From: antony.lee at berkeley.edu (Antony Lee) Date: Mon, 5 Jan 2015 00:34:09 -0700 Subject: [Numpy-discussion] edge-cases of ellipsis indexing Message-ID: While trying to reproduce various fancy indexings for astropy's FITS sections (a loaded-on-demand array), I found the following interesting behavior: >>> np.array([1])[..., 0] array(1) >>> np.array([1])[0] 1 >>> np.array([1])[(0,)] 1 The docs say "Ellipsis expand to the number of : objects needed to make a selection tuple of the same length as x.ndim.", so it's not totally clear to me how to explain that difference in the results. Antony -------------- next part -------------- An HTML attachment was scrubbed... 
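Checking shapes and types makes the difference explicit:

>>> import numpy as np
>>> a = np.array([1])
>>> a[..., 0].ndim          # a 0-d ndarray rather than a scalar
0
>>> a[..., 0].shape
()
>>> np.isscalar(a[..., 0])
False
>>> np.isscalar(a[0])       # plain integer indexing gives a numpy scalar
True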
URL: From maniteja.modesty067 at gmail.com Mon Jan 5 03:43:33 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Mon, 5 Jan 2015 14:13:33 +0530 Subject: [Numpy-discussion] edge-cases of ellipsis indexing In-Reply-To: References: Message-ID: Hi Anthony, I am not sure whether the following section in documentation is relevant to the behavior you were referring to. When an ellipsis (...) is present but has no size (i.e. replaces zero :) the result will still always be an array. A view if no advanced index is present, otherwise a copy. Here, ...replaces zero : Advanced indexing always returns a *copy* of the data (contrast with basic slicing that returns a *view* ). And I think it is a view that is returned in this case. >>> a = array([1]) >>>a array([1]) >>>a[:,0] # zero : are present Traceback (most recent call last): File "", line 1, in IndexError: too many indices for array >>>a[...,0]=2 >>>a array([2]) >>>a[0] = 3 >>>a array([3]) >>>a[(0,)] = 4 >>>a array([4]) >>>a[: array([1]) Hope I helped. Cheers, N.Maniteja. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Jan 5 03:43:45 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 5 Jan 2015 08:43:45 +0000 (UTC) Subject: [Numpy-discussion] The future of ndarray.diagonal() References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> Message-ID: <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> Konrad Hinsen wrote: > Scientific communication depends more and more on scripts as the only > precise documentation of a computational method. Our programming > languages are becoming a major form of scientific notation, alongside > traditional mathematics. To me it seems that algorithms in scientific papers and books are described in various forms of pseudo-code. Perhaps we need a notation which is universal and ethernal like the language mathematics. But I am not sure Python could or should try to be that "scripting" language. I also think it is reasonable to ask if journals should require code as algorithmic documentation to be written in some ISO standard language like C or Fortran 90. The behavior of Python and NumPy are not dictated by standards, and as such is not better than pseudo-code. Sturla From konrad.hinsen at fastmail.net Mon Jan 5 04:08:19 2015 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Mon, 05 Jan 2015 10:08:19 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> Message-ID: <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> --On 5 janvier 2015 08:43:45 +0000 Sturla Molden wrote: > To me it seems that algorithms in scientific papers and books are > described in various forms of pseudo-code. That's indeed what people do when they write a paper about an algorithm. But many if not most algorithms in computational science are never published in a specific article. Very often, a scientific article gives only an outline of a method in plain English. The only full documentation of the method is the implementation. 
> Perhaps we need a notation > which is universal and ethernal like the language mathematics. But I am > not sure Python could or should try to be that "scripting" language. Neither Python nor any other programming was designed for that task, and none of them is really a good fit. But today's de facto situation is that programming languages fulfill the role of algorithmic specification languages in computational science. And I don't expect this to change rapidly, in particular because to the best of my knowledge there is no better choice available at the moment. I wrote an article on this topic that will appear in the March 2015 issue of "Computing in Science and Engineering". It concludes that for now, a simple Python script is probably the best you can do for an executable specification of an algorithm. However, I also recommend not using big libraries such as NumPy in such scripts. > I also think it is reasonable to ask if journals should require code as > algorithmic documentation to be written in some ISO standard language like > C or Fortran 90. The behavior of Python and NumPy are not dictated by > standards, and as such is not better than pseudo-code. True, but the ISO specifications of C and Fortran have so many holes ("undefined behavior") that they are not really much better for the job. And again, we can't ignore the reality of the de facto use today: there are no such requirements or even guidelines, so Python scripts are often the best we have as algorithmic documentation. Konrad. From sebastian at sipsolutions.net Mon Jan 5 04:14:56 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 05 Jan 2015 10:14:56 +0100 Subject: [Numpy-discussion] edge-cases of ellipsis indexing In-Reply-To: References: Message-ID: <1420449296.31170.3.camel@sebastian-t440> On Mo, 2015-01-05 at 14:13 +0530, Maniteja Nandana wrote: > Hi Anthony, > > > I am not sure whether the following section in documentation is > relevant to the behavior you were referring to. > > > When an ellipsis (...) is present but has no size (i.e. replaces > zero :) the result will still always be an array. A view if no > advanced index is present, otherwise a copy. > Exactly. There are actually three forms of indexing to distinguish. 1. Indexing with integers (also scalar arrays) matching the number of dimensions. This will return a *scalar*. 2. Slicing, etc. which returns a view. This also occurs as soon there is an ellipsis in there (even if it replaces 0 `:`). You should see it as a feature to get a view if the result might be a scalar otherwise ;)! 3. Advanced indexing which cannot be view based and returns a copy. - Sebastian > Here, ...replaces zero : > > > > Advanced indexing always returns a copy of the data (contrast with > basic slicing that returns a view). > And I think it is a view that is returned in this case. > > > >>> a = array([1]) > >>>a > array([1]) > >>>a[:,0] # zero : are present > Traceback (most recent call last): > File "", line 1, in > IndexError: too many indices for array > >>>a[...,0]=2 > >>>a > array([2]) > >>>a[0] = 3 > >>>a > array([3]) > >>>a[(0,)] = 4 > >>>a > array([4]) > >>>a[: > array([1]) > > > Hope I helped. > > > Cheers, > N.Maniteja. 
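A quick way to see cases 2 and 3 side by side (a small sketch; any 2-d array will do):

>>> import numpy as np
>>> a = np.arange(6).reshape(2, 3)
>>> v = a[..., 0]           # ellipsis/slicing: a view
>>> v[0] = 100
>>> a[0, 0]                 # the write is visible in a
100
>>> c = a[[0, 1], 0]        # advanced indexing: a copy
>>> c[0] = -1
>>> a[0, 0]                 # a is untouched
100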
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From josef.pktd at gmail.com Mon Jan 5 10:48:55 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Jan 2015 10:48:55 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> Message-ID: On Mon, Jan 5, 2015 at 4:08 AM, Konrad Hinsen wrote: > --On 5 janvier 2015 08:43:45 +0000 Sturla Molden > wrote: > > > To me it seems that algorithms in scientific papers and books are > > described in various forms of pseudo-code. > > That's indeed what people do when they write a paper about an algorithm. > But many if not most algorithms in computational science are never > published in a specific article. Very often, a scientific article gives > only an outline of a method in plain English. The only full documentation > of the method is the implementation. > > > Perhaps we need a notation > > which is universal and ethernal like the language mathematics. But I am > > not sure Python could or should try to be that "scripting" language. > > Neither Python nor any other programming was designed for that task, and > none of them is really a good fit. But today's de facto situation is that > programming languages fulfill the role of algorithmic specification > languages in computational science. And I don't expect this to change > rapidly, in particular because to the best of my knowledge there is no > better choice available at the moment. > > I wrote an article on this topic that will appear in the March 2015 issue > of "Computing in Science and Engineering". It concludes that for now, a > simple Python script is probably the best you can do for an executable > specification of an algorithm. However, I also recommend not using big > libraries such as NumPy in such scripts. > > > I also think it is reasonable to ask if journals should require code as > > algorithmic documentation to be written in some ISO standard language > like > > C or Fortran 90. The behavior of Python and NumPy are not dictated by > > standards, and as such is not better than pseudo-code. > > True, but the ISO specifications of C and Fortran have so many holes > ("undefined behavior") that they are not really much better for the job. > And again, we can't ignore the reality of the de facto use today: there are > no such requirements or even guidelines, so Python scripts are often the > best we have as algorithmic documentation. > Matlab is more "well defined" than numpy. numpy has too many features. I think, if you want a runnable python script as algorithmic documentation, then it will be necessary and relatively easy in most cases to stick to the "stable" basic features. 
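For example, the top of such a script might simply pin everything down explicitly (an untested sketch):

import sys
import numpy as np

# record the environment the results were produced with
print("Python %s, NumPy %s" % (sys.version.split()[0], np.__version__))

raw = [[1.0, 2.0], [3.0, 4.0]]     # stand-in for whatever data the script loads
x = np.array(raw, dtype=float)     # pin the dtype and always get a fresh array
d = x.diagonal().copy()            # do not rely on copy-versus-view semantics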
The same for a library, if we want to minimize compatibility problems, then we shouldn't use features that are most likely a moving target. One of the issues is whether we want to write "safe" or "fancy" code. (Fancy code might or will be faster, with a specific version.) For example in most of my use cases having a view or copy of an array makes a difference to the performance but not the results. I didn't participate in the `diagonal` debate because I don't have a strong opinion and don't use it with an assignment. There is an explicit np.fill_diagonal that is inplace. Having views or copies of arrays never sounded like having a clear cut answer, there are too many functions that "return views if possible". When our (statsmodels) code correctness depends on whether it's a view or copy, then we usually make sure and write the matching unit tests. Other cases, the behavior of numpy in edge cases like empty arrays is still in flux. We usually try to avoid relying on implicit behavior. Dtypes are a mess (in terms of code compatibility). Matlab is much nicer, it's all just doubles. Now pandas and numpy are making object arrays popular and introduce strange things like datetime dtypes, and users think a program written a while ago can handle them. Related compatibility issue python 2 and python 3: For non-string manipulation scientific code the main limitation is to avoid version specific features, and decide when to use lists versus iterators for range, zip, map. Other than that, it looks much simpler to me than expected. Overall I think the current policy of incremental changes in numpy works very well. Statsmodels needs a few minor adjustments in each version. But most of those are for cases where numpy became more strict or where we used a specific behavior in edge cases, AFAIR. One problem with accumulating changes for a larger version change like numpy 2 or 3 or 4 is to decide what changes would require this. Most changes will break some code, if the code requires or uses some exotic or internal behavior. If we want to be strict, then we don't change the policy but change the version numbers, instead of 1.8, 1.9 we have numpy 18 and numpy 19. However, from my perspective none of the recent changes were fundamental enough. BTW: Stata is versioning scripts. Each script can define for which version of Stata it was written, but I have no idea how they handle the compatibility issues. It looks to me that it would be way too much work to do something like this in an open source project. Legacy cleanups like removal of numeric compatibility in numpy or weave (and maxentropy) in scipy have been announced for a long time, and eventually all legacy code needs to run in a legacy environment. But that's a different issue from developing numpy and the current scientific python related packages which need the improvements. It is always possible just to "freeze" a package, with it's own frozen python and frozen versions of dependencies. Josef > > Konrad. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.isaac at gmail.com Mon Jan 5 11:13:36 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 05 Jan 2015 11:13:36 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> Message-ID: <54AAB830.2040409@gmail.com> On 1/5/2015 10:48 AM, josef.pktd at gmail.com wrote: > Dtypes are a mess (in terms of code compatibility). Matlab is much nicer, it's all just doubles. 1. Thank goodness for dtypes. 2. http://www.mathworks.com/help/matlab/numeric-types.html 3. After translating Matlab code to much nicer NumPy, I cannot find any way to say MATLAB is "nicer". Cheers, Alan From josef.pktd at gmail.com Mon Jan 5 12:26:39 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Jan 2015 12:26:39 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <54AAB830.2040409@gmail.com> References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> <54AAB830.2040409@gmail.com> Message-ID: On Mon, Jan 5, 2015 at 11:13 AM, Alan G Isaac wrote: > On 1/5/2015 10:48 AM, josef.pktd at gmail.com wrote: > > Dtypes are a mess (in terms of code compatibility). Matlab is much > nicer, it's all just doubles. > > > 1. Thank goodness for dtypes. > 2. http://www.mathworks.com/help/matlab/numeric-types.html > 3. After translating Matlab code to much nicer NumPy, > I cannot find any way to say MATLAB is "nicer". > Maybe it's my selection bias in matlab, I only wrote or read code in matlab that used exclusively double. Of course they are a necessary and great feature. However, life and code would be simpler if we could just do x = np.asarray(x, float) or even x = np.array(x, float) at the beginning of every function, instead of worrying why a user doesn't have float and trying to accommodate that choice. https://github.com/statsmodels/statsmodels/search?q=dtype&type=Issues&utf8=%E2%9C%93 AFAIK, matlab and R still have copy on write, so they don't have to worry about inplace modifications. 5 lines of code to implement an algorithm, and 50 lines of code for input checking. My response was to the issue of code as algorithmic documentation: There are packages or code supplements to books that come with the disclaimer that the code is written for educational purposes, to help understand the algorithm, but is not designed for efficiency or performance or generality. The more powerful the language and the "fancier" the code, the larger is the maintenance and wrapping work. another example: a dot product of a float/double 2d array is independent of any numpy version, and it will produce the same result in numpy 19.0 (except for different machine precision rounding errors) a dot product of an array (without dtype and shape restriction) might be anything and change within a few numpy versions. Josef > > Cheers, > Alan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Mon Jan 5 13:40:44 2015 From: cjwilliams43 at gmail.com (Colin J. 
Williams) Date: Mon, 5 Jan 2015 13:40:44 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. Message-ID: One of the essential characteristics of a matrix is that it be rectangular. This is neither spelt out or checked currently. The Doc description refers to a class: - *class *numpy.matrix[source] Returns a matrix from an array-like object, or from a string of data. A matrix is a specialized 2-D array that retains its 2-D nature through operations. It has certain special operators, such as * (matrix multiplication) and ** (matrix power). This illustrates a failure, which is reported later in the calculation: A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) Here 2 - 6 is treated as an expression. Wikipedia offers: In mathematics , a *matrix* (plural *matrices*) is a rectangular *array *[1] of numbers , symbols , or expressions , arranged in *rows * and *columns *.[2] [3] The individual items in a matrix are called its *elements* or *entries*. An example of a matrix with 2 rows and 3 columns is [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the Numpy context, the symbols or expressions need to be evaluable. Colin W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Mon Jan 5 13:56:41 2015 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Mon, 5 Jan 2015 20:56:41 +0200 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On 5 January 2015 at 20:40, Colin J. Williams wrote: > This illustrates a failure, which is reported later in the calculation: > > A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) > > Here 2 - 6 is treated as an expression. > There should be a comma between 2 and -6. The rectangularity is checked, and in this case, it is not fulfilled. As such, NumPy creates a square matrix of size 1x1 of dtype object. If you want to make sure what you have manually inputed is correct, you should include a couple of assertions afterwards. /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Mon Jan 5 13:57:43 2015 From: e.antero.tammi at gmail.com (eat) Date: Mon, 5 Jan 2015 20:57:43 +0200 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: Hi, On Mon, Jan 5, 2015 at 8:40 PM, Colin J. Williams wrote: > One of the essential characteristics of a matrix is that it be rectangular. > > This is neither spelt out or checked currently. > > The Doc description refers to a class: > > - *class *numpy.matrix[source] > > > Returns a matrix from an array-like object, or from a string of data. A > matrix is a specialized 2-D array that retains its 2-D > nature through operations. It has certain special operators, such as * > (matrix multiplication) and ** (matrix power). > > This illustrates a failure, which is reported later in the calculation: > > A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) > > Here 2 - 6 is treated as an expression. > FWIW, here A2 is definitely rectangular, with shape== (1, 3) and dtype== object, i.e elements are just python lists. > Wikipedia offers: > > In mathematics , a *matrix* > (plural *matrices*) is a rectangular > *array > *[1] > of > numbers , symbols > , or expressions > , arranged in *rows > * and *columns > *.[2] > [3] > > (and in this context also python objects). -eat > The individual items in a matrix are called its *elements* or *entries*. 
> An example of a matrix with 2 rows and 3 columns is > [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the > Numpy context, the symbols or expressions need to be evaluable. > > Colin W. > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jan 5 13:58:25 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 5 Jan 2015 18:58:25 +0000 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: I'm afraid that I really don't understand what you're trying to say. Is there something that you think numpy should be doing differently? On Mon, Jan 5, 2015 at 6:40 PM, Colin J. Williams wrote: > One of the essential characteristics of a matrix is that it be rectangular. > > This is neither spelt out or checked currently. > > The Doc description refers to a class: > > - *class *numpy.matrix[source] > > > Returns a matrix from an array-like object, or from a string of data. A > matrix is a specialized 2-D array that retains its 2-D > nature through operations. It has certain special operators, such as * > (matrix multiplication) and ** (matrix power). > > This illustrates a failure, which is reported later in the calculation: > > A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) > > Here 2 - 6 is treated as an expression. > > Wikipedia offers: > > In mathematics , a *matrix* > (plural *matrices*) is a rectangular > *array > *[1] > of > numbers , symbols > , or expressions > , arranged in *rows > * and *columns > *.[2] > [3] > The > individual items in a matrix are called its *elements* or *entries*. An > example of a matrix with 2 rows and 3 columns is > [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the > Numpy context, the symbols or expressions need to be evaluable. > > Colin W. > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Mon Jan 5 14:16:54 2015 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 5 Jan 2015 14:16:54 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: > I'm afraid that I really don't understand what you're trying to say. Is > there something that you think numpy should be doing differently? > > This is a case similar to the issue discussed in https://github.com/numpy/numpy/issues/5303. Instead of getting an error (because the arguments don't create the expected 2-d matrix), a matrix with dtype object and shape (1, 3) is created. Warren > On Mon, Jan 5, 2015 at 6:40 PM, Colin J. Williams > wrote: > >> One of the essential characteristics of a matrix is that it be >> rectangular. >> >> This is neither spelt out or checked currently. >> >> The Doc description refers to a class: >> >> - *class *numpy.matrix[source] >> >> >> Returns a matrix from an array-like object, or from a string of data. A >> matrix is a specialized 2-D array that retains its 2-D >> nature through operations. 
It has certain special operators, such as * >> (matrix multiplication) and ** (matrix power). >> >> This illustrates a failure, which is reported later in the calculation: >> >> A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) >> >> Here 2 - 6 is treated as an expression. >> >> Wikipedia offers: >> >> In mathematics , a *matrix* >> (plural *matrices*) is a rectangular >> *array >> *[1] >> of >> numbers , symbols >> , or expressions >> , arranged in *rows >> * and *columns >> *.[2] >> [3] >> The >> individual items in a matrix are called its *elements* or *entries*. An >> example of a matrix with 2 rows and 3 columns is >> [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the >> Numpy context, the symbols or expressions need to be evaluable. >> >> Colin W. >> >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 5 14:18:06 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Jan 2015 14:18:06 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: > I'm afraid that I really don't understand what you're trying to say. Is > there something that you think numpy should be doing differently? > I liked it better when this raised an exception, instead of creating a rectangular object array. Josef > > On Mon, Jan 5, 2015 at 6:40 PM, Colin J. Williams > wrote: > >> One of the essential characteristics of a matrix is that it be >> rectangular. >> >> This is neither spelt out or checked currently. >> >> The Doc description refers to a class: >> >> - *class *numpy.matrix[source] >> >> >> Returns a matrix from an array-like object, or from a string of data. A >> matrix is a specialized 2-D array that retains its 2-D >> nature through operations. It has certain special operators, such as * >> (matrix multiplication) and ** (matrix power). >> >> This illustrates a failure, which is reported later in the calculation: >> >> A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) >> >> Here 2 - 6 is treated as an expression. >> >> Wikipedia offers: >> >> In mathematics , a *matrix* >> (plural *matrices*) is a rectangular >> *array >> *[1] >> of >> numbers , symbols >> , or expressions >> , arranged in *rows >> * and *columns >> *.[2] >> [3] >> The >> individual items in a matrix are called its *elements* or *entries*. An >> example of a matrix with 2 rows and 3 columns is >> [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the >> Numpy context, the symbols or expressions need to be evaluable. >> >> Colin W. >> >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Nathaniel J. 
Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jan 5 14:36:20 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 5 Jan 2015 19:36:20 +0000 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 7:18 PM, wrote: > > > > On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: >> >> I'm afraid that I really don't understand what you're trying to say. Is there something that you think numpy should be doing differently? > > > I liked it better when this raised an exception, instead of creating a rectangular object array. Did it really used to raise an exception? Patches accepted :-) (#5303 is the relevant bug, like Warren points out. From the discussion there it doesn't look like np.array's handling of non-conformable lists has any defenders.) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From josef.pktd at gmail.com Mon Jan 5 14:48:03 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Jan 2015 14:48:03 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 2:36 PM, Nathaniel Smith wrote: > On Mon, Jan 5, 2015 at 7:18 PM, wrote: > > > > > > > > On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: > >> > >> I'm afraid that I really don't understand what you're trying to say. Is > there something that you think numpy should be doing differently? > > > > > > I liked it better when this raised an exception, instead of creating a > rectangular object array. > > Did it really used to raise an exception? Patches accepted :-) (#5303 > is the relevant bug, like Warren points out. From the discussion there > it doesn't look like np.array's handling of non-conformable lists has > any defenders.) > Since I'm usually late in updating numpy, I was for a long time very familiar with the frequent occurence of `ValueError: setting an array element with a sequence.` based on this, it was up to numpy 1.5 https://github.com/scipy/scipy/pull/2631#issuecomment-20898809 "ugly but backwards compatible" :) Josef > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Mon Jan 5 14:53:06 2015 From: e.antero.tammi at gmail.com (eat) Date: Mon, 5 Jan 2015 21:53:06 +0200 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 9:36 PM, Nathaniel Smith wrote: > On Mon, Jan 5, 2015 at 7:18 PM, wrote: > > > > > > > > On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: > >> > >> I'm afraid that I really don't understand what you're trying to say. Is > there something that you think numpy should be doing differently? > > > > > > I liked it better when this raised an exception, instead of creating a > rectangular object array. > > Did it really used to raise an exception? 
Patches accepted :-) (#5303 > is the relevant bug, like Warren points out. From the discussion there > it doesn't look like np.array's handling of non-conformable lists has > any defenders.) > +1 for 'object array [and matrix] construction should require explicitly specifying dtype= object' -eat > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjw at ncf.ca Mon Jan 5 19:52:33 2015 From: cjw at ncf.ca (cjw) Date: Mon, 05 Jan 2015 19:52:33 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: <54AB31D1.1040002@ncf.ca> On 05-Jan-15 1:56 PM, David?id wrote: > On 5 January 2015 at 20:40, Colin J. Williams > wrote: > >> This illustrates a failure, which is reported later in the calculation: >> >> A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) >> >> Here 2 - 6 is treated as an expression. >> > There should be a comma between 2 and -6. The rectangularity is checked, > and in this case, it is not fulfilled. As such, NumPy creates a square > matrix of size 1x1 of dtype object. > > If you want to make sure what you have manually inputed is correct, you > should include a couple of assertions afterwards. > > /David. David, Thanks. My suggestion was that numpy should do that checking, Colin W. > From cjw at ncf.ca Mon Jan 5 20:08:24 2015 From: cjw at ncf.ca (cjw) Date: Mon, 05 Jan 2015 20:08:24 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: <54AB3588.1090902@ncf.ca> An HTML attachment was scrubbed... URL: From antony.lee at berkeley.edu Mon Jan 5 20:53:30 2015 From: antony.lee at berkeley.edu (Antony Lee) Date: Mon, 5 Jan 2015 18:53:30 -0700 Subject: [Numpy-discussion] edge-cases of ellipsis indexing In-Reply-To: <1420449296.31170.3.camel@sebastian-t440> References: <1420449296.31170.3.camel@sebastian-t440> Message-ID: I see, thanks! 2015-01-05 2:14 GMT-07:00 Sebastian Berg : > On Mo, 2015-01-05 at 14:13 +0530, Maniteja Nandana wrote: > > Hi Anthony, > > > > > > I am not sure whether the following section in documentation is > > relevant to the behavior you were referring to. > > > > > > When an ellipsis (...) is present but has no size (i.e. replaces > > zero :) the result will still always be an array. A view if no > > advanced index is present, otherwise a copy. > > > > Exactly. There are actually three forms of indexing to distinguish. > > 1. Indexing with integers (also scalar arrays) matching the number of > dimensions. This will return a *scalar*. > 2. Slicing, etc. which returns a view. This also occurs as soon there is > an ellipsis in there (even if it replaces 0 `:`). You should see it as a > feature to get a view if the result might be a scalar otherwise ;)! > 3. Advanced indexing which cannot be view based and returns a copy. > > - Sebastian > > > > Here, ...replaces zero : > > > > > > > > Advanced indexing always returns a copy of the data (contrast with > > basic slicing that returns a view). > > And I think it is a view that is returned in this case. 
> > > > > > >>> a = array([1]) > > >>>a > > array([1]) > > >>>a[:,0] # zero : are present > > Traceback (most recent call last): > > File "", line 1, in > > IndexError: too many indices for array > > >>>a[...,0]=2 > > >>>a > > array([2]) > > >>>a[0] = 3 > > >>>a > > array([3]) > > >>>a[(0,)] = 4 > > >>>a > > array([4]) > > >>>a[: > > array([1]) > > > > > > Hope I helped. > > > > > > Cheers, > > N.Maniteja. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Jan 6 02:14:22 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 6 Jan 2015 08:14:22 +0100 Subject: [Numpy-discussion] Proceedings of EuroSciPy 2014 Message-ID: On Tue, Dec 23, 2014 at 7:06 PM, wrote: > Dear scientist using Python, > > We are glad to announce the publication of the proceedings of the 7th > European > Conference on Python in Science, EuroSciPy 2014, still in 2014! > > The proceedings cover various scientific fields in which Python and its > scientific libraries are used. You may obtain the table of contents and > all the > articles on the arXiv at http://arxiv.org/abs/1412.7030 > For convenience, the articles' titles are listed below. > > It is a useful reference to have as the publication of software-related > scientific work is not always straightforward. > > Thanks go to all authors and reviewers for their contributions. The reviews > were conducted publicly at > https://github.com/euroscipy/euroscipy_proceedings > > Pierre de Buyl & Nelle Varoquaux, editors > > PS: there was no large announcement of the proceedings of EuroSciPy 2013. > In the > hope that this can increase their visibility, here is the URL > Proceedings of EuroSciPy 2013: http://arxiv.org/abs/1405.0166 > > Pierre de Buyl, Nelle Varoquaux: Preface > J?r?me Kieffer, Giannis Ashiotis: PyFAI: a Python library for high > performance azimuthal integration on GPU > Andrew Leonard, Huw Morgan: Temperature diagnostics of the solar > atmosphere using SunPy > Bastian Venthur, Benjamin Blankertz: Wyrm, A Pythonic Toolbox for > Brain-Computer Interfacing > Christophe Pouzat, Georgios Is. Detorakis: SPySort: Neuronal Spike Sorting > with Python > Thomas Cokelaer, Julio Saez-Rodriguez: Using Python to Dive into > Signalling Data with CellNOpt and BioServices > Davide Monari, Francesco Cenni, Erwin Aertbeli?n, Kaat Desloovere: > Py3DFreeHandUS: a library for voxel-array reconstruction using > Ultrasonography and attitude sensors > Esteban Fuentes, Hector E. Martinez: SClib, a hack for straightforward > embedded C functions in Python > Jamie A Dean, Liam C Welsh, Kevin J Harrington, Christopher M Nutting, > Sarah L Gulliford: Predictive Modelling of Toxicity Resulting from > Radiotherapy Treatments of Head and Neck Cancer > Rebecca R. Murphy, Sophie E. 
Jackson, David Klenerman: pyFRET: A Python > Library for Single Molecule Fluorescence Data Analysis > Robert Cimrman: Enhancing SfePy with Isogeometric Analysis > Steve Brasier, Fred Pollard: A Python-based Post-processing Toolset For > Seismic Analyses > Vladim?r Luke?, Miroslav Ji??k, Alena Jon??ov?, Eduard Rohan, Ond?ej > Bubl?k, Robert Cimrman: Numerical simulation of liver perfusion: from CT > scans to FE model > > _______________________________________________ > euroscipy-org mailing list > euroscipy-org at python.org > https://mail.python.org/mailman/listinfo/euroscipy-org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Tue Jan 6 07:31:36 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 06 Jan 2015 13:31:36 +0100 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: <54AB3588.1090902@ncf.ca> References: <54AB3588.1090902@ncf.ca> Message-ID: On 06/01/15 02:08, cjw wrote: > This is not a comment on any present matrix support, but deals with the > matrix class, which existed back when Todd Miller of the Space Telescope > Group supported numpy. > > Matrix is a sub-class of ndarray. Since this Matrix class is (more or less) deprecated and its use discouraged, I think it should just be left as it is. Sturla From cjw at ncf.ca Tue Jan 6 19:58:04 2015 From: cjw at ncf.ca (cjw) Date: Tue, 06 Jan 2015 19:58:04 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: <54AB3588.1090902@ncf.ca> Message-ID: <54AC849C.20902@ncf.ca> An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jan 6 20:20:48 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 7 Jan 2015 01:20:48 +0000 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: <54AC849C.20902@ncf.ca> References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> Message-ID: Hi Colin, On Wed, Jan 7, 2015 at 12:58 AM, cjw wrote: > > My recollection, from discussions, at the time of the introduction of the @ > operator, was that there was no intention to disturb the existing Matrix > class. Yeah, we're not going to be making any major changes to the numpy.matrix class -- e.g. we certainly aren't going to disallow non-numeric data types at this point. > I see the matrix as a long recognized mathematical entity. On the other > hand, the array is a very useful computational construct, used in a number > of computer languages. > > Since matrices are now part of some high school curricula, I urge that they > be treated appropriately in Numpy. Further, I suggest that consideration be > given to establishing V and VT sub-classes, to cover vectors and transposed > vectors. The numpy devs don't really have the interest or the skills to create a great library for pedagogical use in high schools. If you're interested in an interface like this, then I'd suggest creating a new package focused specifically on that (which might use numpy internally). There's really no advantage in glomming this into numpy proper. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ndarray at mac.com Tue Jan 6 20:38:42 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Tue, 6 Jan 2015 20:38:42 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. 
In-Reply-To: References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> Message-ID: On Tue, Jan 6, 2015 at 8:20 PM, Nathaniel Smith wrote: > > Since matrices are now part of some high school curricula, I urge that > they > > be treated appropriately in Numpy. Further, I suggest that > consideration be > > given to establishing V and VT sub-classes, to cover vectors and > transposed > > vectors. > > The numpy devs don't really have the interest or the skills to create > a great library for pedagogical use in high schools. If you're > interested in an interface like this, then I'd suggest creating a new > package focused specifically on that (which might use numpy > internally). There's really no advantage in glomming this into numpy > proper. Sorry for taking this further off-topic, but I recently discovered an excellent SAGE package, . While it's targeted audience includes math graduate students and research mathematicians, parts of it are accessible to schoolchildren. SAGE is written in Python and integrates a number of packages including numpy. I would highly recommend to anyone interested in using Python for education to take a look at SAGE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jan 7 10:41:07 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 7 Jan 2015 15:41:07 +0000 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <54A968C6.1040502@fastmail.net> References: <54A968C6.1040502@fastmail.net> Message-ID: Hi, On Sun, Jan 4, 2015 at 4:22 PM, Konrad Hinsen wrote: > 2) Changes that yield to different results for unmodified legacy code > should never be allowed. I think this is a very reasonable rule. Case in point - I have some fairly old code in https://github.com/matthew-brett/transforms3d . I haven't updated this code since 2011. Now I test it, I get the following warning: ====================================================================== ERROR: Failure: ValueError (assignment destination is read-only) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/scipy-devel/lib/python2.7/site-packages/nose/loader.py", line 251, in generate for test in g(): File "/Users/mb312/dev_trees/transforms3d/transforms3d/tests/test_affines.py", line 74, in test_rand_de_compose T, R, Z, S = func(M) File "/Users/mb312/dev_trees/transforms3d/transforms3d/affines.py", line 298, in decompose Z[0] *= -1 ValueError: assignment destination is read-only If I had waited until 1.10 (or whatever) - I would have had to hope that my tests were good enough to pick this up, otherwise anyone using this code would be subject to some very strange bugs. Cheers, Matthew From cjw at ncf.ca Wed Jan 7 14:35:32 2015 From: cjw at ncf.ca (cjw) Date: Wed, 07 Jan 2015 14:35:32 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> Message-ID: <54AD8A84.6010403@ncf.ca> Nathaniel, Of the two characteristics to which I pointed, I feel that the rectangularity check is the more important. I gave an example of a typo which demonstrated this problem. The error message reported that pinv does not have a conjugate function which, I suggest, is a totally misleading error message. In these circumstances, I hope that the Development Team will wish to treat this as a bug. Regards, Colin W. 
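[For reference, a minimal sketch of the failure mode discussed in this thread, together with the kind of assertion check David suggested earlier; the (1, 3) object-matrix behaviour is as reported above for the NumPy versions discussed here, tracked as issue #5303:]

import numpy as np

# The typo: "2 -6" is parsed as the expression 2 - 6 == -4, so the third row
# has only two entries.  With ragged rows, np.matrix (in the NumPy versions
# discussed here) does not raise; it silently builds a (1, 3) matrix of dtype
# object whose entries are the three Python lists themselves.
A2 = np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]])
print(A2.shape, A2.dtype)        # (1, 3) object

# A couple of assertions catch the slip immediately, instead of the
# misleading error that only surfaces much later inside pinv():
A2 = np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2, -6]])
assert A2.dtype != object, "ragged rows -- probably a missing comma"
assert A2.shape == (3, 3)
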
On 06-Jan-15 8:20 PM, Nathaniel Smith wrote: > Hi Colin, > > On Wed, Jan 7, 2015 at 12:58 AM, cjw wrote: >> My recollection, from discussions, at the time of the introduction of the @ >> operator, was that there was no intention to disturb the existing Matrix >> class. > Yeah, we're not going to be making any major changes to the > numpy.matrix class -- e.g. we certainly aren't going to disallow > non-numeric data types at this point. > >> I see the matrix as a long recognized mathematical entity. On the other >> hand, the array is a very useful computational construct, used in a number >> of computer languages. >> >> Since matrices are now part of some high school curricula, I urge that they >> be treated appropriately in Numpy. Further, I suggest that consideration be >> given to establishing V and VT sub-classes, to cover vectors and transposed >> vectors. > The numpy devs don't really have the interest or the skills to create > a great library for pedagogical use in high schools. If you're > interested in an interface like this, then I'd suggest creating a new > package focused specifically on that (which might use numpy > internally). There's really no advantage in glomming this into numpy > proper. > > -n > From cjw at ncf.ca Wed Jan 7 14:44:47 2015 From: cjw at ncf.ca (cjw) Date: Wed, 07 Jan 2015 14:44:47 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> Message-ID: <54AD8CAF.5090207@ncf.ca> An HTML attachment was scrubbed... URL: From rnelsonchem at gmail.com Thu Jan 8 13:19:33 2015 From: rnelsonchem at gmail.com (Ryan Nelson) Date: Thu, 8 Jan 2015 13:19:33 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: <54AD8CAF.5090207@ncf.ca> References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> <54AD8CAF.5090207@ncf.ca> Message-ID: Colin, I'll second the endorsement of Sage; however, for teaching purposes, I would suggest Sage Math Cloud. It is a free, web-based version of Sage, and it does not require you or the students to install any software (besides a new-ish web browser). It also make sharing/collaborative work quite easy as well. I've used this a bit for demos, and it's great. The author William Stein is good at correcting bugs/issues very quickly. Sage implements it's own Matrix and Vector classes, and the Vector class has a "column" method that returns a column vector (transpose). http://www.sagemath.org/doc/tutorial/tour_linalg.html For what it's worth, I agree with others about the benefits of avoiding a Matrix class in Numpy. In my experience, it certainly makes things cleaner in larger projects when I always use NDArray and just call the appropriate linear algebra functions (e.g. np.dot, etc) when that is context I need. Anyway, just my two cents. Ryan On Wed, Jan 7, 2015 at 2:44 PM, cjw wrote: > Thanks Alexander, > > I'll look at Sage. > > Colin W. > > > On 06-Jan-15 8:38 PM, Alexander Belopolsky wrote: > > On Tue, Jan 6, 2015 at 8:20 PM, Nathaniel Smith wrote: > > > Since matrices are now part of some high school curricula, I urge that > > they > > be treated appropriately in Numpy. Further, I suggest that > > consideration be > > given to establishing V and VT sub-classes, to cover vectors and > > transposed > > vectors. > > The numpy devs don't really have the interest or the skills to create > a great library for pedagogical use in high schools. 
If you're > interested in an interface like this, then I'd suggest creating a new > package focused specifically on that (which might use numpy > internally). There's really no advantage in glomming this into numpy > proper. > > > Sorry for taking this further off-topic, but I recently discovered an > excellent SAGE package, . While it's targeted > audience includes math graduate students and research mathematicians, parts > of it are accessible to schoolchildren. SAGE is written in Python and > integrates a number of packages including numpy. > > I would highly recommend to anyone interested in using Python for education > to take a look at SAGE. > > > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From n59_ru at hotmail.com Thu Jan 8 13:31:11 2015 From: n59_ru at hotmail.com (Nikolay Mayorov) Date: Thu, 8 Jan 2015 23:31:11 +0500 Subject: [Numpy-discussion] Build doesn't pass tests Message-ID: Hi all! I'm trying to build numpy on Windows 64 bit, Python 3.4.2 64 bit. I do environment setup by the following command: CMD /K "SET MSSdk=1 && SET DISTUTILS_USE_SDK=1 && "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x86_amd64" Then I cd to the newly cloned numpy folder and do: python setup.py build_ext --inplace It looks like the build process finishes correctly. But then python -c "import numpy; numpy.test()" crashes the interpreter (some tests pass before the crash). I found out that it is caused by numpy.fromfile function call. What might be the reason of that? Do I use wrong msvc compiler? -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 8 13:32:36 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Jan 2015 18:32:36 +0000 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: <54AD8A84.6010403@ncf.ca> References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> <54AD8A84.6010403@ncf.ca> Message-ID: On Wed, Jan 7, 2015 at 7:35 PM, cjw wrote: > Nathaniel, > > Of the two characteristics to which I pointed, I feel that the > rectangularity check is the more important. I gave an example of a typo > which demonstrated this problem. The numpy matrix class does require rectangularity; the issue you ran into is more weird than that. It's legal to make a matrix of arbitrary python objects, e.g. np.matrix([["hello", None]]) (this can be useful e.g. if you want to work with extremely large integers using Python's long integer objects). In your case, b/c the lists were not the same length, the matrix constructor guessed that you wanted a matrix containing two Python list objects. This is pretty confusing, and fixing it is bug #5303. But it doesn't indicate any deeper problem with the matrix object. Notice: In [5]: A2 = np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) In [6]: A2.shape Out[6]: (1, 3) In [7]: A2[0, 0] Out[7]: [1, 2, -2] > The error message reported that pinv does not have a conjugate function > which, I suggest, is a totally misleading error message. When working with arrays/matrices of objects, functions like 'pinv' will try to call special methods on the objects. 
This is a little weird and arguably a bug itself, but it does mean that it's at least possible in theory to have an array of arbitrary python objects and have pinv() work. Of course this requires objects that will cooperate. In this case, though, pinv() has no idea what to do with a matrix whose elements are themselves lists, so it gives an error. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From jtaylor.debian at googlemail.com Thu Jan 8 13:43:53 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 08 Jan 2015 19:43:53 +0100 Subject: [Numpy-discussion] Build doesn't pass tests In-Reply-To: References: Message-ID: <54AECFE9.5030502@googlemail.com> On 01/08/2015 07:31 PM, Nikolay Mayorov wrote: > Hi all! > > I'm trying to build numpy on Windows 64 bit, Python 3.4.2 64 bit. > > I do environment setup by the following command: > > CMD /K "SET MSSdk=1 && SET DISTUTILS_USE_SDK=1 && "C:\Program Files > (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x86_amd64" > > Then I cd to the newly cloned numpy folder and do: python setup.py > build_ext --inplace > > It looks like the build process finishes correctly. > > But then python -c "import numpy; numpy.test()" crashes the interpreter > (some tests pass before the crash). I found out that it is caused by > numpy.fromfile function call. > > What might be the reason of that? Do I use wrong msvc compiler? > > I think compiling python3 extensions requires VS 2010, python 2 extensions VS2008. A crash in a fromfile test is what I would expect from using the wrong compiler. From n59_ru at hotmail.com Fri Jan 9 07:24:18 2015 From: n59_ru at hotmail.com (Nikolay Mayorov) Date: Fri, 9 Jan 2015 17:24:18 +0500 Subject: [Numpy-discussion] Build doesn't pass tests In-Reply-To: <54AECFE9.5030502@googlemail.com> References: , <54AECFE9.5030502@googlemail.com> Message-ID: Thank you, Julian. I just happen to build scikit-learn with VS2012 and thought it's OK to use it for other packages. > Date: Thu, 8 Jan 2015 19:43:53 +0100 > From: jtaylor.debian at googlemail.com > To: numpy-discussion at scipy.org > Subject: Re: [Numpy-discussion] Build doesn't pass tests > > On 01/08/2015 07:31 PM, Nikolay Mayorov wrote: > > Hi all! > > > > I'm trying to build numpy on Windows 64 bit, Python 3.4.2 64 bit. > > > > I do environment setup by the following command: > > > > CMD /K "SET MSSdk=1 && SET DISTUTILS_USE_SDK=1 && "C:\Program Files > > (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x86_amd64" > > > > Then I cd to the newly cloned numpy folder and do: python setup.py > > build_ext --inplace > > > > It looks like the build process finishes correctly. > > > > But then python -c "import numpy; numpy.test()" crashes the interpreter > > (some tests pass before the crash). I found out that it is caused by > > numpy.fromfile function call. > > > > What might be the reason of that? Do I use wrong msvc compiler? > > > > > > I think compiling python3 extensions requires VS 2010, python 2 > extensions VS2008. > A crash in a fromfile test is what I would expect from using the wrong > compiler. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From valentin at haenel.co Fri Jan 9 14:43:30 2015 From: valentin at haenel.co (Valentin Haenel) Date: Fri, 9 Jan 2015 20:43:30 +0100 Subject: [Numpy-discussion] [ANN] bcolz v0.8.0 Message-ID: <20150109194330.GA690@kudu.in-berlin.de> ====================== Announcing bcolz 0.8.0 ====================== What's new ========== This version adds a public API in the form of a Cython definitions file (``carray_ext.pxd``) for the ``carray`` class! This means, other libraries can use the Cython definitions to build more complex programs using the objects provided by bcolz. In fact, this feature was specifically requested and there already exists a nascent application called *bquery* (https://github.com/visualfabriq/bquery) which provides an efficient out-of-core groupby implementation for the ``ctable`` object Because this is a fairly sweeping change, the minor version number was incremented and no additional major features or bugfixes were added to this release. We kindly ask any users of bcolz to try this version carefully and report back any issues, bugs, or even slow-downs you experience. I.e. please, please be careful when deploying this version into production. Many, many kudos to Francesc Elies and Carst Vaartjes of Visualfabriq for their hard work, continued effort to push this feature and their work on bquery which makes use of it! What it is ========== *bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of column. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and use several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, and it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Together, bcolz and the Blosc compressor, are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the Blaze project (http://blaze.pydata.org/) and Quantopian (https://www.quantopian.com/) which you can read more about by pointing your browser at the links below. 
* Visualfabriq: * *bquery*, A query and aggregation framework for Bcolz: * https://github.com/visualfabriq/bquery * Blaze: * Notebooks showing Blaze + Pandas + BColz interaction: * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb * Quantopian: * Using compressed data containers for faster backtesting at scale: * https://quantopian.github.io/talks/NeedForSpeed/slides.html Installing ========== bcolz is in the PyPI repository, so installing it is easy:: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst ---- **Enjoy data!** From pav at iki.fi Sun Jan 11 12:50:47 2015 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 11 Jan 2015 19:50:47 +0200 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release Message-ID: <54B2B7F7.4030708@iki.fi> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dear all, We are pleased to announce the Scipy 0.15.0 release. The 0.15.0 release contains bugfixes and new features, most important of which are mentioned in the excerpt from the release notes below. Source tarballs, binaries, and full release notes are available at https://sourceforge.net/projects/scipy/files/scipy/0.15.0/ Best regards, Pauli Virtanen ========================== SciPy 0.15.0 Release Notes ========================== SciPy 0.15.0 is the culmination of 6 months of hard work. It contains several new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.16.x branch, and on adding new features on the master branch. This release requires Python 2.6, 2.7 or 3.2-3.4 and NumPy 1.5.1 or greater. New features ============ Linear Programming Interface - ---------------------------- The new function `scipy.optimize.linprog` provides a generic linear programming similar to the way `scipy.optimize.minimize` provides a generic interface to nonlinear programming optimizers. Currently the only method supported is *simplex* which provides a two-phase, dense-matrix-based simplex algorithm. Callbacks functions are supported, allowing the user to monitor the progress of the algorithm. Differential evolution, a global optimizer - ------------------------------------------ A new `scipy.optimize.differential_evolution` function has been added to the ``optimize`` module. Differential Evolution is an algorithm used for finding the global minimum of multivariate functions. It is stochastic in nature (does not use gradient methods), and can search large areas of candidate space, but often requires larger numbers of function evaluations than conventional gradient based techniques. ``scipy.signal`` improvements - ----------------------------- The function `scipy.signal.max_len_seq` was added, which computes a Maximum Length Sequence (MLS) signal. 
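[As a quick illustration of two of the additions described above, a minimal sketch; the problem data are invented, and only the call signatures come from the 0.15.0 API:]

import numpy as np
from scipy.optimize import linprog
from scipy.signal import max_len_seq

# Minimize -x0 - 2*x1 subject to x0 + x1 <= 4, 2*x0 + x1 <= 5, x >= 0.
res = linprog(c=[-1, -2], A_ub=[[1, 1], [2, 1]], b_ub=[4, 5])
print(res.x, res.fun)            # optimal point and objective value

# A 7-bit maximum length sequence: 2**7 - 1 = 127 samples of 0s and 1s.
seq, state = max_len_seq(7)
bipolar = 2.0 * seq - 1.0        # map {0, 1} -> {-1, +1} for correlation use
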
``scipy.integrate`` improvements - -------------------------------- It is now possible to use `scipy.integrate` routines to integrate multivariate ctypes functions, thus avoiding callbacks to Python and providing better performance. ``scipy.linalg`` improvements - ----------------------------- The function `scipy.linalg.orthogonal_procrustes` for solving the procrustes linear algebra problem was added. BLAS level 2 functions ``her``, ``syr``, ``her2`` and ``syr2`` are now wrapped in ``scipy.linalg``. ``scipy.sparse`` improvements - ----------------------------- `scipy.sparse.linalg.svds` can now take a ``LinearOperator`` as its main input. ``scipy.special`` improvements - ------------------------------ Values of ellipsoidal harmonic (i.e. Lame) functions and associated normalization constants can be now computed using ``ellip_harm``, ``ellip_harm_2``, and ``ellip_normal``. New convenience functions ``entr``, ``rel_entr`` ``kl_div``, ``huber``, and ``pseudo_huber`` were added. ``scipy.sparse.csgraph`` improvements - ------------------------------------- Routines ``reverse_cuthill_mckee`` and ``maximum_bipartite_matching`` for computing reorderings of sparse graphs were added. ``scipy.stats`` improvements - ---------------------------- Added a Dirichlet multivariate distribution, `scipy.stats.dirichlet`. The new function `scipy.stats.median_test` computes Mood's median test. The new function `scipy.stats.combine_pvalues` implements Fisher's and Stouffer's methods for combining p-values. `scipy.stats.describe` returns a namedtuple rather than a tuple, allowing users to access results by index or by name. Deprecated features =================== The `scipy.weave` module is deprecated. It was the only module never ported to Python 3.x, and is not recommended to be used for new code - use Cython instead. In order to support existing code, ``scipy.weave`` has been packaged separately: https://github.com/scipy/weave. It is a pure Python package, and can easily be installed with ``pip install weave``. `scipy.special.bessel_diff_formula` is deprecated. It is a private function, and therefore will be removed from the public API in a following release. ``scipy.stats.nanmean``, ``nanmedian`` and ``nanstd`` functions are deprecated in favor of their numpy equivalents. Backwards incompatible changes ============================== scipy.ndimage - ------------- The functions `scipy.ndimage.minimum_positions`, `scipy.ndimage.maximum_positions`` and `scipy.ndimage.extrema` return positions as ints instead of floats. scipy.integrate - --------------- The format of banded Jacobians in `scipy.integrate.ode` solvers is changed. Note that the previous documentation of this feature was erroneous. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlSyt/cACgkQ6BQxb7O0pWA8SACfXmpUsJcXT5espj71OYpeaj5b JJwAoL10ud3q1f51A5Ij4lgqMeZGnHlj =ZmOl -----END PGP SIGNATURE----- From cjw at ncf.ca Sun Jan 11 22:50:38 2015 From: cjw at ncf.ca (cjw) Date: Sun, 11 Jan 2015 22:50:38 -0500 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release In-Reply-To: <54B2B7F7.4030708@iki.fi> References: <54B2B7F7.4030708@iki.fi> Message-ID: <54B3448E.3020100@ncf.ca> Paul, Wot, no AMD64? Colin W. On 11-Jan-15 12:50 PM, Paul Virtanen wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Dear all, > > We are pleased to announce the Scipy 0.15.0 release. > > The 0.15.0 release contains bugfixes and new features, most important > of which are mentioned in the excerpt from the release notes below. 
> > Source tarballs, binaries, and full release notes are available at > https://sourceforge.net/projects/scipy/files/scipy/0.15.0/ > > Best regards, > Pauli Virtanen > > > ========================== > SciPy 0.15.0 Release Notes > ========================== > > SciPy 0.15.0 is the culmination of 6 months of hard work. It contains > several new features, numerous bug-fixes, improved test coverage and > better documentation. There have been a number of deprecations and > API changes in this release, which are documented below. All users > are encouraged to upgrade to this release, as there are a large number > of bug-fixes and optimizations. Moreover, our development attention > will now shift to bug-fix releases on the 0.16.x branch, and on adding > new features on the master branch. > > This release requires Python 2.6, 2.7 or 3.2-3.4 and NumPy 1.5.1 or > greater. > > > New features > ============ > > Linear Programming Interface > - ---------------------------- > > The new function `scipy.optimize.linprog` provides a generic > linear programming similar to the way `scipy.optimize.minimize` > provides a generic interface to nonlinear programming optimizers. > Currently the only method supported is *simplex* which provides > a two-phase, dense-matrix-based simplex algorithm. Callbacks > functions are supported, allowing the user to monitor the progress > of the algorithm. > > Differential evolution, a global optimizer > - ------------------------------------------ > > A new `scipy.optimize.differential_evolution` function has been added > to the > ``optimize`` module. Differential Evolution is an algorithm used for > finding > the global minimum of multivariate functions. It is stochastic in > nature (does > not use gradient methods), and can search large areas of candidate > space, but > often requires larger numbers of function evaluations than conventional > gradient based techniques. > > ``scipy.signal`` improvements > - ----------------------------- > > The function `scipy.signal.max_len_seq` was added, which computes a > Maximum > Length Sequence (MLS) signal. > > ``scipy.integrate`` improvements > - -------------------------------- > > It is now possible to use `scipy.integrate` routines to integrate > multivariate ctypes functions, thus avoiding callbacks to Python and > providing better performance. > > ``scipy.linalg`` improvements > - ----------------------------- > > The function `scipy.linalg.orthogonal_procrustes` for solving the > procrustes > linear algebra problem was added. > > BLAS level 2 functions ``her``, ``syr``, ``her2`` and ``syr2`` are now > wrapped > in ``scipy.linalg``. > > ``scipy.sparse`` improvements > - ----------------------------- > > `scipy.sparse.linalg.svds` can now take a ``LinearOperator`` as its > main input. > > ``scipy.special`` improvements > - ------------------------------ > > Values of ellipsoidal harmonic (i.e. Lame) functions and associated > normalization constants can be now computed using ``ellip_harm``, > ``ellip_harm_2``, and ``ellip_normal``. > > New convenience functions ``entr``, ``rel_entr`` ``kl_div``, > ``huber``, and ``pseudo_huber`` were added. > > ``scipy.sparse.csgraph`` improvements > - ------------------------------------- > > Routines ``reverse_cuthill_mckee`` and ``maximum_bipartite_matching`` > for computing reorderings of sparse graphs were added. > > ``scipy.stats`` improvements > - ---------------------------- > > Added a Dirichlet multivariate distribution, `scipy.stats.dirichlet`. 
> > The new function `scipy.stats.median_test` computes Mood's median test. > > The new function `scipy.stats.combine_pvalues` implements Fisher's > and Stouffer's methods for combining p-values. > > `scipy.stats.describe` returns a namedtuple rather than a tuple, allowing > users to access results by index or by name. > > > Deprecated features > =================== > > The `scipy.weave` module is deprecated. It was the only module never > ported > to Python 3.x, and is not recommended to be used for new code - use Cython > instead. In order to support existing code, ``scipy.weave`` has been > packaged > separately: https://github.com/scipy/weave. It is a pure Python > package, and > can easily be installed with ``pip install weave``. > > `scipy.special.bessel_diff_formula` is deprecated. It is a private > function, > and therefore will be removed from the public API in a following release. > > ``scipy.stats.nanmean``, ``nanmedian`` and ``nanstd`` functions are > deprecated > in favor of their numpy equivalents. > > > Backwards incompatible changes > ============================== > > scipy.ndimage > - ------------- > > The functions `scipy.ndimage.minimum_positions`, > `scipy.ndimage.maximum_positions`` and `scipy.ndimage.extrema` return > positions as ints instead of floats. > > scipy.integrate > - --------------- > > The format of banded Jacobians in `scipy.integrate.ode` solvers is > changed. Note that the previous documentation of this feature was > erroneous. > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1 > > iEYEARECAAYFAlSyt/cACgkQ6BQxb7O0pWA8SACfXmpUsJcXT5espj71OYpeaj5b > JJwAoL10ud3q1f51A5Ij4lgqMeZGnHlj > =ZmOl > -----END PGP SIGNATURE----- > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scopatz at gmail.com Sun Jan 11 23:30:23 2015 From: scopatz at gmail.com (Anthony Scopatz) Date: Sun, 11 Jan 2015 22:30:23 -0600 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release In-Reply-To: <54B3448E.3020100@ncf.ca> References: <54B2B7F7.4030708@iki.fi> <54B3448E.3020100@ncf.ca> Message-ID: Congrats all! On Sun, Jan 11, 2015 at 9:50 PM, cjw wrote: > Paul, > > Wot, no AMD64? > > Colin W. > On 11-Jan-15 12:50 PM, Paul Virtanen wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Dear all, > > > > We are pleased to announce the Scipy 0.15.0 release. > > > > The 0.15.0 release contains bugfixes and new features, most important > > of which are mentioned in the excerpt from the release notes below. > > > > Source tarballs, binaries, and full release notes are available at > > https://sourceforge.net/projects/scipy/files/scipy/0.15.0/ > > > > Best regards, > > Pauli Virtanen > > > > > > ========================== > > SciPy 0.15.0 Release Notes > > ========================== > > > > SciPy 0.15.0 is the culmination of 6 months of hard work. It contains > > several new features, numerous bug-fixes, improved test coverage and > > better documentation. There have been a number of deprecations and > > API changes in this release, which are documented below. All users > > are encouraged to upgrade to this release, as there are a large number > > of bug-fixes and optimizations. Moreover, our development attention > > will now shift to bug-fix releases on the 0.16.x branch, and on adding > > new features on the master branch. > > > > This release requires Python 2.6, 2.7 or 3.2-3.4 and NumPy 1.5.1 or > > greater. 
> > > > > > New features > > ============ > > > > Linear Programming Interface > > - ---------------------------- > > > > The new function `scipy.optimize.linprog` provides a generic > > linear programming similar to the way `scipy.optimize.minimize` > > provides a generic interface to nonlinear programming optimizers. > > Currently the only method supported is *simplex* which provides > > a two-phase, dense-matrix-based simplex algorithm. Callbacks > > functions are supported, allowing the user to monitor the progress > > of the algorithm. > > > > Differential evolution, a global optimizer > > - ------------------------------------------ > > > > A new `scipy.optimize.differential_evolution` function has been added > > to the > > ``optimize`` module. Differential Evolution is an algorithm used for > > finding > > the global minimum of multivariate functions. It is stochastic in > > nature (does > > not use gradient methods), and can search large areas of candidate > > space, but > > often requires larger numbers of function evaluations than conventional > > gradient based techniques. > > > > ``scipy.signal`` improvements > > - ----------------------------- > > > > The function `scipy.signal.max_len_seq` was added, which computes a > > Maximum > > Length Sequence (MLS) signal. > > > > ``scipy.integrate`` improvements > > - -------------------------------- > > > > It is now possible to use `scipy.integrate` routines to integrate > > multivariate ctypes functions, thus avoiding callbacks to Python and > > providing better performance. > > > > ``scipy.linalg`` improvements > > - ----------------------------- > > > > The function `scipy.linalg.orthogonal_procrustes` for solving the > > procrustes > > linear algebra problem was added. > > > > BLAS level 2 functions ``her``, ``syr``, ``her2`` and ``syr2`` are now > > wrapped > > in ``scipy.linalg``. > > > > ``scipy.sparse`` improvements > > - ----------------------------- > > > > `scipy.sparse.linalg.svds` can now take a ``LinearOperator`` as its > > main input. > > > > ``scipy.special`` improvements > > - ------------------------------ > > > > Values of ellipsoidal harmonic (i.e. Lame) functions and associated > > normalization constants can be now computed using ``ellip_harm``, > > ``ellip_harm_2``, and ``ellip_normal``. > > > > New convenience functions ``entr``, ``rel_entr`` ``kl_div``, > > ``huber``, and ``pseudo_huber`` were added. > > > > ``scipy.sparse.csgraph`` improvements > > - ------------------------------------- > > > > Routines ``reverse_cuthill_mckee`` and ``maximum_bipartite_matching`` > > for computing reorderings of sparse graphs were added. > > > > ``scipy.stats`` improvements > > - ---------------------------- > > > > Added a Dirichlet multivariate distribution, `scipy.stats.dirichlet`. > > > > The new function `scipy.stats.median_test` computes Mood's median test. > > > > The new function `scipy.stats.combine_pvalues` implements Fisher's > > and Stouffer's methods for combining p-values. > > > > `scipy.stats.describe` returns a namedtuple rather than a tuple, allowing > > users to access results by index or by name. > > > > > > Deprecated features > > =================== > > > > The `scipy.weave` module is deprecated. It was the only module never > > ported > > to Python 3.x, and is not recommended to be used for new code - use > Cython > > instead. In order to support existing code, ``scipy.weave`` has been > > packaged > > separately: https://github.com/scipy/weave. 
It is a pure Python > > package, and > > can easily be installed with ``pip install weave``. > > > > `scipy.special.bessel_diff_formula` is deprecated. It is a private > > function, > > and therefore will be removed from the public API in a following release. > > > > ``scipy.stats.nanmean``, ``nanmedian`` and ``nanstd`` functions are > > deprecated > > in favor of their numpy equivalents. > > > > > > Backwards incompatible changes > > ============================== > > > > scipy.ndimage > > - ------------- > > > > The functions `scipy.ndimage.minimum_positions`, > > `scipy.ndimage.maximum_positions`` and `scipy.ndimage.extrema` return > > positions as ints instead of floats. > > > > scipy.integrate > > - --------------- > > > > The format of banded Jacobians in `scipy.integrate.ode` solvers is > > changed. Note that the previous documentation of this feature was > > erroneous. > > -----BEGIN PGP SIGNATURE----- > > Version: GnuPG v1 > > > > iEYEARECAAYFAlSyt/cACgkQ6BQxb7O0pWA8SACfXmpUsJcXT5espj71OYpeaj5b > > JJwAoL10ud3q1f51A5Ij4lgqMeZGnHlj > > =ZmOl > > -----END PGP SIGNATURE----- > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jan 12 02:13:31 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 12 Jan 2015 08:13:31 +0100 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release In-Reply-To: <54B3448E.3020100@ncf.ca> References: <54B2B7F7.4030708@iki.fi> <54B3448E.3020100@ncf.ca> Message-ID: On Mon, Jan 12, 2015 at 4:50 AM, cjw wrote: > Paul, > > Wot, no AMD64? > Colin, this is well known from previous scipy and numpy releases. It's due to not having a freely available 64-bit compiler chain available at the moment with which we can build official binaries. You can get 64-bit Windows installers of most scientific Python distributions (like Anaconda, Enthough Canopy, WinPython) and for just Scipy from the site of Christoph Gohlke. Your tone on this list is not appreciated by the way, it borders on trolling. If you have a serious question to which you really don't know the answer, please pose it in a less disrespectful way. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jan 12 07:04:33 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 12 Jan 2015 12:04:33 +0000 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release In-Reply-To: References: <54B2B7F7.4030708@iki.fi> <54B3448E.3020100@ncf.ca> Message-ID: Hi, On Mon, Jan 12, 2015 at 7:13 AM, Ralf Gommers wrote: > > > On Mon, Jan 12, 2015 at 4:50 AM, cjw wrote: >> >> Paul, >> >> Wot, no AMD64? > > > Colin, this is well known from previous scipy and numpy releases. It's due > to not having a freely available 64-bit compiler chain available at the > moment with which we can build official binaries. You can get 64-bit Windows > installers of most scientific Python distributions (like Anaconda, Enthough > Canopy, WinPython) and for just Scipy from the site of Christoph Gohlke. > > Your tone on this list is not appreciated by the way, it borders on > trolling. 
If you have a serious question to which you really don't know the > answer, please pose it in a less disrespectful way. Ralf - honestly I think it's best not to use the term 'trolling' under any circumstances - it can be a heavy weapon [1], although it's perfectly reasonable to say to Colin that you would find it helpful if he used a different tone. In this case, I couldn't be sure whether Colin meant his email to be light-hearted or not. Colin - just to add to Ralf's reply on the 64-bit issue - here are a few links: * Stackoverflow answer with some references [2] * Numpy mailing list question about 64-bit installers in 2011 [3] * Another discussion I started in 2013 [4] * Commentary on problems for Numpy etc on Windows [5] * Our current best hope: Carl Kleffner's mingw-w64 build chain [6] If Carl K is listening here - Carl - what's the current best way to help? Cheers, Matthew [1] http://nipyworld.blogspot.co.uk/2012/06/define-troll.html [2] http://stackoverflow.com/questions/11200137/installing-numpy-on-64bit-windows-7-with-python-2-7-3 [3] http://comments.gmane.org/gmane.comp.python.numeric.general/42118 [4] http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065339.html [5] https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows [6] https://github.com/numpy/numpy/wiki/Mingw-static-toolchain From ndarray at mac.com Mon Jan 12 13:33:09 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Mon, 12 Jan 2015 13:33:09 -0500 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds Message-ID: Consider this (on a 64-bit platform): >>> numpy.dtype('q') == numpy.dtype('l') True but >>> numpy.dtype('q').char == numpy.dtype('l').char False Is that intended? Shouldn't dtype constructor "normalize" 'l' to 'q' (or 'i')? -------------- next part -------------- An HTML attachment was scrubbed... URL: From maniteja.modesty067 at gmail.com Mon Jan 12 14:14:11 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Tue, 13 Jan 2015 00:44:11 +0530 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: Hi, On Tue, Jan 13, 2015 at 12:03 AM, Alexander Belopolsky wrote: > Consider this (on a 64-bit platform): > > >>> numpy.dtype('q') == numpy.dtype('l') > True > > >>> numpy.dtype('q').char == numpy.dtype('l').char > False > > Is that intended? Shouldn't dtype constructor "normalize" 'l' to 'q' (or > 'i')? > > 'q' is defined as NPY_LONGLONGLTR, while 'l' is NPY_LONGLTR [here]. A similar issue was raised in Issue 5426. I am not aware of the exact reason either, but hope it helps. Also, >>> numpy.dtype('q').num 9 >>> numpy.dtype('l').num 7 _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Jan 12 20:48:56 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Jan 2015 18:48:56 -0700 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: On Mon, Jan 12, 2015 at 12:14 PM, Maniteja Nandana < maniteja.modesty067 at gmail.com> wrote: > Hi, > > On Tue, Jan 13, 2015 at 12:03 AM, Alexander Belopolsky > wrote: > >> Consider this (on a 64-bit platform): >> >> >>> numpy.dtype('q') == numpy.dtype('l') >> True >> >> > >>> numpy.dtype('q').char == numpy.dtype('l').char >> False >> >> Is that intended? Shouldn't dtype constructor "normalize" 'l' to 'q' (or >> 'i')? >> >> > 'q' is defined as NPY_LONGLONGLTR, while 'l' is NPY_LONGLTR [here] > . > Similar issue was raised on Issue 5426 > . Even > I am not aware of the exact reason, but hope it helps. > Also, > > >>> numpy.dtype('q').num > 9 > >>> numpy.dtype('l').char > 7 > Numpy basically has two different type systems. The basic system is based on C types -- int, long, etc. -- and on top of that there is a precision based system. The letters and number versions are C, while the dtype equality is precision. That is to say, in this case C long has the same precision as C long long. That varies depending on the platform, which is one reason the precision nomenclature came in. It can be confusing, and I've often fantasized getting rid of the long type altogether ;) So it isn't exactly intended, but there is a reason... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Mon Jan 12 22:23:22 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Mon, 12 Jan 2015 22:23:22 -0500 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: On Mon, Jan 12, 2015 at 8:48 PM, Charles R Harris wrote: > > That is to say, in this case C long has the same precision as C long long. That varies depending on the platform, which is one reason the precision nomenclature came in. It can be confusing, and I've often fantasized getting rid of the long type altogether ;) So it isn't exactly intended, but there is a reason... It is also confusing that numpy has two constructors that produce 32-bit integers on 32-bit platforms and 64-bit integers on 64-bit platforms, but neither of these constructors is called "long". Instead, they are called numpy.int_ and numpy.intp. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Jan 13 01:20:50 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 13 Jan 2015 06:20:50 +0000 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: <8ecxoh.ni3qys.2tbkqt-qmf@sipsolutions.net> On Tue Jan 13 04:23:22 2015 GMT+0100, Alexander Belopolsky wrote: > On Mon, Jan 12, 2015 at 8:48 PM, Charles R Harris > wrote: > > > > That is to say, in this case C long has the same precision as C long > long. That varies depending on the platform, which is one reason the > precision nomenclature came in. It can be confusing, and I've often > fantasized getting rid of the long type altogether ;) So it isn't exactly > intended, but there is a reason... > > > It is also confusing that numpy has two constructors that produce 32-bit > integers on 32-bit platforms and 64-bit integers on 64-bit platforms, but > neither of these constructors is called "long". 
Instead, they are called > numpy.int_ and numpy.intp. > There is np.long. int_ is python int which is long. intp is actually ssizet. - Sebastian From jaime.frio at gmail.com Tue Jan 13 10:15:17 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 13 Jan 2015 07:15:17 -0800 Subject: [Numpy-discussion] linspace handling of extra return Message-ID: While working on something else, I realized that linspace is not handling requests for returning the sampling spacing consistently: >>> np.linspace(0, 1, 3, retstep=True) (array([ 0. , 0.5, 1. ]), 0.5) >>> np.linspace(0, 1, 1, retstep=True) array([ 0.]) >>> np.linspace(0, 1, 0, retstep=True) array([], dtype=float64) Basically, retstep is ignored if the number of samples is 0 or 1. One could argue that it makes sense, because those sequences do not have a spacing defined. But at the very least it should be documented as doing so, and the following inconsistency removed: >>> np.linspace(0, 1, 1, endpoint=True, retstep=True) array([ 0.]) >>> np.linspace(0, 1, 1, endpoint=False, retstep=True) (array([ 0.]), 1.0) I am personally inclined to think that if a step is requested, then a step should be returned, and if it cannot be calculated in a reasonable manner, then a placeholder such as None, nan, 0 or stop - start should be returned. What does the collective wisdom think is the best approach for this? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Jan 13 10:23:50 2015 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 13 Jan 2015 10:23:50 -0500 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: Oh, wow. I never noticed that before. Yeah, if I state that retstep=True, then I am coding my handling to expect two values to be returned, not 1. I think it should be nan, but I could also agree with zero. It should definitely remain a float value, though. Cheers! Ben Root On Tue, Jan 13, 2015 at 10:15 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > While working on something else, I realized that linspace is not handling > requests for returning the sampling spacing consistently: > > >>> np.linspace(0, 1, 3, retstep=True) > (array([ 0. , 0.5, 1. ]), 0.5) > >>> np.linspace(0, 1, 1, retstep=True) > array([ 0.]) > >>> np.linspace(0, 1, 0, retstep=True) > array([], dtype=float64) > > Basically, retstep is ignored if the number of samples is 0 or 1. One > could argue that it makes sense, because those sequences do not have a > spacing defined. But at the very least it should be documented as doing so, > and the following inconsistency removed: > > >>> np.linspace(0, 1, 1, endpoint=True, retstep=True) > array([ 0.]) > >>> np.linspace(0, 1, 1, endpoint=False, retstep=True) > (array([ 0.]), 1.0) > > I am personally inclined to think that if a step is requested, then a step > should be returned, and if it cannot be calculated in a reasonable manner, > then a placeholder such as None, nan, 0 or stop - start should be returned. > > What does the collective wisdom think is the best approach for this? > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. 
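On the retstep question above, a rough sketch in plain NumPy of the nan option Ben suggests: always hand back a (samples, step) pair, with step set to nan whenever a spacing cannot be defined. The helper name linspace_with_step is made up purely for illustration, not an existing function:

import numpy as np

def linspace_with_step(start, stop, num, endpoint=True):
    # Sketch only: always return (samples, step); report nan for the
    # step when it cannot be computed (num of 0 or 1).
    samples = np.linspace(start, stop, num, endpoint=endpoint)
    if num > 1:
        step = samples[1] - samples[0]
    else:
        step = np.nan
    return samples, step

# linspace_with_step(0, 1, 3) -> (array([ 0. ,  0.5,  1. ]), 0.5)
# linspace_with_step(0, 1, 1) -> (array([ 0.]), nan)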
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Tue Jan 13 17:57:24 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 13 Jan 2015 14:57:24 -0800 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: > Oh, wow. I never noticed that before. Yeah, if I state that retstep=True, > then I am coding my handling to expect two values to be returned, not 1. I > think it should be nan, but I could also agree with zero. It should > definitely remain a float value, though. > NaN it is then: the change and supporting tests are now part of gh-5446. Jaime > > Cheers! > Ben Root > > On Tue, Jan 13, 2015 at 10:15 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> While working on something else, I realized that linspace is not handling >> requests for returning the sampling spacing consistently: >> >> >>> np.linspace(0, 1, 3, retstep=True) >> (array([ 0. , 0.5, 1. ]), 0.5) >> >>> np.linspace(0, 1, 1, retstep=True) >> array([ 0.]) >> >>> np.linspace(0, 1, 0, retstep=True) >> array([], dtype=float64) >> >> Basically, retstep is ignored if the number of samples is 0 or 1. One >> could argue that it makes sense, because those sequences do not have a >> spacing defined. But at the very least it should be documented as doing so, >> and the following inconsistency removed: >> >> >>> np.linspace(0, 1, 1, endpoint=True, retstep=True) >> array([ 0.]) >> >>> np.linspace(0, 1, 1, endpoint=False, retstep=True) >> (array([ 0.]), 1.0) >> >> I am personally inclined to think that if a step is requested, then a >> step should be returned, and if it cannot be calculated in a reasonable >> manner, then a placeholder such as None, nan, 0 or stop - start should be >> returned. >> >> What does the collective wisdom think is the best approach for this? >> >> Jaime >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >> de dominaci?n mundial. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Jan 13 17:56:56 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 13 Jan 2015 14:56:56 -0800 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: On Mon, Jan 12, 2015 at 7:23 PM, Alexander Belopolsky wrote: > > On Mon, Jan 12, 2015 at 8:48 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > > I've often fantasized getting rid of the long type altogether ;) So it > isn't exactly intended, but there is a reason... 
> > > It is also confusing that numpy has two constructors that produce 32-bit > integers on 32-bit platforms and 64-bit integers on 64-bit platforms, but > neither of these constructors is called "long". Instead, they are called > numpy.int_ and numpy.intp. > I'm pretty sure that numpy.int_ will produce a 32 bit type in Windows64 -- because a long on Windows64 is 32 bit (at least with the MS compiler). Which sucks, I'm pretty amazed that python went with "platformlong" for it's int, rather than "32 bit int" or "64 bit int". Sigh. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Jan 13 17:58:47 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 13 Jan 2015 14:58:47 -0800 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: > Oh, wow. I never noticed that before. Yeah, if I state that retstep=True, > then I am coding my handling to expect two values to be returned, not 1. I > think it should be nan, but I could also agree with zero. It should > definitely remain a float value, though. > How about a ValueError? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Jan 13 21:02:40 2015 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 13 Jan 2015 21:02:40 -0500 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: So, raise a ValueError when it used to return (mostly) correct results? For some reason, I don't think people would appreciate that. Ben Root On Jan 13, 2015 5:59 PM, "Chris Barker" wrote: > On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: > >> Oh, wow. I never noticed that before. Yeah, if I state that retstep=True, >> then I am coding my handling to expect two values to be returned, not 1. I >> think it should be nan, but I could also agree with zero. It should >> definitely remain a float value, though. >> > > How about a ValueError? > > -CHB > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jan 14 14:11:58 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 14 Jan 2015 12:11:58 -0700 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: On Tue, Jan 13, 2015 at 7:02 PM, Benjamin Root wrote: > So, raise a ValueError when it used to return (mostly) correct results? > For some reason, I don't think people would appreciate that. 
> > Ben Root > On Jan 13, 2015 5:59 PM, "Chris Barker" wrote: > >> On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: >> >>> Oh, wow. I never noticed that before. Yeah, if I state that >>> retstep=True, then I am coding my handling to expect two values to be >>> returned, not 1. I think it should be nan, but I could also agree with >>> zero. It should definitely remain a float value, though. >>> >> >> How about a ValueError? >> >> -CHB >> >> > How about raising ValueError if num < 0 or num == 1, endpoint=True, and start != stop? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jan 14 15:43:58 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 14 Jan 2015 12:43:58 -0800 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: On Tue, Jan 13, 2015 at 6:02 PM, Benjamin Root wrote: > So, raise a ValueError when it used to return (mostly) correct results? > my understanding is that it was NOT returning mostly correct results, it was returning completely different results for those special values: a 2-tuple in most cases, a single array if there delta didn't make sense. That may be correct, but it's not a result ;-) That being said, I'm sure some folks have written work-arounds that would break if this were changed. A bug is always incorporated in someones workflow. http://xkcd.com/1172/ Though if you do have a work-around for when the step is not returned, it will likely break if suddenly zero or NaN is returned, as well. So I'm not sure there is a fully backward compatible fix. -CHB > For some reason, I don't think people would appreciate that. > > Ben Root > On Jan 13, 2015 5:59 PM, "Chris Barker" wrote: > >> On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: >> >>> Oh, wow. I never noticed that before. Yeah, if I state that >>> retstep=True, then I am coding my handling to expect two values to be >>> returned, not 1. I think it should be nan, but I could also agree with >>> zero. It should definitely remain a float value, though. >>> >> >> How about a ValueError? >> >> -CHB >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Wed Jan 14 18:08:20 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Wed, 14 Jan 2015 18:08:20 -0500 Subject: [Numpy-discussion] proposed change to recarray access Message-ID: <54B6F6E4.2000700@gmail.com> Hello all, I've submitted a pull request on github which changes how string values in recarrays are returned, which may break old code. 
https://github.com/numpy/numpy/pull/5454 See also: https://github.com/numpy/numpy/issues/3993 Previously, recarray fields of type 'S' or 'U' (i.e. strings) would be returned as chararrays when accessed by attribute, but ndarrays when accessed by indexing: >>> arr = np.array([('abc ', 1), ('abc', 2)], dtype=[('str', 'S4'), ('id', int)]) >>> arr = arr.view(np.recarray) >>> type(arr.str) numpy.core.defchararray.chararray >>> type(arr['str']) numpy.core.records.recarray Chararray is deprecated, and furthermore this led to bugs in my code since chararrays trim trailing whitespace but ndarrays do not (and I was not aware of the conversion to chararray). For example: >>> arr.str[0] == arr.str[1] True >>> arr['str'][0] == arr['str'][1] False In the pull request I have changed recarray attribute access so ndarrays are always returned. I think this is a sensible thing to do but it may break code which depends on chararray features (including the trimmed whitespace). Does this sound reasonable? Best, Allan From maniteja.modesty067 at gmail.com Thu Jan 15 11:44:36 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Thu, 15 Jan 2015 22:14:36 +0530 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> Message-ID: Hello everyone, I just wanted to highlight the point made by Charles, it would be great if he would clarify any mistakes in the points that I put forward. Quoting the documentation, In versions of NumPy prior to 1.7, this function always returned a new, independent array containing a copy of the values in the diagonal. In NumPy 1.7 and 1.8, it continues to return a copy of the diagonal, but depending on this fact is deprecated. Writing to the resulting array continues to work as it used to, but a FutureWarning is issued. In NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error. In NumPy 1.10, it will return a read/write view. Writing to the returned array will alter your original array. Though the expected behaviour has its pros and cons, the points put forward are: 1. revert the changes so that *PyArray_Diagonal* returns a *copy*. 2. introduce a new API function *PyArray_Diagonal2*, which has a *copy* argument, so that a copy or a view can be returned. 3. if a *view* is to be returned, its *write-ability* depends on whether the *input* is writeable. 4. implement *PyArray_Diagonal* in terms of the new function, though the default value of *copy* is undecided. 5. Raise a *FutureWarning* when trying to write to the result. 6. add a *copy* argument to the *diagonal* function and method, updating the function in *methods.c* and *fromnumeric.py*, probably in other places also. 7. Also update the release notes and documentation. I would love to do the PR once a decision is reached. Cheers, N.Maniteja. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Jan 15 12:01:14 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 15 Jan 2015 18:01:14 +0100 Subject: [Numpy-discussion] Memory allocation of arrays and tracemalloc Message-ID: <20150115180114.3119bbd5@fsol> Hello, I see that the PyDataMem_* APIs call malloc()/free()/etc. directly, instead of going through PyMem_Malloc, etc. 
This means the memory allocated by those APIs won't be seen by tracemalloc. Is it deliberate? Am I missing something? Regards Antoine. From charlesr.harris at gmail.com Thu Jan 15 12:11:07 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 15 Jan 2015 10:11:07 -0700 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> Message-ID: On Thu, Jan 15, 2015 at 9:44 AM, Maniteja Nandana < maniteja.modesty067 at gmail.com> wrote: > Hello everyone, > > I just wanted to highlight the point made by Charles, it would be great if > he would clarify any mistakes in the points that I put forward. > > Quoting the documentation, > > In versions of NumPy prior to 1.7, this function always returned a new,independent array containing a copy of the values in the diagonal. > > In NumPy 1.7 and 1.8, it continues to return a copy of the diagonal,but depending on this fact is deprecated. Writing to the resulting array continues to work as it used to, but a FutureWarning is issued. > > In NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error. > > In NumPy 1.10, it will return a read/write view, Writing to the returned array will alter your original array. > > Though the expected behaviour has its pros and cons,the points put forward are : > > > 1. revert the changes so that *PyArray_Diagonal *returns a *copy.* > 2. introduce new API function *PyArray_Diagonal2, *which has a *copy *argument, so that copy or view can be returned. > 3. if a *view* is to be returned, its *write-ability *depends on whether the *input* is writeable. > 4. implement *PyArray_Diagonal *in terms of the new function, thought the default value of *copy *is unsure. > > copy=True > > 1. Raise a *FutureWarning*, when trying to write to the result. > > Old function should behave exactly as before, returning a writable copy. > > 1. add *copy *argument to the *diagonal *function and method, updating the function in *methods.c *and *fromnumeric.py, *probably in other places also. > 2. Also update the release notes and documentation. > > I would love to do the PR once a decision is reached. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 15 12:24:13 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 15 Jan 2015 17:24:13 +0000 Subject: [Numpy-discussion] Memory allocation of arrays and tracemalloc In-Reply-To: <20150115180114.3119bbd5@fsol> References: <20150115180114.3119bbd5@fsol> Message-ID: On Thu, Jan 15, 2015 at 5:01 PM, Antoine Pitrou wrote: > > Hello, > > I see that the PyDataMem_* APIs call malloc()/free()/etc. directly, > instead of going through PyMem_Malloc, etc. This means the memory > allocated by those APIs won't be seen by tracemalloc. Is it deliberate? > Am I missing something? There are two reasons: 1) We need PyMem_Calloc, which doesn't exist in any released version of Python. It *will* exist in 3.5, though (thanks to Victor Stinner for adding it for us!). 2) We *might* in the future need further API extensions (e.g. for aligned memory, or who knows what), and this makes us nervous. If we start supporting tracemalloc, then that becomes a user-facing feature that we'll be committed to supporting indefinitely, which means that we'll be locked into using Python's allocation API forever. (See https://github.com/numpy/numpy/issues/4663.) 
So we're reluctant to accept that lock-in without having some sort of escape hatch, e.g. the one described at the end of this message: http://bugs.python.org/issue18835#msg232221 -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From maniteja.modesty067 at gmail.com Thu Jan 15 12:25:18 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Thu, 15 Jan 2015 22:55:18 +0530 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> Message-ID: Thank you Charles for the corrections. Cheers, N.Maniteja _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion On Thu, Jan 15, 2015 at 10:41 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, Jan 15, 2015 at 9:44 AM, Maniteja Nandana < > maniteja.modesty067 at gmail.com> wrote: > >> Hello everyone, >> >> I just wanted to highlight the point made by Charles, it would be great >> if he would clarify any mistakes in the points that I put forward. >> >> Quoting the documentation, >> >> In versions of NumPy prior to 1.7, this function always returned a new,independent array containing a copy of the values in the diagonal. >> >> In NumPy 1.7 and 1.8, it continues to return a copy of the diagonal,but depending on this fact is deprecated. Writing to the resulting array continues to work as it used to, but a FutureWarning is issued. >> >> In NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error. >> >> In NumPy 1.10, it will return a read/write view, Writing to the returned array will alter your original array. >> >> Though the expected behaviour has its pros and cons,the points put forward are : >> >> >> 1. revert the changes so that *PyArray_Diagonal *returns a *copy.* >> 2. introduce new API function *PyArray_Diagonal2, *which has a *copy *argument, so that copy or view can be returned. >> 3. if a *view* is to be returned, its *write-ability *depends on whether the *input* is writeable. >> 4. implement *PyArray_Diagonal *in terms of the new function, thought the default value of *copy *is unsure. >> >> copy=True > >> >> 1. Raise a *FutureWarning*, when trying to write to the result. >> >> Old function should behave exactly as before, returning a writable copy. > >> >> 1. add *copy *argument to the *diagonal *function and method, updating the function in *methods.c *and *fromnumeric.py, *probably in other places also. >> 2. Also update the release notes and documentation. >> >> I would love to do the PR once a decision is reached. >> >> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgasmith at icloud.com Thu Jan 15 18:30:28 2015 From: dgasmith at icloud.com (Daniel Smith) Date: Thu, 15 Jan 2015 17:30:28 -0600 Subject: [Numpy-discussion] Optimizing numpy's einsum expression (again) Message-ID: <0D323EDF-1B88-4638-AF78-67824D478FB8@icloud.com> Hello everyone, I originally brought an optimized einsum routine forward a few weeks back that attempts to contract numpy arrays together in an optimal way. 
This can greatly reduce the scaling and overall cost of the einsum expression for the cost of a few intermediate arrays. The current version (and more details) can be found here: https://github.com/dgasmith/opt_einsum I think this routine is close to a finalized version, but there are two problems which I would like the community to weigh in on: Memory usage- Currently the routine only considers the maximum size of intermediates created rather than cumulative memory usage. If we only use np.einsum it is straightforward to calculate cumulative memory usage as einsum does not require any copies of inputs to be made; however, if we attempt to use a vendor BLAS the memory usage can becomes quite complex. For example, the np.tensordot routine always forces a copy for ndarrays because it uses ndarray.transpose(?).reshape(?). A more memory-conscious way to do this is to first try and do ndarray.reshape(?).T which does not copy the data and numpy can just pass a transpose flag to the vendor BLAS. The caveat here is that the summed indices must be in the correct order- if not a copy is required. Maximum array size is usually close to the total overhead of the opt_einsum function, but can occasionally be 2-5 times this size. I see the following ways forward: Ignore cumulative memory and stick with maximum array size as the limiting memory factor. Implement logic to figure out if the input arrays needs to be copied to use BLAS, compute the extra memory required, and add an extra dimension to the optimization algorithms (use BLAS or do not use BLAS at each step). Some of this is already done, but may get fairly complex. Build an in-place nd-transpose algorithm. Cut out BLAS entirely. Keeping in mind that vendor BLAS can be orders of magnitude faster than a pure einsum implementation, especially if the BLAS threading is used. Path algorithms- There are two algorithms ?optimal? (a brute force algorithm, scales like N!) and ?opportunistic? (a greedy algorithm, scales like N^3). The optimal path can take seconds to minutes to calculate for a 7-10 term expression while the opportunistic path takes microseconds even for 20+ term expressions. The opportunistic algorithm works surprisingly well and appears to obtain the correct scaling in all test cases that I can think of. Both algorithms use the maximum array size as a sieve- this is very beneficial from several aspects. The problem occurs when a much needed intermediate cannot be created due to this limit- on occasions not making this intermediate can have slowdowns of orders of magnitude even for small systems. This leads to certain (and sometimes unexpected) performance walls. Possible solutions: Warn the user if the ratio between an unlimited memory solution and a limited memory solution becomes large. Do not worry about it. Thank you for your time, -Daniel Smith -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 16 00:24:00 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 15 Jan 2015 21:24:00 -0800 Subject: [Numpy-discussion] Sorting refactor Message-ID: Hi all, I have been taking a deep look at the sorting functionality in numpy, and I think it could use a face lift in the form of a big code refactor, to get rid of some of the ugliness in the code and make it easier to maintain. What I have in mind basically amounts to: 1. 
Refactor _new_argsortlike to get rid of code duplication (there are two branches, one with buffering, one without, virtually identical, that could be merged into a single one). 2. Modify _new_argsortlike so that it can properly handle byte-swapped inputs of any dtype, see gh-5441. Add proper handling of types with references, in preparation for the rest of changes. 3. Add three functions to the npy_sort library: npy_aquicksort, npy_aheapsort, npy_amergesort, with a signature compatible with PyArray_ArgSortFunc , i.e. (char* start, npy_intp* result, npy_intp length, PyArrayObject *arr). These turn out to be almost identical to the string and unicode sort functions, but using the dtype's compare function to handle comparisons. 4. Modify PyArray_ArgSort (and PyArray_ArgPartition) to always call _new_argsortlike, even when there is no type specific argsort function, by using the newly added npy_axxx functions. This simplifies PyArray_ArgSort a lot, and gets rid of some of the global variable ugliness in the current code. And makes argsorting over non-contiguous axis more memory efficient. 5. Refactor _new_sortlike similarly to _new_argsortlike 6. Modify the npy_quicksort, npy_mergesort and npy_heapsort functions in npy_sort to have a signature compatible with PyArray_SortFunc, i.e. (char* start, npy_intp length, PyArrayObject *arr). npy_quicksort will no longer rely on libc's qsort, but be very similar to the string or unicode quicksort functions 7. Modify PyArray_Sort (and PyArray_Partition) to always call _new_sortlike, similarly to what was done with PyArray_ArgSort. This allows completing the removal of the remaining global variable ugliness, as well as similar benefits as for argsort before. This changes will make it easier for me to add a Timsort generic type function to numpy's arsenal of sorting routines. And I think they have value by cleaning the source code on their own. So my questions, mostly to the poor souls that will have to code review changes to several hundred lines of code: 1. Does this make sense, or is it better left alone? A subset of 1, 2 and 5 are a must to address the issues in gh-5441, the rest could arguably be left as is. 2. Would you rather see it submitted as one ginormous PR? Or split into 4 or 5 incremental ones? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Fri Jan 16 04:54:32 2015 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 16 Jan 2015 10:54:32 +0100 Subject: [Numpy-discussion] Optimizing numpy's einsum expression (again) In-Reply-To: <0D323EDF-1B88-4638-AF78-67824D478FB8@icloud.com> References: <0D323EDF-1B88-4638-AF78-67824D478FB8@icloud.com> Message-ID: Thanks for taking the time to think about this; good work. Personally, I don't think a factor 5 memory overhead is much to sweat over. The most complex einsum I have ever needed in a production environment was 5/6 terms, and for what this anecdote is worth, speed was a far bigger concern to me than memory. On Fri, Jan 16, 2015 at 12:30 AM, Daniel Smith wrote: > Hello everyone, > I originally brought an optimized einsum routine forward a few weeks back > that attempts to contract numpy arrays together in an optimal way. This can > greatly reduce the scaling and overall cost of the einsum expression for > the cost of a few intermediate arrays. 
The current version (and more > details) can be found here: > https://github.com/dgasmith/opt_einsum > > I think this routine is close to a finalized version, but there are two > problems which I would like the community to weigh in on: > > Memory usage- > Currently the routine only considers the maximum size of intermediates > created rather than cumulative memory usage. If we only use np.einsum it is > straightforward to calculate cumulative memory usage as einsum does not > require any copies of inputs to be made; however, if we attempt to use a > vendor BLAS the memory usage can becomes quite complex. For example, the > np.tensordot routine always forces a copy for ndarrays because it uses > ndarray.transpose(?).reshape(?). A more memory-conscious way to do this is > to first try and do ndarray.reshape(?).T which does not copy the data and > numpy can just pass a transpose flag to the vendor BLAS. The caveat here is > that the summed indices must be in the correct order- if not a copy is > required. Maximum array size is usually close to the total overhead of the > opt_einsum function, but can occasionally be 2-5 times this size. I see the > following ways forward: > > - Ignore cumulative memory and stick with maximum array size as the > limiting memory factor. > - Implement logic to figure out if the input arrays needs to be copied > to use BLAS, compute the extra memory required, and add an extra dimension > to the optimization algorithms (use BLAS or do not use BLAS at each step). > Some of this is already done, but may get fairly complex. > - Build an in-place nd-transpose algorithm. > - Cut out BLAS entirely. Keeping in mind that vendor BLAS can be > orders of magnitude faster than a pure einsum implementation, especially if > the BLAS threading is used. > > > Path algorithms- > There are two algorithms ?optimal? (a brute force algorithm, scales like > N!) and ?opportunistic? (a greedy algorithm, scales like N^3). The optimal > path can take seconds to minutes to calculate for a 7-10 term expression > while the opportunistic path takes microseconds even for 20+ term > expressions. The opportunistic algorithm works surprisingly well and > appears to obtain the correct scaling in all test cases that I can think > of. Both algorithms use the maximum array size as a sieve- this is very > beneficial from several aspects. The problem occurs when a much needed > intermediate cannot be created due to this limit- on occasions not making > this intermediate can have slowdowns of orders of magnitude even for small > systems. This leads to certain (and sometimes unexpected) performance > walls. Possible solutions: > > - Warn the user if the ratio between an unlimited memory solution and > a limited memory solution becomes large. > - Do not worry about it. > > > Thank you for your time, > -Daniel Smith > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dave.hirschfeld at gmail.com Fri Jan 16 05:54:41 2015 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Fri, 16 Jan 2015 10:54:41 +0000 (UTC) Subject: [Numpy-discussion] Optimizing numpy's einsum expression (again) References: <0D323EDF-1B88-4638-AF78-67824D478FB8@icloud.com> Message-ID: Daniel Smith icloud.com> writes: > > Hello everyone,I originally brought an optimized einsum routine forward a few weeks back that attempts to contract numpy arrays together in an optimal way. This can greatly reduce the scaling and overall cost of the einsum expression for the cost of a few intermediate arrays. The current version (and more details) can be found here: > https://github.com/dgasmith/opt_einsum > > I think this routine is close to a finalized version, but there are two problems which I would like the community to weigh in on: > > Thank you for your time, > > > -Daniel Smith > I wasn't aware of this work, but it's very interesting to me as a user of einsum whose principal reason for doing so is speed. Even though I use it on largish arrays I'm only concerned with the performance as I'm on x64 with plenty of memory even were it to require temporaries 5x the original size. I don't use einsum that much because I've noticed the performance can be very problem dependant so I've always profiled it to check. Hopefully this work will make the performance more consistent, allowing it to be used more generally throughout my code. Thanks, Dave * An anecdotal example from a user only. From larsmans at gmail.com Fri Jan 16 06:33:32 2015 From: larsmans at gmail.com (Lars Buitinck) Date: Fri, 16 Jan 2015 12:33:32 +0100 Subject: [Numpy-discussion] Sorting refactor Message-ID: 2015-01-16 11:55 GMT+01:00 : > Message: 2 > Date: Thu, 15 Jan 2015 21:24:00 -0800 > From: Jaime Fern?ndez del R?o > Subject: [Numpy-discussion] Sorting refactor > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > This changes will make it easier for me to add a Timsort generic type > function to numpy's arsenal of sorting routines. And I think they have > value by cleaning the source code on their own. Yes, they do. I've been looking at the sorting functions as well and I've found the following: * The code is generally hard to read because it prefers pointer over indices. I'm wondering if it would get slower using indices. The closer these algorithms are to the textbook, the easier to insert fancy optimizations. * The heap sort exploits undefined behavior by using a pointer that points before the start of the array. However, rewriting it to always point within the array made it slower. I haven't tried rewriting it using indices. * Quicksort has a quadratic time worst case. I think it should be turned into an introsort [1] for O(n log n) worst case; we have the heapsort needed to do that. * Quicksort is robust to repeated elements, but doesn't exploit them. It can be made to run in linear time if the input array has only O(1) distinct elements [2]. This may come at the expense of some performance on arrays with no repeated elements. * Using optimal sorting networks instead of insertion sort as the base case can speed up quicksort on float arrays by 5-10%, but only if NaNs are moved out of the way first so that comparisons become cheaper [3]. This has consequences for the selection algorithms that I haven't fully worked out yet. 
* Using Cilk Plus to parallelize merge sort can make it significantly faster than quicksort at the expense of only a few lines of code, but I haven't checked whether Cilk Plus plays nicely with multiprocessing and other parallelism options (remember the trouble with OpenMP-ified OpenBLAS?). This isn't really an answer to your questions, more like a brain dump from someone who's stared at the same code for a while and did some experiments. I'm not saying we should implement all of this, but keep in mind that there are some interesting options besides implementing timsort. [1] https://en.wikipedia.org/wiki/Introsort [2] http://www.sorting-algorithms.com/static/QuicksortIsOptimal.pdf [3] https://github.com/larsmans/numpy/tree/sorting-nets From jtaylor.debian at googlemail.com Fri Jan 16 06:43:43 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 16 Jan 2015 12:43:43 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: <54B8F96F.7090903@googlemail.com> On 16.01.2015 12:33, Lars Buitinck wrote: > 2015-01-16 11:55 GMT+01:00 : >> Message: 2 >> Date: Thu, 15 Jan 2015 21:24:00 -0800 >> From: Jaime Fern?ndez del R?o >> Subject: [Numpy-discussion] Sorting refactor >> To: Discussion of Numerical Python >> Message-ID: >> >> Content-Type: text/plain; charset="utf-8" >> >> This changes will make it easier for me to add a Timsort generic type >> function to numpy's arsenal of sorting routines. And I think they have >> value by cleaning the source code on their own. > > Yes, they do. I've been looking at the sorting functions as well and > I've found the following: > > * The code is generally hard to read because it prefers pointer over > indices. I'm wondering if it would get slower using indices. The > closer these algorithms are to the textbook, the easier to insert > fancy optimizations. > > * The heap sort exploits undefined behavior by using a pointer that > points before the start of the array. However, rewriting it to always > point within the array made it slower. I haven't tried rewriting it > using indices. > > * Quicksort has a quadratic time worst case. I think it should be > turned into an introsort [1] for O(n log n) worst case; we have the > heapsort needed to do that. This probably rarely happens in numeric data, and we do have guaranteed nlog runtime algorithms available. But it also is not costly to do, e.g. the selection code is a introselect instead of a normal quickselect. I'd say not high priority, but if someone wants to do it I don't see why not. > > * Quicksort is robust to repeated elements, but doesn't exploit them. > It can be made to run in linear time if the input array has only O(1) > distinct elements [2]. This may come at the expense of some > performance on arrays with no repeated elements. > > * Using optimal sorting networks instead of insertion sort as the base > case can speed up quicksort on float arrays by 5-10%, but only if NaNs > are moved out of the way first so that comparisons become cheaper [3]. > This has consequences for the selection algorithms that I haven't > fully worked out yet. I was also thinking about this, an advantage of a sorting network is that it can be vectorized to be significantly faster than an insertion sort. Handling NaN's should also be directly possible. The issue is that its probably too much complicated code for only a very small gain. 
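To make the vectorization point concrete, here is a toy sketch in plain NumPy (not the C internals under discussion, and assuming NaNs have already been moved out of the way as described above) of the optimal 5-comparator network for groups of 4 keys, applied to many groups at once with elementwise min/max:

import numpy as np

def sort4_network(a):
    # Sort each row of an (n, 4) array with the 5-comparator network
    # for 4 keys; each compare-exchange acts on every row at once.
    a = np.array(a, dtype=np.double, copy=True)
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
        lo = np.minimum(a[:, i], a[:, j])
        hi = np.maximum(a[:, i], a[:, j])
        a[:, i], a[:, j] = lo, hi
    return a

# sort4_network([[4., 3., 2., 1.], [2., 4., 1., 3.]]) sorts both rows in
# one pass; the same branchless min/max pattern is what a SIMD C version
# could use for the small base cases of quicksort.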
> > * Using Cilk Plus to parallelize merge sort can make it significantly > faster than quicksort at the expense of only a few lines of code, but > I haven't checked whether Cilk Plus plays nicely with multiprocessing > and other parallelism options (remember the trouble with OpenMP-ified > OpenBLAS?). you should also be able to do this with openmp tasks, though it will be a little less efficient as cilk+ has a better scheduler for this type of work. But I assume you will get the same trouble as openmp but that needs testing, also cilk+ in gcc is not really production ready yet, I got lots of crashes when I last tried it (it will be in 5.0 though). > > This isn't really an answer to your questions, more like a brain dump > from someone who's stared at the same code for a while and did some > experiments. I'm not saying we should implement all of this, but keep > in mind that there are some interesting options besides implementing > timsort. > > [1] https://en.wikipedia.org/wiki/Introsort > [2] http://www.sorting-algorithms.com/static/QuicksortIsOptimal.pdf > [3] https://github.com/larsmans/numpy/tree/sorting-nets > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From hoogendoorn.eelco at gmail.com Fri Jan 16 07:15:46 2015 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 16 Jan 2015 13:15:46 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: <54B8F96F.7090903@googlemail.com> References: <54B8F96F.7090903@googlemail.com> Message-ID: I don't know if there is a general consensus or guideline on these matters, but I am personally not entirely charmed by the use of behind-the-scenes parallelism, unless explicitly requested. Perhaps an algorithm can be made faster, but often these multicore algorithms are also less efficient, and a less data-dependent way of putting my cores to good use would have been preferable. Potentially, other code could slow down due to cache trashing if too many parallel tasks run in parallel. Id rather be in charge of such matters myself; but I imagine adding a keyword arg for these matters would not be much trouble? On Fri, Jan 16, 2015 at 12:43 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 16.01.2015 12:33, Lars Buitinck wrote: > > 2015-01-16 11:55 GMT+01:00 : > >> Message: 2 > >> Date: Thu, 15 Jan 2015 21:24:00 -0800 > >> From: Jaime Fern?ndez del R?o > >> Subject: [Numpy-discussion] Sorting refactor > >> To: Discussion of Numerical Python > >> Message-ID: > >> < > CAPOWHWkF6RnWcrGmcwsmq_LO3hShjgBVLsrN19z-MDPe25E2Aw at mail.gmail.com> > >> Content-Type: text/plain; charset="utf-8" > >> > >> This changes will make it easier for me to add a Timsort generic type > >> function to numpy's arsenal of sorting routines. And I think they have > >> value by cleaning the source code on their own. > > > > Yes, they do. I've been looking at the sorting functions as well and > > I've found the following: > > > > * The code is generally hard to read because it prefers pointer over > > indices. I'm wondering if it would get slower using indices. The > > closer these algorithms are to the textbook, the easier to insert > > fancy optimizations. > > > > * The heap sort exploits undefined behavior by using a pointer that > > points before the start of the array. However, rewriting it to always > > point within the array made it slower. I haven't tried rewriting it > > using indices. 
> > > > * Quicksort has a quadratic time worst case. I think it should be > > turned into an introsort [1] for O(n log n) worst case; we have the > > heapsort needed to do that. > > This probably rarely happens in numeric data, and we do have guaranteed > nlog runtime algorithms available. > But it also is not costly to do, e.g. the selection code is a > introselect instead of a normal quickselect. > I'd say not high priority, but if someone wants to do it I don't see why > not. > > > > > * Quicksort is robust to repeated elements, but doesn't exploit them. > > It can be made to run in linear time if the input array has only O(1) > > distinct elements [2]. This may come at the expense of some > > performance on arrays with no repeated elements. > > > > * Using optimal sorting networks instead of insertion sort as the base > > case can speed up quicksort on float arrays by 5-10%, but only if NaNs > > are moved out of the way first so that comparisons become cheaper [3]. > > This has consequences for the selection algorithms that I haven't > > fully worked out yet. > > I was also thinking about this, an advantage of a sorting network is > that it can be vectorized to be significantly faster than an insertion > sort. Handling NaN's should also be directly possible. > The issue is that its probably too much complicated code for only a very > small gain. > > > > > * Using Cilk Plus to parallelize merge sort can make it significantly > > faster than quicksort at the expense of only a few lines of code, but > > I haven't checked whether Cilk Plus plays nicely with multiprocessing > > and other parallelism options (remember the trouble with OpenMP-ified > > OpenBLAS?). > > you should also be able to do this with openmp tasks, though it will be > a little less efficient as cilk+ has a better scheduler for this type of > work. > But I assume you will get the same trouble as openmp but that needs > testing, also cilk+ in gcc is not really production ready yet, I got > lots of crashes when I last tried it (it will be in 5.0 though). > > > > > > This isn't really an answer to your questions, more like a brain dump > > from someone who's stared at the same code for a while and did some > > experiments. I'm not saying we should implement all of this, but keep > > in mind that there are some interesting options besides implementing > > timsort. > > > > [1] https://en.wikipedia.org/wiki/Introsort > > [2] http://www.sorting-algorithms.com/static/QuicksortIsOptimal.pdf > > [3] https://github.com/larsmans/numpy/tree/sorting-nets > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jan 16 07:19:25 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 16 Jan 2015 12:19:25 +0000 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: Hi, On Fri, Jan 16, 2015 at 5:24 AM, Jaime Fern?ndez del R?o wrote: > Hi all, > > I have been taking a deep look at the sorting functionality in numpy, and I > think it could use a face lift in the form of a big code refactor, to get > rid of some of the ugliness in the code and make it easier to maintain. 
What > I have in mind basically amounts to: > > Refactor _new_argsortlike to get rid of code duplication (there are two > branches, one with buffering, one without, virtually identical, that could > be merged into a single one). > Modify _new_argsortlike so that it can properly handle byte-swapped inputs > of any dtype, see gh-5441. Add proper handling of types with references, in > preparation for the rest of changes. > Add three functions to the npy_sort library: npy_aquicksort, npy_aheapsort, > npy_amergesort, with a signature compatible with PyArray_ArgSortFunc , i.e. > (char* start, npy_intp* result, npy_intp length, PyArrayObject *arr). These > turn out to be almost identical to the string and unicode sort functions, > but using the dtype's compare function to handle comparisons. > Modify PyArray_ArgSort (and PyArray_ArgPartition) to always call > _new_argsortlike, even when there is no type specific argsort function, by > using the newly added npy_axxx functions. This simplifies PyArray_ArgSort a > lot, and gets rid of some of the global variable ugliness in the current > code. And makes argsorting over non-contiguous axis more memory efficient. > Refactor _new_sortlike similarly to _new_argsortlike > Modify the npy_quicksort, npy_mergesort and npy_heapsort functions in > npy_sort to have a signature compatible with PyArray_SortFunc, i.e. (char* > start, npy_intp length, PyArrayObject *arr). npy_quicksort will no longer > rely on libc's qsort, but be very similar to the string or unicode quicksort > functions > Modify PyArray_Sort (and PyArray_Partition) to always call _new_sortlike, > similarly to what was done with PyArray_ArgSort. This allows completing the > removal of the remaining global variable ugliness, as well as similar > benefits as for argsort before. > > This changes will make it easier for me to add a Timsort generic type > function to numpy's arsenal of sorting routines. And I think they have value > by cleaning the source code on their own. So my questions, mostly to the > poor souls that will have to code review changes to several hundred lines > of code: > > Does this make sense, or is it better left alone? A subset of 1, 2 and 5 are > a must to address the issues in gh-5441, the rest could arguably be left as > is. > Would you rather see it submitted as one ginormous PR? Or split into 4 or 5 > incremental ones? Do you think it would be possible to split this into several PRs, with the initial one being the refactoring, and the subsequent ones being additions to sorting functionality? I'm guessing that the refactoring is something everyone wants (sounds great to me), whereas changes to the sorting needs more specific discussion. Cheers, Matthew From davidmenhur at gmail.com Fri Jan 16 07:28:56 2015 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Fri, 16 Jan 2015 13:28:56 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: <54B8F96F.7090903@googlemail.com> Message-ID: On 16 January 2015 at 13:15, Eelco Hoogendoorn wrote: > Perhaps an algorithm can be made faster, but often these multicore > algorithms are also less efficient, and a less data-dependent way of putting > my cores to good use would have been preferable. Potentially, other code > could slow down due to cache trashing if too many parallel tasks run in > parallel. Id rather be in charge of such matters myself; but I imagine > adding a keyword arg for these matters would not be much trouble? As I understand it, that is where the strength of Cilk+ lies. 
It does not force parallelisation, just suggests it. The decision to actually spawn parallel is decided at runtime depending on the load of the other cores. /David. From jaime.frio at gmail.com Fri Jan 16 08:29:18 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 16 Jan 2015 05:29:18 -0800 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: On Fri, Jan 16, 2015 at 4:19 AM, Matthew Brett wrote: > Hi, > > On Fri, Jan 16, 2015 at 5:24 AM, Jaime Fern?ndez del R?o > wrote: > > Hi all, > > > > I have been taking a deep look at the sorting functionality in numpy, > and I > > think it could use a face lift in the form of a big code refactor, to get > > rid of some of the ugliness in the code and make it easier to maintain. > What > > I have in mind basically amounts to: > > > > Refactor _new_argsortlike to get rid of code duplication (there are two > > branches, one with buffering, one without, virtually identical, that > could > > be merged into a single one). > > Modify _new_argsortlike so that it can properly handle byte-swapped > inputs > > of any dtype, see gh-5441. Add proper handling of types with references, > in > > preparation for the rest of changes. > > Add three functions to the npy_sort library: npy_aquicksort, > npy_aheapsort, > > npy_amergesort, with a signature compatible with PyArray_ArgSortFunc , > i.e. > > (char* start, npy_intp* result, npy_intp length, PyArrayObject *arr). > These > > turn out to be almost identical to the string and unicode sort functions, > > but using the dtype's compare function to handle comparisons. > > Modify PyArray_ArgSort (and PyArray_ArgPartition) to always call > > _new_argsortlike, even when there is no type specific argsort function, > by > > using the newly added npy_axxx functions. This simplifies > PyArray_ArgSort a > > lot, and gets rid of some of the global variable ugliness in the current > > code. And makes argsorting over non-contiguous axis more memory > efficient. > > Refactor _new_sortlike similarly to _new_argsortlike > > Modify the npy_quicksort, npy_mergesort and npy_heapsort functions in > > npy_sort to have a signature compatible with PyArray_SortFunc, i.e. > (char* > > start, npy_intp length, PyArrayObject *arr). npy_quicksort will no longer > > rely on libc's qsort, but be very similar to the string or unicode > quicksort > > functions > > Modify PyArray_Sort (and PyArray_Partition) to always call _new_sortlike, > > similarly to what was done with PyArray_ArgSort. This allows completing > the > > removal of the remaining global variable ugliness, as well as similar > > benefits as for argsort before. > > > > This changes will make it easier for me to add a Timsort generic type > > function to numpy's arsenal of sorting routines. And I think they have > value > > by cleaning the source code on their own. So my questions, mostly to the > > poor souls that will have to code review changes to several hundred > lines > > of code: > > > > Does this make sense, or is it better left alone? A subset of 1, 2 and 5 > are > > a must to address the issues in gh-5441, the rest could arguably be left > as > > is. > > Would you rather see it submitted as one ginormous PR? Or split into 4 > or 5 > > incremental ones? > > Do you think it would be possible to split this into several PRs, with > the initial one being the refactoring, and the subsequent ones being > additions to sorting functionality? 
> Just to be clear, nothing in the long list of changes I posted earlier is truly a change in the sorting functionality, except in the case of quicksort (and argquicksort) for generic types, which would no longer rely on qsort, but on our own implementation of it, just like every other sort in numpy. So yes, the refactor PR can precede new functionality. And it can even be split into 3 or 4 incremental PRs, to make the reviewers life easier. Since it is most likely Charles and/or Julian that are going to have to swallow the review pill, I'd like to hear from them how would they like it better. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 16 09:11:29 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 16 Jan 2015 06:11:29 -0800 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: On Fri, Jan 16, 2015 at 3:33 AM, Lars Buitinck wrote: > 2015-01-16 11:55 GMT+01:00 : > > Message: 2 > > Date: Thu, 15 Jan 2015 21:24:00 -0800 > > From: Jaime Fern?ndez del R?o > > Subject: [Numpy-discussion] Sorting refactor > > To: Discussion of Numerical Python > > Message-ID: > > < > CAPOWHWkF6RnWcrGmcwsmq_LO3hShjgBVLsrN19z-MDPe25E2Aw at mail.gmail.com> > > Content-Type: text/plain; charset="utf-8" > > > > This changes will make it easier for me to add a Timsort generic type > > function to numpy's arsenal of sorting routines. And I think they have > > value by cleaning the source code on their own. > > Yes, they do. I've been looking at the sorting functions as well and > I've found the following: > > * The code is generally hard to read because it prefers pointer over > indices. I'm wondering if it would get slower using indices. The > closer these algorithms are to the textbook, the easier to insert > fancy optimizations. > They are harder to read, but so cute to look at! C code just wouldn't feel the same without some magical pointer arithmetic thrown in here and there. ;-) > * The heap sort exploits undefined behavior by using a pointer that > points before the start of the array. However, rewriting it to always > point within the array made it slower. I haven't tried rewriting it > using indices. > > * Quicksort has a quadratic time worst case. I think it should be > turned into an introsort [1] for O(n log n) worst case; we have the > heapsort needed to do that. > > * Quicksort is robust to repeated elements, but doesn't exploit them. > It can be made to run in linear time if the input array has only O(1) > distinct elements [2]. This may come at the expense of some > performance on arrays with no repeated elements. > Java famously changed its library implementation of quicksort to a dual pivot one invented by Vladimir Yaroslavskiy[1], they claim that with substantial performance gains. I tried to implement that for numpy [2], but couldn't get it to work any faster than the current code. * Using optimal sorting networks instead of insertion sort as the base > case can speed up quicksort on float arrays by 5-10%, but only if NaNs > are moved out of the way first so that comparisons become cheaper [3]. > This has consequences for the selection algorithms that I haven't > fully worked out yet. 
> Even if we stick with selection sort, we should spin it off into an inline smallsort function within the npy_sort library, and have quicksort and mergesort call the same function, instead of each implementing their own. It would make optimizations like the sorting networks easier to implement for all sorts. We could even expose it outside npy_sort, as there are a few places around the code base that have ad-hoc implementations of sorting. > * Using Cilk Plus to parallelize merge sort can make it significantly > faster than quicksort at the expense of only a few lines of code, but > I haven't checked whether Cilk Plus plays nicely with multiprocessing > and other parallelism options (remember the trouble with OpenMP-ified > OpenBLAS?). > > This isn't really an answer to your questions, more like a brain dump > from someone who's stared at the same code for a while and did some > experiments. I'm not saying we should implement all of this, but keep > in mind that there are some interesting options besides implementing > timsort. > Timsort came up in a discussion several months ago, where I proposed adding a mergesorted function (which I have mostly ready, by the way, [3]) to speed-up some operations in arraysetops. I have serious doubts that it will perform comparably to the other sorts unless comparisons are terribly expensive, which they typically aren't in numpy, but it has been an interesting learning exercise so far, and I don't mind taking it all the way. Most of my proposed original changes do not affect the core sorting functionality, just the infrastructure around it. But if we agree that sorting has potential for being an actively developed part of the code base, then cleaning up its surroundings for clarity makes sense, so I'm taking your brain dump as an aye for my proposal. ;-) Jaime [1] http://iaroslavski.narod.ru/quicksort/DualPivotQuicksort.pdf [2] https://github.com/jaimefrio/numpy/commit/a99dd77cda7c4c0f2df1fb17a59c20e19999cd86 [3] https://github.com/jaimefrio/numpy/commit/2f53c99e7ec6d14fd77a29f9d3c1712d5b955079 > > [1] https://en.wikipedia.org/wiki/Introsort > [2] http://www.sorting-algorithms.com/static/QuicksortIsOptimal.pdf > [3] https://github.com/larsmans/numpy/tree/sorting-nets > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From larsmans at gmail.com Fri Jan 16 09:14:24 2015 From: larsmans at gmail.com (Lars Buitinck) Date: Fri, 16 Jan 2015 15:14:24 +0100 Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 100, Issue 28 Message-ID: 2015-01-16 13:29 GMT+01:00 : > Date: Fri, 16 Jan 2015 12:43:43 +0100 > From: Julian Taylor > Subject: Re: [Numpy-discussion] Sorting refactor > To: Discussion of Numerical Python > Message-ID: <54B8F96F.7090903 at googlemail.com> > Content-Type: text/plain; charset=windows-1252 > > On 16.01.2015 12:33, Lars Buitinck wrote: >> * Quicksort has a quadratic time worst case. I think it should be >> turned into an introsort [1] for O(n log n) worst case; we have the >> heapsort needed to do that. > > This probably rarely happens in numeric data, and we do have guaranteed > nlog runtime algorithms available. 
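To make that concrete, here is a hypothetical pure-Python sketch of the kind of knob and consumer I mean -- setnumthreads() and parallel_sort() are invented names, none of this exists in numpy, and it assumes np.sort releases the GIL for the dtype at hand:

    import heapq
    import multiprocessing
    from concurrent.futures import ThreadPoolExecutor
    import numpy as np

    _num_threads = 1                      # process-global setting, off by default

    def setnumthreads(n=None):
        # n=None could mean "use all available cores"
        global _num_threads
        _num_threads = multiprocessing.cpu_count() if n is None else int(n)

    def parallel_sort(a):
        # 1-d arrays of simple numeric dtypes only; small inputs stay sequential
        if _num_threads <= 1 or a.size < 10000:
            return np.sort(a)
        chunks = np.array_split(a, _num_threads)
        with ThreadPoolExecutor(_num_threads) as pool:
            sorted_chunks = list(pool.map(np.sort, chunks))
        # k-way merge of the already-sorted chunks
        return np.fromiter(heapq.merge(*sorted_chunks), dtype=a.dtype, count=a.size)

The point is only that the library consults one per-process setting instead of every call site growing its own nthreads keyword.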
It's no more likely or unlikely than in any other type of data (AFAIK), but it's possible for an adversary to DOS any (web server) code that uses np.sort. > I was also thinking about this, an advantage of a sorting network is > that it can be vectorized to be significantly faster than an insertion > sort. Handling NaN's should also be directly possible. Tried that, and it didn't give any speedup, at least not without explicit vector instructions. Just moving the NaNs aside didn't cost anything in my preliminary benchmarks (without sorting nets), the cost of the operation was almost exactly compensated by simpler comparisons. > The issue is that its probably too much complicated code for only a very > small gain. Maybe. The thing is that the selection algorithms are optimized for NaNs and seem to depend on the current comparison code. We'd need distinct _LT and _LT_NONAN for each . The sorting nets themselves aren't complicated, just lengthy. My branch has the length-optimal (not optimally parallel) ones for n <= 16. > But I assume you will get the same trouble as openmp but that needs > testing, also cilk+ in gcc is not really production ready yet, I got > lots of crashes when I last tried it (it will be in 5.0 though). The data parallel constructs tend to crash the compiler, but task spawning seems to be stable in 4.9.2. I've still to see how it handles multiprocessing/fork. What do you mean by will be in 5.0, did they do a big push? > Date: Fri, 16 Jan 2015 13:28:56 +0100 > From: Da?id > Subject: Re: [Numpy-discussion] Sorting refactor > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset=UTF-8 > > On 16 January 2015 at 13:15, Eelco Hoogendoorn > wrote: >> Perhaps an algorithm can be made faster, but often these multicore >> algorithms are also less efficient, and a less data-dependent way of putting >> my cores to good use would have been preferable. Potentially, other code >> could slow down due to cache trashing if too many parallel tasks run in >> parallel. Id rather be in charge of such matters myself; but I imagine >> adding a keyword arg for these matters would not be much trouble? > > As I understand it, that is where the strength of Cilk+ lies. It does > not force parallelisation, just suggests it. The decision to actually > spawn parallel is decided at runtime depending on the load of the > other cores. cilk+ guarantees that the amount of space used by a pool of P threads is at most P times the stack space used by the sequential version (+ a constant). The idea is that you can say for (i = 0; i < 1000000; i++) { cilk_spawn f(a[i]); } and it will never create more than P work items in memory, rather than 1e6, even if each f() spawns a bunch itself. Of course, it won't guarantee that OpenMP will not also spawn P threads and/or check that you're one of P processes cooperating on a task using multiprocessing. Personally I'd rather have an np.setnumthreads() to turn this on or off for a process and have the runtime distribute work for me instead of having to do it myself. 
From larsmans at gmail.com Fri Jan 16 09:23:03 2015 From: larsmans at gmail.com (Lars Buitinck) Date: Fri, 16 Jan 2015 15:23:03 +0100 Subject: [Numpy-discussion] Sorting refactor Message-ID: 2015-01-16 15:14 GMT+01:00 : > Date: Fri, 16 Jan 2015 06:11:29 -0800 > From: Jaime Fern?ndez del R?o > Subject: Re: [Numpy-discussion] Sorting refactor > To: Discussion of Numerical Python > > Most of my proposed original changes do not affect the core sorting > functionality, just the infrastructure around it. But if we agree that > sorting has potential for being an actively developed part of the code > base, then cleaning up its surroundings for clarity makes sense, so I'm > taking your brain dump as an aye for my proposal. ;-) It is! From jtaylor.debian at googlemail.com Fri Jan 16 10:11:25 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 16 Jan 2015 16:11:25 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: <54B92A1D.8030808@googlemail.com> On 01/16/2015 03:14 PM, Lars Buitinck wrote: > 2015-01-16 13:29 GMT+01:00 : >> Date: Fri, 16 Jan 2015 12:43:43 +0100 >> From: Julian Taylor >> Subject: Re: [Numpy-discussion] Sorting refactor >> To: Discussion of Numerical Python >> Message-ID: <54B8F96F.7090903 at googlemail.com> >> Content-Type: text/plain; charset=windows-1252 >> >> On 16.01.2015 12:33, Lars Buitinck wrote: >>> * Quicksort has a quadratic time worst case. I think it should be >>> turned into an introsort [1] for O(n log n) worst case; we have the >>> heapsort needed to do that. >> >> This probably rarely happens in numeric data, and we do have guaranteed >> nlog runtime algorithms available. > > It's no more likely or unlikely than in any other type of data > (AFAIK), but it's possible for an adversary to DOS any (web server) > code that uses np.sort. if you are using numpy where an arbitrary user is allowed to control the data passed to a non isolated environment you have a problem anyway. numpy is far from secure software and there are likely hundreds of ways to produce DOS and dozens of ways to do code execution in any nontrivial numpy using application. > >> I was also thinking about this, an advantage of a sorting network is >> that it can be vectorized to be significantly faster than an insertion >> sort. Handling NaN's should also be directly possible. > > Tried that, and it didn't give any speedup, at least not without > explicit vector instructions. > > Just moving the NaNs aside didn't cost anything in my preliminary > benchmarks (without sorting nets), the cost of the operation was > almost exactly compensated by simpler comparisons. an SSE2 implementation a 16 entry bitonic sort is available here: https://github.com/mischasan/sse2/blob/master/ssesort.c there is also a benchmark, on my machine its 6 times faster than insertion sort. But again this would only gain us 5-10% improvement at best as the partition part of quicksort is still the major time consuming part. > >> The issue is that its probably too much complicated code for only a very >> small gain. > > Maybe. The thing is that the selection algorithms are optimized for > NaNs and seem to depend on the current comparison code. We'd need > distinct _LT and _LT_NONAN for each . > > The sorting nets themselves aren't complicated, just lengthy. My > branch has the length-optimal (not optimally parallel) ones for n <= > 16. 
> >> But I assume you will get the same trouble as openmp but that needs >> testing, also cilk+ in gcc is not really production ready yet, I got >> lots of crashes when I last tried it (it will be in 5.0 though). > > The data parallel constructs tend to crash the compiler, but task > spawning seems to be stable in 4.9.2. I've still to see how it handles > multiprocessing/fork. > > What do you mean by will be in 5.0, did they do a big push? gcc 5.0 changelog reports "full support for cilk plus". Also all bugs I have filed have been fixed in 5.0. > > >> Date: Fri, 16 Jan 2015 13:28:56 +0100 >> From: Da?id >> Subject: Re: [Numpy-discussion] Sorting refactor >> To: Discussion of Numerical Python >> Message-ID: >> >> Content-Type: text/plain; charset=UTF-8 >> >> On 16 January 2015 at 13:15, Eelco Hoogendoorn >> wrote: >>> Perhaps an algorithm can be made faster, but often these multicore >>> algorithms are also less efficient, and a less data-dependent way of putting >>> my cores to good use would have been preferable. Potentially, other code >>> could slow down due to cache trashing if too many parallel tasks run in >>> parallel. Id rather be in charge of such matters myself; but I imagine >>> adding a keyword arg for these matters would not be much trouble? >> >> As I understand it, that is where the strength of Cilk+ lies. It does >> not force parallelisation, just suggests it. The decision to actually >> spawn parallel is decided at runtime depending on the load of the >> other cores. > > cilk+ guarantees that the amount of space used by a pool of P threads > is at most P times the stack space used by the sequential version (+ a > constant). The idea is that you can say > > for (i = 0; i < 1000000; i++) { > cilk_spawn f(a[i]); > } > > and it will never create more than P work items in memory, rather than > 1e6, even if each f() spawns a bunch itself. Of course, it won't > guarantee that OpenMP will not also spawn P threads and/or check that > you're one of P processes cooperating on a task using multiprocessing. > > Personally I'd rather have an np.setnumthreads() to turn this on or > off for a process and have the runtime distribute work for me instead > of having to do it myself. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From hoogendoorn.eelco at gmail.com Fri Jan 16 11:14:24 2015 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 16 Jan 2015 17:14:24 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: <54B92A1D.8030808@googlemail.com> References: <54B92A1D.8030808@googlemail.com> Message-ID: I agree; an np.setnumthreads to manage a numpy-global threadpool makes sense to me. 
Of course there are a great many cases where just spawning as many threads as cores is a sensible default, but if this kind of behavior could not be overridden I could see that greatly reduce performance for some of my more complex projects On Fri, Jan 16, 2015 at 4:11 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 01/16/2015 03:14 PM, Lars Buitinck wrote: > > 2015-01-16 13:29 GMT+01:00 : > >> Date: Fri, 16 Jan 2015 12:43:43 +0100 > >> From: Julian Taylor > >> Subject: Re: [Numpy-discussion] Sorting refactor > >> To: Discussion of Numerical Python > >> Message-ID: <54B8F96F.7090903 at googlemail.com> > >> Content-Type: text/plain; charset=windows-1252 > >> > >> On 16.01.2015 12:33, Lars Buitinck wrote: > >>> * Quicksort has a quadratic time worst case. I think it should be > >>> turned into an introsort [1] for O(n log n) worst case; we have the > >>> heapsort needed to do that. > >> > >> This probably rarely happens in numeric data, and we do have guaranteed > >> nlog runtime algorithms available. > > > > It's no more likely or unlikely than in any other type of data > > (AFAIK), but it's possible for an adversary to DOS any (web server) > > code that uses np.sort. > > if you are using numpy where an arbitrary user is allowed to control the > data passed to a non isolated environment you have a problem anyway. > numpy is far from secure software and there are likely hundreds of ways > to produce DOS and dozens of ways to do code execution in any nontrivial > numpy using application. > > > > >> I was also thinking about this, an advantage of a sorting network is > >> that it can be vectorized to be significantly faster than an insertion > >> sort. Handling NaN's should also be directly possible. > > > > Tried that, and it didn't give any speedup, at least not without > > explicit vector instructions. > > > > Just moving the NaNs aside didn't cost anything in my preliminary > > benchmarks (without sorting nets), the cost of the operation was > > almost exactly compensated by simpler comparisons. > > an SSE2 implementation a 16 entry bitonic sort is available here: > https://github.com/mischasan/sse2/blob/master/ssesort.c > there is also a benchmark, on my machine its 6 times faster than > insertion sort. > But again this would only gain us 5-10% improvement at best as the > partition part of quicksort is still the major time consuming part. > > > > >> The issue is that its probably too much complicated code for only a very > >> small gain. > > > > Maybe. The thing is that the selection algorithms are optimized for > > NaNs and seem to depend on the current comparison code. We'd need > > distinct _LT and _LT_NONAN for each . > > > > The sorting nets themselves aren't complicated, just lengthy. My > > branch has the length-optimal (not optimally parallel) ones for n <= > > 16. > > > >> But I assume you will get the same trouble as openmp but that needs > >> testing, also cilk+ in gcc is not really production ready yet, I got > >> lots of crashes when I last tried it (it will be in 5.0 though). > > > > The data parallel constructs tend to crash the compiler, but task > > spawning seems to be stable in 4.9.2. I've still to see how it handles > > multiprocessing/fork. > > > > What do you mean by will be in 5.0, did they do a big push? > > gcc 5.0 changelog reports "full support for cilk plus". > Also all bugs I have filed have been fixed in 5.0. 
> > > > > > >> Date: Fri, 16 Jan 2015 13:28:56 +0100 > >> From: Da?id > >> Subject: Re: [Numpy-discussion] Sorting refactor > >> To: Discussion of Numerical Python > >> Message-ID: > >> iuNQrEiaSDY23DNW6w at mail.gmail.com> > >> Content-Type: text/plain; charset=UTF-8 > >> > >> On 16 January 2015 at 13:15, Eelco Hoogendoorn > >> wrote: > >>> Perhaps an algorithm can be made faster, but often these multicore > >>> algorithms are also less efficient, and a less data-dependent way of > putting > >>> my cores to good use would have been preferable. Potentially, other > code > >>> could slow down due to cache trashing if too many parallel tasks run in > >>> parallel. Id rather be in charge of such matters myself; but I imagine > >>> adding a keyword arg for these matters would not be much trouble? > >> > >> As I understand it, that is where the strength of Cilk+ lies. It does > >> not force parallelisation, just suggests it. The decision to actually > >> spawn parallel is decided at runtime depending on the load of the > >> other cores. > > > > cilk+ guarantees that the amount of space used by a pool of P threads > > is at most P times the stack space used by the sequential version (+ a > > constant). The idea is that you can say > > > > for (i = 0; i < 1000000; i++) { > > cilk_spawn f(a[i]); > > } > > > > and it will never create more than P work items in memory, rather than > > 1e6, even if each f() spawns a bunch itself. Of course, it won't > > guarantee that OpenMP will not also spawn P threads and/or check that > > you're one of P processes cooperating on a task using multiprocessing. > > > > Personally I'd rather have an np.setnumthreads() to turn this on or > > off for a process and have the runtime distribute work for me instead > > of having to do it myself. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 16 11:15:29 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 16 Jan 2015 09:15:29 -0700 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: On Fri, Jan 16, 2015 at 7:11 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Fri, Jan 16, 2015 at 3:33 AM, Lars Buitinck wrote: > >> 2015-01-16 11:55 GMT+01:00 : >> > Message: 2 >> > Date: Thu, 15 Jan 2015 21:24:00 -0800 >> > From: Jaime Fern?ndez del R?o >> > Subject: [Numpy-discussion] Sorting refactor >> > To: Discussion of Numerical Python >> > Message-ID: >> > < >> CAPOWHWkF6RnWcrGmcwsmq_LO3hShjgBVLsrN19z-MDPe25E2Aw at mail.gmail.com> >> > Content-Type: text/plain; charset="utf-8" >> > >> > This changes will make it easier for me to add a Timsort generic type >> > function to numpy's arsenal of sorting routines. And I think they have >> > value by cleaning the source code on their own. >> >> Yes, they do. I've been looking at the sorting functions as well and >> I've found the following: >> >> * The code is generally hard to read because it prefers pointer over >> indices. I'm wondering if it would get slower using indices. The >> closer these algorithms are to the textbook, the easier to insert >> fancy optimizations. 
>> > > They are harder to read, but so cute to look at! C code just wouldn't feel > the same without some magical pointer arithmetic thrown in here and there. > ;-) > Pointers were faster than indexing. That advantage can be hardware dependent, but for small numbers of pointers is typical. > > >> * The heap sort exploits undefined behavior by using a pointer that >> points before the start of the array. However, rewriting it to always >> point within the array made it slower. I haven't tried rewriting it >> using indices > > Fortran uses the same pointer trick for one based indexing, or at least the old DEC compilers did ;) There is no reason to avoid it. > . >> >> * Quicksort has a quadratic time worst case. I think it should be >> turned into an introsort [1] for O(n log n) worst case; we have the >> heapsort needed to do that. >> >> * Quicksort is robust to repeated elements, but doesn't exploit them. >> It can be made to run in linear time if the input array has only O(1) >> distinct elements [2]. This may come at the expense of some >> performance on arrays with no repeated elements. >> > > Java famously changed its library implementation of quicksort to a dual > pivot one invented by Vladimir Yaroslavskiy[1], they claim that with > substantial performance gains. I tried to implement that for numpy [2], but > couldn't get it to work any faster than the current code. > For sorting, simple often beats fancy. > > * Using optimal sorting networks instead of insertion sort as the base >> case can speed up quicksort on float arrays by 5-10%, but only if NaNs >> are moved out of the way first so that comparisons become cheaper [3]. >> This has consequences for the selection algorithms that I haven't >> fully worked out yet. >> > > I expect the gains here would be for small sorts, which tend to be dominated by call overhead. > Even if we stick with selection sort, we should spin it off into an inline > smallsort function within the npy_sort library, and have quicksort and > mergesort call the same function, instead of each implementing their own. > It would make optimizations like the sorting networks easier to implement > for all sorts. We could even expose it outside npy_sort, as there are a few > places around the code base that have ad-hoc implementations of sorting. > Good idea, I've thought of doing it myself. > >> * Using Cilk Plus to parallelize merge sort can make it significantly >> faster than quicksort at the expense of only a few lines of code, but >> I haven't checked whether Cilk Plus plays nicely with multiprocessing >> and other parallelism options (remember the trouble with OpenMP-ified >> OpenBLAS?). >> >> This isn't really an answer to your questions, more like a brain dump >> from someone who's stared at the same code for a while and did some >> experiments. I'm not saying we should implement all of this, but keep >> in mind that there are some interesting options besides implementing >> timsort. >> > > Timsort came up in a discussion several months ago, where I proposed > adding a mergesorted function (which I have mostly ready, by the way, [3]) > to speed-up some operations in arraysetops. I have serious doubts that it > will perform comparably to the other sorts unless comparisons are terribly > expensive, which they typically aren't in numpy, but it has been an > interesting learning exercise so far, and I don't mind taking it all the > way. > > Most of my proposed original changes do not affect the core sorting > functionality, just the infrastructure around it. 
But if we agree that > sorting has potential for being an actively developed part of the code > base, then cleaning up its surroundings for clarity makes sense, so I'm > taking your brain dump as an aye for my proposal. ;-) > I have a generic quicksort with standard interface sitting around somewhere in an ancient branch. Sorting objects needs to be sensitive to comparison exceptions, which is something to keep in mind. I'd also like to push the GIL release back down into the interface functions where it used to be, but that isn't a priority. Another other possibility I've toyed with is adding a step for sorting non-contiguous arrays, but the sort functions being part of the dtype complicates that for compatibility reasons. I suppose that could be handled with interface functions. I think the prototypes should also be regularized. Cleaning up the sorting dispatch to use just one function and avoid the global would be good, the current code is excessively ugly. That cleanup, together with a generic quicksort, would be a good place to start. And remember, simpler is better. Usually. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 16 19:11:55 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 16 Jan 2015 16:11:55 -0800 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: On Fri, Jan 16, 2015 at 8:15 AM, Charles R Harris wrote: > > > On Fri, Jan 16, 2015 at 7:11 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Fri, Jan 16, 2015 at 3:33 AM, Lars Buitinck >> wrote: >> >>> 2015-01-16 11:55 GMT+01:00 : >>> > Message: 2 >>> > Date: Thu, 15 Jan 2015 21:24:00 -0800 >>> > From: Jaime Fern?ndez del R?o >>> > Subject: [Numpy-discussion] Sorting refactor >>> > To: Discussion of Numerical Python >>> > Message-ID: >>> > < >>> CAPOWHWkF6RnWcrGmcwsmq_LO3hShjgBVLsrN19z-MDPe25E2Aw at mail.gmail.com> >>> > Content-Type: text/plain; charset="utf-8" >>> > >>> > This changes will make it easier for me to add a Timsort generic type >>> > function to numpy's arsenal of sorting routines. And I think they have >>> > value by cleaning the source code on their own. >>> >>> Yes, they do. I've been looking at the sorting functions as well and >>> I've found the following: >>> >>> * The code is generally hard to read because it prefers pointer over >>> indices. I'm wondering if it would get slower using indices. The >>> closer these algorithms are to the textbook, the easier to insert >>> fancy optimizations. >>> >> >> They are harder to read, but so cute to look at! C code just wouldn't >> feel the same without some magical pointer arithmetic thrown in here and >> there. ;-) >> > > Pointers were faster than indexing. That advantage can be hardware > dependent, but for small numbers of pointers is typical. > > >> >> >>> * The heap sort exploits undefined behavior by using a pointer that >>> points before the start of the array. However, rewriting it to always >>> point within the array made it slower. I haven't tried rewriting it >>> using indices >> >> > Fortran uses the same pointer trick for one based indexing, or at least > the old DEC compilers did ;) There is no reason to avoid it. > > >> . >>> >>> * Quicksort has a quadratic time worst case. I think it should be >>> turned into an introsort [1] for O(n log n) worst case; we have the >>> heapsort needed to do that. >>> >>> * Quicksort is robust to repeated elements, but doesn't exploit them. 
>>> It can be made to run in linear time if the input array has only O(1) >>> distinct elements [2]. This may come at the expense of some >>> performance on arrays with no repeated elements. >>> >> >> Java famously changed its library implementation of quicksort to a dual >> pivot one invented by Vladimir Yaroslavskiy[1], they claim that with >> substantial performance gains. I tried to implement that for numpy [2], but >> couldn't get it to work any faster than the current code. >> > > For sorting, simple often beats fancy. > > >> >> * Using optimal sorting networks instead of insertion sort as the base >>> case can speed up quicksort on float arrays by 5-10%, but only if NaNs >>> are moved out of the way first so that comparisons become cheaper [3]. >>> This has consequences for the selection algorithms that I haven't >>> fully worked out yet. >>> >> >> > I expect the gains here would be for small sorts, which tend to be > dominated by call overhead. > > >> Even if we stick with selection sort, we should spin it off into an >> inline smallsort function within the npy_sort library, and have quicksort >> and mergesort call the same function, instead of each implementing their >> own. It would make optimizations like the sorting networks easier to >> implement for all sorts. We could even expose it outside npy_sort, as there >> are a few places around the code base that have ad-hoc implementations of >> sorting. >> > > Good idea, I've thought of doing it myself. > > >> >>> * Using Cilk Plus to parallelize merge sort can make it significantly >>> faster than quicksort at the expense of only a few lines of code, but >>> I haven't checked whether Cilk Plus plays nicely with multiprocessing >>> and other parallelism options (remember the trouble with OpenMP-ified >>> OpenBLAS?). >>> >>> This isn't really an answer to your questions, more like a brain dump >>> from someone who's stared at the same code for a while and did some >>> experiments. I'm not saying we should implement all of this, but keep >>> in mind that there are some interesting options besides implementing >>> timsort. >>> >> >> Timsort came up in a discussion several months ago, where I proposed >> adding a mergesorted function (which I have mostly ready, by the way, [3]) >> to speed-up some operations in arraysetops. I have serious doubts that it >> will perform comparably to the other sorts unless comparisons are terribly >> expensive, which they typically aren't in numpy, but it has been an >> interesting learning exercise so far, and I don't mind taking it all the >> way. >> >> Most of my proposed original changes do not affect the core sorting >> functionality, just the infrastructure around it. But if we agree that >> sorting has potential for being an actively developed part of the code >> base, then cleaning up its surroundings for clarity makes sense, so I'm >> taking your brain dump as an aye for my proposal. ;-) >> > > I have a generic quicksort with standard interface sitting around > somewhere in an ancient branch. Sorting objects needs to be sensitive to > comparison exceptions, which is something to keep in mind. I'd also like to > push the GIL release back down into the interface functions where it used > to be, but that isn't a priority. Another other possibility I've toyed with > is adding a step for sorting non-contiguous arrays, but the sort functions > being part of the dtype complicates that for compatibility reasons. I > suppose that could be handled with interface functions. 
I think the > prototypes should also be regularized. > > Cleaning up the sorting dispatch to use just one function and avoid the > global would be good, the current code is excessively ugly. That cleanup, > together with a generic quicksort, would be a good place to start. > Let the fun begin then.. I have just sent PR #5458, in case anyone wants to take a look. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.a.fischer at gmail.com Sat Jan 17 10:11:41 2015 From: greg.a.fischer at gmail.com (Greg Fischer) Date: Sat, 17 Jan 2015 10:11:41 -0500 Subject: [Numpy-discussion] using f2py with a module containing a derived type? Message-ID: Hello, I would like to use f2py to wrap a Fortran module that contains a derived data type. I don't necessarily need to access the data that is inside the derived type from Python, but I would really like to be able to call the subroutines that are contained inside the module. When I attempt to use f2py on this module, it appears to choke when it gets to the derived data type. Is there any way around this? Thanks, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjhelmus at gmail.com Sat Jan 17 11:15:42 2015 From: jjhelmus at gmail.com (Jonathan J. Helmus) Date: Sat, 17 Jan 2015 10:15:42 -0600 Subject: [Numpy-discussion] using f2py with a module containing a derived type? In-Reply-To: References: Message-ID: <54BA8AAE.7090604@gmail.com> On 1/17/2015 9:11 AM, Greg Fischer wrote: > Hello, > > I would like to use f2py to wrap a Fortran module that contains a > derived data type. I don't necessarily need to access the data that is > inside the derived type from Python, but I would really like to be > able to call the subroutines that are contained inside the module. > > When I attempt to use f2py on this module, it appears to choke when it > gets to the derived data type. Is there any way around this? > > Thanks, > Greg > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Greg, F2py does not support Fortran derived types natively [1]. I have had good luck using James Kermode's f90wrap [2] to wrap Fortran code with derived types. Cheers, - Jonathan Helmus [1] https://sysbio.ioc.ee/projects/f2py2e/FAQ.html#q-does-f2py-support-derived-types-in-f90-code [2] http://www.jrkermode.co.uk/f90wrap -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Jan 18 14:22:54 2015 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 18 Jan 2015 21:22:54 +0200 Subject: [Numpy-discussion] ANN: Scipy 0.15.1 Message-ID: <54BC080E.7040109@iki.fi> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dear all, We are pleased to announce the Scipy 0.15.1 release. Scipy 0.15.1 contains only bugfixes. The module ``scipy.linalg.calc_lwork`` removed in Scipy 0.15.0 is restored. This module is not a part of Scipy's public API, and although it is available again in Scipy 0.15.1, using it is deprecated and it may be removed again in a future Scipy release. 
Source tarballs, binaries, and full release notes are available at https://sourceforge.net/projects/scipy/files/scipy/0.15.1/ Best regards, Pauli Virtanen ========================== SciPy 0.15.1 Release Notes ========================== SciPy 0.15.1 is a bug-fix release with no new features compared to 0.15.0. Issues fixed - ------------ * `#4413 `__: BUG: Tests too strict, f2py doesn't have to overwrite this array * `#4417 `__: BLD: avoid using NPY_API_VERSION to check not using deprecated... * `#4418 `__: Restore and deprecate scipy.linalg.calc_work -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlS8CA4ACgkQ6BQxb7O0pWCmOQCgzg9AXDaqRaK5/QBWopIrv2OA WkEAn0ltDfDHFpw0zMzB9mUscAAb2xnE =JrGj -----END PGP SIGNATURE----- From allanhaldane at gmail.com Sun Jan 18 23:36:50 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Sun, 18 Jan 2015 23:36:50 -0500 Subject: [Numpy-discussion] structured arrays, recarrays, and record arrays Message-ID: <54BC89E2.2070709@gmail.com> Hello all, Documentation of recarrays is poor and I'd like to improve it. In order to do this I've been looking at core/records.py, and I would appreciate some feedback on my plan. Let me start by describing what I see. In the docs there is some confusion about 'structured arrays' vs 'record arrays' vs 'recarrays' - the docs use them often interchangeably. They also refer to structured dtypes alternately as 'struct data types', 'record data types' or simply 'records' (eg, see the reference/arrays.dtypes and reference/arrays.indexing doc pages). But by my reading of the code there are really three (or four) distinct types of arrays with structure. Here's a possible nomenclature: * "Structured arrays" are simply ndarrays with structured dtypes. That is, the data type is subdivided into fields of different type. * "recarrays" are a subclass of ndarrays that allow access to the fields by attribute. * "Record arrays" are recarrays where the elements have additionally been converted to 'numpy.core.records.record' type such that each data element is an object with field attributes. * (it is also possible to create arrays with dtype.dtype of numpy.core.records.record, but which are not recarrays. However I have never seen this done.) Here's code demonstrating the creation of the different types of array (in order: structured array, recarray, ???, record array). >>> arr = np.array([(1,'a'), (2,'b')], dtype=[('foo', int), ('bar', 'S1')]) >>> recarr = arr.view(type=np.recarray) >>> noname = arr.view(dtype=dtype(np.record, arr.dtype)) >>> recordarr = arr.view(dtype=dtype((np.record, arr.dtype)), type=np.recarray) >>> type(arr), arr.dtype.type (numpy.ndarray, numpy.void) >>> type(recarr), recarr.dtype.type (numpy.core.records.recarray, numpy.void) >>> type(recordarr), recordarr.dtype.type (numpy.core.records.recarray, numpy.core.records.record) Note that the functions numpy.rec.array, numpy.rec.fromrecords, numpy.rec.fromarrays, and np.recarray.__new__ create record arrays. However, in the docs you can see examples of the creation of recarrays, eg in the recarray and ndarray.view doctrings and in http://www.scipy.org/Cookbook/Recarray. The files numpy/lib/recfunctions.py and numpy/lib/npyio.py (and possibly masked arrays, but I haven't looked yet) make extensive use of recarrays (but not record arrays). 
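(A quick way to check which flavor a given constructor hands back, reusing arr and recarr from the snippet above -- an untested sketch, but it should just restate the dtype.type distinction shown there:

    >>> np.rec.array(arr).dtype.type is np.record   # np.rec.array upgrades to a record array
    True
    >>> recarr.dtype.type is np.record              # a plain view keeps numpy.void
    False

Here np.record is the top-level alias for numpy.core.records.record.)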
The main functional difference between recarrays and record arrays is field access on individual elements: >>> recordarr[0].foo 1 >>> recarr[0].foo Traceback (most recent call last): File "", line 1, in AttributeError: 'numpy.void' object has no attribute 'foo' Also, note that recarrays have a small performance penalty relative to structured arrays, and record arrays have another one relative to recarrays because of the additional python logic. So my first goal in updating the docs is to use the right terms in the right place. In almost all cases, references to 'records' (eg 'record types') should be replaced with 'structured' (eg 'structured types'), with the exception of docs that deal specifically with record arrays. It's my guess that in the distant past structured datatypes were intended to always be of type numpy.core.records.record (thus the description in reference/arrays.dtypes) but that numpy.core.records.record became generally obsolete without updates to the docs. doc/records.rst.txt seems to document the transition. I've made a preliminary pass of the docs, which you can see here https://github.com/ahaldane/numpy/commit/d87633b228dabee2ddfe75d1ee9e41ba7039e715 Mostly I renamed 'record type' to 'structured type', and added a very rough draft to numpy/doc/structured_arrays.py. I would love to hear from those more knowledgeable than myself on whether this works! Cheers, Allan From allanhaldane at gmail.com Sun Jan 18 23:52:47 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Sun, 18 Jan 2015 23:52:47 -0500 Subject: [Numpy-discussion] structured arrays, recarrays, and record arrays In-Reply-To: <54BC89E2.2070709@gmail.com> References: <54BC89E2.2070709@gmail.com> Message-ID: <54BC8D9F.4080407@gmail.com> In light of my previous message I'd like to bring up https://github.com/numpy/numpy/issues/3581, as it is now clearer to me what is happening. In the example on that page the user creates a recarray and a record array (in my nomenclature) without realizing that they are slightly different types of beast. This is probably because the str() or repr() representations of these two objects are identical. To distinguish them you have to look at their dtype.type. Using the setup from my last message: >>> print repr(recarr) rec.array([(1, 'a'), (2, 'b')], dtype=[('foo', '>> print repr(recordarr) rec.array([(1, 'a'), (2, 'b')], dtype=[('foo', '>> print repr(recarr.dtype) dtype([('foo', '>> print repr(recordarr.dtype) dtype([('foo', '>> print recarr.dtype.type >>> print recordarr.dtype.type Based on this, it occurs to me that the repr of a dtype should list dtype.type if it is not numpy.void. This might be nice to see: >>> print repr(recarr.dtype) dtype([('foo', '>> print repr(recordarr.dtype) dtype((numpy.core.records.record, [('foo', ' Hello all, > > Documentation of recarrays is poor and I'd like to improve it. In order > to do this I've been looking at core/records.py, and I would appreciate > some feedback on my plan. > > Let me start by describing what I see. In the docs there is some > confusion about 'structured arrays' vs 'record arrays' vs 'recarrays' - > the docs use them often interchangeably. They also refer to structured > dtypes alternately as 'struct data types', 'record data types' or simply > 'records' (eg, see the reference/arrays.dtypes and > reference/arrays.indexing doc pages). > > But by my reading of the code there are really three (or four) distinct > types of arrays with structure. 
Here's a possible nomenclature: > * "Structured arrays" are simply ndarrays with structured dtypes. That > is, the data type is subdivided into fields of different type. > * "recarrays" are a subclass of ndarrays that allow access to the > fields by attribute. > * "Record arrays" are recarrays where the elements have additionally > been converted to 'numpy.core.records.record' type such that each > data element is an object with field attributes. > * (it is also possible to create arrays with dtype.dtype of > numpy.core.records.record, but which are not recarrays. However I > have never seen this done.) > > Here's code demonstrating the creation of the different types of array > (in order: structured array, recarray, ???, record array). > > >>> arr = np.array([(1,'a'), (2,'b')], > dtype=[('foo', int), ('bar', 'S1')]) > >>> recarr = arr.view(type=np.recarray) > >>> noname = arr.view(dtype=dtype(np.record, arr.dtype)) > >>> recordarr = arr.view(dtype=dtype((np.record, arr.dtype)), > type=np.recarray) > > >>> type(arr), arr.dtype.type > (numpy.ndarray, numpy.void) > >>> type(recarr), recarr.dtype.type > (numpy.core.records.recarray, numpy.void) > >>> type(recordarr), recordarr.dtype.type > (numpy.core.records.recarray, numpy.core.records.record) > > Note that the functions numpy.rec.array, numpy.rec.fromrecords, > numpy.rec.fromarrays, and np.recarray.__new__ create record arrays. > However, in the docs you can see examples of the creation of recarrays, > eg in the recarray and ndarray.view doctrings and in > http://www.scipy.org/Cookbook/Recarray. The files > numpy/lib/recfunctions.py and numpy/lib/npyio.py (and possibly masked > arrays, but I haven't looked yet) make extensive use of recarrays (but > not record arrays). > > The main functional difference between recarrays and record arrays is > field access on individual elements: > > >>> recordarr[0].foo > 1 > >>> recarr[0].foo > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'numpy.void' object has no attribute 'foo' > > Also, note that recarrays have a small performance penalty relative to > structured arrays, and record arrays have another one relative to > recarrays because of the additional python logic. > > So my first goal in updating the docs is to use the right terms in the > right place. In almost all cases, references to 'records' (eg 'record > types') should be replaced with 'structured' (eg 'structured types'), > with the exception of docs that deal specifically with record arrays. > It's my guess that in the distant past structured datatypes were > intended to always be of type numpy.core.records.record (thus the > description in reference/arrays.dtypes) but that > numpy.core.records.record became generally obsolete without updates to > the docs. doc/records.rst.txt seems to document the transition. > > I've made a preliminary pass of the docs, which you can see here > https://github.com/ahaldane/numpy/commit/d87633b228dabee2ddfe75d1ee9e41ba7039e715 > > Mostly I renamed 'record type' to 'structured type', and added a very > rough draft to numpy/doc/structured_arrays.py. > > I would love to hear from those more knowledgeable than myself on > whether this works! 
> > Cheers, > Allan From solipsis at pitrou.net Mon Jan 19 13:39:19 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 19 Jan 2015 19:39:19 +0100 Subject: [Numpy-discussion] Aligned array allocations Message-ID: <20150119193919.3e8bb997@fsol> Hello, In https://github.com/numpy/numpy/issues/5312 there's a request for an aligned allocator in Numpy (more than the default alignment of the platform's memory allocator). The reason is that on modern vectorization instruction sets, a certain alignment is required for optimal performance (even though unaligned data still works: it's just that performance is degraded... by how much will depend on the CPU micro-architecture). For example Intel recommends a 32-byte alignment for AVX loads and stores. In https://github.com/numpy/numpy/pull/5457 I have proposed a patch to wrap the system allocator in an aligned allocator. The proposed scheme makes the alignment configurable at runtime (through a Python API), because different platforms may have different desirable alignments, and it is not reasonable for Numpy to know about them all, nor for users to recompile Numpy each time they have a different CPU. By always using an aligned allocator there is some overhead: - all arrays occupy a bit more memory by a small average amount (probably 16 bytes average on a 64-bit machine, for a 16 byte guaranteed alignment) - array resizes can be more expensive in CPU time, when the physical start changes and its alignment changes too There is also a limitation: while the physical start of an array will always be aligned, this can be defeated when taking a view starting at a non-zero index. (note that to take advantage of certain instruction set features such as AVX, Numpy may need to be compiled with specific compiler flags... but Numpy's allocations also affect other packages such as Numba which is able to generate code at runtime) I would like to know if people are interested in this feature, and if the proposed approach is acceptable. Regards Antoine. From efiring at hawaii.edu Tue Jan 20 17:51:29 2015 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 20 Jan 2015 12:51:29 -1000 Subject: [Numpy-discussion] EDF+ specification Message-ID: <54BEDBF1.7010200@hawaii.edu> http://www.edfplus.info/specs/edfplus.html#additionalspecs Io, Is this the file format you have? Eric From njs at pobox.com Tue Jan 20 18:17:52 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 20 Jan 2015 23:17:52 +0000 Subject: [Numpy-discussion] EDF+ specification In-Reply-To: <54BEDBF1.7010200@hawaii.edu> References: <54BEDBF1.7010200@hawaii.edu> Message-ID: On Tue, Jan 20, 2015 at 10:51 PM, Eric Firing wrote: > http://www.edfplus.info/specs/edfplus.html#additionalspecs > > Io, Is this the file format you have? Sorry, I don't quite understand the question! Maybe you're looking for https://github.com/breuderink/eegtools https://github.com/rays/pyedf https://bitbucket.org/cleemesser/python-edf/ ...? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From efiring at hawaii.edu Tue Jan 20 18:34:46 2015 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 20 Jan 2015 13:34:46 -1000 Subject: [Numpy-discussion] EDF+ specification In-Reply-To: References: <54BEDBF1.7010200@hawaii.edu> Message-ID: <54BEE616.1060403@hawaii.edu> Nathaniel, I don't know what sequence of wrong button pushes led to this, but the message was intended for Io Flament. Sorry for the puzzling disruption! 
Eric On 2015/01/20 1:17 PM, Nathaniel Smith wrote: > On Tue, Jan 20, 2015 at 10:51 PM, Eric Firing wrote: >> http://www.edfplus.info/specs/edfplus.html#additionalspecs >> >> Io, Is this the file format you have? > > Sorry, I don't quite understand the question! > > Maybe you're looking for > > https://github.com/breuderink/eegtools > https://github.com/rays/pyedf > https://bitbucket.org/cleemesser/python-edf/ > > ...? > From charlesr.harris at gmail.com Thu Jan 22 09:51:33 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 22 Jan 2015 07:51:33 -0700 Subject: [Numpy-discussion] Datetime again Message-ID: Hi All, I'm playing with the idea of building a simplified datetime class on top of the current numpy implementation. I believe Pandas does something like this, and blaze will (does?) have a simplified version. The reason for the new class would be to have an easier, and hopefully more portable, API that can be implemented in Python, and maybe pushed down into C when things settle. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 22 09:54:49 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 22 Jan 2015 14:54:49 +0000 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris wrote: > Hi All, > > I'm playing with the idea of building a simplified datetime class on top of > the current numpy implementation. I believe Pandas does something like this, > and blaze will (does?) have a simplified version. The reason for the new > class would be to have an easier, and hopefully more portable, API that can > be implemented in Python, and maybe pushed down into C when things settle. When you say "datetime class" what do you mean? A dtype? An ndarray subclass? A python class representing a scalar datetime that you can put in an object array? ...? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Thu Jan 22 10:08:36 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 22 Jan 2015 08:08:36 -0700 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith wrote: > On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris > wrote: > > Hi All, > > > > I'm playing with the idea of building a simplified datetime class on top > of > > the current numpy implementation. I believe Pandas does something like > this, > > and blaze will (does?) have a simplified version. The reason for the new > > class would be to have an easier, and hopefully more portable, API that > can > > be implemented in Python, and maybe pushed down into C when things > settle. > > When you say "datetime class" what do you mean? A dtype? An ndarray > subclass? A python class representing a scalar datetime that you can > put in an object array? ...? > I was thinking an ndarray subclass that is based on a single datetime type, but part of the reason for this post is to elicit ideas. I'm influenced by Mark's discussion apropos blaze . I thought it easier to start such a project in python, as it is far easier for people interested in the problem to work with. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Thu Jan 22 10:18:17 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 22 Jan 2015 08:18:17 -0700 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris wrote: > > > On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith wrote: > >> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris >> wrote: >> > Hi All, >> > >> > I'm playing with the idea of building a simplified datetime class on >> top of >> > the current numpy implementation. I believe Pandas does something like >> this, >> > and blaze will (does?) have a simplified version. The reason for the new >> > class would be to have an easier, and hopefully more portable, API that >> can >> > be implemented in Python, and maybe pushed down into C when things >> settle. >> >> When you say "datetime class" what do you mean? A dtype? An ndarray >> subclass? A python class representing a scalar datetime that you can >> put in an object array? ...? >> > > I was thinking an ndarray subclass that is based on a single datetime > type, but part of the reason for this post is to elicit ideas. I'm > influenced by Mark's discussion apropos blaze > . > I thought it easier to start such a project in python, as it is far easier > for people interested in the problem to work with. > And if I had my druthers, it would use quad precision floating point at it's heart. The 64 bits of long long really isn't enough and leads to all sorts of compromises. But that is probably a pipe dream. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Thu Jan 22 10:24:20 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 22 Jan 2015 15:24:20 +0000 (UTC) Subject: [Numpy-discussion] Aligned array allocations References: <20150119193919.3e8bb997@fsol> Message-ID: <738467086443632175.479267sturla.molden-gmail.com@news.gmane.org> Antoine Pitrou wrote: > By always using an aligned allocator there is some overhead: > - all arrays occupy a bit more memory by a small average amount > (probably 16 bytes average on a 64-bit machine, for a 16 byte > guaranteed alignment) NumPy arrays are Python objects. They have an overhead anyway, much more than this, and 16 bytes are not worse than adding a couple of pointers to the struct. In the big picture this tiny overhead does not matter. > - array resizes can be more expensive in CPU time, when the physical > start changes and its alignment changes too We are using Python. If we were worried about small inefficiencies we would not be using it. Resizing ndarrays are rare anyway. They are not used like Python lists or instead of lists. We use lists in the same way as anyone else who uses Python. So an ndarray resize can afford to be more espensive than a list append. Also the NumPy community expects an ndarray resize to be expensive and O(n) due to its current behavior: If an array has a view, realloc is out of the question. 
:-) Sturla From njs at pobox.com Thu Jan 22 15:58:52 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 22 Jan 2015 20:58:52 +0000 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 3:18 PM, Charles R Harris wrote: > > > On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris > wrote: >> >> >> >> On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith wrote: >>> >>> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris >>> wrote: >>> > Hi All, >>> > >>> > I'm playing with the idea of building a simplified datetime class on >>> > top of >>> > the current numpy implementation. I believe Pandas does something like >>> > this, >>> > and blaze will (does?) have a simplified version. The reason for the >>> > new >>> > class would be to have an easier, and hopefully more portable, API that >>> > can >>> > be implemented in Python, and maybe pushed down into C when things >>> > settle. >>> >>> When you say "datetime class" what do you mean? A dtype? An ndarray >>> subclass? A python class representing a scalar datetime that you can >>> put in an object array? ...? >> >> >> I was thinking an ndarray subclass that is based on a single datetime >> type, but part of the reason for this post is to elicit ideas. I'm >> influenced by Mark's discussion apropos blaze. I thought it easier to >> start such a project in python, as it is far easier for people interested in >> the problem to work with. > > > And if I had my druthers, it would use quad precision floating point at it's > heart. The 64 bits of long long really isn't enough and leads to all sorts > of compromises. But that is probably a pipe dream. I guess there are lots of options -- e.g. 32-bit day + 64-bit time-of-day (I think that would give 11.8 million years at 10-femtisecond precision?). Figuring out which clock this is on matters a lot more though (e.g. how to handle leap-seconds in absolute and relative times -- is adding 1 day always the same as adding 24 * 60 * 60 seconds?). At a very general level, I feel like numpy-qua-numpy's role here shouldn't be to try and add special code to handle any one specific datetime implementation: that hasn't worked out terribly well historically, and as referenced above there's a *ton* of plausible ways of approaching datetime handling that people might want, so we don't want to be in the position of having to pick the-one-and-only implementation. Telling people who want to tweak datetime handling that they have to start mucking around in umath.so is terrible. Instead, we should be trying to evolve numpy to add generic functionality, so that it's prepared to handle multiple third-party approaches to date-time handling (among other things). Implementing prototypes built on top of numpy could be an excellent way to generate ideas for appropriate changes to the numpy core. As far as this specific prototype, I should say that I'm dubious that subclassing ndarray is actually a *good* long-term solution. I really think that the *right* way to solve this would be to improve the dtype system so we could define useful date/time types that worked with plain vanilla ndarrays. But that approach requires a lot more up-front C coding; it's harder to throw together a quick prototype. OTOOH if your goal is the moon then you don't want to waste time investing in ladder technology... so I dunno. -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From cmkleffner at gmail.com Thu Jan 22 16:29:25 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 22 Jan 2015 22:29:25 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) Message-ID: I took time to create mingw-w64 based wheels of numpy-1.9.1 and scipy-0.15.1 source distributions and put them on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as on binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and 64bit. Feedback is welcome. The wheels can be pip installed with: pip install -i https://pypi.binstar.org/carlkl/simple numpy pip install -i https://pypi.binstar.org/carlkl/simple scipy Some technical details: the binaries are build upon OpenBLAS as accelerated BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to MKL) and automatic runtime selection depending on the CPU. The minimal requested feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not supported with this builds. This is the default for 64bit binaries anyway. OpenBLAS is deployed as part of the numpy wheel. That said, the scipy wheels mentioned above are dependant on the installation of the OpenBLAS based numpy and won't work i.e. with an installed numpy-MKL. For the numpy 32bit builds there are 3 failures for special FP value tests, due to a bug in mingw-w64 that is still present. All scipy versions show up 7 failures with some numerical noise, that could be ignored (or corrected with relaxed asserts in the test code). PR's for numpy and scipy are in preparation. The mingw-w64 compiler used for building can be found at https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjw at ncf.ca Thu Jan 22 17:06:16 2015 From: cjw at ncf.ca (cjw) Date: Thu, 22 Jan 2015 17:06:16 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: <54C17458.3050206@ncf.ca> An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Thu Jan 22 17:42:52 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 22 Jan 2015 23:42:52 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: <54C17458.3050206@ncf.ca> References: <54C17458.3050206@ncf.ca> Message-ID: Yes, I build win32 as well as amd64 binaries. Carlkl 2015-01-22 23:06 GMT+01:00 cjw : > Thanks Carl, > > This is good to hear. I presume that the AMD64 is covered. > > Colin W. > > On 22-Jan-15 4:29 PM, Carl Kleffner wrote: > > I took time to create mingw-w64 based wheels of numpy-1.9.1 and > scipy-0.15.1 source distributions and put them onhttps://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as onbinstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and > 64bit. > > Feedback is welcome. > > The wheels can be pip installed with: > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > Some technical details: the binaries are build upon OpenBLAS as accelerated > BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to MKL) > and automatic runtime selection depending on the CPU. The minimal requested > feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not > supported with this builds. This is the default for 64bit binaries anyway. 
> > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > wheels mentioned above are dependant on the installation of the OpenBLAS > based numpy and won't work i.e. with an installed numpy-MKL. > > For the numpy 32bit builds there are 3 failures for special FP value tests, > due to a bug in mingw-w64 that is still present. All scipy versions show up > 7 failures with some numerical noise, that could be ignored (or corrected > with relaxed asserts in the test code). > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used > for building can be found athttps://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > > > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Thu Jan 22 18:11:41 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 23 Jan 2015 00:11:41 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: Were there any failures with the 64 bit build, or did all tests pass? Sturla On 22/01/15 22:29, Carl Kleffner wrote: > I took time to create mingw-w64 based wheels of numpy-1.9.1 and > scipy-0.15.1 source distributions and put them on > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as > on binstar.org . The test matrix is python-2.7 and > 3.4 for both 32bit and 64bit. > > Feedback is welcome. > > The wheels can be pip installed with: > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > Some technical details: the binaries are build upon OpenBLAS as > accelerated BLAS/Lapack. OpenBLAS itself is build with dynamic kernels > (similar to MKL) and automatic runtime selection depending on the CPU. > The minimal requested feature supplied by the CPU is SSE2. SSE1 and > non-SSE CPUs are not supported with this builds. This is the default for > 64bit binaries anyway. > > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > wheels mentioned above are dependant on the installation of the OpenBLAS > based numpy and won't work i.e. with an installed numpy-MKL. > > For the numpy 32bit builds there are 3 failures for special FP value > tests, due to a bug in mingw-w64 that is still present. All scipy > versions show up 7 failures with some numerical noise, that could be > ignored (or corrected with relaxed asserts in the test code). > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used > for building can be found at > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. 
> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Thu Jan 22 18:23:16 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 22 Jan 2015 23:23:16 +0000 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner wrote: > I took time to create mingw-w64 based wheels of numpy-1.9.1 and scipy-0.15.1 > source distributions and put them on > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as on > binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and 64bit. > > Feedback is welcome. > > The wheels can be pip installed with: > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > Some technical details: the binaries are build upon OpenBLAS as accelerated > BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to MKL) > and automatic runtime selection depending on the CPU. The minimal requested > feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not supported > with this builds. This is the default for 64bit binaries anyway. According to the steam hardware survey, 99.98% of windows computers have SSE2. (http://store.steampowered.com/hwsurvey , click on "other settings" at the bottom). So this is probably OK :-). > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy wheels > mentioned above are dependant on the installation of the OpenBLAS based > numpy and won't work i.e. with an installed numpy-MKL. This sounds like it probably needs to be fixed before we can recommend the scipy wheels for anyone? OTOH it might be fine to start distributing numpy wheels first. > For the numpy 32bit builds there are 3 failures for special FP value tests, > due to a bug in mingw-w64 that is still present. All scipy versions show up > 7 failures with some numerical noise, that could be ignored (or corrected > with relaxed asserts in the test code). > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used for > building can be found at > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. Correct me if I'm wrong, but it looks like there isn't any details on how exactly the compiler was set up? Which is fine, I know you've been doing a ton of work on this and it's much appreciated :-). But eventually I do think a prerequisite for us adopting these as official builds is that we'll need a text document (or an executable script!) that walks through all the steps in setting up the toolchain etc., so that someone starting from scratch could get it all up and running. Otherwise we run the risk of eventually ending up back where we are today, with a creaky old mingw binary snapshot that no-one knows how it works or how to reproduce... -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From cjw at ncf.ca Thu Jan 22 20:07:26 2015 From: cjw at ncf.ca (cjw) Date: Thu, 22 Jan 2015 20:07:26 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: <54C17458.3050206@ncf.ca> Message-ID: <54C19ECE.6030205@ncf.ca> An HTML attachment was scrubbed... 
URL: From cmkleffner at gmail.com Fri Jan 23 03:25:05 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Fri, 23 Jan 2015 09:25:05 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: All tests for the 64bit builds passed. Carl 2015-01-23 0:11 GMT+01:00 Sturla Molden : > Were there any failures with the 64 bit build, or did all tests pass? > > Sturla > > > On 22/01/15 22:29, Carl Kleffner wrote: > > I took time to create mingw-w64 based wheels of numpy-1.9.1 and > > scipy-0.15.1 source distributions and put them on > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as > > on binstar.org . The test matrix is python-2.7 and > > 3.4 for both 32bit and 64bit. > > > > Feedback is welcome. > > > > The wheels can be pip installed with: > > > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > > > Some technical details: the binaries are build upon OpenBLAS as > > accelerated BLAS/Lapack. OpenBLAS itself is build with dynamic kernels > > (similar to MKL) and automatic runtime selection depending on the CPU. > > The minimal requested feature supplied by the CPU is SSE2. SSE1 and > > non-SSE CPUs are not supported with this builds. This is the default for > > 64bit binaries anyway. > > > > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > > wheels mentioned above are dependant on the installation of the OpenBLAS > > based numpy and won't work i.e. with an installed numpy-MKL. > > > > For the numpy 32bit builds there are 3 failures for special FP value > > tests, due to a bug in mingw-w64 that is still present. All scipy > > versions show up 7 failures with some numerical noise, that could be > > ignored (or corrected with relaxed asserts in the test code). > > > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used > > for building can be found at > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Fri Jan 23 10:07:08 2015 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 23 Jan 2015 16:07:08 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-23 9:25 GMT+01:00 Carl Kleffner : > All tests for the 64bit builds passed. Thanks very much Carl. Did you have to patch the numpy / distutils source to build those wheels are is this using the source code from the official releases? 
-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From cjw at ncf.ca Sat Jan 24 09:48:46 2015 From: cjw at ncf.ca (cjw) Date: Sat, 24 Jan 2015 09:48:46 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: <54C3B0CE.7070001@ncf.ca> On 22-Jan-15 6:23 PM, Nathaniel Smith wrote: > On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner wrote: >> I took time to create mingw-w64 based wheels of numpy-1.9.1 and scipy-0.15.1 >> source distributions and put them on >> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as on >> binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and 64bit. >> >> Feedback is welcome. >> >> The wheels can be pip installed with: >> >> pip install -i https://pypi.binstar.org/carlkl/simple numpy >> pip install -i https://pypi.binstar.org/carlkl/simple scipy >> >> Some technical details: the binaries are build upon OpenBLAS as accelerated >> BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to MKL) >> and automatic runtime selection depending on the CPU. The minimal requested >> feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not supported >> with this builds. This is the default for 64bit binaries anyway. > According to the steam hardware survey, 99.98% of windows computers > have SSE2. (http://store.steampowered.com/hwsurvey , click on "other > settings" at the bottom). So this is probably OK :-). > >> OpenBLAS is deployed as part of the numpy wheel. That said, the scipy wheels >> mentioned above are dependant on the installation of the OpenBLAS based >> numpy and won't work i.e. with an installed numpy-MKL. > This sounds like it probably needs to be fixed before we can recommend > the scipy wheels for anyone? OTOH it might be fine to start > distributing numpy wheels first. > >> For the numpy 32bit builds there are 3 failures for special FP value tests, >> due to a bug in mingw-w64 that is still present. All scipy versions show up >> 7 failures with some numerical noise, that could be ignored (or corrected >> with relaxed asserts in the test code). >> >> PR's for numpy and scipy are in preparation. The mingw-w64 compiler used for >> building can be found at >> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > Correct me if I'm wrong, but it looks like there isn't any details on > how exactly the compiler was set up? Which is fine, I know you've been > doing a ton of work on this and it's much appreciated :-). But > eventually I do think a prerequisite for us adopting these as official > builds is that we'll need a text document (or an executable script!) > that walks through all the steps in setting up the toolchain etc., so > that someone starting from scratch could get it all up and running. > Otherwise we run the risk of eventually ending up back where we are > today, with a creaky old mingw binary snapshot that no-one knows how > it works or how to reproduce... > > -n > Karl, I tried and failed, even after adding --pre. My log file is here: ------------------------------------------------------------ C:\Python27\Scripts\pip run on 01/24/15 07:51:10 Downloading/unpacking https://pypi.binstar.org/carlkl/simple Downloading simple Downloading from URL https://pypi.binstar.org/carlkl/simple Cleaning up... 
Exception: Traceback (most recent call last): File "C:\Python27\lib\site-packages\pip\basecommand.py", line 122, in main status = self.run(options, args) File "C:\Python27\lib\site-packages\pip\commands\install.py", line 278, in run requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle) File "C:\Python27\lib\site-packages\pip\req.py", line 1197, in prepare_files do_download, File "C:\Python27\lib\site-packages\pip\req.py", line 1375, in unpack_url self.session, File "C:\Python27\lib\site-packages\pip\download.py", line 582, in unpack_http_url unpack_file(temp_location, location, content_type, link) File "C:\Python27\lib\site-packages\pip\util.py", line 627, in unpack_file and is_svn_page(file_contents(filename))): File "C:\Python27\lib\site-packages\pip\util.py", line 210, in file_contents return fp.read().decode('utf-8') File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte Do you have any suggestions? Colin W. From charlesr.harris at gmail.com Sat Jan 24 10:11:15 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 24 Jan 2015 08:11:15 -0700 Subject: [Numpy-discussion] What should recfromcsv defaults be? Message-ID: Hi All, This question comes apropos this bugfix #5495 that ensures that the default options get passed down the call chain. The current defaults are kwargs.setdefault("case_sensitive", "lower") kwargs.setdefault("names", True) kwargs.setdefault("delimiter", ",") kwargs.setdefault("dtype", None) The ones in question are for "names" and "case_sensitive", that, due to the bug, were defaulting to 'True' and None respectively. I think those defaults should be kept rather than the values currently specified in the recfromcsv definition. However, I don't use these tools, so would like some feedback from those who do. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Sat Jan 24 12:14:10 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sat, 24 Jan 2015 18:14:10 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: <54C3B0CE.7070001@ncf.ca> References: <54C3B0CE.7070001@ncf.ca> Message-ID: Just a wild guess: (1) update your pip and try again (2) use the bitbucket wheels with: pip install --no-index -f https://bitbucket.org/carlkl/mingw-w64-for-python/downloads numpy pip install --no-index -f https://bitbucket.org/carlkl/mingw-w64-for-python/downloads scipy (3) check if there i something left in site-packages\numpy in the case you have uninstalled another numpy distribution before. Carl 2015-01-24 15:48 GMT+01:00 cjw : > On 22-Jan-15 6:23 PM, Nathaniel Smith wrote: > >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner >> wrote: >> >>> I took time to create mingw-w64 based wheels of numpy-1.9.1 and >>> scipy-0.15.1 >>> source distributions and put them on >>> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as >>> on >>> binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and >>> 64bit. >>> >>> Feedback is welcome. >>> >>> The wheels can be pip installed with: >>> >>> pip install -i https://pypi.binstar.org/carlkl/simple numpy >>> pip install -i https://pypi.binstar.org/carlkl/simple scipy >>> >>> Some technical details: the binaries are build upon OpenBLAS as >>> accelerated >>> BLAS/Lapack. 
OpenBLAS itself is build with dynamic kernels (similar to >>> MKL) >>> and automatic runtime selection depending on the CPU. The minimal >>> requested >>> feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not >>> supported >>> with this builds. This is the default for 64bit binaries anyway. >>> >> According to the steam hardware survey, 99.98% of windows computers >> have SSE2. (http://store.steampowered.com/hwsurvey , click on "other >> settings" at the bottom). So this is probably OK :-). >> >> OpenBLAS is deployed as part of the numpy wheel. That said, the scipy >>> wheels >>> mentioned above are dependant on the installation of the OpenBLAS based >>> numpy and won't work i.e. with an installed numpy-MKL. >>> >> This sounds like it probably needs to be fixed before we can recommend >> the scipy wheels for anyone? OTOH it might be fine to start >> distributing numpy wheels first. >> >> For the numpy 32bit builds there are 3 failures for special FP value >>> tests, >>> due to a bug in mingw-w64 that is still present. All scipy versions show >>> up >>> 7 failures with some numerical noise, that could be ignored (or corrected >>> with relaxed asserts in the test code). >>> >>> PR's for numpy and scipy are in preparation. The mingw-w64 compiler used >>> for >>> building can be found at >>> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. >>> >> Correct me if I'm wrong, but it looks like there isn't any details on >> how exactly the compiler was set up? Which is fine, I know you've been >> doing a ton of work on this and it's much appreciated :-). But >> eventually I do think a prerequisite for us adopting these as official >> builds is that we'll need a text document (or an executable script!) >> that walks through all the steps in setting up the toolchain etc., so >> that someone starting from scratch could get it all up and running. >> Otherwise we run the risk of eventually ending up back where we are >> today, with a creaky old mingw binary snapshot that no-one knows how >> it works or how to reproduce... >> >> -n >> >> Karl, > > I tried and failed, even after adding --pre. > > My log file is here: > > ------------------------------------------------------------ > C:\Python27\Scripts\pip run on 01/24/15 07:51:10 > Downloading/unpacking https://pypi.binstar.org/carlkl/simple > Downloading simple > Downloading from URL https://pypi.binstar.org/carlkl/simple > Cleaning up... 
> Exception: > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\pip\basecommand.py", line 122, in > main > status = self.run(options, args) > File "C:\Python27\lib\site-packages\pip\commands\install.py", line 278, > in run > requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, > bundle=self.bundle) > File "C:\Python27\lib\site-packages\pip\req.py", line 1197, in > prepare_files > do_download, > File "C:\Python27\lib\site-packages\pip\req.py", line 1375, in > unpack_url > self.session, > File "C:\Python27\lib\site-packages\pip\download.py", line 582, in > unpack_http_url > unpack_file(temp_location, location, content_type, link) > File "C:\Python27\lib\site-packages\pip\util.py", line 627, in > unpack_file > and is_svn_page(file_contents(filename))): > File "C:\Python27\lib\site-packages\pip\util.py", line 210, in > file_contents > return fp.read().decode('utf-8') > File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode > return codecs.utf_8_decode(input, errors, True) > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: > invalid start byte > > Do you have any suggestions? > > Colin W. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Sat Jan 24 12:29:52 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sat, 24 Jan 2015 18:29:52 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : > On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner > wrote: > > I took time to create mingw-w64 based wheels of numpy-1.9.1 and > scipy-0.15.1 > > source distributions and put them on > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as > on > > binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and > 64bit. > > > > Feedback is welcome. > > > > The wheels can be pip installed with: > > > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > > > Some technical details: the binaries are build upon OpenBLAS as > accelerated > > BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to > MKL) > > and automatic runtime selection depending on the CPU. The minimal > requested > > feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not > supported > > with this builds. This is the default for 64bit binaries anyway. > > According to the steam hardware survey, 99.98% of windows computers > have SSE2. (http://store.steampowered.com/hwsurvey , click on "other > settings" at the bottom). So this is probably OK :-). > > > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > wheels > > mentioned above are dependant on the installation of the OpenBLAS based > > numpy and won't work i.e. with an installed numpy-MKL. > > This sounds like it probably needs to be fixed before we can recommend > the scipy wheels for anyone? OTOH it might be fine to start > distributing numpy wheels first. > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead of static linking to avoid bloat. This matters, because libopenblas.dll is a heavy library (around 30Mb for amd64). As a consequence all packages with dynamic linkage to OpenBLAS depend on numpy-openblas. This is not different to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. 
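(For illustration, a minimal sketch of how a user or a dependent package
could check at runtime whether the installed numpy is such an
OpenBLAS-based build; np.show_config() reports the BLAS/LAPACK the build
was linked against, and the second part simply looks for a bundled
libopenblas DLL next to numpy's core modules:)

import os
import numpy as np

# Report which BLAS/LAPACK this numpy build was linked against.
np.show_config()

# Look for a bundled OpenBLAS DLL in numpy\core (as shipped by these wheels).
core_dir = os.path.dirname(np.core.__file__)
dlls = [f for f in os.listdir(core_dir) if f.lower().startswith('libopenblas')]
print("bundled OpenBLAS DLL(s):", dlls)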
> For the numpy 32bit builds there are 3 failures for special FP value > tests, > > due to a bug in mingw-w64 that is still present. All scipy versions show > up > > 7 failures with some numerical noise, that could be ignored (or corrected > > with relaxed asserts in the test code). > > > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used > for > > building can be found at > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > > Correct me if I'm wrong, but it looks like there isn't any details on > how exactly the compiler was set up? Which is fine, I know you've been > doing a ton of work on this and it's much appreciated :-). But > eventually I do think a prerequisite for us adopting these as official > builds is that we'll need a text document (or an executable script!) > that walks through all the steps in setting up the toolchain etc., so > that someone starting from scratch could get it all up and running. > Otherwise we run the risk of eventually ending up back where we are > today, with a creaky old mingw binary snapshot that no-one knows how > it works or how to reproduce... > This has to be done and is in preperation, but not ready for consumption right now. Some preliminary information is given here: https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/mingwstatic-2014-11-readme.md > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjw at ncf.ca Sun Jan 25 06:39:09 2015 From: cjw at ncf.ca (cjw) Date: Sun, 25 Jan 2015 06:39:09 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: <54C3B0CE.7070001@ncf.ca> Message-ID: <54C4D5DD.5070507@ncf.ca> On 24-Jan-15 12:14 PM, Carl Kleffner wrote: > Just a wild guess: > > (1) update your pip and try again Thanks. My pip version was 1,5,6, it is now 6.0.6 > > (2) use the bitbucket wheels with: > pip install --no-index -f > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads numpy Successfully installed numpy-1.9.1 > pip install --no-index -f > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads scipy Successfully installed scipy-0.15.1 > > (3) check if there i something left in site-packages\numpy in the case you > have uninstalled another numpy distribution before. Could you be more specific please? C:\Python27\Lib\site-packages>python Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)] on win32 C:\Python27\Lib>cd site-packages C:\Python27\Lib\site-packages> dir Volume in drive C has no label. Volume Serial Number is 9691-2C8F Directory of C:\Python27\Lib\site-packages 25-Jan-15 06:11 AM . 25-Jan-15 06:11 AM .. 
26-Nov-14 06:30 PM 909 apsw-3.8.7-py2.7.egg-info 26-Nov-14 06:30 PM 990,208 apsw.pyd 11-Dec-14 06:35 PM astroid 11-Dec-14 06:35 PM astroid-1.3.2.dist-info 11-Dec-14 06:35 PM colorama 11-Dec-14 06:35 PM colorama-0.3.2-py2.7.egg-info 09-Sep-14 09:38 AM dateutil 29-Dec-14 10:16 PM 126 easy_install.py 29-Dec-14 10:16 PM 343 easy_install.pyc 27-Dec-14 09:18 AM epydoc 21-Jan-13 03:19 PM 297 epydoc-3.0.1-py2.7.egg-info 11-Dec-14 06:35 PM logilab 11-Dec-14 06:35 PM 309 logilab_common-0.63.2-py2.7-nspkg.pth 11-Dec-14 06:35 PM logilab_common-0.63.2-py2.7.egg-info 16-Nov-14 04:02 PM matplotlib 22-Oct-14 03:11 PM 324 matplotlib-1.4.2-py2.7-nspkg.pth 16-Nov-14 04:02 PM matplotlib-1.4.2-py2.7.egg-info 16-Nov-14 04:02 PM mpl_toolkits 25-Jan-15 06:07 AM numpy 25-Jan-15 06:07 AM numpy-1.9.1.dist-info 25-Jan-15 06:01 AM pip 25-Jan-15 06:01 AM pip-6.0.6.dist-info 29-Dec-14 10:16 PM 101,530 pkg_resources.py 29-Dec-14 10:16 PM 115,360 pkg_resources.pyc 10-Sep-14 12:30 PM pycparser 10-Sep-14 12:30 PM pycparser-2.10-py2.7.egg-info 16-Dec-14 08:21 AM pygame 25-Mar-14 11:03 AM 543 pygame-1.9.2a0-py2.7.egg-info 24-Nov-14 06:55 AM pygit2 24-Nov-14 06:55 AM pygit2-0.21.3-py2.7.egg-info 26-Mar-14 01:23 PM 90 pylab.py 16-Nov-14 04:02 PM 237 pylab.pyc 16-Nov-14 04:02 PM 237 pylab.pyo 11-Dec-14 06:35 PM pylint 11-Dec-14 06:35 PM pylint-1.4.0.dist-info 11-Sep-14 08:26 PM pyparsing-2.0.2-py2.7.egg-info 24-Nov-14 06:27 AM 157,300 pyparsing.py 30-Nov-14 08:51 AM 154,996 pyparsing.pyc 09-Sep-14 09:38 AM python_dateutil-2.2-py2.7.egg-info 09-Sep-14 10:03 AM pytz 09-Sep-14 10:03 AM pytz-2014.7-py2.7.egg-info 30-Apr-14 08:54 AM 119 README.txt 25-Jan-15 06:11 AM scipy 25-Jan-15 06:11 AM scipy-0.15.1.dist-info 29-Dec-14 10:16 PM setuptools 29-Dec-14 10:16 PM setuptools-7.0.dist-info 09-Sep-14 09:50 AM six-1.7.3.dist-info 09-Sep-14 09:50 AM 26,518 six.py 09-Sep-14 09:50 AM 28,288 six.pyc 21-Dec-14 07:56 PM System 21-Dec-14 07:55 PM User 21-Sep-14 06:00 PM 878,592 _cffi__xf1819144xd61e91d9.pyd 29-Dec-14 10:16 PM _markerlib 21-Sep-14 06:00 PM 890,368 _pygit2.pyd 20 File(s) 3,346,694 bytes 36 Dir(s) 9,810,276,352 bytes free C:\Python27\Lib\site-packages> > I tried and failed, even after adding --pre. > > My log file is here: > > ------------------------------------------------------------ > C:\Python27\Scripts\pip run on 01/24/15 07:51:10 > Downloading/unpacking https://pypi.binstar.org/carlkl/simple > Downloading simple > Downloading from URL https://pypi.binstar.org/carlkl/simple > Cleaning up... 
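(For illustration, not part of the original message: a quick way to confirm
which builds were actually picked up after the install:)

>>> import numpy, scipy
>>> numpy.__version__, scipy.__version__
('1.9.1', '0.15.1')
>>> numpy.show_config()   # should report the OpenBLAS build configuration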
> Exception: > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\pip\basecommand.py", line 122, in > main > status = self.run(options, args) > File "C:\Python27\lib\site-packages\pip\commands\install.py", line 278, > in run > requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, > bundle=self.bundle) > File "C:\Python27\lib\site-packages\pip\req.py", line 1197, in > prepare_files > do_download, > File "C:\Python27\lib\site-packages\pip\req.py", line 1375, in > unpack_url > self.session, > File "C:\Python27\lib\site-packages\pip\download.py", line 582, in > unpack_http_url > unpack_file(temp_location, location, content_type, link) > File "C:\Python27\lib\site-packages\pip\util.py", line 627, in > unpack_file > and is_svn_page(file_contents(filename))): > File "C:\Python27\lib\site-packages\pip\util.py", line 210, in > file_contents > return fp.read().decode('utf-8') > File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode > return codecs.utf_8_decode(input, errors, True) > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: > invalid start byte > > Do you have any suggestions? > > Colin W. > From sturla.molden at gmail.com Sun Jan 25 06:57:37 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 25 Jan 2015 11:57:37 +0000 (UTC) Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) References: Message-ID: <100463511443879398.146150sturla.molden-gmail.com@news.gmane.org> Carl Kleffner wrote: > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead of > static linking to avoid bloat. This matters, because libopenblas.dll is a > heavy library (around 30Mb for amd64). As a consequence all packages with > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not different > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. It is probably ok if we name the OpenBLAS DLL something else than libopenblas.dll. We could e.g. add to the filename a combined hash for NumPy version, CPU, OpenBLAS version, Python version, C compiler, platform, build number, etc. Sturla From njs at pobox.com Sun Jan 25 10:46:27 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 25 Jan 2015 15:46:27 +0000 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Sat, Jan 24, 2015 at 5:29 PM, Carl Kleffner wrote: > > 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : >> >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner >> wrote: >> > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy >> > wheels >> > mentioned above are dependant on the installation of the OpenBLAS based >> > numpy and won't work i.e. with an installed numpy-MKL. >> >> This sounds like it probably needs to be fixed before we can recommend >> the scipy wheels for anyone? OTOH it might be fine to start >> distributing numpy wheels first. > > > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead of > static linking to avoid bloat. This matters, because libopenblas.dll is a > heavy library (around 30Mb for amd64). As a consequence all packages with > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not different > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. 
The difference is that if we upload this as the standard scipy wheel, and then someone goes "hey, look, a new scipy release just got announced, 'pip upgrade scipy'", then the result will often be that they just get random unexplained crashes. I think we should try to avoid that kind of outcome, even if it means making some technical compromises. The whole idea of having the wheels is to make fetching particular versions seamless and robust, and the other kinds of builds will still be available for those willing to invest more effort. One solution would be for the scipy wheel to explicitly depend on a numpy+openblas wheel, so that someone doing 'pip install scipy' also forced a numpy upgrade. But I think we should forget about trying this given the current state of python packaging tools: pip/setuptools/etc. are not really sophisticated enough to let us do this without a lot of kluges and compromises, and anyway it is nicer to allow scipy and numpy to be upgraded separately. Another solution would be to just include openblas in both. This bloats downloads, but I'd rather waste 30 MiB then waste users' time fighting with random library incompatibility nonsense that they don't care about. Another solution would be to split the openblas library off into its own "python package", that just dropped the binary somewhere where it could be found later, and then have both the numpy and scipy wheels depend on this package. We could start with the brute force solution (just including openblas in both) for the first release, and then upgrade to the fancier solution (both depend on a separate package) later. >> > For the numpy 32bit builds there are 3 failures for special FP value >> > tests, >> > due to a bug in mingw-w64 that is still present. All scipy versions show >> > up >> > 7 failures with some numerical noise, that could be ignored (or >> > corrected >> > with relaxed asserts in the test code). >> > >> > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used >> > for >> > building can be found at >> > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. >> >> Correct me if I'm wrong, but it looks like there isn't any details on >> how exactly the compiler was set up? Which is fine, I know you've been >> doing a ton of work on this and it's much appreciated :-). But >> eventually I do think a prerequisite for us adopting these as official >> builds is that we'll need a text document (or an executable script!) >> that walks through all the steps in setting up the toolchain etc., so >> that someone starting from scratch could get it all up and running. >> Otherwise we run the risk of eventually ending up back where we are >> today, with a creaky old mingw binary snapshot that no-one knows how >> it works or how to reproduce... > > > This has to be done and is in preperation, but not ready for consumption > right now. Some preliminary information is given here: > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/mingwstatic-2014-11-readme.md Right, I read that :-). There's no way that I could sit down with that document and a clean windows install and replicate your mingw-w64 toolchain, though :-). Which, like I said, is totally fine at this stage in the process, I just wanted to make sure that this step is on the radar, b/c it will eventually become crucial. -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From cmkleffner at gmail.com Sun Jan 25 13:46:49 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sun, 25 Jan 2015 19:46:49 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-25 16:46 GMT+01:00 Nathaniel Smith : > On Sat, Jan 24, 2015 at 5:29 PM, Carl Kleffner > wrote: > > > > 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : > >> > >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner > >> wrote: > >> > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > >> > wheels > >> > mentioned above are dependant on the installation of the OpenBLAS > based > >> > numpy and won't work i.e. with an installed numpy-MKL. > >> > >> This sounds like it probably needs to be fixed before we can recommend > >> the scipy wheels for anyone? OTOH it might be fine to start > >> distributing numpy wheels first. > > > > > > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead > of > > static linking to avoid bloat. This matters, because libopenblas.dll is a > > heavy library (around 30Mb for amd64). As a consequence all packages with > > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not > different > > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. > > The difference is that if we upload this as the standard scipy wheel, > and then someone goes "hey, look, a new scipy release just got > announced, 'pip upgrade scipy'", then the result will often be that > they just get random unexplained crashes. I think we should try to > avoid that kind of outcome, even if it means making some technical > compromises. The whole idea of having the wheels is to make fetching > particular versions seamless and robust, and the other kinds of builds > will still be available for those willing to invest more effort. > > One solution would be for the scipy wheel to explicitly depend on a > numpy+openblas wheel, so that someone doing 'pip install scipy' also > forced a numpy upgrade. But I think we should forget about trying this > given the current state of python packaging tools: pip/setuptools/etc. > are not really sophisticated enough to let us do this without a lot of > kluges and compromises, and anyway it is nicer to allow scipy and > numpy to be upgraded separately. > I've learned, that mark numpy with something like numpy+openblas is called "local version identifier": https://www.python.org/dev/peps/pep-0440/#local-version-identifiers These identifieres are not allowed for Pypi however. > > Another solution would be to just include openblas in both. This > bloats downloads, but I'd rather waste 30 MiB then waste users' time > fighting with random library incompatibility nonsense that they don't > care about. > > Another solution would be to split the openblas library off into its > own "python package", that just dropped the binary somewhere where it > could be found later, and then have both the numpy and scipy wheels > depend on this package. > Creating a dedicated OpenBLAS package and adding this package as an dependancy to numpy/scipy would also allow independant upgrade paths to OpenBLAS, numpy and scipy. The API of OpenBLAS seems to be stable enough to allow for that. Having an additional package dependancy is a minor problem, as pip can handle this automatically for the user. 
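(To make that mechanism concrete, here is a minimal sketch of how numpy or
scipy could locate and preload a DLL shipped by such a dedicated package at
import time; the package name "openblas" and its layout are hypothetical,
not an existing project:)

import ctypes
import os

def _preload_openblas():
    try:
        import openblas  # hypothetical helper package that only ships the DLL
    except ImportError:
        return  # fall back to whatever BLAS the extension modules were linked with
    dll = os.path.join(os.path.dirname(openblas.__file__), 'libopenblas.dll')
    if os.path.exists(dll):
        # Loading the DLL into the process once lets extension modules that
        # were linked against it resolve it by name afterwards.
        ctypes.CDLL(dll)

_preload_openblas()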
> We could start with the brute force solution (just including openblas > in both) for the first release, and then upgrade to the fancier > solution (both depend on a separate package) later. > > >> > For the numpy 32bit builds there are 3 failures for special FP value > >> > tests, > >> > due to a bug in mingw-w64 that is still present. All scipy versions > show > >> > up > >> > 7 failures with some numerical noise, that could be ignored (or > >> > corrected > >> > with relaxed asserts in the test code). > >> > > >> > PR's for numpy and scipy are in preparation. The mingw-w64 compiler > used > >> > for > >> > building can be found at > >> > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > >> > >> Correct me if I'm wrong, but it looks like there isn't any details on > >> how exactly the compiler was set up? Which is fine, I know you've been > >> doing a ton of work on this and it's much appreciated :-). But > >> eventually I do think a prerequisite for us adopting these as official > >> builds is that we'll need a text document (or an executable script!) > >> that walks through all the steps in setting up the toolchain etc., so > >> that someone starting from scratch could get it all up and running. > >> Otherwise we run the risk of eventually ending up back where we are > >> today, with a creaky old mingw binary snapshot that no-one knows how > >> it works or how to reproduce... > > > > > > This has to be done and is in preperation, but not ready for consumption > > right now. Some preliminary information is given here: > > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/mingwstatic-2014-11-readme.md > > Right, I read that :-). There's no way that I could sit down with that > document and a clean windows install and replicate your mingw-w64 > toolchain, though :-). Which, like I said, is totally fine at this > stage in the process, I just wanted to make sure that this step is on > the radar, b/c it will eventually become crucial. > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Jan 25 13:48:41 2015 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 25 Jan 2015 13:48:41 -0500 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. In-Reply-To: References: Message-ID: On Wed, Aug 13, 2014 at 6:17 PM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Its pretty easy to implement this table functionality and more on top of > the code I linked above. I still think such a comprehensive overhaul of > arraysetops is worth discussing. 
> > import numpy as np > import grouping > x = [1, 1, 1, 1, 2, 2, 2, 2, 2] > y = [3, 4, 3, 3, 3, 4, 5, 5, 5] > z = np.random.randint(0,2,(9,2)) > def table(*keys): > """ > desired table implementation, building on the index object > cleaner, and more functionality > performance should be the same > """ > indices = [grouping.as_index(k, axis=0) for k in keys] > uniques = [i.unique for i in indices] > inverses = [i.inverse for i in indices] > shape = [i.groups for i in indices] > t = np.zeros(shape, np.int) > np.add.at(t, inverses, 1) > return tuple(uniques), t > #here is how to use > print table(x,y) > #but we can use fancy keys as well; here a composite key and a row-key > print table((x,y), z) > #this effectively creates a sparse matrix equivalent of your desired table > print grouping.count((x,y)) > > > On Wed, Aug 13, 2014 at 11:25 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> >> >> >> On Wed, Aug 13, 2014 at 5:15 PM, Benjamin Root wrote: >> >>> The ever-wonderful pylab mode in matplotlib has a table function for >>> plotting a table of text in a plot. If I remember correctly, what would >>> happen is that matplotlib's table() function will simply obliterate the >>> numpy's table function. This isn't a show-stopper, I just wanted to point >>> that out. >>> >>> Personally, while I wasn't a particular fan of "count_unique" because I >>> wouldn't necessarially think of it when needing a contingency table, I do >>> like that it is verb-ish. "table()", in this sense, is not a verb. That >>> said, I am perfectly fine with it if you are fine with the name collision >>> in pylab mode. >>> >>> >> >> Thanks for pointing that out. I only changed it to have something that >> sounded more table-ish, like the Pandas, R and Matlab functions. I won't >> update it right now, but if there is interest in putting it into numpy, >> I'll rename it to avoid the pylab conflict. Anything along the lines of >> `crosstab`, `xtable`, etc., would be fine with me. >> >> Warren >> >> >> >>> On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser < >>> warren.weckesser at gmail.com> wrote: >>> >>>> >>>> >>>> >>>> On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn < >>>> hoogendoorn.eelco at gmail.com> wrote: >>>> >>>>> ah yes, that's also an issue I was trying to deal with. the semantics >>>>> I prefer in these type of operators, is (as a default), to have every array >>>>> be treated as a sequence of keys, so if calling unique(arr_2d), youd get >>>>> unique rows, unless you pass axis=None, in which case the array is >>>>> flattened. >>>>> >>>>> I also agree that the extension you propose here is useful; but >>>>> ideally, with a little more discussion on these subjects we can converge on >>>>> an even more comprehensive overhaul >>>>> >>>>> >>>>> On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn < >>>>>> hoogendoorn.eelco at gmail.com> wrote: >>>>>> >>>>>>> Thanks. Prompted by that stackoverflow question, and similar >>>>>>> problems I had to deal with myself, I started working on a much more >>>>>>> general extension to numpy's functionality in this space. Like you noted, >>>>>>> things get a little panda-y, but I think there is a lot of panda's >>>>>>> functionality that could or should be part of the numpy core, a robust set >>>>>>> of grouping operations in particular. 
>>>>>>> >>>>>>> see pastebin here: >>>>>>> http://pastebin.com/c5WLWPbp >>>>>>> >>>>>> >>>>>> On a side note, this is related to a pull request of mine from awhile >>>>>> back: https://github.com/numpy/numpy/pull/3584 >>>>>> >>>>>> There was a lot of disagreement on the mailing list about what to >>>>>> call a "unique slices along a given axis" function, so I wound up closing >>>>>> the pull request pending more discussion. >>>>>> >>>>>> At any rate, I think it's a useful thing to have in "base" numpy. >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> Update: I renamed the function to `table` in the pull request: >>>> https://github.com/numpy/numpy/pull/4958 >>>> >>>> >>>> Warren >>>> >>>> Hey all, I'm reviving this thread about the proposed `table` enhancement in https://github.com/numpy/numpy/pull/4958, because Chuck has poked me (via the pull request ) about it, so I'm poking the mailing list. Ignoring the issue of the name for the moment, is there any opposition to adding the proposed `table` function to numpy? I don't think it would preclude adding more powerful tools later, but that's not something I have time to work on at the moment. If the only issue is the name, I'm open to any suggestions. I started with `count_unique`, and changed it to `table`, but Benjamin pointed out the potential conflict of `table` with a matplotlib function. Warren _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Jan 25 14:00:11 2015 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 25 Jan 2015 14:00:11 -0500 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. In-Reply-To: References: Message-ID: On Sun, Jan 25, 2015 at 1:48 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > > On Wed, Aug 13, 2014 at 6:17 PM, Eelco Hoogendoorn < > hoogendoorn.eelco at gmail.com> wrote: > >> Its pretty easy to implement this table functionality and more on top of >> the code I linked above. I still think such a comprehensive overhaul of >> arraysetops is worth discussing. 
>> >> import numpy as np >> import grouping >> x = [1, 1, 1, 1, 2, 2, 2, 2, 2] >> y = [3, 4, 3, 3, 3, 4, 5, 5, 5] >> z = np.random.randint(0,2,(9,2)) >> def table(*keys): >> """ >> desired table implementation, building on the index object >> cleaner, and more functionality >> performance should be the same >> """ >> indices = [grouping.as_index(k, axis=0) for k in keys] >> uniques = [i.unique for i in indices] >> inverses = [i.inverse for i in indices] >> shape = [i.groups for i in indices] >> t = np.zeros(shape, np.int) >> np.add.at(t, inverses, 1) >> return tuple(uniques), t >> #here is how to use >> print table(x,y) >> #but we can use fancy keys as well; here a composite key and a row-key >> print table((x,y), z) >> #this effectively creates a sparse matrix equivalent of your desired table >> print grouping.count((x,y)) >> >> >> On Wed, Aug 13, 2014 at 11:25 PM, Warren Weckesser < >> warren.weckesser at gmail.com> wrote: >> >>> >>> >>> >>> On Wed, Aug 13, 2014 at 5:15 PM, Benjamin Root wrote: >>> >>>> The ever-wonderful pylab mode in matplotlib has a table function for >>>> plotting a table of text in a plot. If I remember correctly, what would >>>> happen is that matplotlib's table() function will simply obliterate the >>>> numpy's table function. This isn't a show-stopper, I just wanted to point >>>> that out. >>>> >>>> Personally, while I wasn't a particular fan of "count_unique" because I >>>> wouldn't necessarially think of it when needing a contingency table, I do >>>> like that it is verb-ish. "table()", in this sense, is not a verb. That >>>> said, I am perfectly fine with it if you are fine with the name collision >>>> in pylab mode. >>>> >>>> >>> >>> Thanks for pointing that out. I only changed it to have something that >>> sounded more table-ish, like the Pandas, R and Matlab functions. I won't >>> update it right now, but if there is interest in putting it into numpy, >>> I'll rename it to avoid the pylab conflict. Anything along the lines of >>> `crosstab`, `xtable`, etc., would be fine with me. >>> >>> Warren >>> >>> >>> >>>> On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser < >>>> warren.weckesser at gmail.com> wrote: >>>> >>>>> >>>>> >>>>> >>>>> On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn < >>>>> hoogendoorn.eelco at gmail.com> wrote: >>>>> >>>>>> ah yes, that's also an issue I was trying to deal with. the semantics >>>>>> I prefer in these type of operators, is (as a default), to have every array >>>>>> be treated as a sequence of keys, so if calling unique(arr_2d), youd get >>>>>> unique rows, unless you pass axis=None, in which case the array is >>>>>> flattened. >>>>>> >>>>>> I also agree that the extension you propose here is useful; but >>>>>> ideally, with a little more discussion on these subjects we can converge on >>>>>> an even more comprehensive overhaul >>>>>> >>>>>> >>>>>> On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn < >>>>>>> hoogendoorn.eelco at gmail.com> wrote: >>>>>>> >>>>>>>> Thanks. Prompted by that stackoverflow question, and similar >>>>>>>> problems I had to deal with myself, I started working on a much more >>>>>>>> general extension to numpy's functionality in this space. Like you noted, >>>>>>>> things get a little panda-y, but I think there is a lot of panda's >>>>>>>> functionality that could or should be part of the numpy core, a robust set >>>>>>>> of grouping operations in particular. 
>>>>>>>> >>>>>>>> see pastebin here: >>>>>>>> http://pastebin.com/c5WLWPbp >>>>>>>> >>>>>>> >>>>>>> On a side note, this is related to a pull request of mine from >>>>>>> awhile back: https://github.com/numpy/numpy/pull/3584 >>>>>>> >>>>>>> There was a lot of disagreement on the mailing list about what to >>>>>>> call a "unique slices along a given axis" function, so I wound up closing >>>>>>> the pull request pending more discussion. >>>>>>> >>>>>>> At any rate, I think it's a useful thing to have in "base" numpy. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> NumPy-Discussion mailing list >>>>>>> NumPy-Discussion at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>> >>>>>> >>>>> >>>>> Update: I renamed the function to `table` in the pull request: >>>>> https://github.com/numpy/numpy/pull/4958 >>>>> >>>>> >>>>> Warren >>>>> >>>>> > > Hey all, > > I'm reviving this thread about the proposed `table` enhancement in > https://github.com/numpy/numpy/pull/4958, because Chuck has poked me (via > the pull request ) about it, so I'm poking the mailing list. Ignoring the > issue of the name for the moment, is there any opposition to adding the > proposed `table` function to numpy? I don't think it would preclude adding > more powerful tools later, but that's not something I have time to work on > at the moment. > > If the only issue is the name, I'm open to any suggestions. I started > with `count_unique`, and changed it to `table`, but Benjamin pointed out > the potential conflict of `table` with a matplotlib function. > > Warren > Looks like the original email in the thread is not part of the quoted (and somewhat disordered) emails. Here's my original email from last August: http://mail.scipy.org/pipermail/numpy-discussion/2014-August/070941.html Warren > > > > > _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldcroft at head.cfa.harvard.edu Sun Jan 25 14:32:14 2015 From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas) Date: Sun, 25 Jan 2015 14:32:14 -0500 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. In-Reply-To: References: Message-ID: On Tue, Aug 12, 2014 at 12:17 PM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Thanks. Prompted by that stackoverflow question, and similar problems I > had to deal with myself, I started working on a much more general extension > to numpy's functionality in this space. 
Like you noted, things get a little > panda-y, but I think there is a lot of panda's functionality that could or > should be part of the numpy core, a robust set of grouping operations in > particular. > FYI I wrote some table grouping operations (join, hstack, vstack) for numpy some time ago, available here: https://github.com/astropy/astropy/blob/v0.4.x/astropy/table/np_utils.py These are part of the astropy project but this module has no actual astropy dependencies apart from a local backport of OrderedDict for Python < 2.7. Cheers, Tom > see pastebin here: > http://pastebin.com/c5WLWPbp > > Ive posted about it on this list before, but without apparent interest; > and I havnt gotten around to getting this up to professional standards yet > either. But there is a lot more that could be done in this direction. > > Note that the count functionality in the stackoverflow answer is > relatively indirect and inefficient, using the inverse_index and such. A > much more efficient method is obtained by the code used here. > > > On Tue, Aug 12, 2014 at 5:57 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> >> >> >> On Tue, Aug 12, 2014 at 11:35 AM, Warren Weckesser < >> warren.weckesser at gmail.com> wrote: >> >>> I created a pull request (https://github.com/numpy/numpy/pull/4958) >>> that defines the function `count_unique`. `count_unique` generates a >>> contingency table from a collection of sequences. For example, >>> >>> In [7]: x = [1, 1, 1, 1, 2, 2, 2, 2, 2] >>> >>> In [8]: y = [3, 4, 3, 3, 3, 4, 5, 5, 5] >>> >>> In [9]: (xvals, yvals), counts = count_unique(x, y) >>> >>> In [10]: xvals >>> Out[10]: array([1, 2]) >>> >>> In [11]: yvals >>> Out[11]: array([3, 4, 5]) >>> >>> In [12]: counts >>> Out[12]: >>> array([[3, 1, 0], >>> [1, 1, 3]]) >>> >>> >>> It can be interpreted as a multi-argument generalization of >>> `np.unique(x, return_counts=True)`. >>> >>> It overlaps with Pandas' `crosstab`, but I think this is a pretty >>> fundamental counting operation that fits in numpy. >>> >>> Matlab's `crosstab` (http://www.mathworks.com/help/stats/crosstab.html) >>> and R's `table` perform the same calculation (with a few more bells and >>> whistles). >>> >>> >>> For comparison, here's Pandas' `crosstab` (same `x` and `y` as above): >>> >>> In [28]: import pandas as pd >>> >>> In [29]: xs = pd.Series(x) >>> >>> In [30]: ys = pd.Series(y) >>> >>> In [31]: pd.crosstab(xs, ys) >>> Out[31]: >>> col_0 3 4 5 >>> row_0 >>> 1 3 1 0 >>> 2 1 1 3 >>> >>> >>> And here is R's `table`: >>> >>> > x <- c(1,1,1,1,2,2,2,2,2) >>> > y <- c(3,4,3,3,3,4,5,5,5) >>> > table(x, y) >>> y >>> x 3 4 5 >>> 1 3 1 0 >>> 2 1 1 3 >>> >>> >>> Is there any interest in adding this (or some variation of it) to numpy? >>> >>> >>> Warren >>> >>> >> >> While searching StackOverflow in the numpy tag for "count unique", I just >> discovered that I basically reinvented Eelco Hoogendoorn's code in his >> answer to >> http://stackoverflow.com/questions/10741346/numpy-frequency-counts-for-unique-values-in-an-array. >> Nice one, Eelco! >> >> Warren >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
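(For reference, the counting itself needs nothing beyond np.unique and np.add.at; a minimal sketch of the technique -- not the code from the pull request -- looks like this:

import numpy as np

def crosstab(x, y):
    # Map each value to its index among the sorted unique values, then count pairs.
    xvals, xinv = np.unique(x, return_inverse=True)
    yvals, yinv = np.unique(y, return_inverse=True)
    counts = np.zeros((len(xvals), len(yvals)), dtype=int)
    np.add.at(counts, (xinv, yinv), 1)
    return (xvals, yvals), counts

x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
(xvals, yvals), counts = crosstab(x, y)
print(counts)   # [[3 1 0]
                #  [1 1 3]]

The generalization to more than two key sequences follows the same pattern.)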
URL: From njs at pobox.com Sun Jan 25 15:23:11 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 25 Jan 2015 20:23:11 +0000 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On 25 Jan 2015 18:46, "Carl Kleffner" wrote: > > 2015-01-25 16:46 GMT+01:00 Nathaniel Smith : >> >> On Sat, Jan 24, 2015 at 5:29 PM, Carl Kleffner wrote: >> > >> > 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : >> >> >> >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner >> >> wrote: >> >> > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy >> >> > wheels >> >> > mentioned above are dependant on the installation of the OpenBLAS based >> >> > numpy and won't work i.e. with an installed numpy-MKL. >> >> >> >> This sounds like it probably needs to be fixed before we can recommend >> >> the scipy wheels for anyone? OTOH it might be fine to start >> >> distributing numpy wheels first. >> > >> > >> > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead of >> > static linking to avoid bloat. This matters, because libopenblas.dll is a >> > heavy library (around 30Mb for amd64). As a consequence all packages with >> > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not different >> > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. >> >> The difference is that if we upload this as the standard scipy wheel, >> and then someone goes "hey, look, a new scipy release just got >> announced, 'pip upgrade scipy'", then the result will often be that >> they just get random unexplained crashes. I think we should try to >> avoid that kind of outcome, even if it means making some technical >> compromises. The whole idea of having the wheels is to make fetching >> particular versions seamless and robust, and the other kinds of builds >> will still be available for those willing to invest more effort. >> >> One solution would be for the scipy wheel to explicitly depend on a >> numpy+openblas wheel, so that someone doing 'pip install scipy' also >> forced a numpy upgrade. But I think we should forget about trying this >> given the current state of python packaging tools: pip/setuptools/etc. >> are not really sophisticated enough to let us do this without a lot of >> kluges and compromises, and anyway it is nicer to allow scipy and >> numpy to be upgraded separately. > > > I've learned, that mark numpy with something like numpy+openblas is called "local version identifier": https://www.python.org/dev/peps/pep-0440/#local-version-identifiers > These identifieres are not allowed for Pypi however. Right, it's fine for the testing wheels, but even if it were allowed on pypi then it still wouldn't let us specify the correct dependency -- we'd have to say that scipy build X depends on exactly numpy 1.9.1+openblas, not numpy +openblas. So then when a new version of numpy was uploaded it'd be impossible to upgrade without also rebuilding numpy. Alternatively pip would be within its rights to simply ignore the local version part, because "Local version identifiers are used to denote fully API (and, if applicable, ABI) compatible patched versions of upstream projects." Here the +openblas is exactly designed to communicate ABI incompatibility. Soooooo yeah this is ugly all around. Pip and friends are getting better but they're just not up to this kind of thing. >> Another solution would be to just include openblas in both. 
This >> bloats downloads, but I'd rather waste 30 MiB then waste users' time >> fighting with random library incompatibility nonsense that they don't >> care about. >> >> Another solution would be to split the openblas library off into its >> own "python package", that just dropped the binary somewhere where it >> could be found later, and then have both the numpy and scipy wheels >> depend on this package. > > > Creating a dedicated OpenBLAS package and adding this package as an dependancy to numpy/scipy would also allow independant upgrade paths to OpenBLAS, numpy and scipy. The API of OpenBLAS seems to be stable enough to allow for that. Having an additional package dependancy is a minor problem, as pip can handle this automatically for the user. Exactly. We might even want to give it a tiny python wrapper, e.g. you do import openblas openblas.add_to_library_path() and that would be a little function that modifies LD_LIBRARY_PATH or calls AddDllDirectory etc. as appropriate, so that code linking to openblas can ignore all these details. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Jan 25 16:15:59 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 25 Jan 2015 13:15:59 -0800 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: Hi, On Sun, Jan 25, 2015 at 12:23 PM, Nathaniel Smith wrote: > On 25 Jan 2015 18:46, "Carl Kleffner" wrote: >> >> 2015-01-25 16:46 GMT+01:00 Nathaniel Smith : >>> >>> On Sat, Jan 24, 2015 at 5:29 PM, Carl Kleffner >>> wrote: >>> > >>> > 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : >>> >> >>> >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner >>> >> wrote: >>> >> > OpenBLAS is deployed as part of the numpy wheel. That said, the >>> >> > scipy >>> >> > wheels >>> >> > mentioned above are dependant on the installation of the OpenBLAS >>> >> > based >>> >> > numpy and won't work i.e. with an installed numpy-MKL. >>> >> >>> >> This sounds like it probably needs to be fixed before we can recommend >>> >> the scipy wheels for anyone? OTOH it might be fine to start >>> >> distributing numpy wheels first. >>> > >>> > >>> > I very much prefer dynamic linking to numpy\core\libopenblas.dll >>> > instead of >>> > static linking to avoid bloat. This matters, because libopenblas.dll is >>> > a >>> > heavy library (around 30Mb for amd64). As a consequence all packages >>> > with >>> > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not >>> > different >>> > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. >>> >>> The difference is that if we upload this as the standard scipy wheel, >>> and then someone goes "hey, look, a new scipy release just got >>> announced, 'pip upgrade scipy'", then the result will often be that >>> they just get random unexplained crashes. I think we should try to >>> avoid that kind of outcome, even if it means making some technical >>> compromises. The whole idea of having the wheels is to make fetching >>> particular versions seamless and robust, and the other kinds of builds >>> will still be available for those willing to invest more effort. >>> >>> One solution would be for the scipy wheel to explicitly depend on a >>> numpy+openblas wheel, so that someone doing 'pip install scipy' also >>> forced a numpy upgrade. But I think we should forget about trying this >>> given the current state of python packaging tools: pip/setuptools/etc. 
>>> are not really sophisticated enough to let us do this without a lot of >>> kluges and compromises, and anyway it is nicer to allow scipy and >>> numpy to be upgraded separately. >> >> >> I've learned, that mark numpy with something like numpy+openblas is called >> "local version identifier": >> https://www.python.org/dev/peps/pep-0440/#local-version-identifiers >> These identifieres are not allowed for Pypi however. > > Right, it's fine for the testing wheels, but even if it were allowed on pypi > then it still wouldn't let us specify the correct dependency -- we'd have to > say that scipy build X depends on exactly numpy 1.9.1+openblas, not numpy > +openblas. So then when a new version of numpy was uploaded it'd > be impossible to upgrade without also rebuilding numpy. > > Alternatively pip would be within its rights to simply ignore the local > version part, because "Local version identifiers are used to denote fully > API (and, if applicable, ABI) compatible patched versions of upstream > projects." Here the +openblas is exactly designed to communicate ABI > incompatibility. > > Soooooo yeah this is ugly all around. Pip and friends are getting better but > they're just not up to this kind of thing. I agree, that shipping openblas with both numpy and scipy seems perfectly reasonable to me - I don't think anyone will much care about the 30M, and I think our job is to make something that works with the least complexity and likelihood of error. It would be good to rename the dll according to the package and version though, to avoid a scipy binary using a pre-loaded but incompatible 'libopenblas.dll'. Say something like openblas-scipy-0.15.1.dll - on the basis that there can only be one copy of scipy loaded at a time. Cheers, Matthew From olivier.grisel at ensta.org Sun Jan 25 17:14:06 2015 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Sun, 25 Jan 2015 23:14:06 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: +1 for bundling OpenBLAS both in scipy and numpy in the short term. Introducing a new dependency project for OpenBLAS sounds like a good idea but this is probably more work. -- Olivier From sturla.molden at gmail.com Sun Jan 25 20:16:31 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 26 Jan 2015 02:16:31 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On 25/01/15 22:15, Matthew Brett wrote: > I agree, that shipping openblas with both numpy and scipy seems > perfectly reasonable to me - I don't think anyone will much care about > the 30M, and I think our job is to make something that works with the > least complexity and likelihood of error. Yes. Make something that works first, optimize for space later. > It would be good to rename the dll according to the package and > version though, to avoid a scipy binary using a pre-loaded but > incompatible 'libopenblas.dll'. Say something like > openblas-scipy-0.15.1.dll - on the basis that there can only be one > copy of scipy loaded at a time. That is a good idea and we should do this for NumPy too I think. Sturla From jensj at fysik.dtu.dk Mon Jan 26 03:24:06 2015 From: jensj at fysik.dtu.dk (=?UTF-8?B?SmVucyBKw7hyZ2VuIE1vcnRlbnNlbg==?=) Date: Mon, 26 Jan 2015 09:24:06 +0100 Subject: [Numpy-discussion] Float view of complex array Message-ID: <54C5F9A6.9080109@fysik.dtu.dk> Hi! 
I have a view of a 2-d complex array that I would like to view as a 2-d float array. This works OK: >>> np.ones((2, 4), complex).view(float) array([[ 1., 0., 1., 0., 1., 0., 1., 0.], [ 1., 0., 1., 0., 1., 0., 1., 0.]]) but this doesn't: >>> np.ones((2, 4), complex)[:, :2].view(float) Traceback (most recent call last): File "", line 1, in ValueError: new type not compatible with array. >>> np.__version__ '1.9.0' and I don't understand why. When looking at the memory layout, I think it should be possible. Jens J?rgen From maniteja.modesty067 at gmail.com Mon Jan 26 04:27:35 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Mon, 26 Jan 2015 14:57:35 +0530 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: <54C5F9A6.9080109@fysik.dtu.dk> References: <54C5F9A6.9080109@fysik.dtu.dk> Message-ID: Hi Jens, I don't have enough knowledge about the internal memory layout, but the documentation ndarray.view says that: Views that change the dtype size (bytes per entry) should normally be avoided on arrays defined by slices, transposes, fortran-ordering, etc.: In your case, creating a *copy *of the slice and then calling *view *works. >>>a array([[ 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j], [ 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j]]) >>> a.view(float) array([[ 1., 0., 1., 0., 1., 0., 1., 0.], [ 1., 0., 1., 0., 1., 0., 1., 0.]]) >>> b=a[:,:2].copy() >>> b.view(float) array([[ 1., 0., 1., 0.], [ 1., 0., 1., 0.]]) >>> c=a[:,:2] >>> c.view(float) Traceback (most recent call last): File "", line 1, in ValueError: new type not compatible with array Hope it helps :) Cheers, N.Maniteja. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jan 26 04:41:35 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Jan 2015 10:41:35 +0100 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: <54C5F9A6.9080109@fysik.dtu.dk> References: <54C5F9A6.9080109@fysik.dtu.dk> Message-ID: <1422265295.10406.1.camel@sebastian-t440> On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > Hi! > > I have a view of a 2-d complex array that I would like to view as a 2-d > float array. This works OK: > > >>> np.ones((2, 4), complex).view(float) > array([[ 1., 0., 1., 0., 1., 0., 1., 0.], > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > but this doesn't: > > >>> np.ones((2, 4), complex)[:, :2].view(float) > Traceback (most recent call last): > File "", line 1, in > ValueError: new type not compatible with array. > >>> np.__version__ > '1.9.0' > > and I don't understand why. When looking at the memory layout, I think > it should be possible. > Yes, it should be possible, but it is not :). You could hack it by using `np.ndarray` (or stride tricks). Or maybe you are interested making the checks whether it makes sense or not less strict. - Sebastian > Jens J?rgen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
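Spelled out, the np.ndarray hack Sebastian mentions could look something like this. It is only a sketch and assumes the slice starts at the first element of the parent array (otherwise an offset has to be computed as well):

import numpy as np

a = np.ones((2, 4), complex)
b = a[:, :2]                     # b.view(float) raises ValueError here

# Same memory, reinterpreted by hand: every complex number becomes a
# (real, imag) pair, so the row length doubles and the inner stride is 8 bytes.
c = np.ndarray(shape=(b.shape[0], 2 * b.shape[1]),   # (2, 4) floats
               dtype=float,
               buffer=a,                              # a owns the memory (offset 0)
               strides=(b.strides[0], 8))             # row stride 64, float stride 8

print(c)
# [[ 1.  0.  1.  0.]
#  [ 1.  0.  1.  0.]]
c[0, 1] = 5.0                    # writes the imaginary part of a[0, 0]
print(a[0, 0])                   # (1+5j)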
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From jaime.frio at gmail.com Mon Jan 26 05:02:20 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Mon, 26 Jan 2015 02:02:20 -0800 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: <1422265295.10406.1.camel@sebastian-t440> References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> Message-ID: On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg wrote: > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > Hi! > > > > I have a view of a 2-d complex array that I would like to view as a 2-d > > float array. This works OK: > > > > >>> np.ones((2, 4), complex).view(float) > > array([[ 1., 0., 1., 0., 1., 0., 1., 0.], > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > but this doesn't: > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > Traceback (most recent call last): > > File "", line 1, in > > ValueError: new type not compatible with array. > > >>> np.__version__ > > '1.9.0' > > > > and I don't understand why. When looking at the memory layout, I think > > it should be possible. > > > > Yes, it should be possible, but it is not :). You could hack it by using > `np.ndarray` (or stride tricks). Or maybe you are interested making the > checks whether it makes sense or not less strict. > How would it be possible? He goes from an array with 16 byte strides along the last axis: r0i0, r1i1, r2i2, r3i3 to one with 32 byte strides, which is OK r0i0, xxxx, r2i2, xxxx but everything breaks down when he wants to have alternating strides of 8 and 24 bytes: r0, i0, xxxx, r2, i2, xxxx which cannot be hacked in any sensible way. What I think could be made to work, but also fails, is this: np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) Here the original strides are (64, 16, xx) and the resulting view should have strides (64, 32, 8), not sure what trips this. Jaime > > - Sebastian > > > Jens J?rgen > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jan 26 05:23:22 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Jan 2015 11:23:22 +0100 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> Message-ID: <1422267802.11549.1.camel@sebastian-t440> On Mo, 2015-01-26 at 02:02 -0800, Jaime Fern?ndez del R?o wrote: > On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg > wrote: > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > Hi! > > > > I have a view of a 2-d complex array that I would like to > view as a 2-d > > float array. 
This works OK: > > > > >>> np.ones((2, 4), complex).view(float) > > array([[ 1., 0., 1., 0., 1., 0., 1., 0.], > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > but this doesn't: > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > Traceback (most recent call last): > > File "", line 1, in > > ValueError: new type not compatible with array. > > >>> np.__version__ > > '1.9.0' > > > > and I don't understand why. When looking at the memory > layout, I think > > it should be possible. > > > > Yes, it should be possible, but it is not :). You could hack > it by using > `np.ndarray` (or stride tricks). Or maybe you are interested > making the > checks whether it makes sense or not less strict. > > > How would it be possible? He goes from an array with 16 byte strides > along the last axis: > Oh, sorry, you are right of course. I thought it was going the other way around, from double -> complex. That way could work (in this case I think), but does not currently. > > r0i0, r1i1, r2i2, r3i3 > > > to one with 32 byte strides, which is OK > > > r0i0, xxxx, r2i2, xxxx > > > but everything breaks down when he wants to have alternating strides > of 8 and 24 bytes: > > > r0, i0, xxxx, r2, i2, xxxx > > > which cannot be hacked in any sensible way. > > > What I think could be made to work, but also fails, is this: > > > np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) > > > > Here the original strides are (64, 16, xx) and the resulting view > should have strides (64, 32, 8), not sure what trips this. > > > Jaime > > > > - Sebastian > > > Jens J?rgen > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From dieter.van.eessen at gmail.com Mon Jan 26 06:06:44 2015 From: dieter.van.eessen at gmail.com (Dieter Van Eessen) Date: Mon, 26 Jan 2015 12:06:44 +0100 Subject: [Numpy-discussion] 3D array and the right hand rule Message-ID: Hello, I'm a novice with respect to scientific computing using python. I've read that numpy.array isn't arranged according to the 'right-hand-rule' (right-hand-rule => thumb = +x; index finger = +y, bend middle finder = +z). This is also confirmed by an old message I dug up from the mailing list archives. (see message below) I guess this has consequences for certain algorithms (been a while, should actually revise some algebra textbooks). At least it does when I try to mentally visualize a 3D array when I'm not sure which dimensions to use... What are the consequences in using 3D arrays in for example transformation algorithms or other algorithms which expect certain shape? Should I always 'reshape' before using them or do most algorithms take this into account in the underlying algebra? 
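To make my confusion a bit more concrete, this is the kind of thing I have been looking at (please correct me if I am misreading the attributes):

import numpy as np

a = np.arange(24.0).reshape(4, 2, 3)    # C order: shape (4, 2, 3)
print(a.strides)                        # (48, 24, 8): last index varies fastest

f = np.asfortranarray(a)                # same values, Fortran layout
print(f.strides)                        # (8, 32, 64): first index varies fastest

print(a.T.strides)                      # (8, 24, 48): transposing only swaps strides
print(a.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])   # True True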
Does the 'Fortran contiguous' shape always respect the 'right-hand-rule' (for 3D arrays)? And just to be sure: Is the information telling if an array is C-contiguous or Fortran-contiguous always stored within the array? kind regards, Dieter / Old message From: Anne Archibald gmail.com> Subject: Re: dimension aligment Newsgroups: gmane.comp.python.numeric.general Date: 2008-05-20 18:04:46 GMT (6 years, 35 weeks, 5 days, 4 hours and 34 minutes ago) 2008/5/20 Thomas Hrabe burnham.org>: > given a *3d* *array* > a = > numpy.*array*([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]],[[13,14,15],[16,17,18]],[[19,20,21],[22,23,24]]]) > a.shape > returns (4,2,3) > > so I assume the first digit is the 3rd dimension, second is 2nd dim and > third is the first. > > how is the data aligned in memory now? > according to the strides it should be > 1,2,3,4,5,6,7,8,9,10,... > *right*? > > if I had an *array* of more dimensions, the first digit returned by shape > should always be the highest dim. You are basically *right*, but this is a surprisingly subtle issue for numpy. A numpy *array* is basically a block of memory and some description. One piece of that description is the type of data it contains (i.e., how to interpret each chunk of memory) for example int32, float64, etc. Another is the sizes of all the various dimensions. A third piece, which makes many of the things numpy does possible, is the "strides". The way numpy works is that basically it translates A[i,j,k] into a lookup of the item in the memory block at position i*strides[0]+j*strides[1]+k*strides[2] This means, if you have an *array* A and you want every second element (A[::2]), all numpy needs to do is *hand* you back a new *array* pointing to the same data block, but with strides[0] doubled. Similarly if you want to transpose a two-dimensional *array*, all it needs to do is exchange strides[0] and strides[1]; no data need be moved. This means, though, that if you are *hand*ed a numpy *array*, the elements can be arranged in memory in quite a complicated fashion. Sometimes this is no problem - you can always use the strides to find it all. But sometimes you need the data arranged in a particular way. numpy defines two particular ways: "C contiguous" and "FORTRAN contiguous". "C contiguous" *array*s are what you describe, and they're what numpy produces by default; they are arranged so that the *right*most index has the smallest stride. "FORTRAN contiguous" *array*s are arranged the other way around; the leftmost index has the smallest stride. (This is how FORTRAN *array*s are arranged in memory.) There is also a special case: the reshape() function changes the shape of the *array*. It has an "order" argument that describes not how the elements are arranged in memory but how you want to think of the elements as arranged in memory for the reshape operation. Anne /Old message -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Mon Jan 26 10:30:43 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Mon, 26 Jan 2015 16:30:43 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: Thanks for all your ideas. The next version will contain an augumented libopenblas.dll in both numpy and scipy. On the long term I would prefer an external openblas wheel package, if there is an agreement about this among numpy-dev. Another idea for the future is to conditionally load a debug version of libopenblas instead. 
Together with the backtrace.dll (part of mingwstatic, but undocumentated right now) a meaningfull stacktrace in case of segfaults inside the code comiled with mingwstatic will be given. 2015-01-26 2:16 GMT+01:00 Sturla Molden : > On 25/01/15 22:15, Matthew Brett wrote: > > > I agree, that shipping openblas with both numpy and scipy seems > > perfectly reasonable to me - I don't think anyone will much care about > > the 30M, and I think our job is to make something that works with the > > least complexity and likelihood of error. > > Yes. Make something that works first, optimize for space later. > > > > It would be good to rename the dll according to the package and > > version though, to avoid a scipy binary using a pre-loaded but > > incompatible 'libopenblas.dll'. Say something like > > openblas-scipy-0.15.1.dll - on the basis that there can only be one > > copy of scipy loaded at a time. > > That is a good idea and we should do this for NumPy too I think. > > > > Sturla > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Jan 26 18:16:34 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 27 Jan 2015 00:16:34 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On 26/01/15 16:30, Carl Kleffner wrote: > Thanks for all your ideas. The next version will contain an augumented > libopenblas.dll in both numpy and scipy. On the long term I would > prefer an external openblas wheel package, if there is an agreement > about this among numpy-dev. Thanks for all your great work on this. An OpenBLAS wheel might be a good idea. Probably we should have some sort of instruction on the website how to install the binary wheel. And then we could include the OpenBLAS wheel in the instruction. Or we could have the OpenBLAS wheel as a part of the scipy stack. But make the bloated SciPy and NumPy wheels work first, then we can worry about a dedicated OpenBLAS wheel later :-) > Another idea for the future is to conditionally load a debug version of > libopenblas instead. Together with the backtrace.dll (part of > mingwstatic, but undocumentated right now) a meaningfull stacktrace in > case of segfaults inside the code comiled with mingwstatic will be given. An OpenBLAS wheel could also include multiple architectures. We can compile OpenBLAS for any kind of CPUs and and install the one that fits best with the computer. Also note that an OpenBLAS wheel could be useful on Linux. It is clearly superior to the ATLAS libraries that most distros ship. If we make a binary wheel that works for Windows, we are almost there for Linux too :-) For Apple we don't need OpenBLAS anymore. On OSX 10.9 and 10.10 Accelerate Framework is actually faster than MKL under many circumstances. DGEMM is about the same, but e.g. DAXPY and DDOT are faster in Accelerate. Sturla From yw5aj at virginia.edu Mon Jan 26 22:29:35 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Mon, 26 Jan 2015 22:29:35 -0500 Subject: [Numpy-discussion] F2PY cannot see module-scope variables Message-ID: Dear all, Sorry about being new to both Fortran 90 and f2py. I have a module in fortran, written as follows, with a module-scope variable dp: ======================================== ! 
testf2py.f90 module testf2py implicit none private public dp, i1 integer, parameter :: dp=kind(0.d0) contains real(dp) function i1(m) real(dp), intent(in) :: m(3, 3) i1 = m(1, 1) + m(2, 2) + m(3, 3) return end function i1 end module testf2py ======================================== Then, if I run f2py -c testf2py.f90 -m testf2py It would report an error, stating that dp was not declared. If I copy the module-scope to the function-scope, it would work. ======================================== ! testf2py.f90 module testf2py implicit none private public i1 integer, parameter :: dp=kind(0.d0) contains real(dp) function i1(m) integer, parameter :: dp=kind(0.d0) real(dp), intent(in) :: m(3, 3) i1 = m(1, 1) + m(2, 2) + m(3, 3) return end function i1 end module testf2py ======================================== However, this does not look like the best coding practice though, as it is pretty "wet". Any ideas? Thanks, Shawn -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From yw5aj at virginia.edu Mon Jan 26 22:31:16 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Mon, 26 Jan 2015 22:31:16 -0500 Subject: [Numpy-discussion] F2PY cannot see module-scope variables In-Reply-To: References: Message-ID: Sorry that I forgot to report the environment - Windows 64 bit, Python 3.4 64 bit. Numpy version is 1.9.1, and I commented the "raise NotImplementedError("Only MS compiler supported with gfortran on win64")" in the gnu.py, as instructed on this link: http://scientificcomputingco.blogspot.com.au/2013/02/f2py-on-64bit-windows-python27.html On Mon, Jan 26, 2015 at 10:29 PM, Yuxiang Wang wrote: > Dear all, > > Sorry about being new to both Fortran 90 and f2py. > > I have a module in fortran, written as follows, with a module-scope variable dp: > > ======================================== > ! testf2py.f90 > module testf2py > implicit none > private > public dp, i1 > integer, parameter :: dp=kind(0.d0) > contains > real(dp) function i1(m) > real(dp), intent(in) :: m(3, 3) > i1 = m(1, 1) + m(2, 2) + m(3, 3) > return > end function i1 > end module testf2py > ======================================== > > Then, if I run f2py -c testf2py.f90 -m testf2py > > It would report an error, stating that dp was not declared. > > If I copy the module-scope to the function-scope, it would work. > > ======================================== > ! testf2py.f90 > module testf2py > implicit none > private > public i1 > integer, parameter :: dp=kind(0.d0) > contains > real(dp) function i1(m) > integer, parameter :: dp=kind(0.d0) > real(dp), intent(in) :: m(3, 3) > i1 = m(1, 1) + m(2, 2) + m(3, 3) > return > end function i1 > end module testf2py > ======================================== > > However, this does not look like the best coding practice though, as > it is pretty "wet". > > Any ideas? 
> > Thanks, > > Shawn > > -- > Yuxiang "Shawn" Wang > Gerling Research Lab > University of Virginia > yw5aj at virginia.edu > +1 (434) 284-0836 > https://sites.google.com/a/virginia.edu/yw5aj/ -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From warren.weckesser at gmail.com Mon Jan 26 23:56:19 2015 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 26 Jan 2015 23:56:19 -0500 Subject: [Numpy-discussion] F2PY cannot see module-scope variables In-Reply-To: References: Message-ID: On 1/26/15, Yuxiang Wang wrote: > Dear all, > > Sorry about being new to both Fortran 90 and f2py. > > I have a module in fortran, written as follows, with a module-scope variable > dp: > > ======================================== > ! testf2py.f90 > module testf2py > implicit none > private > public dp, i1 > integer, parameter :: dp=kind(0.d0) > contains > real(dp) function i1(m) > real(dp), intent(in) :: m(3, 3) > i1 = m(1, 1) + m(2, 2) + m(3, 3) > return > end function i1 > end module testf2py > ======================================== > > Then, if I run f2py -c testf2py.f90 -m testf2py > > It would report an error, stating that dp was not declared. > > If I copy the module-scope to the function-scope, it would work. > > ======================================== > ! testf2py.f90 > module testf2py > implicit none > private > public i1 > integer, parameter :: dp=kind(0.d0) > contains > real(dp) function i1(m) > integer, parameter :: dp=kind(0.d0) > real(dp), intent(in) :: m(3, 3) > i1 = m(1, 1) + m(2, 2) + m(3, 3) > return > end function i1 > end module testf2py > ======================================== > > However, this does not look like the best coding practice though, as > it is pretty "wet". > > Any ideas? > > Thanks, > > Shawn > Shawn, I posted a suggestion as an answer to your question on stackoverflow: http://stackoverflow.com/questions/28162922/f2py-cannot-see-module-scope-variables For the mailing-list-only folks, here's what I wrote: Here's a work-around, in which `dp` is moved to a `types` module, and the `use types` statement is added to the function `i1`. ! 
testf2py.f90 module types implicit none integer, parameter :: dp=kind(0.d0) end module types module testf2py implicit none private public i1 contains real(dp) function i1(m) use types real(dp), intent(in) :: m(3, 3) i1 = m(1, 1) + m(2, 2) + m(3, 3) return end function i1 end module testf2py In action: In [6]: import numpy as np In [7]: m = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]]) In [8]: import testf2py In [9]: testf2py.testf2py.i1(m) Out[9]: 150.0 The change is similar to the third option that I described in this answer: http://stackoverflow.com/questions/12523524/f2py-specifying-real-precision-in-fortran-when-interfacing-with-python/12524403#12524403 Warren > -- > Yuxiang "Shawn" Wang > Gerling Research Lab > University of Virginia > yw5aj at virginia.edu > +1 (434) 284-0836 > https://sites.google.com/a/virginia.edu/yw5aj/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jensj at fysik.dtu.dk Tue Jan 27 01:28:26 2015 From: jensj at fysik.dtu.dk (=?windows-1252?Q?Jens_J=F8rgen_Mortensen?=) Date: Tue, 27 Jan 2015 07:28:26 +0100 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> Message-ID: <54C7300A.7060606@fysik.dtu.dk> On 01/26/2015 11:02 AM, Jaime Fern?ndez del R?o wrote: > On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg > > wrote: > > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > Hi! > > > > I have a view of a 2-d complex array that I would like to view > as a 2-d > > float array. This works OK: > > > > >>> np.ones((2, 4), complex).view(float) > > array([[ 1., 0., 1., 0., 1., 0.,1., 0.], > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > but this doesn't: > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > Traceback (most recent call last): > > File "", line 1, in > > ValueError: new type not compatible with array. > > >>> np.__version__ > > '1.9.0' > > > > and I don't understand why. When looking at the memory layout, > I think > > it should be possible. > > > > Yes, it should be possible, but it is not :). You could hack it by > using > `np.ndarray` (or stride tricks). Or maybe you are interested > making the > checks whether it makes sense or not less strict. > > > How would it be possible? He goes from an array with 16 byte strides > along the last axis: > > r0i0, r1i1, r2i2, r3i3 > > to one with 32 byte strides, which is OK > > r0i0, xxxx, r2i2, xxxx > > but everything breaks down when he wants to have alternating strides > of 8 and 24 bytes: > > r0, i0, xxxx, r2, i2, xxxx No, that is not what I want. I want this: r0, i0, r1, i1, xxxx, xxxx with stride 8 on the last axis - which should be fine. My current workaround is to do a copy() before view() - thanks Maniteja. Jens J?rgen > > which cannot be hacked in any sensible way. > > What I think could be made to work, but also fails, is this: > > np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) > > Here the original strides are (64, 16, xx) and the resulting view > should have strides (64, 32, 8), not sure what trips this. 
> > Jaime > > > - Sebastian > > > Jens J?rgen > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cmkleffner at gmail.com Tue Jan 27 05:32:45 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Tue, 27 Jan 2015 11:32:45 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-27 0:16 GMT+01:00 Sturla Molden : > On 26/01/15 16:30, Carl Kleffner wrote: > > > Thanks for all your ideas. The next version will contain an augumented > > libopenblas.dll in both numpy and scipy. On the long term I would > > prefer an external openblas wheel package, if there is an agreement > > about this among numpy-dev. > > > Thanks for all your great work on this. > > An OpenBLAS wheel might be a good idea. Probably we should have some > sort of instruction on the website how to install the binary wheel. And > then we could include the OpenBLAS wheel in the instruction. Or we could > have the OpenBLAS wheel as a part of the scipy stack. > > But make the bloated SciPy and NumPy wheels work first, then we can > worry about a dedicated OpenBLAS wheel later :-) > > > > Another idea for the future is to conditionally load a debug version of > > libopenblas instead. Together with the backtrace.dll (part of > > mingwstatic, but undocumentated right now) a meaningfull stacktrace in > > case of segfaults inside the code comiled with mingwstatic will be given. > > An OpenBLAS wheel could also include multiple architectures. We can > compile OpenBLAS for any kind of CPUs and and install the one that fits > best with the computer. > OpenBLAS in the test wheels is build with DYNAMIC_ARCH, that is all assembler based kernels are included and are choosen at runtime. Non optimized parts of Lapack have been build with -march=sse2. > > Also note that an OpenBLAS wheel could be useful on Linux. It is clearly > superior to the ATLAS libraries that most distros ship. If we make a > binary wheel that works for Windows, we are almost there for Linux too :-) > I have in mind, that binary wheels are not supported for Linux. Maybe this could be done as conda package for Anaconda/Miniconda as an OSS alternative to MKL. > > For Apple we don't need OpenBLAS anymore. On OSX 10.9 and 10.10 > Accelerate Framework is actually faster than MKL under many > circumstances. DGEMM is about the same, but e.g. DAXPY and DDOT are > faster in Accelerate. > > > Sturla > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sturla.molden at gmail.com Tue Jan 27 06:14:00 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 27 Jan 2015 12:14:00 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On 27/01/15 11:32, Carl Kleffner wrote: > OpenBLAS in the test wheels is build with DYNAMIC_ARCH, that is all > assembler based kernels are included and are choosen at runtime. Ok, I wasn't aware of that option. Last time I built OpenBLAS I think I had to specify the target CPU. > Non > optimized parts of Lapack have been build with -march=sse2. Since LAPACK delegates almost all of its heavy lifting to BLAS, there is probably not a lot to gain from SSE3, SSE4 or AVX here. Sturla From jaime.frio at gmail.com Tue Jan 27 06:25:54 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 27 Jan 2015 03:25:54 -0800 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: <54C7300A.7060606@fysik.dtu.dk> References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> <54C7300A.7060606@fysik.dtu.dk> Message-ID: On Mon, Jan 26, 2015 at 10:28 PM, Jens J?rgen Mortensen wrote: > On 01/26/2015 11:02 AM, Jaime Fern?ndez del R?o wrote: > > On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg > > > wrote: > > > > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > > Hi! > > > > > > I have a view of a 2-d complex array that I would like to view > > as a 2-d > > > float array. This works OK: > > > > > > >>> np.ones((2, 4), complex).view(float) > > > array([[ 1., 0., 1., 0., 1., 0.,1., 0.], > > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > > > but this doesn't: > > > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > > Traceback (most recent call last): > > > File "", line 1, in > > > ValueError: new type not compatible with array. > > > >>> np.__version__ > > > '1.9.0' > > > > > > and I don't understand why. When looking at the memory layout, > > I think > > > it should be possible. > > > > > > > Yes, it should be possible, but it is not :). You could hack it by > > using > > `np.ndarray` (or stride tricks). Or maybe you are interested > > making the > > checks whether it makes sense or not less strict. > > > > > > How would it be possible? He goes from an array with 16 byte strides > > along the last axis: > > > > r0i0, r1i1, r2i2, r3i3 > > > > to one with 32 byte strides, which is OK > > > > r0i0, xxxx, r2i2, xxxx > > > > but everything breaks down when he wants to have alternating strides > > of 8 and 24 bytes: > > > > r0, i0, xxxx, r2, i2, xxxx > > No, that is not what I want. I want this: > > r0, i0, r1, i1, xxxx, xxxx > > with stride 8 on the last axis - which should be fine. My current > workaround is to do a copy() before view() - thanks Maniteja. My bad, you are absolutely right, Jens... I have put together a quick PR (https://github.com/numpy/numpy/pull/5508) that fixes your use case, by relaxing the requirements for views of different dtypes. I'd appreciate if you could take a look at the logic in the code (it is profusely commented), and see if you can think of other cases that can be viewed as another dtype that I may have overlooked. Thanks, Jaime > Jens J?rgen > > > > > which cannot be hacked in any sensible way. 
> > > > What I think could be made to work, but also fails, is this: > > > > np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) > > > > Here the original strides are (64, 16, xx) and the resulting view > > should have strides (64, 32, 8), not sure what trips this. > > > > Jaime > > > > > > - Sebastian > > > > > Jens J?rgen > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > -- > > (\__/) > > ( O.o) > > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > > planes de dominaci?n mundial. > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjw at ncf.ca Tue Jan 27 09:47:07 2015 From: cjw at ncf.ca (cjw) Date: Tue, 27 Jan 2015 09:47:07 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: <54C7A4EB.2040103@ncf.ca> An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Jan 27 15:53:07 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 27 Jan 2015 21:53:07 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner wrote: > Thanks for all your ideas. The next version will contain an augumented > libopenblas.dll in both numpy and scipy. On the long term I would prefer > an external openblas wheel package, if there is an agreement about this > among numpy-dev. > Sounds fine in principle, but reliable dependency handling will be hard to support in setup.py. You'd want the dependency on Openblas when installing a complete set of wheels, but not make it impossible to use: - building against ATLAS/MKL/... from source with pip or distutils - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels - pip install numpy --no-use-wheel - etc. Static bundling is a lot easier to get right. > Another idea for the future is to conditionally load a debug version of > libopenblas instead. Together with the backtrace.dll (part of mingwstatic, > but undocumentated right now) a meaningfull stacktrace in case of segfaults > inside the code comiled with mingwstatic will be given. > > > 2015-01-26 2:16 GMT+01:00 Sturla Molden : > >> On 25/01/15 22:15, Matthew Brett wrote: >> >> > I agree, that shipping openblas with both numpy and scipy seems >> > perfectly reasonable to me - I don't think anyone will much care about >> > the 30M, and I think our job is to make something that works with the >> > least complexity and likelihood of error. >> >> Yes. Make something that works first, optimize for space later. 
>> > +1 Ralf > > It would be good to rename the dll according to the package and >> > version though, to avoid a scipy binary using a pre-loaded but >> > incompatible 'libopenblas.dll'. Say something like >> > openblas-scipy-0.15.1.dll - on the basis that there can only be one >> > copy of scipy loaded at a time. >> >> That is a good idea and we should do this for NumPy too I think. >> >> >> >> Sturla >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jan 27 16:13:01 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 27 Jan 2015 21:13:01 +0000 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Tue, Jan 27, 2015 at 8:53 PM, Ralf Gommers wrote: > > On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner wrote: >> >> Thanks for all your ideas. The next version will contain an augumented >> libopenblas.dll in both numpy and scipy. On the long term I would prefer an >> external openblas wheel package, if there is an agreement about this among >> numpy-dev. > > > Sounds fine in principle, but reliable dependency handling will be hard to > support in setup.py. You'd want the dependency on Openblas when installing a > complete set of wheels, but not make it impossible to use: > > - building against ATLAS/MKL/... from source with pip or distutils > - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels > - pip install numpy --no-use-wheel > - etc. > > Static bundling is a lot easier to get right. In principle I think this should be easy: when installing a .whl, pip or whatever looks at the dependencies declared in the distribution metadata file inside the wheel. When installing via setup.py, pip or whatever uses the dependencies declared by setup.py. We just have to make sure that the wheels we distribute have the right metadata inside them and everything should work. Accomplishing this may be somewhat awkward with existing tools, but as a worst-case/proof-of-concept approach we could just have a step in the wheel build that opens up the .whl and edits it to add the dependency. Ugly, but it'd work. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ralf.gommers at gmail.com Tue Jan 27 16:34:45 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 27 Jan 2015 22:34:45 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Tue, Jan 27, 2015 at 10:13 PM, Nathaniel Smith wrote: > On Tue, Jan 27, 2015 at 8:53 PM, Ralf Gommers > wrote: > > > > On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner > wrote: > >> > >> Thanks for all your ideas. The next version will contain an augumented > >> libopenblas.dll in both numpy and scipy. On the long term I would > prefer an > >> external openblas wheel package, if there is an agreement about this > among > >> numpy-dev. > > > > > > Sounds fine in principle, but reliable dependency handling will be hard > to > > support in setup.py. 
You'd want the dependency on Openblas when > installing a > > complete set of wheels, but not make it impossible to use: > > > > - building against ATLAS/MKL/... from source with pip or distutils > > - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels > > - pip install numpy --no-use-wheel > > - etc. > > > > Static bundling is a lot easier to get right. > > In principle I think this should be easy: when installing a .whl, pip > or whatever looks at the dependencies declared in the distribution > metadata file inside the wheel. When installing via setup.py, pip or > whatever uses the dependencies declared by setup.py. We just have to > make sure that the wheels we distribute have the right metadata inside > them and everything should work. > > Accomplishing this may be somewhat awkward with existing tools, but as > a worst-case/proof-of-concept approach we could just have a step in > the wheel build that opens up the .whl and edits it to add the > dependency. Ugly, but it'd work. Good point, that should work. Not all that much uglier than some of the other stuff we do in release scripts for Windows binaries. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Tue Jan 27 16:37:51 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Tue, 27 Jan 2015 22:37:51 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-27 22:13 GMT+01:00 Nathaniel Smith : > On Tue, Jan 27, 2015 at 8:53 PM, Ralf Gommers > wrote: > > > > On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner > wrote: > >> > >> Thanks for all your ideas. The next version will contain an augumented > >> libopenblas.dll in both numpy and scipy. On the long term I would > prefer an > >> external openblas wheel package, if there is an agreement about this > among > >> numpy-dev. > > > > > > Sounds fine in principle, but reliable dependency handling will be hard > to > > support in setup.py. You'd want the dependency on Openblas when > installing a > > complete set of wheels, but not make it impossible to use: > > > > - building against ATLAS/MKL/... from source with pip or distutils > > - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels > > - pip install numpy --no-use-wheel > > - etc. > > > > Static bundling is a lot easier to get right. > > In principle I think this should be easy: when installing a .whl, pip > or whatever looks at the dependencies declared in the distribution > metadata file inside the wheel. When installing via setup.py, pip or > whatever uses the dependencies declared by setup.py. We just have to > make sure that the wheels we distribute have the right metadata inside > them and everything should work. > > Accomplishing this may be somewhat awkward with existing tools, but as > a worst-case/proof-of-concept approach we could just have a step in > the wheel build that opens up the .whl and edits it to add the > dependency. Ugly, but it'd work. > > maybe an install_requires in setup.py in the presence of an environment variable could help during build? > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
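(For concreteness, a rough sketch of that idea; the environment variable and the 'openblas' wheel name below are invented for illustration, and a real build would hook this into the existing build scripts rather than plain setuptools.)

# setup.py fragment (sketch only)
import os
from setuptools import setup

install_requires = []
if os.environ.get('NPY_DEPEND_ON_OPENBLAS_WHEEL'):  # hypothetical variable
    install_requires.append('openblas')             # hypothetical external wheel

setup(
    name='numpy',
    version='1.9.1',
    packages=['numpy'],
    install_requires=install_requires,
)

Source installs would leave the variable unset and get no extra dependency; the wheel build scripts would export it so the built wheel's metadata pulls in the OpenBLAS wheel.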
URL: From matthew.brett at gmail.com Tue Jan 27 18:23:24 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 27 Jan 2015 15:23:24 -0800 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: Hi, On Tue, Jan 27, 2015 at 1:37 PM, Carl Kleffner wrote: > > > 2015-01-27 22:13 GMT+01:00 Nathaniel Smith : >> >> On Tue, Jan 27, 2015 at 8:53 PM, Ralf Gommers >> wrote: >> > >> > On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner >> > wrote: >> >> >> >> Thanks for all your ideas. The next version will contain an augumented >> >> libopenblas.dll in both numpy and scipy. On the long term I would >> >> prefer an >> >> external openblas wheel package, if there is an agreement about this >> >> among >> >> numpy-dev. >> > >> > >> > Sounds fine in principle, but reliable dependency handling will be hard >> > to >> > support in setup.py. You'd want the dependency on Openblas when >> > installing a >> > complete set of wheels, but not make it impossible to use: >> > >> > - building against ATLAS/MKL/... from source with pip or distutils >> > - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels >> > - pip install numpy --no-use-wheel >> > - etc. >> > >> > Static bundling is a lot easier to get right. >> >> In principle I think this should be easy: when installing a .whl, pip >> or whatever looks at the dependencies declared in the distribution >> metadata file inside the wheel. When installing via setup.py, pip or >> whatever uses the dependencies declared by setup.py. We just have to >> make sure that the wheels we distribute have the right metadata inside >> them and everything should work. >> >> Accomplishing this may be somewhat awkward with existing tools, but as >> a worst-case/proof-of-concept approach we could just have a step in >> the wheel build that opens up the .whl and edits it to add the >> dependency. Ugly, but it'd work. My 'delocate' utility has a routine for patching wheels : pip install delocate delocate-patch --help Cheers, Matthew From jensj at fysik.dtu.dk Wed Jan 28 03:27:06 2015 From: jensj at fysik.dtu.dk (=?windows-1252?Q?Jens_J=F8rgen_Mortensen?=) Date: Wed, 28 Jan 2015 09:27:06 +0100 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> <54C7300A.7060606@fysik.dtu.dk> Message-ID: <54C89D5A.8020600@fysik.dtu.dk> Den 27-01-2015 kl. 12:25 skrev Jaime Fern?ndez del R?o: > On Mon, Jan 26, 2015 at 10:28 PM, Jens J?rgen Mortensen > > wrote: > > On 01/26/2015 11:02 AM, Jaime Fern?ndez del R?o wrote: > > On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg > > > >> wrote: > > > > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > > Hi! > > > > > > I have a view of a 2-d complex array that I would like to view > > as a 2-d > > > float array. This works OK: > > > > > > >>> np.ones((2, 4), complex).view(float) > > > array([[ 1., 0., 1., 0., 1., 0.,1., 0.], > > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > > > but this doesn't: > > > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > > Traceback (most recent call last): > > > File "", line 1, in > > > ValueError: new type not compatible with array. > > > >>> np.__version__ > > > '1.9.0' > > > > > > and I don't understand why. When looking at the memory > layout, > > I think > > > it should be possible. > > > > > > > Yes, it should be possible, but it is not :). You could hack > it by > > using > > `np.ndarray` (or stride tricks). 
Or maybe you are interested > > making the > > checks whether it makes sense or not less strict. > > > > > > How would it be possible? He goes from an array with 16 byte strides > > along the last axis: > > > > r0i0, r1i1, r2i2, r3i3 > > > > to one with 32 byte strides, which is OK > > > > r0i0, xxxx, r2i2, xxxx > > > > but everything breaks down when he wants to have alternating strides > > of 8 and 24 bytes: > > > > r0, i0, xxxx, r2, i2, xxxx > > No, that is not what I want. I want this: > > r0, i0, r1, i1, xxxx, xxxx > > with stride 8 on the last axis - which should be fine. My current > workaround is to do a copy() before view() - thanks Maniteja. > > > My bad, you are absolutely right, Jens... > > I have put together a quick PR > (https://github.com/numpy/numpy/pull/5508) that fixes your use case, > by relaxing the requirements for views of different dtypes. I'd > appreciate if you could take a look at the logic in the code (it is > profusely commented), and see if you can think of other cases that can > be viewed as another dtype that I may have overlooked. Thanks for looking into this. I'll take a look at the code, but it will be a couple of days before I will find the time. Jens J?rgen > > Thanks, > > Jaime > > > Jens J?rgen > > > > > which cannot be hacked in any sensible way. > > > > What I think could be made to work, but also fails, is this: > > > > np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) > > > > Here the original strides are (64, 16, xx) and the resulting view > > should have strides (64, 32, 8), not sure what trips this. > > > > Jaime > > > > > > - Sebastian > > > > > Jens J?rgen > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > > > >http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > -- > > (\__/) > > ( O.o) > > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > > planes de dominaci?n mundial. > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Wed Jan 28 19:56:50 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 28 Jan 2015 16:56:50 -0800 Subject: [Numpy-discussion] Views of a different dtype Message-ID: HI all, There has been some recent discussion going on on the limitations that numpy imposes to taking views of an array with a different dtype. As of right now, you can basically only take a view of an array if it has no Python objects and neither the old nor the new dtype are structured. Furthermore, the array has to be either C or Fortran contiguous. 
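(A tiny reminder of what gets rejected today, reusing the example from the float-view thread above:)

import numpy as np

a = np.ones((2, 4), complex)
a.view(float)           # works: C-contiguous, simple dtypes
a[:, :2].view(float)    # ValueError: new type not compatible with array.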
This seem to be way too strict, but the potential for disaster getting a loosening of the restrictions wrong is big, so it should be handled with care. Allan Haldane and myself have been looking into this separately and discussing some of the details over at github, and we both think that the only true limitation that has to be imposed is that the offsets of Python objects within the new and old dtypes remain compatible. I have expanded Allan's work from here: https://github.com/ahaldane/numpy/commit/e9ca367 to make it as flexible as I have been able. An implementation of the algorithm in Python, with a few tests, can be found here: https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py I would appreciate getting some eyes on it for correctness, and to make sure that it won't break with some weird dtype. I am also trying to figure out what the ground rules for stride and shape conversions when taking a view with a different dtype should be. I submitted a PR (gh-5508) a couple for days ago working on that, but I am not so sure that the logic is completely sound. Again, to get more eyes on it, I am going to reproduce my thoughts here on the hope of getting some feedback. The objective would be to always allow a view of a different dtype (given that the dtypes be compatible as described above) to be taken if: - The itemsize of the dtype doesn't change. - The itemsize changes, but the array being viewed is the result of slicing and transposing axes of a contiguous array, and it is still contiguous, defined as stride == dtype.itemsize, along its smallest-strided dimension, and the itemsize of the newtype exactly divides the size of that dimension. - Ideally taking a view should be a reversible process, i.e. if oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) should give you back a view of arr with the same original shape, strides and dtype. This last point can get tricky if the minimal stride dimension has size 1, as there could be several of those, e.g.: >>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) >>> a.flags.contiguous False >>> a.shape (3, 1, 2) >>> a.strides # the stride of the size 1 dimension could be anything, ignore it! (32, 8, 8) b = a.view(complex) # this fails right now, but should work >>> b.flags.contiguous False >>> b.shape (3, 1, 1) >>> b.strides # the stride of the size 1 dimensions could be anything, ignore them! (32, 16, 16) c = b.view(float) # which of the two size 1 dimensions should we expand? "In the face of ambiguity refuse the temptation to guess" dictates that last view should raise an error, unless we agree and document some default. Any thoughts? Then there is the endless complication one could get into with arrays created with as_strided. I'm not smart enough to figure when and when not those could work, but am willing to retake the discussion if someone wiser si interested. With all these in mind, my proposal for the new behavior is that taking a view of an array with a different dtype would require: 1. That the newtype and oldtype be compatible, as defined by the algorithm checking object offsets linked above. 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, make it happen! 3. If the array is C/Fortran contiguous, check that the size in bytes of the last/first dimension is evenly divided by newtype.itemsize. If it does, go for it. 4. For non-contiguous arrays: 1. Ignoring dimensions of size 1, check that no stride is smaller than either oldtype.itemsize or newtype.itemsize. 
If any is found this is an as_strided product, sorry, can't do it! 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == oldtype.itemsize 1. If found, check that it is the only one with that stride, that it is the minimal stride, and that the size in bytes of that dimension is evenly divided by newitem,itemsize. 2. If none is found, check if there is a size 1 dimension that is also unique (unless we agree on a default, as mentioned above) and that newtype.itemsize evenly divides oldtype.itemsize. Apologies for the long, dense content, but any thought or comments are very welcome. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jan 28 20:13:27 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 28 Jan 2015 17:13:27 -0800 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: Sorry not to notice this for a while -- I've been distracted by python-ideas. (Nathaniel knows what I'm talking about ;-) ) I do like the idea of prototyping some DateTime stuff -- it really isn't clear what's needed or how to do it at this point. Though we did more or less settle on a reasonable minimum set last summer at SciPy (shame on me for not getting that written up properly!) Chuck -- what have you got in mind for new functionality here? I tend to agree with Nathaniel that a ndarray subclass is less than ideal -- they tend to get ugly fast. But maybe that is the only way to do anything in Python, short of a major refactor to be able to write a dtype in Python -- which would be great, but sure sounds like a major project to me. And as for " The 64 bits of long long really isn't enough and leads to all sorts of compromises". not long enough for what? I've always thought that what we need is the ability to set the epoch. Does anyone ever need picoseconds since 100 years ago? And if they did, we'd be in a heck of a mess with leap seconds and all that anyway. Or is there a use-case I'm not thinking of? -Chris On Thu, Jan 22, 2015 at 12:58 PM, Nathaniel Smith wrote: > On Thu, Jan 22, 2015 at 3:18 PM, Charles R Harris > wrote: > > > > > > On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris > > wrote: > >> > >> > >> > >> On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith wrote: > >>> > >>> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris > >>> wrote: > >>> > Hi All, > >>> > > >>> > I'm playing with the idea of building a simplified datetime class on > >>> > top of > >>> > the current numpy implementation. I believe Pandas does something > like > >>> > this, > >>> > and blaze will (does?) have a simplified version. The reason for the > >>> > new > >>> > class would be to have an easier, and hopefully more portable, API > that > >>> > can > >>> > be implemented in Python, and maybe pushed down into C when things > >>> > settle. > >>> > >>> When you say "datetime class" what do you mean? A dtype? An ndarray > >>> subclass? A python class representing a scalar datetime that you can > >>> put in an object array? ...? > >> > >> > >> I was thinking an ndarray subclass that is based on a single datetime > >> type, but part of the reason for this post is to elicit ideas. I'm > >> influenced by Mark's discussion apropos blaze. I thought it easier to > >> start such a project in python, as it is far easier for people > interested in > >> the problem to work with. 
> > > > > > And if I had my druthers, it would use quad precision floating point at > it's > > heart. The 64 bits of long long really isn't enough and leads to all > sorts > > of compromises. But that is probably a pipe dream. > > I guess there are lots of options -- e.g. 32-bit day + 64-bit > time-of-day (I think that would give 11.8 million years at > 10-femtisecond precision?). Figuring out which clock this is on > matters a lot more though (e.g. how to handle leap-seconds in absolute > and relative times -- is adding 1 day always the same as adding 24 * > 60 * 60 seconds?). > > At a very general level, I feel like numpy-qua-numpy's role here > shouldn't be to try and add special code to handle any one specific > datetime implementation: that hasn't worked out terribly well > historically, and as referenced above there's a *ton* of plausible > ways of approaching datetime handling that people might want, so we > don't want to be in the position of having to pick the-one-and-only > implementation. Telling people who want to tweak datetime handling > that they have to start mucking around in umath.so is terrible. > > Instead, we should be trying to evolve numpy to add generic > functionality, so that it's prepared to handle multiple third-party > approaches to date-time handling (among other things). > > Implementing prototypes built on top of numpy could be an excellent > way to generate ideas for appropriate changes to the numpy core. > > As far as this specific prototype, I should say that I'm dubious that > subclassing ndarray is actually a *good* long-term solution. I really > think that the *right* way to solve this would be to improve the dtype > system so we could define useful date/time types that worked with > plain vanilla ndarrays. But that approach requires a lot more up-front > C coding; it's harder to throw together a quick prototype. OTOOH if > your goal is the moon then you don't want to waste time investing in > ladder technology... so I dunno. > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Jan 28 21:48:46 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 28 Jan 2015 18:48:46 -0800 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Wed, Jan 28, 2015 at 5:13 PM, Chris Barker wrote: > I tend to agree with Nathaniel that a ndarray subclass is less than ideal > -- they tend to get ugly fast. But maybe that is the only way to do > anything in Python, short of a major refactor to be able to write a dtype > in Python -- which would be great, but sure sounds like a major project to > me. > My vote would be for using composition rather than inheritance. So DatetimeArray should contain but not be an ndarray, making use of appropriate APIs like __array__, __array_wrap__ and __numpy_ufunc__. And as for " The 64 bits of long long really isn't enough and leads to all > sorts of compromises". not long enough for what? 
I've always thought that > what we need is the ability to set the epoch. Does anyone ever need > picoseconds since 100 years ago? And if they did, we'd be in a heck of a > mess with leap seconds and all that anyway. > I agree pretty strongly with the Blaze docs with respect to time units. I think fixed precision int64 is probably OK (simplifying things quite a bit), but the ns precision chosen by pandas was probably a mistake (not a big enough range). The main advantage of using a single array for the underlying data is that it's very straightforward to drop in a Cython or Numba or whatever for performance critical steps. In my mind, the main advantage of using floating point math is that NaT (not a time) becomes much easier to represent and work with -- you can share map it to NaN. Handling NaT is a major source of complexity for the datetime operations in pandas. The other thing to consider is how much progress has been made on the datetime dype in DyND, which is where the "numpy replacement" part of Blaze has ended up. I know some sort of datetime object *has* been implemented, though from my tests it does not really appear to be in fully working condition at this point (e.g., there does not appear to be a corresponding timedelta time): https://github.com/libdynd/dynd-python Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jan 28 23:29:27 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 28 Jan 2015 21:29:27 -0700 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Wed, Jan 28, 2015 at 6:13 PM, Chris Barker wrote: > Sorry not to notice this for a while -- I've been distracted by > python-ideas. (Nathaniel knows what I'm talking about ;-) ) > > I do like the idea of prototyping some DateTime stuff -- it really isn't > clear what's needed or how to do it at this point. Though we did more or > less settle on a reasonable minimum set last summer at SciPy (shame on me > for not getting that written up properly!) > > Chuck -- what have you got in mind for new functionality here? I tend to > agree with Nathaniel that a ndarray subclass is less than ideal -- they > tend to get ugly fast. But maybe that is the only way to do anything in > Python, short of a major refactor to be able to write a dtype in Python -- > which would be great, but sure sounds like a major project to me. > I was mostly thinking of implementing a Blaze compatible API without having to rewrite the numpy datetime stuff. But also, I thought it might be an easy way to solve some of our problems, or at least experiment. > > And as for " The 64 bits of long long really isn't enough and leads to > all sorts of compromises". not long enough for what? I've always thought > that what we need is the ability to set the epoch. Does anyone ever need > picoseconds since 100 years ago? And if they did, we'd be in a heck of a > mess with leap seconds and all that anyway. > I was thinking elapsed time. Nanoseconds can be rather crude for that depending on the measurement. Of course, such short times aren't going to come from the system clock, but data collected in other ways, interference between light pulses over microscopic distances for instance. Such data is likely acquired as, or computed, from simple numbers with a unit, which gets us back to the numpy version. But that complicates the heck out of things when you want to start adding times in different units. 
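(A quick back-of-the-envelope check of that range, assuming a signed 64-bit count of ticks centred on the epoch:)

ns_per_year = 365.25 * 24 * 60 * 60 * 1e9
print(2**63 / ns_per_year)            # ~292 -> nanosecond ticks only reach ~1678..2262
print(2**63 / (ns_per_year / 1e3))    # ~292000 -> microsecond ticks cover ~292,000 years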
Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Thu Jan 29 02:55:57 2015 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 28 Jan 2015 21:55:57 -1000 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: <54C9E78D.5070802@hawaii.edu> On 2015/01/28 6:29 PM, Charles R Harris wrote: > > > And as for "The 64 bits of long long really isn't enough and leads > to all sorts of compromises". not long enough for what? I've always > thought that what we need is the ability to set the epoch. Does > anyone ever need picoseconds since 100 years ago? And if they did, > we'd be in a heck of a mess with leap seconds and all that anyway. > > > I was thinking elapsed time. Nanoseconds can be rather crude for that > depending on the measurement. Of course, such short times aren't going > to come from the system clock, but data collected in other ways, > interference between light pulses over microscopic distances for > instance. Such data is likely acquired as, or computed, from simple > numbers with a unit, which gets us back to the numpy version. But that > complicates the heck out of things when you want to start adding times > in different units. Chuck, For any kind of data like that, I fail to see why any special numpy time type is needed at all. Wouldn't the user just keep elapsed time as a count, or floating point number, in whatever units the instrument spits out? Why does it need to be treated in a different way from any other numeric data? We don't have special types for length. It seems to me that numpy's present experimental datetime64 type has already fallen into the trap of overengineering--trying to be too many things to too many people. The main reason for having a special datetime type is to deal with the calendar mess, and conventional hours-minutes-seconds time. For very short time intervals, all that is irrelevant. Eric From charlesr.harris at gmail.com Thu Jan 29 09:14:20 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 29 Jan 2015 07:14:20 -0700 Subject: [Numpy-discussion] Datetime again In-Reply-To: <54C9E78D.5070802@hawaii.edu> References: <54C9E78D.5070802@hawaii.edu> Message-ID: On Thu, Jan 29, 2015 at 12:55 AM, Eric Firing wrote: > On 2015/01/28 6:29 PM, Charles R Harris wrote: > > > > > > And as for "The 64 bits of long long really isn't enough and leads > > to all sorts of compromises". not long enough for what? I've always > > thought that what we need is the ability to set the epoch. Does > > anyone ever need picoseconds since 100 years ago? And if they did, > > we'd be in a heck of a mess with leap seconds and all that anyway. > > > > > > I was thinking elapsed time. Nanoseconds can be rather crude for that > > depending on the measurement. Of course, such short times aren't going > > to come from the system clock, but data collected in other ways, > > interference between light pulses over microscopic distances for > > instance. Such data is likely acquired as, or computed, from simple > > numbers with a unit, which gets us back to the numpy version. But that > > complicates the heck out of things when you want to start adding times > > in different units. > > Chuck, > > For any kind of data like that, I fail to see why any special numpy time > type is needed at all. Wouldn't the user just keep elapsed time as a > count, or floating point number, in whatever units the instrument spits > out? 
Why does it need to be treated in a different way from any other > numeric data? We don't have special types for length. It seems to me > that numpy's present experimental datetime64 type has already fallen > into the trap of overengineering--trying to be too many things to too > many people. The main reason for having a special datetime type is to > deal with the calendar mess, and conventional hours-minutes-seconds > time. For very short time intervals, all that is irrelevant. > That's probably what it comes down to in practice. If we *had* quad precision floats, it would be an easy solution, but we don't, so probably the Blaze proposal with 64 bit integers and a fixed tick unit is the easy way to go. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Thu Jan 29 11:33:52 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 29 Jan 2015 11:33:52 -0500 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: <54CA60F0.5090109@gmail.com> Hi Jamie, I'm not sure whether to reply here or on github! I have a comment on the condition "There must be the same total number of objects before and after". I originally looked into this to solve https://github.com/numpy/numpy/issues/3256, which involves taking a view of a subset of the fields of a structured array. Such a view causes the final number objects to be less than the original number. For example, say you have an array with three fields a,b,c, and you only want a and c. Numpy currently does the equivalent of this (in _index_fields in numpy/core/_internals.py): >>> a = zeros([(1,2,3), (4,5,6)], ... dtype([('a', 'i8'), ('b', 'i8'), ('c', 'i8')])) >>> a.view(dtype({'names': ['x', 'y'], ... 'formats': ['i8', 'i8'], ... 'offsets': [0, 16]})) array([(1, 3), (4, 6)], dtype={'names':['x','y'], 'formats':[' HI all, > > There has been some recent discussion going on on the limitations that > numpy imposes to taking views of an array with a different dtype. > > As of right now, you can basically only take a view of an array if it > has no Python objects and neither the old nor the new dtype are > structured. Furthermore, the array has to be either C or Fortran contiguous. > > This seem to be way too strict, but the potential for disaster getting a > loosening of the restrictions wrong is big, so it should be handled with > care. > > Allan Haldane and myself have been looking into this separately and > discussing some of the details over at github, and we both think that > the only true limitation that has to be imposed is that the offsets of > Python objects within the new and old dtypes remain compatible. I have > expanded Allan's work from here: > > https://github.com/ahaldane/numpy/commit/e9ca367 > > to make it as flexible as I have been able. An implementation of the > algorithm in Python, with a few tests, can be found here: > > https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py > > I would appreciate getting some eyes on it for correctness, and to make > sure that it won't break with some weird dtype. > > I am also trying to figure out what the ground rules for stride and > shape conversions when taking a view with a different dtype should be. I > submitted a PR (gh-5508) a couple for days ago working on that, but I am > not so sure that the logic is completely sound. Again, to get more eyes > on it, I am going to reproduce my thoughts here on the hope of getting > some feedback. 
> > The objective would be to always allow a view of a different dtype > (given that the dtypes be compatible as described above) to be taken if: > > * The itemsize of the dtype doesn't change. > * The itemsize changes, but the array being viewed is the result of > slicing and transposing axes of a contiguous array, and it is still > contiguous, defined as stride == dtype.itemsize, along its > smallest-strided dimension, and the itemsize of the newtype exactly > divides the size of that dimension. > * Ideally taking a view should be a reversible process, i.e. if > oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) > should give you back a view of arr with the same original shape, > strides and dtype. > > This last point can get tricky if the minimal stride dimension has size > 1, as there could be several of those, e.g.: > > >>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) > >>> a.flags.contiguous > False > >>> a.shape > (3, 1, 2) > >>> a.strides # the stride of the size 1 dimension could be > anything, ignore it! > (32, 8, 8) > > b = a.view(complex) # this fails right now, but should work > >>> b.flags.contiguous > False > >>> b.shape > (3, 1, 1) > >>> b.strides # the stride of the size 1 dimensions could be > anything, ignore them! > (32, 16, 16) > > c = b.view(float) # which of the two size 1 dimensions should we > expand? > > > "In the face of ambiguity refuse the temptation to guess" dictates that > last view should raise an error, unless we agree and document some > default. Any thoughts? > > Then there is the endless complication one could get into with arrays > created with as_strided. I'm not smart enough to figure when and when > not those could work, but am willing to retake the discussion if someone > wiser si interested. > > With all these in mind, my proposal for the new behavior is that taking > a view of an array with a different dtype would require: > > 1. That the newtype and oldtype be compatible, as defined by the > algorithm checking object offsets linked above. > 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, > make it happen! > 3. If the array is C/Fortran contiguous, check that the size in bytes > of the last/first dimension is evenly divided by newtype.itemsize. > If it does, go for it. > 4. For non-contiguous arrays: > 1. Ignoring dimensions of size 1, check that no stride is smaller > than either oldtype.itemsize or newtype.itemsize. If any is > found this is an as_strided product, sorry, can't do it! > 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. > stride == oldtype.itemsize > 1. If found, check that it is the only one with that stride, > that it is the minimal stride, and that the size in bytes of > that dimension is evenly divided by newitem,itemsize. > 2. If none is found, check if there is a size 1 dimension that > is also unique (unless we agree on a default, as mentioned > above) and that newtype.itemsize evenly divides > oldtype.itemsize. > > Apologies for the long, dense content, but any thought or comments are > very welcome. > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. 
> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Thu Jan 29 11:57:57 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 29 Jan 2015 16:57:57 +0000 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fern?ndez del R?o wrote: [...] > With all these in mind, my proposal for the new behavior is that taking a > view of an array with a different dtype would require: > > That the newtype and oldtype be compatible, as defined by the algorithm > checking object offsets linked above. > If newtype.itemsize == oldtype.itemsize no more checks are needed, make it > happen! > If the array is C/Fortran contiguous, check that the size in bytes of the > last/first dimension is evenly divided by newtype.itemsize. If it does, go > for it. > For non-contiguous arrays: > > Ignoring dimensions of size 1, check that no stride is smaller than either > oldtype.itemsize or newtype.itemsize. If any is found this is an as_strided > product, sorry, can't do it! > Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == > oldtype.itemsize > > If found, check that it is the only one with that stride, that it is the > minimal stride, and that the size in bytes of that dimension is evenly > divided by newitem,itemsize. > If none is found, check if there is a size 1 dimension that is also unique > (unless we agree on a default, as mentioned above) and that newtype.itemsize > evenly divides oldtype.itemsize. I'm really wary of this idea that we go grovelling around looking for some suitable dimension somewhere to absorb the new items. Basically nothing in numpy produces semantically different arrays (e.g., ones with different shapes) depending on the *strides* of the input array. Could we make it more like: check to see if the last dimension works. If not, raise an error (and let the user transpose some other dimension there if that's what they wanted)? Or require the user to specify which dimension will absorb the shape change? (If we were doing this from scratch, then it would be tempting to just say that we always add a new dimension at the end with newtype.itemsize / oldtype.itemsize entries, or absorb such a dimension if shrinking. As a bonus, this would always work, regardless of contiguity! Except that when shrinking the last dimension would have to be contiguous, of course.) I guess the main consideration for this is that we may be stuck with stuff b/c of backwards compatibility. Can you maybe say a little bit about what is allowed now, and what constraints that puts on things? E.g. are we already grovelling around in strides and picking random dimensions in some cases? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From allanhaldane at gmail.com Thu Jan 29 11:59:07 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 29 Jan 2015 11:59:07 -0500 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: <54CA66DB.1010801@gmail.com> Hello again, I also have a minor code comment: In get_object_offsets you iterate over dtype.fields.values(). Be careful, because dtype.fields also includes the field titles. 
For example this fails: dta = np.dtype([(('a', 'title'), 'O'), ('b', 'O'), ('c', 'i1')]) dtb = np.dtype([('a', 'O'), ('b', 'O'), ('c', 'i1')]) assert dtype_view_is_safe(dta, dtb) I've seen two strategies in the numpy code to work around this. One is to to skip entries that are titles, like this: for key,field in dtype.fields.iteritems(): if len(field) == 3 and field[2] == key: #detect titles continue #do something You can find all examples that do this by grepping NPY_TITLE_KEY in the numpy source. The other (more popular) strategy is to iterate over dtype.names. You can find all examples of this by grepping for names_size. I don't know the history of it, but it looks to me like "titles" in dtypes are an obsolete feature. Are they actually used anywhere? Allan On 01/28/2015 07:56 PM, Jaime Fern?ndez del R?o wrote: > HI all, > > There has been some recent discussion going on on the limitations that > numpy imposes to taking views of an array with a different dtype. > > As of right now, you can basically only take a view of an array if it > has no Python objects and neither the old nor the new dtype are > structured. Furthermore, the array has to be either C or Fortran contiguous. > > This seem to be way too strict, but the potential for disaster getting a > loosening of the restrictions wrong is big, so it should be handled with > care. > > Allan Haldane and myself have been looking into this separately and > discussing some of the details over at github, and we both think that > the only true limitation that has to be imposed is that the offsets of > Python objects within the new and old dtypes remain compatible. I have > expanded Allan's work from here: > > https://github.com/ahaldane/numpy/commit/e9ca367 > > to make it as flexible as I have been able. An implementation of the > algorithm in Python, with a few tests, can be found here: > > https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py > > I would appreciate getting some eyes on it for correctness, and to make > sure that it won't break with some weird dtype. > > I am also trying to figure out what the ground rules for stride and > shape conversions when taking a view with a different dtype should be. I > submitted a PR (gh-5508) a couple for days ago working on that, but I am > not so sure that the logic is completely sound. Again, to get more eyes > on it, I am going to reproduce my thoughts here on the hope of getting > some feedback. > > The objective would be to always allow a view of a different dtype > (given that the dtypes be compatible as described above) to be taken if: > > * The itemsize of the dtype doesn't change. > * The itemsize changes, but the array being viewed is the result of > slicing and transposing axes of a contiguous array, and it is still > contiguous, defined as stride == dtype.itemsize, along its > smallest-strided dimension, and the itemsize of the newtype exactly > divides the size of that dimension. > * Ideally taking a view should be a reversible process, i.e. if > oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) > should give you back a view of arr with the same original shape, > strides and dtype. > > This last point can get tricky if the minimal stride dimension has size > 1, as there could be several of those, e.g.: > > >>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) > >>> a.flags.contiguous > False > >>> a.shape > (3, 1, 2) > >>> a.strides # the stride of the size 1 dimension could be > anything, ignore it! 
> (32, 8, 8) > > b = a.view(complex) # this fails right now, but should work > >>> b.flags.contiguous > False > >>> b.shape > (3, 1, 1) > >>> b.strides # the stride of the size 1 dimensions could be > anything, ignore them! > (32, 16, 16) > > c = b.view(float) # which of the two size 1 dimensions should we > expand? > > > "In the face of ambiguity refuse the temptation to guess" dictates that > last view should raise an error, unless we agree and document some > default. Any thoughts? > > Then there is the endless complication one could get into with arrays > created with as_strided. I'm not smart enough to figure when and when > not those could work, but am willing to retake the discussion if someone > wiser si interested. > > With all these in mind, my proposal for the new behavior is that taking > a view of an array with a different dtype would require: > > 1. That the newtype and oldtype be compatible, as defined by the > algorithm checking object offsets linked above. > 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, > make it happen! > 3. If the array is C/Fortran contiguous, check that the size in bytes > of the last/first dimension is evenly divided by newtype.itemsize. > If it does, go for it. > 4. For non-contiguous arrays: > 1. Ignoring dimensions of size 1, check that no stride is smaller > than either oldtype.itemsize or newtype.itemsize. If any is > found this is an as_strided product, sorry, can't do it! > 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. > stride == oldtype.itemsize > 1. If found, check that it is the only one with that stride, > that it is the minimal stride, and that the size in bytes of > that dimension is evenly divided by newitem,itemsize. > 2. If none is found, check if there is a size 1 dimension that > is also unique (unless we agree on a default, as mentioned > above) and that newtype.itemsize evenly divides > oldtype.itemsize. > > Apologies for the long, dense content, but any thought or comments are > very welcome. > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Thu Jan 29 12:58:45 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 29 Jan 2015 09:58:45 -0800 Subject: [Numpy-discussion] Datetime again In-Reply-To: <54C9E78D.5070802@hawaii.edu> References: <54C9E78D.5070802@hawaii.edu> Message-ID: > > > I was thinking elapsed time. Nanoseconds can be rather crude for that > > depending on the measurement. > > Wouldn't the user just keep elapsed time as a > count, or floating point number, in whatever units the instrument spits > out? Why does it need to be treated in a different way from any other > numeric data? We don't have special types for length. I've wondered about this since np.datetime was first introduced. I know I have no need for better than second precision. Not that other's don't have that need, but is there even a single use-case of someone wanting nano, or sub-nanosecond precision and dates and calendar functionality in one array? -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From sransom at nrao.edu Thu Jan 29 17:39:27 2015 From: sransom at nrao.edu (Scott Ransom) Date: Thu, 29 Jan 2015 14:39:27 -0800 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: <54C9E78D.5070802@hawaii.edu> Message-ID: <54CAB69F.4060205@nrao.edu> On 01/29/2015 09:58 AM, Chris Barker wrote: > > I was thinking elapsed time. Nanoseconds can be rather crude for that > > depending on the measurement. > > Wouldn't the user just keep elapsed time as a > count, or floating point number, in whatever units the instrument spits > out? Why does it need to be treated in a different way from any other > numeric data? We don't have special types for length. > > > I've wondered about this since np.datetime was first introduced. I know > I have no need for better than second precision. Not that other's don't > have that need, but is there even a single use-case of someone wanting > nano, or sub-nanosecond precision and dates and calendar functionality > in one array? > > -Chris Millisecond pulsar timing does have need of absolute and relative time at the (single) nanosecond-precision level over time-spans of several decades. You are correct that doing that kind of work, though, requires *very* special consideration of the various time scales and their idiosyncrasies (like leap seconds). So I don't think we would use a numpy datetime array or object. We have recently started using the very nice Time object in the AstroPy package. That can (and does) do all of this stuff correctly. Internally, times are treated as pairs of 64-bit floats. Cheers, Scott -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From ndarray at mac.com Thu Jan 29 20:32:30 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Thu, 29 Jan 2015 20:32:30 -0500 Subject: [Numpy-discussion] 3D array and the right hand rule In-Reply-To: References: Message-ID: On Mon, Jan 26, 2015 at 6:06 AM, Dieter Van Eessen < dieter.van.eessen at gmail.com> wrote: > I've read that numpy.array isn't arranged according to the > 'right-hand-rule' (right-hand-rule => thumb = +x; index finger = +y, bend > middle finder = +z). This is also confirmed by an old message I dug up from > the mailing list archives. (see message below) > Dieter, It looks like you are confusing dimensionality of the array with the dimensionality of a vector that it might store. If you are interested in using numpy for 3D modeling, you will likely only encounter 1-dimensional arrays (vectors) of size 3 and 2-dimensional arrays (matrices) of size 9 or shape (3, 3). A 3-dimensional array is a stack of matrices and the 'right-hand-rule' does not really apply. The notion of C/F-contiguous deals with the order of axes (e.g. width first or depth first) while the right-hand-rule is about the direction of the axes (if you "flip" the middle finger right hand becomes left.) In the case of arrays this would probably correspond to little-endian vs. big-endian: is a[0] stored at a higher or lower address than a[1]. 
However, whatever the answer to this question is for a particular
system, it is the same for all axes in the array, so the right-hand /
left-hand distinction does not apply.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ndbecker2 at gmail.com Thu Jan 29 20:57:28 2015
From: ndbecker2 at gmail.com (Neal Becker)
Date: Thu, 29 Jan 2015 20:57:28 -0500
Subject: [Numpy-discussion] question np.partition
Message-ID:

It sounds like np.partition could be used to answer the question:
give me the highest K elements in a vector.

Is this a correct interpretation? Something like a partial sort, but the
returned elements are unsorted.

I could really make some use of this, but in my case it is a list of
objects I need to sort on a particular key. Is this algorithm available
in general Python code (not specific to numpy arrays)?

--
-- Those who don't understand recursion are doomed to repeat it

From cmkleffner at gmail.com Fri Jan 30 05:09:11 2015
From: cmkleffner at gmail.com (Carl Kleffner)
Date: Fri, 30 Jan 2015 11:09:11 +0100
Subject: [Numpy-discussion] Testing of scipy
In-Reply-To: <54CA7DC9.2000502@ncf.ca>
References: <54CA7DC9.2000502@ncf.ca>
Message-ID:

Hi Colin.

This is an interesting test with different hardware. As a summary:

- Python-2.7 amd64
- numpy-1.9.1.openblas: OK
- scipy-0.15.1.openblas: 2 errors, 11 failures
- CPU: AMD A8-5600K APU (Piledriver)

scipy errors and failures due to Piledriver:

(1) ERROR: test_improvement (test_quadpack.TestCtypesQuad)
    WindowsError: [Error 193] %1 is not a valid Win32 application
(2) ERROR: test_typical (test_quadpack.TestCtypesQuad)
    WindowsError: [Error 193] %1 is not a valid Win32 application
(3) FAIL: test_interpolate.TestInterp1D.test_circular_refs
    ReferenceError: Remaining reference(s) to object
(4) FAIL: test__gcutils.test_assert_deallocated
    ReferenceError: Remaining reference(s) to object

Other failures are known failures due to the mingw-w64 / openblas build.

(1) and (2) are a problem with ctypes.util.find_msvcrt(). This method
seems to be buggy. Not a scipy problem. Maybe the test could be enhanced.

(3) and (4) are a problem due to garbage collection. No idea why.

Maybe you can file a bug for (1) ... (4)

Carl

2015-01-29 19:36 GMT+01:00 cjw :

> Carl,
>
> I have already sent the test result for numpy. Here is the test result
> for scipy.
>
> I hope that it helps.
>
> Colin W.
>
> ------------------------------
> *** Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
> (AMD64)] on win32. ***
> >>>
> [Dbg]>>>
> C:\Python27\lib\site-packages\numpy\core\__init__.py:6: Warning: Numpy
> 64bit experimental build with Mingw-w64 and OpenBlas.
>   from . import multiarray
> Running unit tests for scipy
> NumPy version 1.9.1
> NumPy is installed in C:\Python27\lib\site-packages\numpy
> SciPy version 0.15.1
> SciPy is installed in C:\Python27\lib\site-packages\scipy
> Python version 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
> (AMD64)]
> nose version 1.3.4
> C:\Python27\lib\site-packages\numpy\lib\utils.py:95: DeprecationWarning:
> `scipy.lib.blas` is deprecated, use `scipy.linalg.blas` instead!
>   warnings.warn(depdoc, DeprecationWarning)
> C:\Python27\lib\site-packages\numpy\lib\utils.py:95: DeprecationWarning:
> `scipy.lib.lapack` is deprecated, use `scipy.linalg.lapack` instead!
>   warnings.warn(depdoc, DeprecationWarning)
> C:\Python27\lib\site-packages\numpy\lib\utils.py:95: DeprecationWarning:
> `scipy.weave` is deprecated, use `weave` instead!
>   warnings.warn(depdoc, DeprecationWarning)
> [several thousand nose progress markers ('.' pass, 'S' skip, 'K' known
> failure, 'E' error, 'F' failure) omitted]
> C:\Python27\lib\site-packages\scipy\optimize\minpack.py:604: OptimizeWarning: Covariance of the parameters could not be estimated
>   category=OptimizeWarning)
> [remaining progress markers omitted]
> ====================================================================== > ERROR: test_improvement (test_quadpack.TestCtypesQuad) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\numpy\testing\decorators.py", line > 146, in skipper_func > return f(*args, **kwargs) > File > "C:\Python27\lib\site-packages\scipy\integrate\tests\test_quadpack.py", > line 42, in setUp > self.lib = ctypes.CDLL(file) > File "C:\Python27\Lib\ctypes\__init__.py", line 365, in __init__ > self._handle = _dlopen(self._name, mode) > WindowsError: [Error 193] %1 is not a valid Win32 application > > ====================================================================== > ERROR: test_typical (test_quadpack.TestCtypesQuad) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\numpy\testing\decorators.py", line > 146, in skipper_func > return f(*args, **kwargs) > File > "C:\Python27\lib\site-packages\scipy\integrate\tests\test_quadpack.py", > line 42, in setUp > self.lib = ctypes.CDLL(file) > File "C:\Python27\Lib\ctypes\__init__.py", line 365, in __init__ > self._handle = _dlopen(self._name, mode) > WindowsError: [Error 193] %1 is not a valid Win32 application > > ====================================================================== > FAIL: test_interpolate.TestInterp1D.test_circular_refs > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File > "C:\Python27\lib\site-packages\scipy\interpolate\tests\test_interpolate.py", > line 353, in test_circular_refs > del interp > File "C:\Python27\Lib\contextlib.py", line 24, in __exit__ > self.gen.next() > File "C:\Python27\lib\site-packages\scipy\lib\_gcutils.py", line 96, in > assert_deallocated > raise ReferenceError("Remaining reference(s) to object") > ReferenceError: Remaining reference(s) to object > > ====================================================================== > FAIL: test__gcutils.test_assert_deallocated > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File "C:\Python27\lib\site-packages\scipy\lib\tests\test__gcutils.py", > line 57, in test_assert_deallocated > del c > File "C:\Python27\Lib\contextlib.py", line 24, in __exit__ > self.gen.next() > File "C:\Python27\lib\site-packages\scipy\lib\_gcutils.py", line 96, in > assert_deallocated > raise ReferenceError("Remaining reference(s) to object") > ReferenceError: Remaining reference(s) to object > > ====================================================================== > FAIL: test_syr_her (test_blas.TestFBLAS2Simple) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\scipy\linalg\tests\test_blas.py", > line 316, in test_syr_her > resz_reverse, rtol=rtol) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-07, atol=0 > > (mismatch 100.0%) > x: array([[ 
0.000000e+00+0.j, 0.000000e+00+0.j, 0.000000e+00+0.j, > 0.000000e+00+0.j], > [ 0.000000e+00+0.j, 0.000000e+00+0.j, 0.000000e+00+0.j,... > y: array([[-15.+112.j, -13. +82.j, -11. +52.j, -9. +22.j], > [ 0. +0.j, -11. +60.j, -9. +38.j, -7. +16.j], > [ 0. +0.j, 0. +0.j, -7. +24.j, -5. +10.j], > [ 0. +0.j, 0. +0.j, 0. +0.j, -3. +4.j]]) > > ====================================================================== > FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4)) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File "C:\Python27\lib\site-packages\scipy\linalg\tests\test_decomp.py", > line 648, in eigenhproblem_general > assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype]) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 842, > in assert_array_almost_equal > precision=decimal) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Arrays are not almost equal to 4 decimals > > (mismatch 100.0%) > x: array([ 0., 0., 0.], dtype=float32) > y: array([ 1., 1., 1.]) > > ====================================================================== > FAIL: Tests for the minimize wrapper. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File > "C:\Python27\lib\site-packages\scipy\optimize\tests\test_optimize.py", line > 512, in test_minimize > self.test_powell(True) > File > "C:\Python27\lib\site-packages\scipy\optimize\tests\test_optimize.py", line > 241, in test_powell > atol=1e-14, rtol=1e-7) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-07, atol=1e-14 > > (mismatch 100.0%) > x: array([[ 0.750776, -0.441569, 0.47101 ], > [ 0.750776, -0.441569, 0.480525], > [ 1.501553, -0.883139, 0.951535],... > y: array([[ 0.72949 , -0.441569, 0.47101 ], > [ 0.72949 , -0.441569, 0.480525], > [ 1.45898 , -0.883139, 0.951535],... > > ====================================================================== > FAIL: Powell (direction set) optimization routine > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File > "C:\Python27\lib\site-packages\scipy\optimize\tests\test_optimize.py", line > 241, in test_powell > atol=1e-14, rtol=1e-7) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-07, atol=1e-14 > > (mismatch 100.0%) > x: array([[ 0.750776, -0.441569, 0.47101 ], > [ 0.750776, -0.441569, 0.480525], > [ 1.501553, -0.883139, 0.951535],... > y: array([[ 0.72949 , -0.441569, 0.47101 ], > [ 0.72949 , -0.441569, 0.480525], > [ 1.45898 , -0.883139, 0.951535],... 
> > ====================================================================== > FAIL: test_arpack.test_linearoperator_deallocation > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File > "C:\Python27\lib\site-packages\scipy\sparse\linalg\eigen\arpack\tests\test_arpack.py", > line 798, in test_linearoperator_deallocation > pass > File "C:\Python27\Lib\contextlib.py", line 24, in __exit__ > self.gen.next() > File "C:\Python27\lib\site-packages\scipy\lib\_gcutils.py", line 96, in > assert_deallocated > raise ReferenceError("Remaining reference(s) to object") > ReferenceError: Remaining reference(s) to object > > ====================================================================== > FAIL: test_beta (test_basic.TestCephes) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\scipy\special\tests\test_basic.py", > line 144, in test_beta > assert_allclose(cephes.beta(0.0342, 171), 24.070498359873497, > rtol=1e-14, atol=0) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-14, atol=0 > > (mismatch 100.0%) > x: array(24.07049835987154) > y: array(24.070498359873497) > > ====================================================================== > FAIL: test_betaincinv (test_basic.TestCephes) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\scipy\special\tests\test_basic.py", > line 157, in test_betaincinv > assert_allclose(cephes.betaincinv(0.0342, 171, 0.25), > 8.4231316935498957e-21, rtol=1e-12, atol=0) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-12, atol=0 > > (mismatch 100.0%) > x: array(8.423131693529815e-21) > y: array(8.423131693549896e-21) > > ====================================================================== > FAIL: test_data.test_boost( binomial_data_ipp-binomial_data>,) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File "C:\Python27\lib\site-packages\scipy\special\tests\test_data.py", > line 481, in _test_factory > test.check(dtype=dtype) > File "C:\Python27\lib\site-packages\scipy\special\_testutils.py", line > 292, in check > assert_(False, "\n".join(msg)) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 53, in > assert_ > raise AssertionError(smsg) > AssertionError: > Max |adiff|: 9.70919e+24 > Max |rdiff|: 7.10315e-14 > Bad results (70 out of 159) for the following points (in output 0): > 42.0 21.0 > => 538257874440.0024 != 538257874440.0 > (rdiff 4.422361858108481e-15) > 46.0 21.0 > => 6943526580276.032 != 6943526580276.0 > (rdiff 4.6412384438109286e-15) > 82.0 21.0 => > 1.8330655594514242e+19 != 1.8330655594514647e+19 (rdiff > 2.21216310518291e-14) > 
82.0 29.0 => > 1.2576607627509685e+22 != 1.2576607627510093e+22 (rdiff > 3.2516291524073137e-14) > 82.0 33.0 => > 8.999865905487466e+22 != 8.999865905487887e+22 (rdiff > 4.679048843863541e-14) > 82.0 42.0 => > 4.146706622571353e+23 != 4.1467066225715386e+23 (rdiff > 4.482872076557015e-14) > 82.0 46.0 => > 2.322318339533674e+23 != 2.3223183395337372e+23 (rdiff > 2.71635163388777e-14) > 95.0 21.0 => > 6.112305389981363e+20 != 6.112305389981495e+20 (rdiff > 2.165839426429588e-14) > 95.0 29.0 => > 2.146280142106093e+24 != 2.1462801421060996e+24 (rdiff > 3.0016822210721096e-15) > 95.0 33.0 => > 3.7802224438384017e+25 != 3.780222443838485e+25 (rdiff > 2.204165674911803e-14) > 95.0 42.0 => > 1.7198752232706476e+27 != 1.7198752232706825e+27 (rdiff > 2.0297690035618223e-14) > 95.0 46.0 => > 3.086205608690921e+27 != 3.08620560869098e+27 (rdiff > 1.923839025264666e-14) > 122.0 21.0 > => 2.05054879430629e+23 != 2.050548794306221e+23 > (rdiff 3.370908808019204e-14) > 122.0 29.0 => > 9.654999814547143e+27 != 9.65499981454662e+27 (rdiff > 5.4093012244985185e-14) > 122.0 33.0 => > 6.889061799493289e+29 != 6.88906179949298e+29 (rdiff > 4.473977857490729e-14) > 122.0 42.0 => > 9.820488930268602e+32 != 9.820488930268152e+32 (rdiff > 4.5785845286258346e-14) > 122.0 46.0 => > 9.517963588769975e+33 != 9.517963588769497e+33 (rdiff > 5.0148279981909684e-14) > 122.0 82.0 => > 2.546052685625168e+32 != 2.5460526856250765e+32 (rdiff > 3.594314640260528e-14) > 122.0 95.0 > => 8.77923835320522e+26 != 8.779238353204766e+26 > (rdiff 5.1661491374367e-14) > 125.0 21.0 => > 3.577965774451988e+23 != 3.577965774451971e+23 (rdiff > 4.689037586607358e-15) > 125.0 29.0 => > 2.1471697865847835e+28 != 2.1471697865846783e+28 (rdiff > 4.895435483124065e-14) > 125.0 33.0 => > 1.7431114722001235e+30 != 1.7431114722001072e+30 (rdiff > 9.365751364490987e-15) > 125.0 42.0 => > 3.3961976443365834e+33 != 3.396197644336376e+33 (rdiff > 6.110535739146696e-14) > 125.0 46.0 => > 3.824445086978443e+34 != 3.8244450869782213e+34 (rdiff > 5.800111975502932e-14) > 125.0 82.0 => > 6.555451266975118e+33 != 6.555451266974865e+33 (rdiff > 3.851600739996281e-14) > 125.0 95.0 => > 6.870943317071316e+28 != 6.870943317070972e+28 (rdiff > 5.018333445043497e-14) > 135.0 29.0 => > 2.6545984839545678e+29 != 2.654598483954587e+29 (rdiff > 7.289767083732218e-15) > 135.0 33.0 => > 3.2226838569785926e+31 != 3.222683856978583e+31 (rdiff > 2.9346841443966352e-15) > 135.0 42.0 > => 1.65539796634103e+35 != 1.6553979663410087e+35 > (rdiff 1.2814897756371891e-14) > 135.0 46.0 => > 2.9618674395653716e+36 != 2.9618674395653185e+36 (rdiff > 1.7936867201619373e-14) > 135.0 82.0 => > 1.3239687524906834e+38 != 1.3239687524906744e+38 (rdiff > 6.8483063743406465e-15) > 135.0 95.0 => > 3.1921559888457253e+34 != 3.1921559888457077e+34 (rdiff > 5.4898341219098594e-15) > 137.0 21.0 => > 2.891654383984857e+24 != 2.8916543839848387e+24 (rdiff > 6.312514769778831e-15) > 137.0 29.0 => > 4.280069137507773e+29 != 4.28006913750795e+29 (rdiff > 4.126698476389006e-14) > 137.0 33.0 => > 5.605400076850502e+31 != 5.605400076850724e+31 (rdiff > 3.9689909469779893e-14) > 137.0 42.0 => > 3.453905364934577e+35 != 3.453905364934566e+35 (rdiff > 3.2045019404969756e-15) > 137.0 46.0 => > 6.738158013916879e+36 != 6.738158013917095e+36 (rdiff > 3.206340162178697e-14) > 137.0 95.0 => > 3.453905364934577e+35 != 3.453905364934566e+35 (rdiff > 3.2045019404969756e-15) > 143.0 21.0 => > 7.639598299464504e+24 != 7.639598299464654e+24 (rdiff > 1.967693188404081e-14) > 143.0 29.0 => > 1.7138496567724698e+30 != 
1.713849656772535e+30 (rdiff > 3.810263889765402e-14) > 143.0 33.0 => > 2.7947962468926867e+32 != 2.7947962468928852e+32 (rdiff > 7.10315364832748e-14) > 143.0 42.0 > => 2.91038363316998e+36 != 2.910383633170114e+36 > (rdiff 4.60410605063329e-14) > 143.0 46.0 => > 7.281844590777859e+37 != 7.281844590778005e+37 (rdiff > 2.010388427601942e-14) > 143.0 95.0 => > 3.0056975544912714e+38 != 3.005697554491347e+38 (rdiff > 2.5138212463529437e-14) > 143.0 122.0 => > 7.639598299464504e+24 != 7.639598299464654e+24 (rdiff > 1.967693188404081e-14) > 144.0 21.0 => > 8.943919960348817e+24 != 8.943919960348863e+24 (rdiff > 5.162266504697016e-15) > 144.0 29.0 => > 2.1460378310892105e+30 != 2.1460378310890872e+30 (rdiff > 5.744821363969209e-14) > 144.0 33.0 => > 3.6256816175907336e+32 != 3.62568161759077e+32 (rdiff > 1.0135852188743159e-14) > 144.0 42.0 => > 4.1087768938873574e+36 != 4.10877689388722e+36 (rdiff > 3.347442009280187e-14) > 144.0 46.0 => > 1.0699853276245263e+38 != 1.0699853276245233e+38 (rdiff > 2.8246317692471727e-15) > 144.0 122.0 > => 5.00046434146803e+25 != 5.000464341467773e+25 > (rdiff 5.136303886238907e-14) > 145.0 21.0 => > 1.045861608266637e+25 != 1.045861608266601e+25 (rdiff > 3.44956971374012e-14) > 145.0 29.0 > => 2.68254728886152e+30 != 2.6825472888613593e+30 > (rdiff 5.980909913173422e-14) > 145.0 33.0 => > 4.693962808488381e+32 != 4.69396280848805e+32 (rdiff > 7.046164831898694e-14) > 145.0 42.0 => > 5.784200481685929e+36 != 5.784200481685892e+36 (rdiff > 6.327294560089766e-15) > 145.0 46.0 => > 1.5671502273289442e+38 != 1.5671502273288471e+38 (rdiff > 6.195440181461709e-14) > 145.0 122.0 => > 3.152466650055975e+26 != 3.1524666500557704e+26 (rdiff > 6.495993880527083e-14) > 145.0 125.0 => > 1.7570475018879216e+24 != 1.7570475018878894e+24 (rdiff > 1.8333172373193666e-14) > 148.0 21.0 > => 1.66081725375304e+25 != 1.6608172537529971e+25 > (rdiff 2.586056525059899e-14) > 148.0 29.0 => > 5.186381531354783e+30 != 5.18638153135483e+30 (rdiff > 8.900597825568916e-15) > 148.0 33.0 => > 1.0064458536532161e+33 != 1.0064458536531621e+33 (rdiff > 5.369707206034169e-14) > 148.0 42.0 => > 1.5872551307291334e+37 != 1.5872551307291024e+37 (rdiff > 1.9487415642240105e-14) > 148.0 122.0 => > 6.418858594896177e+28 != 6.418858594895863e+28 (rdiff > 4.892155143322956e-14) > 148.0 125.0 > => 5.25225250880544e+26 != 5.252252508805427e+26 > (rdiff 2.3550858972873946e-15) > 149.0 21.0 => > 1.9332950844468164e+25 != 1.9332950844468484e+25 (rdiff > 1.655076176038335e-14) > 149.0 29.0 => > 6.439757068098981e+30 != 6.439757068098914e+30 (rdiff > 1.0490146397789524e-14) > 149.0 33.0 => > 1.2927623465028022e+33 != 1.2927623465027686e+33 (rdiff > 2.5974487045134947e-14) > 149.0 42.0 => > 2.2102898549405485e+37 != 2.210289854940526e+37 (rdiff > 1.0255378527439437e-14) > 149.0 122.0 => > 3.542259002368573e+29 != 3.542259002368458e+29 (rdiff > 3.2380764064090125e-14) > 149.0 125.0 => > 3.2607734325499903e+27 != 3.260773432550036e+27 (rdiff > 1.3993530521689755e-14) > > ====================================================================== > FAIL: test_data.test_boost( test_gamma_data_ipp-factorials>,) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File "C:\Python27\lib\site-packages\scipy\special\tests\test_data.py", > line 481, in _test_factory > test.check(dtype=dtype) > File "C:\Python27\lib\site-packages\scipy\special\_testutils.py", line > 292, in 
check > assert_(False, "\n".join(msg)) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 53, in > assert_ > raise AssertionError(smsg) > AssertionError: > Max |adiff|: 3.36416e+140 > Max |rdiff|: 3.56868e-14 > Bad results (35 out of 198) for the following points (in output 0): > 47.0 => 5.502622159812033e+57 > != 5.502622159812089e+57 (rdiff 1.0258520328008188e-14) > 50.0 => 6.082818640342751e+62 > != 6.082818640342675e+62 (rdiff 1.2463859588664233e-14) > 51.0 => 3.0414093201712955e+64 > != 3.0414093201713376e+64 (rdiff 1.3839389152907178e-14) > 52.0 => 1.5511187532873519e+66 > != 1.5511187532873822e+66 (rdiff 1.9537961157045428e-14) > 53.0 => 8.065817517094281e+67 > != 8.065817517094388e+67 (rdiff 1.3210853128030717e-14) > 55.0 => 2.308436973392385e+71 > != 2.308436973392414e+71 (rdiff 1.2533812334944586e-14) > 56.0 => 1.2696403353658055e+73 > != 1.2696403353658276e+73 (rdiff 1.742759977049922e-14) > 57.0 => 7.10998587804854e+74 > != 7.109985878048635e+74 (rdiff 1.3278171253713691e-14) > 59.0 => 2.350561331282903e+78 > != 2.3505613312828785e+78 (rdiff 1.050071233255035e-14) > 60.0 => 1.386831185456872e+80 > != 1.3868311854568984e+80 (rdiff 1.8984338680317015e-14) > 62.0 => 5.075802138772193e+83 > != 5.075802138772248e+83 (rdiff 1.083538910645766e-14) > 67.0 => 5.443449390774354e+92 > != 5.443449390774431e+92 (rdiff 1.403940285109478e-14) > 68.0 => 3.6471110918189365e+94 > != 3.647111091818868e+94 (rdiff 1.869380461041295e-14) > 71.0 => 1.1978571669969762e+100 != > 1.1978571669969892e+100 (rdiff 1.0865971283156417e-14) > 72.0 => 8.504785885678504e+101 > != 8.504785885678623e+101 (rdiff 1.4034165979338454e-14) > 73.0 => 6.123445837688725e+103 != > 6.1234458376886085e+103 (rdiff 1.8972113268364947e-14) > 74.0 => 4.470115461512762e+105 != > 4.4701154615126844e+105 (rdiff 1.7316637068366878e-14) > 79.0 => 1.1324281178206174e+115 != > 1.1324281178206297e+115 (rdiff 1.0816230951910165e-14) > 80.0 => 8.946182130782832e+116 > != 8.946182130782976e+116 (rdiff 1.611677089398549e-14) > 81.0 => 7.156945704626265e+118 > != 7.156945704626381e+118 (rdiff 1.614806559475051e-14) > 83.0 => 4.7536433370127884e+122 > != 4.753643337012842e+122 (rdiff 1.127054006184548e-14) > 84.0 => 3.9455239697205287e+124 > != 3.945523969720659e+124 (rdiff 3.297642089724322e-14) > 85.0 => 3.314240134565268e+126 > != 3.314240134565353e+126 (rdiff 2.5759817183612377e-14) > 87.0 => 2.422709538367351e+130 != > 2.4227095383672734e+130 (rdiff 3.21206322943374e-14) > 89.0 => 1.8548264225739368e+134 != > 1.8548264225739844e+134 (rdiff 2.5668295182660588e-14) > 90.0 => 1.6507955160908254e+136 > != 1.650795516090846e+136 (rdiff 1.2513966038394803e-14) > 91.0 => 1.485715964481777e+138 != > 1.4857159644817615e+138 (rdiff 1.0456113845414324e-14) > 92.0 => 1.3520015276784442e+140 > != 1.352001527678403e+140 (rdiff 3.05102410078959e-14) > 93.0 => 1.243841405464157e+142 != > 1.2438414054641308e+142 (rdiff 2.1115671814606213e-14) > 94.0 => 1.1567725070816173e+144 != > 1.1567725070816416e+144 (rdiff 2.0972887646477295e-14) > 95.0 => 1.0873661566567573e+146 > != 1.087366156656743e+146 (rdiff 1.3055463191484954e-14) > 96.0 => 1.0329978488238868e+148 > != 1.032997848823906e+148 (rdiff 1.8552500324741778e-14) > 97.0 => 9.916779348709327e+149 > != 9.916779348709496e+149 (rdiff 1.7040815113096152e-14) > 98.0 => 9.61927596824809e+151 > != 9.619275968248212e+151 (rdiff 1.2694188843809571e-14) > 99.0 => 9.426890448883584e+153 > != 9.426890448883248e+153 (rdiff 3.568683137742694e-14) > > 
---------------------------------------------------------------------- > Ran 17009 tests in 2055.336s > > FAILED (KNOWNFAIL=95, SKIP=1211, errors=2, failures=11) > > >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Jan 30 08:58:01 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 30 Jan 2015 14:58:01 +0100 Subject: [Numpy-discussion] question np.partition In-Reply-To: References: Message-ID: <54CB8DE9.7000607@googlemail.com> On 01/30/2015 02:57 AM, Neal Becker wrote: > It sounds like np.partition could be used to answer the question: > give me the highest K elements in a vector. > > Is this a correct interpretation? Something like partial sort, but returned > elements are unsorted. > > I could really make some use of this, but in my case it is a list of objects I > need to sort on a particular key. Is this algorithm available in general python > code (not specific to numpy arrays)? > This is a correct interpretation, it is a selection algorithm, (np.select was already taken and I don't like c++ name nth_element). It guarantees all elements after the kth element are larger/equal to the kth and everything before smaller/equal. It can also select multiple orders in one go, e.g. the 10 largest and the 10 smallest (used for e.g. np.percentile) Note you can create a partial sort by running selection first and then sorting only the sections you want sorted. Its not perfectly efficient as it won't share selection pivots with quickselect but its faster than a full sort. Unfortunately python does not have something similar for objects. There is an issue for it though: http://bugs.python.org/issue21592 As a workaround you can generate a metric ndarray of your objects e.g. by selecting only your keys from the object that can be sorted with less than (ideally into a native numpy type like int/float) and then use argpartition to get the indices that would sort the original object array/list. If that is actually going to be faster than just using pythons full sort would need to be tested. Also be aware of https://github.com/numpy/numpy/issues/5524 jaime has just found likely after when looking at your mail :) From jaime.frio at gmail.com Fri Jan 30 10:34:03 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 30 Jan 2015 07:34:03 -0800 Subject: [Numpy-discussion] question np.partition In-Reply-To: <54CB8DE9.7000607@googlemail.com> References: <54CB8DE9.7000607@googlemail.com> Message-ID: On Fri, Jan 30, 2015 at 5:58 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 01/30/2015 02:57 AM, Neal Becker wrote: > > It sounds like np.partition could be used to answer the question: > > give me the highest K elements in a vector. > > > > Is this a correct interpretation? Something like partial sort, but > returned > > elements are unsorted. > > > > I could really make some use of this, but in my case it is a list of > objects I > > need to sort on a particular key. Is this algorithm available in > general python > > code (not specific to numpy arrays)? > > > > This is a correct interpretation, it is a selection algorithm, > (np.select was already taken and I don't like c++ name nth_element). > It guarantees all elements after the kth element are larger/equal to the > kth and everything before smaller/equal. > It can also select multiple orders in one go, e.g. the 10 largest and > the 10 smallest (used for e.g. 
np.percentile) > Note you can create a partial sort by running selection first and then > sorting only the sections you want sorted. Its not perfectly efficient > as it won't share selection pivots with quickselect but its faster than > a full sort. > > Unfortunately python does not have something similar for objects. There > is an issue for it though: > http://bugs.python.org/issue21592 > > As a workaround you can generate a metric ndarray of your objects e.g. > by selecting only your keys from the object that can be sorted with less > than (ideally into a native numpy type like int/float) and then use > argpartition to get the indices that would sort the original object > array/list. > As of right now, np.partition ends up calling np.sort unless the dtype is one of the numeric types. But it is smoking fast compared to sorting when the real thing kicks in! Jaime > If that is actually going to be faster than just using pythons full sort > would need to be tested. > Also be aware of https://github.com/numpy/numpy/issues/5524 jaime has > just found likely after when looking at your mail :) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 30 22:52:49 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 30 Jan 2015 19:52:49 -0800 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith wrote: > On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fern?ndez del R?o > wrote: > [...] > > With all these in mind, my proposal for the new behavior is that taking a > > view of an array with a different dtype would require: > > > > That the newtype and oldtype be compatible, as defined by the algorithm > > checking object offsets linked above. > > If newtype.itemsize == oldtype.itemsize no more checks are needed, make > it > > happen! > > If the array is C/Fortran contiguous, check that the size in bytes of the > > last/first dimension is evenly divided by newtype.itemsize. If it does, > go > > for it. > > For non-contiguous arrays: > > > > Ignoring dimensions of size 1, check that no stride is smaller than > either > > oldtype.itemsize or newtype.itemsize. If any is found this is an > as_strided > > product, sorry, can't do it! > > Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride > == > > oldtype.itemsize > > > > If found, check that it is the only one with that stride, that it is the > > minimal stride, and that the size in bytes of that dimension is evenly > > divided by newitem,itemsize. > > If none is found, check if there is a size 1 dimension that is also > unique > > (unless we agree on a default, as mentioned above) and that > newtype.itemsize > > evenly divides oldtype.itemsize. > > I'm really wary of this idea that we go grovelling around looking for > some suitable dimension somewhere to absorb the new items. Basically > nothing in numpy produces semantically different arrays (e.g., ones > with different shapes) depending on the *strides* of the input array. 
> In a convoluted way, changing the dtype already does different thing depending on the strides, as right now the expansion/contraction happens along the last/first axis depending if the array is C/Fortran contiguous, and those flags are calculated from the strides: >>> a = np.ones((2, 2), dtype=complex) >>> a.view(float).shape (2, 4) >>> a.T.view(float).shape (4, 2) A user unaware that transposition has changed the memory layout will be surprised to find his complex values being unpacked along the first, not the last dimension. But that's the way it already is. With my proposal above, the intention is that the same happens not only for the standard "reverse axes order" transposition, but with any other one, even if you have sliced the array. > > Could we make it more like: check to see if the last dimension works. > If not, raise an error (and let the user transpose some other > dimension there if that's what they wanted)? Or require the user to > specify which dimension will absorb the shape change? (If we were > doing this from scratch, then it would be tempting to just say that we > always add a new dimension at the end with newtype.itemsize / > oldtype.itemsize entries, or absorb such a dimension if shrinking. As > a bonus, this would always work, regardless of contiguity! Except that > when shrinking the last dimension would have to be contiguous, of > course.) > When we roll @ in and people start working with stacks of matrices, we will probably find ourselves having to create an alias, similar to .T, for .swapaxes(-1, -2). Searching for the smallest stride allows to take views of such arrays, which does not work right now because the array is no longer contiguous globally. > > I guess the main consideration for this is that we may be stuck with > stuff b/c of backwards compatibility. Can you maybe say a little bit > about what is allowed now, and what constraints that puts on things? > E.g. are we already grovelling around in strides and picking random > dimensions in some cases? > Just to restate it: right now we only allow new views if the array is globally contiguous, so either along the first or last dimension. Jaime > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sat Jan 31 04:17:02 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 31 Jan 2015 10:17:02 +0100 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: <1422695822.12798.14.camel@sebastian-t440> On Fr, 2015-01-30 at 19:52 -0800, Jaime Fern?ndez del R?o wrote: > On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith > wrote: > On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fern?ndez del R?o > wrote: > [...] > > Could we make it more like: check to see if the last dimension > works. > If not, raise an error (and let the user transpose some other > dimension there if that's what they wanted)? Or require the > user to > specify which dimension will absorb the shape change? 
(If we > were > doing this from scratch, then it would be tempting to just say > that we > always add a new dimension at the end with newtype.itemsize / > oldtype.itemsize entries, or absorb such a dimension if > shrinking. As > a bonus, this would always work, regardless of contiguity! > Except that > when shrinking the last dimension would have to be contiguous, > of > course.) > > > When we roll @ in and people start working with stacks of matrices, we > will probably find ourselves having to create an alias, similar to .T, > for .swapaxes(-1, -2). Searching for the smallest stride allows to > take views of such arrays, which does not work right now because the > array is no longer contiguous globally. > That is true, but I agree with Nathaniel at least as far as that I would prefer a user to be able to safely use `view` even he has not even an inkling about what his memory layout is. One option would be an `axis=-1` default (maybe FutureWarn this from `axis=None` which would look at order, see below -- or maybe have axis='A', 'C' and 'F' and default to 'A' for starters). This even now could start creating bugs when enabling relaxed strides :(, because your good old fortran order complex array being viewed as a float one could expand along the wrong axis, and even without such arrays swap order pretty fast when operating on them, which can create impossibly to find bugs, because even a poweruser is likely to forget about such things. Of course you could argue that view is a poweruser feature and a user using it should keep these things in mind.... Though if you argue that, you can almost just use `np.ndarray` directly ;) -- ok, not really considering how cumbersome it is, but still. - Sebastian > > I guess the main consideration for this is that we may be > stuck with > stuff b/c of backwards compatibility. Can you maybe say a > little bit > about what is allowed now, and what constraints that puts on > things? > E.g. are we already grovelling around in strides and picking > random > dimensions in some cases? > > > Just to restate it: right now we only allow new views if the array is > globally contiguous, so either along the first or last dimension. > > > Jaime > > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of > Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ben.root at ou.edu Sat Jan 31 09:02:23 2015 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 31 Jan 2015 09:02:23 -0500 Subject: [Numpy-discussion] Numpy 'None' comparison FutureWarning In-Reply-To: References: <541F40FC.6010105@hawaii.edu> Message-ID: Finally got off my butt and hunted down an example, and it was right under my nose in mplot3d. lib/python2.7/site-packages/mpl_toolkits/mplot3d/axes3d.py:1094: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. 
if self.button_pressed in self._rotate_btn: self._rotate_btn is the 1d numpy array, and self.button_pressed usually will have an integer (for which mouse button), but could be None if no mouse button was pressed. I have no clue why the "in" operator is triggering this future warning. Is this intentional? Ben Root On Sun, Sep 21, 2014 at 10:53 PM, Nathaniel Smith wrote: > On 22 Sep 2014 03:02, "Demitri Muna" wrote: > > > > > > On Sep 21, 2014, at 5:19 PM, Eric Firing wrote: > > > >> I think what you are missing is that the standard Python idiom for this > >> use case is "if self._some_array is None:". This will continue to > work, > >> regardless of whether the object being checked is an ndarray or any > >> other Python object. > > > > > > That's an alternative, but I think it's a subtle distinction that will > be lost on many users. I still think that this is something that can easily > trip up many people; it's not clear from looking at the code that this is > the behavior; it's "hidden". At the very least, I strongly suggest that the > warning point this out, e.g. > > > > "FutureWarning: comparison to `None` will result in an elementwise > object comparison in the future; use 'value is None' as an alternative." > > Making messages clearer is always welcome, and we devs aren't always in > the best position to do so because we're to close to the issues to see > which parts are confusing to outsiders - perhaps you'd like to submit a > pull request with this? > > > Assume: > > > > a = np.array([1, 2, 3, 4]) > > b = np.array([None, None, None, None]) > > > > What is the result of "a == None"? Is it "np.array([False, False, False, > False])"? > > After this change, yes. > > > What about the second case? Is the result of "b == None" -> > np.array([True, True, True, True])? > > Yes again. > > (Notice that this is also a subtle and confusing point for many users - > how many people realize that if they want to get the latter result they > have to write np.equal(b, None)?) > > > If so, then > > > > if (b == None): > > ... > > > > will always evaluate to "True" if b is "None" or *any* Numpy array, and > that's clearly unexpected behavior. > > No, that's not how numpy arrays interact with if statements. This is > independent of the handling of 'arr == None': 'if multi_element_array' is > always an error, because an if statement by definition requires a single > true/false decision (it can't execute both branches after all!), but a > multi-element array by definition contains multiple values that might have > contradictory truthiness. > > Currently, 'b == x' returns an array in every situation *except* when x > happens to be 'None'. After this change, 'b == x' will *always* return an > array, so 'if b == x' will always raise an error. > > > > > On Sep 21, 2014, at 9:30 PM, Benjamin Root wrote: > > > >> That being said, I do wonder about related situations where the lhs of > the equal sign might be an array, or it might be a None and you are > comparing against another numpy array. In those situations, you aren't > trying to compare against None, you are just checking if two objects are > equivalent. > > Benjamin, can you give a more concrete example? Right now the *only* time > == on arrays checks for equivalence is when the object being compared > against is None, in which case == pretends to be 'is' because of this > mysterious special case. In every other case it does a broadcasted ==, > which is very different. > > > Right. 
> > With this change, using "==" with numpy arrays now sometimes means
> > "are these equivalent" and other times "element-wise comparison".
>
> Err, you have this backwards :-). Right now == means element-wise
> comparison except in this one special case, where it doesn't. After
> the change, it will mean element-wise comparison consistently in all
> cases.
>
> > The potential for inadvertent bugs is far greater than what
> > convenience this redefinition of a very basic operator might offer.
> > Any scenario where
> >
> > (a == b) != (b == a)
> >
> > is asking for trouble.
>
> That would be unfortunate, yes, but fortunately it doesn't apply here
> :-). 'a == b' and 'b == a' currently always return the same thing, and
> there are no plans to change this - we'll be changing what both of
> them mean at the same time.
>
> -n

From sebastien.gouezel at univ-rennes1.fr  Sat Jan 31 15:53:53 2015
From: sebastien.gouezel at univ-rennes1.fr (Sebastien Gouezel)
Date: Sat, 31 Jan 2015 21:53:53 +0100
Subject: [Numpy-discussion] missing FloatingPointError for numpy on cygwin64
Message-ID: 

Dear all,

I tried to use numpy (version 1.9.1, installed by `pip install numpy`)
on cygwin64. I encountered the following weird bug:

>>> import numpy
>>> with numpy.errstate(all='raise'):
...     print 1/numpy.float64(0.0)
...
inf

I was expecting a FloatingPointError, but it didn't show up. Curiously,
with different numerical types (all the intxx types, or float128), I do
get the FloatingPointError. Same thing with the most recent git version,
or with 1.7.1 as provided by the precompiled cygwin package.

This behavior does not happen on cygwin32 (I always get the
FloatingPointError there). I wonder if there is something weird with my
config, or if this is a genuine reproducible bug. If so, where should I
start looking if I want to fix it? (I don't know anything about numpy's
code.)

Sebastien

From sebastian at sipsolutions.net  Sat Jan 31 16:59:08 2015
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sat, 31 Jan 2015 22:59:08 +0100
Subject: [Numpy-discussion] Numpy 'None' comparison FutureWarning
In-Reply-To: 
References: <541F40FC.6010105@hawaii.edu>
Message-ID: <1422741548.18416.4.camel@sebastian-t440>

On Sa, 2015-01-31 at 09:02 -0500, Benjamin Root wrote:
> Finally got off my butt and hunted down an example, and it was right
> under my nose in mplot3d.
>
> lib/python2.7/site-packages/mpl_toolkits/mplot3d/axes3d.py:1094:
> FutureWarning: comparison to `None` will result in an elementwise
> object comparison in the future.
>   if self.button_pressed in self._rotate_btn:
>
> self._rotate_btn is the 1d numpy array, and self.button_pressed will
> usually hold an integer (for which mouse button), but can be None if
> no mouse button was pressed. I have no clue why the "in" operator is
> triggering this future warning. Is this intentional?

If I remember right, the in operator just does `np.any(arr == other)`
in C code (which is actually a bit broken, in more ways than just
this...). So in this case the FutureWarning is admittedly bogus, since
the result won't actually change unless your array really *does*
include None, and then I hope you want that change ;).
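Roughly, that equivalence looks like the following small sketch
(illustrative only, assuming a 1.9-era NumPy; the actual C
implementation differs in detail):

import numpy as np

a = np.array([1, 2, 3])

print(2 in a)        # True, essentially np.any(a == 2)
print(None in a)     # goes through the same 'a == None' comparison,
                     # which is what triggers the FutureWarning here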
- Sebastian

> Ben Root
>
> [snip -- the earlier messages in this thread, quoted in full]
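To make the distinction discussed in this thread concrete, a short
sketch (illustrative only; what `arr == None` itself returns depends on
the NumPy version, which is exactly what the FutureWarning is about, so
this sticks to the forms recommended above):

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([None, None, None, None])

# The recommended identity test: works for any object and is
# unaffected by the change.
print(a is None)            # False

# The explicit elementwise comparison that '== None' will mean in the
# future (as Nathaniel notes above).
print(np.equal(b, None))    # elementwise: all True

# Once '==' is elementwise, 'if some_array == something' has no single
# truth value for a multi-element array, so the 'if' raises an error.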