From yw5aj at virginia.edu Thu Jan 1 13:00:00 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Thu, 1 Jan 2015 13:00:00 -0500 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes Message-ID: Dear all, I am currently using a piece of C code, where one of the input argument of a function is **double. So, in numpy, I tried np.ctypeslib.ndpointer(ctypes.c_double), but obviously this wouldn't work because this is only *double, not **double. Then I tried np.ctypeslib.ndpointer(np.ctypeslib.ndpointer(ctypes.c_double)), but this didn't work either because it says "ArgumentError: argument 4: : array must have data type uint64 ". np.ctypeslib.ndpointer(ctypes.c_double, ndim=2) wound't work too, because **double is not the same with *double[]. Could anyone please give any thoughts to help? Thanks, Shawn -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From sturla.molden at gmail.com Thu Jan 1 13:30:42 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 19:30:42 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: You can pretend double** is an array of dtype np.intp. This is because on all modern systems, double** has the size of void*, and np.intp is an integer with the size of void* (np.intp maps to Py_intptr_t). Now you just need to fill in the adresses. If you have a 2d ndarray in C order, or at least one which is contiguous along the second dimension, you set up the array of double* like so: xpp = (x.__array_interface__['data'][0] + np.arange(x.shape[0])*x.strides[0]).astype(np.intp) (The last cast to np.intp is probably only required on Windows 64 with older NumPy versions, but it never hurts.) Next, make a dtype that corresponds to this Py_intptr_t array: doublepp = np.ctypeslib.ndpointer(dtype=np.intp) Declare your function with doublepp instead of ndpointer with dtype np.double, and pass xpp instead of the 2d array x. Sturla On 01/01/15 19:00, Yuxiang Wang wrote: > Dear all, > > I am currently using a piece of C code, where one of the input > argument of a function is **double. > > So, in numpy, I tried np.ctypeslib.ndpointer(ctypes.c_double), but > obviously this wouldn't work because this is only *double, not > **double. > > Then I tried np.ctypeslib.ndpointer(np.ctypeslib.ndpointer(ctypes.c_double)), > but this didn't work either because it says "ArgumentError: argument > 4: : array must have data type uint64 > ". > > np.ctypeslib.ndpointer(ctypes.c_double, ndim=2) wound't work too, > because **double is not the same with *double[]. > > Could anyone please give any thoughts to help? > > Thanks, > > Shawn > From sturla.molden at gmail.com Thu Jan 1 13:35:58 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 19:35:58 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On 01/01/15 19:30, Sturla Molden wrote: > You can pretend double** is an array of dtype np.intp. This is because > on all modern systems, double** has the size of void*, and np.intp is an > integer with the size of void* (np.intp maps to Py_intptr_t). Well, it also requires that the user space is the lower half of the address space, which is usually true. 
But to be safe against this you should use np.uintp instead of np.intp: xpp = (x.__array_interface__['data'][0] + np.arange(x.shape[0])*x.strides[0]).astype(np.uintp) doublepp = np.ctypeslib.ndpointer(dtype=np.uintp) Sturla From sturla.molden at gmail.com Thu Jan 1 13:55:00 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 19:55:00 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On 01/01/15 19:00, Yuxiang Wang wrote: > Could anyone please give any thoughts to help? Say you want to call "void foobar(int m, int n, double **x)" from dummy.so or dummpy.dll with ctypes. Here is a fully worked out example (no tested, but is will work unless I made a typo): import numpy as np from numpy.ctypeslib import ndpointer import ctypes __all__ = ['foobar'] _doublepp = ndpointer(dtype=np.uintp, ndim=1, order='c') _dll = ctypes.CDLL('dummpy.so') # or dummpy.dll _foobar = _dll.foobar _foobar.argtypes = [ctypes.c_int, ctypes.c_int, _doublepp] _foobar.restype = None def foobar(x): assert(x.flags['C_CONTIGUOUS']) assert(x.ndim == 2) xpp = (x.__array_interface__['data'][0] + np.arange(x.shape[0])*x.strides[0]).astype(np.uintp) m = ctype.c_int(x.shape[0]) n = ctype.c_int(x.shape[1]) _foobar(m,n,xpp) Sturla From njs at pobox.com Thu Jan 1 13:56:36 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 1 Jan 2015 18:56:36 +0000 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On Thu, Jan 1, 2015 at 6:00 PM, Yuxiang Wang wrote: > Dear all, > > I am currently using a piece of C code, where one of the input > argument of a function is **double. As you discovered, Numpy's ctypes utilities are helpful for getting a *double out of an ndarray, but they don't really have anything to do with **double's -- for that you should refer to the plain-old-ctypes documentation: https://docs.python.org/2/library/ctypes.html#ctypes.pointer However, I suspect that this question can't really be answered in a useful way without more information about why exactly the C code wants a **double (instead of a *double) and what it expects to do with it. E.g., is it going to throw away the passed in array and return a new one? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From yw5aj at virginia.edu Thu Jan 1 14:25:54 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Thu, 1 Jan 2015 14:25:54 -0500 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: Great thanks to both Strula and also Nathaniel! @Strula, thanks for your help! And I do think your solution makes total sense. However, the code doesn't work well on my computer. ---------------- // dummy.c #include __declspec(dllexport) void foobar(const int m, const int n, const double **x, double **y) { size_t i, j; y = (** double)malloc(sizeof(double *) * m); for(i=0; i wrote: > On Thu, Jan 1, 2015 at 6:00 PM, Yuxiang Wang wrote: >> Dear all, >> >> I am currently using a piece of C code, where one of the input >> argument of a function is **double. 
> > As you discovered, Numpy's ctypes utilities are helpful for getting a > *double out of an ndarray, but they don't really have anything to do > with **double's -- for that you should refer to the plain-old-ctypes > documentation: https://docs.python.org/2/library/ctypes.html#ctypes.pointer > > However, I suspect that this question can't really be answered in a > useful way without more information about why exactly the C code wants > a **double (instead of a *double) and what it expects to do with it. > E.g., is it going to throw away the passed in array and return a new > one? > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From sturla.molden at gmail.com Thu Jan 1 14:35:28 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 20:35:28 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On 01/01/15 20:25, Yuxiang Wang wrote: > #include > > __declspec(dllexport) void foobar(const int m, const int n, const > double **x, double **y) > { > size_t i, j; > y = (** double)malloc(sizeof(double *) * m); > for(i=0; i y[i] = (*double)calloc(sizeof(double), n); > for(i=0; i for(j=0; j y[i][j] = x[i][j]; > } > Was I doing something wrong here? You are not getting the data back because of the malloc/calloc statements. The numpy array y after calling _foobar is still pointing to its original buffer, not the new memory you allocated. You just created a memory leak. Try this instead: __declspec(dllexport) void foobar(const int m, const int n, const double **x, double **y) { size_t i, j; for(i=0; i References: <20141229221017.GA31208@kudu.in-berlin.de> <20141230230339.GA6317@kudu.in-berlin.de> Message-ID: <20150101193810.GA2824@kudu.in-berlin.de> Hi, * Nathaniel Smith [2014-12-31]: > On Tue, Dec 30, 2014 at 11:03 PM, Valentin Haenel wrote: > > * Eric Moore [2014-12-30]: > >> On Monday, December 29, 2014, Valentin Haenel wrote: > >> > >> > Hi, > >> > > >> > how do I access the kind of the data from cython, i.e. the single > >> > character string: > >> > > >> > 'b' boolean > >> > 'i' (signed) integer > >> > 'u' unsigned integer > >> > 'f' floating-point > >> > 'c' complex-floating point > >> > 'O' (Python) objects > >> > 'S', 'a' (byte-)string > >> > 'U' Unicode > >> > 'V' raw data (void) > >> > > >> > In regular Python I can do: > >> > > >> > In [7]: d = np.dtype('S') > >> > > >> > In [8]: d.kind > >> > Out[8]: 'S' > >> > > >> > Looking at the definition of dtype that comes with cython, I see: > >> > > >> > ctypedef class numpy.dtype [object PyArray_Descr]: > >> > # Use PyDataType_* macros when possible, however there are no macros > >> > # for accessing some of the fields, so some are defined. Please > >> > # ask on cython-dev if you need more. > >> > cdef int type_num > >> > cdef int itemsize "elsize" > >> > cdef char byteorder > >> > cdef object fields > >> > cdef tuple names > >> > > >> > I.e. no kind. > > The problem is just that whoever wrote numpy.pxd was feeling a bit > lazy that day and only filled in the fields they felt were most > important :-). 
There are a bunch of public fields in PyArray_Descr > that are just being left out of the Cython file you quote: > > https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/ndarraytypes.h#L566 > > In particular, there's a 'char kind' field. > > The quick workaround is > > cdef extern from "*": > cdef struct my_numpy_dtype [object PyArray_Descr]: > cdef char kind > # ... whatever other fields you might need > > and then cast to my_numpy_dtype when you need to get at the kind field > from Cython. > > If feeling generous, then submit a PR to Cython adding 'cdef char > kind' to the definition above. If feeling extra generous, it would be > awesome if someone systematically went through and added all the > missing fields that are in the numpy header but not cython -- I've run > into these missing field issues annoyingly often myself, and it's > silly that we should all keep making our own individual workarounds > for numpy.pxd's limitations... Thanks for the suggestions, it got me thinking. So, I actually discovered an additional ugly workaround. Basically it turns out, that my dtype instance does have a 'kind' attribute, but it is a Python str object. Hence I needed to do: ord(dtype_.kind[0]) To cast it to a Cython char... This is because---for reasons I don't understand---when you define a char in cython and you try to assign a python object to it, that object needs to be an integer. Otherwise you get: TypeError: an integer is required During run-time. Using the hack above my code now compiles and the tests all pass. I would guess that it probably won't perform very well due to various python to c back and forth activities. V- PS: none the less I may look into getting some patches into cython as suggested, as the solution above isn't exactly clean code... From sturla.molden at gmail.com Thu Jan 1 14:52:23 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 20:52:23 +0100 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: On 01/01/15 19:56, Nathaniel Smith wrote: > However, I suspect that this question can't really be answered in a > useful way without more information about why exactly the C code wants > a **double (instead of a *double) and what it expects to do with it. > E.g., is it going to throw away the passed in array and return a new > one? That is an important question. The solution I provided only allows a 2D array to be passed in and possibly modified inplace. It does not allow the C function pass back a freshly allocated array. The problem is of course that the meaning of double** is ambiguous. It could mean a pointer to an array of pointers. But it could also mean a double* passed by reference, in which case the function would modify the pointer instead of the data it points to. Sturla From fomcl at yahoo.com Thu Jan 1 14:57:50 2015 From: fomcl at yahoo.com (Albert-Jan Roskam) Date: Thu, 1 Jan 2015 19:57:50 +0000 (UTC) Subject: [Numpy-discussion] numpy.fromiter in numpypy Message-ID: <645635478.812990.1420142270937.JavaMail.yahoo@jws10729.mail.gq1.yahoo.com> Hi, I would like to use the numpy implementation for Pypy. In particular, I would like to use numpy.fromiter, which is available according to this overview: http://buildbot.pypy.org/numpy-status/latest.html. However, contrary to what this website says, this function is not yet available. Conclusion: the website is wrong. Or am I missing something? 
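(For the moment I work around it with a trivial pure-Python stand-in, roughly like the sketch below. The name fromiter_fallback and the count handling are just my own invention, and since it materializes a list first it has none of the speed or memory benefits of a real fromiter.)

import itertools
import numpy as np

def fromiter_fallback(iterable, dtype, count=-1):
    # crude stand-in for np.fromiter: build a list, then convert
    if count >= 0:
        items = list(itertools.islice(iterable, count))
    else:
        items = list(iterable)
    return np.array(items, dtype=dtype)

The session below shows what the real np.fromiter currently does under PyPy: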
albertjan at debian:~$ sudo pypy $(which pip) install -U git+https://bitbucket.org/pypy/numpy.git albertjan at debian:~$ sudo pypy -c 'import numpy' # sudo: as per the installation instructions albertjan at debian:~$ pypy Python 2.7.8 (f5dcc2477b97386c11e4b67f08a2d00fbd2fce5d, Sep 19 2014, 10:37:41) [PyPy 2.4.0 with GCC 4.8.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>>> import sys >>>> import numpy as np >>>> np.__version__, sys.version ('1.9.0', '2.7.8 (f5dcc2477b97386c11e4b67f08a2d00fbd2fce5d, Sep 19 2014, 10:37:41)\n[PyPy 2.4.0 with GCC 4.8.2]') >>>> np.fromiter >>>> np.fromiter((i for i in range(10)), np.float) Traceback (most recent call last): File "", line 1, in File "/opt/pypy-2.4/site-packages/numpy/core/multiarray.py", line 55, in tmp raise NotImplementedError("%s not implemented yet" % func) NotImplementedError: fromiter not implemented yet The same also applies to numpy.fromfile Thanks in advance and happy 2015. Regards, Albert-Jan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From sturla.molden at gmail.com Thu Jan 1 15:06:34 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 21:06:34 +0100 Subject: [Numpy-discussion] npymath on Windows In-Reply-To: References: Message-ID: On 28/12/14 01:59, Matthew Brett wrote: > As far as I can see, 'acosf' is defined in the msvc runtime library. > I guess that '_acosf' is defined in some mingw runtime library? AFAIK it is a GCC built-in function. When the GCC compiler or linker sees it the binary code will be inlined. https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Other-Builtins.html Sturla From sturla.molden at gmail.com Thu Jan 1 15:34:16 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 01 Jan 2015 21:34:16 +0100 Subject: [Numpy-discussion] npymath on Windows In-Reply-To: References: Message-ID: On 28/12/14 17:17, David Cournapeau wrote: > This is not really supported. You should avoid mixing compilers when > building C extensions using numpy C API. Either all mingw, or all MSVC. That is not really good enough. Even if we build binary wheels with MinGW (see link) the binary npymath library should be useable from MSVC. https://github.com/numpy/numpy/pull/5328 Sturla From ndarray at mac.com Thu Jan 1 16:35:08 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Thu, 1 Jan 2015 16:35:08 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() Message-ID: A discussion [1] is currently underway at GitHub which will benefit from a larger forum. In version 1.9, the diagonal() method was changed to return a read-only (non-contiguous) view into the original array instead of a plain copy. Also, it has been announced [2] that in 1.10 the view will become read/write. A concern has now been raised [3] that this change breaks backward compatibility too much. Consider the following code: x = numy.eye(2) d = x.diagonal() d[0] = 2 In 1.8, this code runs without errors and results in [2, 1] stored in array d. In 1.9, this is an error. With the current plan, in 1.10 this will become valid again, but the result will be different: x[0,0] will be 2 while it is 1 in 1.8. Two alternatives are suggested for discussion: 1. Add copy=True flag to diagonal() method. 2. 
Roll back 1.9 change to diagonal() and introduce an additional diagonal_view() method to return a view. [1] https://github.com/numpy/numpy/pull/5409 [2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.diagonal.html [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 1 17:16:34 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 1 Jan 2015 22:16:34 +0000 Subject: [Numpy-discussion] Access dtype kind from cython In-Reply-To: <20150101193810.GA2824@kudu.in-berlin.de> References: <20141229221017.GA31208@kudu.in-berlin.de> <20141230230339.GA6317@kudu.in-berlin.de> <20150101193810.GA2824@kudu.in-berlin.de> Message-ID: On Thu, Jan 1, 2015 at 7:38 PM, Valentin Haenel wrote: [...] > So, I actually discovered an additional ugly workaround. Basically it > turns out, that my dtype instance does have a 'kind' attribute, but it > is a Python str object. Hence I needed to do: > > ord(dtype_.kind[0]) Your Cython dtype object is simultaneously a C and Python object -- if you ask for a C-level attribute that Cython knows about, then it'll access the C struct field directly; if you ask for anything else, then it'll do a normal Python attribute lookup. > To cast it to a Cython char... This is because---for reasons I don't > understand---when you define a char in cython and you try to assign a > python object to it, that object needs to be an integer. Otherwise you > get: > > TypeError: an integer is required > > During run-time. This is because in C, char is the name for an integer type. It was the 70s, they didn't know any better... > Using the hack above my code now compiles and the tests all pass. I > would guess that it probably won't perform very well due to various > python to c back and forth activities. Eh, there's an excellent chance that it won't matter. Usually this kidn of thing only matters if you're accessing the field from inside a tight inner loop that gets called thousands of times per second. This is one of the nice things about Cython: you can be lazy and write ordinary Python code everywhere outside those loops, as compared to regular extension modules where you have to laboriously write everything in C, even the parts that aren't speed critical. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From yw5aj at virginia.edu Thu Jan 1 21:56:07 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Thu, 1 Jan 2015 21:56:07 -0500 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: References: Message-ID: 1) @Strula Sorry about my stupid mistake! That piece of code totally gave away how green I am in coding C :) And yes, that piece of code works like a charm now! I am able to run my model. Thanks a million! 2) @Strula and also thanks for your insight on the limitation of the method. Currently I am just passing in 2d ndarray for data input, so I can get away with this method; but it is really important to keep that piece of knowledge in mind. 3) @Nathaniel Could you please give a hint on how should this be done with the ctypes library (only for reading a 2d ndarray)? I noticed that it wouldn't work if I set: _doublepp = ctypes.POINTER(ctypes.POINTER(ctypes.c_double)) xpp = x.ctypes.data_as(ctypes.POINTER(ctypes.POINTER(ctypes.c_double))) Could you please give a hint if possible? 
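(The furthest I got with plain ctypes is the untested sketch below: build the array of row pointers by hand and keep a reference to it so it is not garbage collected. as_doublepp is just a name I made up, it assumes a C-contiguous float64 array, and the wrapped function would then need _doublepp in its argtypes. I am not sure this is the right idiom, though.)

import ctypes
import numpy as np

_doublep = ctypes.POINTER(ctypes.c_double)
_doublepp = ctypes.POINTER(_doublep)

def as_doublepp(x):
    assert x.ndim == 2 and x.flags['C_CONTIGUOUS'] and x.dtype == np.float64
    # one double* per row; x[i] is a contiguous view of row i
    rowptrs = (_doublep * x.shape[0])()
    for i in range(x.shape[0]):
        rowptrs[i] = x[i].ctypes.data_as(_doublep)
    # return rowptrs as well so the caller can keep it (and x) alive
    return ctypes.cast(rowptrs, _doublepp), rowptrs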
(Complete code is attached at the end of this message) 4) I wanted to say that it seems to me, as the project gradually scales up, Cython is easier to deal with, especially when I am using a lot of numpy arrays. If it is even higher dimensional data, it would be verbose while it is really succinct to use Cython. Attached is the complete code. Code #1: From Strula, and it worked: // dummy.c #include __declspec(dllexport) void foobar(const int m, const int n, const double **x, double **y) { size_t i, j; for(i=0; i wrote: > On 01/01/15 19:56, Nathaniel Smith wrote: > >> However, I suspect that this question can't really be answered in a >> useful way without more information about why exactly the C code wants >> a **double (instead of a *double) and what it expects to do with it. > >> E.g., is it going to throw away the passed in array and return a new >> one? > > That is an important question. > > The solution I provided only allows a 2D array to be passed in and > possibly modified inplace. It does not allow the C function pass back a > freshly allocated array. > > The problem is of course that the meaning of double** is ambiguous. It > could mean a pointer to an array of pointers. But it could also mean a > double* passed by reference, in which case the function would modify the > pointer instead of the data it points to. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From charlesr.harris at gmail.com Thu Jan 1 23:13:34 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 1 Jan 2015 21:13:34 -0700 Subject: [Numpy-discussion] numpy developers Message-ID: Hi All, I've invited Alex Griffing onto the team to be a numpy developer. He has been contributing fixes and reviews for a while and it is time to give him more opportunity to contribute. I think he will do well. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Fri Jan 2 01:56:57 2015 From: matti.picus at gmail.com (Matti Picus) Date: Fri, 02 Jan 2015 08:56:57 +0200 Subject: [Numpy-discussion] numpy.fromiter in numpypy In-Reply-To: References: Message-ID: <54A64139.2040506@gmail.com> An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Jan 2 04:17:18 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 2 Jan 2015 10:17:18 +0100 Subject: [Numpy-discussion] numpy developers In-Reply-To: References: Message-ID: On Fri, Jan 2, 2015 at 5:13 AM, Charles R Harris wrote: > Hi All, > > I've invited Alex Griffing onto the team to be a numpy developer. He has > been contributing fixes and reviews for a while and it is time to give him > more opportunity to contribute. I think he will do well. > +1 Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From simlangen at gmail.com Fri Jan 2 06:06:09 2015 From: simlangen at gmail.com (Simen Langseth) Date: Fri, 2 Jan 2015 20:06:09 +0900 Subject: [Numpy-discussion] Extracting required indices from the array of tuples Message-ID: import numpy as np from scipy import signal y = np.array([[2, 1, 2, 3, 2, 0, 1, 0], [2, 1, 2, 3, 2, 0, 1, 0]]) maximas = signal.argrelmax(y, axis=1) print maximas (array([0, 0, 1, 1], dtype=int64), array([3, 6, 3, 6], dtype=int64)) I want to extract only the first maxima of both rows, i.e., [3, 3] using the tuples (maximas). How would you do it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 2 07:29:53 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 2 Jan 2015 04:29:53 -0800 Subject: [Numpy-discussion] Extracting required indices from the array of tuples In-Reply-To: References: Message-ID: On Fri, Jan 2, 2015 at 3:06 AM, Simen Langseth wrote: > import numpy as np > from scipy import signal > > y = np.array([[2, 1, 2, 3, 2, 0, 1, 0], > [2, 1, 2, 3, 2, 0, 1, 0]]) > > maximas = signal.argrelmax(y, axis=1) > > print maximas > > (array([0, 0, 1, 1], dtype=int64), array([3, 6, 3, 6], dtype=int64)) > > > I want to extract only the first maxima of both rows, i.e., [3, 3] using > the tuples (maximas). How would you do it? > > Something like this should work: >>> rows, cols = maximas >>> first_in_row = np.concatenate(([True], rows[:-1] != rows[1:])) >>> rows = rows[first_in_row] >>> cols = cols[first_in_row] >>> y[rows, cols] array([3, 3]) Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Fri Jan 2 07:36:39 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Fri, 2 Jan 2015 13:36:39 +0100 Subject: [Numpy-discussion] npymath on Windows In-Reply-To: References: Message-ID: Hi, without further testing; this approach may help: (1) create a shared library with all symbols from libnpymath.a: $ gcc -shared -o libnpymath.dll -Wl,--whole-archive libnpymath.a -Wl,--no-whole-archive -lm (2) create a def file: gendef libnpymath.dll There are now two files created by mings-w64 tools: libnpymath.dll, libnpymath.def (3) create import libs for MSVC: first open a new command Window with the VC command prompt: > lib /machine:i386 /def:libnpymath.def (for 64bit use: /machine:X64) Microsoft (R) Library Manager Version 9.00.30729.01 Copyright (C) Microsoft Corporation. All rights reserved. Creating library libnpymath.lib and object libnpymath.exp libnpymath.dll, libnpymath.lib and libnpymath.exp should be sufficient for MSVC. libnpymath.dll has to be deployed. -- carlkl 2015-01-01 21:34 GMT+01:00 Sturla Molden : > On 28/12/14 17:17, David Cournapeau wrote: > > > This is not really supported. You should avoid mixing compilers when > > building C extensions using numpy C API. Either all mingw, or all MSVC. > > That is not really good enough. Even if we build binary wheels with > MinGW (see link) the binary npymath library should be useable from MSVC. > > https://github.com/numpy/numpy/pull/5328 > > > Sturla > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From simlangen at gmail.com Fri Jan 2 07:39:18 2015 From: simlangen at gmail.com (Simen Langseth) Date: Fri, 2 Jan 2015 21:39:18 +0900 Subject: [Numpy-discussion] Extracting required indices from the array of tuples In-Reply-To: References: Message-ID: Dear Jaime: Thank you so much. Your codes are always great. By the way, I have been waiting for several hours to get satisfactory answer at: http://codereview.stackexchange.com/questions/75457/faster-way-of-using-interp1d-in-2d-array?noredirect=1#comment137329_75457 http://stackoverflow.com/questions/27735832/faster-way-of-using-interp1d-in-2d-array Please help me there if you have time. Simen On Fri, Jan 2, 2015 at 9:29 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Fri, Jan 2, 2015 at 3:06 AM, Simen Langseth > wrote: > >> import numpy as np >> from scipy import signal >> >> y = np.array([[2, 1, 2, 3, 2, 0, 1, 0], >> [2, 1, 2, 3, 2, 0, 1, 0]]) >> >> maximas = signal.argrelmax(y, axis=1) >> >> print maximas >> >> (array([0, 0, 1, 1], dtype=int64), array([3, 6, 3, 6], dtype=int64)) >> >> >> I want to extract only the first maxima of both rows, i.e., [3, 3] using >> the tuples (maximas). How would you do it? >> >> Something like this should work: > > >>> rows, cols = maximas > >>> first_in_row = np.concatenate(([True], rows[:-1] != rows[1:])) > >>> rows = rows[first_in_row] > >>> cols = cols[first_in_row] > >>> y[rows, cols] > array([3, 3]) > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Fri Jan 2 08:22:48 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 2 Jan 2015 13:22:48 +0000 (UTC) Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes References: Message-ID: <964675802441893669.982718sturla.molden-gmail.com@news.gmane.org> Yuxiang Wang wrote: > 4) I wanted to say that it seems to me, as the project gradually > scales up, Cython is easier to deal with, especially when I am using a > lot of numpy arrays. If it is even higher dimensional data, it would > be verbose while it is really succinct to use Cython. The easiest way to speed up NumPy code is to use Numba which is an LLVM based JIT compiler. Numba will often give you performance comparable to C for free. All you have to do is to add the @numba.jit decorator to your Python function and possibly include some type hints. If all you want is to speed up NumPy code, just use Numba and it will take care of it in at least 9 of 10 cases. Numexpr is also a JIT compiler which can speed up Numpy code, but it does not give as dramatic results as Numba. Cython is easier to work with than ctypes, particularly when the problem scales up. If you use typed memoryviews in Cython you can also avoid having to work with pointer arithmetics. Cython is mainly a competitior to using the Python C API manually for C extension modules to Python. Cython also allows you to wrap external C and C++ code, and e.g. use Python and C++ objects together. The drawback is that you need to learn the Cython language as well as Python and C and C++ and know how they differ. 
Cython also have many of the same hidden dangers as C++, due to the possibility of exceptions being raised between C statements. But because Cython is the most flexible tool for writing C extensions to Python you will in the long run do yourself a favor by learning to use it. ctypes is good when you have a DLL, possibly form a foreign source, and you just want to use it without any build step. CFFI is easier to work with than ctypes and has the same usecase. It can parse C headers and does not require you to define the C API with Python statements like ctypes do. Generally I would say it is alway better to use CFFI than ctypes. ctypes is also currently an orphan, albeit in the Python standard library, while CFFI is actively maintained. Numba will also JIT compile ctypes and CFFI calls to remove the extra overhead. This is good to know if you need to call a C function in a tight loop. In that case Numba can JIT compile away the Python as well as the ctypes/CFFI overhead. Fortran 90/95 is also underrated. It is easier to work with than C, and gives similar results performance wise. You can call Fortran with f2py, ctypes, CFFI, or Cython (use fwrap). Generally I would say that it is better for a student to learn C than Fortran if you have to choose, because C is also useful for other things than numerical computing. But if you want fast and robust numerical code, it is easier to get good results with Fortran than C or Cython. Sturla From sturla.molden at gmail.com Fri Jan 2 08:35:08 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 2 Jan 2015 13:35:08 +0000 (UTC) Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes References: Message-ID: <1048628219441898315.286638sturla.molden-gmail.com@news.gmane.org> Yuxiang Wang wrote: > 1) @Strula Sorry about my stupid mistake! That piece of code totally > gave away how green I am in coding C :) Don't worry. C is a high-level assember. It will bite you again and again, it happens to everyone. Those who say they have never made a stupid mistake while coding in C are lying. Sturla From charlesr.harris at gmail.com Fri Jan 2 21:04:57 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Jan 2015 19:04:57 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that Message-ID: Hi All, The diag, diagonal, and ravel functions have recently been changed to preserve subtypes. However, this causes lots of backward compatibility problems for matrix users, in particular, scipy.sparse. One possibility for fixing this is to special case matrix and so that these functions continue to return 1-d arrays for matrix instances. This is kind of ugly as `a..ravel` will still return a matrix when a is a matrix, an ugly inconsistency. This may be a case where practicality beats beauty. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgasmith at icloud.com Fri Jan 2 21:45:53 2015 From: dgasmith at icloud.com (Daniel Smith) Date: Fri, 02 Jan 2015 20:45:53 -0600 Subject: [Numpy-discussion] Optimizing multi-tensor contractions in numpy In-Reply-To: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> References: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> Message-ID: <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> Hello everyone, I have been working on a chunk of code that basically sets out to provide a single function that can take an arbitrary einsum expression and computes it in the most optimal way. 
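To make the idea concrete with plain numpy (a toy example only, not opt_einsum's interface): a three-operand contraction can be evaluated pairwise through a BLAS-backed intermediate instead of in a single pass over all indices:

import numpy as np

n = 30
a, b, c = (np.random.rand(n, n) for _ in range(3))

# one shot: einsum itself loops over all four indices (roughly O(n**4))
d1 = np.einsum('ij,jk,kl->il', a, b, c)

# pairwise: build the intermediate a.b and let dot/BLAS do the work (O(n**3))
d2 = np.dot(np.dot(a, b), c)

assert np.allclose(d1, d2)

opt_einsum automates this kind of factorization for arbitrary expressions.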
While np.einsum can compute arbitrary expressions, there are two drawbacks to using pure einsum: einsum does not consider building intermediate arrays for possible reductions in overall rank and is not currently capable of using a vendor BLAS. I have been working on a project that aims to solve both issues simultaneously: https://github.com/dgasmith/opt_einsum This program first builds the optimal way to contract the tensors together, or using my own nomenclature a ?path.? This path is then iterated over and uses tensordot when possible and einsum for everything else. In test cases the worst case scenario adds a 20 microsecond overhead penalty and, in the best case scenario, it can reduce the overall rank of the tensor. The primary (if somewhat exaggerated) example is a 5-term N^8 index transformation that can be reduced to N^5; even when N is very small (N=10) there is a 2,000 fold speed increase over pure einsum or, if using tensordot, a 2,400 fold speed increase. This is somewhat similar to the new np.linalg.multi_dot function. If you are interested in this function please head over to the github repo and check it out. I believe the README is starting to become self-explanatory, but feel free to email me with any questions. This originally started because I was looking into using numpy to rapidly prototype quantum chemistry codes. The results of which can be found here: https://github.com/dgasmith/psi4numpy As such, I am very interested in implementing this into numpy. While I think opt_einsum is in a pretty good place, there is still quite a bit to do (see outstanding issues in the README). Even if this is not something that would fit into numpy I would still be very interested in your comments. Thank you for your time, -Daniel Smith -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 2 22:21:49 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 2 Jan 2015 20:21:49 -0700 Subject: [Numpy-discussion] Optimizing multi-tensor contractions in numpy In-Reply-To: <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> References: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> Message-ID: On Fri, Jan 2, 2015 at 7:45 PM, Daniel Smith wrote: > Hello everyone, > > I have been working on a chunk of code that basically sets out to provide > a single function that can take an arbitrary einsum expression and computes > it in the most optimal way. While np.einsum can compute arbitrary > expressions, there are two drawbacks to using pure einsum: einsum does not > consider building intermediate arrays for possible reductions in overall > rank and is not currently capable of using a vendor BLAS. I have been > working on a project that aims to solve both issues simultaneously: > > https://github.com/dgasmith/opt_einsum > > This program first builds the optimal way to contract the tensors > together, or using my own nomenclature a ?path.? This path is then iterated > over and uses tensordot when possible and einsum for everything else. In > test cases the worst case scenario adds a 20 microsecond overhead penalty > and, in the best case scenario, it can reduce the overall rank of the > tensor. The primary (if somewhat exaggerated) example is a 5-term N^8 index > transformation that can be reduced to N^5; even when N is very small (N=10) > there is a 2,000 fold speed increase over pure einsum or, if using > tensordot, a 2,400 fold speed increase. 
This is somewhat similar to the new > np.linalg.multi_dot function. > > If you are interested in this function please head over to the github repo > and check it out. I believe the README is starting to become > self-explanatory, but feel free to email me with any questions. > > This originally started because I was looking into using numpy to rapidly > prototype quantum chemistry codes. The results of which can be found here: > https://github.com/dgasmith/psi4numpy > > As such, I am very interested in implementing this into numpy. While I > think opt_einsum is in a pretty good place, there is still quite a bit to > do (see outstanding issues in the README). Even if this is not something > that would fit into numpy I would still be very interested in your comments. > > Sounds interesting. I wouldn't be opposed to including an optimized einsum in numpy, there has even been some mention of using blas. Note that cblas is used in multiarray in the current development branch, so that might be useful. I also looked into using einsum to implement the '@' operator coming in Python 3.5, but there was a rather large fixed overhead involved in parsing the input string, and the multiplications were much slower than the numpy dot operator. If those times could be reduced einsum might become a real possibility. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Sat Jan 3 10:05:17 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 03 Jan 2015 10:05:17 -0500 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: <54A8052D.8020804@gmail.com> Would this really be practicality beating purity? It would be nice to have know the principle governing this. For example, is there a way to convincingly group these as array operations vs matrix operations? Personally I am puzzled by preserving subtype of `diagonal` and very especially of `ravel`. Has anyone requested this? (I can see the argument for `diag`.) Alan Isaac On 1/2/2015 9:04 PM, Charles R Harris wrote: > The diag, diagonal, and ravel functions have recently been changed to preserve subtypes. However, this causes lots of backward compatibility problems > for matrix users, in particular, scipy.sparse. One possibility for fixing this is to special case matrix and so that these functions continue to > return 1-d arrays for matrix instances. This is kind of ugly as `a..ravel` will still return a matrix when a is a matrix, an ugly inconsistency. This > may be a case where practicality beats beauty. From charlesr.harris at gmail.com Sat Jan 3 10:32:09 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Jan 2015 08:32:09 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: <54A8052D.8020804@gmail.com> References: <54A8052D.8020804@gmail.com> Message-ID: On Sat, Jan 3, 2015 at 8:05 AM, Alan G Isaac wrote: > Would this really be practicality beating purity? > It would be nice to have know the principle governing this. > For example, is there a way to convincingly group these as > array operations vs matrix operations? > > Personally I am puzzled by preserving subtype of `diagonal` and > very especially of `ravel`. Has anyone requested this? > (I can see the argument for `diag`.) 
> > Alan Isaac > In [1]: from astropy import units as u In [2]: a = eye(2) * u.m In [3]: a Out[3]: In [4]: diagonal(a) Out[4]: In [5]: diag(a) Out[5]: In [6]: ravel(a) Out[6]: None of those examples keep the units without the recent changes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sat Jan 3 10:49:28 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Sat, 3 Jan 2015 17:49:28 +0200 Subject: [Numpy-discussion] Add a function to broadcast arrays to a given shape to numpy's stride_tricks? In-Reply-To: References: Message-ID: Here is an update on a new function for broadcasting arrays to a given shape (now named np.broadcast_to). I have a pull request up for review, which has received some feedback now: https://github.com/numpy/numpy/pull/5371 There is still at least one design decision to settle: should we expose "broadcast_shape" in the public API? In the current implementation, it is exposed as a public function in numpy.lib.tride_tricks (like as_strided), but it is not exported into the main numpy namespace. The alternatives would be to either make it a private function (_broadcast_shape) or expose it publicly (np.broadcast_shape). Please do speak if you have any thoughts to share on the implementation, either here or in the pull request. Best, Stephan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jan 3 11:57:45 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 3 Jan 2015 16:57:45 +0000 Subject: [Numpy-discussion] Optimizing multi-tensor contractions in numpy In-Reply-To: <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> References: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> Message-ID: On 3 Jan 2015 02:46, "Daniel Smith" wrote: > > Hello everyone, > > I have been working on a chunk of code that basically sets out to provide a single function that can take an arbitrary einsum expression and computes it in the most optimal way. While np.einsum can compute arbitrary expressions, there are two drawbacks to using pure einsum: einsum does not consider building intermediate arrays for possible reductions in overall rank and is not currently capable of using a vendor BLAS. I have been working on a project that aims to solve both issues simultaneously: > >https://github.com/dgasmith/opt_einsum > > This program first builds the optimal way to contract the tensors together, or using my own nomenclature a ?path.? This path is then iterated over and uses tensordot when possible and einsum for everything else. In test cases the worst case scenario adds a 20 microsecond overhead penalty and, in the best case scenario, it can reduce the overall rank of the tensor. The primary (if somewhat exaggerated) example is a 5-term N^8 index transformation that can be reduced to N^5; even when N is very small (N=10) there is a 2,000 fold speed increase over pure einsum or, if using tensordot, a 2,400 fold speed increase. This is somewhat similar to the new np.linalg.multi_dot function. This sounds super awesome. Who *wouldn't* want a free 2,000x speed increase? And I especially like your test suite. I'd also be interested in hearing more about the memory requirements of this approach. How does the temporary memory required typically scale with the size of the input arrays? Is there an upper bound on the temporary memory required? > As such, I am very interested in implementing this into numpy. 
While I think opt_einsum is in a pretty good place, there is still quite a bit to do (see outstanding issues in the README). Even if this is not something that would fit into numpy I would still be very interested in your comments. We would definitely be interested in integrating this functionality into numpy. After all, half the point of having an interface like einsum is that it provides a clean boundary where we can swap in complicated, sophisticated machinery without users having to care. No one wants to curate their own pile of custom optimized libraries. :-) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From yw5aj at virginia.edu Sat Jan 3 12:51:03 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Sat, 3 Jan 2015 12:51:03 -0500 Subject: [Numpy-discussion] Pass 2d ndarray into C **double using ctypes In-Reply-To: <964675802441893669.982718sturla.molden-gmail.com@news.gmane.org> References: <964675802441893669.982718sturla.molden-gmail.com@news.gmane.org> Message-ID: Hi Sturla, First of all, my apologies to have spelled your name wrong for the past year - I just realized it! Thanks to Eric Firing who pointed this out to me. Thank you Sturla for bearing with me! And then - thank you for pointing out Numba! I tried to use it years ago, but ended up Cython eventually because the loop-jitting constraint (http://numba.pydata.org/numba-doc/0.16.0/arrays.html#loop-jitting-constraints) was too strict by that time. After seeing your email, I went to the latest version and saw that it has been greatly relaxed! I really look forward to using it for any new projects. Most of the time, loop-jitting is all I need. Lastly, your comments on Fortran 90/95 is convincing me to move away from Fortran 77. I am writing a small section of code that was called by some other legacy code written in F77. I heard that as long as I compile it correctly, it will interface with the legacy code with no problem. I'll definitely give it a try! Thanks again for all the help Sturla, Shawn On Fri, Jan 2, 2015 at 8:22 AM, Sturla Molden wrote: > Yuxiang Wang wrote: > >> 4) I wanted to say that it seems to me, as the project gradually >> scales up, Cython is easier to deal with, especially when I am using a >> lot of numpy arrays. If it is even higher dimensional data, it would >> be verbose while it is really succinct to use Cython. > > The easiest way to speed up NumPy code is to use Numba which is an LLVM > based JIT compiler. Numba will often give you performance comparable to C > for free. All you have to do is to add the @numba.jit decorator to your > Python function and possibly include some type hints. If all you want is to > speed up NumPy code, just use Numba and it will take care of it in at least > 9 of 10 cases. > > Numexpr is also a JIT compiler which can speed up Numpy code, but it does > not give as dramatic results as Numba. > > Cython is easier to work with than ctypes, particularly when the problem > scales up. If you use typed memoryviews in Cython you can also avoid having > to work with pointer arithmetics. Cython is mainly a competitior to using > the Python C API manually for C extension modules to Python. Cython also > allows you to wrap external C and C++ code, and e.g. use Python and C++ > objects together. The drawback is that you need to learn the Cython > language as well as Python and C and C++ and know how they differ. Cython > also have many of the same hidden dangers as C++, due to the possibility of > exceptions being raised between C statements. 
But because Cython is the > most flexible tool for writing C extensions to Python you will in the long > run do yourself a favor by learning to use it. > > ctypes is good when you have a DLL, possibly form a foreign source, and you > just want to use it without any build step. CFFI is easier to work with > than ctypes and has the same usecase. It can parse C headers and does not > require you to define the C API with Python statements like ctypes do. > Generally I would say it is alway better to use CFFI than ctypes. ctypes is > also currently an orphan, albeit in the Python standard library, while CFFI > is actively maintained. > > Numba will also JIT compile ctypes and CFFI calls to remove the extra > overhead. This is good to know if you need to call a C function in a tight > loop. In that case Numba can JIT compile away the Python as well as the > ctypes/CFFI overhead. > > Fortran 90/95 is also underrated. It is easier to work with than C, and > gives similar results performance wise. You can call Fortran with f2py, > ctypes, CFFI, or Cython (use fwrap). Generally I would say that it is > better for a student to learn C than Fortran if you have to choose, because > C is also useful for other things than numerical computing. But if you want > fast and robust numerical code, it is easier to get good results with > Fortran than C or Cython. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From alan.isaac at gmail.com Sat Jan 3 12:54:27 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 03 Jan 2015 12:54:27 -0500 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: <54A8052D.8020804@gmail.com> Message-ID: <54A82CD3.8060504@gmail.com> > On Sat, Jan 3, 2015 at 8:05 AM, Alan G Isaac wrote: >> Would this really be practicality beating purity? >> It would be nice to have know the principle governing this. >> For example, is there a way to convincingly group these as >> array operations vs matrix operations? >> Personally I am puzzled by preserving subtype of >> `diagonal` and >> very especially of `ravel`. Has anyone requested this? >> (I can see the argument for `diag`.) On 1/3/2015 10:32 AM, Charles R Harris wrote: > In [1]: from astropy import units as u > In [2]: a = eye(2) * u.m > In [3]: a > Out[3]: > [ 0., 1.]] m> > In [4]: diagonal(a) > Out[4]: > In [5]: diag(a) > Out[5]: > In [6]: ravel(a) > Out[6]: > None of those examples keep the units without the recent changes. Thanks for a nice example. It seems that the core principle you are proposing is that design considerations generally require that subtypes determine the return types of numpy functions. If that is correct, then it seems matrices should then be subject to this; more special casing of the behavior of matrix objects seems highly undesirable. Cheers, Alan From sturla.molden at gmail.com Sat Jan 3 13:15:27 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 03 Jan 2015 19:15:27 +0100 Subject: [Numpy-discussion] Correct C string handling in the NumPy C API? 
Message-ID: Here is an example: NPY_NO_EXPORT NpyIter_IterNextFunc * NpyIter_GetIterNext(NpyIter *iter, char **errmsg) { npy_uint32 itflags = NIT_ITFLAGS(iter); int ndim = NIT_NDIM(iter); int nop = NIT_NOP(iter); if (NIT_ITERSIZE(iter) < 0) { if (errmsg == NULL) { PyErr_SetString(PyExc_ValueError, "iterator is too large"); } else { *errmsg = "iterator is too large"; } return NULL; } After NpyIter_GetIterNext returns, *errmsg points to a local variable in a returned function. Either I am wrong about C, or this code has undefied behavior... My gutfeeling is that *errmsg = "iterator is too large"; puts the string "iterator is too large" on the stack and points *errmsg to the string. Shouldn't this really be strcpy(*errmsg, "iterator is too large"); and then *errmsg should point to a char buffer allocated before NpyIter_GetIterNext is called? Or will the statement *errmsg = "iterator is too large"; put the string on the stack in the calling C function? Before I open an issue I will ask if my understanding of C is correct or not. I am a bit confused here... Regards, Sturla From njs at pobox.com Sat Jan 3 13:29:13 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 3 Jan 2015 18:29:13 +0000 Subject: [Numpy-discussion] Correct C string handling in the NumPy C API? In-Reply-To: References: Message-ID: 2015-01-03 18:15 GMT+00:00 Sturla Molden : > > Here is an example: > > NPY_NO_EXPORT NpyIter_IterNextFunc * > NpyIter_GetIterNext(NpyIter *iter, char **errmsg) > { > npy_uint32 itflags = NIT_ITFLAGS(iter); > int ndim = NIT_NDIM(iter); > int nop = NIT_NOP(iter); > > if (NIT_ITERSIZE(iter) < 0) { > if (errmsg == NULL) { > PyErr_SetString(PyExc_ValueError, "iterator is too large"); > } > else { > *errmsg = "iterator is too large"; > } > return NULL; > } > > > After NpyIter_GetIterNext returns, *errmsg points to a local variable in > a returned function. > > Either I am wrong about C, or this code has undefied behavior... > > My gutfeeling is that > > *errmsg = "iterator is too large"; > > puts the string "iterator is too large" on the stack and points *errmsg > to the string. No, this code is safe (fortunately!). C string literals have "static storage" (see paragraph 6.4.5.5 in C99), which means that their lifetime is the same as the lifetime of a 'static char[]'. They aren't stack allocated. There's lots more details about this available around the web, e.g.: https://stackoverflow.com/questions/4836534/returning-a-pointer-to-a-literal-or-constant-character-array-string -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From sturla.molden at gmail.com Sat Jan 3 13:32:46 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 3 Jan 2015 18:32:46 +0000 (UTC) Subject: [Numpy-discussion] Correct C string handling in the NumPy C API? References: Message-ID: <877788910442002719.289084sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: > No, this code is safe (fortunately!). C string literals have "static > storage" (see paragraph 6.4.5.5 in C99), which means that their > lifetime is the same as the lifetime of a 'static char[]'. They aren't > stack allocated. Thanks. That explains it. 
Sturla From charlesr.harris at gmail.com Sat Jan 3 13:51:54 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Jan 2015 11:51:54 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: <54A82CD3.8060504@gmail.com> References: <54A8052D.8020804@gmail.com> <54A82CD3.8060504@gmail.com> Message-ID: On Sat, Jan 3, 2015 at 10:54 AM, Alan G Isaac wrote: > > On Sat, Jan 3, 2015 at 8:05 AM, Alan G Isaac wrote: > >> Would this really be practicality beating purity? > >> It would be nice to have know the principle governing this. > >> For example, is there a way to convincingly group these as > >> array operations vs matrix operations? > >> Personally I am puzzled by preserving subtype of > >> `diagonal` and > >> very especially of `ravel`. Has anyone requested this? > >> (I can see the argument for `diag`.) > > > > On 1/3/2015 10:32 AM, Charles R Harris wrote: > > In [1]: from astropy import units as u > > > In [2]: a = eye(2) * u.m > > > In [3]: a > > Out[3]: > > > [ 0., 1.]] m> > > > In [4]: diagonal(a) > > Out[4]: > > > In [5]: diag(a) > > Out[5]: > > > In [6]: ravel(a) > > Out[6]: > > > None of those examples keep the units without the recent changes. > > > > Thanks for a nice example. It seems that the core principle > you are proposing is that design considerations generally > require that subtypes determine the return types of numpy > functions. If that is correct, then it seems matrices should > then be subject to this; more special casing of the behavior > of matrix objects seems highly undesirable I would agree with you, except that the changes breaks code that uses matrices because matrices are always 2-d whereas the previous results were 1-d. If it were a few not widely used projects I'd stick with it, but scipy.sparse is one of the packages that is broken. Numpy/scipy are not released together, and numpy is often used to compile older versions of scipy, so breaking scipy is undesirable. Becaus we are hoping to phase matrices out over time, preserving the old behavior for matrices until we can dispense with them looks to be the easiest solution. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jan 3 13:58:10 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 3 Jan 2015 18:58:10 +0000 (UTC) Subject: [Numpy-discussion] Correct C string handling in the NumPy C API? References: <877788910442002719.289084sturla.molden-gmail.com@news.gmane.org> Message-ID: <723999938442003413.672980sturla.molden-gmail.com@news.gmane.org> Sturla Molden wrote: > Thanks. That explains it. 20 years after learning C I still discover new things... On the other hand, Fortran is Fortran, and seems to be free of these gotchas... Python is better as well. I hate to say it but C++ would also be less confusing here. I would just pass in a reference to a std::string and assign to it, which I know is safe... On the other hand, implicit static storage of string litterals -- which may or may not be modified depending on compiler? Not so obvious without reading all the small details... 
Sturla From dgasmith at icloud.com Sat Jan 3 14:26:43 2015 From: dgasmith at icloud.com (Daniel Smith) Date: Sat, 03 Jan 2015 13:26:43 -0600 Subject: [Numpy-discussion] Optimizing multi-tensor contractions in numpy In-Reply-To: References: <0193F3A5-B3AA-41BF-9A47-536671801090@icloud.com> <8F357109-C3BD-44EE-804C-87254BD2D1A4@icloud.com> Message-ID: Hello Nathaniel, > I'd also be interested in hearing more about the memory requirements of this approach. How does the temporary memory required typically scale with the size of the input arrays? Is there an upper bound on the temporary memory required? > Currently the algorithm will not create an array larger than the largest input or output array (maximum_array_size). This gives a maximum upper bound of (number_of_terms/2 + 1) * maximum_array_size. In practice, this rarely goes beyond the maximum_array_size as building large outer products is not usually helpful. The views are also dereferenced after they are used, I believe this should delete the arrays correctly. However, this is one thing I am not sure is being handled in the best way and can use further testing. Figuring out cumulative memory should also be possible for the brute force path algorithm, but I am not sure if this is possible for the faster greedy path algorithm without large changes. Overall this sounds great. If anyone has a suggestion of where this should go I can start working on a PR and we can work out the remaining issues there? -Daniel Smith > On Jan 3, 2015, at 10:57 AM, Nathaniel Smith wrote: > > On 3 Jan 2015 02:46, "Daniel Smith" > wrote: > > > > Hello everyone, > > > > I have been working on a chunk of code that basically sets out to provide a single function that can take an arbitrary einsum expression and computes it in the most optimal way. While np.einsum can compute arbitrary expressions, there are two drawbacks to using pure einsum: einsum does not consider building intermediate arrays for possible reductions in overall rank and is not currently capable of using a vendor BLAS. I have been working on a project that aims to solve both issues simultaneously: > > > >https://github.com/dgasmith/opt_einsum > > > > This program first builds the optimal way to contract the tensors together, or using my own nomenclature a ?path.? This path is then iterated over and uses tensordot when possible and einsum for everything else. In test cases the worst case scenario adds a 20 microsecond overhead penalty and, in the best case scenario, it can reduce the overall rank of the tensor. The primary (if somewhat exaggerated) example is a 5-term N^8 index transformation that can be reduced to N^5; even when N is very small (N=10) there is a 2,000 fold speed increase over pure einsum or, if using tensordot, a 2,400 fold speed increase. This is somewhat similar to the new np.linalg.multi_dot function. > > This sounds super awesome. Who *wouldn't* want a free 2,000x speed increase? And I especially like your test suite. > > I'd also be interested in hearing more about the memory requirements of this approach. How does the temporary memory required typically scale with the size of the input arrays? Is there an upper bound on the temporary memory required? > > > As such, I am very interested in implementing this into numpy. While I think opt_einsum is in a pretty good place, there is still quite a bit to do (see outstanding issues in the README). Even if this is not something that would fit into numpy I would still be very interested in your comments. 
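To make the "path" idea concrete, here is a tiny hand-written illustration (this is not the opt_einsum code itself, and for operands this small the gain is negligible):

import numpy as np

N = 10
A, B, C = (np.random.rand(N, N) for _ in range(3))

# one-shot einsum sees all three operands at once
direct = np.einsum('ij,jk,kl->il', A, B, C)

# a "path" performs the same contraction pairwise, so each step can be
# handed to tensordot (and hence to a vendor BLAS)
tmp = np.tensordot(A, B, axes=(1, 0))         # ij,jk -> ik
stepwise = np.tensordot(tmp, C, axes=(1, 0))  # ik,kl -> il

assert np.allclose(direct, stepwise)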
> > We would definitely be interested in integrating this functionality into numpy. After all, half the point of having an interface like einsum is that it provides a clean boundary where we can swap in complicated, sophisticated machinery without users having to care. No one wants to curate their own pile of custom optimized libraries. :-) > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jan 3 14:49:44 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 3 Jan 2015 19:49:44 +0000 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: On 1 Jan 2015 21:35, "Alexander Belopolsky" wrote: > > A discussion [1] is currently underway at GitHub which will benefit from a larger forum. > > In version 1.9, the diagonal() method was changed to return a read-only (non-contiguous) view into the original array instead of a plain copy. Also, it has been announced [2] that in 1.10 the view will become read/write. > > A concern has now been raised [3] that this change breaks backward compatibility too much. > > Consider the following code: > > x = numy.eye(2) > d = x.diagonal() > d[0] = 2 > > In 1.8, this code runs without errors and results in [2, 1] stored in array d. In 1.9, this is an error. With the current plan, in 1.10 this will become valid again, but the result will be different: x[0,0] will be 2 while it is 1 in 1.8. Further context: In 1.7 and 1.8, the code above works as described, but also issues a visible-by-default warning: >>> np.__version__ '1.7.2' >>> x = np.eye(2) >>> x.diagonal()[0] = 2 __main__:1: FutureWarning: Numpy has detected that you (may be) writing to an array returned by numpy.diagonal or by selecting multiple fields in a record array. This code will likely break in the next numpy release -- see numpy.diagonal or arrays.indexing reference docs for details. The quick fix is to make an explicit copy (e.g., do arr.diagonal().copy() or arr[['f0','f1']].copy()). 1.7 was released in Feb. 2013, ~22 months ago. (I'm not implying this number is particularly large or small, it's just something that I find useful to calculate when thinking about things like this.) The choice of "1.10" as the target for completing this change is more-or-less a strawman and we shouldn't feel bound by it. The schedule was originally written in between the 1.6 and 1.7 releases, when our release process was kinda broken and we had no idea what the future release schedule would look like (1.6 -> 1.7 ultimately ended up being a ~21 month gap). We've already adjusted the schedule for this deprecation once before (see issue #596: The original schedule called for the change to returning a ro-view to happen in 1.8, rather than 1.9 as it actually did). Now that our release frequency is higher, 1.11 might well be a more reasonable target than 1.10. As for the overall question, this is really a bigger question about what strategy we should use in general to balance between conservatism (which is a Good Thing) and making improvements (which is also a Good Thing). The post you cite brings this up explicitly: > [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/ I have huge respect for the problems and pain that Konrad describes in this blog post, but I really can't agree with the argument or the conclusions. 
His conclusion is that when it comes to compatibility breaks, slow-incremental-change is bad, and that we should instead prefer big all-at-once compatibility breaks like the Numeric->Numpy or Py2->Py3 transitions. But when describing his own experiences that he uses to motivate this, he says: *"The two main dependencies of my code, NumPy and Python itself, did sometimes introduce incompatible changes (by design or as consequences of bug fixes) that required changes on my own code base, but they were surprisingly minor and never required more than about a day of work."* i.e., slow-incremental-change has actually worked well in his experience. (And in particular, the np.diagonal issue only comes in as an example to illustrate what he means by the phrase "slow continuous change" -- this particular change hasn't actually broken anything in his code.) OTOH the big problem that motivated his post was that his code is all written against the APIs of the ancient and long-abandoned Numeric project, and he finds the costs of transitioning them to the "new" numpy APIs to be prohibitively expensive, i.e. this big-bang transition broke his code. (It did manage to limp on for some years b/c numpy used to contain some compatibility code to emulate the Numeric API, but this doesn't really change the basic situation: there were two implementations of the API he needed -- numpy.numeric and Numeric itself -- and both implementations still exist in the sense that you can download them, but neither is usable because no-one's willing to maintain them anymore.) Maybe I'm missing something, but his data seems to be pi radians off from his conclusion. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Jan 3 15:39:20 2015 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 3 Jan 2015 15:39:20 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: Wasn't all of this discussed way back when the deprecation plan was made? This was known to happen and was entirely the intent, right? What new argument is there to deviate from the plan? As for that particular blog post, I remember reading it back when it was posted. I, again, sympathize with the author's plight, but I pointed out that the reason for some of the changes he noted was because they could cause bugs, which would mean that results could be wrong. Reproducibility is nigh useless without a test suite to ensure the component parts are reproducible on their own. OTOH, there is an argument for slow, carefully-considered changes to APIs (which I think the diagonal() changes were). As an example of a potentially poor change is in matplotlib. We are starting to move to using properties, away from get/setters(). In my upcoming book, I ran into a problem where I needed to use an Artist's get_axes() or use its property "axes", but there will only be one release of matplotlib where both of them will be valid. I was faced with either using the get_axes() and have my code obsolete sometime in the summer, use the propery, and have my code invalid for all but the most recent version of matplotlib, or to have some version checking code that would distract from the lesson at hand. I now think that a single release cycle for deprecation of get_axes() was not a wise decision, especially since the old code was merely verbose, not buggy. 
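The kind of version-checking code referred to here would be something along these lines (untested and purely illustrative, which is exactly why it distracts from a book example):

def get_artist_axes(artist):
    # newer matplotlib exposes the parent Axes as a property
    try:
        return artist.axes
    except AttributeError:
        # older releases only provide the explicit getter
        return artist.get_axes()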
To conclude, unless someone can present a *new* argument to deviate from the diagonal() plan that was set a couple years ago, I don't see any reason why the decisions that were agreed upon then are invalid now. The pros-and-cons were weighed, and this particular con was known then and was considered acceptable at that time. Cheers! Ben Root On Sat, Jan 3, 2015 at 2:49 PM, Nathaniel Smith wrote: > On 1 Jan 2015 21:35, "Alexander Belopolsky" wrote: > > > > A discussion [1] is currently underway at GitHub which will benefit from > a larger forum. > > > > In version 1.9, the diagonal() method was changed to return a read-only > (non-contiguous) view into the original array instead of a plain copy. > Also, it has been announced [2] that in 1.10 the view will become > read/write. > > > > A concern has now been raised [3] that this change breaks backward > compatibility too much. > > > > Consider the following code: > > > > x = numy.eye(2) > > d = x.diagonal() > > d[0] = 2 > > > > In 1.8, this code runs without errors and results in [2, 1] stored in > array d. In 1.9, this is an error. With the current plan, in 1.10 this > will become valid again, but the result will be different: x[0,0] will be 2 > while it is 1 in 1.8. > > Further context: > > In 1.7 and 1.8, the code above works as described, but also issues a > visible-by-default warning: > > >>> np.__version__ > '1.7.2' > >>> x = np.eye(2) > >>> x.diagonal()[0] = 2 > __main__:1: FutureWarning: Numpy has detected that you (may be) writing to > an array returned > by numpy.diagonal or by selecting multiple fields in a record > array. This code will likely break in the next numpy release -- > see numpy.diagonal or arrays.indexing reference docs for details. > The quick fix is to make an explicit copy (e.g., do > arr.diagonal().copy() or arr[['f0','f1']].copy()). > > 1.7 was released in Feb. 2013, ~22 months ago. (I'm not implying this > number is particularly large or small, it's just something that I find > useful to calculate when thinking about things like this.) > > The choice of "1.10" as the target for completing this change is > more-or-less a strawman and we shouldn't feel bound by it. The schedule was > originally written in between the 1.6 and 1.7 releases, when our release > process was kinda broken and we had no idea what the future release > schedule would look like (1.6 -> 1.7 ultimately ended up being a ~21 month > gap). We've already adjusted the schedule for this deprecation once before > (see issue #596: The original schedule called for the change to returning a > ro-view to happen in 1.8, rather than 1.9 as it actually did). Now that our > release frequency is higher, 1.11 might well be a more reasonable target > than 1.10. > > As for the overall question, this is really a bigger question about what > strategy we should use in general to balance between conservatism (which is > a Good Thing) and making improvements (which is also a Good Thing). The > post you cite brings this up explicitly: > > > [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/ > > I have huge respect for the problems and pain that Konrad describes in > this blog post, but I really can't agree with the argument or the > conclusions. His conclusion is that when it comes to compatibility breaks, > slow-incremental-change is bad, and that we should instead prefer big > all-at-once compatibility breaks like the Numeric->Numpy or Py2->Py3 > transitions. 
But when describing his own experiences that he uses to > motivate this, he says: > > *"The two main dependencies of my code, NumPy and Python itself, did > sometimes introduce incompatible changes (by design or as consequences of > bug fixes) that required changes on my own code base, but they were > surprisingly minor and never required more than about a day of work."* > > i.e., slow-incremental-change has actually worked well in his experience. > (And in particular, the np.diagonal issue only comes in as an example to > illustrate what he means by the phrase "slow continuous change" -- this > particular change hasn't actually broken anything in his code.) OTOH the > big problem that motivated his post was that his code is all written > against the APIs of the ancient and long-abandoned Numeric project, and he > finds the costs of transitioning them to the "new" numpy APIs to be > prohibitively expensive, i.e. this big-bang transition broke his code. (It > did manage to limp on for some years b/c numpy used to contain some > compatibility code to emulate the Numeric API, but this doesn't really > change the basic situation: there were two implementations of the API he > needed -- numpy.numeric and Numeric itself -- and both implementations > still exist in the sense that you can download them, but neither is usable > because no-one's willing to maintain them anymore.) Maybe I'm missing > something, but his data seems to be pi radians off from his conclusion. > > -n > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maniteja.modesty067 at gmail.com Sat Jan 3 16:44:10 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Sun, 4 Jan 2015 03:14:10 +0530 Subject: [Numpy-discussion] Regarding np.ma.masked_equal behavior Message-ID: Hello friends, This is an issue related to the working of *masked_equal* method. I was thinking if anyone related to an old ticket #1851 , regarding the modification of *masked_equal *function effect on *fill_value *could clarify the situation, since right now, the documentation and implementation conflict. There is an issue raised regarding this #5408 . Cheers*,* N.Maniteja _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Jan 3 16:55:38 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 3 Jan 2015 21:55:38 +0000 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: Hi, On Thu, Jan 1, 2015 at 9:35 PM, Alexander Belopolsky wrote: > A discussion [1] is currently underway at GitHub which will benefit from a > larger forum. > > In version 1.9, the diagonal() method was changed to return a read-only > (non-contiguous) view into the original array instead of a plain copy. > Also, it has been announced [2] that in 1.10 the view will become > read/write. > > A concern has now been raised [3] that this change breaks backward > compatibility too much. > > Consider the following code: > > x = numy.eye(2) > d = x.diagonal() > d[0] = 2 > > In 1.8, this code runs without errors and results in [2, 1] stored in array > d. In 1.9, this is an error. 
With the current plan, in 1.10 this will > become valid again, but the result will be different: x[0,0] will be 2 while > it is 1 in 1.8. > > Two alternatives are suggested for discussion: > > 1. Add copy=True flag to diagonal() method. > 2. Roll back 1.9 change to diagonal() and introduce an additional > diagonal_view() method to return a view. I think this point is a good one, from Konrad Hinsen's blog post: If you get a Python script, say as a reviewer for a submitted article, and see 'import numpy', you don't know which version of numpy the authors had in mind. If that script calls array.diag() and modifies the return value, does it expect to modify a copy or a view? The result is very different, but there is no way to tell. It is possible, even quite probable, that the code would execute fine with both NumPy 1.8 and the upcoming NumPy 1.10, but yield different results. That rules out the current 1.10 plan I think. copy=True as default seems like a nice compact and explicit solution to me. Cheers, Matthew From charlesr.harris at gmail.com Sat Jan 3 17:08:29 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Jan 2015 15:08:29 -0700 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: On Sat, Jan 3, 2015 at 2:55 PM, Matthew Brett wrote: > Hi, > > On Thu, Jan 1, 2015 at 9:35 PM, Alexander Belopolsky > wrote: > > A discussion [1] is currently underway at GitHub which will benefit from > a > > larger forum. > > > > In version 1.9, the diagonal() method was changed to return a read-only > > (non-contiguous) view into the original array instead of a plain copy. > > Also, it has been announced [2] that in 1.10 the view will become > > read/write. > > > > A concern has now been raised [3] that this change breaks backward > > compatibility too much. > > > > Consider the following code: > > > > x = numpy.eye(2) > > d = x.diagonal() > > d[0] = 2 > > > > In 1.8, this code runs without errors and results in [2, 1] stored in > array > > d. In 1.9, this is an error. With the current plan, in 1.10 this will > > become valid again, but the result will be different: x[0,0] will be 2 > while > > it is 1 in 1.8. > > > > Two alternatives are suggested for discussion: > > > > 1. Add copy=True flag to diagonal() method. > > 2. Roll back 1.9 change to diagonal() and introduce an additional > > diagonal_view() method to return a view. > > I think this point is a good one, from Konrad Hinsen's blog post: > > > If you get a Python script, say as a reviewer for a submitted article, > and see 'import numpy', you don't know which version of numpy the > authors had in mind. If that script calls array.diag() and modifies > the return value, does it expect to modify a copy or a view? The > result is very different, but there is no way to tell. It is possible, > even quite probable, that the code would execute fine with both NumPy > 1.8 and the upcoming NumPy 1.10, but yield different results. > > > That rules out the current 1.10 plan I think. > > copy=True as default seems like a nice compact and explicit solution to me. > > Bear in mind that this also affects the C-API via the PyArray_Diagonal function, so the rollback proposal would be 1) Roll back the change to PyArray_Diagonal 2) Introduce a new C-API function PyArray_Diagonal2 that has a 'copy' argument 3) Make PyArray_Diagonal call PyArray_Diagonal2 with 'copy=1' 4) Add a copy argument to the diagonal method.
I'm thinking we should have a rule that functions in the C-API can be refactored or deprecated, but they don't change otherwise. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jan 3 18:54:58 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 04 Jan 2015 00:54:58 +0100 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: On 03/01/15 03:04, Charles R Harris wrote: > The diag, diagonal, and ravel functions have recently been changed to > preserve subtypes. However, this causes lots of backward compatibility > problems for matrix users, in particular, scipy.sparse. One possibility > for fixing this is to special case matrix and so that these functions > continue to return 1-d arrays for matrix instances. This is kind of ugly > as `a..ravel` will still return a matrix when a is a matrix, an ugly > inconsistency. This may be a case where practicality beats beauty. > > Thoughts? What about fixing scipy.sparse? Sturla From charlesr.harris at gmail.com Sat Jan 3 19:28:41 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 3 Jan 2015 17:28:41 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: On Sat, Jan 3, 2015 at 4:54 PM, Sturla Molden wrote: > On 03/01/15 03:04, Charles R Harris wrote: > > > The diag, diagonal, and ravel functions have recently been changed to > > preserve subtypes. However, this causes lots of backward compatibility > > problems for matrix users, in particular, scipy.sparse. One possibility > > for fixing this is to special case matrix and so that these functions > > continue to return 1-d arrays for matrix instances. This is kind of ugly > > as `a..ravel` will still return a matrix when a is a matrix, an ugly > > inconsistency. This may be a case where practicality beats beauty. > > > > Thoughts? > > What about fixing scipy.sparse? > PR already in. The problem is that versions of scipy <=15 will not work with numpy 1.10 if we don't fix this in numpy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jan 4 03:44:32 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 4 Jan 2015 09:44:32 +0100 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: On Sun, Jan 4, 2015 at 1:28 AM, Charles R Harris wrote: > > > On Sat, Jan 3, 2015 at 4:54 PM, Sturla Molden > wrote: > >> On 03/01/15 03:04, Charles R Harris wrote: >> >> > The diag, diagonal, and ravel functions have recently been changed to >> > preserve subtypes. However, this causes lots of backward compatibility >> > problems for matrix users, in particular, scipy.sparse. One possibility >> > for fixing this is to special case matrix and so that these functions >> > continue to return 1-d arrays for matrix instances. This is kind of ugly >> > as `a..ravel` will still return a matrix when a is a matrix, an ugly >> > inconsistency. This may be a case where practicality beats beauty. >> > >> > Thoughts? >> > I think it makes sense to special-case matrix here. Arguable, ravel() is an operation that should return a 1-D array (ndarray or other array-like object). np.matrix doesn't allow 1-D objects, hence can't be returned. 
The method is also documented to return a 1-D array, so maybe the matrix.ravel method is wrong here: In [1]: x = np.matrix(np.eye(3)) In [2]: x.ravel() Out[2]: matrix([[ 1., 0., 0., 0., 1., 0., 0., 0., 1.]]) # 2-D In [3]: print(x.ravel.__doc__) a.ravel([order]) Return a flattened array. Refer to `numpy.ravel` for full documentation. Ralf > >> What about fixing scipy.sparse? >> > > PR already in. The problem is that versions of scipy <=15 will not work > with numpy 1.10 if we don't fix this in numpy. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jan 4 04:46:17 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 4 Jan 2015 10:46:17 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: On Sat, Jan 3, 2015 at 11:08 PM, Charles R Harris wrote: > > > On Sat, Jan 3, 2015 at 2:55 PM, Matthew Brett > wrote: > >> Hi, >> >> On Thu, Jan 1, 2015 at 9:35 PM, Alexander Belopolsky >> wrote: >> > A discussion [1] is currently underway at GitHub which will benefit >> from a >> > larger forum. >> > >> > In version 1.9, the diagonal() method was changed to return a read-only >> > (non-contiguous) view into the original array instead of a plain copy. >> > Also, it has been announced [2] that in 1.10 the view will become >> > read/write. >> > >> > A concern has now been raised [3] that this change breaks backward >> > compatibility too much. >> > >> > Consider the following code: >> > >> > x = numy.eye(2) >> > d = x.diagonal() >> > d[0] = 2 >> > >> > In 1.8, this code runs without errors and results in [2, 1] stored in >> array >> > d. In 1.9, this is an error. With the current plan, in 1.10 this will >> > become valid again, but the result will be different: x[0,0] will be 2 >> while >> > it is 1 in 1.8. >> > >> > Two alternatives are suggested for discussion: >> > >> > 1. Add copy=True flag to diagonal() method. >> > 2. Roll back 1.9 change to diagonal() and introduce an additional >> > diagonal_view() method to return a view. >> >> I think this point is a good one, from Konrad Hinsen's blog post: >> >> >> If you get a Python script, say as a reviewer for a submitted article, >> and see ?import numpy?, you don?t know which version of numpy the >> authors had in mind. If that script calls array.diag() and modifies >> the return value, does it expect to modify a copy or a view? The >> result is very different, but there is no way to tell. It is possible, >> even quite probable, that the code would execute fine with both NumPy >> 1.8 and the upcoming NumPy 1.10, but yield different results. >> >> >> That rules out the current 1.10 plan I think. >> > I think maybe making the change in 1.10 is too quick, but it doesn't rule it out long-term. This issue and the copy=True alternative were extensively discussed when making the change: http://thread.gmane.org/gmane.comp.python.numeric.general/49887/focus=49888 It's not impossible that we made the wrong decision a while back, but rehashing that whole discussion based on an argument that was already brought up back then doesn't sound good to me. > copy=True as default seems like a nice compact and explicit solution to me. 
>> > > Bear in mind that this also affects the C-API via the PyArray_Diagonal > function, so the rollback proposal would be > > 1) Roll back the change to PyArray_Diagonal > 2) Introduce a new C-API function PyArray_Diagonal2 that has a 'copy' > argument > 3) Make PyArray_Diagonal call PyArray_Diagonal2 with 'copy=1' > 4) Add a copy argument to do the diagonal method. > > I'm thinking we should have a rule that functions in the C-API can be > refactored or deprecated, but they don't change otherwise. > Makes sense. It's time to document the policy on deprecations and incompatible changes in more detail I think. We had a few sentences long statement on this on the Trac wiki, IIRC written by Robert Kern, but that's gone now. Do we have anything else written down anywhere? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jan 4 09:56:28 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 4 Jan 2015 07:56:28 -0700 Subject: [Numpy-discussion] diag, diagonal, ravel and all that In-Reply-To: References: Message-ID: On Sun, Jan 4, 2015 at 1:44 AM, Ralf Gommers wrote: > > > On Sun, Jan 4, 2015 at 1:28 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 3, 2015 at 4:54 PM, Sturla Molden >> wrote: >> >>> On 03/01/15 03:04, Charles R Harris wrote: >>> >>> > The diag, diagonal, and ravel functions have recently been changed to >>> > preserve subtypes. However, this causes lots of backward compatibility >>> > problems for matrix users, in particular, scipy.sparse. One possibility >>> > for fixing this is to special case matrix and so that these functions >>> > continue to return 1-d arrays for matrix instances. This is kind of >>> ugly >>> > as `a..ravel` will still return a matrix when a is a matrix, an ugly >>> > inconsistency. This may be a case where practicality beats beauty. >>> > >>> > Thoughts? >>> >> > I think it makes sense to special-case matrix here. Arguable, ravel() is > an operation that should return a 1-D array (ndarray or other array-like > object). np.matrix doesn't allow 1-D objects, hence can't be returned. > > The method is also documented to return a 1-D array, so maybe the > matrix.ravel method is wrong here: > > In [1]: x = np.matrix(np.eye(3)) > > In [2]: x.ravel() > Out[2]: matrix([[ 1., 0., 0., 0., 1., 0., 0., 0., 1.]]) # 2-D > > In [3]: print(x.ravel.__doc__) > a.ravel([order]) > > Return a flattened array. > > Refer to `numpy.ravel` for full documentation. > Just to clarify the previous behavior for matrix m. 1) m.diagonal() and m.ravel() both return matrices 2) diagonal(m) and ravel(m) both return 1-D arrays Currently in master, which is incompatible with scipy master 1) m.diagonal() and m.ravel() both return matrices 2) diagonal(m) and ravel(m) both return matrices There is a PR to revert to the previous behavior. Another option is to change m.ravel() to return a 1-D array and leave diagonal(m) returning a matrix. The incompatibilites with diagonal didn't seem to be as troublesome. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
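Spelled out at an interpreter, the distinction under discussion looks roughly like this (which result you actually get for the function form depends on the numpy version, as described above):

>>> import numpy as np
>>> m = np.matrix(np.eye(2))
>>> m.ravel().shape        # method: result stays a matrix, hence 2-d
(1, 4)
>>> np.ravel(m).shape      # function, previous behavior: plain 1-d ndarray
(4,)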
URL: From ralf.gommers at gmail.com Sun Jan 4 11:08:17 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 4 Jan 2015 17:08:17 +0100 Subject: [Numpy-discussion] numpy.fromiter in numpypy In-Reply-To: <645635478.812990.1420142270937.JavaMail.yahoo@jws10729.mail.gq1.yahoo.com> References: <645635478.812990.1420142270937.JavaMail.yahoo@jws10729.mail.gq1.yahoo.com> Message-ID: Hi Albert-Jan, On Thu, Jan 1, 2015 at 8:57 PM, Albert-Jan Roskam wrote: > Hi, > > I would like to use the numpy implementation for Pypy. In particular, I > would like to use numpy.fromiter, which is available according to this > overview: http://buildbot.pypy.org/numpy-status/latest.html. However, > contrary to what this website says, this function is not yet available. > Conclusion: the website is wrong. Or am I missing something? > No idea to be honest. Note that numpypy is developed by the PyPy team and not by the Numpy team. So you may want to ask on the PyPy mailing list: https://mail.python.org/mailman/listinfo/pypy-dev Cheers, Ralf > > albertjan at debian:~$ sudo pypy $(which pip) install -U git+ > https://bitbucket.org/pypy/numpy.git > albertjan at debian:~$ sudo pypy -c 'import numpy' # sudo: as per the > installation instructions > albertjan at debian:~$ pypy > Python 2.7.8 (f5dcc2477b97386c11e4b67f08a2d00fbd2fce5d, Sep 19 2014, > 10:37:41) > [PyPy 2.4.0 with GCC 4.8.2] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>>> import sys > >>>> import numpy as np > >>>> np.__version__, sys.version > ('1.9.0', '2.7.8 (f5dcc2477b97386c11e4b67f08a2d00fbd2fce5d, Sep 19 2014, > 10:37:41)\n[PyPy 2.4.0 with GCC 4.8.2]') > >>>> np.fromiter > > >>>> np.fromiter((i for i in range(10)), np.float) > Traceback (most recent call last): > File "", line 1, in > File "/opt/pypy-2.4/site-packages/numpy/core/multiarray.py", line 55, in > tmp > raise NotImplementedError("%s not implemented yet" % func) > NotImplementedError: fromiter not implemented yet > > The same also applies to numpy.fromfile > > Thanks in advance and happy 2015. > > > > Regards, > > Albert-Jan > > > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > All right, but apart from the sanitation, the medicine, education, wine, > public order, irrigation, roads, a > > fresh water system, and public health, what have the Romans ever done for > us? > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From konrad.hinsen at fastmail.net Sun Jan 4 11:22:30 2015 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Sun, 04 Jan 2015 17:22:30 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: <54A968C6.1040502@fastmail.net> On 03/01/15 20:49, Nathaniel Smith wrote: > The post you cite brings this up explicitly: > > > [3] http://khinsen.wordpress.com/2014/09/12/the-state-of-numpy/ > > I have huge respect for the problems and pain that Konrad describes in > this blog post, but I really can't agree with the argument or the > conclusions. His conclusion is that when it comes to compatibility > breaks, slow-incremental-change is bad, and that we should instead > prefer big all-at-once compatibility breaks like the Numeric->Numpy or > Py2->Py3 transitions. 
But when describing his own experiences that he > uses to motivate this, he says: ... There are two different scenarios to consider here, and perhaps I didn't make that distinction clear enough. One scenario is that of a maintained library or application that depends on NumPy. The other scenario is a set of scripts written for a specific study (let's say a thesis) that is then published for its documentation value. Such scripts are in general not maintained. In the first scenario, gradual API changes work reasonably well, as long as the effort involved in applying the fixes is sufficiently minor that developers can integrate them into their routine maintenance efforts. That is the situation I have described for my own past experience as a library author. It's the second scenario where gradual changes are a real problem. Suppose I have a set of scripts from a thesis published in year X, and I need to understand them in detail in year X+5 for a related scientific project. If the script produces different results with NumPy 1.7 and NumPy 1.10, which result should I assume the author intended? People rarely write down which versions of all dependencies they used. Yes, they should, but it's actually a pain to do this, in particular when you work on multiple machines and don't manage the Python installation yourself. In this rather frequent situation, the published scripts are ambiguous - I can't really know what they did when the author ran them. There is a third scenario where this problem shows up: outdated legacy system installations, which are particularly frequent on clusters and supercomputers. For example, the cluster at my lab runs a CentOS version that is a few years old. CentOS is known for its conservatism, and therefore the default Python installation on that machine is based on Python 2.6 with correspondingly old NumPy versions. People do install recent application libraries there. Suppose someone runs code there that assumes the future semantics for diagonal() - this will silently yield wrong results. In summary, my point of view on breaking changes is: 1) Changes that can make legacy code fail can be introduced gradually. The right compromise between stability and progress must be figured out by the community. 2) Changes that yield different results for unmodified legacy code should never be allowed. 3) The best overall solution for API evolution is a version number visible in client code with a version number change whenever some breaking change is introduced. This guarantees point 2). So if the community decides that it is important to change the behavior of diagonal(), this should be done in one of two ways: a) Deprecate diagonal() and introduce a differently-named method with the new functionality. This will make old code fail rather than produce wrong results. b) Accumulate this change with other such changes and call the new API "numpy2". Konrad. From ben.root at ou.edu Sun Jan 4 14:37:44 2015 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 4 Jan 2015 14:37:44 -0500 Subject: [Numpy-discussion] Regarding np.ma.masked_equal behavior In-Reply-To: References: Message-ID: Personally, I have never depended upon an implicit fill value. I would always handle it explicitly. Off the top of my head, a project that might have really good insight into how fill_value should work is the python-netCDF4 project (so, talk to Jeff Whitaker, I think), and/or the HDF5 people. I know the netCDF4 package, which supports masked arrays, takes advantage of fill_value attributes. Cheers!
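Handling it explicitly means something along these lines (an illustrative sketch only; the fill_value that masked_equal picks by itself is exactly what the ticket is about):

>>> import numpy as np
>>> a = np.ma.masked_equal([1, 2, 3], 2)
>>> a.fill_value = 999      # choose the fill value yourself ...
>>> a.filled()              # ... rather than relying on the implicit one
array([  1, 999,   3])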
Ben Root On Sat, Jan 3, 2015 at 4:44 PM, Maniteja Nandana < maniteja.modesty067 at gmail.com> wrote: > Hello friends, > > This is an issue related to the working of *masked_equal* method. I was > thinking if anyone related to an old ticket #1851 > , regarding the modification > of *masked_equal *function effect on *fill_value *could clarify the > situation, since right now, the documentation and implementation conflict. > There is an issue raised regarding this #5408 > . > > Cheers*,* > N.Maniteja > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sun Jan 4 15:28:41 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 04 Jan 2015 21:28:41 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <54A968C6.1040502@fastmail.net> References: <54A968C6.1040502@fastmail.net> Message-ID: On 04/01/15 17:22, Konrad Hinsen wrote: > There are two different scenarios to consider here, and perhaps I didn't > make that distinction clear enough. One scenario is that of a maintained > library or application that depends on NumPy. The other scenario is a > set of scripts written for a specific study (let's say a thesis) that is > then published for its documentation value. Such scripts are in general > not maintained. > It's the second scenario where gradual changes are a real problem. > Suppose I have a set of scripts from a thesis published in year X, and I > need to understand them in detail in year X+5 for a related scientific > project. If the script produces different results with NumPy 1.7 and > NumPy 1.10, which result should I assume the author intended? A scientific paper or thesis should be written so it is completely reproducible. That would include describing the computer, OS, Python version and NumPy version, as well as C or Fortran compiler. I will happily fail any student who writes a thesis without providing such details, and if I review a research paper for a journal you can be sure I will ask that is corrected. Sturla From sturla.molden at gmail.com Sun Jan 4 15:55:47 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 04 Jan 2015 21:55:47 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: On 03/01/15 20:49, Nathaniel Smith wrote: > i.e., slow-incremental-change has actually worked well in his > experience. (And in particular, the np.diagonal issue only comes in as > an example to illustrate what he means by the phrase "slow continuous > change" -- this particular change hasn't actually broken anything in his > code.) OTOH the big problem that motivated his post was that his code is > all written against the APIs of the ancient and long-abandoned Numeric > project, and he finds the costs of transitioning them to the "new" numpy > APIs to be prohibitively expensive, i.e. this big-bang transition broke > his code. Given that a big-bang transition broke his code everywhere, I don't really see why he wants more of them. The question of reproducible research is orthogonal to this, I think. 
Sturla From valentin at haenel.co Sun Jan 4 15:59:40 2015 From: valentin at haenel.co (Valentin Haenel) Date: Sun, 4 Jan 2015 21:59:40 +0100 Subject: [Numpy-discussion] [ANN] bcolz 0.7.3 Message-ID: <20150104205940.GB9729@kudu.in-berlin.de> ====================== Announcing bcolz 0.7.3 ====================== What's new ========== This release includes the support for pickling persistent carray/ctable objects contributed by Matthew Rocklin. Also, the included version of Blosc is updated to ``v1.5.2``. Lastly, several minor issues and typos have been fixed, please see the release notes for details. ``bcolz`` is a renaming of the ``carray`` project. The new goals for the project are to create simple, yet flexible compressed containers, that can live either on-disk or in-memory, and with some high-performance iterators (like `iter()`, `where()`) for querying them. Together, bcolz and the Blosc compressor, are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots For more detailed info, see the release notes in: https://github.com/Blosc/bcolz/wiki/Release-Notes What it is ========== bcolz provides columnar and compressed data containers. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of column. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and use several cores for doing the computations, so it is blazing fast. Moreover, the carray/ctable containers can be disk-based, and it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Installing ========== bcolz is in the PyPI repository, so installing it is easy:: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt ---- **Enjoy data!** From konrad.hinsen at fastmail.net Sun Jan 4 23:22:13 2015 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Mon, 05 Jan 2015 05:22:13 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: Message-ID: <54AA1175.4050900@fastmail.net> On 04/01/15 21:55, Sturla Molden wrote: > On 03/01/15 20:49, Nathaniel Smith wrote: > >> OTOH the big problem that motivated his post was that his code is >> all written against the APIs of the ancient and long-abandoned Numeric >> project, and he finds the costs of transitioning them to the "new" numpy >> APIs to be prohibitively expensive, i.e. this big-bang transition broke >> his code. > > Given that a big-bang transition broke his code everywhere, I don't > really see why he wants more of them. I am not asking for "big-bang transitions" as such. 
I am asking for breaking changes to go along with a clearly visible and clearly announced change in the API name and/or major version. A change as important as dropping support for an API that has been around for 20 years shouldn't happen as one point in the change list from version 1.8 to 1.9. It can happen in the transition from "numpy" to "numpy2", which ideally should be done in a way that permits users to install both "numpy" and "numpy2" in parallel to ease the transition. There is a tacit convention in computing that "higher" version numbers of a package indicate improvements and extensions but not reduction in functionality. This convention also underlies most of today's package management systems. Major breaking changes violate this tacit convention. > The question of reproducible research is orthogonal to this, I think. Indeed. My blog post addresses two distinct issues, whose common point is that they relate to the evolution of NumPy. Konrad. From konrad.hinsen at fastmail.net Sun Jan 4 23:31:21 2015 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Mon, 05 Jan 2015 05:31:21 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> Message-ID: <54AA1399.4000201@fastmail.net> On 04/01/15 21:28, Sturla Molden wrote: > A scientific paper or thesis should be written so it is completely > reproducible. That would include describing the computer, OS, Python > version and NumPy version, as well as C or Fortran compiler. I completely agree and we should all work towards this goal. But we aren't there yet. Most of the scientific community is just beginning to realize that there is a problem. Anyone writing scientific software for use in today's environment has to take this into account. More importantly, there is not only the technical problem of reproducibility, but also the meta-level problem of human understanding. Scientific communication depends more and more on scripts as the only precise documentation of a computational method. Our programming languages are becoming a major form of scientific notation, alongside traditional mathematics. Humans don't read written text with version numbers in mind. This is a vast problem which can't be solved merely by "fixing" software technology, but it's something to keep in mind nevertheless when writing software. For those interested in this aspect, I have written a much more detailed account in a recent paper: http://dx.doi.org/10.12688/f1000research.3978.2 Konrad. From antony.lee at berkeley.edu Mon Jan 5 02:34:09 2015 From: antony.lee at berkeley.edu (Antony Lee) Date: Mon, 5 Jan 2015 00:34:09 -0700 Subject: [Numpy-discussion] edge-cases of ellipsis indexing Message-ID: While trying to reproduce various fancy indexings for astropy's FITS sections (a loaded-on-demand array), I found the following interesting behavior: >>> np.array([1])[..., 0] array(1) >>> np.array([1])[0] 1 >>> np.array([1])[(0,)] 1 The docs say "Ellipsis expand to the number of : objects needed to make a selection tuple of the same length as x.ndim.", so it's not totally clear to me how to explain that difference in the results. Antony -------------- next part -------------- An HTML attachment was scrubbed... 
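Checking shapes and types makes the difference explicit:

>>> import numpy as np
>>> a = np.array([1])
>>> a[..., 0].ndim          # a 0-d ndarray rather than a scalar
0
>>> a[..., 0].shape
()
>>> np.isscalar(a[..., 0])
False
>>> np.isscalar(a[0])       # plain integer indexing gives a numpy scalar
True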
URL: From maniteja.modesty067 at gmail.com Mon Jan 5 03:43:33 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Mon, 5 Jan 2015 14:13:33 +0530 Subject: [Numpy-discussion] edge-cases of ellipsis indexing In-Reply-To: References: Message-ID: Hi Anthony, I am not sure whether the following section in documentation is relevant to the behavior you were referring to. When an ellipsis (...) is present but has no size (i.e. replaces zero :) the result will still always be an array. A view if no advanced index is present, otherwise a copy. Here, ...replaces zero : Advanced indexing always returns a *copy* of the data (contrast with basic slicing that returns a *view* ). And I think it is a view that is returned in this case. >>> a = array([1]) >>>a array([1]) >>>a[:,0] # zero : are present Traceback (most recent call last): File "", line 1, in IndexError: too many indices for array >>>a[...,0]=2 >>>a array([2]) >>>a[0] = 3 >>>a array([3]) >>>a[(0,)] = 4 >>>a array([4]) >>>a[: array([1]) Hope I helped. Cheers, N.Maniteja. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Jan 5 03:43:45 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 5 Jan 2015 08:43:45 +0000 (UTC) Subject: [Numpy-discussion] The future of ndarray.diagonal() References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> Message-ID: <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> Konrad Hinsen wrote: > Scientific communication depends more and more on scripts as the only > precise documentation of a computational method. Our programming > languages are becoming a major form of scientific notation, alongside > traditional mathematics. To me it seems that algorithms in scientific papers and books are described in various forms of pseudo-code. Perhaps we need a notation which is universal and ethernal like the language mathematics. But I am not sure Python could or should try to be that "scripting" language. I also think it is reasonable to ask if journals should require code as algorithmic documentation to be written in some ISO standard language like C or Fortran 90. The behavior of Python and NumPy are not dictated by standards, and as such is not better than pseudo-code. Sturla From konrad.hinsen at fastmail.net Mon Jan 5 04:08:19 2015 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Mon, 05 Jan 2015 10:08:19 +0100 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> Message-ID: <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> --On 5 janvier 2015 08:43:45 +0000 Sturla Molden wrote: > To me it seems that algorithms in scientific papers and books are > described in various forms of pseudo-code. That's indeed what people do when they write a paper about an algorithm. But many if not most algorithms in computational science are never published in a specific article. Very often, a scientific article gives only an outline of a method in plain English. The only full documentation of the method is the implementation. 
> Perhaps we need a notation > which is universal and ethernal like the language mathematics. But I am > not sure Python could or should try to be that "scripting" language. Neither Python nor any other programming was designed for that task, and none of them is really a good fit. But today's de facto situation is that programming languages fulfill the role of algorithmic specification languages in computational science. And I don't expect this to change rapidly, in particular because to the best of my knowledge there is no better choice available at the moment. I wrote an article on this topic that will appear in the March 2015 issue of "Computing in Science and Engineering". It concludes that for now, a simple Python script is probably the best you can do for an executable specification of an algorithm. However, I also recommend not using big libraries such as NumPy in such scripts. > I also think it is reasonable to ask if journals should require code as > algorithmic documentation to be written in some ISO standard language like > C or Fortran 90. The behavior of Python and NumPy are not dictated by > standards, and as such is not better than pseudo-code. True, but the ISO specifications of C and Fortran have so many holes ("undefined behavior") that they are not really much better for the job. And again, we can't ignore the reality of the de facto use today: there are no such requirements or even guidelines, so Python scripts are often the best we have as algorithmic documentation. Konrad. From sebastian at sipsolutions.net Mon Jan 5 04:14:56 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 05 Jan 2015 10:14:56 +0100 Subject: [Numpy-discussion] edge-cases of ellipsis indexing In-Reply-To: References: Message-ID: <1420449296.31170.3.camel@sebastian-t440> On Mo, 2015-01-05 at 14:13 +0530, Maniteja Nandana wrote: > Hi Anthony, > > > I am not sure whether the following section in documentation is > relevant to the behavior you were referring to. > > > When an ellipsis (...) is present but has no size (i.e. replaces > zero :) the result will still always be an array. A view if no > advanced index is present, otherwise a copy. > Exactly. There are actually three forms of indexing to distinguish. 1. Indexing with integers (also scalar arrays) matching the number of dimensions. This will return a *scalar*. 2. Slicing, etc. which returns a view. This also occurs as soon there is an ellipsis in there (even if it replaces 0 `:`). You should see it as a feature to get a view if the result might be a scalar otherwise ;)! 3. Advanced indexing which cannot be view based and returns a copy. - Sebastian > Here, ...replaces zero : > > > > Advanced indexing always returns a copy of the data (contrast with > basic slicing that returns a view). > And I think it is a view that is returned in this case. > > > >>> a = array([1]) > >>>a > array([1]) > >>>a[:,0] # zero : are present > Traceback (most recent call last): > File "", line 1, in > IndexError: too many indices for array > >>>a[...,0]=2 > >>>a > array([2]) > >>>a[0] = 3 > >>>a > array([3]) > >>>a[(0,)] = 4 > >>>a > array([4]) > >>>a[: > array([1]) > > > Hope I helped. > > > Cheers, > N.Maniteja. 
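A quick way to see cases 2 and 3 side by side (a small sketch; any 2-d array will do):

>>> import numpy as np
>>> a = np.arange(6).reshape(2, 3)
>>> v = a[..., 0]           # ellipsis/slicing: a view
>>> v[0] = 100
>>> a[0, 0]                 # the write is visible in a
100
>>> c = a[[0, 1], 0]        # advanced indexing: a copy
>>> c[0] = -1
>>> a[0, 0]                 # a is untouched
100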
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From josef.pktd at gmail.com Mon Jan 5 10:48:55 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Jan 2015 10:48:55 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> Message-ID: On Mon, Jan 5, 2015 at 4:08 AM, Konrad Hinsen wrote: > --On 5 janvier 2015 08:43:45 +0000 Sturla Molden > wrote: > > > To me it seems that algorithms in scientific papers and books are > > described in various forms of pseudo-code. > > That's indeed what people do when they write a paper about an algorithm. > But many if not most algorithms in computational science are never > published in a specific article. Very often, a scientific article gives > only an outline of a method in plain English. The only full documentation > of the method is the implementation. > > > Perhaps we need a notation > > which is universal and ethernal like the language mathematics. But I am > > not sure Python could or should try to be that "scripting" language. > > Neither Python nor any other programming was designed for that task, and > none of them is really a good fit. But today's de facto situation is that > programming languages fulfill the role of algorithmic specification > languages in computational science. And I don't expect this to change > rapidly, in particular because to the best of my knowledge there is no > better choice available at the moment. > > I wrote an article on this topic that will appear in the March 2015 issue > of "Computing in Science and Engineering". It concludes that for now, a > simple Python script is probably the best you can do for an executable > specification of an algorithm. However, I also recommend not using big > libraries such as NumPy in such scripts. > > > I also think it is reasonable to ask if journals should require code as > > algorithmic documentation to be written in some ISO standard language > like > > C or Fortran 90. The behavior of Python and NumPy are not dictated by > > standards, and as such is not better than pseudo-code. > > True, but the ISO specifications of C and Fortran have so many holes > ("undefined behavior") that they are not really much better for the job. > And again, we can't ignore the reality of the de facto use today: there are > no such requirements or even guidelines, so Python scripts are often the > best we have as algorithmic documentation. > Matlab is more "well defined" than numpy. numpy has too many features. I think, if you want a runnable python script as algorithmic documentation, then it will be necessary and relatively easy in most cases to stick to the "stable" basic features. 
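For example, the top of such a script might simply pin everything down explicitly (an untested sketch):

import sys
import numpy as np

# record the environment the results were produced with
print("Python %s, NumPy %s" % (sys.version.split()[0], np.__version__))

raw = [[1.0, 2.0], [3.0, 4.0]]     # stand-in for whatever data the script loads
x = np.array(raw, dtype=float)     # pin the dtype and always get a fresh array
d = x.diagonal().copy()            # do not rely on copy-versus-view semantics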
The same for a library, if we want to minimize compatibility problems, then we shouldn't use features that are most likely a moving target. One of the issues is whether we want to write "safe" or "fancy" code. (Fancy code might or will be faster, with a specific version.) For example in most of my use cases having a view or copy of an array makes a difference to the performance but not the results. I didn't participate in the `diagonal` debate because I don't have a strong opinion and don't use it with an assignment. There is an explicit np.fill_diagonal that is inplace. Having views or copies of arrays never sounded like having a clear cut answer, there are too many functions that "return views if possible". When our (statsmodels) code correctness depends on whether it's a view or copy, then we usually make sure and write the matching unit tests. Other cases, the behavior of numpy in edge cases like empty arrays is still in flux. We usually try to avoid relying on implicit behavior. Dtypes are a mess (in terms of code compatibility). Matlab is much nicer, it's all just doubles. Now pandas and numpy are making object arrays popular and introduce strange things like datetime dtypes, and users think a program written a while ago can handle them. Related compatibility issue python 2 and python 3: For non-string manipulation scientific code the main limitation is to avoid version specific features, and decide when to use lists versus iterators for range, zip, map. Other than that, it looks much simpler to me than expected. Overall I think the current policy of incremental changes in numpy works very well. Statsmodels needs a few minor adjustments in each version. But most of those are for cases where numpy became more strict or where we used a specific behavior in edge cases, AFAIR. One problem with accumulating changes for a larger version change like numpy 2 or 3 or 4 is to decide what changes would require this. Most changes will break some code, if the code requires or uses some exotic or internal behavior. If we want to be strict, then we don't change the policy but change the version numbers, instead of 1.8, 1.9 we have numpy 18 and numpy 19. However, from my perspective none of the recent changes were fundamental enough. BTW: Stata is versioning scripts. Each script can define for which version of Stata it was written, but I have no idea how they handle the compatibility issues. It looks to me that it would be way too much work to do something like this in an open source project. Legacy cleanups like removal of numeric compatibility in numpy or weave (and maxentropy) in scipy have been announced for a long time, and eventually all legacy code needs to run in a legacy environment. But that's a different issue from developing numpy and the current scientific python related packages which need the improvements. It is always possible just to "freeze" a package, with it's own frozen python and frozen versions of dependencies. Josef > > Konrad. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.isaac at gmail.com Mon Jan 5 11:13:36 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 05 Jan 2015 11:13:36 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> Message-ID: <54AAB830.2040409@gmail.com> On 1/5/2015 10:48 AM, josef.pktd at gmail.com wrote: > Dtypes are a mess (in terms of code compatibility). Matlab is much nicer, it's all just doubles. 1. Thank goodness for dtypes. 2. http://www.mathworks.com/help/matlab/numeric-types.html 3. After translating Matlab code to much nicer NumPy, I cannot find any way to say MATLAB is "nicer". Cheers, Alan From josef.pktd at gmail.com Mon Jan 5 12:26:39 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Jan 2015 12:26:39 -0500 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <54AAB830.2040409@gmail.com> References: <54A968C6.1040502@fastmail.net> <54AA1399.4000201@fastmail.net> <1893055725442139350.794591sturla.molden-gmail.com@news.gmane.org> <34E233C6626E440F136224C0@Ordinateur-de-Catherine--Konrad.local> <54AAB830.2040409@gmail.com> Message-ID: On Mon, Jan 5, 2015 at 11:13 AM, Alan G Isaac wrote: > On 1/5/2015 10:48 AM, josef.pktd at gmail.com wrote: > > Dtypes are a mess (in terms of code compatibility). Matlab is much > nicer, it's all just doubles. > > > 1. Thank goodness for dtypes. > 2. http://www.mathworks.com/help/matlab/numeric-types.html > 3. After translating Matlab code to much nicer NumPy, > I cannot find any way to say MATLAB is "nicer". > Maybe it's my selection bias in matlab, I only wrote or read code in matlab that used exclusively double. Of course they are a necessary and great feature. However, life and code would be simpler if we could just do x = np.asarray(x, float) or even x = np.array(x, float) at the beginning of every function, instead of worrying why a user doesn't have float and trying to accommodate that choice. https://github.com/statsmodels/statsmodels/search?q=dtype&type=Issues&utf8=%E2%9C%93 AFAIK, matlab and R still have copy on write, so they don't have to worry about inplace modifications. 5 lines of code to implement an algorithm, and 50 lines of code for input checking. My response was to the issue of code as algorithmic documentation: There are packages or code supplements to books that come with the disclaimer that the code is written for educational purposes, to help understand the algorithm, but is not designed for efficiency or performance or generality. The more powerful the language and the "fancier" the code, the larger is the maintenance and wrapping work. another example: a dot product of a float/double 2d array is independent of any numpy version, and it will produce the same result in numpy 19.0 (except for different machine precision rounding errors) a dot product of an array (without dtype and shape restriction) might be anything and change within a few numpy versions. Josef > > Cheers, > Alan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Mon Jan 5 13:40:44 2015 From: cjwilliams43 at gmail.com (Colin J. 
Williams) Date: Mon, 5 Jan 2015 13:40:44 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. Message-ID: One of the essential characteristics of a matrix is that it be rectangular. This is neither spelt out or checked currently. The Doc description refers to a class: - *class *numpy.matrix[source] Returns a matrix from an array-like object, or from a string of data. A matrix is a specialized 2-D array that retains its 2-D nature through operations. It has certain special operators, such as * (matrix multiplication) and ** (matrix power). This illustrates a failure, which is reported later in the calculation: A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) Here 2 - 6 is treated as an expression. Wikipedia offers: In mathematics , a *matrix* (plural *matrices*) is a rectangular *array *[1] of numbers , symbols , or expressions , arranged in *rows * and *columns *.[2] [3] The individual items in a matrix are called its *elements* or *entries*. An example of a matrix with 2 rows and 3 columns is [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the Numpy context, the symbols or expressions need to be evaluable. Colin W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Mon Jan 5 13:56:41 2015 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Mon, 5 Jan 2015 20:56:41 +0200 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On 5 January 2015 at 20:40, Colin J. Williams wrote: > This illustrates a failure, which is reported later in the calculation: > > A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) > > Here 2 - 6 is treated as an expression. > There should be a comma between 2 and -6. The rectangularity is checked, and in this case, it is not fulfilled. As such, NumPy creates a square matrix of size 1x1 of dtype object. If you want to make sure what you have manually inputed is correct, you should include a couple of assertions afterwards. /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Mon Jan 5 13:57:43 2015 From: e.antero.tammi at gmail.com (eat) Date: Mon, 5 Jan 2015 20:57:43 +0200 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: Hi, On Mon, Jan 5, 2015 at 8:40 PM, Colin J. Williams wrote: > One of the essential characteristics of a matrix is that it be rectangular. > > This is neither spelt out or checked currently. > > The Doc description refers to a class: > > - *class *numpy.matrix[source] > > > Returns a matrix from an array-like object, or from a string of data. A > matrix is a specialized 2-D array that retains its 2-D > nature through operations. It has certain special operators, such as * > (matrix multiplication) and ** (matrix power). > > This illustrates a failure, which is reported later in the calculation: > > A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) > > Here 2 - 6 is treated as an expression. > FWIW, here A2 is definitely rectangular, with shape== (1, 3) and dtype== object, i.e elements are just python lists. > Wikipedia offers: > > In mathematics , a *matrix* > (plural *matrices*) is a rectangular > *array > *[1] > of > numbers , symbols > , or expressions > , arranged in *rows > * and *columns > *.[2] > [3] > > (and in this context also python objects). -eat > The individual items in a matrix are called its *elements* or *entries*. 
> An example of a matrix with 2 rows and 3 columns is > [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the > Numpy context, the symbols or expressions need to be evaluable. > > Colin W. > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jan 5 13:58:25 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 5 Jan 2015 18:58:25 +0000 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: I'm afraid that I really don't understand what you're trying to say. Is there something that you think numpy should be doing differently? On Mon, Jan 5, 2015 at 6:40 PM, Colin J. Williams wrote: > One of the essential characteristics of a matrix is that it be rectangular. > > This is neither spelt out or checked currently. > > The Doc description refers to a class: > > - *class *numpy.matrix[source] > > > Returns a matrix from an array-like object, or from a string of data. A > matrix is a specialized 2-D array that retains its 2-D > nature through operations. It has certain special operators, such as * > (matrix multiplication) and ** (matrix power). > > This illustrates a failure, which is reported later in the calculation: > > A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) > > Here 2 - 6 is treated as an expression. > > Wikipedia offers: > > In mathematics , a *matrix* > (plural *matrices*) is a rectangular > *array > *[1] > of > numbers , symbols > , or expressions > , arranged in *rows > * and *columns > *.[2] > [3] > The > individual items in a matrix are called its *elements* or *entries*. An > example of a matrix with 2 rows and 3 columns is > [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the > Numpy context, the symbols or expressions need to be evaluable. > > Colin W. > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Mon Jan 5 14:16:54 2015 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 5 Jan 2015 14:16:54 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: > I'm afraid that I really don't understand what you're trying to say. Is > there something that you think numpy should be doing differently? > > This is a case similar to the issue discussed in https://github.com/numpy/numpy/issues/5303. Instead of getting an error (because the arguments don't create the expected 2-d matrix), a matrix with dtype object and shape (1, 3) is created. Warren > On Mon, Jan 5, 2015 at 6:40 PM, Colin J. Williams > wrote: > >> One of the essential characteristics of a matrix is that it be >> rectangular. >> >> This is neither spelt out or checked currently. >> >> The Doc description refers to a class: >> >> - *class *numpy.matrix[source] >> >> >> Returns a matrix from an array-like object, or from a string of data. A >> matrix is a specialized 2-D array that retains its 2-D >> nature through operations. 
It has certain special operators, such as * >> (matrix multiplication) and ** (matrix power). >> >> This illustrates a failure, which is reported later in the calculation: >> >> A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) >> >> Here 2 - 6 is treated as an expression. >> >> Wikipedia offers: >> >> In mathematics , a *matrix* >> (plural *matrices*) is a rectangular >> *array >> *[1] >> of >> numbers , symbols >> , or expressions >> , arranged in *rows >> * and *columns >> *.[2] >> [3] >> The >> individual items in a matrix are called its *elements* or *entries*. An >> example of a matrix with 2 rows and 3 columns is >> [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the >> Numpy context, the symbols or expressions need to be evaluable. >> >> Colin W. >> >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 5 14:18:06 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Jan 2015 14:18:06 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: > I'm afraid that I really don't understand what you're trying to say. Is > there something that you think numpy should be doing differently? > I liked it better when this raised an exception, instead of creating a rectangular object array. Josef > > On Mon, Jan 5, 2015 at 6:40 PM, Colin J. Williams > wrote: > >> One of the essential characteristics of a matrix is that it be >> rectangular. >> >> This is neither spelt out or checked currently. >> >> The Doc description refers to a class: >> >> - *class *numpy.matrix[source] >> >> >> Returns a matrix from an array-like object, or from a string of data. A >> matrix is a specialized 2-D array that retains its 2-D >> nature through operations. It has certain special operators, such as * >> (matrix multiplication) and ** (matrix power). >> >> This illustrates a failure, which is reported later in the calculation: >> >> A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) >> >> Here 2 - 6 is treated as an expression. >> >> Wikipedia offers: >> >> In mathematics , a *matrix* >> (plural *matrices*) is a rectangular >> *array >> *[1] >> of >> numbers , symbols >> , or expressions >> , arranged in *rows >> * and *columns >> *.[2] >> [3] >> The >> individual items in a matrix are called its *elements* or *entries*. An >> example of a matrix with 2 rows and 3 columns is >> [image: \begin{bmatrix}1 & 9 & -13 \\20 & 5 & -6 \end{bmatrix}.]In the >> Numpy context, the symbols or expressions need to be evaluable. >> >> Colin W. >> >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Nathaniel J. 
Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jan 5 14:36:20 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 5 Jan 2015 19:36:20 +0000 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 7:18 PM, wrote: > > > > On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: >> >> I'm afraid that I really don't understand what you're trying to say. Is there something that you think numpy should be doing differently? > > > I liked it better when this raised an exception, instead of creating a rectangular object array. Did it really used to raise an exception? Patches accepted :-) (#5303 is the relevant bug, like Warren points out. From the discussion there it doesn't look like np.array's handling of non-conformable lists has any defenders.) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From josef.pktd at gmail.com Mon Jan 5 14:48:03 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 5 Jan 2015 14:48:03 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 2:36 PM, Nathaniel Smith wrote: > On Mon, Jan 5, 2015 at 7:18 PM, wrote: > > > > > > > > On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: > >> > >> I'm afraid that I really don't understand what you're trying to say. Is > there something that you think numpy should be doing differently? > > > > > > I liked it better when this raised an exception, instead of creating a > rectangular object array. > > Did it really used to raise an exception? Patches accepted :-) (#5303 > is the relevant bug, like Warren points out. From the discussion there > it doesn't look like np.array's handling of non-conformable lists has > any defenders.) > Since I'm usually late in updating numpy, I was for a long time very familiar with the frequent occurence of `ValueError: setting an array element with a sequence.` based on this, it was up to numpy 1.5 https://github.com/scipy/scipy/pull/2631#issuecomment-20898809 "ugly but backwards compatible" :) Josef > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Mon Jan 5 14:53:06 2015 From: e.antero.tammi at gmail.com (eat) Date: Mon, 5 Jan 2015 21:53:06 +0200 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 9:36 PM, Nathaniel Smith wrote: > On Mon, Jan 5, 2015 at 7:18 PM, wrote: > > > > > > > > On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith wrote: > >> > >> I'm afraid that I really don't understand what you're trying to say. Is > there something that you think numpy should be doing differently? > > > > > > I liked it better when this raised an exception, instead of creating a > rectangular object array. > > Did it really used to raise an exception? 
Patches accepted :-) (#5303 > is the relevant bug, like Warren points out. From the discussion there > it doesn't look like np.array's handling of non-conformable lists has > any defenders.) > +1 for 'object array [and matrix] construction should require explicitly specifying dtype= object' -eat > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjw at ncf.ca Mon Jan 5 19:52:33 2015 From: cjw at ncf.ca (cjw) Date: Mon, 05 Jan 2015 19:52:33 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: <54AB31D1.1040002@ncf.ca> On 05-Jan-15 1:56 PM, David?id wrote: > On 5 January 2015 at 20:40, Colin J. Williams > wrote: > >> This illustrates a failure, which is reported later in the calculation: >> >> A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) >> >> Here 2 - 6 is treated as an expression. >> > There should be a comma between 2 and -6. The rectangularity is checked, > and in this case, it is not fulfilled. As such, NumPy creates a square > matrix of size 1x1 of dtype object. > > If you want to make sure what you have manually inputed is correct, you > should include a couple of assertions afterwards. > > /David. David, Thanks. My suggestion was that numpy should do that checking, Colin W. > From cjw at ncf.ca Mon Jan 5 20:08:24 2015 From: cjw at ncf.ca (cjw) Date: Mon, 05 Jan 2015 20:08:24 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: Message-ID: <54AB3588.1090902@ncf.ca> An HTML attachment was scrubbed... URL: From antony.lee at berkeley.edu Mon Jan 5 20:53:30 2015 From: antony.lee at berkeley.edu (Antony Lee) Date: Mon, 5 Jan 2015 18:53:30 -0700 Subject: [Numpy-discussion] edge-cases of ellipsis indexing In-Reply-To: <1420449296.31170.3.camel@sebastian-t440> References: <1420449296.31170.3.camel@sebastian-t440> Message-ID: I see, thanks! 2015-01-05 2:14 GMT-07:00 Sebastian Berg : > On Mo, 2015-01-05 at 14:13 +0530, Maniteja Nandana wrote: > > Hi Anthony, > > > > > > I am not sure whether the following section in documentation is > > relevant to the behavior you were referring to. > > > > > > When an ellipsis (...) is present but has no size (i.e. replaces > > zero :) the result will still always be an array. A view if no > > advanced index is present, otherwise a copy. > > > > Exactly. There are actually three forms of indexing to distinguish. > > 1. Indexing with integers (also scalar arrays) matching the number of > dimensions. This will return a *scalar*. > 2. Slicing, etc. which returns a view. This also occurs as soon there is > an ellipsis in there (even if it replaces 0 `:`). You should see it as a > feature to get a view if the result might be a scalar otherwise ;)! > 3. Advanced indexing which cannot be view based and returns a copy. > > - Sebastian > > > > Here, ...replaces zero : > > > > > > > > Advanced indexing always returns a copy of the data (contrast with > > basic slicing that returns a view). > > And I think it is a view that is returned in this case. 
> > > > > > >>> a = array([1]) > > >>>a > > array([1]) > > >>>a[:,0] # zero : are present > > Traceback (most recent call last): > > File "", line 1, in > > IndexError: too many indices for array > > >>>a[...,0]=2 > > >>>a > > array([2]) > > >>>a[0] = 3 > > >>>a > > array([3]) > > >>>a[(0,)] = 4 > > >>>a > > array([4]) > > >>>a[: > > array([1]) > > > > > > Hope I helped. > > > > > > Cheers, > > N.Maniteja. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Jan 6 02:14:22 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 6 Jan 2015 08:14:22 +0100 Subject: [Numpy-discussion] Proceedings of EuroSciPy 2014 Message-ID: On Tue, Dec 23, 2014 at 7:06 PM, wrote: > Dear scientist using Python, > > We are glad to announce the publication of the proceedings of the 7th > European > Conference on Python in Science, EuroSciPy 2014, still in 2014! > > The proceedings cover various scientific fields in which Python and its > scientific libraries are used. You may obtain the table of contents and > all the > articles on the arXiv at http://arxiv.org/abs/1412.7030 > For convenience, the articles' titles are listed below. > > It is a useful reference to have as the publication of software-related > scientific work is not always straightforward. > > Thanks go to all authors and reviewers for their contributions. The reviews > were conducted publicly at > https://github.com/euroscipy/euroscipy_proceedings > > Pierre de Buyl & Nelle Varoquaux, editors > > PS: there was no large announcement of the proceedings of EuroSciPy 2013. > In the > hope that this can increase their visibility, here is the URL > Proceedings of EuroSciPy 2013: http://arxiv.org/abs/1405.0166 > > Pierre de Buyl, Nelle Varoquaux: Preface > J?r?me Kieffer, Giannis Ashiotis: PyFAI: a Python library for high > performance azimuthal integration on GPU > Andrew Leonard, Huw Morgan: Temperature diagnostics of the solar > atmosphere using SunPy > Bastian Venthur, Benjamin Blankertz: Wyrm, A Pythonic Toolbox for > Brain-Computer Interfacing > Christophe Pouzat, Georgios Is. Detorakis: SPySort: Neuronal Spike Sorting > with Python > Thomas Cokelaer, Julio Saez-Rodriguez: Using Python to Dive into > Signalling Data with CellNOpt and BioServices > Davide Monari, Francesco Cenni, Erwin Aertbeli?n, Kaat Desloovere: > Py3DFreeHandUS: a library for voxel-array reconstruction using > Ultrasonography and attitude sensors > Esteban Fuentes, Hector E. Martinez: SClib, a hack for straightforward > embedded C functions in Python > Jamie A Dean, Liam C Welsh, Kevin J Harrington, Christopher M Nutting, > Sarah L Gulliford: Predictive Modelling of Toxicity Resulting from > Radiotherapy Treatments of Head and Neck Cancer > Rebecca R. Murphy, Sophie E. 
Jackson, David Klenerman: pyFRET: A Python > Library for Single Molecule Fluorescence Data Analysis > Robert Cimrman: Enhancing SfePy with Isogeometric Analysis > Steve Brasier, Fred Pollard: A Python-based Post-processing Toolset For > Seismic Analyses > Vladim?r Luke?, Miroslav Ji??k, Alena Jon??ov?, Eduard Rohan, Ond?ej > Bubl?k, Robert Cimrman: Numerical simulation of liver perfusion: from CT > scans to FE model > > _______________________________________________ > euroscipy-org mailing list > euroscipy-org at python.org > https://mail.python.org/mailman/listinfo/euroscipy-org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Tue Jan 6 07:31:36 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 06 Jan 2015 13:31:36 +0100 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: <54AB3588.1090902@ncf.ca> References: <54AB3588.1090902@ncf.ca> Message-ID: On 06/01/15 02:08, cjw wrote: > This is not a comment on any present matrix support, but deals with the > matrix class, which existed back when Todd Miller of the Space Telescope > Group supported numpy. > > Matrix is a sub-class of ndarray. Since this Matrix class is (more or less) deprecated and its use discouraged, I think it should just be left as it is. Sturla From cjw at ncf.ca Tue Jan 6 19:58:04 2015 From: cjw at ncf.ca (cjw) Date: Tue, 06 Jan 2015 19:58:04 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: <54AB3588.1090902@ncf.ca> Message-ID: <54AC849C.20902@ncf.ca> An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jan 6 20:20:48 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 7 Jan 2015 01:20:48 +0000 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: <54AC849C.20902@ncf.ca> References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> Message-ID: Hi Colin, On Wed, Jan 7, 2015 at 12:58 AM, cjw wrote: > > My recollection, from discussions, at the time of the introduction of the @ > operator, was that there was no intention to disturb the existing Matrix > class. Yeah, we're not going to be making any major changes to the numpy.matrix class -- e.g. we certainly aren't going to disallow non-numeric data types at this point. > I see the matrix as a long recognized mathematical entity. On the other > hand, the array is a very useful computational construct, used in a number > of computer languages. > > Since matrices are now part of some high school curricula, I urge that they > be treated appropriately in Numpy. Further, I suggest that consideration be > given to establishing V and VT sub-classes, to cover vectors and transposed > vectors. The numpy devs don't really have the interest or the skills to create a great library for pedagogical use in high schools. If you're interested in an interface like this, then I'd suggest creating a new package focused specifically on that (which might use numpy internally). There's really no advantage in glomming this into numpy proper. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ndarray at mac.com Tue Jan 6 20:38:42 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Tue, 6 Jan 2015 20:38:42 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. 
In-Reply-To: References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> Message-ID: On Tue, Jan 6, 2015 at 8:20 PM, Nathaniel Smith wrote: > > Since matrices are now part of some high school curricula, I urge that > they > > be treated appropriately in Numpy. Further, I suggest that > consideration be > > given to establishing V and VT sub-classes, to cover vectors and > transposed > > vectors. > > The numpy devs don't really have the interest or the skills to create > a great library for pedagogical use in high schools. If you're > interested in an interface like this, then I'd suggest creating a new > package focused specifically on that (which might use numpy > internally). There's really no advantage in glomming this into numpy > proper. Sorry for taking this further off-topic, but I recently discovered an excellent SAGE package, . While it's targeted audience includes math graduate students and research mathematicians, parts of it are accessible to schoolchildren. SAGE is written in Python and integrates a number of packages including numpy. I would highly recommend to anyone interested in using Python for education to take a look at SAGE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jan 7 10:41:07 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 7 Jan 2015 15:41:07 +0000 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: <54A968C6.1040502@fastmail.net> References: <54A968C6.1040502@fastmail.net> Message-ID: Hi, On Sun, Jan 4, 2015 at 4:22 PM, Konrad Hinsen wrote: > 2) Changes that yield to different results for unmodified legacy code > should never be allowed. I think this is a very reasonable rule. Case in point - I have some fairly old code in https://github.com/matthew-brett/transforms3d . I haven't updated this code since 2011. Now I test it, I get the following warning: ====================================================================== ERROR: Failure: ValueError (assignment destination is read-only) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/scipy-devel/lib/python2.7/site-packages/nose/loader.py", line 251, in generate for test in g(): File "/Users/mb312/dev_trees/transforms3d/transforms3d/tests/test_affines.py", line 74, in test_rand_de_compose T, R, Z, S = func(M) File "/Users/mb312/dev_trees/transforms3d/transforms3d/affines.py", line 298, in decompose Z[0] *= -1 ValueError: assignment destination is read-only If I had waited until 1.10 (or whatever) - I would have had to hope that my tests were good enough to pick this up, otherwise anyone using this code would be subject to some very strange bugs. Cheers, Matthew From cjw at ncf.ca Wed Jan 7 14:35:32 2015 From: cjw at ncf.ca (cjw) Date: Wed, 07 Jan 2015 14:35:32 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> Message-ID: <54AD8A84.6010403@ncf.ca> Nathaniel, Of the two characteristics to which I pointed, I feel that the rectangularity check is the more important. I gave an example of a typo which demonstrated this problem. The error message reported that pinv does not have a conjugate function which, I suggest, is a totally misleading error message. In these circumstances, I hope that the Development Team will wish to treat this as a bug. Regards, Colin W. 
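[For reference, a minimal sketch of the failure mode discussed in this thread, together with the kind of assertion check David suggested earlier; the (1, 3) object-matrix behaviour is as reported above for the NumPy versions discussed here, tracked as issue #5303:]

import numpy as np

# The typo: "2 -6" is parsed as the expression 2 - 6 == -4, so the third row
# has only two entries.  With ragged rows, np.matrix (in the NumPy versions
# discussed here) does not raise; it silently builds a (1, 3) matrix of dtype
# object whose entries are the three Python lists themselves.
A2 = np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]])
print(A2.shape, A2.dtype)        # (1, 3) object

# A couple of assertions catch the slip immediately, instead of the
# misleading error that only surfaces much later inside pinv():
A2 = np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2, -6]])
assert A2.dtype != object, "ragged rows -- probably a missing comma"
assert A2.shape == (3, 3)
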
On 06-Jan-15 8:20 PM, Nathaniel Smith wrote: > Hi Colin, > > On Wed, Jan 7, 2015 at 12:58 AM, cjw wrote: >> My recollection, from discussions, at the time of the introduction of the @ >> operator, was that there was no intention to disturb the existing Matrix >> class. > Yeah, we're not going to be making any major changes to the > numpy.matrix class -- e.g. we certainly aren't going to disallow > non-numeric data types at this point. > >> I see the matrix as a long recognized mathematical entity. On the other >> hand, the array is a very useful computational construct, used in a number >> of computer languages. >> >> Since matrices are now part of some high school curricula, I urge that they >> be treated appropriately in Numpy. Further, I suggest that consideration be >> given to establishing V and VT sub-classes, to cover vectors and transposed >> vectors. > The numpy devs don't really have the interest or the skills to create > a great library for pedagogical use in high schools. If you're > interested in an interface like this, then I'd suggest creating a new > package focused specifically on that (which might use numpy > internally). There's really no advantage in glomming this into numpy > proper. > > -n > From cjw at ncf.ca Wed Jan 7 14:44:47 2015 From: cjw at ncf.ca (cjw) Date: Wed, 07 Jan 2015 14:44:47 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> Message-ID: <54AD8CAF.5090207@ncf.ca> An HTML attachment was scrubbed... URL: From rnelsonchem at gmail.com Thu Jan 8 13:19:33 2015 From: rnelsonchem at gmail.com (Ryan Nelson) Date: Thu, 8 Jan 2015 13:19:33 -0500 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: <54AD8CAF.5090207@ncf.ca> References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> <54AD8CAF.5090207@ncf.ca> Message-ID: Colin, I'll second the endorsement of Sage; however, for teaching purposes, I would suggest Sage Math Cloud. It is a free, web-based version of Sage, and it does not require you or the students to install any software (besides a new-ish web browser). It also make sharing/collaborative work quite easy as well. I've used this a bit for demos, and it's great. The author William Stein is good at correcting bugs/issues very quickly. Sage implements it's own Matrix and Vector classes, and the Vector class has a "column" method that returns a column vector (transpose). http://www.sagemath.org/doc/tutorial/tour_linalg.html For what it's worth, I agree with others about the benefits of avoiding a Matrix class in Numpy. In my experience, it certainly makes things cleaner in larger projects when I always use NDArray and just call the appropriate linear algebra functions (e.g. np.dot, etc) when that is context I need. Anyway, just my two cents. Ryan On Wed, Jan 7, 2015 at 2:44 PM, cjw wrote: > Thanks Alexander, > > I'll look at Sage. > > Colin W. > > > On 06-Jan-15 8:38 PM, Alexander Belopolsky wrote: > > On Tue, Jan 6, 2015 at 8:20 PM, Nathaniel Smith wrote: > > > Since matrices are now part of some high school curricula, I urge that > > they > > be treated appropriately in Numpy. Further, I suggest that > > consideration be > > given to establishing V and VT sub-classes, to cover vectors and > > transposed > > vectors. > > The numpy devs don't really have the interest or the skills to create > a great library for pedagogical use in high schools. 
If you're > interested in an interface like this, then I'd suggest creating a new > package focused specifically on that (which might use numpy > internally). There's really no advantage in glomming this into numpy > proper. > > > Sorry for taking this further off-topic, but I recently discovered an > excellent SAGE package, . While it's targeted > audience includes math graduate students and research mathematicians, parts > of it are accessible to schoolchildren. SAGE is written in Python and > integrates a number of packages including numpy. > > I would highly recommend to anyone interested in using Python for education > to take a look at SAGE. > > > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From n59_ru at hotmail.com Thu Jan 8 13:31:11 2015 From: n59_ru at hotmail.com (Nikolay Mayorov) Date: Thu, 8 Jan 2015 23:31:11 +0500 Subject: [Numpy-discussion] Build doesn't pass tests Message-ID: Hi all! I'm trying to build numpy on Windows 64 bit, Python 3.4.2 64 bit. I do environment setup by the following command: CMD /K "SET MSSdk=1 && SET DISTUTILS_USE_SDK=1 && "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x86_amd64" Then I cd to the newly cloned numpy folder and do: python setup.py build_ext --inplace It looks like the build process finishes correctly. But then python -c "import numpy; numpy.test()" crashes the interpreter (some tests pass before the crash). I found out that it is caused by numpy.fromfile function call. What might be the reason of that? Do I use wrong msvc compiler? -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 8 13:32:36 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Jan 2015 18:32:36 +0000 Subject: [Numpy-discussion] Characteristic of a Matrix. In-Reply-To: <54AD8A84.6010403@ncf.ca> References: <54AB3588.1090902@ncf.ca> <54AC849C.20902@ncf.ca> <54AD8A84.6010403@ncf.ca> Message-ID: On Wed, Jan 7, 2015 at 7:35 PM, cjw wrote: > Nathaniel, > > Of the two characteristics to which I pointed, I feel that the > rectangularity check is the more important. I gave an example of a typo > which demonstrated this problem. The numpy matrix class does require rectangularity; the issue you ran into is more weird than that. It's legal to make a matrix of arbitrary python objects, e.g. np.matrix([["hello", None]]) (this can be useful e.g. if you want to work with extremely large integers using Python's long integer objects). In your case, b/c the lists were not the same length, the matrix constructor guessed that you wanted a matrix containing two Python list objects. This is pretty confusing, and fixing it is bug #5303. But it doesn't indicate any deeper problem with the matrix object. Notice: In [5]: A2 = np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]]) In [6]: A2.shape Out[6]: (1, 3) In [7]: A2[0, 0] Out[7]: [1, 2, -2] > The error message reported that pinv does not have a conjugate function > which, I suggest, is a totally misleading error message. When working with arrays/matrices of objects, functions like 'pinv' will try to call special methods on the objects. 
This is a little weird and arguably a bug itself, but it does mean that it's at least possible in theory to have an array of arbitrary python objects and have pinv() work. Of course this requires objects that will cooperate. In this case, though, pinv() has no idea what to do with a matrix whose elements are themselves lists, so it gives an error. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From jtaylor.debian at googlemail.com Thu Jan 8 13:43:53 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 08 Jan 2015 19:43:53 +0100 Subject: [Numpy-discussion] Build doesn't pass tests In-Reply-To: References: Message-ID: <54AECFE9.5030502@googlemail.com> On 01/08/2015 07:31 PM, Nikolay Mayorov wrote: > Hi all! > > I'm trying to build numpy on Windows 64 bit, Python 3.4.2 64 bit. > > I do environment setup by the following command: > > CMD /K "SET MSSdk=1 && SET DISTUTILS_USE_SDK=1 && "C:\Program Files > (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x86_amd64" > > Then I cd to the newly cloned numpy folder and do: python setup.py > build_ext --inplace > > It looks like the build process finishes correctly. > > But then python -c "import numpy; numpy.test()" crashes the interpreter > (some tests pass before the crash). I found out that it is caused by > numpy.fromfile function call. > > What might be the reason of that? Do I use wrong msvc compiler? > > I think compiling python3 extensions requires VS 2010, python 2 extensions VS2008. A crash in a fromfile test is what I would expect from using the wrong compiler. From n59_ru at hotmail.com Fri Jan 9 07:24:18 2015 From: n59_ru at hotmail.com (Nikolay Mayorov) Date: Fri, 9 Jan 2015 17:24:18 +0500 Subject: [Numpy-discussion] Build doesn't pass tests In-Reply-To: <54AECFE9.5030502@googlemail.com> References: , <54AECFE9.5030502@googlemail.com> Message-ID: Thank you, Julian. I just happen to build scikit-learn with VS2012 and thought it's OK to use it for other packages. > Date: Thu, 8 Jan 2015 19:43:53 +0100 > From: jtaylor.debian at googlemail.com > To: numpy-discussion at scipy.org > Subject: Re: [Numpy-discussion] Build doesn't pass tests > > On 01/08/2015 07:31 PM, Nikolay Mayorov wrote: > > Hi all! > > > > I'm trying to build numpy on Windows 64 bit, Python 3.4.2 64 bit. > > > > I do environment setup by the following command: > > > > CMD /K "SET MSSdk=1 && SET DISTUTILS_USE_SDK=1 && "C:\Program Files > > (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" x86_amd64" > > > > Then I cd to the newly cloned numpy folder and do: python setup.py > > build_ext --inplace > > > > It looks like the build process finishes correctly. > > > > But then python -c "import numpy; numpy.test()" crashes the interpreter > > (some tests pass before the crash). I found out that it is caused by > > numpy.fromfile function call. > > > > What might be the reason of that? Do I use wrong msvc compiler? > > > > > > I think compiling python3 extensions requires VS 2010, python 2 > extensions VS2008. > A crash in a fromfile test is what I would expect from using the wrong > compiler. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From valentin at haenel.co Fri Jan 9 14:43:30 2015 From: valentin at haenel.co (Valentin Haenel) Date: Fri, 9 Jan 2015 20:43:30 +0100 Subject: [Numpy-discussion] [ANN] bcolz v0.8.0 Message-ID: <20150109194330.GA690@kudu.in-berlin.de> ====================== Announcing bcolz 0.8.0 ====================== What's new ========== This version adds a public API in the form of a Cython definitions file (``carray_ext.pxd``) for the ``carray`` class! This means, other libraries can use the Cython definitions to build more complex programs using the objects provided by bcolz. In fact, this feature was specifically requested and there already exists a nascent application called *bquery* (https://github.com/visualfabriq/bquery) which provides an efficient out-of-core groupby implementation for the ``ctable`` object Because this is a fairly sweeping change, the minor version number was incremented and no additional major features or bugfixes were added to this release. We kindly ask any users of bcolz to try this version carefully and report back any issues, bugs, or even slow-downs you experience. I.e. please, please be careful when deploying this version into production. Many, many kudos to Francesc Elies and Carst Vaartjes of Visualfabriq for their hard work, continued effort to push this feature and their work on bquery which makes use of it! What it is ========== *bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of column. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and use several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, and it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Together, bcolz and the Blosc compressor, are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the Blaze project (http://blaze.pydata.org/) and Quantopian (https://www.quantopian.com/) which you can read more about by pointing your browser at the links below. 
* Visualfabriq: * *bquery*, A query and aggregation framework for Bcolz: * https://github.com/visualfabriq/bquery * Blaze: * Notebooks showing Blaze + Pandas + BColz interaction: * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb * Quantopian: * Using compressed data containers for faster backtesting at scale: * https://quantopian.github.io/talks/NeedForSpeed/slides.html Installing ========== bcolz is in the PyPI repository, so installing it is easy:: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst ---- **Enjoy data!** From pav at iki.fi Sun Jan 11 12:50:47 2015 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 11 Jan 2015 19:50:47 +0200 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release Message-ID: <54B2B7F7.4030708@iki.fi> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dear all, We are pleased to announce the Scipy 0.15.0 release. The 0.15.0 release contains bugfixes and new features, most important of which are mentioned in the excerpt from the release notes below. Source tarballs, binaries, and full release notes are available at https://sourceforge.net/projects/scipy/files/scipy/0.15.0/ Best regards, Pauli Virtanen ========================== SciPy 0.15.0 Release Notes ========================== SciPy 0.15.0 is the culmination of 6 months of hard work. It contains several new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.16.x branch, and on adding new features on the master branch. This release requires Python 2.6, 2.7 or 3.2-3.4 and NumPy 1.5.1 or greater. New features ============ Linear Programming Interface - ---------------------------- The new function `scipy.optimize.linprog` provides a generic linear programming similar to the way `scipy.optimize.minimize` provides a generic interface to nonlinear programming optimizers. Currently the only method supported is *simplex* which provides a two-phase, dense-matrix-based simplex algorithm. Callbacks functions are supported, allowing the user to monitor the progress of the algorithm. Differential evolution, a global optimizer - ------------------------------------------ A new `scipy.optimize.differential_evolution` function has been added to the ``optimize`` module. Differential Evolution is an algorithm used for finding the global minimum of multivariate functions. It is stochastic in nature (does not use gradient methods), and can search large areas of candidate space, but often requires larger numbers of function evaluations than conventional gradient based techniques. ``scipy.signal`` improvements - ----------------------------- The function `scipy.signal.max_len_seq` was added, which computes a Maximum Length Sequence (MLS) signal. 
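[As a quick illustration of two of the additions described above, a minimal sketch; the problem data are invented, and only the call signatures come from the 0.15.0 API:]

import numpy as np
from scipy.optimize import linprog
from scipy.signal import max_len_seq

# Minimize -x0 - 2*x1 subject to x0 + x1 <= 4, 2*x0 + x1 <= 5, x >= 0.
res = linprog(c=[-1, -2], A_ub=[[1, 1], [2, 1]], b_ub=[4, 5])
print(res.x, res.fun)            # optimal point and objective value

# A 7-bit maximum length sequence: 2**7 - 1 = 127 samples of 0s and 1s.
seq, state = max_len_seq(7)
bipolar = 2.0 * seq - 1.0        # map {0, 1} -> {-1, +1} for correlation use
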
``scipy.integrate`` improvements - -------------------------------- It is now possible to use `scipy.integrate` routines to integrate multivariate ctypes functions, thus avoiding callbacks to Python and providing better performance. ``scipy.linalg`` improvements - ----------------------------- The function `scipy.linalg.orthogonal_procrustes` for solving the procrustes linear algebra problem was added. BLAS level 2 functions ``her``, ``syr``, ``her2`` and ``syr2`` are now wrapped in ``scipy.linalg``. ``scipy.sparse`` improvements - ----------------------------- `scipy.sparse.linalg.svds` can now take a ``LinearOperator`` as its main input. ``scipy.special`` improvements - ------------------------------ Values of ellipsoidal harmonic (i.e. Lame) functions and associated normalization constants can be now computed using ``ellip_harm``, ``ellip_harm_2``, and ``ellip_normal``. New convenience functions ``entr``, ``rel_entr`` ``kl_div``, ``huber``, and ``pseudo_huber`` were added. ``scipy.sparse.csgraph`` improvements - ------------------------------------- Routines ``reverse_cuthill_mckee`` and ``maximum_bipartite_matching`` for computing reorderings of sparse graphs were added. ``scipy.stats`` improvements - ---------------------------- Added a Dirichlet multivariate distribution, `scipy.stats.dirichlet`. The new function `scipy.stats.median_test` computes Mood's median test. The new function `scipy.stats.combine_pvalues` implements Fisher's and Stouffer's methods for combining p-values. `scipy.stats.describe` returns a namedtuple rather than a tuple, allowing users to access results by index or by name. Deprecated features =================== The `scipy.weave` module is deprecated. It was the only module never ported to Python 3.x, and is not recommended to be used for new code - use Cython instead. In order to support existing code, ``scipy.weave`` has been packaged separately: https://github.com/scipy/weave. It is a pure Python package, and can easily be installed with ``pip install weave``. `scipy.special.bessel_diff_formula` is deprecated. It is a private function, and therefore will be removed from the public API in a following release. ``scipy.stats.nanmean``, ``nanmedian`` and ``nanstd`` functions are deprecated in favor of their numpy equivalents. Backwards incompatible changes ============================== scipy.ndimage - ------------- The functions `scipy.ndimage.minimum_positions`, `scipy.ndimage.maximum_positions`` and `scipy.ndimage.extrema` return positions as ints instead of floats. scipy.integrate - --------------- The format of banded Jacobians in `scipy.integrate.ode` solvers is changed. Note that the previous documentation of this feature was erroneous. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlSyt/cACgkQ6BQxb7O0pWA8SACfXmpUsJcXT5espj71OYpeaj5b JJwAoL10ud3q1f51A5Ij4lgqMeZGnHlj =ZmOl -----END PGP SIGNATURE----- From cjw at ncf.ca Sun Jan 11 22:50:38 2015 From: cjw at ncf.ca (cjw) Date: Sun, 11 Jan 2015 22:50:38 -0500 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release In-Reply-To: <54B2B7F7.4030708@iki.fi> References: <54B2B7F7.4030708@iki.fi> Message-ID: <54B3448E.3020100@ncf.ca> Paul, Wot, no AMD64? Colin W. On 11-Jan-15 12:50 PM, Paul Virtanen wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Dear all, > > We are pleased to announce the Scipy 0.15.0 release. > > The 0.15.0 release contains bugfixes and new features, most important > of which are mentioned in the excerpt from the release notes below. 
> > Source tarballs, binaries, and full release notes are available at > https://sourceforge.net/projects/scipy/files/scipy/0.15.0/ > > Best regards, > Pauli Virtanen > > > ========================== > SciPy 0.15.0 Release Notes > ========================== > > SciPy 0.15.0 is the culmination of 6 months of hard work. It contains > several new features, numerous bug-fixes, improved test coverage and > better documentation. There have been a number of deprecations and > API changes in this release, which are documented below. All users > are encouraged to upgrade to this release, as there are a large number > of bug-fixes and optimizations. Moreover, our development attention > will now shift to bug-fix releases on the 0.16.x branch, and on adding > new features on the master branch. > > This release requires Python 2.6, 2.7 or 3.2-3.4 and NumPy 1.5.1 or > greater. > > > New features > ============ > > Linear Programming Interface > - ---------------------------- > > The new function `scipy.optimize.linprog` provides a generic > linear programming similar to the way `scipy.optimize.minimize` > provides a generic interface to nonlinear programming optimizers. > Currently the only method supported is *simplex* which provides > a two-phase, dense-matrix-based simplex algorithm. Callbacks > functions are supported, allowing the user to monitor the progress > of the algorithm. > > Differential evolution, a global optimizer > - ------------------------------------------ > > A new `scipy.optimize.differential_evolution` function has been added > to the > ``optimize`` module. Differential Evolution is an algorithm used for > finding > the global minimum of multivariate functions. It is stochastic in > nature (does > not use gradient methods), and can search large areas of candidate > space, but > often requires larger numbers of function evaluations than conventional > gradient based techniques. > > ``scipy.signal`` improvements > - ----------------------------- > > The function `scipy.signal.max_len_seq` was added, which computes a > Maximum > Length Sequence (MLS) signal. > > ``scipy.integrate`` improvements > - -------------------------------- > > It is now possible to use `scipy.integrate` routines to integrate > multivariate ctypes functions, thus avoiding callbacks to Python and > providing better performance. > > ``scipy.linalg`` improvements > - ----------------------------- > > The function `scipy.linalg.orthogonal_procrustes` for solving the > procrustes > linear algebra problem was added. > > BLAS level 2 functions ``her``, ``syr``, ``her2`` and ``syr2`` are now > wrapped > in ``scipy.linalg``. > > ``scipy.sparse`` improvements > - ----------------------------- > > `scipy.sparse.linalg.svds` can now take a ``LinearOperator`` as its > main input. > > ``scipy.special`` improvements > - ------------------------------ > > Values of ellipsoidal harmonic (i.e. Lame) functions and associated > normalization constants can be now computed using ``ellip_harm``, > ``ellip_harm_2``, and ``ellip_normal``. > > New convenience functions ``entr``, ``rel_entr`` ``kl_div``, > ``huber``, and ``pseudo_huber`` were added. > > ``scipy.sparse.csgraph`` improvements > - ------------------------------------- > > Routines ``reverse_cuthill_mckee`` and ``maximum_bipartite_matching`` > for computing reorderings of sparse graphs were added. > > ``scipy.stats`` improvements > - ---------------------------- > > Added a Dirichlet multivariate distribution, `scipy.stats.dirichlet`. 
> > The new function `scipy.stats.median_test` computes Mood's median test. > > The new function `scipy.stats.combine_pvalues` implements Fisher's > and Stouffer's methods for combining p-values. > > `scipy.stats.describe` returns a namedtuple rather than a tuple, allowing > users to access results by index or by name. > > > Deprecated features > =================== > > The `scipy.weave` module is deprecated. It was the only module never > ported > to Python 3.x, and is not recommended to be used for new code - use Cython > instead. In order to support existing code, ``scipy.weave`` has been > packaged > separately: https://github.com/scipy/weave. It is a pure Python > package, and > can easily be installed with ``pip install weave``. > > `scipy.special.bessel_diff_formula` is deprecated. It is a private > function, > and therefore will be removed from the public API in a following release. > > ``scipy.stats.nanmean``, ``nanmedian`` and ``nanstd`` functions are > deprecated > in favor of their numpy equivalents. > > > Backwards incompatible changes > ============================== > > scipy.ndimage > - ------------- > > The functions `scipy.ndimage.minimum_positions`, > `scipy.ndimage.maximum_positions`` and `scipy.ndimage.extrema` return > positions as ints instead of floats. > > scipy.integrate > - --------------- > > The format of banded Jacobians in `scipy.integrate.ode` solvers is > changed. Note that the previous documentation of this feature was > erroneous. > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1 > > iEYEARECAAYFAlSyt/cACgkQ6BQxb7O0pWA8SACfXmpUsJcXT5espj71OYpeaj5b > JJwAoL10ud3q1f51A5Ij4lgqMeZGnHlj > =ZmOl > -----END PGP SIGNATURE----- > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scopatz at gmail.com Sun Jan 11 23:30:23 2015 From: scopatz at gmail.com (Anthony Scopatz) Date: Sun, 11 Jan 2015 22:30:23 -0600 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release In-Reply-To: <54B3448E.3020100@ncf.ca> References: <54B2B7F7.4030708@iki.fi> <54B3448E.3020100@ncf.ca> Message-ID: Congrats all! On Sun, Jan 11, 2015 at 9:50 PM, cjw wrote: > Paul, > > Wot, no AMD64? > > Colin W. > On 11-Jan-15 12:50 PM, Paul Virtanen wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Dear all, > > > > We are pleased to announce the Scipy 0.15.0 release. > > > > The 0.15.0 release contains bugfixes and new features, most important > > of which are mentioned in the excerpt from the release notes below. > > > > Source tarballs, binaries, and full release notes are available at > > https://sourceforge.net/projects/scipy/files/scipy/0.15.0/ > > > > Best regards, > > Pauli Virtanen > > > > > > ========================== > > SciPy 0.15.0 Release Notes > > ========================== > > > > SciPy 0.15.0 is the culmination of 6 months of hard work. It contains > > several new features, numerous bug-fixes, improved test coverage and > > better documentation. There have been a number of deprecations and > > API changes in this release, which are documented below. All users > > are encouraged to upgrade to this release, as there are a large number > > of bug-fixes and optimizations. Moreover, our development attention > > will now shift to bug-fix releases on the 0.16.x branch, and on adding > > new features on the master branch. > > > > This release requires Python 2.6, 2.7 or 3.2-3.4 and NumPy 1.5.1 or > > greater. 
> > > > > > New features > > ============ > > > > Linear Programming Interface > > - ---------------------------- > > > > The new function `scipy.optimize.linprog` provides a generic > > linear programming similar to the way `scipy.optimize.minimize` > > provides a generic interface to nonlinear programming optimizers. > > Currently the only method supported is *simplex* which provides > > a two-phase, dense-matrix-based simplex algorithm. Callbacks > > functions are supported, allowing the user to monitor the progress > > of the algorithm. > > > > Differential evolution, a global optimizer > > - ------------------------------------------ > > > > A new `scipy.optimize.differential_evolution` function has been added > > to the > > ``optimize`` module. Differential Evolution is an algorithm used for > > finding > > the global minimum of multivariate functions. It is stochastic in > > nature (does > > not use gradient methods), and can search large areas of candidate > > space, but > > often requires larger numbers of function evaluations than conventional > > gradient based techniques. > > > > ``scipy.signal`` improvements > > - ----------------------------- > > > > The function `scipy.signal.max_len_seq` was added, which computes a > > Maximum > > Length Sequence (MLS) signal. > > > > ``scipy.integrate`` improvements > > - -------------------------------- > > > > It is now possible to use `scipy.integrate` routines to integrate > > multivariate ctypes functions, thus avoiding callbacks to Python and > > providing better performance. > > > > ``scipy.linalg`` improvements > > - ----------------------------- > > > > The function `scipy.linalg.orthogonal_procrustes` for solving the > > procrustes > > linear algebra problem was added. > > > > BLAS level 2 functions ``her``, ``syr``, ``her2`` and ``syr2`` are now > > wrapped > > in ``scipy.linalg``. > > > > ``scipy.sparse`` improvements > > - ----------------------------- > > > > `scipy.sparse.linalg.svds` can now take a ``LinearOperator`` as its > > main input. > > > > ``scipy.special`` improvements > > - ------------------------------ > > > > Values of ellipsoidal harmonic (i.e. Lame) functions and associated > > normalization constants can be now computed using ``ellip_harm``, > > ``ellip_harm_2``, and ``ellip_normal``. > > > > New convenience functions ``entr``, ``rel_entr`` ``kl_div``, > > ``huber``, and ``pseudo_huber`` were added. > > > > ``scipy.sparse.csgraph`` improvements > > - ------------------------------------- > > > > Routines ``reverse_cuthill_mckee`` and ``maximum_bipartite_matching`` > > for computing reorderings of sparse graphs were added. > > > > ``scipy.stats`` improvements > > - ---------------------------- > > > > Added a Dirichlet multivariate distribution, `scipy.stats.dirichlet`. > > > > The new function `scipy.stats.median_test` computes Mood's median test. > > > > The new function `scipy.stats.combine_pvalues` implements Fisher's > > and Stouffer's methods for combining p-values. > > > > `scipy.stats.describe` returns a namedtuple rather than a tuple, allowing > > users to access results by index or by name. > > > > > > Deprecated features > > =================== > > > > The `scipy.weave` module is deprecated. It was the only module never > > ported > > to Python 3.x, and is not recommended to be used for new code - use > Cython > > instead. In order to support existing code, ``scipy.weave`` has been > > packaged > > separately: https://github.com/scipy/weave. 
It is a pure Python > > package, and > > can easily be installed with ``pip install weave``. > > > > `scipy.special.bessel_diff_formula` is deprecated. It is a private > > function, > > and therefore will be removed from the public API in a following release. > > > > ``scipy.stats.nanmean``, ``nanmedian`` and ``nanstd`` functions are > > deprecated > > in favor of their numpy equivalents. > > > > > > Backwards incompatible changes > > ============================== > > > > scipy.ndimage > > - ------------- > > > > The functions `scipy.ndimage.minimum_positions`, > > `scipy.ndimage.maximum_positions`` and `scipy.ndimage.extrema` return > > positions as ints instead of floats. > > > > scipy.integrate > > - --------------- > > > > The format of banded Jacobians in `scipy.integrate.ode` solvers is > > changed. Note that the previous documentation of this feature was > > erroneous. > > -----BEGIN PGP SIGNATURE----- > > Version: GnuPG v1 > > > > iEYEARECAAYFAlSyt/cACgkQ6BQxb7O0pWA8SACfXmpUsJcXT5espj71OYpeaj5b > > JJwAoL10ud3q1f51A5Ij4lgqMeZGnHlj > > =ZmOl > > -----END PGP SIGNATURE----- > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jan 12 02:13:31 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 12 Jan 2015 08:13:31 +0100 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release In-Reply-To: <54B3448E.3020100@ncf.ca> References: <54B2B7F7.4030708@iki.fi> <54B3448E.3020100@ncf.ca> Message-ID: On Mon, Jan 12, 2015 at 4:50 AM, cjw wrote: > Paul, > > Wot, no AMD64? > Colin, this is well known from previous scipy and numpy releases. It's due to not having a freely available 64-bit compiler chain available at the moment with which we can build official binaries. You can get 64-bit Windows installers of most scientific Python distributions (like Anaconda, Enthough Canopy, WinPython) and for just Scipy from the site of Christoph Gohlke. Your tone on this list is not appreciated by the way, it borders on trolling. If you have a serious question to which you really don't know the answer, please pose it in a less disrespectful way. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jan 12 07:04:33 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 12 Jan 2015 12:04:33 +0000 Subject: [Numpy-discussion] ANN: Scipy 0.15.0 release In-Reply-To: References: <54B2B7F7.4030708@iki.fi> <54B3448E.3020100@ncf.ca> Message-ID: Hi, On Mon, Jan 12, 2015 at 7:13 AM, Ralf Gommers wrote: > > > On Mon, Jan 12, 2015 at 4:50 AM, cjw wrote: >> >> Paul, >> >> Wot, no AMD64? > > > Colin, this is well known from previous scipy and numpy releases. It's due > to not having a freely available 64-bit compiler chain available at the > moment with which we can build official binaries. You can get 64-bit Windows > installers of most scientific Python distributions (like Anaconda, Enthough > Canopy, WinPython) and for just Scipy from the site of Christoph Gohlke. > > Your tone on this list is not appreciated by the way, it borders on > trolling. 
If you have a serious question to which you really don't know the > answer, please pose it in a less disrespectful way. Ralf - honestly I think it's best not to use the term 'trolling' under any circumstances - it can be a heavy weapon [1], although it's perfectly reasonable to say to Colin that you would find it helpful if he used a different tone. In this case, I couldn't be sure whether Colin meant his email to be light-hearted or not. Colin - just to add to Ralf's reply on the 64-bit issue - here are a few links: * Stackoverflow answer with some references [2] * Numpy mailing list question about 64-bit installers in 2011 [3] * Another discussion I started in 2013 [4] * Commentary on problems for Numpy etc on Windows [5] * Our current best hope: Carl Kleffner's mingw-w64 build chain [6] If Carl K is listening here - Carl - what's the current best way to help? Cheers, Matthew [1] http://nipyworld.blogspot.co.uk/2012/06/define-troll.html [2] http://stackoverflow.com/questions/11200137/installing-numpy-on-64bit-windows-7-with-python-2-7-3 [3] http://comments.gmane.org/gmane.comp.python.numeric.general/42118 [4] http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065339.html [5] https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows [6] https://github.com/numpy/numpy/wiki/Mingw-static-toolchain From ndarray at mac.com Mon Jan 12 13:33:09 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Mon, 12 Jan 2015 13:33:09 -0500 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds Message-ID: Consider this (on a 64-bit platform): >>> numpy.dtype('q') == numpy.dtype('l') True but >>> numpy.dtype('q').char == numpy.dtype('l').char False Is that intended? Shouldn't dtype constructor "normalize" 'l' to 'q' (or 'i')? -------------- next part -------------- An HTML attachment was scrubbed... URL: From maniteja.modesty067 at gmail.com Mon Jan 12 14:14:11 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Tue, 13 Jan 2015 00:44:11 +0530 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: Hi, On Tue, Jan 13, 2015 at 12:03 AM, Alexander Belopolsky wrote: > Consider this (on a 64-bit platform): > > >>> numpy.dtype('q') == numpy.dtype('l') > True > > >>> numpy.dtype('q').char == numpy.dtype('l').char > False > > Is that intended? Shouldn't dtype constructor "normalize" 'l' to 'q' (or > 'i')? > > 'q' is defined as NPY_LONGLONGLTR, while 'l' is NPY_LONGLTR [here]. A similar issue was raised in Issue 5426. I am not aware of the exact reason either, but hope it helps. Also, >>> numpy.dtype('q').num 9 >>> numpy.dtype('l').num 7 _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Jan 12 20:48:56 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Jan 2015 18:48:56 -0700 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: On Mon, Jan 12, 2015 at 12:14 PM, Maniteja Nandana < maniteja.modesty067 at gmail.com> wrote: > Hi, > > On Tue, Jan 13, 2015 at 12:03 AM, Alexander Belopolsky > wrote: > >> Consider this (on a 64-bit platform): >> >> >>> numpy.dtype('q') == numpy.dtype('l') >> True >> >> > >>> numpy.dtype('q').char == numpy.dtype('l').char >> False >> >> Is that intended? Shouldn't dtype constructor "normalize" 'l' to 'q' (or >> 'i')? >> >> > 'q' is defined as NPY_LONGLONGLTR, while 'l' is NPY_LONGLTR [here] > . > Similar issue was raised on Issue 5426 > . Even > I am not aware of the exact reason, but hope it helps. > Also, > > >>> numpy.dtype('q').num > 9 > >>> numpy.dtype('l').char > 7 > Numpy basically has two different type systems. The basic system is based on C types -- int, long, etc. -- and on top of that there is a precision based system. The letters and number versions are C, while the dtype equality is precision. That is to say, in this case C long has the same precision as C long long. That varies depending on the platform, which is one reason the precision nomenclature came in. It can be confusing, and I've often fantasized getting rid of the long type altogether ;) So it isn't exactly intended, but there is a reason... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Mon Jan 12 22:23:22 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Mon, 12 Jan 2015 22:23:22 -0500 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: On Mon, Jan 12, 2015 at 8:48 PM, Charles R Harris wrote: > > That is to say, in this case C long has the same precision as C long long. That varies depending on the platform, which is one reason the precision nomenclature came in. It can be confusing, and I've often fantasized getting rid of the long type altogether ;) So it isn't exactly intended, but there is a reason... It is also confusing that numpy has two constructors that produce 32-bit integers on 32-bit platforms and 64-bit integers on 64-bit platforms, but neither of these constructors is called "long". Instead, they are called numpy.int_ and numpy.intp. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Jan 13 01:20:50 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 13 Jan 2015 06:20:50 +0000 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: <8ecxoh.ni3qys.2tbkqt-qmf@sipsolutions.net> On Tue Jan 13 04:23:22 2015 GMT+0100, Alexander Belopolsky wrote: > On Mon, Jan 12, 2015 at 8:48 PM, Charles R Harris > wrote: > > > > That is to say, in this case C long has the same precision as C long > long. That varies depending on the platform, which is one reason the > precision nomenclature came in. It can be confusing, and I've often > fantasized getting rid of the long type altogether ;) So it isn't exactly > intended, but there is a reason... > > > It is also confusing that numpy has two constructors that produce 32-bit > integers on 32-bit platforms and 64-bit integers on 64-bit platforms, but > neither of these constructors is called "long". 
Instead, they are called > numpy.int_ and numpy.intp. > There is np.long. int_ is python int which is long. intp is actually ssizet. - Sebastian From jaime.frio at gmail.com Tue Jan 13 10:15:17 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 13 Jan 2015 07:15:17 -0800 Subject: [Numpy-discussion] linspace handling of extra return Message-ID: While working on something else, I realized that linspace is not handling requests for returning the sampling spacing consistently: >>> np.linspace(0, 1, 3, retstep=True) (array([ 0. , 0.5, 1. ]), 0.5) >>> np.linspace(0, 1, 1, retstep=True) array([ 0.]) >>> np.linspace(0, 1, 0, retstep=True) array([], dtype=float64) Basically, retstep is ignored if the number of samples is 0 or 1. One could argue that it makes sense, because those sequences do not have a spacing defined. But at the very least it should be documented as doing so, and the following inconsistency removed: >>> np.linspace(0, 1, 1, endpoint=True, retstep=True) array([ 0.]) >>> np.linspace(0, 1, 1, endpoint=False, retstep=True) (array([ 0.]), 1.0) I am personally inclined to think that if a step is requested, then a step should be returned, and if it cannot be calculated in a reasonable manner, then a placeholder such as None, nan, 0 or stop - start should be returned. What does the collective wisdom think is the best approach for this? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Jan 13 10:23:50 2015 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 13 Jan 2015 10:23:50 -0500 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: Oh, wow. I never noticed that before. Yeah, if I state that retstep=True, then I am coding my handling to expect two values to be returned, not 1. I think it should be nan, but I could also agree with zero. It should definitely remain a float value, though. Cheers! Ben Root On Tue, Jan 13, 2015 at 10:15 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > While working on something else, I realized that linspace is not handling > requests for returning the sampling spacing consistently: > > >>> np.linspace(0, 1, 3, retstep=True) > (array([ 0. , 0.5, 1. ]), 0.5) > >>> np.linspace(0, 1, 1, retstep=True) > array([ 0.]) > >>> np.linspace(0, 1, 0, retstep=True) > array([], dtype=float64) > > Basically, retstep is ignored if the number of samples is 0 or 1. One > could argue that it makes sense, because those sequences do not have a > spacing defined. But at the very least it should be documented as doing so, > and the following inconsistency removed: > > >>> np.linspace(0, 1, 1, endpoint=True, retstep=True) > array([ 0.]) > >>> np.linspace(0, 1, 1, endpoint=False, retstep=True) > (array([ 0.]), 1.0) > > I am personally inclined to think that if a step is requested, then a step > should be returned, and if it cannot be calculated in a reasonable manner, > then a placeholder such as None, nan, 0 or stop - start should be returned. > > What does the collective wisdom think is the best approach for this? > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. 
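On the retstep question above, a rough sketch in plain NumPy of the nan option Ben suggests: always hand back a (samples, step) pair, with step set to nan whenever a spacing cannot be defined. The helper name linspace_with_step is made up purely for illustration, not an existing function:

import numpy as np

def linspace_with_step(start, stop, num, endpoint=True):
    # Sketch only: always return (samples, step); report nan for the
    # step when it cannot be computed (num of 0 or 1).
    samples = np.linspace(start, stop, num, endpoint=endpoint)
    if num > 1:
        step = samples[1] - samples[0]
    else:
        step = np.nan
    return samples, step

# linspace_with_step(0, 1, 3) -> (array([ 0. ,  0.5,  1. ]), 0.5)
# linspace_with_step(0, 1, 1) -> (array([ 0.]), nan)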
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Tue Jan 13 17:57:24 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 13 Jan 2015 14:57:24 -0800 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: > Oh, wow. I never noticed that before. Yeah, if I state that retstep=True, > then I am coding my handling to expect two values to be returned, not 1. I > think it should be nan, but I could also agree with zero. It should > definitely remain a float value, though. > NaN it is then: the change and supporting tests are now part of gh-5446. Jaime > > Cheers! > Ben Root > > On Tue, Jan 13, 2015 at 10:15 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> While working on something else, I realized that linspace is not handling >> requests for returning the sampling spacing consistently: >> >> >>> np.linspace(0, 1, 3, retstep=True) >> (array([ 0. , 0.5, 1. ]), 0.5) >> >>> np.linspace(0, 1, 1, retstep=True) >> array([ 0.]) >> >>> np.linspace(0, 1, 0, retstep=True) >> array([], dtype=float64) >> >> Basically, retstep is ignored if the number of samples is 0 or 1. One >> could argue that it makes sense, because those sequences do not have a >> spacing defined. But at the very least it should be documented as doing so, >> and the following inconsistency removed: >> >> >>> np.linspace(0, 1, 1, endpoint=True, retstep=True) >> array([ 0.]) >> >>> np.linspace(0, 1, 1, endpoint=False, retstep=True) >> (array([ 0.]), 1.0) >> >> I am personally inclined to think that if a step is requested, then a >> step should be returned, and if it cannot be calculated in a reasonable >> manner, then a placeholder such as None, nan, 0 or stop - start should be >> returned. >> >> What does the collective wisdom think is the best approach for this? >> >> Jaime >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >> de dominaci?n mundial. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Jan 13 17:56:56 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 13 Jan 2015 14:56:56 -0800 Subject: [Numpy-discussion] Equality of dtypes does not imply equality of type kinds In-Reply-To: References: Message-ID: On Mon, Jan 12, 2015 at 7:23 PM, Alexander Belopolsky wrote: > > On Mon, Jan 12, 2015 at 8:48 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > > I've often fantasized getting rid of the long type altogether ;) So it > isn't exactly intended, but there is a reason... 
> > > It is also confusing that numpy has two constructors that produce 32-bit > integers on 32-bit platforms and 64-bit integers on 64-bit platforms, but > neither of these constructors is called "long". Instead, they are called > numpy.int_ and numpy.intp. > I'm pretty sure that numpy.int_ will produce a 32 bit type in Windows64 -- because a long on Windows64 is 32 bit (at least with the MS compiler). Which sucks, I'm pretty amazed that python went with "platformlong" for it's int, rather than "32 bit int" or "64 bit int". Sigh. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Jan 13 17:58:47 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 13 Jan 2015 14:58:47 -0800 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: > Oh, wow. I never noticed that before. Yeah, if I state that retstep=True, > then I am coding my handling to expect two values to be returned, not 1. I > think it should be nan, but I could also agree with zero. It should > definitely remain a float value, though. > How about a ValueError? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Jan 13 21:02:40 2015 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 13 Jan 2015 21:02:40 -0500 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: So, raise a ValueError when it used to return (mostly) correct results? For some reason, I don't think people would appreciate that. Ben Root On Jan 13, 2015 5:59 PM, "Chris Barker" wrote: > On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: > >> Oh, wow. I never noticed that before. Yeah, if I state that retstep=True, >> then I am coding my handling to expect two values to be returned, not 1. I >> think it should be nan, but I could also agree with zero. It should >> definitely remain a float value, though. >> > > How about a ValueError? > > -CHB > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jan 14 14:11:58 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 14 Jan 2015 12:11:58 -0700 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: On Tue, Jan 13, 2015 at 7:02 PM, Benjamin Root wrote: > So, raise a ValueError when it used to return (mostly) correct results? > For some reason, I don't think people would appreciate that. 
> > Ben Root > On Jan 13, 2015 5:59 PM, "Chris Barker" wrote: > >> On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: >> >>> Oh, wow. I never noticed that before. Yeah, if I state that >>> retstep=True, then I am coding my handling to expect two values to be >>> returned, not 1. I think it should be nan, but I could also agree with >>> zero. It should definitely remain a float value, though. >>> >> >> How about a ValueError? >> >> -CHB >> >> > How about raising ValueError if num < 0 or num == 1, endpoint=True, and start != stop? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jan 14 15:43:58 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 14 Jan 2015 12:43:58 -0800 Subject: [Numpy-discussion] linspace handling of extra return In-Reply-To: References: Message-ID: On Tue, Jan 13, 2015 at 6:02 PM, Benjamin Root wrote: > So, raise a ValueError when it used to return (mostly) correct results? > my understanding is that it was NOT returning mostly correct results, it was returning completely different results for those special values: a 2-tuple in most cases, a single array if there delta didn't make sense. That may be correct, but it's not a result ;-) That being said, I'm sure some folks have written work-arounds that would break if this were changed. A bug is always incorporated in someones workflow. http://xkcd.com/1172/ Though if you do have a work-around for when the step is not returned, it will likely break if suddenly zero or NaN is returned, as well. So I'm not sure there is a fully backward compatible fix. -CHB > For some reason, I don't think people would appreciate that. > > Ben Root > On Jan 13, 2015 5:59 PM, "Chris Barker" wrote: > >> On Tue, Jan 13, 2015 at 7:23 AM, Benjamin Root wrote: >> >>> Oh, wow. I never noticed that before. Yeah, if I state that >>> retstep=True, then I am coding my handling to expect two values to be >>> returned, not 1. I think it should be nan, but I could also agree with >>> zero. It should definitely remain a float value, though. >>> >> >> How about a ValueError? >> >> -CHB >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Wed Jan 14 18:08:20 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Wed, 14 Jan 2015 18:08:20 -0500 Subject: [Numpy-discussion] proposed change to recarray access Message-ID: <54B6F6E4.2000700@gmail.com> Hello all, I've submitted a pull request on github which changes how string values in recarrays are returned, which may break old code. 
https://github.com/numpy/numpy/pull/5454 See also: https://github.com/numpy/numpy/issues/3993 Previously, recarray fields of type 'S' or 'U' (i.e. strings) would be returned as chararrays when accessed by attribute, but ndarrays when accessed by indexing: >>> arr = np.array([('abc ', 1), ('abc', 2)], dtype=[('str', 'S4'), ('id', int)]) >>> arr = arr.view(np.recarray) >>> type(arr.str) numpy.core.defchararray.chararray >>> type(arr['str']) numpy.core.records.recarray Chararray is deprecated, and furthermore this led to bugs in my code since chararrays trim trailing whitespace but ndarrays do not (and I was not aware of the conversion to chararray). For example: >>> arr.str[0] == arr.str[1] True >>> arr['str'][0] == arr['str'][1] False In the pull request I have changed recarray attribute access so ndarrays are always returned. I think this is a sensible thing to do but it may break code which depends on chararray features (including the trimmed whitespace). Does this sound reasonable? Best, Allan From maniteja.modesty067 at gmail.com Thu Jan 15 11:44:36 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Thu, 15 Jan 2015 22:14:36 +0530 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> Message-ID: Hello everyone, I just wanted to highlight the point made by Charles, it would be great if he would clarify any mistakes in the points that I put forward. Quoting the documentation, In versions of NumPy prior to 1.7, this function always returned a new, independent array containing a copy of the values in the diagonal. In NumPy 1.7 and 1.8, it continues to return a copy of the diagonal, but depending on this fact is deprecated. Writing to the resulting array continues to work as it used to, but a FutureWarning is issued. In NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error. In NumPy 1.10, it will return a read/write view. Writing to the returned array will alter your original array. Though the expected behaviour has its pros and cons, the points put forward are: 1. revert the changes so that *PyArray_Diagonal* returns a *copy*. 2. introduce a new API function *PyArray_Diagonal2*, which has a *copy* argument, so that a copy or a view can be returned. 3. if a *view* is to be returned, its *write-ability* depends on whether the *input* is writeable. 4. implement *PyArray_Diagonal* in terms of the new function, though the default value of *copy* is undecided. 5. Raise a *FutureWarning* when trying to write to the result. 6. add a *copy* argument to the *diagonal* function and method, updating the function in *methods.c* and *fromnumeric.py*, probably in other places also. 7. Also update the release notes and documentation. I would love to do the PR once a decision is reached. Cheers, N.Maniteja. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Jan 15 12:01:14 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 15 Jan 2015 18:01:14 +0100 Subject: [Numpy-discussion] Memory allocation of arrays and tracemalloc Message-ID: <20150115180114.3119bbd5@fsol> Hello, I see that the PyDataMem_* APIs call malloc()/free()/etc. directly, instead of going through PyMem_Malloc, etc. 
This means the memory allocated by those APIs won't be seen by tracemalloc. Is it deliberate? Am I missing something? Regards Antoine. From charlesr.harris at gmail.com Thu Jan 15 12:11:07 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 15 Jan 2015 10:11:07 -0700 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> Message-ID: On Thu, Jan 15, 2015 at 9:44 AM, Maniteja Nandana < maniteja.modesty067 at gmail.com> wrote: > Hello everyone, > > I just wanted to highlight the point made by Charles, it would be great if > he would clarify any mistakes in the points that I put forward. > > Quoting the documentation, > > In versions of NumPy prior to 1.7, this function always returned a new,independent array containing a copy of the values in the diagonal. > > In NumPy 1.7 and 1.8, it continues to return a copy of the diagonal,but depending on this fact is deprecated. Writing to the resulting array continues to work as it used to, but a FutureWarning is issued. > > In NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error. > > In NumPy 1.10, it will return a read/write view, Writing to the returned array will alter your original array. > > Though the expected behaviour has its pros and cons,the points put forward are : > > > 1. revert the changes so that *PyArray_Diagonal *returns a *copy.* > 2. introduce new API function *PyArray_Diagonal2, *which has a *copy *argument, so that copy or view can be returned. > 3. if a *view* is to be returned, its *write-ability *depends on whether the *input* is writeable. > 4. implement *PyArray_Diagonal *in terms of the new function, thought the default value of *copy *is unsure. > > copy=True > > 1. Raise a *FutureWarning*, when trying to write to the result. > > Old function should behave exactly as before, returning a writable copy. > > 1. add *copy *argument to the *diagonal *function and method, updating the function in *methods.c *and *fromnumeric.py, *probably in other places also. > 2. Also update the release notes and documentation. > > I would love to do the PR once a decision is reached. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 15 12:24:13 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 15 Jan 2015 17:24:13 +0000 Subject: [Numpy-discussion] Memory allocation of arrays and tracemalloc In-Reply-To: <20150115180114.3119bbd5@fsol> References: <20150115180114.3119bbd5@fsol> Message-ID: On Thu, Jan 15, 2015 at 5:01 PM, Antoine Pitrou wrote: > > Hello, > > I see that the PyDataMem_* APIs call malloc()/free()/etc. directly, > instead of going through PyMem_Malloc, etc. This means the memory > allocated by those APIs won't be seen by tracemalloc. Is it deliberate? > Am I missing something? There are two reasons: 1) We need PyMem_Calloc, which doesn't exist in any released version of Python. It *will* exist in 3.5, though (thanks to Victor Stinner for adding it for us!). 2) We *might* in the future need further API extensions (e.g. for aligned memory, or who knows what), and this makes us nervous. If we start supporting tracemalloc, then that becomes a user-facing feature that we'll be committed to supporting indefinitely, which means that we'll be locked into using Python's allocation API forever. (See https://github.com/numpy/numpy/issues/4663.) 
So we're reluctant to accept that lock-in without having some sort of escape hatch, e.g. the one described at the end of this message: http://bugs.python.org/issue18835#msg232221 -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From maniteja.modesty067 at gmail.com Thu Jan 15 12:25:18 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Thu, 15 Jan 2015 22:55:18 +0530 Subject: [Numpy-discussion] The future of ndarray.diagonal() In-Reply-To: References: <54A968C6.1040502@fastmail.net> Message-ID: Thank you Charles for the corrections. Cheers, N.Maniteja _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion On Thu, Jan 15, 2015 at 10:41 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, Jan 15, 2015 at 9:44 AM, Maniteja Nandana < > maniteja.modesty067 at gmail.com> wrote: > >> Hello everyone, >> >> I just wanted to highlight the point made by Charles, it would be great >> if he would clarify any mistakes in the points that I put forward. >> >> Quoting the documentation, >> >> In versions of NumPy prior to 1.7, this function always returned a new,independent array containing a copy of the values in the diagonal. >> >> In NumPy 1.7 and 1.8, it continues to return a copy of the diagonal,but depending on this fact is deprecated. Writing to the resulting array continues to work as it used to, but a FutureWarning is issued. >> >> In NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error. >> >> In NumPy 1.10, it will return a read/write view, Writing to the returned array will alter your original array. >> >> Though the expected behaviour has its pros and cons,the points put forward are : >> >> >> 1. revert the changes so that *PyArray_Diagonal *returns a *copy.* >> 2. introduce new API function *PyArray_Diagonal2, *which has a *copy *argument, so that copy or view can be returned. >> 3. if a *view* is to be returned, its *write-ability *depends on whether the *input* is writeable. >> 4. implement *PyArray_Diagonal *in terms of the new function, thought the default value of *copy *is unsure. >> >> copy=True > >> >> 1. Raise a *FutureWarning*, when trying to write to the result. >> >> Old function should behave exactly as before, returning a writable copy. > >> >> 1. add *copy *argument to the *diagonal *function and method, updating the function in *methods.c *and *fromnumeric.py, *probably in other places also. >> 2. Also update the release notes and documentation. >> >> I would love to do the PR once a decision is reached. >> >> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dgasmith at icloud.com Thu Jan 15 18:30:28 2015 From: dgasmith at icloud.com (Daniel Smith) Date: Thu, 15 Jan 2015 17:30:28 -0600 Subject: [Numpy-discussion] Optimizing numpy's einsum expression (again) Message-ID: <0D323EDF-1B88-4638-AF78-67824D478FB8@icloud.com> Hello everyone, I originally brought an optimized einsum routine forward a few weeks back that attempts to contract numpy arrays together in an optimal way. 
This can greatly reduce the scaling and overall cost of the einsum expression for the cost of a few intermediate arrays. The current version (and more details) can be found here: https://github.com/dgasmith/opt_einsum I think this routine is close to a finalized version, but there are two problems which I would like the community to weigh in on: Memory usage- Currently the routine only considers the maximum size of intermediates created rather than cumulative memory usage. If we only use np.einsum it is straightforward to calculate cumulative memory usage as einsum does not require any copies of inputs to be made; however, if we attempt to use a vendor BLAS the memory usage can becomes quite complex. For example, the np.tensordot routine always forces a copy for ndarrays because it uses ndarray.transpose(?).reshape(?). A more memory-conscious way to do this is to first try and do ndarray.reshape(?).T which does not copy the data and numpy can just pass a transpose flag to the vendor BLAS. The caveat here is that the summed indices must be in the correct order- if not a copy is required. Maximum array size is usually close to the total overhead of the opt_einsum function, but can occasionally be 2-5 times this size. I see the following ways forward: Ignore cumulative memory and stick with maximum array size as the limiting memory factor. Implement logic to figure out if the input arrays needs to be copied to use BLAS, compute the extra memory required, and add an extra dimension to the optimization algorithms (use BLAS or do not use BLAS at each step). Some of this is already done, but may get fairly complex. Build an in-place nd-transpose algorithm. Cut out BLAS entirely. Keeping in mind that vendor BLAS can be orders of magnitude faster than a pure einsum implementation, especially if the BLAS threading is used. Path algorithms- There are two algorithms ?optimal? (a brute force algorithm, scales like N!) and ?opportunistic? (a greedy algorithm, scales like N^3). The optimal path can take seconds to minutes to calculate for a 7-10 term expression while the opportunistic path takes microseconds even for 20+ term expressions. The opportunistic algorithm works surprisingly well and appears to obtain the correct scaling in all test cases that I can think of. Both algorithms use the maximum array size as a sieve- this is very beneficial from several aspects. The problem occurs when a much needed intermediate cannot be created due to this limit- on occasions not making this intermediate can have slowdowns of orders of magnitude even for small systems. This leads to certain (and sometimes unexpected) performance walls. Possible solutions: Warn the user if the ratio between an unlimited memory solution and a limited memory solution becomes large. Do not worry about it. Thank you for your time, -Daniel Smith -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 16 00:24:00 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 15 Jan 2015 21:24:00 -0800 Subject: [Numpy-discussion] Sorting refactor Message-ID: Hi all, I have been taking a deep look at the sorting functionality in numpy, and I think it could use a face lift in the form of a big code refactor, to get rid of some of the ugliness in the code and make it easier to maintain. What I have in mind basically amounts to: 1. 
Refactor _new_argsortlike to get rid of code duplication (there are two branches, one with buffering, one without, virtually identical, that could be merged into a single one). 2. Modify _new_argsortlike so that it can properly handle byte-swapped inputs of any dtype, see gh-5441. Add proper handling of types with references, in preparation for the rest of changes. 3. Add three functions to the npy_sort library: npy_aquicksort, npy_aheapsort, npy_amergesort, with a signature compatible with PyArray_ArgSortFunc , i.e. (char* start, npy_intp* result, npy_intp length, PyArrayObject *arr). These turn out to be almost identical to the string and unicode sort functions, but using the dtype's compare function to handle comparisons. 4. Modify PyArray_ArgSort (and PyArray_ArgPartition) to always call _new_argsortlike, even when there is no type specific argsort function, by using the newly added npy_axxx functions. This simplifies PyArray_ArgSort a lot, and gets rid of some of the global variable ugliness in the current code. And makes argsorting over non-contiguous axis more memory efficient. 5. Refactor _new_sortlike similarly to _new_argsortlike 6. Modify the npy_quicksort, npy_mergesort and npy_heapsort functions in npy_sort to have a signature compatible with PyArray_SortFunc, i.e. (char* start, npy_intp length, PyArrayObject *arr). npy_quicksort will no longer rely on libc's qsort, but be very similar to the string or unicode quicksort functions 7. Modify PyArray_Sort (and PyArray_Partition) to always call _new_sortlike, similarly to what was done with PyArray_ArgSort. This allows completing the removal of the remaining global variable ugliness, as well as similar benefits as for argsort before. This changes will make it easier for me to add a Timsort generic type function to numpy's arsenal of sorting routines. And I think they have value by cleaning the source code on their own. So my questions, mostly to the poor souls that will have to code review changes to several hundred lines of code: 1. Does this make sense, or is it better left alone? A subset of 1, 2 and 5 are a must to address the issues in gh-5441, the rest could arguably be left as is. 2. Would you rather see it submitted as one ginormous PR? Or split into 4 or 5 incremental ones? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Fri Jan 16 04:54:32 2015 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 16 Jan 2015 10:54:32 +0100 Subject: [Numpy-discussion] Optimizing numpy's einsum expression (again) In-Reply-To: <0D323EDF-1B88-4638-AF78-67824D478FB8@icloud.com> References: <0D323EDF-1B88-4638-AF78-67824D478FB8@icloud.com> Message-ID: Thanks for taking the time to think about this; good work. Personally, I don't think a factor 5 memory overhead is much to sweat over. The most complex einsum I have ever needed in a production environment was 5/6 terms, and for what this anecdote is worth, speed was a far bigger concern to me than memory. On Fri, Jan 16, 2015 at 12:30 AM, Daniel Smith wrote: > Hello everyone, > I originally brought an optimized einsum routine forward a few weeks back > that attempts to contract numpy arrays together in an optimal way. This can > greatly reduce the scaling and overall cost of the einsum expression for > the cost of a few intermediate arrays. 
The current version (and more > details) can be found here: > https://github.com/dgasmith/opt_einsum > > I think this routine is close to a finalized version, but there are two > problems which I would like the community to weigh in on: > > Memory usage- > Currently the routine only considers the maximum size of intermediates > created rather than cumulative memory usage. If we only use np.einsum it is > straightforward to calculate cumulative memory usage as einsum does not > require any copies of inputs to be made; however, if we attempt to use a > vendor BLAS the memory usage can becomes quite complex. For example, the > np.tensordot routine always forces a copy for ndarrays because it uses > ndarray.transpose(?).reshape(?). A more memory-conscious way to do this is > to first try and do ndarray.reshape(?).T which does not copy the data and > numpy can just pass a transpose flag to the vendor BLAS. The caveat here is > that the summed indices must be in the correct order- if not a copy is > required. Maximum array size is usually close to the total overhead of the > opt_einsum function, but can occasionally be 2-5 times this size. I see the > following ways forward: > > - Ignore cumulative memory and stick with maximum array size as the > limiting memory factor. > - Implement logic to figure out if the input arrays needs to be copied > to use BLAS, compute the extra memory required, and add an extra dimension > to the optimization algorithms (use BLAS or do not use BLAS at each step). > Some of this is already done, but may get fairly complex. > - Build an in-place nd-transpose algorithm. > - Cut out BLAS entirely. Keeping in mind that vendor BLAS can be > orders of magnitude faster than a pure einsum implementation, especially if > the BLAS threading is used. > > > Path algorithms- > There are two algorithms ?optimal? (a brute force algorithm, scales like > N!) and ?opportunistic? (a greedy algorithm, scales like N^3). The optimal > path can take seconds to minutes to calculate for a 7-10 term expression > while the opportunistic path takes microseconds even for 20+ term > expressions. The opportunistic algorithm works surprisingly well and > appears to obtain the correct scaling in all test cases that I can think > of. Both algorithms use the maximum array size as a sieve- this is very > beneficial from several aspects. The problem occurs when a much needed > intermediate cannot be created due to this limit- on occasions not making > this intermediate can have slowdowns of orders of magnitude even for small > systems. This leads to certain (and sometimes unexpected) performance > walls. Possible solutions: > > - Warn the user if the ratio between an unlimited memory solution and > a limited memory solution becomes large. > - Do not worry about it. > > > Thank you for your time, > -Daniel Smith > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dave.hirschfeld at gmail.com Fri Jan 16 05:54:41 2015 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Fri, 16 Jan 2015 10:54:41 +0000 (UTC) Subject: [Numpy-discussion] Optimizing numpy's einsum expression (again) References: <0D323EDF-1B88-4638-AF78-67824D478FB8@icloud.com> Message-ID: Daniel Smith icloud.com> writes: > > Hello everyone,I originally brought an optimized einsum routine forward a few weeks back that attempts to contract numpy arrays together in an optimal way. This can greatly reduce the scaling and overall cost of the einsum expression for the cost of a few intermediate arrays. The current version (and more details) can be found here: > https://github.com/dgasmith/opt_einsum > > I think this routine is close to a finalized version, but there are two problems which I would like the community to weigh in on: > > Thank you for your time, > > > -Daniel Smith > I wasn't aware of this work, but it's very interesting to me as a user of einsum whose principal reason for doing so is speed. Even though I use it on largish arrays I'm only concerned with the performance as I'm on x64 with plenty of memory even were it to require temporaries 5x the original size. I don't use einsum that much because I've noticed the performance can be very problem dependant so I've always profiled it to check. Hopefully this work will make the performance more consistent, allowing it to be used more generally throughout my code. Thanks, Dave * An anecdotal example from a user only. From larsmans at gmail.com Fri Jan 16 06:33:32 2015 From: larsmans at gmail.com (Lars Buitinck) Date: Fri, 16 Jan 2015 12:33:32 +0100 Subject: [Numpy-discussion] Sorting refactor Message-ID: 2015-01-16 11:55 GMT+01:00 : > Message: 2 > Date: Thu, 15 Jan 2015 21:24:00 -0800 > From: Jaime Fern?ndez del R?o > Subject: [Numpy-discussion] Sorting refactor > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > This changes will make it easier for me to add a Timsort generic type > function to numpy's arsenal of sorting routines. And I think they have > value by cleaning the source code on their own. Yes, they do. I've been looking at the sorting functions as well and I've found the following: * The code is generally hard to read because it prefers pointer over indices. I'm wondering if it would get slower using indices. The closer these algorithms are to the textbook, the easier to insert fancy optimizations. * The heap sort exploits undefined behavior by using a pointer that points before the start of the array. However, rewriting it to always point within the array made it slower. I haven't tried rewriting it using indices. * Quicksort has a quadratic time worst case. I think it should be turned into an introsort [1] for O(n log n) worst case; we have the heapsort needed to do that. * Quicksort is robust to repeated elements, but doesn't exploit them. It can be made to run in linear time if the input array has only O(1) distinct elements [2]. This may come at the expense of some performance on arrays with no repeated elements. * Using optimal sorting networks instead of insertion sort as the base case can speed up quicksort on float arrays by 5-10%, but only if NaNs are moved out of the way first so that comparisons become cheaper [3]. This has consequences for the selection algorithms that I haven't fully worked out yet. 
* Using Cilk Plus to parallelize merge sort can make it significantly faster than quicksort at the expense of only a few lines of code, but I haven't checked whether Cilk Plus plays nicely with multiprocessing and other parallelism options (remember the trouble with OpenMP-ified OpenBLAS?). This isn't really an answer to your questions, more like a brain dump from someone who's stared at the same code for a while and did some experiments. I'm not saying we should implement all of this, but keep in mind that there are some interesting options besides implementing timsort. [1] https://en.wikipedia.org/wiki/Introsort [2] http://www.sorting-algorithms.com/static/QuicksortIsOptimal.pdf [3] https://github.com/larsmans/numpy/tree/sorting-nets From jtaylor.debian at googlemail.com Fri Jan 16 06:43:43 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 16 Jan 2015 12:43:43 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: <54B8F96F.7090903@googlemail.com> On 16.01.2015 12:33, Lars Buitinck wrote: > 2015-01-16 11:55 GMT+01:00 : >> Message: 2 >> Date: Thu, 15 Jan 2015 21:24:00 -0800 >> From: Jaime Fern?ndez del R?o >> Subject: [Numpy-discussion] Sorting refactor >> To: Discussion of Numerical Python >> Message-ID: >> >> Content-Type: text/plain; charset="utf-8" >> >> This changes will make it easier for me to add a Timsort generic type >> function to numpy's arsenal of sorting routines. And I think they have >> value by cleaning the source code on their own. > > Yes, they do. I've been looking at the sorting functions as well and > I've found the following: > > * The code is generally hard to read because it prefers pointer over > indices. I'm wondering if it would get slower using indices. The > closer these algorithms are to the textbook, the easier to insert > fancy optimizations. > > * The heap sort exploits undefined behavior by using a pointer that > points before the start of the array. However, rewriting it to always > point within the array made it slower. I haven't tried rewriting it > using indices. > > * Quicksort has a quadratic time worst case. I think it should be > turned into an introsort [1] for O(n log n) worst case; we have the > heapsort needed to do that. This probably rarely happens in numeric data, and we do have guaranteed nlog runtime algorithms available. But it also is not costly to do, e.g. the selection code is a introselect instead of a normal quickselect. I'd say not high priority, but if someone wants to do it I don't see why not. > > * Quicksort is robust to repeated elements, but doesn't exploit them. > It can be made to run in linear time if the input array has only O(1) > distinct elements [2]. This may come at the expense of some > performance on arrays with no repeated elements. > > * Using optimal sorting networks instead of insertion sort as the base > case can speed up quicksort on float arrays by 5-10%, but only if NaNs > are moved out of the way first so that comparisons become cheaper [3]. > This has consequences for the selection algorithms that I haven't > fully worked out yet. I was also thinking about this, an advantage of a sorting network is that it can be vectorized to be significantly faster than an insertion sort. Handling NaN's should also be directly possible. The issue is that its probably too much complicated code for only a very small gain. 
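To make the vectorization point concrete, here is a toy sketch in plain NumPy (not the C internals under discussion, and assuming NaNs have already been moved out of the way as described above) of the optimal 5-comparator network for groups of 4 keys, applied to many groups at once with elementwise min/max:

import numpy as np

def sort4_network(a):
    # Sort each row of an (n, 4) array with the 5-comparator network
    # for 4 keys; each compare-exchange acts on every row at once.
    a = np.array(a, dtype=np.double, copy=True)
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
        lo = np.minimum(a[:, i], a[:, j])
        hi = np.maximum(a[:, i], a[:, j])
        a[:, i], a[:, j] = lo, hi
    return a

# sort4_network([[4., 3., 2., 1.], [2., 4., 1., 3.]]) sorts both rows in
# one pass; the same branchless min/max pattern is what a SIMD C version
# could use for the small base cases of quicksort.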
> > * Using Cilk Plus to parallelize merge sort can make it significantly > faster than quicksort at the expense of only a few lines of code, but > I haven't checked whether Cilk Plus plays nicely with multiprocessing > and other parallelism options (remember the trouble with OpenMP-ified > OpenBLAS?). you should also be able to do this with openmp tasks, though it will be a little less efficient as cilk+ has a better scheduler for this type of work. But I assume you will get the same trouble as openmp but that needs testing, also cilk+ in gcc is not really production ready yet, I got lots of crashes when I last tried it (it will be in 5.0 though). > > This isn't really an answer to your questions, more like a brain dump > from someone who's stared at the same code for a while and did some > experiments. I'm not saying we should implement all of this, but keep > in mind that there are some interesting options besides implementing > timsort. > > [1] https://en.wikipedia.org/wiki/Introsort > [2] http://www.sorting-algorithms.com/static/QuicksortIsOptimal.pdf > [3] https://github.com/larsmans/numpy/tree/sorting-nets > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From hoogendoorn.eelco at gmail.com Fri Jan 16 07:15:46 2015 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 16 Jan 2015 13:15:46 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: <54B8F96F.7090903@googlemail.com> References: <54B8F96F.7090903@googlemail.com> Message-ID: I don't know if there is a general consensus or guideline on these matters, but I am personally not entirely charmed by the use of behind-the-scenes parallelism, unless explicitly requested. Perhaps an algorithm can be made faster, but often these multicore algorithms are also less efficient, and a less data-dependent way of putting my cores to good use would have been preferable. Potentially, other code could slow down due to cache trashing if too many parallel tasks run in parallel. Id rather be in charge of such matters myself; but I imagine adding a keyword arg for these matters would not be much trouble? On Fri, Jan 16, 2015 at 12:43 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 16.01.2015 12:33, Lars Buitinck wrote: > > 2015-01-16 11:55 GMT+01:00 : > >> Message: 2 > >> Date: Thu, 15 Jan 2015 21:24:00 -0800 > >> From: Jaime Fern?ndez del R?o > >> Subject: [Numpy-discussion] Sorting refactor > >> To: Discussion of Numerical Python > >> Message-ID: > >> < > CAPOWHWkF6RnWcrGmcwsmq_LO3hShjgBVLsrN19z-MDPe25E2Aw at mail.gmail.com> > >> Content-Type: text/plain; charset="utf-8" > >> > >> This changes will make it easier for me to add a Timsort generic type > >> function to numpy's arsenal of sorting routines. And I think they have > >> value by cleaning the source code on their own. > > > > Yes, they do. I've been looking at the sorting functions as well and > > I've found the following: > > > > * The code is generally hard to read because it prefers pointer over > > indices. I'm wondering if it would get slower using indices. The > > closer these algorithms are to the textbook, the easier to insert > > fancy optimizations. > > > > * The heap sort exploits undefined behavior by using a pointer that > > points before the start of the array. However, rewriting it to always > > point within the array made it slower. I haven't tried rewriting it > > using indices. 
> > > > * Quicksort has a quadratic time worst case. I think it should be > > turned into an introsort [1] for O(n log n) worst case; we have the > > heapsort needed to do that. > > This probably rarely happens in numeric data, and we do have guaranteed > nlog runtime algorithms available. > But it also is not costly to do, e.g. the selection code is a > introselect instead of a normal quickselect. > I'd say not high priority, but if someone wants to do it I don't see why > not. > > > > > * Quicksort is robust to repeated elements, but doesn't exploit them. > > It can be made to run in linear time if the input array has only O(1) > > distinct elements [2]. This may come at the expense of some > > performance on arrays with no repeated elements. > > > > * Using optimal sorting networks instead of insertion sort as the base > > case can speed up quicksort on float arrays by 5-10%, but only if NaNs > > are moved out of the way first so that comparisons become cheaper [3]. > > This has consequences for the selection algorithms that I haven't > > fully worked out yet. > > I was also thinking about this, an advantage of a sorting network is > that it can be vectorized to be significantly faster than an insertion > sort. Handling NaN's should also be directly possible. > The issue is that its probably too much complicated code for only a very > small gain. > > > > > * Using Cilk Plus to parallelize merge sort can make it significantly > > faster than quicksort at the expense of only a few lines of code, but > > I haven't checked whether Cilk Plus plays nicely with multiprocessing > > and other parallelism options (remember the trouble with OpenMP-ified > > OpenBLAS?). > > you should also be able to do this with openmp tasks, though it will be > a little less efficient as cilk+ has a better scheduler for this type of > work. > But I assume you will get the same trouble as openmp but that needs > testing, also cilk+ in gcc is not really production ready yet, I got > lots of crashes when I last tried it (it will be in 5.0 though). > > > > > > This isn't really an answer to your questions, more like a brain dump > > from someone who's stared at the same code for a while and did some > > experiments. I'm not saying we should implement all of this, but keep > > in mind that there are some interesting options besides implementing > > timsort. > > > > [1] https://en.wikipedia.org/wiki/Introsort > > [2] http://www.sorting-algorithms.com/static/QuicksortIsOptimal.pdf > > [3] https://github.com/larsmans/numpy/tree/sorting-nets > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jan 16 07:19:25 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 16 Jan 2015 12:19:25 +0000 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: Hi, On Fri, Jan 16, 2015 at 5:24 AM, Jaime Fern?ndez del R?o wrote: > Hi all, > > I have been taking a deep look at the sorting functionality in numpy, and I > think it could use a face lift in the form of a big code refactor, to get > rid of some of the ugliness in the code and make it easier to maintain. 
What > I have in mind basically amounts to: > > Refactor _new_argsortlike to get rid of code duplication (there are two > branches, one with buffering, one without, virtually identical, that could > be merged into a single one). > Modify _new_argsortlike so that it can properly handle byte-swapped inputs > of any dtype, see gh-5441. Add proper handling of types with references, in > preparation for the rest of changes. > Add three functions to the npy_sort library: npy_aquicksort, npy_aheapsort, > npy_amergesort, with a signature compatible with PyArray_ArgSortFunc , i.e. > (char* start, npy_intp* result, npy_intp length, PyArrayObject *arr). These > turn out to be almost identical to the string and unicode sort functions, > but using the dtype's compare function to handle comparisons. > Modify PyArray_ArgSort (and PyArray_ArgPartition) to always call > _new_argsortlike, even when there is no type specific argsort function, by > using the newly added npy_axxx functions. This simplifies PyArray_ArgSort a > lot, and gets rid of some of the global variable ugliness in the current > code. And makes argsorting over non-contiguous axis more memory efficient. > Refactor _new_sortlike similarly to _new_argsortlike > Modify the npy_quicksort, npy_mergesort and npy_heapsort functions in > npy_sort to have a signature compatible with PyArray_SortFunc, i.e. (char* > start, npy_intp length, PyArrayObject *arr). npy_quicksort will no longer > rely on libc's qsort, but be very similar to the string or unicode quicksort > functions > Modify PyArray_Sort (and PyArray_Partition) to always call _new_sortlike, > similarly to what was done with PyArray_ArgSort. This allows completing the > removal of the remaining global variable ugliness, as well as similar > benefits as for argsort before. > > This changes will make it easier for me to add a Timsort generic type > function to numpy's arsenal of sorting routines. And I think they have value > by cleaning the source code on their own. So my questions, mostly to the > poor souls that will have to code review changes to several hundred lines > of code: > > Does this make sense, or is it better left alone? A subset of 1, 2 and 5 are > a must to address the issues in gh-5441, the rest could arguably be left as > is. > Would you rather see it submitted as one ginormous PR? Or split into 4 or 5 > incremental ones? Do you think it would be possible to split this into several PRs, with the initial one being the refactoring, and the subsequent ones being additions to sorting functionality? I'm guessing that the refactoring is something everyone wants (sounds great to me), whereas changes to the sorting needs more specific discussion. Cheers, Matthew From davidmenhur at gmail.com Fri Jan 16 07:28:56 2015 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Fri, 16 Jan 2015 13:28:56 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: <54B8F96F.7090903@googlemail.com> Message-ID: On 16 January 2015 at 13:15, Eelco Hoogendoorn wrote: > Perhaps an algorithm can be made faster, but often these multicore > algorithms are also less efficient, and a less data-dependent way of putting > my cores to good use would have been preferable. Potentially, other code > could slow down due to cache trashing if too many parallel tasks run in > parallel. Id rather be in charge of such matters myself; but I imagine > adding a keyword arg for these matters would not be much trouble? As I understand it, that is where the strength of Cilk+ lies. 
It does not force parallelisation, just suggests it. The decision to actually spawn parallel is decided at runtime depending on the load of the other cores. /David. From jaime.frio at gmail.com Fri Jan 16 08:29:18 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 16 Jan 2015 05:29:18 -0800 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: On Fri, Jan 16, 2015 at 4:19 AM, Matthew Brett wrote: > Hi, > > On Fri, Jan 16, 2015 at 5:24 AM, Jaime Fern?ndez del R?o > wrote: > > Hi all, > > > > I have been taking a deep look at the sorting functionality in numpy, > and I > > think it could use a face lift in the form of a big code refactor, to get > > rid of some of the ugliness in the code and make it easier to maintain. > What > > I have in mind basically amounts to: > > > > Refactor _new_argsortlike to get rid of code duplication (there are two > > branches, one with buffering, one without, virtually identical, that > could > > be merged into a single one). > > Modify _new_argsortlike so that it can properly handle byte-swapped > inputs > > of any dtype, see gh-5441. Add proper handling of types with references, > in > > preparation for the rest of changes. > > Add three functions to the npy_sort library: npy_aquicksort, > npy_aheapsort, > > npy_amergesort, with a signature compatible with PyArray_ArgSortFunc , > i.e. > > (char* start, npy_intp* result, npy_intp length, PyArrayObject *arr). > These > > turn out to be almost identical to the string and unicode sort functions, > > but using the dtype's compare function to handle comparisons. > > Modify PyArray_ArgSort (and PyArray_ArgPartition) to always call > > _new_argsortlike, even when there is no type specific argsort function, > by > > using the newly added npy_axxx functions. This simplifies > PyArray_ArgSort a > > lot, and gets rid of some of the global variable ugliness in the current > > code. And makes argsorting over non-contiguous axis more memory > efficient. > > Refactor _new_sortlike similarly to _new_argsortlike > > Modify the npy_quicksort, npy_mergesort and npy_heapsort functions in > > npy_sort to have a signature compatible with PyArray_SortFunc, i.e. > (char* > > start, npy_intp length, PyArrayObject *arr). npy_quicksort will no longer > > rely on libc's qsort, but be very similar to the string or unicode > quicksort > > functions > > Modify PyArray_Sort (and PyArray_Partition) to always call _new_sortlike, > > similarly to what was done with PyArray_ArgSort. This allows completing > the > > removal of the remaining global variable ugliness, as well as similar > > benefits as for argsort before. > > > > This changes will make it easier for me to add a Timsort generic type > > function to numpy's arsenal of sorting routines. And I think they have > value > > by cleaning the source code on their own. So my questions, mostly to the > > poor souls that will have to code review changes to several hundred > lines > > of code: > > > > Does this make sense, or is it better left alone? A subset of 1, 2 and 5 > are > > a must to address the issues in gh-5441, the rest could arguably be left > as > > is. > > Would you rather see it submitted as one ginormous PR? Or split into 4 > or 5 > > incremental ones? > > Do you think it would be possible to split this into several PRs, with > the initial one being the refactoring, and the subsequent ones being > additions to sorting functionality? 
> Just to be clear, nothing in the long list of changes I posted earlier is truly a change in the sorting functionality, except in the case of quicksort (and argquicksort) for generic types, which would no longer rely on qsort, but on our own implementation of it, just like every other sort in numpy. So yes, the refactor PR can precede new functionality. And it can even be split into 3 or 4 incremental PRs, to make the reviewers life easier. Since it is most likely Charles and/or Julian that are going to have to swallow the review pill, I'd like to hear from them how would they like it better. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 16 09:11:29 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 16 Jan 2015 06:11:29 -0800 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: On Fri, Jan 16, 2015 at 3:33 AM, Lars Buitinck wrote: > 2015-01-16 11:55 GMT+01:00 : > > Message: 2 > > Date: Thu, 15 Jan 2015 21:24:00 -0800 > > From: Jaime Fern?ndez del R?o > > Subject: [Numpy-discussion] Sorting refactor > > To: Discussion of Numerical Python > > Message-ID: > > < > CAPOWHWkF6RnWcrGmcwsmq_LO3hShjgBVLsrN19z-MDPe25E2Aw at mail.gmail.com> > > Content-Type: text/plain; charset="utf-8" > > > > This changes will make it easier for me to add a Timsort generic type > > function to numpy's arsenal of sorting routines. And I think they have > > value by cleaning the source code on their own. > > Yes, they do. I've been looking at the sorting functions as well and > I've found the following: > > * The code is generally hard to read because it prefers pointer over > indices. I'm wondering if it would get slower using indices. The > closer these algorithms are to the textbook, the easier to insert > fancy optimizations. > They are harder to read, but so cute to look at! C code just wouldn't feel the same without some magical pointer arithmetic thrown in here and there. ;-) > * The heap sort exploits undefined behavior by using a pointer that > points before the start of the array. However, rewriting it to always > point within the array made it slower. I haven't tried rewriting it > using indices. > > * Quicksort has a quadratic time worst case. I think it should be > turned into an introsort [1] for O(n log n) worst case; we have the > heapsort needed to do that. > > * Quicksort is robust to repeated elements, but doesn't exploit them. > It can be made to run in linear time if the input array has only O(1) > distinct elements [2]. This may come at the expense of some > performance on arrays with no repeated elements. > Java famously changed its library implementation of quicksort to a dual pivot one invented by Vladimir Yaroslavskiy[1], they claim that with substantial performance gains. I tried to implement that for numpy [2], but couldn't get it to work any faster than the current code. * Using optimal sorting networks instead of insertion sort as the base > case can speed up quicksort on float arrays by 5-10%, but only if NaNs > are moved out of the way first so that comparisons become cheaper [3]. > This has consequences for the selection algorithms that I haven't > fully worked out yet. 
> Even if we stick with selection sort, we should spin it off into an inline smallsort function within the npy_sort library, and have quicksort and mergesort call the same function, instead of each implementing their own. It would make optimizations like the sorting networks easier to implement for all sorts. We could even expose it outside npy_sort, as there are a few places around the code base that have ad-hoc implementations of sorting. > * Using Cilk Plus to parallelize merge sort can make it significantly > faster than quicksort at the expense of only a few lines of code, but > I haven't checked whether Cilk Plus plays nicely with multiprocessing > and other parallelism options (remember the trouble with OpenMP-ified > OpenBLAS?). > > This isn't really an answer to your questions, more like a brain dump > from someone who's stared at the same code for a while and did some > experiments. I'm not saying we should implement all of this, but keep > in mind that there are some interesting options besides implementing > timsort. > Timsort came up in a discussion several months ago, where I proposed adding a mergesorted function (which I have mostly ready, by the way, [3]) to speed-up some operations in arraysetops. I have serious doubts that it will perform comparably to the other sorts unless comparisons are terribly expensive, which they typically aren't in numpy, but it has been an interesting learning exercise so far, and I don't mind taking it all the way. Most of my proposed original changes do not affect the core sorting functionality, just the infrastructure around it. But if we agree that sorting has potential for being an actively developed part of the code base, then cleaning up its surroundings for clarity makes sense, so I'm taking your brain dump as an aye for my proposal. ;-) Jaime [1] http://iaroslavski.narod.ru/quicksort/DualPivotQuicksort.pdf [2] https://github.com/jaimefrio/numpy/commit/a99dd77cda7c4c0f2df1fb17a59c20e19999cd86 [3] https://github.com/jaimefrio/numpy/commit/2f53c99e7ec6d14fd77a29f9d3c1712d5b955079 > > [1] https://en.wikipedia.org/wiki/Introsort > [2] http://www.sorting-algorithms.com/static/QuicksortIsOptimal.pdf > [3] https://github.com/larsmans/numpy/tree/sorting-nets > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From larsmans at gmail.com Fri Jan 16 09:14:24 2015 From: larsmans at gmail.com (Lars Buitinck) Date: Fri, 16 Jan 2015 15:14:24 +0100 Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 100, Issue 28 Message-ID: 2015-01-16 13:29 GMT+01:00 : > Date: Fri, 16 Jan 2015 12:43:43 +0100 > From: Julian Taylor > Subject: Re: [Numpy-discussion] Sorting refactor > To: Discussion of Numerical Python > Message-ID: <54B8F96F.7090903 at googlemail.com> > Content-Type: text/plain; charset=windows-1252 > > On 16.01.2015 12:33, Lars Buitinck wrote: >> * Quicksort has a quadratic time worst case. I think it should be >> turned into an introsort [1] for O(n log n) worst case; we have the >> heapsort needed to do that. > > This probably rarely happens in numeric data, and we do have guaranteed > nlog runtime algorithms available. 
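To make that concrete, here is a hypothetical pure-Python sketch of the kind of knob and consumer I mean -- setnumthreads() and parallel_sort() are invented names, none of this exists in numpy, and it assumes np.sort releases the GIL for the dtype at hand:

    import heapq
    import multiprocessing
    from concurrent.futures import ThreadPoolExecutor
    import numpy as np

    _num_threads = 1                      # process-global setting, off by default

    def setnumthreads(n=None):
        # n=None could mean "use all available cores"
        global _num_threads
        _num_threads = multiprocessing.cpu_count() if n is None else int(n)

    def parallel_sort(a):
        # 1-d arrays of simple numeric dtypes only; small inputs stay sequential
        if _num_threads <= 1 or a.size < 10000:
            return np.sort(a)
        chunks = np.array_split(a, _num_threads)
        with ThreadPoolExecutor(_num_threads) as pool:
            sorted_chunks = list(pool.map(np.sort, chunks))
        # k-way merge of the already-sorted chunks
        return np.fromiter(heapq.merge(*sorted_chunks), dtype=a.dtype, count=a.size)

The point is only that the library consults one per-process setting instead of every call site growing its own nthreads keyword.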
It's no more likely or unlikely than in any other type of data (AFAIK), but it's possible for an adversary to DOS any (web server) code that uses np.sort. > I was also thinking about this, an advantage of a sorting network is > that it can be vectorized to be significantly faster than an insertion > sort. Handling NaN's should also be directly possible. Tried that, and it didn't give any speedup, at least not without explicit vector instructions. Just moving the NaNs aside didn't cost anything in my preliminary benchmarks (without sorting nets), the cost of the operation was almost exactly compensated by simpler comparisons. > The issue is that its probably too much complicated code for only a very > small gain. Maybe. The thing is that the selection algorithms are optimized for NaNs and seem to depend on the current comparison code. We'd need distinct _LT and _LT_NONAN for each . The sorting nets themselves aren't complicated, just lengthy. My branch has the length-optimal (not optimally parallel) ones for n <= 16. > But I assume you will get the same trouble as openmp but that needs > testing, also cilk+ in gcc is not really production ready yet, I got > lots of crashes when I last tried it (it will be in 5.0 though). The data parallel constructs tend to crash the compiler, but task spawning seems to be stable in 4.9.2. I've still to see how it handles multiprocessing/fork. What do you mean by will be in 5.0, did they do a big push? > Date: Fri, 16 Jan 2015 13:28:56 +0100 > From: Da?id > Subject: Re: [Numpy-discussion] Sorting refactor > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset=UTF-8 > > On 16 January 2015 at 13:15, Eelco Hoogendoorn > wrote: >> Perhaps an algorithm can be made faster, but often these multicore >> algorithms are also less efficient, and a less data-dependent way of putting >> my cores to good use would have been preferable. Potentially, other code >> could slow down due to cache trashing if too many parallel tasks run in >> parallel. Id rather be in charge of such matters myself; but I imagine >> adding a keyword arg for these matters would not be much trouble? > > As I understand it, that is where the strength of Cilk+ lies. It does > not force parallelisation, just suggests it. The decision to actually > spawn parallel is decided at runtime depending on the load of the > other cores. cilk+ guarantees that the amount of space used by a pool of P threads is at most P times the stack space used by the sequential version (+ a constant). The idea is that you can say for (i = 0; i < 1000000; i++) { cilk_spawn f(a[i]); } and it will never create more than P work items in memory, rather than 1e6, even if each f() spawns a bunch itself. Of course, it won't guarantee that OpenMP will not also spawn P threads and/or check that you're one of P processes cooperating on a task using multiprocessing. Personally I'd rather have an np.setnumthreads() to turn this on or off for a process and have the runtime distribute work for me instead of having to do it myself. 
From larsmans at gmail.com Fri Jan 16 09:23:03 2015 From: larsmans at gmail.com (Lars Buitinck) Date: Fri, 16 Jan 2015 15:23:03 +0100 Subject: [Numpy-discussion] Sorting refactor Message-ID: 2015-01-16 15:14 GMT+01:00 : > Date: Fri, 16 Jan 2015 06:11:29 -0800 > From: Jaime Fern?ndez del R?o > Subject: Re: [Numpy-discussion] Sorting refactor > To: Discussion of Numerical Python > > Most of my proposed original changes do not affect the core sorting > functionality, just the infrastructure around it. But if we agree that > sorting has potential for being an actively developed part of the code > base, then cleaning up its surroundings for clarity makes sense, so I'm > taking your brain dump as an aye for my proposal. ;-) It is! From jtaylor.debian at googlemail.com Fri Jan 16 10:11:25 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 16 Jan 2015 16:11:25 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: <54B92A1D.8030808@googlemail.com> On 01/16/2015 03:14 PM, Lars Buitinck wrote: > 2015-01-16 13:29 GMT+01:00 : >> Date: Fri, 16 Jan 2015 12:43:43 +0100 >> From: Julian Taylor >> Subject: Re: [Numpy-discussion] Sorting refactor >> To: Discussion of Numerical Python >> Message-ID: <54B8F96F.7090903 at googlemail.com> >> Content-Type: text/plain; charset=windows-1252 >> >> On 16.01.2015 12:33, Lars Buitinck wrote: >>> * Quicksort has a quadratic time worst case. I think it should be >>> turned into an introsort [1] for O(n log n) worst case; we have the >>> heapsort needed to do that. >> >> This probably rarely happens in numeric data, and we do have guaranteed >> nlog runtime algorithms available. > > It's no more likely or unlikely than in any other type of data > (AFAIK), but it's possible for an adversary to DOS any (web server) > code that uses np.sort. if you are using numpy where an arbitrary user is allowed to control the data passed to a non isolated environment you have a problem anyway. numpy is far from secure software and there are likely hundreds of ways to produce DOS and dozens of ways to do code execution in any nontrivial numpy using application. > >> I was also thinking about this, an advantage of a sorting network is >> that it can be vectorized to be significantly faster than an insertion >> sort. Handling NaN's should also be directly possible. > > Tried that, and it didn't give any speedup, at least not without > explicit vector instructions. > > Just moving the NaNs aside didn't cost anything in my preliminary > benchmarks (without sorting nets), the cost of the operation was > almost exactly compensated by simpler comparisons. an SSE2 implementation a 16 entry bitonic sort is available here: https://github.com/mischasan/sse2/blob/master/ssesort.c there is also a benchmark, on my machine its 6 times faster than insertion sort. But again this would only gain us 5-10% improvement at best as the partition part of quicksort is still the major time consuming part. > >> The issue is that its probably too much complicated code for only a very >> small gain. > > Maybe. The thing is that the selection algorithms are optimized for > NaNs and seem to depend on the current comparison code. We'd need > distinct _LT and _LT_NONAN for each . > > The sorting nets themselves aren't complicated, just lengthy. My > branch has the length-optimal (not optimally parallel) ones for n <= > 16. 
> >> But I assume you will get the same trouble as openmp but that needs >> testing, also cilk+ in gcc is not really production ready yet, I got >> lots of crashes when I last tried it (it will be in 5.0 though). > > The data parallel constructs tend to crash the compiler, but task > spawning seems to be stable in 4.9.2. I've still to see how it handles > multiprocessing/fork. > > What do you mean by will be in 5.0, did they do a big push? gcc 5.0 changelog reports "full support for cilk plus". Also all bugs I have filed have been fixed in 5.0. > > >> Date: Fri, 16 Jan 2015 13:28:56 +0100 >> From: Da?id >> Subject: Re: [Numpy-discussion] Sorting refactor >> To: Discussion of Numerical Python >> Message-ID: >> >> Content-Type: text/plain; charset=UTF-8 >> >> On 16 January 2015 at 13:15, Eelco Hoogendoorn >> wrote: >>> Perhaps an algorithm can be made faster, but often these multicore >>> algorithms are also less efficient, and a less data-dependent way of putting >>> my cores to good use would have been preferable. Potentially, other code >>> could slow down due to cache trashing if too many parallel tasks run in >>> parallel. Id rather be in charge of such matters myself; but I imagine >>> adding a keyword arg for these matters would not be much trouble? >> >> As I understand it, that is where the strength of Cilk+ lies. It does >> not force parallelisation, just suggests it. The decision to actually >> spawn parallel is decided at runtime depending on the load of the >> other cores. > > cilk+ guarantees that the amount of space used by a pool of P threads > is at most P times the stack space used by the sequential version (+ a > constant). The idea is that you can say > > for (i = 0; i < 1000000; i++) { > cilk_spawn f(a[i]); > } > > and it will never create more than P work items in memory, rather than > 1e6, even if each f() spawns a bunch itself. Of course, it won't > guarantee that OpenMP will not also spawn P threads and/or check that > you're one of P processes cooperating on a task using multiprocessing. > > Personally I'd rather have an np.setnumthreads() to turn this on or > off for a process and have the runtime distribute work for me instead > of having to do it myself. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From hoogendoorn.eelco at gmail.com Fri Jan 16 11:14:24 2015 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 16 Jan 2015 17:14:24 +0100 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: <54B92A1D.8030808@googlemail.com> References: <54B92A1D.8030808@googlemail.com> Message-ID: I agree; an np.setnumthreads to manage a numpy-global threadpool makes sense to me. 
Of course there are a great many cases where just spawning as many threads as cores is a sensible default, but if this kind of behavior could not be overridden I could see that greatly reduce performance for some of my more complex projects On Fri, Jan 16, 2015 at 4:11 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 01/16/2015 03:14 PM, Lars Buitinck wrote: > > 2015-01-16 13:29 GMT+01:00 : > >> Date: Fri, 16 Jan 2015 12:43:43 +0100 > >> From: Julian Taylor > >> Subject: Re: [Numpy-discussion] Sorting refactor > >> To: Discussion of Numerical Python > >> Message-ID: <54B8F96F.7090903 at googlemail.com> > >> Content-Type: text/plain; charset=windows-1252 > >> > >> On 16.01.2015 12:33, Lars Buitinck wrote: > >>> * Quicksort has a quadratic time worst case. I think it should be > >>> turned into an introsort [1] for O(n log n) worst case; we have the > >>> heapsort needed to do that. > >> > >> This probably rarely happens in numeric data, and we do have guaranteed > >> nlog runtime algorithms available. > > > > It's no more likely or unlikely than in any other type of data > > (AFAIK), but it's possible for an adversary to DOS any (web server) > > code that uses np.sort. > > if you are using numpy where an arbitrary user is allowed to control the > data passed to a non isolated environment you have a problem anyway. > numpy is far from secure software and there are likely hundreds of ways > to produce DOS and dozens of ways to do code execution in any nontrivial > numpy using application. > > > > >> I was also thinking about this, an advantage of a sorting network is > >> that it can be vectorized to be significantly faster than an insertion > >> sort. Handling NaN's should also be directly possible. > > > > Tried that, and it didn't give any speedup, at least not without > > explicit vector instructions. > > > > Just moving the NaNs aside didn't cost anything in my preliminary > > benchmarks (without sorting nets), the cost of the operation was > > almost exactly compensated by simpler comparisons. > > an SSE2 implementation a 16 entry bitonic sort is available here: > https://github.com/mischasan/sse2/blob/master/ssesort.c > there is also a benchmark, on my machine its 6 times faster than > insertion sort. > But again this would only gain us 5-10% improvement at best as the > partition part of quicksort is still the major time consuming part. > > > > >> The issue is that its probably too much complicated code for only a very > >> small gain. > > > > Maybe. The thing is that the selection algorithms are optimized for > > NaNs and seem to depend on the current comparison code. We'd need > > distinct _LT and _LT_NONAN for each . > > > > The sorting nets themselves aren't complicated, just lengthy. My > > branch has the length-optimal (not optimally parallel) ones for n <= > > 16. > > > >> But I assume you will get the same trouble as openmp but that needs > >> testing, also cilk+ in gcc is not really production ready yet, I got > >> lots of crashes when I last tried it (it will be in 5.0 though). > > > > The data parallel constructs tend to crash the compiler, but task > > spawning seems to be stable in 4.9.2. I've still to see how it handles > > multiprocessing/fork. > > > > What do you mean by will be in 5.0, did they do a big push? > > gcc 5.0 changelog reports "full support for cilk plus". > Also all bugs I have filed have been fixed in 5.0. 
> > > > > > >> Date: Fri, 16 Jan 2015 13:28:56 +0100 > >> From: Da?id > >> Subject: Re: [Numpy-discussion] Sorting refactor > >> To: Discussion of Numerical Python > >> Message-ID: > >> iuNQrEiaSDY23DNW6w at mail.gmail.com> > >> Content-Type: text/plain; charset=UTF-8 > >> > >> On 16 January 2015 at 13:15, Eelco Hoogendoorn > >> wrote: > >>> Perhaps an algorithm can be made faster, but often these multicore > >>> algorithms are also less efficient, and a less data-dependent way of > putting > >>> my cores to good use would have been preferable. Potentially, other > code > >>> could slow down due to cache trashing if too many parallel tasks run in > >>> parallel. Id rather be in charge of such matters myself; but I imagine > >>> adding a keyword arg for these matters would not be much trouble? > >> > >> As I understand it, that is where the strength of Cilk+ lies. It does > >> not force parallelisation, just suggests it. The decision to actually > >> spawn parallel is decided at runtime depending on the load of the > >> other cores. > > > > cilk+ guarantees that the amount of space used by a pool of P threads > > is at most P times the stack space used by the sequential version (+ a > > constant). The idea is that you can say > > > > for (i = 0; i < 1000000; i++) { > > cilk_spawn f(a[i]); > > } > > > > and it will never create more than P work items in memory, rather than > > 1e6, even if each f() spawns a bunch itself. Of course, it won't > > guarantee that OpenMP will not also spawn P threads and/or check that > > you're one of P processes cooperating on a task using multiprocessing. > > > > Personally I'd rather have an np.setnumthreads() to turn this on or > > off for a process and have the runtime distribute work for me instead > > of having to do it myself. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 16 11:15:29 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 16 Jan 2015 09:15:29 -0700 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: On Fri, Jan 16, 2015 at 7:11 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Fri, Jan 16, 2015 at 3:33 AM, Lars Buitinck wrote: > >> 2015-01-16 11:55 GMT+01:00 : >> > Message: 2 >> > Date: Thu, 15 Jan 2015 21:24:00 -0800 >> > From: Jaime Fern?ndez del R?o >> > Subject: [Numpy-discussion] Sorting refactor >> > To: Discussion of Numerical Python >> > Message-ID: >> > < >> CAPOWHWkF6RnWcrGmcwsmq_LO3hShjgBVLsrN19z-MDPe25E2Aw at mail.gmail.com> >> > Content-Type: text/plain; charset="utf-8" >> > >> > This changes will make it easier for me to add a Timsort generic type >> > function to numpy's arsenal of sorting routines. And I think they have >> > value by cleaning the source code on their own. >> >> Yes, they do. I've been looking at the sorting functions as well and >> I've found the following: >> >> * The code is generally hard to read because it prefers pointer over >> indices. I'm wondering if it would get slower using indices. The >> closer these algorithms are to the textbook, the easier to insert >> fancy optimizations. 
>> > > They are harder to read, but so cute to look at! C code just wouldn't feel > the same without some magical pointer arithmetic thrown in here and there. > ;-) > Pointers were faster than indexing. That advantage can be hardware dependent, but for small numbers of pointers is typical. > > >> * The heap sort exploits undefined behavior by using a pointer that >> points before the start of the array. However, rewriting it to always >> point within the array made it slower. I haven't tried rewriting it >> using indices > > Fortran uses the same pointer trick for one based indexing, or at least the old DEC compilers did ;) There is no reason to avoid it. > . >> >> * Quicksort has a quadratic time worst case. I think it should be >> turned into an introsort [1] for O(n log n) worst case; we have the >> heapsort needed to do that. >> >> * Quicksort is robust to repeated elements, but doesn't exploit them. >> It can be made to run in linear time if the input array has only O(1) >> distinct elements [2]. This may come at the expense of some >> performance on arrays with no repeated elements. >> > > Java famously changed its library implementation of quicksort to a dual > pivot one invented by Vladimir Yaroslavskiy[1], they claim that with > substantial performance gains. I tried to implement that for numpy [2], but > couldn't get it to work any faster than the current code. > For sorting, simple often beats fancy. > > * Using optimal sorting networks instead of insertion sort as the base >> case can speed up quicksort on float arrays by 5-10%, but only if NaNs >> are moved out of the way first so that comparisons become cheaper [3]. >> This has consequences for the selection algorithms that I haven't >> fully worked out yet. >> > > I expect the gains here would be for small sorts, which tend to be dominated by call overhead. > Even if we stick with selection sort, we should spin it off into an inline > smallsort function within the npy_sort library, and have quicksort and > mergesort call the same function, instead of each implementing their own. > It would make optimizations like the sorting networks easier to implement > for all sorts. We could even expose it outside npy_sort, as there are a few > places around the code base that have ad-hoc implementations of sorting. > Good idea, I've thought of doing it myself. > >> * Using Cilk Plus to parallelize merge sort can make it significantly >> faster than quicksort at the expense of only a few lines of code, but >> I haven't checked whether Cilk Plus plays nicely with multiprocessing >> and other parallelism options (remember the trouble with OpenMP-ified >> OpenBLAS?). >> >> This isn't really an answer to your questions, more like a brain dump >> from someone who's stared at the same code for a while and did some >> experiments. I'm not saying we should implement all of this, but keep >> in mind that there are some interesting options besides implementing >> timsort. >> > > Timsort came up in a discussion several months ago, where I proposed > adding a mergesorted function (which I have mostly ready, by the way, [3]) > to speed-up some operations in arraysetops. I have serious doubts that it > will perform comparably to the other sorts unless comparisons are terribly > expensive, which they typically aren't in numpy, but it has been an > interesting learning exercise so far, and I don't mind taking it all the > way. > > Most of my proposed original changes do not affect the core sorting > functionality, just the infrastructure around it. 
But if we agree that > sorting has potential for being an actively developed part of the code > base, then cleaning up its surroundings for clarity makes sense, so I'm > taking your brain dump as an aye for my proposal. ;-) > I have a generic quicksort with standard interface sitting around somewhere in an ancient branch. Sorting objects needs to be sensitive to comparison exceptions, which is something to keep in mind. I'd also like to push the GIL release back down into the interface functions where it used to be, but that isn't a priority. Another other possibility I've toyed with is adding a step for sorting non-contiguous arrays, but the sort functions being part of the dtype complicates that for compatibility reasons. I suppose that could be handled with interface functions. I think the prototypes should also be regularized. Cleaning up the sorting dispatch to use just one function and avoid the global would be good, the current code is excessively ugly. That cleanup, together with a generic quicksort, would be a good place to start. And remember, simpler is better. Usually. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 16 19:11:55 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 16 Jan 2015 16:11:55 -0800 Subject: [Numpy-discussion] Sorting refactor In-Reply-To: References: Message-ID: On Fri, Jan 16, 2015 at 8:15 AM, Charles R Harris wrote: > > > On Fri, Jan 16, 2015 at 7:11 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Fri, Jan 16, 2015 at 3:33 AM, Lars Buitinck >> wrote: >> >>> 2015-01-16 11:55 GMT+01:00 : >>> > Message: 2 >>> > Date: Thu, 15 Jan 2015 21:24:00 -0800 >>> > From: Jaime Fern?ndez del R?o >>> > Subject: [Numpy-discussion] Sorting refactor >>> > To: Discussion of Numerical Python >>> > Message-ID: >>> > < >>> CAPOWHWkF6RnWcrGmcwsmq_LO3hShjgBVLsrN19z-MDPe25E2Aw at mail.gmail.com> >>> > Content-Type: text/plain; charset="utf-8" >>> > >>> > This changes will make it easier for me to add a Timsort generic type >>> > function to numpy's arsenal of sorting routines. And I think they have >>> > value by cleaning the source code on their own. >>> >>> Yes, they do. I've been looking at the sorting functions as well and >>> I've found the following: >>> >>> * The code is generally hard to read because it prefers pointer over >>> indices. I'm wondering if it would get slower using indices. The >>> closer these algorithms are to the textbook, the easier to insert >>> fancy optimizations. >>> >> >> They are harder to read, but so cute to look at! C code just wouldn't >> feel the same without some magical pointer arithmetic thrown in here and >> there. ;-) >> > > Pointers were faster than indexing. That advantage can be hardware > dependent, but for small numbers of pointers is typical. > > >> >> >>> * The heap sort exploits undefined behavior by using a pointer that >>> points before the start of the array. However, rewriting it to always >>> point within the array made it slower. I haven't tried rewriting it >>> using indices >> >> > Fortran uses the same pointer trick for one based indexing, or at least > the old DEC compilers did ;) There is no reason to avoid it. > > >> . >>> >>> * Quicksort has a quadratic time worst case. I think it should be >>> turned into an introsort [1] for O(n log n) worst case; we have the >>> heapsort needed to do that. >>> >>> * Quicksort is robust to repeated elements, but doesn't exploit them. 
>>> It can be made to run in linear time if the input array has only O(1) >>> distinct elements [2]. This may come at the expense of some >>> performance on arrays with no repeated elements. >>> >> >> Java famously changed its library implementation of quicksort to a dual >> pivot one invented by Vladimir Yaroslavskiy[1], they claim that with >> substantial performance gains. I tried to implement that for numpy [2], but >> couldn't get it to work any faster than the current code. >> > > For sorting, simple often beats fancy. > > >> >> * Using optimal sorting networks instead of insertion sort as the base >>> case can speed up quicksort on float arrays by 5-10%, but only if NaNs >>> are moved out of the way first so that comparisons become cheaper [3]. >>> This has consequences for the selection algorithms that I haven't >>> fully worked out yet. >>> >> >> > I expect the gains here would be for small sorts, which tend to be > dominated by call overhead. > > >> Even if we stick with selection sort, we should spin it off into an >> inline smallsort function within the npy_sort library, and have quicksort >> and mergesort call the same function, instead of each implementing their >> own. It would make optimizations like the sorting networks easier to >> implement for all sorts. We could even expose it outside npy_sort, as there >> are a few places around the code base that have ad-hoc implementations of >> sorting. >> > > Good idea, I've thought of doing it myself. > > >> >>> * Using Cilk Plus to parallelize merge sort can make it significantly >>> faster than quicksort at the expense of only a few lines of code, but >>> I haven't checked whether Cilk Plus plays nicely with multiprocessing >>> and other parallelism options (remember the trouble with OpenMP-ified >>> OpenBLAS?). >>> >>> This isn't really an answer to your questions, more like a brain dump >>> from someone who's stared at the same code for a while and did some >>> experiments. I'm not saying we should implement all of this, but keep >>> in mind that there are some interesting options besides implementing >>> timsort. >>> >> >> Timsort came up in a discussion several months ago, where I proposed >> adding a mergesorted function (which I have mostly ready, by the way, [3]) >> to speed-up some operations in arraysetops. I have serious doubts that it >> will perform comparably to the other sorts unless comparisons are terribly >> expensive, which they typically aren't in numpy, but it has been an >> interesting learning exercise so far, and I don't mind taking it all the >> way. >> >> Most of my proposed original changes do not affect the core sorting >> functionality, just the infrastructure around it. But if we agree that >> sorting has potential for being an actively developed part of the code >> base, then cleaning up its surroundings for clarity makes sense, so I'm >> taking your brain dump as an aye for my proposal. ;-) >> > > I have a generic quicksort with standard interface sitting around > somewhere in an ancient branch. Sorting objects needs to be sensitive to > comparison exceptions, which is something to keep in mind. I'd also like to > push the GIL release back down into the interface functions where it used > to be, but that isn't a priority. Another other possibility I've toyed with > is adding a step for sorting non-contiguous arrays, but the sort functions > being part of the dtype complicates that for compatibility reasons. I > suppose that could be handled with interface functions. 
I think the > prototypes should also be regularized. > > Cleaning up the sorting dispatch to use just one function and avoid the > global would be good, the current code is excessively ugly. That cleanup, > together with a generic quicksort, would be a good place to start. > Let the fun begin then.. I have just sent PR #5458, in case anyone wants to take a look. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.a.fischer at gmail.com Sat Jan 17 10:11:41 2015 From: greg.a.fischer at gmail.com (Greg Fischer) Date: Sat, 17 Jan 2015 10:11:41 -0500 Subject: [Numpy-discussion] using f2py with a module containing a derived type? Message-ID: Hello, I would like to use f2py to wrap a Fortran module that contains a derived data type. I don't necessarily need to access the data that is inside the derived type from Python, but I would really like to be able to call the subroutines that are contained inside the module. When I attempt to use f2py on this module, it appears to choke when it gets to the derived data type. Is there any way around this? Thanks, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjhelmus at gmail.com Sat Jan 17 11:15:42 2015 From: jjhelmus at gmail.com (Jonathan J. Helmus) Date: Sat, 17 Jan 2015 10:15:42 -0600 Subject: [Numpy-discussion] using f2py with a module containing a derived type? In-Reply-To: References: Message-ID: <54BA8AAE.7090604@gmail.com> On 1/17/2015 9:11 AM, Greg Fischer wrote: > Hello, > > I would like to use f2py to wrap a Fortran module that contains a > derived data type. I don't necessarily need to access the data that is > inside the derived type from Python, but I would really like to be > able to call the subroutines that are contained inside the module. > > When I attempt to use f2py on this module, it appears to choke when it > gets to the derived data type. Is there any way around this? > > Thanks, > Greg > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Greg, F2py does not support Fortran derived types natively [1]. I have had good luck using James Kermode's f90wrap [2] to wrap Fortran code with derived types. Cheers, - Jonathan Helmus [1] https://sysbio.ioc.ee/projects/f2py2e/FAQ.html#q-does-f2py-support-derived-types-in-f90-code [2] http://www.jrkermode.co.uk/f90wrap -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Jan 18 14:22:54 2015 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 18 Jan 2015 21:22:54 +0200 Subject: [Numpy-discussion] ANN: Scipy 0.15.1 Message-ID: <54BC080E.7040109@iki.fi> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dear all, We are pleased to announce the Scipy 0.15.1 release. Scipy 0.15.1 contains only bugfixes. The module ``scipy.linalg.calc_lwork`` removed in Scipy 0.15.0 is restored. This module is not a part of Scipy's public API, and although it is available again in Scipy 0.15.1, using it is deprecated and it may be removed again in a future Scipy release. 
Source tarballs, binaries, and full release notes are available at https://sourceforge.net/projects/scipy/files/scipy/0.15.1/ Best regards, Pauli Virtanen ========================== SciPy 0.15.1 Release Notes ========================== SciPy 0.15.1 is a bug-fix release with no new features compared to 0.15.0. Issues fixed - ------------ * `#4413 `__: BUG: Tests too strict, f2py doesn't have to overwrite this array * `#4417 `__: BLD: avoid using NPY_API_VERSION to check not using deprecated... * `#4418 `__: Restore and deprecate scipy.linalg.calc_work -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlS8CA4ACgkQ6BQxb7O0pWCmOQCgzg9AXDaqRaK5/QBWopIrv2OA WkEAn0ltDfDHFpw0zMzB9mUscAAb2xnE =JrGj -----END PGP SIGNATURE----- From allanhaldane at gmail.com Sun Jan 18 23:36:50 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Sun, 18 Jan 2015 23:36:50 -0500 Subject: [Numpy-discussion] structured arrays, recarrays, and record arrays Message-ID: <54BC89E2.2070709@gmail.com> Hello all, Documentation of recarrays is poor and I'd like to improve it. In order to do this I've been looking at core/records.py, and I would appreciate some feedback on my plan. Let me start by describing what I see. In the docs there is some confusion about 'structured arrays' vs 'record arrays' vs 'recarrays' - the docs use them often interchangeably. They also refer to structured dtypes alternately as 'struct data types', 'record data types' or simply 'records' (eg, see the reference/arrays.dtypes and reference/arrays.indexing doc pages). But by my reading of the code there are really three (or four) distinct types of arrays with structure. Here's a possible nomenclature: * "Structured arrays" are simply ndarrays with structured dtypes. That is, the data type is subdivided into fields of different type. * "recarrays" are a subclass of ndarrays that allow access to the fields by attribute. * "Record arrays" are recarrays where the elements have additionally been converted to 'numpy.core.records.record' type such that each data element is an object with field attributes. * (it is also possible to create arrays with dtype.dtype of numpy.core.records.record, but which are not recarrays. However I have never seen this done.) Here's code demonstrating the creation of the different types of array (in order: structured array, recarray, ???, record array). >>> arr = np.array([(1,'a'), (2,'b')], dtype=[('foo', int), ('bar', 'S1')]) >>> recarr = arr.view(type=np.recarray) >>> noname = arr.view(dtype=dtype(np.record, arr.dtype)) >>> recordarr = arr.view(dtype=dtype((np.record, arr.dtype)), type=np.recarray) >>> type(arr), arr.dtype.type (numpy.ndarray, numpy.void) >>> type(recarr), recarr.dtype.type (numpy.core.records.recarray, numpy.void) >>> type(recordarr), recordarr.dtype.type (numpy.core.records.recarray, numpy.core.records.record) Note that the functions numpy.rec.array, numpy.rec.fromrecords, numpy.rec.fromarrays, and np.recarray.__new__ create record arrays. However, in the docs you can see examples of the creation of recarrays, eg in the recarray and ndarray.view doctrings and in http://www.scipy.org/Cookbook/Recarray. The files numpy/lib/recfunctions.py and numpy/lib/npyio.py (and possibly masked arrays, but I haven't looked yet) make extensive use of recarrays (but not record arrays). 
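(A quick way to check which flavor a given constructor hands back, reusing arr and recarr from the snippet above -- an untested sketch, but it should just restate the dtype.type distinction shown there:

    >>> np.rec.array(arr).dtype.type is np.record   # np.rec.array upgrades to a record array
    True
    >>> recarr.dtype.type is np.record              # a plain view keeps numpy.void
    False

Here np.record is the top-level alias for numpy.core.records.record.)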
The main functional difference between recarrays and record arrays is field access on individual elements: >>> recordarr[0].foo 1 >>> recarr[0].foo Traceback (most recent call last): File "", line 1, in AttributeError: 'numpy.void' object has no attribute 'foo' Also, note that recarrays have a small performance penalty relative to structured arrays, and record arrays have another one relative to recarrays because of the additional python logic. So my first goal in updating the docs is to use the right terms in the right place. In almost all cases, references to 'records' (eg 'record types') should be replaced with 'structured' (eg 'structured types'), with the exception of docs that deal specifically with record arrays. It's my guess that in the distant past structured datatypes were intended to always be of type numpy.core.records.record (thus the description in reference/arrays.dtypes) but that numpy.core.records.record became generally obsolete without updates to the docs. doc/records.rst.txt seems to document the transition. I've made a preliminary pass of the docs, which you can see here https://github.com/ahaldane/numpy/commit/d87633b228dabee2ddfe75d1ee9e41ba7039e715 Mostly I renamed 'record type' to 'structured type', and added a very rough draft to numpy/doc/structured_arrays.py. I would love to hear from those more knowledgeable than myself on whether this works! Cheers, Allan From allanhaldane at gmail.com Sun Jan 18 23:52:47 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Sun, 18 Jan 2015 23:52:47 -0500 Subject: [Numpy-discussion] structured arrays, recarrays, and record arrays In-Reply-To: <54BC89E2.2070709@gmail.com> References: <54BC89E2.2070709@gmail.com> Message-ID: <54BC8D9F.4080407@gmail.com> In light of my previous message I'd like to bring up https://github.com/numpy/numpy/issues/3581, as it is now clearer to me what is happening. In the example on that page the user creates a recarray and a record array (in my nomenclature) without realizing that they are slightly different types of beast. This is probably because the str() or repr() representations of these two objects are identical. To distinguish them you have to look at their dtype.type. Using the setup from my last message: >>> print repr(recarr) rec.array([(1, 'a'), (2, 'b')], dtype=[('foo', '>> print repr(recordarr) rec.array([(1, 'a'), (2, 'b')], dtype=[('foo', '>> print repr(recarr.dtype) dtype([('foo', '>> print repr(recordarr.dtype) dtype([('foo', '>> print recarr.dtype.type >>> print recordarr.dtype.type Based on this, it occurs to me that the repr of a dtype should list dtype.type if it is not numpy.void. This might be nice to see: >>> print repr(recarr.dtype) dtype([('foo', '>> print repr(recordarr.dtype) dtype((numpy.core.records.record, [('foo', ' Hello all, > > Documentation of recarrays is poor and I'd like to improve it. In order > to do this I've been looking at core/records.py, and I would appreciate > some feedback on my plan. > > Let me start by describing what I see. In the docs there is some > confusion about 'structured arrays' vs 'record arrays' vs 'recarrays' - > the docs use them often interchangeably. They also refer to structured > dtypes alternately as 'struct data types', 'record data types' or simply > 'records' (eg, see the reference/arrays.dtypes and > reference/arrays.indexing doc pages). > > But by my reading of the code there are really three (or four) distinct > types of arrays with structure. 
Here's a possible nomenclature: > * "Structured arrays" are simply ndarrays with structured dtypes. That > is, the data type is subdivided into fields of different type. > * "recarrays" are a subclass of ndarrays that allow access to the > fields by attribute. > * "Record arrays" are recarrays where the elements have additionally > been converted to 'numpy.core.records.record' type such that each > data element is an object with field attributes. > * (it is also possible to create arrays with dtype.dtype of > numpy.core.records.record, but which are not recarrays. However I > have never seen this done.) > > Here's code demonstrating the creation of the different types of array > (in order: structured array, recarray, ???, record array). > > >>> arr = np.array([(1,'a'), (2,'b')], > dtype=[('foo', int), ('bar', 'S1')]) > >>> recarr = arr.view(type=np.recarray) > >>> noname = arr.view(dtype=dtype(np.record, arr.dtype)) > >>> recordarr = arr.view(dtype=dtype((np.record, arr.dtype)), > type=np.recarray) > > >>> type(arr), arr.dtype.type > (numpy.ndarray, numpy.void) > >>> type(recarr), recarr.dtype.type > (numpy.core.records.recarray, numpy.void) > >>> type(recordarr), recordarr.dtype.type > (numpy.core.records.recarray, numpy.core.records.record) > > Note that the functions numpy.rec.array, numpy.rec.fromrecords, > numpy.rec.fromarrays, and np.recarray.__new__ create record arrays. > However, in the docs you can see examples of the creation of recarrays, > eg in the recarray and ndarray.view doctrings and in > http://www.scipy.org/Cookbook/Recarray. The files > numpy/lib/recfunctions.py and numpy/lib/npyio.py (and possibly masked > arrays, but I haven't looked yet) make extensive use of recarrays (but > not record arrays). > > The main functional difference between recarrays and record arrays is > field access on individual elements: > > >>> recordarr[0].foo > 1 > >>> recarr[0].foo > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'numpy.void' object has no attribute 'foo' > > Also, note that recarrays have a small performance penalty relative to > structured arrays, and record arrays have another one relative to > recarrays because of the additional python logic. > > So my first goal in updating the docs is to use the right terms in the > right place. In almost all cases, references to 'records' (eg 'record > types') should be replaced with 'structured' (eg 'structured types'), > with the exception of docs that deal specifically with record arrays. > It's my guess that in the distant past structured datatypes were > intended to always be of type numpy.core.records.record (thus the > description in reference/arrays.dtypes) but that > numpy.core.records.record became generally obsolete without updates to > the docs. doc/records.rst.txt seems to document the transition. > > I've made a preliminary pass of the docs, which you can see here > https://github.com/ahaldane/numpy/commit/d87633b228dabee2ddfe75d1ee9e41ba7039e715 > > Mostly I renamed 'record type' to 'structured type', and added a very > rough draft to numpy/doc/structured_arrays.py. > > I would love to hear from those more knowledgeable than myself on > whether this works! 
> > Cheers, > Allan From solipsis at pitrou.net Mon Jan 19 13:39:19 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 19 Jan 2015 19:39:19 +0100 Subject: [Numpy-discussion] Aligned array allocations Message-ID: <20150119193919.3e8bb997@fsol> Hello, In https://github.com/numpy/numpy/issues/5312 there's a request for an aligned allocator in Numpy (more than the default alignment of the platform's memory allocator). The reason is that on modern vectorization instruction sets, a certain alignment is required for optimal performance (even though unaligned data still works: it's just that performance is degraded... by how much will depend on the CPU micro-architecture). For example Intel recommends a 32-byte alignment for AVX loads and stores. In https://github.com/numpy/numpy/pull/5457 I have proposed a patch to wrap the system allocator in an aligned allocator. The proposed scheme makes the alignment configurable at runtime (through a Python API), because different platforms may have different desirable alignments, and it is not reasonable for Numpy to know about them all, nor for users to recompile Numpy each time they have a different CPU. By always using an aligned allocator there is some overhead: - all arrays occupy a bit more memory by a small average amount (probably 16 bytes average on a 64-bit machine, for a 16 byte guaranteed alignment) - array resizes can be more expensive in CPU time, when the physical start changes and its alignment changes too There is also a limitation: while the physical start of an array will always be aligned, this can be defeated when taking a view starting at a non-zero index. (note that to take advantage of certain instruction set features such as AVX, Numpy may need to be compiled with specific compiler flags... but Numpy's allocations also affect other packages such as Numba which is able to generate code at runtime) I would like to know if people are interested in this feature, and if the proposed approach is acceptable. Regards Antoine. From efiring at hawaii.edu Tue Jan 20 17:51:29 2015 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 20 Jan 2015 12:51:29 -1000 Subject: [Numpy-discussion] EDF+ specification Message-ID: <54BEDBF1.7010200@hawaii.edu> http://www.edfplus.info/specs/edfplus.html#additionalspecs Io, Is this the file format you have? Eric From njs at pobox.com Tue Jan 20 18:17:52 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 20 Jan 2015 23:17:52 +0000 Subject: [Numpy-discussion] EDF+ specification In-Reply-To: <54BEDBF1.7010200@hawaii.edu> References: <54BEDBF1.7010200@hawaii.edu> Message-ID: On Tue, Jan 20, 2015 at 10:51 PM, Eric Firing wrote: > http://www.edfplus.info/specs/edfplus.html#additionalspecs > > Io, Is this the file format you have? Sorry, I don't quite understand the question! Maybe you're looking for https://github.com/breuderink/eegtools https://github.com/rays/pyedf https://bitbucket.org/cleemesser/python-edf/ ...? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From efiring at hawaii.edu Tue Jan 20 18:34:46 2015 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 20 Jan 2015 13:34:46 -1000 Subject: [Numpy-discussion] EDF+ specification In-Reply-To: References: <54BEDBF1.7010200@hawaii.edu> Message-ID: <54BEE616.1060403@hawaii.edu> Nathaniel, I don't know what sequence of wrong button pushes led to this, but the message was intended for Io Flament. Sorry for the puzzling disruption! 
Eric On 2015/01/20 1:17 PM, Nathaniel Smith wrote: > On Tue, Jan 20, 2015 at 10:51 PM, Eric Firing wrote: >> http://www.edfplus.info/specs/edfplus.html#additionalspecs >> >> Io, Is this the file format you have? > > Sorry, I don't quite understand the question! > > Maybe you're looking for > > https://github.com/breuderink/eegtools > https://github.com/rays/pyedf > https://bitbucket.org/cleemesser/python-edf/ > > ...? > From charlesr.harris at gmail.com Thu Jan 22 09:51:33 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 22 Jan 2015 07:51:33 -0700 Subject: [Numpy-discussion] Datetime again Message-ID: Hi All, I'm playing with the idea of building a simplified datetime class on top of the current numpy implementation. I believe Pandas does something like this, and blaze will (does?) have a simplified version. The reason for the new class would be to have an easier, and hopefully more portable, API that can be implemented in Python, and maybe pushed down into C when things settle. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jan 22 09:54:49 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 22 Jan 2015 14:54:49 +0000 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris wrote: > Hi All, > > I'm playing with the idea of building a simplified datetime class on top of > the current numpy implementation. I believe Pandas does something like this, > and blaze will (does?) have a simplified version. The reason for the new > class would be to have an easier, and hopefully more portable, API that can > be implemented in Python, and maybe pushed down into C when things settle. When you say "datetime class" what do you mean? A dtype? An ndarray subclass? A python class representing a scalar datetime that you can put in an object array? ...? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Thu Jan 22 10:08:36 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 22 Jan 2015 08:08:36 -0700 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith wrote: > On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris > wrote: > > Hi All, > > > > I'm playing with the idea of building a simplified datetime class on top > of > > the current numpy implementation. I believe Pandas does something like > this, > > and blaze will (does?) have a simplified version. The reason for the new > > class would be to have an easier, and hopefully more portable, API that > can > > be implemented in Python, and maybe pushed down into C when things > settle. > > When you say "datetime class" what do you mean? A dtype? An ndarray > subclass? A python class representing a scalar datetime that you can > put in an object array? ...? > I was thinking an ndarray subclass that is based on a single datetime type, but part of the reason for this post is to elicit ideas. I'm influenced by Mark's discussion apropos blaze . I thought it easier to start such a project in python, as it is far easier for people interested in the problem to work with. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Thu Jan 22 10:18:17 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 22 Jan 2015 08:18:17 -0700 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris wrote: > > > On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith wrote: > >> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris >> wrote: >> > Hi All, >> > >> > I'm playing with the idea of building a simplified datetime class on >> top of >> > the current numpy implementation. I believe Pandas does something like >> this, >> > and blaze will (does?) have a simplified version. The reason for the new >> > class would be to have an easier, and hopefully more portable, API that >> can >> > be implemented in Python, and maybe pushed down into C when things >> settle. >> >> When you say "datetime class" what do you mean? A dtype? An ndarray >> subclass? A python class representing a scalar datetime that you can >> put in an object array? ...? >> > > I was thinking an ndarray subclass that is based on a single datetime > type, but part of the reason for this post is to elicit ideas. I'm > influenced by Mark's discussion apropos blaze > . > I thought it easier to start such a project in python, as it is far easier > for people interested in the problem to work with. > And if I had my druthers, it would use quad precision floating point at it's heart. The 64 bits of long long really isn't enough and leads to all sorts of compromises. But that is probably a pipe dream. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Thu Jan 22 10:24:20 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 22 Jan 2015 15:24:20 +0000 (UTC) Subject: [Numpy-discussion] Aligned array allocations References: <20150119193919.3e8bb997@fsol> Message-ID: <738467086443632175.479267sturla.molden-gmail.com@news.gmane.org> Antoine Pitrou wrote: > By always using an aligned allocator there is some overhead: > - all arrays occupy a bit more memory by a small average amount > (probably 16 bytes average on a 64-bit machine, for a 16 byte > guaranteed alignment) NumPy arrays are Python objects. They have an overhead anyway, much more than this, and 16 bytes are not worse than adding a couple of pointers to the struct. In the big picture this tiny overhead does not matter. > - array resizes can be more expensive in CPU time, when the physical > start changes and its alignment changes too We are using Python. If we were worried about small inefficiencies we would not be using it. Resizing ndarrays are rare anyway. They are not used like Python lists or instead of lists. We use lists in the same way as anyone else who uses Python. So an ndarray resize can afford to be more espensive than a list append. Also the NumPy community expects an ndarray resize to be expensive and O(n) due to its current behavior: If an array has a view, realloc is out of the question. 
:-) Sturla From njs at pobox.com Thu Jan 22 15:58:52 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 22 Jan 2015 20:58:52 +0000 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 3:18 PM, Charles R Harris wrote: > > > On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris > wrote: >> >> >> >> On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith wrote: >>> >>> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris >>> wrote: >>> > Hi All, >>> > >>> > I'm playing with the idea of building a simplified datetime class on >>> > top of >>> > the current numpy implementation. I believe Pandas does something like >>> > this, >>> > and blaze will (does?) have a simplified version. The reason for the >>> > new >>> > class would be to have an easier, and hopefully more portable, API that >>> > can >>> > be implemented in Python, and maybe pushed down into C when things >>> > settle. >>> >>> When you say "datetime class" what do you mean? A dtype? An ndarray >>> subclass? A python class representing a scalar datetime that you can >>> put in an object array? ...? >> >> >> I was thinking an ndarray subclass that is based on a single datetime >> type, but part of the reason for this post is to elicit ideas. I'm >> influenced by Mark's discussion apropos blaze. I thought it easier to >> start such a project in python, as it is far easier for people interested in >> the problem to work with. > > > And if I had my druthers, it would use quad precision floating point at it's > heart. The 64 bits of long long really isn't enough and leads to all sorts > of compromises. But that is probably a pipe dream. I guess there are lots of options -- e.g. 32-bit day + 64-bit time-of-day (I think that would give 11.8 million years at 10-femtisecond precision?). Figuring out which clock this is on matters a lot more though (e.g. how to handle leap-seconds in absolute and relative times -- is adding 1 day always the same as adding 24 * 60 * 60 seconds?). At a very general level, I feel like numpy-qua-numpy's role here shouldn't be to try and add special code to handle any one specific datetime implementation: that hasn't worked out terribly well historically, and as referenced above there's a *ton* of plausible ways of approaching datetime handling that people might want, so we don't want to be in the position of having to pick the-one-and-only implementation. Telling people who want to tweak datetime handling that they have to start mucking around in umath.so is terrible. Instead, we should be trying to evolve numpy to add generic functionality, so that it's prepared to handle multiple third-party approaches to date-time handling (among other things). Implementing prototypes built on top of numpy could be an excellent way to generate ideas for appropriate changes to the numpy core. As far as this specific prototype, I should say that I'm dubious that subclassing ndarray is actually a *good* long-term solution. I really think that the *right* way to solve this would be to improve the dtype system so we could define useful date/time types that worked with plain vanilla ndarrays. But that approach requires a lot more up-front C coding; it's harder to throw together a quick prototype. OTOOH if your goal is the moon then you don't want to waste time investing in ladder technology... so I dunno. -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From cmkleffner at gmail.com Thu Jan 22 16:29:25 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 22 Jan 2015 22:29:25 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) Message-ID: I took time to create mingw-w64 based wheels of numpy-1.9.1 and scipy-0.15.1 source distributions and put them on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as on binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and 64bit. Feedback is welcome. The wheels can be pip installed with: pip install -i https://pypi.binstar.org/carlkl/simple numpy pip install -i https://pypi.binstar.org/carlkl/simple scipy Some technical details: the binaries are build upon OpenBLAS as accelerated BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to MKL) and automatic runtime selection depending on the CPU. The minimal requested feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not supported with this builds. This is the default for 64bit binaries anyway. OpenBLAS is deployed as part of the numpy wheel. That said, the scipy wheels mentioned above are dependant on the installation of the OpenBLAS based numpy and won't work i.e. with an installed numpy-MKL. For the numpy 32bit builds there are 3 failures for special FP value tests, due to a bug in mingw-w64 that is still present. All scipy versions show up 7 failures with some numerical noise, that could be ignored (or corrected with relaxed asserts in the test code). PR's for numpy and scipy are in preparation. The mingw-w64 compiler used for building can be found at https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjw at ncf.ca Thu Jan 22 17:06:16 2015 From: cjw at ncf.ca (cjw) Date: Thu, 22 Jan 2015 17:06:16 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: <54C17458.3050206@ncf.ca> An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Thu Jan 22 17:42:52 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 22 Jan 2015 23:42:52 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: <54C17458.3050206@ncf.ca> References: <54C17458.3050206@ncf.ca> Message-ID: Yes, I build win32 as well as amd64 binaries. Carlkl 2015-01-22 23:06 GMT+01:00 cjw : > Thanks Carl, > > This is good to hear. I presume that the AMD64 is covered. > > Colin W. > > On 22-Jan-15 4:29 PM, Carl Kleffner wrote: > > I took time to create mingw-w64 based wheels of numpy-1.9.1 and > scipy-0.15.1 source distributions and put them onhttps://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as onbinstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and > 64bit. > > Feedback is welcome. > > The wheels can be pip installed with: > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > Some technical details: the binaries are build upon OpenBLAS as accelerated > BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to MKL) > and automatic runtime selection depending on the CPU. The minimal requested > feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not > supported with this builds. This is the default for 64bit binaries anyway. 
> > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > wheels mentioned above are dependant on the installation of the OpenBLAS > based numpy and won't work i.e. with an installed numpy-MKL. > > For the numpy 32bit builds there are 3 failures for special FP value tests, > due to a bug in mingw-w64 that is still present. All scipy versions show up > 7 failures with some numerical noise, that could be ignored (or corrected > with relaxed asserts in the test code). > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used > for building can be found athttps://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > > > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Thu Jan 22 18:11:41 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 23 Jan 2015 00:11:41 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: Were there any failures with the 64 bit build, or did all tests pass? Sturla On 22/01/15 22:29, Carl Kleffner wrote: > I took time to create mingw-w64 based wheels of numpy-1.9.1 and > scipy-0.15.1 source distributions and put them on > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as > on binstar.org . The test matrix is python-2.7 and > 3.4 for both 32bit and 64bit. > > Feedback is welcome. > > The wheels can be pip installed with: > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > Some technical details: the binaries are build upon OpenBLAS as > accelerated BLAS/Lapack. OpenBLAS itself is build with dynamic kernels > (similar to MKL) and automatic runtime selection depending on the CPU. > The minimal requested feature supplied by the CPU is SSE2. SSE1 and > non-SSE CPUs are not supported with this builds. This is the default for > 64bit binaries anyway. > > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > wheels mentioned above are dependant on the installation of the OpenBLAS > based numpy and won't work i.e. with an installed numpy-MKL. > > For the numpy 32bit builds there are 3 failures for special FP value > tests, due to a bug in mingw-w64 that is still present. All scipy > versions show up 7 failures with some numerical noise, that could be > ignored (or corrected with relaxed asserts in the test code). > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used > for building can be found at > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. 
> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Thu Jan 22 18:23:16 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 22 Jan 2015 23:23:16 +0000 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner wrote: > I took time to create mingw-w64 based wheels of numpy-1.9.1 and scipy-0.15.1 > source distributions and put them on > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as on > binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and 64bit. > > Feedback is welcome. > > The wheels can be pip installed with: > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > Some technical details: the binaries are build upon OpenBLAS as accelerated > BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to MKL) > and automatic runtime selection depending on the CPU. The minimal requested > feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not supported > with this builds. This is the default for 64bit binaries anyway. According to the steam hardware survey, 99.98% of windows computers have SSE2. (http://store.steampowered.com/hwsurvey , click on "other settings" at the bottom). So this is probably OK :-). > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy wheels > mentioned above are dependant on the installation of the OpenBLAS based > numpy and won't work i.e. with an installed numpy-MKL. This sounds like it probably needs to be fixed before we can recommend the scipy wheels for anyone? OTOH it might be fine to start distributing numpy wheels first. > For the numpy 32bit builds there are 3 failures for special FP value tests, > due to a bug in mingw-w64 that is still present. All scipy versions show up > 7 failures with some numerical noise, that could be ignored (or corrected > with relaxed asserts in the test code). > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used for > building can be found at > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. Correct me if I'm wrong, but it looks like there isn't any details on how exactly the compiler was set up? Which is fine, I know you've been doing a ton of work on this and it's much appreciated :-). But eventually I do think a prerequisite for us adopting these as official builds is that we'll need a text document (or an executable script!) that walks through all the steps in setting up the toolchain etc., so that someone starting from scratch could get it all up and running. Otherwise we run the risk of eventually ending up back where we are today, with a creaky old mingw binary snapshot that no-one knows how it works or how to reproduce... -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From cjw at ncf.ca Thu Jan 22 20:07:26 2015 From: cjw at ncf.ca (cjw) Date: Thu, 22 Jan 2015 20:07:26 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: <54C17458.3050206@ncf.ca> Message-ID: <54C19ECE.6030205@ncf.ca> An HTML attachment was scrubbed... 
URL: From cmkleffner at gmail.com Fri Jan 23 03:25:05 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Fri, 23 Jan 2015 09:25:05 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: All tests for the 64bit builds passed. Carl 2015-01-23 0:11 GMT+01:00 Sturla Molden : > Were there any failures with the 64 bit build, or did all tests pass? > > Sturla > > > On 22/01/15 22:29, Carl Kleffner wrote: > > I took time to create mingw-w64 based wheels of numpy-1.9.1 and > > scipy-0.15.1 source distributions and put them on > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as > > on binstar.org . The test matrix is python-2.7 and > > 3.4 for both 32bit and 64bit. > > > > Feedback is welcome. > > > > The wheels can be pip installed with: > > > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > > > Some technical details: the binaries are build upon OpenBLAS as > > accelerated BLAS/Lapack. OpenBLAS itself is build with dynamic kernels > > (similar to MKL) and automatic runtime selection depending on the CPU. > > The minimal requested feature supplied by the CPU is SSE2. SSE1 and > > non-SSE CPUs are not supported with this builds. This is the default for > > 64bit binaries anyway. > > > > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > > wheels mentioned above are dependant on the installation of the OpenBLAS > > based numpy and won't work i.e. with an installed numpy-MKL. > > > > For the numpy 32bit builds there are 3 failures for special FP value > > tests, due to a bug in mingw-w64 that is still present. All scipy > > versions show up 7 failures with some numerical noise, that could be > > ignored (or corrected with relaxed asserts in the test code). > > > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used > > for building can be found at > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Fri Jan 23 10:07:08 2015 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 23 Jan 2015 16:07:08 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-23 9:25 GMT+01:00 Carl Kleffner : > All tests for the 64bit builds passed. Thanks very much Carl. Did you have to patch the numpy / distutils source to build those wheels are is this using the source code from the official releases? 
-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From cjw at ncf.ca Sat Jan 24 09:48:46 2015 From: cjw at ncf.ca (cjw) Date: Sat, 24 Jan 2015 09:48:46 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: <54C3B0CE.7070001@ncf.ca> On 22-Jan-15 6:23 PM, Nathaniel Smith wrote: > On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner wrote: >> I took time to create mingw-w64 based wheels of numpy-1.9.1 and scipy-0.15.1 >> source distributions and put them on >> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as on >> binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and 64bit. >> >> Feedback is welcome. >> >> The wheels can be pip installed with: >> >> pip install -i https://pypi.binstar.org/carlkl/simple numpy >> pip install -i https://pypi.binstar.org/carlkl/simple scipy >> >> Some technical details: the binaries are build upon OpenBLAS as accelerated >> BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to MKL) >> and automatic runtime selection depending on the CPU. The minimal requested >> feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not supported >> with this builds. This is the default for 64bit binaries anyway. > According to the steam hardware survey, 99.98% of windows computers > have SSE2. (http://store.steampowered.com/hwsurvey , click on "other > settings" at the bottom). So this is probably OK :-). > >> OpenBLAS is deployed as part of the numpy wheel. That said, the scipy wheels >> mentioned above are dependant on the installation of the OpenBLAS based >> numpy and won't work i.e. with an installed numpy-MKL. > This sounds like it probably needs to be fixed before we can recommend > the scipy wheels for anyone? OTOH it might be fine to start > distributing numpy wheels first. > >> For the numpy 32bit builds there are 3 failures for special FP value tests, >> due to a bug in mingw-w64 that is still present. All scipy versions show up >> 7 failures with some numerical noise, that could be ignored (or corrected >> with relaxed asserts in the test code). >> >> PR's for numpy and scipy are in preparation. The mingw-w64 compiler used for >> building can be found at >> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > Correct me if I'm wrong, but it looks like there isn't any details on > how exactly the compiler was set up? Which is fine, I know you've been > doing a ton of work on this and it's much appreciated :-). But > eventually I do think a prerequisite for us adopting these as official > builds is that we'll need a text document (or an executable script!) > that walks through all the steps in setting up the toolchain etc., so > that someone starting from scratch could get it all up and running. > Otherwise we run the risk of eventually ending up back where we are > today, with a creaky old mingw binary snapshot that no-one knows how > it works or how to reproduce... > > -n > Karl, I tried and failed, even after adding --pre. My log file is here: ------------------------------------------------------------ C:\Python27\Scripts\pip run on 01/24/15 07:51:10 Downloading/unpacking https://pypi.binstar.org/carlkl/simple Downloading simple Downloading from URL https://pypi.binstar.org/carlkl/simple Cleaning up... 
Exception: Traceback (most recent call last): File "C:\Python27\lib\site-packages\pip\basecommand.py", line 122, in main status = self.run(options, args) File "C:\Python27\lib\site-packages\pip\commands\install.py", line 278, in run requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle) File "C:\Python27\lib\site-packages\pip\req.py", line 1197, in prepare_files do_download, File "C:\Python27\lib\site-packages\pip\req.py", line 1375, in unpack_url self.session, File "C:\Python27\lib\site-packages\pip\download.py", line 582, in unpack_http_url unpack_file(temp_location, location, content_type, link) File "C:\Python27\lib\site-packages\pip\util.py", line 627, in unpack_file and is_svn_page(file_contents(filename))): File "C:\Python27\lib\site-packages\pip\util.py", line 210, in file_contents return fp.read().decode('utf-8') File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte Do you have any suggestions? Colin W. From charlesr.harris at gmail.com Sat Jan 24 10:11:15 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 24 Jan 2015 08:11:15 -0700 Subject: [Numpy-discussion] What should recfromcsv defaults be? Message-ID: Hi All, This question comes apropos this bugfix #5495 that ensures that the default options get passed down the call chain. The current defaults are kwargs.setdefault("case_sensitive", "lower") kwargs.setdefault("names", True) kwargs.setdefault("delimiter", ",") kwargs.setdefault("dtype", None) The ones in question are for "names" and "case_sensitive", that, due to the bug, were defaulting to 'True' and None respectively. I think those defaults should be kept rather than the values currently specified in the recfromcsv definition. However, I don't use these tools, so would like some feedback from those who do. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Sat Jan 24 12:14:10 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sat, 24 Jan 2015 18:14:10 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: <54C3B0CE.7070001@ncf.ca> References: <54C3B0CE.7070001@ncf.ca> Message-ID: Just a wild guess: (1) update your pip and try again (2) use the bitbucket wheels with: pip install --no-index -f https://bitbucket.org/carlkl/mingw-w64-for-python/downloads numpy pip install --no-index -f https://bitbucket.org/carlkl/mingw-w64-for-python/downloads scipy (3) check if there i something left in site-packages\numpy in the case you have uninstalled another numpy distribution before. Carl 2015-01-24 15:48 GMT+01:00 cjw : > On 22-Jan-15 6:23 PM, Nathaniel Smith wrote: > >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner >> wrote: >> >>> I took time to create mingw-w64 based wheels of numpy-1.9.1 and >>> scipy-0.15.1 >>> source distributions and put them on >>> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as >>> on >>> binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and >>> 64bit. >>> >>> Feedback is welcome. >>> >>> The wheels can be pip installed with: >>> >>> pip install -i https://pypi.binstar.org/carlkl/simple numpy >>> pip install -i https://pypi.binstar.org/carlkl/simple scipy >>> >>> Some technical details: the binaries are build upon OpenBLAS as >>> accelerated >>> BLAS/Lapack. 
OpenBLAS itself is build with dynamic kernels (similar to >>> MKL) >>> and automatic runtime selection depending on the CPU. The minimal >>> requested >>> feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not >>> supported >>> with this builds. This is the default for 64bit binaries anyway. >>> >> According to the steam hardware survey, 99.98% of windows computers >> have SSE2. (http://store.steampowered.com/hwsurvey , click on "other >> settings" at the bottom). So this is probably OK :-). >> >> OpenBLAS is deployed as part of the numpy wheel. That said, the scipy >>> wheels >>> mentioned above are dependant on the installation of the OpenBLAS based >>> numpy and won't work i.e. with an installed numpy-MKL. >>> >> This sounds like it probably needs to be fixed before we can recommend >> the scipy wheels for anyone? OTOH it might be fine to start >> distributing numpy wheels first. >> >> For the numpy 32bit builds there are 3 failures for special FP value >>> tests, >>> due to a bug in mingw-w64 that is still present. All scipy versions show >>> up >>> 7 failures with some numerical noise, that could be ignored (or corrected >>> with relaxed asserts in the test code). >>> >>> PR's for numpy and scipy are in preparation. The mingw-w64 compiler used >>> for >>> building can be found at >>> https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. >>> >> Correct me if I'm wrong, but it looks like there isn't any details on >> how exactly the compiler was set up? Which is fine, I know you've been >> doing a ton of work on this and it's much appreciated :-). But >> eventually I do think a prerequisite for us adopting these as official >> builds is that we'll need a text document (or an executable script!) >> that walks through all the steps in setting up the toolchain etc., so >> that someone starting from scratch could get it all up and running. >> Otherwise we run the risk of eventually ending up back where we are >> today, with a creaky old mingw binary snapshot that no-one knows how >> it works or how to reproduce... >> >> -n >> >> Karl, > > I tried and failed, even after adding --pre. > > My log file is here: > > ------------------------------------------------------------ > C:\Python27\Scripts\pip run on 01/24/15 07:51:10 > Downloading/unpacking https://pypi.binstar.org/carlkl/simple > Downloading simple > Downloading from URL https://pypi.binstar.org/carlkl/simple > Cleaning up... 
> Exception: > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\pip\basecommand.py", line 122, in > main > status = self.run(options, args) > File "C:\Python27\lib\site-packages\pip\commands\install.py", line 278, > in run > requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, > bundle=self.bundle) > File "C:\Python27\lib\site-packages\pip\req.py", line 1197, in > prepare_files > do_download, > File "C:\Python27\lib\site-packages\pip\req.py", line 1375, in > unpack_url > self.session, > File "C:\Python27\lib\site-packages\pip\download.py", line 582, in > unpack_http_url > unpack_file(temp_location, location, content_type, link) > File "C:\Python27\lib\site-packages\pip\util.py", line 627, in > unpack_file > and is_svn_page(file_contents(filename))): > File "C:\Python27\lib\site-packages\pip\util.py", line 210, in > file_contents > return fp.read().decode('utf-8') > File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode > return codecs.utf_8_decode(input, errors, True) > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: > invalid start byte > > Do you have any suggestions? > > Colin W. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Sat Jan 24 12:29:52 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sat, 24 Jan 2015 18:29:52 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : > On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner > wrote: > > I took time to create mingw-w64 based wheels of numpy-1.9.1 and > scipy-0.15.1 > > source distributions and put them on > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads as well as > on > > binstar.org. The test matrix is python-2.7 and 3.4 for both 32bit and > 64bit. > > > > Feedback is welcome. > > > > The wheels can be pip installed with: > > > > pip install -i https://pypi.binstar.org/carlkl/simple numpy > > pip install -i https://pypi.binstar.org/carlkl/simple scipy > > > > Some technical details: the binaries are build upon OpenBLAS as > accelerated > > BLAS/Lapack. OpenBLAS itself is build with dynamic kernels (similar to > MKL) > > and automatic runtime selection depending on the CPU. The minimal > requested > > feature supplied by the CPU is SSE2. SSE1 and non-SSE CPUs are not > supported > > with this builds. This is the default for 64bit binaries anyway. > > According to the steam hardware survey, 99.98% of windows computers > have SSE2. (http://store.steampowered.com/hwsurvey , click on "other > settings" at the bottom). So this is probably OK :-). > > > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > wheels > > mentioned above are dependant on the installation of the OpenBLAS based > > numpy and won't work i.e. with an installed numpy-MKL. > > This sounds like it probably needs to be fixed before we can recommend > the scipy wheels for anyone? OTOH it might be fine to start > distributing numpy wheels first. > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead of static linking to avoid bloat. This matters, because libopenblas.dll is a heavy library (around 30Mb for amd64). As a consequence all packages with dynamic linkage to OpenBLAS depend on numpy-openblas. This is not different to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. 
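(For illustration, a minimal sketch of how a user or a dependent package
could check at runtime whether the installed numpy is such an
OpenBLAS-based build; np.show_config() reports the BLAS/LAPACK the build
was linked against, and the second part simply looks for a bundled
libopenblas DLL next to numpy's core modules:)

import os
import numpy as np

# Report which BLAS/LAPACK this numpy build was linked against.
np.show_config()

# Look for a bundled OpenBLAS DLL in numpy\core (as shipped by these wheels).
core_dir = os.path.dirname(np.core.__file__)
dlls = [f for f in os.listdir(core_dir) if f.lower().startswith('libopenblas')]
print("bundled OpenBLAS DLL(s):", dlls)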
> For the numpy 32bit builds there are 3 failures for special FP value > tests, > > due to a bug in mingw-w64 that is still present. All scipy versions show > up > > 7 failures with some numerical noise, that could be ignored (or corrected > > with relaxed asserts in the test code). > > > > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used > for > > building can be found at > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > > Correct me if I'm wrong, but it looks like there isn't any details on > how exactly the compiler was set up? Which is fine, I know you've been > doing a ton of work on this and it's much appreciated :-). But > eventually I do think a prerequisite for us adopting these as official > builds is that we'll need a text document (or an executable script!) > that walks through all the steps in setting up the toolchain etc., so > that someone starting from scratch could get it all up and running. > Otherwise we run the risk of eventually ending up back where we are > today, with a creaky old mingw binary snapshot that no-one knows how > it works or how to reproduce... > This has to be done and is in preperation, but not ready for consumption right now. Some preliminary information is given here: https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/mingwstatic-2014-11-readme.md > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjw at ncf.ca Sun Jan 25 06:39:09 2015 From: cjw at ncf.ca (cjw) Date: Sun, 25 Jan 2015 06:39:09 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: <54C3B0CE.7070001@ncf.ca> Message-ID: <54C4D5DD.5070507@ncf.ca> On 24-Jan-15 12:14 PM, Carl Kleffner wrote: > Just a wild guess: > > (1) update your pip and try again Thanks. My pip version was 1,5,6, it is now 6.0.6 > > (2) use the bitbucket wheels with: > pip install --no-index -f > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads numpy Successfully installed numpy-1.9.1 > pip install --no-index -f > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads scipy Successfully installed scipy-0.15.1 > > (3) check if there i something left in site-packages\numpy in the case you > have uninstalled another numpy distribution before. Could you be more specific please? C:\Python27\Lib\site-packages>python Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)] on win32 C:\Python27\Lib>cd site-packages C:\Python27\Lib\site-packages> dir Volume in drive C has no label. Volume Serial Number is 9691-2C8F Directory of C:\Python27\Lib\site-packages 25-Jan-15 06:11 AM . 25-Jan-15 06:11 AM .. 
26-Nov-14 06:30 PM 909 apsw-3.8.7-py2.7.egg-info 26-Nov-14 06:30 PM 990,208 apsw.pyd 11-Dec-14 06:35 PM astroid 11-Dec-14 06:35 PM astroid-1.3.2.dist-info 11-Dec-14 06:35 PM colorama 11-Dec-14 06:35 PM colorama-0.3.2-py2.7.egg-info 09-Sep-14 09:38 AM dateutil 29-Dec-14 10:16 PM 126 easy_install.py 29-Dec-14 10:16 PM 343 easy_install.pyc 27-Dec-14 09:18 AM epydoc 21-Jan-13 03:19 PM 297 epydoc-3.0.1-py2.7.egg-info 11-Dec-14 06:35 PM logilab 11-Dec-14 06:35 PM 309 logilab_common-0.63.2-py2.7-nspkg.pth 11-Dec-14 06:35 PM logilab_common-0.63.2-py2.7.egg-info 16-Nov-14 04:02 PM matplotlib 22-Oct-14 03:11 PM 324 matplotlib-1.4.2-py2.7-nspkg.pth 16-Nov-14 04:02 PM matplotlib-1.4.2-py2.7.egg-info 16-Nov-14 04:02 PM mpl_toolkits 25-Jan-15 06:07 AM numpy 25-Jan-15 06:07 AM numpy-1.9.1.dist-info 25-Jan-15 06:01 AM pip 25-Jan-15 06:01 AM pip-6.0.6.dist-info 29-Dec-14 10:16 PM 101,530 pkg_resources.py 29-Dec-14 10:16 PM 115,360 pkg_resources.pyc 10-Sep-14 12:30 PM pycparser 10-Sep-14 12:30 PM pycparser-2.10-py2.7.egg-info 16-Dec-14 08:21 AM pygame 25-Mar-14 11:03 AM 543 pygame-1.9.2a0-py2.7.egg-info 24-Nov-14 06:55 AM pygit2 24-Nov-14 06:55 AM pygit2-0.21.3-py2.7.egg-info 26-Mar-14 01:23 PM 90 pylab.py 16-Nov-14 04:02 PM 237 pylab.pyc 16-Nov-14 04:02 PM 237 pylab.pyo 11-Dec-14 06:35 PM pylint 11-Dec-14 06:35 PM pylint-1.4.0.dist-info 11-Sep-14 08:26 PM pyparsing-2.0.2-py2.7.egg-info 24-Nov-14 06:27 AM 157,300 pyparsing.py 30-Nov-14 08:51 AM 154,996 pyparsing.pyc 09-Sep-14 09:38 AM python_dateutil-2.2-py2.7.egg-info 09-Sep-14 10:03 AM pytz 09-Sep-14 10:03 AM pytz-2014.7-py2.7.egg-info 30-Apr-14 08:54 AM 119 README.txt 25-Jan-15 06:11 AM scipy 25-Jan-15 06:11 AM scipy-0.15.1.dist-info 29-Dec-14 10:16 PM setuptools 29-Dec-14 10:16 PM setuptools-7.0.dist-info 09-Sep-14 09:50 AM six-1.7.3.dist-info 09-Sep-14 09:50 AM 26,518 six.py 09-Sep-14 09:50 AM 28,288 six.pyc 21-Dec-14 07:56 PM System 21-Dec-14 07:55 PM User 21-Sep-14 06:00 PM 878,592 _cffi__xf1819144xd61e91d9.pyd 29-Dec-14 10:16 PM _markerlib 21-Sep-14 06:00 PM 890,368 _pygit2.pyd 20 File(s) 3,346,694 bytes 36 Dir(s) 9,810,276,352 bytes free C:\Python27\Lib\site-packages> > I tried and failed, even after adding --pre. > > My log file is here: > > ------------------------------------------------------------ > C:\Python27\Scripts\pip run on 01/24/15 07:51:10 > Downloading/unpacking https://pypi.binstar.org/carlkl/simple > Downloading simple > Downloading from URL https://pypi.binstar.org/carlkl/simple > Cleaning up... 
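(For illustration, not part of the original message: a quick way to confirm
which builds were actually picked up after the install:)

>>> import numpy, scipy
>>> numpy.__version__, scipy.__version__
('1.9.1', '0.15.1')
>>> numpy.show_config()   # should report the OpenBLAS build configuration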
> Exception: > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\pip\basecommand.py", line 122, in > main > status = self.run(options, args) > File "C:\Python27\lib\site-packages\pip\commands\install.py", line 278, > in run > requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, > bundle=self.bundle) > File "C:\Python27\lib\site-packages\pip\req.py", line 1197, in > prepare_files > do_download, > File "C:\Python27\lib\site-packages\pip\req.py", line 1375, in > unpack_url > self.session, > File "C:\Python27\lib\site-packages\pip\download.py", line 582, in > unpack_http_url > unpack_file(temp_location, location, content_type, link) > File "C:\Python27\lib\site-packages\pip\util.py", line 627, in > unpack_file > and is_svn_page(file_contents(filename))): > File "C:\Python27\lib\site-packages\pip\util.py", line 210, in > file_contents > return fp.read().decode('utf-8') > File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode > return codecs.utf_8_decode(input, errors, True) > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: > invalid start byte > > Do you have any suggestions? > > Colin W. > From sturla.molden at gmail.com Sun Jan 25 06:57:37 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 25 Jan 2015 11:57:37 +0000 (UTC) Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) References: Message-ID: <100463511443879398.146150sturla.molden-gmail.com@news.gmane.org> Carl Kleffner wrote: > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead of > static linking to avoid bloat. This matters, because libopenblas.dll is a > heavy library (around 30Mb for amd64). As a consequence all packages with > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not different > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. It is probably ok if we name the OpenBLAS DLL something else than libopenblas.dll. We could e.g. add to the filename a combined hash for NumPy version, CPU, OpenBLAS version, Python version, C compiler, platform, build number, etc. Sturla From njs at pobox.com Sun Jan 25 10:46:27 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 25 Jan 2015 15:46:27 +0000 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Sat, Jan 24, 2015 at 5:29 PM, Carl Kleffner wrote: > > 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : >> >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner >> wrote: >> > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy >> > wheels >> > mentioned above are dependant on the installation of the OpenBLAS based >> > numpy and won't work i.e. with an installed numpy-MKL. >> >> This sounds like it probably needs to be fixed before we can recommend >> the scipy wheels for anyone? OTOH it might be fine to start >> distributing numpy wheels first. > > > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead of > static linking to avoid bloat. This matters, because libopenblas.dll is a > heavy library (around 30Mb for amd64). As a consequence all packages with > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not different > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. 
The difference is that if we upload this as the standard scipy wheel, and then someone goes "hey, look, a new scipy release just got announced, 'pip upgrade scipy'", then the result will often be that they just get random unexplained crashes. I think we should try to avoid that kind of outcome, even if it means making some technical compromises. The whole idea of having the wheels is to make fetching particular versions seamless and robust, and the other kinds of builds will still be available for those willing to invest more effort. One solution would be for the scipy wheel to explicitly depend on a numpy+openblas wheel, so that someone doing 'pip install scipy' also forced a numpy upgrade. But I think we should forget about trying this given the current state of python packaging tools: pip/setuptools/etc. are not really sophisticated enough to let us do this without a lot of kluges and compromises, and anyway it is nicer to allow scipy and numpy to be upgraded separately. Another solution would be to just include openblas in both. This bloats downloads, but I'd rather waste 30 MiB then waste users' time fighting with random library incompatibility nonsense that they don't care about. Another solution would be to split the openblas library off into its own "python package", that just dropped the binary somewhere where it could be found later, and then have both the numpy and scipy wheels depend on this package. We could start with the brute force solution (just including openblas in both) for the first release, and then upgrade to the fancier solution (both depend on a separate package) later. >> > For the numpy 32bit builds there are 3 failures for special FP value >> > tests, >> > due to a bug in mingw-w64 that is still present. All scipy versions show >> > up >> > 7 failures with some numerical noise, that could be ignored (or >> > corrected >> > with relaxed asserts in the test code). >> > >> > PR's for numpy and scipy are in preparation. The mingw-w64 compiler used >> > for >> > building can be found at >> > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. >> >> Correct me if I'm wrong, but it looks like there isn't any details on >> how exactly the compiler was set up? Which is fine, I know you've been >> doing a ton of work on this and it's much appreciated :-). But >> eventually I do think a prerequisite for us adopting these as official >> builds is that we'll need a text document (or an executable script!) >> that walks through all the steps in setting up the toolchain etc., so >> that someone starting from scratch could get it all up and running. >> Otherwise we run the risk of eventually ending up back where we are >> today, with a creaky old mingw binary snapshot that no-one knows how >> it works or how to reproduce... > > > This has to be done and is in preperation, but not ready for consumption > right now. Some preliminary information is given here: > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/mingwstatic-2014-11-readme.md Right, I read that :-). There's no way that I could sit down with that document and a clean windows install and replicate your mingw-w64 toolchain, though :-). Which, like I said, is totally fine at this stage in the process, I just wanted to make sure that this step is on the radar, b/c it will eventually become crucial. -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From cmkleffner at gmail.com Sun Jan 25 13:46:49 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sun, 25 Jan 2015 19:46:49 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-25 16:46 GMT+01:00 Nathaniel Smith : > On Sat, Jan 24, 2015 at 5:29 PM, Carl Kleffner > wrote: > > > > 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : > >> > >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner > >> wrote: > >> > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy > >> > wheels > >> > mentioned above are dependant on the installation of the OpenBLAS > based > >> > numpy and won't work i.e. with an installed numpy-MKL. > >> > >> This sounds like it probably needs to be fixed before we can recommend > >> the scipy wheels for anyone? OTOH it might be fine to start > >> distributing numpy wheels first. > > > > > > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead > of > > static linking to avoid bloat. This matters, because libopenblas.dll is a > > heavy library (around 30Mb for amd64). As a consequence all packages with > > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not > different > > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. > > The difference is that if we upload this as the standard scipy wheel, > and then someone goes "hey, look, a new scipy release just got > announced, 'pip upgrade scipy'", then the result will often be that > they just get random unexplained crashes. I think we should try to > avoid that kind of outcome, even if it means making some technical > compromises. The whole idea of having the wheels is to make fetching > particular versions seamless and robust, and the other kinds of builds > will still be available for those willing to invest more effort. > > One solution would be for the scipy wheel to explicitly depend on a > numpy+openblas wheel, so that someone doing 'pip install scipy' also > forced a numpy upgrade. But I think we should forget about trying this > given the current state of python packaging tools: pip/setuptools/etc. > are not really sophisticated enough to let us do this without a lot of > kluges and compromises, and anyway it is nicer to allow scipy and > numpy to be upgraded separately. > I've learned, that mark numpy with something like numpy+openblas is called "local version identifier": https://www.python.org/dev/peps/pep-0440/#local-version-identifiers These identifieres are not allowed for Pypi however. > > Another solution would be to just include openblas in both. This > bloats downloads, but I'd rather waste 30 MiB then waste users' time > fighting with random library incompatibility nonsense that they don't > care about. > > Another solution would be to split the openblas library off into its > own "python package", that just dropped the binary somewhere where it > could be found later, and then have both the numpy and scipy wheels > depend on this package. > Creating a dedicated OpenBLAS package and adding this package as an dependancy to numpy/scipy would also allow independant upgrade paths to OpenBLAS, numpy and scipy. The API of OpenBLAS seems to be stable enough to allow for that. Having an additional package dependancy is a minor problem, as pip can handle this automatically for the user. 
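(To make that mechanism concrete, here is a minimal sketch of how numpy or
scipy could locate and preload a DLL shipped by such a dedicated package at
import time; the package name "openblas" and its layout are hypothetical,
not an existing project:)

import ctypes
import os

def _preload_openblas():
    try:
        import openblas  # hypothetical helper package that only ships the DLL
    except ImportError:
        return  # fall back to whatever BLAS the extension modules were linked with
    dll = os.path.join(os.path.dirname(openblas.__file__), 'libopenblas.dll')
    if os.path.exists(dll):
        # Loading the DLL into the process once lets extension modules that
        # were linked against it resolve it by name afterwards.
        ctypes.CDLL(dll)

_preload_openblas()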
> We could start with the brute force solution (just including openblas > in both) for the first release, and then upgrade to the fancier > solution (both depend on a separate package) later. > > >> > For the numpy 32bit builds there are 3 failures for special FP value > >> > tests, > >> > due to a bug in mingw-w64 that is still present. All scipy versions > show > >> > up > >> > 7 failures with some numerical noise, that could be ignored (or > >> > corrected > >> > with relaxed asserts in the test code). > >> > > >> > PR's for numpy and scipy are in preparation. The mingw-w64 compiler > used > >> > for > >> > building can be found at > >> > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads. > >> > >> Correct me if I'm wrong, but it looks like there isn't any details on > >> how exactly the compiler was set up? Which is fine, I know you've been > >> doing a ton of work on this and it's much appreciated :-). But > >> eventually I do think a prerequisite for us adopting these as official > >> builds is that we'll need a text document (or an executable script!) > >> that walks through all the steps in setting up the toolchain etc., so > >> that someone starting from scratch could get it all up and running. > >> Otherwise we run the risk of eventually ending up back where we are > >> today, with a creaky old mingw binary snapshot that no-one knows how > >> it works or how to reproduce... > > > > > > This has to be done and is in preperation, but not ready for consumption > > right now. Some preliminary information is given here: > > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/mingwstatic-2014-11-readme.md > > Right, I read that :-). There's no way that I could sit down with that > document and a clean windows install and replicate your mingw-w64 > toolchain, though :-). Which, like I said, is totally fine at this > stage in the process, I just wanted to make sure that this step is on > the radar, b/c it will eventually become crucial. > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Jan 25 13:48:41 2015 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 25 Jan 2015 13:48:41 -0500 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. In-Reply-To: References: Message-ID: On Wed, Aug 13, 2014 at 6:17 PM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Its pretty easy to implement this table functionality and more on top of > the code I linked above. I still think such a comprehensive overhaul of > arraysetops is worth discussing. 
> > import numpy as np > import grouping > x = [1, 1, 1, 1, 2, 2, 2, 2, 2] > y = [3, 4, 3, 3, 3, 4, 5, 5, 5] > z = np.random.randint(0,2,(9,2)) > def table(*keys): > """ > desired table implementation, building on the index object > cleaner, and more functionality > performance should be the same > """ > indices = [grouping.as_index(k, axis=0) for k in keys] > uniques = [i.unique for i in indices] > inverses = [i.inverse for i in indices] > shape = [i.groups for i in indices] > t = np.zeros(shape, np.int) > np.add.at(t, inverses, 1) > return tuple(uniques), t > #here is how to use > print table(x,y) > #but we can use fancy keys as well; here a composite key and a row-key > print table((x,y), z) > #this effectively creates a sparse matrix equivalent of your desired table > print grouping.count((x,y)) > > > On Wed, Aug 13, 2014 at 11:25 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> >> >> >> On Wed, Aug 13, 2014 at 5:15 PM, Benjamin Root wrote: >> >>> The ever-wonderful pylab mode in matplotlib has a table function for >>> plotting a table of text in a plot. If I remember correctly, what would >>> happen is that matplotlib's table() function will simply obliterate the >>> numpy's table function. This isn't a show-stopper, I just wanted to point >>> that out. >>> >>> Personally, while I wasn't a particular fan of "count_unique" because I >>> wouldn't necessarially think of it when needing a contingency table, I do >>> like that it is verb-ish. "table()", in this sense, is not a verb. That >>> said, I am perfectly fine with it if you are fine with the name collision >>> in pylab mode. >>> >>> >> >> Thanks for pointing that out. I only changed it to have something that >> sounded more table-ish, like the Pandas, R and Matlab functions. I won't >> update it right now, but if there is interest in putting it into numpy, >> I'll rename it to avoid the pylab conflict. Anything along the lines of >> `crosstab`, `xtable`, etc., would be fine with me. >> >> Warren >> >> >> >>> On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser < >>> warren.weckesser at gmail.com> wrote: >>> >>>> >>>> >>>> >>>> On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn < >>>> hoogendoorn.eelco at gmail.com> wrote: >>>> >>>>> ah yes, that's also an issue I was trying to deal with. the semantics >>>>> I prefer in these type of operators, is (as a default), to have every array >>>>> be treated as a sequence of keys, so if calling unique(arr_2d), youd get >>>>> unique rows, unless you pass axis=None, in which case the array is >>>>> flattened. >>>>> >>>>> I also agree that the extension you propose here is useful; but >>>>> ideally, with a little more discussion on these subjects we can converge on >>>>> an even more comprehensive overhaul >>>>> >>>>> >>>>> On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn < >>>>>> hoogendoorn.eelco at gmail.com> wrote: >>>>>> >>>>>>> Thanks. Prompted by that stackoverflow question, and similar >>>>>>> problems I had to deal with myself, I started working on a much more >>>>>>> general extension to numpy's functionality in this space. Like you noted, >>>>>>> things get a little panda-y, but I think there is a lot of panda's >>>>>>> functionality that could or should be part of the numpy core, a robust set >>>>>>> of grouping operations in particular. 
>>>>>>> >>>>>>> see pastebin here: >>>>>>> http://pastebin.com/c5WLWPbp >>>>>>> >>>>>> >>>>>> On a side note, this is related to a pull request of mine from awhile >>>>>> back: https://github.com/numpy/numpy/pull/3584 >>>>>> >>>>>> There was a lot of disagreement on the mailing list about what to >>>>>> call a "unique slices along a given axis" function, so I wound up closing >>>>>> the pull request pending more discussion. >>>>>> >>>>>> At any rate, I think it's a useful thing to have in "base" numpy. >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> Update: I renamed the function to `table` in the pull request: >>>> https://github.com/numpy/numpy/pull/4958 >>>> >>>> >>>> Warren >>>> >>>> Hey all, I'm reviving this thread about the proposed `table` enhancement in https://github.com/numpy/numpy/pull/4958, because Chuck has poked me (via the pull request ) about it, so I'm poking the mailing list. Ignoring the issue of the name for the moment, is there any opposition to adding the proposed `table` function to numpy? I don't think it would preclude adding more powerful tools later, but that's not something I have time to work on at the moment. If the only issue is the name, I'm open to any suggestions. I started with `count_unique`, and changed it to `table`, but Benjamin pointed out the potential conflict of `table` with a matplotlib function. Warren _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Jan 25 14:00:11 2015 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 25 Jan 2015 14:00:11 -0500 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. In-Reply-To: References: Message-ID: On Sun, Jan 25, 2015 at 1:48 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > > On Wed, Aug 13, 2014 at 6:17 PM, Eelco Hoogendoorn < > hoogendoorn.eelco at gmail.com> wrote: > >> Its pretty easy to implement this table functionality and more on top of >> the code I linked above. I still think such a comprehensive overhaul of >> arraysetops is worth discussing. 
>> >> import numpy as np >> import grouping >> x = [1, 1, 1, 1, 2, 2, 2, 2, 2] >> y = [3, 4, 3, 3, 3, 4, 5, 5, 5] >> z = np.random.randint(0,2,(9,2)) >> def table(*keys): >> """ >> desired table implementation, building on the index object >> cleaner, and more functionality >> performance should be the same >> """ >> indices = [grouping.as_index(k, axis=0) for k in keys] >> uniques = [i.unique for i in indices] >> inverses = [i.inverse for i in indices] >> shape = [i.groups for i in indices] >> t = np.zeros(shape, np.int) >> np.add.at(t, inverses, 1) >> return tuple(uniques), t >> #here is how to use >> print table(x,y) >> #but we can use fancy keys as well; here a composite key and a row-key >> print table((x,y), z) >> #this effectively creates a sparse matrix equivalent of your desired table >> print grouping.count((x,y)) >> >> >> On Wed, Aug 13, 2014 at 11:25 PM, Warren Weckesser < >> warren.weckesser at gmail.com> wrote: >> >>> >>> >>> >>> On Wed, Aug 13, 2014 at 5:15 PM, Benjamin Root wrote: >>> >>>> The ever-wonderful pylab mode in matplotlib has a table function for >>>> plotting a table of text in a plot. If I remember correctly, what would >>>> happen is that matplotlib's table() function will simply obliterate the >>>> numpy's table function. This isn't a show-stopper, I just wanted to point >>>> that out. >>>> >>>> Personally, while I wasn't a particular fan of "count_unique" because I >>>> wouldn't necessarially think of it when needing a contingency table, I do >>>> like that it is verb-ish. "table()", in this sense, is not a verb. That >>>> said, I am perfectly fine with it if you are fine with the name collision >>>> in pylab mode. >>>> >>>> >>> >>> Thanks for pointing that out. I only changed it to have something that >>> sounded more table-ish, like the Pandas, R and Matlab functions. I won't >>> update it right now, but if there is interest in putting it into numpy, >>> I'll rename it to avoid the pylab conflict. Anything along the lines of >>> `crosstab`, `xtable`, etc., would be fine with me. >>> >>> Warren >>> >>> >>> >>>> On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser < >>>> warren.weckesser at gmail.com> wrote: >>>> >>>>> >>>>> >>>>> >>>>> On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn < >>>>> hoogendoorn.eelco at gmail.com> wrote: >>>>> >>>>>> ah yes, that's also an issue I was trying to deal with. the semantics >>>>>> I prefer in these type of operators, is (as a default), to have every array >>>>>> be treated as a sequence of keys, so if calling unique(arr_2d), youd get >>>>>> unique rows, unless you pass axis=None, in which case the array is >>>>>> flattened. >>>>>> >>>>>> I also agree that the extension you propose here is useful; but >>>>>> ideally, with a little more discussion on these subjects we can converge on >>>>>> an even more comprehensive overhaul >>>>>> >>>>>> >>>>>> On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn < >>>>>>> hoogendoorn.eelco at gmail.com> wrote: >>>>>>> >>>>>>>> Thanks. Prompted by that stackoverflow question, and similar >>>>>>>> problems I had to deal with myself, I started working on a much more >>>>>>>> general extension to numpy's functionality in this space. Like you noted, >>>>>>>> things get a little panda-y, but I think there is a lot of panda's >>>>>>>> functionality that could or should be part of the numpy core, a robust set >>>>>>>> of grouping operations in particular. 
>>>>>>>> >>>>>>>> see pastebin here: >>>>>>>> http://pastebin.com/c5WLWPbp >>>>>>>> >>>>>>> >>>>>>> On a side note, this is related to a pull request of mine from >>>>>>> awhile back: https://github.com/numpy/numpy/pull/3584 >>>>>>> >>>>>>> There was a lot of disagreement on the mailing list about what to >>>>>>> call a "unique slices along a given axis" function, so I wound up closing >>>>>>> the pull request pending more discussion. >>>>>>> >>>>>>> At any rate, I think it's a useful thing to have in "base" numpy. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> NumPy-Discussion mailing list >>>>>>> NumPy-Discussion at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>> >>>>>> >>>>> >>>>> Update: I renamed the function to `table` in the pull request: >>>>> https://github.com/numpy/numpy/pull/4958 >>>>> >>>>> >>>>> Warren >>>>> >>>>> > > Hey all, > > I'm reviving this thread about the proposed `table` enhancement in > https://github.com/numpy/numpy/pull/4958, because Chuck has poked me (via > the pull request ) about it, so I'm poking the mailing list. Ignoring the > issue of the name for the moment, is there any opposition to adding the > proposed `table` function to numpy? I don't think it would preclude adding > more powerful tools later, but that's not something I have time to work on > at the moment. > > If the only issue is the name, I'm open to any suggestions. I started > with `count_unique`, and changed it to `table`, but Benjamin pointed out > the potential conflict of `table` with a matplotlib function. > > Warren > Looks like the original email in the thread is not part of the quoted (and somewhat disordered) emails. Here's my original email from last August: http://mail.scipy.org/pipermail/numpy-discussion/2014-August/070941.html Warren > > > > > _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldcroft at head.cfa.harvard.edu Sun Jan 25 14:32:14 2015 From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas) Date: Sun, 25 Jan 2015 14:32:14 -0500 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. In-Reply-To: References: Message-ID: On Tue, Aug 12, 2014 at 12:17 PM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Thanks. Prompted by that stackoverflow question, and similar problems I > had to deal with myself, I started working on a much more general extension > to numpy's functionality in this space. 
Like you noted, things get a little > panda-y, but I think there is a lot of panda's functionality that could or > should be part of the numpy core, a robust set of grouping operations in > particular. > FYI I wrote some table grouping operations (join, hstack, vstack) for numpy some time ago, available here: https://github.com/astropy/astropy/blob/v0.4.x/astropy/table/np_utils.py These are part of the astropy project but this module has no actual astropy dependencies apart from a local backport of OrderedDict for Python < 2.7. Cheers, Tom > see pastebin here: > http://pastebin.com/c5WLWPbp > > Ive posted about it on this list before, but without apparent interest; > and I havnt gotten around to getting this up to professional standards yet > either. But there is a lot more that could be done in this direction. > > Note that the count functionality in the stackoverflow answer is > relatively indirect and inefficient, using the inverse_index and such. A > much more efficient method is obtained by the code used here. > > > On Tue, Aug 12, 2014 at 5:57 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> >> >> >> On Tue, Aug 12, 2014 at 11:35 AM, Warren Weckesser < >> warren.weckesser at gmail.com> wrote: >> >>> I created a pull request (https://github.com/numpy/numpy/pull/4958) >>> that defines the function `count_unique`. `count_unique` generates a >>> contingency table from a collection of sequences. For example, >>> >>> In [7]: x = [1, 1, 1, 1, 2, 2, 2, 2, 2] >>> >>> In [8]: y = [3, 4, 3, 3, 3, 4, 5, 5, 5] >>> >>> In [9]: (xvals, yvals), counts = count_unique(x, y) >>> >>> In [10]: xvals >>> Out[10]: array([1, 2]) >>> >>> In [11]: yvals >>> Out[11]: array([3, 4, 5]) >>> >>> In [12]: counts >>> Out[12]: >>> array([[3, 1, 0], >>> [1, 1, 3]]) >>> >>> >>> It can be interpreted as a multi-argument generalization of >>> `np.unique(x, return_counts=True)`. >>> >>> It overlaps with Pandas' `crosstab`, but I think this is a pretty >>> fundamental counting operation that fits in numpy. >>> >>> Matlab's `crosstab` (http://www.mathworks.com/help/stats/crosstab.html) >>> and R's `table` perform the same calculation (with a few more bells and >>> whistles). >>> >>> >>> For comparison, here's Pandas' `crosstab` (same `x` and `y` as above): >>> >>> In [28]: import pandas as pd >>> >>> In [29]: xs = pd.Series(x) >>> >>> In [30]: ys = pd.Series(y) >>> >>> In [31]: pd.crosstab(xs, ys) >>> Out[31]: >>> col_0 3 4 5 >>> row_0 >>> 1 3 1 0 >>> 2 1 1 3 >>> >>> >>> And here is R's `table`: >>> >>> > x <- c(1,1,1,1,2,2,2,2,2) >>> > y <- c(3,4,3,3,3,4,5,5,5) >>> > table(x, y) >>> y >>> x 3 4 5 >>> 1 3 1 0 >>> 2 1 1 3 >>> >>> >>> Is there any interest in adding this (or some variation of it) to numpy? >>> >>> >>> Warren >>> >>> >> >> While searching StackOverflow in the numpy tag for "count unique", I just >> discovered that I basically reinvented Eelco Hoogendoorn's code in his >> answer to >> http://stackoverflow.com/questions/10741346/numpy-frequency-counts-for-unique-values-in-an-array. >> Nice one, Eelco! >> >> Warren >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
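(For reference, the counting itself needs nothing beyond np.unique and np.add.at; a minimal sketch of the technique -- not the code from the pull request -- looks like this:

import numpy as np

def crosstab(x, y):
    # Map each value to its index among the sorted unique values, then count pairs.
    xvals, xinv = np.unique(x, return_inverse=True)
    yvals, yinv = np.unique(y, return_inverse=True)
    counts = np.zeros((len(xvals), len(yvals)), dtype=int)
    np.add.at(counts, (xinv, yinv), 1)
    return (xvals, yvals), counts

x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
(xvals, yvals), counts = crosstab(x, y)
print(counts)   # [[3 1 0]
                #  [1 1 3]]

The generalization to more than two key sequences follows the same pattern.)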
URL: From njs at pobox.com Sun Jan 25 15:23:11 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 25 Jan 2015 20:23:11 +0000 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On 25 Jan 2015 18:46, "Carl Kleffner" wrote: > > 2015-01-25 16:46 GMT+01:00 Nathaniel Smith : >> >> On Sat, Jan 24, 2015 at 5:29 PM, Carl Kleffner wrote: >> > >> > 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : >> >> >> >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner >> >> wrote: >> >> > OpenBLAS is deployed as part of the numpy wheel. That said, the scipy >> >> > wheels >> >> > mentioned above are dependant on the installation of the OpenBLAS based >> >> > numpy and won't work i.e. with an installed numpy-MKL. >> >> >> >> This sounds like it probably needs to be fixed before we can recommend >> >> the scipy wheels for anyone? OTOH it might be fine to start >> >> distributing numpy wheels first. >> > >> > >> > I very much prefer dynamic linking to numpy\core\libopenblas.dll instead of >> > static linking to avoid bloat. This matters, because libopenblas.dll is a >> > heavy library (around 30Mb for amd64). As a consequence all packages with >> > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not different >> > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. >> >> The difference is that if we upload this as the standard scipy wheel, >> and then someone goes "hey, look, a new scipy release just got >> announced, 'pip upgrade scipy'", then the result will often be that >> they just get random unexplained crashes. I think we should try to >> avoid that kind of outcome, even if it means making some technical >> compromises. The whole idea of having the wheels is to make fetching >> particular versions seamless and robust, and the other kinds of builds >> will still be available for those willing to invest more effort. >> >> One solution would be for the scipy wheel to explicitly depend on a >> numpy+openblas wheel, so that someone doing 'pip install scipy' also >> forced a numpy upgrade. But I think we should forget about trying this >> given the current state of python packaging tools: pip/setuptools/etc. >> are not really sophisticated enough to let us do this without a lot of >> kluges and compromises, and anyway it is nicer to allow scipy and >> numpy to be upgraded separately. > > > I've learned, that mark numpy with something like numpy+openblas is called "local version identifier": https://www.python.org/dev/peps/pep-0440/#local-version-identifiers > These identifieres are not allowed for Pypi however. Right, it's fine for the testing wheels, but even if it were allowed on pypi then it still wouldn't let us specify the correct dependency -- we'd have to say that scipy build X depends on exactly numpy 1.9.1+openblas, not numpy +openblas. So then when a new version of numpy was uploaded it'd be impossible to upgrade without also rebuilding numpy. Alternatively pip would be within its rights to simply ignore the local version part, because "Local version identifiers are used to denote fully API (and, if applicable, ABI) compatible patched versions of upstream projects." Here the +openblas is exactly designed to communicate ABI incompatibility. Soooooo yeah this is ugly all around. Pip and friends are getting better but they're just not up to this kind of thing. >> Another solution would be to just include openblas in both. 
This >> bloats downloads, but I'd rather waste 30 MiB then waste users' time >> fighting with random library incompatibility nonsense that they don't >> care about. >> >> Another solution would be to split the openblas library off into its >> own "python package", that just dropped the binary somewhere where it >> could be found later, and then have both the numpy and scipy wheels >> depend on this package. > > > Creating a dedicated OpenBLAS package and adding this package as an dependancy to numpy/scipy would also allow independant upgrade paths to OpenBLAS, numpy and scipy. The API of OpenBLAS seems to be stable enough to allow for that. Having an additional package dependancy is a minor problem, as pip can handle this automatically for the user. Exactly. We might even want to give it a tiny python wrapper, e.g. you do import openblas openblas.add_to_library_path() and that would be a little function that modifies LD_LIBRARY_PATH or calls AddDllDirectory etc. as appropriate, so that code linking to openblas can ignore all these details. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Jan 25 16:15:59 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 25 Jan 2015 13:15:59 -0800 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: Hi, On Sun, Jan 25, 2015 at 12:23 PM, Nathaniel Smith wrote: > On 25 Jan 2015 18:46, "Carl Kleffner" wrote: >> >> 2015-01-25 16:46 GMT+01:00 Nathaniel Smith : >>> >>> On Sat, Jan 24, 2015 at 5:29 PM, Carl Kleffner >>> wrote: >>> > >>> > 2015-01-23 0:23 GMT+01:00 Nathaniel Smith : >>> >> >>> >> On Thu, Jan 22, 2015 at 9:29 PM, Carl Kleffner >>> >> wrote: >>> >> > OpenBLAS is deployed as part of the numpy wheel. That said, the >>> >> > scipy >>> >> > wheels >>> >> > mentioned above are dependant on the installation of the OpenBLAS >>> >> > based >>> >> > numpy and won't work i.e. with an installed numpy-MKL. >>> >> >>> >> This sounds like it probably needs to be fixed before we can recommend >>> >> the scipy wheels for anyone? OTOH it might be fine to start >>> >> distributing numpy wheels first. >>> > >>> > >>> > I very much prefer dynamic linking to numpy\core\libopenblas.dll >>> > instead of >>> > static linking to avoid bloat. This matters, because libopenblas.dll is >>> > a >>> > heavy library (around 30Mb for amd64). As a consequence all packages >>> > with >>> > dynamic linkage to OpenBLAS depend on numpy-openblas. This is not >>> > different >>> > to scipy-MKL that has a dependancy to numpy-MKL - see C. Gohlke's site. >>> >>> The difference is that if we upload this as the standard scipy wheel, >>> and then someone goes "hey, look, a new scipy release just got >>> announced, 'pip upgrade scipy'", then the result will often be that >>> they just get random unexplained crashes. I think we should try to >>> avoid that kind of outcome, even if it means making some technical >>> compromises. The whole idea of having the wheels is to make fetching >>> particular versions seamless and robust, and the other kinds of builds >>> will still be available for those willing to invest more effort. >>> >>> One solution would be for the scipy wheel to explicitly depend on a >>> numpy+openblas wheel, so that someone doing 'pip install scipy' also >>> forced a numpy upgrade. But I think we should forget about trying this >>> given the current state of python packaging tools: pip/setuptools/etc. 
>>> are not really sophisticated enough to let us do this without a lot of >>> kluges and compromises, and anyway it is nicer to allow scipy and >>> numpy to be upgraded separately. >> >> >> I've learned, that mark numpy with something like numpy+openblas is called >> "local version identifier": >> https://www.python.org/dev/peps/pep-0440/#local-version-identifiers >> These identifieres are not allowed for Pypi however. > > Right, it's fine for the testing wheels, but even if it were allowed on pypi > then it still wouldn't let us specify the correct dependency -- we'd have to > say that scipy build X depends on exactly numpy 1.9.1+openblas, not numpy > +openblas. So then when a new version of numpy was uploaded it'd > be impossible to upgrade without also rebuilding numpy. > > Alternatively pip would be within its rights to simply ignore the local > version part, because "Local version identifiers are used to denote fully > API (and, if applicable, ABI) compatible patched versions of upstream > projects." Here the +openblas is exactly designed to communicate ABI > incompatibility. > > Soooooo yeah this is ugly all around. Pip and friends are getting better but > they're just not up to this kind of thing. I agree, that shipping openblas with both numpy and scipy seems perfectly reasonable to me - I don't think anyone will much care about the 30M, and I think our job is to make something that works with the least complexity and likelihood of error. It would be good to rename the dll according to the package and version though, to avoid a scipy binary using a pre-loaded but incompatible 'libopenblas.dll'. Say something like openblas-scipy-0.15.1.dll - on the basis that there can only be one copy of scipy loaded at a time. Cheers, Matthew From olivier.grisel at ensta.org Sun Jan 25 17:14:06 2015 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Sun, 25 Jan 2015 23:14:06 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: +1 for bundling OpenBLAS both in scipy and numpy in the short term. Introducing a new dependency project for OpenBLAS sounds like a good idea but this is probably more work. -- Olivier From sturla.molden at gmail.com Sun Jan 25 20:16:31 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 26 Jan 2015 02:16:31 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On 25/01/15 22:15, Matthew Brett wrote: > I agree, that shipping openblas with both numpy and scipy seems > perfectly reasonable to me - I don't think anyone will much care about > the 30M, and I think our job is to make something that works with the > least complexity and likelihood of error. Yes. Make something that works first, optimize for space later. > It would be good to rename the dll according to the package and > version though, to avoid a scipy binary using a pre-loaded but > incompatible 'libopenblas.dll'. Say something like > openblas-scipy-0.15.1.dll - on the basis that there can only be one > copy of scipy loaded at a time. That is a good idea and we should do this for NumPy too I think. Sturla From jensj at fysik.dtu.dk Mon Jan 26 03:24:06 2015 From: jensj at fysik.dtu.dk (=?UTF-8?B?SmVucyBKw7hyZ2VuIE1vcnRlbnNlbg==?=) Date: Mon, 26 Jan 2015 09:24:06 +0100 Subject: [Numpy-discussion] Float view of complex array Message-ID: <54C5F9A6.9080109@fysik.dtu.dk> Hi! 
I have a view of a 2-d complex array that I would like to view as a 2-d float array. This works OK: >>> np.ones((2, 4), complex).view(float) array([[ 1., 0., 1., 0., 1., 0., 1., 0.], [ 1., 0., 1., 0., 1., 0., 1., 0.]]) but this doesn't: >>> np.ones((2, 4), complex)[:, :2].view(float) Traceback (most recent call last): File "", line 1, in ValueError: new type not compatible with array. >>> np.__version__ '1.9.0' and I don't understand why. When looking at the memory layout, I think it should be possible. Jens J?rgen From maniteja.modesty067 at gmail.com Mon Jan 26 04:27:35 2015 From: maniteja.modesty067 at gmail.com (Maniteja Nandana) Date: Mon, 26 Jan 2015 14:57:35 +0530 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: <54C5F9A6.9080109@fysik.dtu.dk> References: <54C5F9A6.9080109@fysik.dtu.dk> Message-ID: Hi Jens, I don't have enough knowledge about the internal memory layout, but the documentation ndarray.view says that: Views that change the dtype size (bytes per entry) should normally be avoided on arrays defined by slices, transposes, fortran-ordering, etc.: In your case, creating a *copy *of the slice and then calling *view *works. >>>a array([[ 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j], [ 1.+0.j, 1.+0.j, 1.+0.j, 1.+0.j]]) >>> a.view(float) array([[ 1., 0., 1., 0., 1., 0., 1., 0.], [ 1., 0., 1., 0., 1., 0., 1., 0.]]) >>> b=a[:,:2].copy() >>> b.view(float) array([[ 1., 0., 1., 0.], [ 1., 0., 1., 0.]]) >>> c=a[:,:2] >>> c.view(float) Traceback (most recent call last): File "", line 1, in ValueError: new type not compatible with array Hope it helps :) Cheers, N.Maniteja. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jan 26 04:41:35 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Jan 2015 10:41:35 +0100 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: <54C5F9A6.9080109@fysik.dtu.dk> References: <54C5F9A6.9080109@fysik.dtu.dk> Message-ID: <1422265295.10406.1.camel@sebastian-t440> On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > Hi! > > I have a view of a 2-d complex array that I would like to view as a 2-d > float array. This works OK: > > >>> np.ones((2, 4), complex).view(float) > array([[ 1., 0., 1., 0., 1., 0., 1., 0.], > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > but this doesn't: > > >>> np.ones((2, 4), complex)[:, :2].view(float) > Traceback (most recent call last): > File "", line 1, in > ValueError: new type not compatible with array. > >>> np.__version__ > '1.9.0' > > and I don't understand why. When looking at the memory layout, I think > it should be possible. > Yes, it should be possible, but it is not :). You could hack it by using `np.ndarray` (or stride tricks). Or maybe you are interested making the checks whether it makes sense or not less strict. - Sebastian > Jens J?rgen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
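Spelled out, the np.ndarray hack Sebastian mentions could look something like this. It is only a sketch and assumes the slice starts at the first element of the parent array (otherwise an offset has to be computed as well):

import numpy as np

a = np.ones((2, 4), complex)
b = a[:, :2]                     # b.view(float) raises ValueError here

# Same memory, reinterpreted by hand: every complex number becomes a
# (real, imag) pair, so the row length doubles and the inner stride is 8 bytes.
c = np.ndarray(shape=(b.shape[0], 2 * b.shape[1]),   # (2, 4) floats
               dtype=float,
               buffer=a,                              # a owns the memory (offset 0)
               strides=(b.strides[0], 8))             # row stride 64, float stride 8

print(c)
# [[ 1.  0.  1.  0.]
#  [ 1.  0.  1.  0.]]
c[0, 1] = 5.0                    # writes the imaginary part of a[0, 0]
print(a[0, 0])                   # (1+5j)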
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From jaime.frio at gmail.com Mon Jan 26 05:02:20 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Mon, 26 Jan 2015 02:02:20 -0800 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: <1422265295.10406.1.camel@sebastian-t440> References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> Message-ID: On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg wrote: > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > Hi! > > > > I have a view of a 2-d complex array that I would like to view as a 2-d > > float array. This works OK: > > > > >>> np.ones((2, 4), complex).view(float) > > array([[ 1., 0., 1., 0., 1., 0., 1., 0.], > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > but this doesn't: > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > Traceback (most recent call last): > > File "", line 1, in > > ValueError: new type not compatible with array. > > >>> np.__version__ > > '1.9.0' > > > > and I don't understand why. When looking at the memory layout, I think > > it should be possible. > > > > Yes, it should be possible, but it is not :). You could hack it by using > `np.ndarray` (or stride tricks). Or maybe you are interested making the > checks whether it makes sense or not less strict. > How would it be possible? He goes from an array with 16 byte strides along the last axis: r0i0, r1i1, r2i2, r3i3 to one with 32 byte strides, which is OK r0i0, xxxx, r2i2, xxxx but everything breaks down when he wants to have alternating strides of 8 and 24 bytes: r0, i0, xxxx, r2, i2, xxxx which cannot be hacked in any sensible way. What I think could be made to work, but also fails, is this: np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) Here the original strides are (64, 16, xx) and the resulting view should have strides (64, 32, 8), not sure what trips this. Jaime > > - Sebastian > > > Jens J?rgen > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jan 26 05:23:22 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 26 Jan 2015 11:23:22 +0100 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> Message-ID: <1422267802.11549.1.camel@sebastian-t440> On Mo, 2015-01-26 at 02:02 -0800, Jaime Fern?ndez del R?o wrote: > On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg > wrote: > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > Hi! > > > > I have a view of a 2-d complex array that I would like to > view as a 2-d > > float array. 
This works OK: > > > > >>> np.ones((2, 4), complex).view(float) > > array([[ 1., 0., 1., 0., 1., 0., 1., 0.], > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > but this doesn't: > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > Traceback (most recent call last): > > File "", line 1, in > > ValueError: new type not compatible with array. > > >>> np.__version__ > > '1.9.0' > > > > and I don't understand why. When looking at the memory > layout, I think > > it should be possible. > > > > Yes, it should be possible, but it is not :). You could hack > it by using > `np.ndarray` (or stride tricks). Or maybe you are interested > making the > checks whether it makes sense or not less strict. > > > How would it be possible? He goes from an array with 16 byte strides > along the last axis: > Oh, sorry, you are right of course. I thought it was going the other way around, from double -> complex. That way could work (in this case I think), but does not currently. > > r0i0, r1i1, r2i2, r3i3 > > > to one with 32 byte strides, which is OK > > > r0i0, xxxx, r2i2, xxxx > > > but everything breaks down when he wants to have alternating strides > of 8 and 24 bytes: > > > r0, i0, xxxx, r2, i2, xxxx > > > which cannot be hacked in any sensible way. > > > What I think could be made to work, but also fails, is this: > > > np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) > > > > Here the original strides are (64, 16, xx) and the resulting view > should have strides (64, 32, 8), not sure what trips this. > > > Jaime > > > > - Sebastian > > > Jens J?rgen > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From dieter.van.eessen at gmail.com Mon Jan 26 06:06:44 2015 From: dieter.van.eessen at gmail.com (Dieter Van Eessen) Date: Mon, 26 Jan 2015 12:06:44 +0100 Subject: [Numpy-discussion] 3D array and the right hand rule Message-ID: Hello, I'm a novice with respect to scientific computing using python. I've read that numpy.array isn't arranged according to the 'right-hand-rule' (right-hand-rule => thumb = +x; index finger = +y, bend middle finder = +z). This is also confirmed by an old message I dug up from the mailing list archives. (see message below) I guess this has consequences for certain algorithms (been a while, should actually revise some algebra textbooks). At least it does when I try to mentally visualize a 3D array when I'm not sure which dimensions to use... What are the consequences in using 3D arrays in for example transformation algorithms or other algorithms which expect certain shape? Should I always 'reshape' before using them or do most algorithms take this into account in the underlying algebra? 
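To make my confusion a bit more concrete, this is the kind of thing I have been looking at (please correct me if I am misreading the attributes):

import numpy as np

a = np.arange(24.0).reshape(4, 2, 3)    # C order: shape (4, 2, 3)
print(a.strides)                        # (48, 24, 8): last index varies fastest

f = np.asfortranarray(a)                # same values, Fortran layout
print(f.strides)                        # (8, 32, 64): first index varies fastest

print(a.T.strides)                      # (8, 24, 48): transposing only swaps strides
print(a.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])   # True True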
Does the 'Fortran contiguous' shape always respect the 'right-hand-rule' (for 3D arrays)? And just to be sure: Is the information telling if an array is C-contiguous or Fortran-contiguous always stored within the array? kind regards, Dieter / Old message From: Anne Archibald gmail.com> Subject: Re: dimension aligment Newsgroups: gmane.comp.python.numeric.general Date: 2008-05-20 18:04:46 GMT (6 years, 35 weeks, 5 days, 4 hours and 34 minutes ago) 2008/5/20 Thomas Hrabe burnham.org>: > given a *3d* *array* > a = > numpy.*array*([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]],[[13,14,15],[16,17,18]],[[19,20,21],[22,23,24]]]) > a.shape > returns (4,2,3) > > so I assume the first digit is the 3rd dimension, second is 2nd dim and > third is the first. > > how is the data aligned in memory now? > according to the strides it should be > 1,2,3,4,5,6,7,8,9,10,... > *right*? > > if I had an *array* of more dimensions, the first digit returned by shape > should always be the highest dim. You are basically *right*, but this is a surprisingly subtle issue for numpy. A numpy *array* is basically a block of memory and some description. One piece of that description is the type of data it contains (i.e., how to interpret each chunk of memory) for example int32, float64, etc. Another is the sizes of all the various dimensions. A third piece, which makes many of the things numpy does possible, is the "strides". The way numpy works is that basically it translates A[i,j,k] into a lookup of the item in the memory block at position i*strides[0]+j*strides[1]+k*strides[2] This means, if you have an *array* A and you want every second element (A[::2]), all numpy needs to do is *hand* you back a new *array* pointing to the same data block, but with strides[0] doubled. Similarly if you want to transpose a two-dimensional *array*, all it needs to do is exchange strides[0] and strides[1]; no data need be moved. This means, though, that if you are *hand*ed a numpy *array*, the elements can be arranged in memory in quite a complicated fashion. Sometimes this is no problem - you can always use the strides to find it all. But sometimes you need the data arranged in a particular way. numpy defines two particular ways: "C contiguous" and "FORTRAN contiguous". "C contiguous" *array*s are what you describe, and they're what numpy produces by default; they are arranged so that the *right*most index has the smallest stride. "FORTRAN contiguous" *array*s are arranged the other way around; the leftmost index has the smallest stride. (This is how FORTRAN *array*s are arranged in memory.) There is also a special case: the reshape() function changes the shape of the *array*. It has an "order" argument that describes not how the elements are arranged in memory but how you want to think of the elements as arranged in memory for the reshape operation. Anne /Old message -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Mon Jan 26 10:30:43 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Mon, 26 Jan 2015 16:30:43 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: Thanks for all your ideas. The next version will contain an augumented libopenblas.dll in both numpy and scipy. On the long term I would prefer an external openblas wheel package, if there is an agreement about this among numpy-dev. Another idea for the future is to conditionally load a debug version of libopenblas instead. 
Together with the backtrace.dll (part of mingwstatic, but undocumentated right now) a meaningfull stacktrace in case of segfaults inside the code comiled with mingwstatic will be given. 2015-01-26 2:16 GMT+01:00 Sturla Molden : > On 25/01/15 22:15, Matthew Brett wrote: > > > I agree, that shipping openblas with both numpy and scipy seems > > perfectly reasonable to me - I don't think anyone will much care about > > the 30M, and I think our job is to make something that works with the > > least complexity and likelihood of error. > > Yes. Make something that works first, optimize for space later. > > > > It would be good to rename the dll according to the package and > > version though, to avoid a scipy binary using a pre-loaded but > > incompatible 'libopenblas.dll'. Say something like > > openblas-scipy-0.15.1.dll - on the basis that there can only be one > > copy of scipy loaded at a time. > > That is a good idea and we should do this for NumPy too I think. > > > > Sturla > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Jan 26 18:16:34 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 27 Jan 2015 00:16:34 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On 26/01/15 16:30, Carl Kleffner wrote: > Thanks for all your ideas. The next version will contain an augumented > libopenblas.dll in both numpy and scipy. On the long term I would > prefer an external openblas wheel package, if there is an agreement > about this among numpy-dev. Thanks for all your great work on this. An OpenBLAS wheel might be a good idea. Probably we should have some sort of instruction on the website how to install the binary wheel. And then we could include the OpenBLAS wheel in the instruction. Or we could have the OpenBLAS wheel as a part of the scipy stack. But make the bloated SciPy and NumPy wheels work first, then we can worry about a dedicated OpenBLAS wheel later :-) > Another idea for the future is to conditionally load a debug version of > libopenblas instead. Together with the backtrace.dll (part of > mingwstatic, but undocumentated right now) a meaningfull stacktrace in > case of segfaults inside the code comiled with mingwstatic will be given. An OpenBLAS wheel could also include multiple architectures. We can compile OpenBLAS for any kind of CPUs and and install the one that fits best with the computer. Also note that an OpenBLAS wheel could be useful on Linux. It is clearly superior to the ATLAS libraries that most distros ship. If we make a binary wheel that works for Windows, we are almost there for Linux too :-) For Apple we don't need OpenBLAS anymore. On OSX 10.9 and 10.10 Accelerate Framework is actually faster than MKL under many circumstances. DGEMM is about the same, but e.g. DAXPY and DDOT are faster in Accelerate. Sturla From yw5aj at virginia.edu Mon Jan 26 22:29:35 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Mon, 26 Jan 2015 22:29:35 -0500 Subject: [Numpy-discussion] F2PY cannot see module-scope variables Message-ID: Dear all, Sorry about being new to both Fortran 90 and f2py. I have a module in fortran, written as follows, with a module-scope variable dp: ======================================== ! 
testf2py.f90 module testf2py implicit none private public dp, i1 integer, parameter :: dp=kind(0.d0) contains real(dp) function i1(m) real(dp), intent(in) :: m(3, 3) i1 = m(1, 1) + m(2, 2) + m(3, 3) return end function i1 end module testf2py ======================================== Then, if I run f2py -c testf2py.f90 -m testf2py It would report an error, stating that dp was not declared. If I copy the module-scope to the function-scope, it would work. ======================================== ! testf2py.f90 module testf2py implicit none private public i1 integer, parameter :: dp=kind(0.d0) contains real(dp) function i1(m) integer, parameter :: dp=kind(0.d0) real(dp), intent(in) :: m(3, 3) i1 = m(1, 1) + m(2, 2) + m(3, 3) return end function i1 end module testf2py ======================================== However, this does not look like the best coding practice though, as it is pretty "wet". Any ideas? Thanks, Shawn -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From yw5aj at virginia.edu Mon Jan 26 22:31:16 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Mon, 26 Jan 2015 22:31:16 -0500 Subject: [Numpy-discussion] F2PY cannot see module-scope variables In-Reply-To: References: Message-ID: Sorry that I forgot to report the environment - Windows 64 bit, Python 3.4 64 bit. Numpy version is 1.9.1, and I commented the "raise NotImplementedError("Only MS compiler supported with gfortran on win64")" in the gnu.py, as instructed on this link: http://scientificcomputingco.blogspot.com.au/2013/02/f2py-on-64bit-windows-python27.html On Mon, Jan 26, 2015 at 10:29 PM, Yuxiang Wang wrote: > Dear all, > > Sorry about being new to both Fortran 90 and f2py. > > I have a module in fortran, written as follows, with a module-scope variable dp: > > ======================================== > ! testf2py.f90 > module testf2py > implicit none > private > public dp, i1 > integer, parameter :: dp=kind(0.d0) > contains > real(dp) function i1(m) > real(dp), intent(in) :: m(3, 3) > i1 = m(1, 1) + m(2, 2) + m(3, 3) > return > end function i1 > end module testf2py > ======================================== > > Then, if I run f2py -c testf2py.f90 -m testf2py > > It would report an error, stating that dp was not declared. > > If I copy the module-scope to the function-scope, it would work. > > ======================================== > ! testf2py.f90 > module testf2py > implicit none > private > public i1 > integer, parameter :: dp=kind(0.d0) > contains > real(dp) function i1(m) > integer, parameter :: dp=kind(0.d0) > real(dp), intent(in) :: m(3, 3) > i1 = m(1, 1) + m(2, 2) + m(3, 3) > return > end function i1 > end module testf2py > ======================================== > > However, this does not look like the best coding practice though, as > it is pretty "wet". > > Any ideas? 
> > Thanks, > > Shawn > > -- > Yuxiang "Shawn" Wang > Gerling Research Lab > University of Virginia > yw5aj at virginia.edu > +1 (434) 284-0836 > https://sites.google.com/a/virginia.edu/yw5aj/ -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From warren.weckesser at gmail.com Mon Jan 26 23:56:19 2015 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 26 Jan 2015 23:56:19 -0500 Subject: [Numpy-discussion] F2PY cannot see module-scope variables In-Reply-To: References: Message-ID: On 1/26/15, Yuxiang Wang wrote: > Dear all, > > Sorry about being new to both Fortran 90 and f2py. > > I have a module in fortran, written as follows, with a module-scope variable > dp: > > ======================================== > ! testf2py.f90 > module testf2py > implicit none > private > public dp, i1 > integer, parameter :: dp=kind(0.d0) > contains > real(dp) function i1(m) > real(dp), intent(in) :: m(3, 3) > i1 = m(1, 1) + m(2, 2) + m(3, 3) > return > end function i1 > end module testf2py > ======================================== > > Then, if I run f2py -c testf2py.f90 -m testf2py > > It would report an error, stating that dp was not declared. > > If I copy the module-scope to the function-scope, it would work. > > ======================================== > ! testf2py.f90 > module testf2py > implicit none > private > public i1 > integer, parameter :: dp=kind(0.d0) > contains > real(dp) function i1(m) > integer, parameter :: dp=kind(0.d0) > real(dp), intent(in) :: m(3, 3) > i1 = m(1, 1) + m(2, 2) + m(3, 3) > return > end function i1 > end module testf2py > ======================================== > > However, this does not look like the best coding practice though, as > it is pretty "wet". > > Any ideas? > > Thanks, > > Shawn > Shawn, I posted a suggestion as an answer to your question on stackoverflow: http://stackoverflow.com/questions/28162922/f2py-cannot-see-module-scope-variables For the mailing-list-only folks, here's what I wrote: Here's a work-around, in which `dp` is moved to a `types` module, and the `use types` statement is added to the function `i1`. ! 
testf2py.f90 module types implicit none integer, parameter :: dp=kind(0.d0) end module types module testf2py implicit none private public i1 contains real(dp) function i1(m) use types real(dp), intent(in) :: m(3, 3) i1 = m(1, 1) + m(2, 2) + m(3, 3) return end function i1 end module testf2py In action: In [6]: import numpy as np In [7]: m = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]]) In [8]: import testf2py In [9]: testf2py.testf2py.i1(m) Out[9]: 150.0 The change is similar to the third option that I described in this answer: http://stackoverflow.com/questions/12523524/f2py-specifying-real-precision-in-fortran-when-interfacing-with-python/12524403#12524403 Warren > -- > Yuxiang "Shawn" Wang > Gerling Research Lab > University of Virginia > yw5aj at virginia.edu > +1 (434) 284-0836 > https://sites.google.com/a/virginia.edu/yw5aj/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jensj at fysik.dtu.dk Tue Jan 27 01:28:26 2015 From: jensj at fysik.dtu.dk (=?windows-1252?Q?Jens_J=F8rgen_Mortensen?=) Date: Tue, 27 Jan 2015 07:28:26 +0100 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> Message-ID: <54C7300A.7060606@fysik.dtu.dk> On 01/26/2015 11:02 AM, Jaime Fern?ndez del R?o wrote: > On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg > > wrote: > > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > Hi! > > > > I have a view of a 2-d complex array that I would like to view > as a 2-d > > float array. This works OK: > > > > >>> np.ones((2, 4), complex).view(float) > > array([[ 1., 0., 1., 0., 1., 0.,1., 0.], > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > but this doesn't: > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > Traceback (most recent call last): > > File "", line 1, in > > ValueError: new type not compatible with array. > > >>> np.__version__ > > '1.9.0' > > > > and I don't understand why. When looking at the memory layout, > I think > > it should be possible. > > > > Yes, it should be possible, but it is not :). You could hack it by > using > `np.ndarray` (or stride tricks). Or maybe you are interested > making the > checks whether it makes sense or not less strict. > > > How would it be possible? He goes from an array with 16 byte strides > along the last axis: > > r0i0, r1i1, r2i2, r3i3 > > to one with 32 byte strides, which is OK > > r0i0, xxxx, r2i2, xxxx > > but everything breaks down when he wants to have alternating strides > of 8 and 24 bytes: > > r0, i0, xxxx, r2, i2, xxxx No, that is not what I want. I want this: r0, i0, r1, i1, xxxx, xxxx with stride 8 on the last axis - which should be fine. My current workaround is to do a copy() before view() - thanks Maniteja. Jens J?rgen > > which cannot be hacked in any sensible way. > > What I think could be made to work, but also fails, is this: > > np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) > > Here the original strides are (64, 16, xx) and the resulting view > should have strides (64, 32, 8), not sure what trips this. 
> > Jaime > > > - Sebastian > > > Jens J?rgen > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cmkleffner at gmail.com Tue Jan 27 05:32:45 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Tue, 27 Jan 2015 11:32:45 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-27 0:16 GMT+01:00 Sturla Molden : > On 26/01/15 16:30, Carl Kleffner wrote: > > > Thanks for all your ideas. The next version will contain an augumented > > libopenblas.dll in both numpy and scipy. On the long term I would > > prefer an external openblas wheel package, if there is an agreement > > about this among numpy-dev. > > > Thanks for all your great work on this. > > An OpenBLAS wheel might be a good idea. Probably we should have some > sort of instruction on the website how to install the binary wheel. And > then we could include the OpenBLAS wheel in the instruction. Or we could > have the OpenBLAS wheel as a part of the scipy stack. > > But make the bloated SciPy and NumPy wheels work first, then we can > worry about a dedicated OpenBLAS wheel later :-) > > > > Another idea for the future is to conditionally load a debug version of > > libopenblas instead. Together with the backtrace.dll (part of > > mingwstatic, but undocumentated right now) a meaningfull stacktrace in > > case of segfaults inside the code comiled with mingwstatic will be given. > > An OpenBLAS wheel could also include multiple architectures. We can > compile OpenBLAS for any kind of CPUs and and install the one that fits > best with the computer. > OpenBLAS in the test wheels is build with DYNAMIC_ARCH, that is all assembler based kernels are included and are choosen at runtime. Non optimized parts of Lapack have been build with -march=sse2. > > Also note that an OpenBLAS wheel could be useful on Linux. It is clearly > superior to the ATLAS libraries that most distros ship. If we make a > binary wheel that works for Windows, we are almost there for Linux too :-) > I have in mind, that binary wheels are not supported for Linux. Maybe this could be done as conda package for Anaconda/Miniconda as an OSS alternative to MKL. > > For Apple we don't need OpenBLAS anymore. On OSX 10.9 and 10.10 > Accelerate Framework is actually faster than MKL under many > circumstances. DGEMM is about the same, but e.g. DAXPY and DDOT are > faster in Accelerate. > > > Sturla > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sturla.molden at gmail.com Tue Jan 27 06:14:00 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 27 Jan 2015 12:14:00 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On 27/01/15 11:32, Carl Kleffner wrote: > OpenBLAS in the test wheels is build with DYNAMIC_ARCH, that is all > assembler based kernels are included and are choosen at runtime. Ok, I wasn't aware of that option. Last time I built OpenBLAS I think I had to specify the target CPU. > Non > optimized parts of Lapack have been build with -march=sse2. Since LAPACK delegates almost all of its heavy lifting to BLAS, there is probably not a lot to gain from SSE3, SSE4 or AVX here. Sturla From jaime.frio at gmail.com Tue Jan 27 06:25:54 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 27 Jan 2015 03:25:54 -0800 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: <54C7300A.7060606@fysik.dtu.dk> References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> <54C7300A.7060606@fysik.dtu.dk> Message-ID: On Mon, Jan 26, 2015 at 10:28 PM, Jens J?rgen Mortensen wrote: > On 01/26/2015 11:02 AM, Jaime Fern?ndez del R?o wrote: > > On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg > > > wrote: > > > > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > > Hi! > > > > > > I have a view of a 2-d complex array that I would like to view > > as a 2-d > > > float array. This works OK: > > > > > > >>> np.ones((2, 4), complex).view(float) > > > array([[ 1., 0., 1., 0., 1., 0.,1., 0.], > > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > > > but this doesn't: > > > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > > Traceback (most recent call last): > > > File "", line 1, in > > > ValueError: new type not compatible with array. > > > >>> np.__version__ > > > '1.9.0' > > > > > > and I don't understand why. When looking at the memory layout, > > I think > > > it should be possible. > > > > > > > Yes, it should be possible, but it is not :). You could hack it by > > using > > `np.ndarray` (or stride tricks). Or maybe you are interested > > making the > > checks whether it makes sense or not less strict. > > > > > > How would it be possible? He goes from an array with 16 byte strides > > along the last axis: > > > > r0i0, r1i1, r2i2, r3i3 > > > > to one with 32 byte strides, which is OK > > > > r0i0, xxxx, r2i2, xxxx > > > > but everything breaks down when he wants to have alternating strides > > of 8 and 24 bytes: > > > > r0, i0, xxxx, r2, i2, xxxx > > No, that is not what I want. I want this: > > r0, i0, r1, i1, xxxx, xxxx > > with stride 8 on the last axis - which should be fine. My current > workaround is to do a copy() before view() - thanks Maniteja. My bad, you are absolutely right, Jens... I have put together a quick PR (https://github.com/numpy/numpy/pull/5508) that fixes your use case, by relaxing the requirements for views of different dtypes. I'd appreciate if you could take a look at the logic in the code (it is profusely commented), and see if you can think of other cases that can be viewed as another dtype that I may have overlooked. Thanks, Jaime > Jens J?rgen > > > > > which cannot be hacked in any sensible way. 
> > > > What I think could be made to work, but also fails, is this: > > > > np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) > > > > Here the original strides are (64, 16, xx) and the resulting view > > should have strides (64, 32, 8), not sure what trips this. > > > > Jaime > > > > > > - Sebastian > > > > > Jens J?rgen > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > -- > > (\__/) > > ( O.o) > > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > > planes de dominaci?n mundial. > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjw at ncf.ca Tue Jan 27 09:47:07 2015 From: cjw at ncf.ca (cjw) Date: Tue, 27 Jan 2015 09:47:07 -0500 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: <54C7A4EB.2040103@ncf.ca> An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Jan 27 15:53:07 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 27 Jan 2015 21:53:07 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner wrote: > Thanks for all your ideas. The next version will contain an augumented > libopenblas.dll in both numpy and scipy. On the long term I would prefer > an external openblas wheel package, if there is an agreement about this > among numpy-dev. > Sounds fine in principle, but reliable dependency handling will be hard to support in setup.py. You'd want the dependency on Openblas when installing a complete set of wheels, but not make it impossible to use: - building against ATLAS/MKL/... from source with pip or distutils - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels - pip install numpy --no-use-wheel - etc. Static bundling is a lot easier to get right. > Another idea for the future is to conditionally load a debug version of > libopenblas instead. Together with the backtrace.dll (part of mingwstatic, > but undocumentated right now) a meaningfull stacktrace in case of segfaults > inside the code comiled with mingwstatic will be given. > > > 2015-01-26 2:16 GMT+01:00 Sturla Molden : > >> On 25/01/15 22:15, Matthew Brett wrote: >> >> > I agree, that shipping openblas with both numpy and scipy seems >> > perfectly reasonable to me - I don't think anyone will much care about >> > the 30M, and I think our job is to make something that works with the >> > least complexity and likelihood of error. >> >> Yes. Make something that works first, optimize for space later. 
>> > +1 Ralf > > It would be good to rename the dll according to the package and >> > version though, to avoid a scipy binary using a pre-loaded but >> > incompatible 'libopenblas.dll'. Say something like >> > openblas-scipy-0.15.1.dll - on the basis that there can only be one >> > copy of scipy loaded at a time. >> >> That is a good idea and we should do this for NumPy too I think. >> >> >> >> Sturla >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jan 27 16:13:01 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 27 Jan 2015 21:13:01 +0000 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Tue, Jan 27, 2015 at 8:53 PM, Ralf Gommers wrote: > > On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner wrote: >> >> Thanks for all your ideas. The next version will contain an augumented >> libopenblas.dll in both numpy and scipy. On the long term I would prefer an >> external openblas wheel package, if there is an agreement about this among >> numpy-dev. > > > Sounds fine in principle, but reliable dependency handling will be hard to > support in setup.py. You'd want the dependency on Openblas when installing a > complete set of wheels, but not make it impossible to use: > > - building against ATLAS/MKL/... from source with pip or distutils > - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels > - pip install numpy --no-use-wheel > - etc. > > Static bundling is a lot easier to get right. In principle I think this should be easy: when installing a .whl, pip or whatever looks at the dependencies declared in the distribution metadata file inside the wheel. When installing via setup.py, pip or whatever uses the dependencies declared by setup.py. We just have to make sure that the wheels we distribute have the right metadata inside them and everything should work. Accomplishing this may be somewhat awkward with existing tools, but as a worst-case/proof-of-concept approach we could just have a step in the wheel build that opens up the .whl and edits it to add the dependency. Ugly, but it'd work. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ralf.gommers at gmail.com Tue Jan 27 16:34:45 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 27 Jan 2015 22:34:45 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: On Tue, Jan 27, 2015 at 10:13 PM, Nathaniel Smith wrote: > On Tue, Jan 27, 2015 at 8:53 PM, Ralf Gommers > wrote: > > > > On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner > wrote: > >> > >> Thanks for all your ideas. The next version will contain an augumented > >> libopenblas.dll in both numpy and scipy. On the long term I would > prefer an > >> external openblas wheel package, if there is an agreement about this > among > >> numpy-dev. > > > > > > Sounds fine in principle, but reliable dependency handling will be hard > to > > support in setup.py. 
You'd want the dependency on Openblas when > installing a > > complete set of wheels, but not make it impossible to use: > > > > - building against ATLAS/MKL/... from source with pip or distutils > > - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels > > - pip install numpy --no-use-wheel > > - etc. > > > > Static bundling is a lot easier to get right. > > In principle I think this should be easy: when installing a .whl, pip > or whatever looks at the dependencies declared in the distribution > metadata file inside the wheel. When installing via setup.py, pip or > whatever uses the dependencies declared by setup.py. We just have to > make sure that the wheels we distribute have the right metadata inside > them and everything should work. > > Accomplishing this may be somewhat awkward with existing tools, but as > a worst-case/proof-of-concept approach we could just have a step in > the wheel build that opens up the .whl and edits it to add the > dependency. Ugly, but it'd work. Good point, that should work. Not all that much uglier than some of the other stuff we do in release scripts for Windows binaries. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Tue Jan 27 16:37:51 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Tue, 27 Jan 2015 22:37:51 +0100 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: 2015-01-27 22:13 GMT+01:00 Nathaniel Smith : > On Tue, Jan 27, 2015 at 8:53 PM, Ralf Gommers > wrote: > > > > On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner > wrote: > >> > >> Thanks for all your ideas. The next version will contain an augumented > >> libopenblas.dll in both numpy and scipy. On the long term I would > prefer an > >> external openblas wheel package, if there is an agreement about this > among > >> numpy-dev. > > > > > > Sounds fine in principle, but reliable dependency handling will be hard > to > > support in setup.py. You'd want the dependency on Openblas when > installing a > > complete set of wheels, but not make it impossible to use: > > > > - building against ATLAS/MKL/... from source with pip or distutils > > - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels > > - pip install numpy --no-use-wheel > > - etc. > > > > Static bundling is a lot easier to get right. > > In principle I think this should be easy: when installing a .whl, pip > or whatever looks at the dependencies declared in the distribution > metadata file inside the wheel. When installing via setup.py, pip or > whatever uses the dependencies declared by setup.py. We just have to > make sure that the wheels we distribute have the right metadata inside > them and everything should work. > > Accomplishing this may be somewhat awkward with existing tools, but as > a worst-case/proof-of-concept approach we could just have a step in > the wheel build that opens up the .whl and edits it to add the > dependency. Ugly, but it'd work. > > maybe an install_requires in setup.py in the presence of an environment variable could help during build? > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
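(For concreteness, a rough sketch of that idea; the environment variable and the 'openblas' wheel name below are invented for illustration, and a real build would hook this into the existing build scripts rather than plain setuptools.)

# setup.py fragment (sketch only)
import os
from setuptools import setup

install_requires = []
if os.environ.get('NPY_DEPEND_ON_OPENBLAS_WHEEL'):  # hypothetical variable
    install_requires.append('openblas')             # hypothetical external wheel

setup(
    name='numpy',
    version='1.9.1',
    packages=['numpy'],
    install_requires=install_requires,
)

Source installs would leave the variable unset and get no extra dependency; the wheel build scripts would export it so the built wheel's metadata pulls in the OpenBLAS wheel.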
URL: From matthew.brett at gmail.com Tue Jan 27 18:23:24 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 27 Jan 2015 15:23:24 -0800 Subject: [Numpy-discussion] new mingw-w64 based numpy and scipy wheel (still experimental) In-Reply-To: References: Message-ID: Hi, On Tue, Jan 27, 2015 at 1:37 PM, Carl Kleffner wrote: > > > 2015-01-27 22:13 GMT+01:00 Nathaniel Smith : >> >> On Tue, Jan 27, 2015 at 8:53 PM, Ralf Gommers >> wrote: >> > >> > On Mon, Jan 26, 2015 at 4:30 PM, Carl Kleffner >> > wrote: >> >> >> >> Thanks for all your ideas. The next version will contain an augumented >> >> libopenblas.dll in both numpy and scipy. On the long term I would >> >> prefer an >> >> external openblas wheel package, if there is an agreement about this >> >> among >> >> numpy-dev. >> > >> > >> > Sounds fine in principle, but reliable dependency handling will be hard >> > to >> > support in setup.py. You'd want the dependency on Openblas when >> > installing a >> > complete set of wheels, but not make it impossible to use: >> > >> > - building against ATLAS/MKL/... from source with pip or distutils >> > - allowing use of a local wheelhouse which uses ATLAS/MKL/... wheels >> > - pip install numpy --no-use-wheel >> > - etc. >> > >> > Static bundling is a lot easier to get right. >> >> In principle I think this should be easy: when installing a .whl, pip >> or whatever looks at the dependencies declared in the distribution >> metadata file inside the wheel. When installing via setup.py, pip or >> whatever uses the dependencies declared by setup.py. We just have to >> make sure that the wheels we distribute have the right metadata inside >> them and everything should work. >> >> Accomplishing this may be somewhat awkward with existing tools, but as >> a worst-case/proof-of-concept approach we could just have a step in >> the wheel build that opens up the .whl and edits it to add the >> dependency. Ugly, but it'd work. My 'delocate' utility has a routine for patching wheels : pip install delocate delocate-patch --help Cheers, Matthew From jensj at fysik.dtu.dk Wed Jan 28 03:27:06 2015 From: jensj at fysik.dtu.dk (=?windows-1252?Q?Jens_J=F8rgen_Mortensen?=) Date: Wed, 28 Jan 2015 09:27:06 +0100 Subject: [Numpy-discussion] Float view of complex array In-Reply-To: References: <54C5F9A6.9080109@fysik.dtu.dk> <1422265295.10406.1.camel@sebastian-t440> <54C7300A.7060606@fysik.dtu.dk> Message-ID: <54C89D5A.8020600@fysik.dtu.dk> Den 27-01-2015 kl. 12:25 skrev Jaime Fern?ndez del R?o: > On Mon, Jan 26, 2015 at 10:28 PM, Jens J?rgen Mortensen > > wrote: > > On 01/26/2015 11:02 AM, Jaime Fern?ndez del R?o wrote: > > On Mon, Jan 26, 2015 at 1:41 AM, Sebastian Berg > > > >> wrote: > > > > On Mo, 2015-01-26 at 09:24 +0100, Jens J?rgen Mortensen wrote: > > > Hi! > > > > > > I have a view of a 2-d complex array that I would like to view > > as a 2-d > > > float array. This works OK: > > > > > > >>> np.ones((2, 4), complex).view(float) > > > array([[ 1., 0., 1., 0., 1., 0.,1., 0.], > > > [ 1., 0., 1., 0., 1., 0., 1., 0.]]) > > > > > > but this doesn't: > > > > > > >>> np.ones((2, 4), complex)[:, :2].view(float) > > > Traceback (most recent call last): > > > File "", line 1, in > > > ValueError: new type not compatible with array. > > > >>> np.__version__ > > > '1.9.0' > > > > > > and I don't understand why. When looking at the memory > layout, > > I think > > > it should be possible. > > > > > > > Yes, it should be possible, but it is not :). You could hack > it by > > using > > `np.ndarray` (or stride tricks). 
Or maybe you are interested > > making the > > checks whether it makes sense or not less strict. > > > > > > How would it be possible? He goes from an array with 16 byte strides > > along the last axis: > > > > r0i0, r1i1, r2i2, r3i3 > > > > to one with 32 byte strides, which is OK > > > > r0i0, xxxx, r2i2, xxxx > > > > but everything breaks down when he wants to have alternating strides > > of 8 and 24 bytes: > > > > r0, i0, xxxx, r2, i2, xxxx > > No, that is not what I want. I want this: > > r0, i0, r1, i1, xxxx, xxxx > > with stride 8 on the last axis - which should be fine. My current > workaround is to do a copy() before view() - thanks Maniteja. > > > My bad, you are absolutely right, Jens... > > I have put together a quick PR > (https://github.com/numpy/numpy/pull/5508) that fixes your use case, > by relaxing the requirements for views of different dtypes. I'd > appreciate if you could take a look at the logic in the code (it is > profusely commented), and see if you can think of other cases that can > be viewed as another dtype that I may have overlooked. Thanks for looking into this. I'll take a look at the code, but it will be a couple of days before I will find the time. Jens J?rgen > > Thanks, > > Jaime > > > Jens J?rgen > > > > > which cannot be hacked in any sensible way. > > > > What I think could be made to work, but also fails, is this: > > > > np.ones((2, 4), complex).reshape(2, 4, 1)[:, :2, :].view(float) > > > > Here the original strides are (64, 16, xx) and the resulting view > > should have strides (64, 32, 8), not sure what trips this. > > > > Jaime > > > > > > - Sebastian > > > > > Jens J?rgen > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > > > >http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > -- > > (\__/) > > ( O.o) > > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > > planes de dominaci?n mundial. > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Wed Jan 28 19:56:50 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 28 Jan 2015 16:56:50 -0800 Subject: [Numpy-discussion] Views of a different dtype Message-ID: HI all, There has been some recent discussion going on on the limitations that numpy imposes to taking views of an array with a different dtype. As of right now, you can basically only take a view of an array if it has no Python objects and neither the old nor the new dtype are structured. Furthermore, the array has to be either C or Fortran contiguous. 
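(A tiny reminder of what gets rejected today, reusing the example from the float-view thread above:)

import numpy as np

a = np.ones((2, 4), complex)
a.view(float)           # works: C-contiguous, simple dtypes
a[:, :2].view(float)    # ValueError: new type not compatible with array.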
This seem to be way too strict, but the potential for disaster getting a loosening of the restrictions wrong is big, so it should be handled with care. Allan Haldane and myself have been looking into this separately and discussing some of the details over at github, and we both think that the only true limitation that has to be imposed is that the offsets of Python objects within the new and old dtypes remain compatible. I have expanded Allan's work from here: https://github.com/ahaldane/numpy/commit/e9ca367 to make it as flexible as I have been able. An implementation of the algorithm in Python, with a few tests, can be found here: https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py I would appreciate getting some eyes on it for correctness, and to make sure that it won't break with some weird dtype. I am also trying to figure out what the ground rules for stride and shape conversions when taking a view with a different dtype should be. I submitted a PR (gh-5508) a couple for days ago working on that, but I am not so sure that the logic is completely sound. Again, to get more eyes on it, I am going to reproduce my thoughts here on the hope of getting some feedback. The objective would be to always allow a view of a different dtype (given that the dtypes be compatible as described above) to be taken if: - The itemsize of the dtype doesn't change. - The itemsize changes, but the array being viewed is the result of slicing and transposing axes of a contiguous array, and it is still contiguous, defined as stride == dtype.itemsize, along its smallest-strided dimension, and the itemsize of the newtype exactly divides the size of that dimension. - Ideally taking a view should be a reversible process, i.e. if oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) should give you back a view of arr with the same original shape, strides and dtype. This last point can get tricky if the minimal stride dimension has size 1, as there could be several of those, e.g.: >>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) >>> a.flags.contiguous False >>> a.shape (3, 1, 2) >>> a.strides # the stride of the size 1 dimension could be anything, ignore it! (32, 8, 8) b = a.view(complex) # this fails right now, but should work >>> b.flags.contiguous False >>> b.shape (3, 1, 1) >>> b.strides # the stride of the size 1 dimensions could be anything, ignore them! (32, 16, 16) c = b.view(float) # which of the two size 1 dimensions should we expand? "In the face of ambiguity refuse the temptation to guess" dictates that last view should raise an error, unless we agree and document some default. Any thoughts? Then there is the endless complication one could get into with arrays created with as_strided. I'm not smart enough to figure when and when not those could work, but am willing to retake the discussion if someone wiser si interested. With all these in mind, my proposal for the new behavior is that taking a view of an array with a different dtype would require: 1. That the newtype and oldtype be compatible, as defined by the algorithm checking object offsets linked above. 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, make it happen! 3. If the array is C/Fortran contiguous, check that the size in bytes of the last/first dimension is evenly divided by newtype.itemsize. If it does, go for it. 4. For non-contiguous arrays: 1. Ignoring dimensions of size 1, check that no stride is smaller than either oldtype.itemsize or newtype.itemsize. 
If any is found this is an as_strided product, sorry, can't do it! 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == oldtype.itemsize 1. If found, check that it is the only one with that stride, that it is the minimal stride, and that the size in bytes of that dimension is evenly divided by newitem,itemsize. 2. If none is found, check if there is a size 1 dimension that is also unique (unless we agree on a default, as mentioned above) and that newtype.itemsize evenly divides oldtype.itemsize. Apologies for the long, dense content, but any thought or comments are very welcome. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jan 28 20:13:27 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 28 Jan 2015 17:13:27 -0800 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: Sorry not to notice this for a while -- I've been distracted by python-ideas. (Nathaniel knows what I'm talking about ;-) ) I do like the idea of prototyping some DateTime stuff -- it really isn't clear what's needed or how to do it at this point. Though we did more or less settle on a reasonable minimum set last summer at SciPy (shame on me for not getting that written up properly!) Chuck -- what have you got in mind for new functionality here? I tend to agree with Nathaniel that a ndarray subclass is less than ideal -- they tend to get ugly fast. But maybe that is the only way to do anything in Python, short of a major refactor to be able to write a dtype in Python -- which would be great, but sure sounds like a major project to me. And as for " The 64 bits of long long really isn't enough and leads to all sorts of compromises". not long enough for what? I've always thought that what we need is the ability to set the epoch. Does anyone ever need picoseconds since 100 years ago? And if they did, we'd be in a heck of a mess with leap seconds and all that anyway. Or is there a use-case I'm not thinking of? -Chris On Thu, Jan 22, 2015 at 12:58 PM, Nathaniel Smith wrote: > On Thu, Jan 22, 2015 at 3:18 PM, Charles R Harris > wrote: > > > > > > On Thu, Jan 22, 2015 at 8:08 AM, Charles R Harris > > wrote: > >> > >> > >> > >> On Thu, Jan 22, 2015 at 7:54 AM, Nathaniel Smith wrote: > >>> > >>> On Thu, Jan 22, 2015 at 2:51 PM, Charles R Harris > >>> wrote: > >>> > Hi All, > >>> > > >>> > I'm playing with the idea of building a simplified datetime class on > >>> > top of > >>> > the current numpy implementation. I believe Pandas does something > like > >>> > this, > >>> > and blaze will (does?) have a simplified version. The reason for the > >>> > new > >>> > class would be to have an easier, and hopefully more portable, API > that > >>> > can > >>> > be implemented in Python, and maybe pushed down into C when things > >>> > settle. > >>> > >>> When you say "datetime class" what do you mean? A dtype? An ndarray > >>> subclass? A python class representing a scalar datetime that you can > >>> put in an object array? ...? > >> > >> > >> I was thinking an ndarray subclass that is based on a single datetime > >> type, but part of the reason for this post is to elicit ideas. I'm > >> influenced by Mark's discussion apropos blaze. I thought it easier to > >> start such a project in python, as it is far easier for people > interested in > >> the problem to work with. 
> > > > > > And if I had my druthers, it would use quad precision floating point at > it's > > heart. The 64 bits of long long really isn't enough and leads to all > sorts > > of compromises. But that is probably a pipe dream. > > I guess there are lots of options -- e.g. 32-bit day + 64-bit > time-of-day (I think that would give 11.8 million years at > 10-femtisecond precision?). Figuring out which clock this is on > matters a lot more though (e.g. how to handle leap-seconds in absolute > and relative times -- is adding 1 day always the same as adding 24 * > 60 * 60 seconds?). > > At a very general level, I feel like numpy-qua-numpy's role here > shouldn't be to try and add special code to handle any one specific > datetime implementation: that hasn't worked out terribly well > historically, and as referenced above there's a *ton* of plausible > ways of approaching datetime handling that people might want, so we > don't want to be in the position of having to pick the-one-and-only > implementation. Telling people who want to tweak datetime handling > that they have to start mucking around in umath.so is terrible. > > Instead, we should be trying to evolve numpy to add generic > functionality, so that it's prepared to handle multiple third-party > approaches to date-time handling (among other things). > > Implementing prototypes built on top of numpy could be an excellent > way to generate ideas for appropriate changes to the numpy core. > > As far as this specific prototype, I should say that I'm dubious that > subclassing ndarray is actually a *good* long-term solution. I really > think that the *right* way to solve this would be to improve the dtype > system so we could define useful date/time types that worked with > plain vanilla ndarrays. But that approach requires a lot more up-front > C coding; it's harder to throw together a quick prototype. OTOOH if > your goal is the moon then you don't want to waste time investing in > ladder technology... so I dunno. > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Jan 28 21:48:46 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 28 Jan 2015 18:48:46 -0800 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Wed, Jan 28, 2015 at 5:13 PM, Chris Barker wrote: > I tend to agree with Nathaniel that a ndarray subclass is less than ideal > -- they tend to get ugly fast. But maybe that is the only way to do > anything in Python, short of a major refactor to be able to write a dtype > in Python -- which would be great, but sure sounds like a major project to > me. > My vote would be for using composition rather than inheritance. So DatetimeArray should contain but not be an ndarray, making use of appropriate APIs like __array__, __array_wrap__ and __numpy_ufunc__. And as for " The 64 bits of long long really isn't enough and leads to all > sorts of compromises". not long enough for what? 
I've always thought that > what we need is the ability to set the epoch. Does anyone ever need > picoseconds since 100 years ago? And if they did, we'd be in a heck of a > mess with leap seconds and all that anyway. > I agree pretty strongly with the Blaze docs with respect to time units. I think fixed precision int64 is probably OK (simplifying things quite a bit), but the ns precision chosen by pandas was probably a mistake (not a big enough range). The main advantage of using a single array for the underlying data is that it's very straightforward to drop in a Cython or Numba or whatever for performance critical steps. In my mind, the main advantage of using floating point math is that NaT (not a time) becomes much easier to represent and work with -- you can share map it to NaN. Handling NaT is a major source of complexity for the datetime operations in pandas. The other thing to consider is how much progress has been made on the datetime dype in DyND, which is where the "numpy replacement" part of Blaze has ended up. I know some sort of datetime object *has* been implemented, though from my tests it does not really appear to be in fully working condition at this point (e.g., there does not appear to be a corresponding timedelta time): https://github.com/libdynd/dynd-python Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jan 28 23:29:27 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 28 Jan 2015 21:29:27 -0700 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: On Wed, Jan 28, 2015 at 6:13 PM, Chris Barker wrote: > Sorry not to notice this for a while -- I've been distracted by > python-ideas. (Nathaniel knows what I'm talking about ;-) ) > > I do like the idea of prototyping some DateTime stuff -- it really isn't > clear what's needed or how to do it at this point. Though we did more or > less settle on a reasonable minimum set last summer at SciPy (shame on me > for not getting that written up properly!) > > Chuck -- what have you got in mind for new functionality here? I tend to > agree with Nathaniel that a ndarray subclass is less than ideal -- they > tend to get ugly fast. But maybe that is the only way to do anything in > Python, short of a major refactor to be able to write a dtype in Python -- > which would be great, but sure sounds like a major project to me. > I was mostly thinking of implementing a Blaze compatible API without having to rewrite the numpy datetime stuff. But also, I thought it might be an easy way to solve some of our problems, or at least experiment. > > And as for " The 64 bits of long long really isn't enough and leads to > all sorts of compromises". not long enough for what? I've always thought > that what we need is the ability to set the epoch. Does anyone ever need > picoseconds since 100 years ago? And if they did, we'd be in a heck of a > mess with leap seconds and all that anyway. > I was thinking elapsed time. Nanoseconds can be rather crude for that depending on the measurement. Of course, such short times aren't going to come from the system clock, but data collected in other ways, interference between light pulses over microscopic distances for instance. Such data is likely acquired as, or computed, from simple numbers with a unit, which gets us back to the numpy version. But that complicates the heck out of things when you want to start adding times in different units. 
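(A quick back-of-the-envelope check of that range, assuming a signed 64-bit count of ticks centred on the epoch:)

ns_per_year = 365.25 * 24 * 60 * 60 * 1e9
print(2**63 / ns_per_year)            # ~292 -> nanosecond ticks only reach ~1678..2262
print(2**63 / (ns_per_year / 1e3))    # ~292000 -> microsecond ticks cover ~292,000 years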
Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Thu Jan 29 02:55:57 2015 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 28 Jan 2015 21:55:57 -1000 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: Message-ID: <54C9E78D.5070802@hawaii.edu> On 2015/01/28 6:29 PM, Charles R Harris wrote: > > > And as for "The 64 bits of long long really isn't enough and leads > to all sorts of compromises". not long enough for what? I've always > thought that what we need is the ability to set the epoch. Does > anyone ever need picoseconds since 100 years ago? And if they did, > we'd be in a heck of a mess with leap seconds and all that anyway. > > > I was thinking elapsed time. Nanoseconds can be rather crude for that > depending on the measurement. Of course, such short times aren't going > to come from the system clock, but data collected in other ways, > interference between light pulses over microscopic distances for > instance. Such data is likely acquired as, or computed, from simple > numbers with a unit, which gets us back to the numpy version. But that > complicates the heck out of things when you want to start adding times > in different units. Chuck, For any kind of data like that, I fail to see why any special numpy time type is needed at all. Wouldn't the user just keep elapsed time as a count, or floating point number, in whatever units the instrument spits out? Why does it need to be treated in a different way from any other numeric data? We don't have special types for length. It seems to me that numpy's present experimental datetime64 type has already fallen into the trap of overengineering--trying to be too many things to too many people. The main reason for having a special datetime type is to deal with the calendar mess, and conventional hours-minutes-seconds time. For very short time intervals, all that is irrelevant. Eric From charlesr.harris at gmail.com Thu Jan 29 09:14:20 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 29 Jan 2015 07:14:20 -0700 Subject: [Numpy-discussion] Datetime again In-Reply-To: <54C9E78D.5070802@hawaii.edu> References: <54C9E78D.5070802@hawaii.edu> Message-ID: On Thu, Jan 29, 2015 at 12:55 AM, Eric Firing wrote: > On 2015/01/28 6:29 PM, Charles R Harris wrote: > > > > > > And as for "The 64 bits of long long really isn't enough and leads > > to all sorts of compromises". not long enough for what? I've always > > thought that what we need is the ability to set the epoch. Does > > anyone ever need picoseconds since 100 years ago? And if they did, > > we'd be in a heck of a mess with leap seconds and all that anyway. > > > > > > I was thinking elapsed time. Nanoseconds can be rather crude for that > > depending on the measurement. Of course, such short times aren't going > > to come from the system clock, but data collected in other ways, > > interference between light pulses over microscopic distances for > > instance. Such data is likely acquired as, or computed, from simple > > numbers with a unit, which gets us back to the numpy version. But that > > complicates the heck out of things when you want to start adding times > > in different units. > > Chuck, > > For any kind of data like that, I fail to see why any special numpy time > type is needed at all. Wouldn't the user just keep elapsed time as a > count, or floating point number, in whatever units the instrument spits > out? 
Why does it need to be treated in a different way from any other > numeric data? We don't have special types for length. It seems to me > that numpy's present experimental datetime64 type has already fallen > into the trap of overengineering--trying to be too many things to too > many people. The main reason for having a special datetime type is to > deal with the calendar mess, and conventional hours-minutes-seconds > time. For very short time intervals, all that is irrelevant. > That's probably what it comes down to in practice. If we *had* quad precision floats, it would be an easy solution, but we don't, so probably the Blaze proposal with 64 bit integers and a fixed tick unit is the easy way to go. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Thu Jan 29 11:33:52 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 29 Jan 2015 11:33:52 -0500 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: <54CA60F0.5090109@gmail.com> Hi Jamie, I'm not sure whether to reply here or on github! I have a comment on the condition "There must be the same total number of objects before and after". I originally looked into this to solve https://github.com/numpy/numpy/issues/3256, which involves taking a view of a subset of the fields of a structured array. Such a view causes the final number objects to be less than the original number. For example, say you have an array with three fields a,b,c, and you only want a and c. Numpy currently does the equivalent of this (in _index_fields in numpy/core/_internals.py): >>> a = zeros([(1,2,3), (4,5,6)], ... dtype([('a', 'i8'), ('b', 'i8'), ('c', 'i8')])) >>> a.view(dtype({'names': ['x', 'y'], ... 'formats': ['i8', 'i8'], ... 'offsets': [0, 16]})) array([(1, 3), (4, 6)], dtype={'names':['x','y'], 'formats':[' HI all, > > There has been some recent discussion going on on the limitations that > numpy imposes to taking views of an array with a different dtype. > > As of right now, you can basically only take a view of an array if it > has no Python objects and neither the old nor the new dtype are > structured. Furthermore, the array has to be either C or Fortran contiguous. > > This seem to be way too strict, but the potential for disaster getting a > loosening of the restrictions wrong is big, so it should be handled with > care. > > Allan Haldane and myself have been looking into this separately and > discussing some of the details over at github, and we both think that > the only true limitation that has to be imposed is that the offsets of > Python objects within the new and old dtypes remain compatible. I have > expanded Allan's work from here: > > https://github.com/ahaldane/numpy/commit/e9ca367 > > to make it as flexible as I have been able. An implementation of the > algorithm in Python, with a few tests, can be found here: > > https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py > > I would appreciate getting some eyes on it for correctness, and to make > sure that it won't break with some weird dtype. > > I am also trying to figure out what the ground rules for stride and > shape conversions when taking a view with a different dtype should be. I > submitted a PR (gh-5508) a couple for days ago working on that, but I am > not so sure that the logic is completely sound. Again, to get more eyes > on it, I am going to reproduce my thoughts here on the hope of getting > some feedback. 
> > The objective would be to always allow a view of a different dtype > (given that the dtypes be compatible as described above) to be taken if: > > * The itemsize of the dtype doesn't change. > * The itemsize changes, but the array being viewed is the result of > slicing and transposing axes of a contiguous array, and it is still > contiguous, defined as stride == dtype.itemsize, along its > smallest-strided dimension, and the itemsize of the newtype exactly > divides the size of that dimension. > * Ideally taking a view should be a reversible process, i.e. if > oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) > should give you back a view of arr with the same original shape, > strides and dtype. > > This last point can get tricky if the minimal stride dimension has size > 1, as there could be several of those, e.g.: > > >>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) > >>> a.flags.contiguous > False > >>> a.shape > (3, 1, 2) > >>> a.strides # the stride of the size 1 dimension could be > anything, ignore it! > (32, 8, 8) > > b = a.view(complex) # this fails right now, but should work > >>> b.flags.contiguous > False > >>> b.shape > (3, 1, 1) > >>> b.strides # the stride of the size 1 dimensions could be > anything, ignore them! > (32, 16, 16) > > c = b.view(float) # which of the two size 1 dimensions should we > expand? > > > "In the face of ambiguity refuse the temptation to guess" dictates that > last view should raise an error, unless we agree and document some > default. Any thoughts? > > Then there is the endless complication one could get into with arrays > created with as_strided. I'm not smart enough to figure when and when > not those could work, but am willing to retake the discussion if someone > wiser si interested. > > With all these in mind, my proposal for the new behavior is that taking > a view of an array with a different dtype would require: > > 1. That the newtype and oldtype be compatible, as defined by the > algorithm checking object offsets linked above. > 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, > make it happen! > 3. If the array is C/Fortran contiguous, check that the size in bytes > of the last/first dimension is evenly divided by newtype.itemsize. > If it does, go for it. > 4. For non-contiguous arrays: > 1. Ignoring dimensions of size 1, check that no stride is smaller > than either oldtype.itemsize or newtype.itemsize. If any is > found this is an as_strided product, sorry, can't do it! > 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. > stride == oldtype.itemsize > 1. If found, check that it is the only one with that stride, > that it is the minimal stride, and that the size in bytes of > that dimension is evenly divided by newitem,itemsize. > 2. If none is found, check if there is a size 1 dimension that > is also unique (unless we agree on a default, as mentioned > above) and that newtype.itemsize evenly divides > oldtype.itemsize. > > Apologies for the long, dense content, but any thought or comments are > very welcome. > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. 
> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Thu Jan 29 11:57:57 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 29 Jan 2015 16:57:57 +0000 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fern?ndez del R?o wrote: [...] > With all these in mind, my proposal for the new behavior is that taking a > view of an array with a different dtype would require: > > That the newtype and oldtype be compatible, as defined by the algorithm > checking object offsets linked above. > If newtype.itemsize == oldtype.itemsize no more checks are needed, make it > happen! > If the array is C/Fortran contiguous, check that the size in bytes of the > last/first dimension is evenly divided by newtype.itemsize. If it does, go > for it. > For non-contiguous arrays: > > Ignoring dimensions of size 1, check that no stride is smaller than either > oldtype.itemsize or newtype.itemsize. If any is found this is an as_strided > product, sorry, can't do it! > Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride == > oldtype.itemsize > > If found, check that it is the only one with that stride, that it is the > minimal stride, and that the size in bytes of that dimension is evenly > divided by newitem,itemsize. > If none is found, check if there is a size 1 dimension that is also unique > (unless we agree on a default, as mentioned above) and that newtype.itemsize > evenly divides oldtype.itemsize. I'm really wary of this idea that we go grovelling around looking for some suitable dimension somewhere to absorb the new items. Basically nothing in numpy produces semantically different arrays (e.g., ones with different shapes) depending on the *strides* of the input array. Could we make it more like: check to see if the last dimension works. If not, raise an error (and let the user transpose some other dimension there if that's what they wanted)? Or require the user to specify which dimension will absorb the shape change? (If we were doing this from scratch, then it would be tempting to just say that we always add a new dimension at the end with newtype.itemsize / oldtype.itemsize entries, or absorb such a dimension if shrinking. As a bonus, this would always work, regardless of contiguity! Except that when shrinking the last dimension would have to be contiguous, of course.) I guess the main consideration for this is that we may be stuck with stuff b/c of backwards compatibility. Can you maybe say a little bit about what is allowed now, and what constraints that puts on things? E.g. are we already grovelling around in strides and picking random dimensions in some cases? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From allanhaldane at gmail.com Thu Jan 29 11:59:07 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 29 Jan 2015 11:59:07 -0500 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: <54CA66DB.1010801@gmail.com> Hello again, I also have a minor code comment: In get_object_offsets you iterate over dtype.fields.values(). Be careful, because dtype.fields also includes the field titles. 
For example this fails: dta = np.dtype([(('a', 'title'), 'O'), ('b', 'O'), ('c', 'i1')]) dtb = np.dtype([('a', 'O'), ('b', 'O'), ('c', 'i1')]) assert dtype_view_is_safe(dta, dtb) I've seen two strategies in the numpy code to work around this. One is to to skip entries that are titles, like this: for key,field in dtype.fields.iteritems(): if len(field) == 3 and field[2] == key: #detect titles continue #do something You can find all examples that do this by grepping NPY_TITLE_KEY in the numpy source. The other (more popular) strategy is to iterate over dtype.names. You can find all examples of this by grepping for names_size. I don't know the history of it, but it looks to me like "titles" in dtypes are an obsolete feature. Are they actually used anywhere? Allan On 01/28/2015 07:56 PM, Jaime Fern?ndez del R?o wrote: > HI all, > > There has been some recent discussion going on on the limitations that > numpy imposes to taking views of an array with a different dtype. > > As of right now, you can basically only take a view of an array if it > has no Python objects and neither the old nor the new dtype are > structured. Furthermore, the array has to be either C or Fortran contiguous. > > This seem to be way too strict, but the potential for disaster getting a > loosening of the restrictions wrong is big, so it should be handled with > care. > > Allan Haldane and myself have been looking into this separately and > discussing some of the details over at github, and we both think that > the only true limitation that has to be imposed is that the offsets of > Python objects within the new and old dtypes remain compatible. I have > expanded Allan's work from here: > > https://github.com/ahaldane/numpy/commit/e9ca367 > > to make it as flexible as I have been able. An implementation of the > algorithm in Python, with a few tests, can be found here: > > https://gist.github.com/jaimefrio/b4dae59fa09fccd9638c#file-dtype_compat-py > > I would appreciate getting some eyes on it for correctness, and to make > sure that it won't break with some weird dtype. > > I am also trying to figure out what the ground rules for stride and > shape conversions when taking a view with a different dtype should be. I > submitted a PR (gh-5508) a couple for days ago working on that, but I am > not so sure that the logic is completely sound. Again, to get more eyes > on it, I am going to reproduce my thoughts here on the hope of getting > some feedback. > > The objective would be to always allow a view of a different dtype > (given that the dtypes be compatible as described above) to be taken if: > > * The itemsize of the dtype doesn't change. > * The itemsize changes, but the array being viewed is the result of > slicing and transposing axes of a contiguous array, and it is still > contiguous, defined as stride == dtype.itemsize, along its > smallest-strided dimension, and the itemsize of the newtype exactly > divides the size of that dimension. > * Ideally taking a view should be a reversible process, i.e. if > oldtype = arr.dtype, then doing arr.view(newtype).view(oldtype) > should give you back a view of arr with the same original shape, > strides and dtype. > > This last point can get tricky if the minimal stride dimension has size > 1, as there could be several of those, e.g.: > > >>> a = np.ones((3, 4, 1), dtype=float)[:, :2, :].transpose(0, 2, 1) > >>> a.flags.contiguous > False > >>> a.shape > (3, 1, 2) > >>> a.strides # the stride of the size 1 dimension could be > anything, ignore it! 
> (32, 8, 8) > > b = a.view(complex) # this fails right now, but should work > >>> b.flags.contiguous > False > >>> b.shape > (3, 1, 1) > >>> b.strides # the stride of the size 1 dimensions could be > anything, ignore them! > (32, 16, 16) > > c = b.view(float) # which of the two size 1 dimensions should we > expand? > > > "In the face of ambiguity refuse the temptation to guess" dictates that > last view should raise an error, unless we agree and document some > default. Any thoughts? > > Then there is the endless complication one could get into with arrays > created with as_strided. I'm not smart enough to figure when and when > not those could work, but am willing to retake the discussion if someone > wiser si interested. > > With all these in mind, my proposal for the new behavior is that taking > a view of an array with a different dtype would require: > > 1. That the newtype and oldtype be compatible, as defined by the > algorithm checking object offsets linked above. > 2. If newtype.itemsize == oldtype.itemsize no more checks are needed, > make it happen! > 3. If the array is C/Fortran contiguous, check that the size in bytes > of the last/first dimension is evenly divided by newtype.itemsize. > If it does, go for it. > 4. For non-contiguous arrays: > 1. Ignoring dimensions of size 1, check that no stride is smaller > than either oldtype.itemsize or newtype.itemsize. If any is > found this is an as_strided product, sorry, can't do it! > 2. Ignoring dimensions of size 1, find a contiguous dimension, i.e. > stride == oldtype.itemsize > 1. If found, check that it is the only one with that stride, > that it is the minimal stride, and that the size in bytes of > that dimension is evenly divided by newitem,itemsize. > 2. If none is found, check if there is a size 1 dimension that > is also unique (unless we agree on a default, as mentioned > above) and that newtype.itemsize evenly divides > oldtype.itemsize. > > Apologies for the long, dense content, but any thought or comments are > very welcome. > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Thu Jan 29 12:58:45 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 29 Jan 2015 09:58:45 -0800 Subject: [Numpy-discussion] Datetime again In-Reply-To: <54C9E78D.5070802@hawaii.edu> References: <54C9E78D.5070802@hawaii.edu> Message-ID: > > > I was thinking elapsed time. Nanoseconds can be rather crude for that > > depending on the measurement. > > Wouldn't the user just keep elapsed time as a > count, or floating point number, in whatever units the instrument spits > out? Why does it need to be treated in a different way from any other > numeric data? We don't have special types for length. I've wondered about this since np.datetime was first introduced. I know I have no need for better than second precision. Not that other's don't have that need, but is there even a single use-case of someone wanting nano, or sub-nanosecond precision and dates and calendar functionality in one array? -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From sransom at nrao.edu Thu Jan 29 17:39:27 2015 From: sransom at nrao.edu (Scott Ransom) Date: Thu, 29 Jan 2015 14:39:27 -0800 Subject: [Numpy-discussion] Datetime again In-Reply-To: References: <54C9E78D.5070802@hawaii.edu> Message-ID: <54CAB69F.4060205@nrao.edu> On 01/29/2015 09:58 AM, Chris Barker wrote: > > I was thinking elapsed time. Nanoseconds can be rather crude for that > > depending on the measurement. > > Wouldn't the user just keep elapsed time as a > count, or floating point number, in whatever units the instrument spits > out? Why does it need to be treated in a different way from any other > numeric data? We don't have special types for length. > > > I've wondered about this since np.datetime was first introduced. I know > I have no need for better than second precision. Not that other's don't > have that need, but is there even a single use-case of someone wanting > nano, or sub-nanosecond precision and dates and calendar functionality > in one array? > > -Chris Millisecond pulsar timing does have need of absolute and relative time at the (single) nanosecond-precision level over time-spans of several decades. You are correct that doing that kind of work, though, requires *very* special consideration of the various time scales and their idiosyncrasies (like leap seconds). So I don't think we would use a numpy datetime array or object. We have recently started using the very nice Time object in the AstroPy package. That can (and does) do all of this stuff correctly. Internally, times are treated as pairs of 64-bit floats. Cheers, Scott -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From ndarray at mac.com Thu Jan 29 20:32:30 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Thu, 29 Jan 2015 20:32:30 -0500 Subject: [Numpy-discussion] 3D array and the right hand rule In-Reply-To: References: Message-ID: On Mon, Jan 26, 2015 at 6:06 AM, Dieter Van Eessen < dieter.van.eessen at gmail.com> wrote: > I've read that numpy.array isn't arranged according to the > 'right-hand-rule' (right-hand-rule => thumb = +x; index finger = +y, bend > middle finder = +z). This is also confirmed by an old message I dug up from > the mailing list archives. (see message below) > Dieter, It looks like you are confusing dimensionality of the array with the dimensionality of a vector that it might store. If you are interested in using numpy for 3D modeling, you will likely only encounter 1-dimensional arrays (vectors) of size 3 and 2-dimensional arrays (matrices) of size 9 or shape (3, 3). A 3-dimensional array is a stack of matrices and the 'right-hand-rule' does not really apply. The notion of C/F-contiguous deals with the order of axes (e.g. width first or depth first) while the right-hand-rule is about the direction of the axes (if you "flip" the middle finger right hand becomes left.) In the case of arrays this would probably correspond to little-endian vs. big-endian: is a[0] stored at a higher or lower address than a[1]. 
However, whatever the answer to this question is for a particular
system, it is the same for all axes in the array, so the right-hand /
left-hand distinction does not apply.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ndbecker2 at gmail.com Thu Jan 29 20:57:28 2015
From: ndbecker2 at gmail.com (Neal Becker)
Date: Thu, 29 Jan 2015 20:57:28 -0500
Subject: [Numpy-discussion] question np.partition
Message-ID:

It sounds like np.partition could be used to answer the question:
give me the highest K elements in a vector.

Is this a correct interpretation? Something like a partial sort, but the
returned elements are unsorted.

I could really make some use of this, but in my case it is a list of
objects I need to sort on a particular key. Is this algorithm available
in general Python code (not specific to numpy arrays)?

--
-- Those who don't understand recursion are doomed to repeat it

From cmkleffner at gmail.com Fri Jan 30 05:09:11 2015
From: cmkleffner at gmail.com (Carl Kleffner)
Date: Fri, 30 Jan 2015 11:09:11 +0100
Subject: [Numpy-discussion] Testing of scipy
In-Reply-To: <54CA7DC9.2000502@ncf.ca>
References: <54CA7DC9.2000502@ncf.ca>
Message-ID:

Hi Colin.

This is an interesting test with different hardware. As a summary:

- Python-2.7 amd64
- numpy-1.9.1.openblas: OK
- scipy-0.15.1.openblas: 2 errors, 11 failures
- CPU: AMD A8-5600K APU (Piledriver)

scipy errors and failures due to Piledriver:

(1) ERROR: test_improvement (test_quadpack.TestCtypesQuad)
    WindowsError: [Error 193] %1 is not a valid Win32 application
(2) ERROR: test_typical (test_quadpack.TestCtypesQuad)
    WindowsError: [Error 193] %1 is not a valid Win32 application
(3) FAIL: test_interpolate.TestInterp1D.test_circular_refs
    ReferenceError: Remaining reference(s) to object
(4) FAIL: test__gcutils.test_assert_deallocated
    ReferenceError: Remaining reference(s) to object

Other failures are known failures due to the mingw-w64 / openblas build.

(1) and (2) are a problem with ctypes.util.find_msvcrt(). This method
seems to be buggy. Not a scipy problem. Maybe the test could be enhanced.

(3) and (4) are a problem due to garbage collection. No idea why.

Maybe you can file a bug for (1) ... (4)

Carl

2015-01-29 19:36 GMT+01:00 cjw :

> Carl,
>
> I have already sent the test result for numpy. Here is the test result
> for scipy.
>
> I hope that it helps.
>
> Colin W.
>
> ------------------------------
> *** Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
> (AMD64)] on win32. ***
> >>>
> [Dbg]>>>
> C:\Python27\lib\site-packages\numpy\core\__init__.py:6: Warning: Numpy
> 64bit experimental build with Mingw-w64 and OpenBlas.
>   from . import multiarray
> Running unit tests for scipy
> NumPy version 1.9.1
> NumPy is installed in C:\Python27\lib\site-packages\numpy
> SciPy version 0.15.1
> SciPy is installed in C:\Python27\lib\site-packages\scipy
> Python version 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
> (AMD64)]
> nose version 1.3.4
> C:\Python27\lib\site-packages\numpy\lib\utils.py:95: DeprecationWarning:
> `scipy.lib.blas` is deprecated, use `scipy.linalg.blas` instead!
>   warnings.warn(depdoc, DeprecationWarning)
> C:\Python27\lib\site-packages\numpy\lib\utils.py:95: DeprecationWarning:
> `scipy.lib.lapack` is deprecated, use `scipy.linalg.lapack` instead!
>   warnings.warn(depdoc, DeprecationWarning)
> C:\Python27\lib\site-packages\numpy\lib\utils.py:95: DeprecationWarning:
> `scipy.weave` is deprecated, use `weave` instead!
>   warnings.warn(depdoc, DeprecationWarning)
> [several thousand nose progress markers ('.' pass, 'S' skip, 'K' known
> failure, 'E' error, 'F' failure) omitted]
> C:\Python27\lib\site-packages\scipy\optimize\minpack.py:604: OptimizeWarning: Covariance of the parameters could not be estimated
>   category=OptimizeWarning)
> [remaining progress markers omitted]
> ====================================================================== > ERROR: test_improvement (test_quadpack.TestCtypesQuad) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\numpy\testing\decorators.py", line > 146, in skipper_func > return f(*args, **kwargs) > File > "C:\Python27\lib\site-packages\scipy\integrate\tests\test_quadpack.py", > line 42, in setUp > self.lib = ctypes.CDLL(file) > File "C:\Python27\Lib\ctypes\__init__.py", line 365, in __init__ > self._handle = _dlopen(self._name, mode) > WindowsError: [Error 193] %1 is not a valid Win32 application > > ====================================================================== > ERROR: test_typical (test_quadpack.TestCtypesQuad) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\numpy\testing\decorators.py", line > 146, in skipper_func > return f(*args, **kwargs) > File > "C:\Python27\lib\site-packages\scipy\integrate\tests\test_quadpack.py", > line 42, in setUp > self.lib = ctypes.CDLL(file) > File "C:\Python27\Lib\ctypes\__init__.py", line 365, in __init__ > self._handle = _dlopen(self._name, mode) > WindowsError: [Error 193] %1 is not a valid Win32 application > > ====================================================================== > FAIL: test_interpolate.TestInterp1D.test_circular_refs > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File > "C:\Python27\lib\site-packages\scipy\interpolate\tests\test_interpolate.py", > line 353, in test_circular_refs > del interp > File "C:\Python27\Lib\contextlib.py", line 24, in __exit__ > self.gen.next() > File "C:\Python27\lib\site-packages\scipy\lib\_gcutils.py", line 96, in > assert_deallocated > raise ReferenceError("Remaining reference(s) to object") > ReferenceError: Remaining reference(s) to object > > ====================================================================== > FAIL: test__gcutils.test_assert_deallocated > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File "C:\Python27\lib\site-packages\scipy\lib\tests\test__gcutils.py", > line 57, in test_assert_deallocated > del c > File "C:\Python27\Lib\contextlib.py", line 24, in __exit__ > self.gen.next() > File "C:\Python27\lib\site-packages\scipy\lib\_gcutils.py", line 96, in > assert_deallocated > raise ReferenceError("Remaining reference(s) to object") > ReferenceError: Remaining reference(s) to object > > ====================================================================== > FAIL: test_syr_her (test_blas.TestFBLAS2Simple) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\scipy\linalg\tests\test_blas.py", > line 316, in test_syr_her > resz_reverse, rtol=rtol) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-07, atol=0 > > (mismatch 100.0%) > x: array([[ 
0.000000e+00+0.j, 0.000000e+00+0.j, 0.000000e+00+0.j, > 0.000000e+00+0.j], > [ 0.000000e+00+0.j, 0.000000e+00+0.j, 0.000000e+00+0.j,... > y: array([[-15.+112.j, -13. +82.j, -11. +52.j, -9. +22.j], > [ 0. +0.j, -11. +60.j, -9. +38.j, -7. +16.j], > [ 0. +0.j, 0. +0.j, -7. +24.j, -5. +10.j], > [ 0. +0.j, 0. +0.j, 0. +0.j, -3. +4.j]]) > > ====================================================================== > FAIL: test_decomp.test_eigh('general ', 6, 'F', True, False, False, (2, 4)) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File "C:\Python27\lib\site-packages\scipy\linalg\tests\test_decomp.py", > line 648, in eigenhproblem_general > assert_array_almost_equal(diag2_, ones(diag2_.shape[0]), DIGITS[dtype]) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 842, > in assert_array_almost_equal > precision=decimal) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Arrays are not almost equal to 4 decimals > > (mismatch 100.0%) > x: array([ 0., 0., 0.], dtype=float32) > y: array([ 1., 1., 1.]) > > ====================================================================== > FAIL: Tests for the minimize wrapper. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File > "C:\Python27\lib\site-packages\scipy\optimize\tests\test_optimize.py", line > 512, in test_minimize > self.test_powell(True) > File > "C:\Python27\lib\site-packages\scipy\optimize\tests\test_optimize.py", line > 241, in test_powell > atol=1e-14, rtol=1e-7) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-07, atol=1e-14 > > (mismatch 100.0%) > x: array([[ 0.750776, -0.441569, 0.47101 ], > [ 0.750776, -0.441569, 0.480525], > [ 1.501553, -0.883139, 0.951535],... > y: array([[ 0.72949 , -0.441569, 0.47101 ], > [ 0.72949 , -0.441569, 0.480525], > [ 1.45898 , -0.883139, 0.951535],... > > ====================================================================== > FAIL: Powell (direction set) optimization routine > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File > "C:\Python27\lib\site-packages\scipy\optimize\tests\test_optimize.py", line > 241, in test_powell > atol=1e-14, rtol=1e-7) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-07, atol=1e-14 > > (mismatch 100.0%) > x: array([[ 0.750776, -0.441569, 0.47101 ], > [ 0.750776, -0.441569, 0.480525], > [ 1.501553, -0.883139, 0.951535],... > y: array([[ 0.72949 , -0.441569, 0.47101 ], > [ 0.72949 , -0.441569, 0.480525], > [ 1.45898 , -0.883139, 0.951535],... 
> > ====================================================================== > FAIL: test_arpack.test_linearoperator_deallocation > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File > "C:\Python27\lib\site-packages\scipy\sparse\linalg\eigen\arpack\tests\test_arpack.py", > line 798, in test_linearoperator_deallocation > pass > File "C:\Python27\Lib\contextlib.py", line 24, in __exit__ > self.gen.next() > File "C:\Python27\lib\site-packages\scipy\lib\_gcutils.py", line 96, in > assert_deallocated > raise ReferenceError("Remaining reference(s) to object") > ReferenceError: Remaining reference(s) to object > > ====================================================================== > FAIL: test_beta (test_basic.TestCephes) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\scipy\special\tests\test_basic.py", > line 144, in test_beta > assert_allclose(cephes.beta(0.0342, 171), 24.070498359873497, > rtol=1e-14, atol=0) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-14, atol=0 > > (mismatch 100.0%) > x: array(24.07049835987154) > y: array(24.070498359873497) > > ====================================================================== > FAIL: test_betaincinv (test_basic.TestCephes) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\scipy\special\tests\test_basic.py", > line 157, in test_betaincinv > assert_allclose(cephes.betaincinv(0.0342, 171, 0.25), > 8.4231316935498957e-21, rtol=1e-12, atol=0) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 1297, > in assert_allclose > verbose=verbose, header=header) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 665, > in assert_array_compare > raise AssertionError(msg) > AssertionError: > Not equal to tolerance rtol=1e-12, atol=0 > > (mismatch 100.0%) > x: array(8.423131693529815e-21) > y: array(8.423131693549896e-21) > > ====================================================================== > FAIL: test_data.test_boost( binomial_data_ipp-binomial_data>,) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File "C:\Python27\lib\site-packages\scipy\special\tests\test_data.py", > line 481, in _test_factory > test.check(dtype=dtype) > File "C:\Python27\lib\site-packages\scipy\special\_testutils.py", line > 292, in check > assert_(False, "\n".join(msg)) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 53, in > assert_ > raise AssertionError(smsg) > AssertionError: > Max |adiff|: 9.70919e+24 > Max |rdiff|: 7.10315e-14 > Bad results (70 out of 159) for the following points (in output 0): > 42.0 21.0 > => 538257874440.0024 != 538257874440.0 > (rdiff 4.422361858108481e-15) > 46.0 21.0 > => 6943526580276.032 != 6943526580276.0 > (rdiff 4.6412384438109286e-15) > 82.0 21.0 => > 1.8330655594514242e+19 != 1.8330655594514647e+19 (rdiff > 2.21216310518291e-14) > 
82.0 29.0 => > 1.2576607627509685e+22 != 1.2576607627510093e+22 (rdiff > 3.2516291524073137e-14) > 82.0 33.0 => > 8.999865905487466e+22 != 8.999865905487887e+22 (rdiff > 4.679048843863541e-14) > 82.0 42.0 => > 4.146706622571353e+23 != 4.1467066225715386e+23 (rdiff > 4.482872076557015e-14) > 82.0 46.0 => > 2.322318339533674e+23 != 2.3223183395337372e+23 (rdiff > 2.71635163388777e-14) > 95.0 21.0 => > 6.112305389981363e+20 != 6.112305389981495e+20 (rdiff > 2.165839426429588e-14) > 95.0 29.0 => > 2.146280142106093e+24 != 2.1462801421060996e+24 (rdiff > 3.0016822210721096e-15) > 95.0 33.0 => > 3.7802224438384017e+25 != 3.780222443838485e+25 (rdiff > 2.204165674911803e-14) > 95.0 42.0 => > 1.7198752232706476e+27 != 1.7198752232706825e+27 (rdiff > 2.0297690035618223e-14) > 95.0 46.0 => > 3.086205608690921e+27 != 3.08620560869098e+27 (rdiff > 1.923839025264666e-14) > 122.0 21.0 > => 2.05054879430629e+23 != 2.050548794306221e+23 > (rdiff 3.370908808019204e-14) > 122.0 29.0 => > 9.654999814547143e+27 != 9.65499981454662e+27 (rdiff > 5.4093012244985185e-14) > 122.0 33.0 => > 6.889061799493289e+29 != 6.88906179949298e+29 (rdiff > 4.473977857490729e-14) > 122.0 42.0 => > 9.820488930268602e+32 != 9.820488930268152e+32 (rdiff > 4.5785845286258346e-14) > 122.0 46.0 => > 9.517963588769975e+33 != 9.517963588769497e+33 (rdiff > 5.0148279981909684e-14) > 122.0 82.0 => > 2.546052685625168e+32 != 2.5460526856250765e+32 (rdiff > 3.594314640260528e-14) > 122.0 95.0 > => 8.77923835320522e+26 != 8.779238353204766e+26 > (rdiff 5.1661491374367e-14) > 125.0 21.0 => > 3.577965774451988e+23 != 3.577965774451971e+23 (rdiff > 4.689037586607358e-15) > 125.0 29.0 => > 2.1471697865847835e+28 != 2.1471697865846783e+28 (rdiff > 4.895435483124065e-14) > 125.0 33.0 => > 1.7431114722001235e+30 != 1.7431114722001072e+30 (rdiff > 9.365751364490987e-15) > 125.0 42.0 => > 3.3961976443365834e+33 != 3.396197644336376e+33 (rdiff > 6.110535739146696e-14) > 125.0 46.0 => > 3.824445086978443e+34 != 3.8244450869782213e+34 (rdiff > 5.800111975502932e-14) > 125.0 82.0 => > 6.555451266975118e+33 != 6.555451266974865e+33 (rdiff > 3.851600739996281e-14) > 125.0 95.0 => > 6.870943317071316e+28 != 6.870943317070972e+28 (rdiff > 5.018333445043497e-14) > 135.0 29.0 => > 2.6545984839545678e+29 != 2.654598483954587e+29 (rdiff > 7.289767083732218e-15) > 135.0 33.0 => > 3.2226838569785926e+31 != 3.222683856978583e+31 (rdiff > 2.9346841443966352e-15) > 135.0 42.0 > => 1.65539796634103e+35 != 1.6553979663410087e+35 > (rdiff 1.2814897756371891e-14) > 135.0 46.0 => > 2.9618674395653716e+36 != 2.9618674395653185e+36 (rdiff > 1.7936867201619373e-14) > 135.0 82.0 => > 1.3239687524906834e+38 != 1.3239687524906744e+38 (rdiff > 6.8483063743406465e-15) > 135.0 95.0 => > 3.1921559888457253e+34 != 3.1921559888457077e+34 (rdiff > 5.4898341219098594e-15) > 137.0 21.0 => > 2.891654383984857e+24 != 2.8916543839848387e+24 (rdiff > 6.312514769778831e-15) > 137.0 29.0 => > 4.280069137507773e+29 != 4.28006913750795e+29 (rdiff > 4.126698476389006e-14) > 137.0 33.0 => > 5.605400076850502e+31 != 5.605400076850724e+31 (rdiff > 3.9689909469779893e-14) > 137.0 42.0 => > 3.453905364934577e+35 != 3.453905364934566e+35 (rdiff > 3.2045019404969756e-15) > 137.0 46.0 => > 6.738158013916879e+36 != 6.738158013917095e+36 (rdiff > 3.206340162178697e-14) > 137.0 95.0 => > 3.453905364934577e+35 != 3.453905364934566e+35 (rdiff > 3.2045019404969756e-15) > 143.0 21.0 => > 7.639598299464504e+24 != 7.639598299464654e+24 (rdiff > 1.967693188404081e-14) > 143.0 29.0 => > 1.7138496567724698e+30 != 
1.713849656772535e+30 (rdiff > 3.810263889765402e-14) > 143.0 33.0 => > 2.7947962468926867e+32 != 2.7947962468928852e+32 (rdiff > 7.10315364832748e-14) > 143.0 42.0 > => 2.91038363316998e+36 != 2.910383633170114e+36 > (rdiff 4.60410605063329e-14) > 143.0 46.0 => > 7.281844590777859e+37 != 7.281844590778005e+37 (rdiff > 2.010388427601942e-14) > 143.0 95.0 => > 3.0056975544912714e+38 != 3.005697554491347e+38 (rdiff > 2.5138212463529437e-14) > 143.0 122.0 => > 7.639598299464504e+24 != 7.639598299464654e+24 (rdiff > 1.967693188404081e-14) > 144.0 21.0 => > 8.943919960348817e+24 != 8.943919960348863e+24 (rdiff > 5.162266504697016e-15) > 144.0 29.0 => > 2.1460378310892105e+30 != 2.1460378310890872e+30 (rdiff > 5.744821363969209e-14) > 144.0 33.0 => > 3.6256816175907336e+32 != 3.62568161759077e+32 (rdiff > 1.0135852188743159e-14) > 144.0 42.0 => > 4.1087768938873574e+36 != 4.10877689388722e+36 (rdiff > 3.347442009280187e-14) > 144.0 46.0 => > 1.0699853276245263e+38 != 1.0699853276245233e+38 (rdiff > 2.8246317692471727e-15) > 144.0 122.0 > => 5.00046434146803e+25 != 5.000464341467773e+25 > (rdiff 5.136303886238907e-14) > 145.0 21.0 => > 1.045861608266637e+25 != 1.045861608266601e+25 (rdiff > 3.44956971374012e-14) > 145.0 29.0 > => 2.68254728886152e+30 != 2.6825472888613593e+30 > (rdiff 5.980909913173422e-14) > 145.0 33.0 => > 4.693962808488381e+32 != 4.69396280848805e+32 (rdiff > 7.046164831898694e-14) > 145.0 42.0 => > 5.784200481685929e+36 != 5.784200481685892e+36 (rdiff > 6.327294560089766e-15) > 145.0 46.0 => > 1.5671502273289442e+38 != 1.5671502273288471e+38 (rdiff > 6.195440181461709e-14) > 145.0 122.0 => > 3.152466650055975e+26 != 3.1524666500557704e+26 (rdiff > 6.495993880527083e-14) > 145.0 125.0 => > 1.7570475018879216e+24 != 1.7570475018878894e+24 (rdiff > 1.8333172373193666e-14) > 148.0 21.0 > => 1.66081725375304e+25 != 1.6608172537529971e+25 > (rdiff 2.586056525059899e-14) > 148.0 29.0 => > 5.186381531354783e+30 != 5.18638153135483e+30 (rdiff > 8.900597825568916e-15) > 148.0 33.0 => > 1.0064458536532161e+33 != 1.0064458536531621e+33 (rdiff > 5.369707206034169e-14) > 148.0 42.0 => > 1.5872551307291334e+37 != 1.5872551307291024e+37 (rdiff > 1.9487415642240105e-14) > 148.0 122.0 => > 6.418858594896177e+28 != 6.418858594895863e+28 (rdiff > 4.892155143322956e-14) > 148.0 125.0 > => 5.25225250880544e+26 != 5.252252508805427e+26 > (rdiff 2.3550858972873946e-15) > 149.0 21.0 => > 1.9332950844468164e+25 != 1.9332950844468484e+25 (rdiff > 1.655076176038335e-14) > 149.0 29.0 => > 6.439757068098981e+30 != 6.439757068098914e+30 (rdiff > 1.0490146397789524e-14) > 149.0 33.0 => > 1.2927623465028022e+33 != 1.2927623465027686e+33 (rdiff > 2.5974487045134947e-14) > 149.0 42.0 => > 2.2102898549405485e+37 != 2.210289854940526e+37 (rdiff > 1.0255378527439437e-14) > 149.0 122.0 => > 3.542259002368573e+29 != 3.542259002368458e+29 (rdiff > 3.2380764064090125e-14) > 149.0 125.0 => > 3.2607734325499903e+27 != 3.260773432550036e+27 (rdiff > 1.3993530521689755e-14) > > ====================================================================== > FAIL: test_data.test_boost( test_gamma_data_ipp-factorials>,) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest > self.test(*self.arg) > File "C:\Python27\lib\site-packages\scipy\special\tests\test_data.py", > line 481, in _test_factory > test.check(dtype=dtype) > File "C:\Python27\lib\site-packages\scipy\special\_testutils.py", line > 292, in 
check > assert_(False, "\n".join(msg)) > File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 53, in > assert_ > raise AssertionError(smsg) > AssertionError: > Max |adiff|: 3.36416e+140 > Max |rdiff|: 3.56868e-14 > Bad results (35 out of 198) for the following points (in output 0): > 47.0 => 5.502622159812033e+57 > != 5.502622159812089e+57 (rdiff 1.0258520328008188e-14) > 50.0 => 6.082818640342751e+62 > != 6.082818640342675e+62 (rdiff 1.2463859588664233e-14) > 51.0 => 3.0414093201712955e+64 > != 3.0414093201713376e+64 (rdiff 1.3839389152907178e-14) > 52.0 => 1.5511187532873519e+66 > != 1.5511187532873822e+66 (rdiff 1.9537961157045428e-14) > 53.0 => 8.065817517094281e+67 > != 8.065817517094388e+67 (rdiff 1.3210853128030717e-14) > 55.0 => 2.308436973392385e+71 > != 2.308436973392414e+71 (rdiff 1.2533812334944586e-14) > 56.0 => 1.2696403353658055e+73 > != 1.2696403353658276e+73 (rdiff 1.742759977049922e-14) > 57.0 => 7.10998587804854e+74 > != 7.109985878048635e+74 (rdiff 1.3278171253713691e-14) > 59.0 => 2.350561331282903e+78 > != 2.3505613312828785e+78 (rdiff 1.050071233255035e-14) > 60.0 => 1.386831185456872e+80 > != 1.3868311854568984e+80 (rdiff 1.8984338680317015e-14) > 62.0 => 5.075802138772193e+83 > != 5.075802138772248e+83 (rdiff 1.083538910645766e-14) > 67.0 => 5.443449390774354e+92 > != 5.443449390774431e+92 (rdiff 1.403940285109478e-14) > 68.0 => 3.6471110918189365e+94 > != 3.647111091818868e+94 (rdiff 1.869380461041295e-14) > 71.0 => 1.1978571669969762e+100 != > 1.1978571669969892e+100 (rdiff 1.0865971283156417e-14) > 72.0 => 8.504785885678504e+101 > != 8.504785885678623e+101 (rdiff 1.4034165979338454e-14) > 73.0 => 6.123445837688725e+103 != > 6.1234458376886085e+103 (rdiff 1.8972113268364947e-14) > 74.0 => 4.470115461512762e+105 != > 4.4701154615126844e+105 (rdiff 1.7316637068366878e-14) > 79.0 => 1.1324281178206174e+115 != > 1.1324281178206297e+115 (rdiff 1.0816230951910165e-14) > 80.0 => 8.946182130782832e+116 > != 8.946182130782976e+116 (rdiff 1.611677089398549e-14) > 81.0 => 7.156945704626265e+118 > != 7.156945704626381e+118 (rdiff 1.614806559475051e-14) > 83.0 => 4.7536433370127884e+122 > != 4.753643337012842e+122 (rdiff 1.127054006184548e-14) > 84.0 => 3.9455239697205287e+124 > != 3.945523969720659e+124 (rdiff 3.297642089724322e-14) > 85.0 => 3.314240134565268e+126 > != 3.314240134565353e+126 (rdiff 2.5759817183612377e-14) > 87.0 => 2.422709538367351e+130 != > 2.4227095383672734e+130 (rdiff 3.21206322943374e-14) > 89.0 => 1.8548264225739368e+134 != > 1.8548264225739844e+134 (rdiff 2.5668295182660588e-14) > 90.0 => 1.6507955160908254e+136 > != 1.650795516090846e+136 (rdiff 1.2513966038394803e-14) > 91.0 => 1.485715964481777e+138 != > 1.4857159644817615e+138 (rdiff 1.0456113845414324e-14) > 92.0 => 1.3520015276784442e+140 > != 1.352001527678403e+140 (rdiff 3.05102410078959e-14) > 93.0 => 1.243841405464157e+142 != > 1.2438414054641308e+142 (rdiff 2.1115671814606213e-14) > 94.0 => 1.1567725070816173e+144 != > 1.1567725070816416e+144 (rdiff 2.0972887646477295e-14) > 95.0 => 1.0873661566567573e+146 > != 1.087366156656743e+146 (rdiff 1.3055463191484954e-14) > 96.0 => 1.0329978488238868e+148 > != 1.032997848823906e+148 (rdiff 1.8552500324741778e-14) > 97.0 => 9.916779348709327e+149 > != 9.916779348709496e+149 (rdiff 1.7040815113096152e-14) > 98.0 => 9.61927596824809e+151 > != 9.619275968248212e+151 (rdiff 1.2694188843809571e-14) > 99.0 => 9.426890448883584e+153 > != 9.426890448883248e+153 (rdiff 3.568683137742694e-14) > > 
---------------------------------------------------------------------- > Ran 17009 tests in 2055.336s > > FAILED (KNOWNFAIL=95, SKIP=1211, errors=2, failures=11) > > >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Jan 30 08:58:01 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 30 Jan 2015 14:58:01 +0100 Subject: [Numpy-discussion] question np.partition In-Reply-To: References: Message-ID: <54CB8DE9.7000607@googlemail.com> On 01/30/2015 02:57 AM, Neal Becker wrote: > It sounds like np.partition could be used to answer the question: > give me the highest K elements in a vector. > > Is this a correct interpretation? Something like partial sort, but returned > elements are unsorted. > > I could really make some use of this, but in my case it is a list of objects I > need to sort on a particular key. Is this algorithm available in general python > code (not specific to numpy arrays)? > This is a correct interpretation, it is a selection algorithm, (np.select was already taken and I don't like c++ name nth_element). It guarantees all elements after the kth element are larger/equal to the kth and everything before smaller/equal. It can also select multiple orders in one go, e.g. the 10 largest and the 10 smallest (used for e.g. np.percentile) Note you can create a partial sort by running selection first and then sorting only the sections you want sorted. Its not perfectly efficient as it won't share selection pivots with quickselect but its faster than a full sort. Unfortunately python does not have something similar for objects. There is an issue for it though: http://bugs.python.org/issue21592 As a workaround you can generate a metric ndarray of your objects e.g. by selecting only your keys from the object that can be sorted with less than (ideally into a native numpy type like int/float) and then use argpartition to get the indices that would sort the original object array/list. If that is actually going to be faster than just using pythons full sort would need to be tested. Also be aware of https://github.com/numpy/numpy/issues/5524 jaime has just found likely after when looking at your mail :) From jaime.frio at gmail.com Fri Jan 30 10:34:03 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 30 Jan 2015 07:34:03 -0800 Subject: [Numpy-discussion] question np.partition In-Reply-To: <54CB8DE9.7000607@googlemail.com> References: <54CB8DE9.7000607@googlemail.com> Message-ID: On Fri, Jan 30, 2015 at 5:58 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 01/30/2015 02:57 AM, Neal Becker wrote: > > It sounds like np.partition could be used to answer the question: > > give me the highest K elements in a vector. > > > > Is this a correct interpretation? Something like partial sort, but > returned > > elements are unsorted. > > > > I could really make some use of this, but in my case it is a list of > objects I > > need to sort on a particular key. Is this algorithm available in > general python > > code (not specific to numpy arrays)? > > > > This is a correct interpretation, it is a selection algorithm, > (np.select was already taken and I don't like c++ name nth_element). > It guarantees all elements after the kth element are larger/equal to the > kth and everything before smaller/equal. > It can also select multiple orders in one go, e.g. the 10 largest and > the 10 smallest (used for e.g. 
np.percentile) > Note you can create a partial sort by running selection first and then > sorting only the sections you want sorted. Its not perfectly efficient > as it won't share selection pivots with quickselect but its faster than > a full sort. > > Unfortunately python does not have something similar for objects. There > is an issue for it though: > http://bugs.python.org/issue21592 > > As a workaround you can generate a metric ndarray of your objects e.g. > by selecting only your keys from the object that can be sorted with less > than (ideally into a native numpy type like int/float) and then use > argpartition to get the indices that would sort the original object > array/list. > As of right now, np.partition ends up calling np.sort unless the dtype is one of the numeric types. But it is smoking fast compared to sorting when the real thing kicks in! Jaime > If that is actually going to be faster than just using pythons full sort > would need to be tested. > Also be aware of https://github.com/numpy/numpy/issues/5524 jaime has > just found likely after when looking at your mail :) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Jan 30 22:52:49 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 30 Jan 2015 19:52:49 -0800 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith wrote: > On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fern?ndez del R?o > wrote: > [...] > > With all these in mind, my proposal for the new behavior is that taking a > > view of an array with a different dtype would require: > > > > That the newtype and oldtype be compatible, as defined by the algorithm > > checking object offsets linked above. > > If newtype.itemsize == oldtype.itemsize no more checks are needed, make > it > > happen! > > If the array is C/Fortran contiguous, check that the size in bytes of the > > last/first dimension is evenly divided by newtype.itemsize. If it does, > go > > for it. > > For non-contiguous arrays: > > > > Ignoring dimensions of size 1, check that no stride is smaller than > either > > oldtype.itemsize or newtype.itemsize. If any is found this is an > as_strided > > product, sorry, can't do it! > > Ignoring dimensions of size 1, find a contiguous dimension, i.e. stride > == > > oldtype.itemsize > > > > If found, check that it is the only one with that stride, that it is the > > minimal stride, and that the size in bytes of that dimension is evenly > > divided by newitem,itemsize. > > If none is found, check if there is a size 1 dimension that is also > unique > > (unless we agree on a default, as mentioned above) and that > newtype.itemsize > > evenly divides oldtype.itemsize. > > I'm really wary of this idea that we go grovelling around looking for > some suitable dimension somewhere to absorb the new items. Basically > nothing in numpy produces semantically different arrays (e.g., ones > with different shapes) depending on the *strides* of the input array. 
> In a convoluted way, changing the dtype already does different thing depending on the strides, as right now the expansion/contraction happens along the last/first axis depending if the array is C/Fortran contiguous, and those flags are calculated from the strides: >>> a = np.ones((2, 2), dtype=complex) >>> a.view(float).shape (2, 4) >>> a.T.view(float).shape (4, 2) A user unaware that transposition has changed the memory layout will be surprised to find his complex values being unpacked along the first, not the last dimension. But that's the way it already is. With my proposal above, the intention is that the same happens not only for the standard "reverse axes order" transposition, but with any other one, even if you have sliced the array. > > Could we make it more like: check to see if the last dimension works. > If not, raise an error (and let the user transpose some other > dimension there if that's what they wanted)? Or require the user to > specify which dimension will absorb the shape change? (If we were > doing this from scratch, then it would be tempting to just say that we > always add a new dimension at the end with newtype.itemsize / > oldtype.itemsize entries, or absorb such a dimension if shrinking. As > a bonus, this would always work, regardless of contiguity! Except that > when shrinking the last dimension would have to be contiguous, of > course.) > When we roll @ in and people start working with stacks of matrices, we will probably find ourselves having to create an alias, similar to .T, for .swapaxes(-1, -2). Searching for the smallest stride allows to take views of such arrays, which does not work right now because the array is no longer contiguous globally. > > I guess the main consideration for this is that we may be stuck with > stuff b/c of backwards compatibility. Can you maybe say a little bit > about what is allowed now, and what constraints that puts on things? > E.g. are we already grovelling around in strides and picking random > dimensions in some cases? > Just to restate it: right now we only allow new views if the array is globally contiguous, so either along the first or last dimension. Jaime > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sat Jan 31 04:17:02 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 31 Jan 2015 10:17:02 +0100 Subject: [Numpy-discussion] Views of a different dtype In-Reply-To: References: Message-ID: <1422695822.12798.14.camel@sebastian-t440> On Fr, 2015-01-30 at 19:52 -0800, Jaime Fern?ndez del R?o wrote: > On Thu, Jan 29, 2015 at 8:57 AM, Nathaniel Smith > wrote: > On Thu, Jan 29, 2015 at 12:56 AM, Jaime Fern?ndez del R?o > wrote: > [...] > > Could we make it more like: check to see if the last dimension > works. > If not, raise an error (and let the user transpose some other > dimension there if that's what they wanted)? Or require the > user to > specify which dimension will absorb the shape change? 
(If we > were > doing this from scratch, then it would be tempting to just say > that we > always add a new dimension at the end with newtype.itemsize / > oldtype.itemsize entries, or absorb such a dimension if > shrinking. As > a bonus, this would always work, regardless of contiguity! > Except that > when shrinking the last dimension would have to be contiguous, > of > course.) > > > When we roll @ in and people start working with stacks of matrices, we > will probably find ourselves having to create an alias, similar to .T, > for .swapaxes(-1, -2). Searching for the smallest stride allows to > take views of such arrays, which does not work right now because the > array is no longer contiguous globally. > That is true, but I agree with Nathaniel at least as far as that I would prefer a user to be able to safely use `view` even he has not even an inkling about what his memory layout is. One option would be an `axis=-1` default (maybe FutureWarn this from `axis=None` which would look at order, see below -- or maybe have axis='A', 'C' and 'F' and default to 'A' for starters). This even now could start creating bugs when enabling relaxed strides :(, because your good old fortran order complex array being viewed as a float one could expand along the wrong axis, and even without such arrays swap order pretty fast when operating on them, which can create impossibly to find bugs, because even a poweruser is likely to forget about such things. Of course you could argue that view is a poweruser feature and a user using it should keep these things in mind.... Though if you argue that, you can almost just use `np.ndarray` directly ;) -- ok, not really considering how cumbersome it is, but still. - Sebastian > > I guess the main consideration for this is that we may be > stuck with > stuff b/c of backwards compatibility. Can you maybe say a > little bit > about what is allowed now, and what constraints that puts on > things? > E.g. are we already grovelling around in strides and picking > random > dimensions in some cases? > > > Just to restate it: right now we only allow new views if the array is > globally contiguous, so either along the first or last dimension. > > > Jaime > > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of > Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ben.root at ou.edu Sat Jan 31 09:02:23 2015 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 31 Jan 2015 09:02:23 -0500 Subject: [Numpy-discussion] Numpy 'None' comparison FutureWarning In-Reply-To: References: <541F40FC.6010105@hawaii.edu> Message-ID: Finally got off my butt and hunted down an example, and it was right under my nose in mplot3d. lib/python2.7/site-packages/mpl_toolkits/mplot3d/axes3d.py:1094: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. 
if self.button_pressed in self._rotate_btn: self._rotate_btn is the 1d numpy array, and self.button_pressed usually will have an integer (for which mouse button), but could be None if no mouse button was pressed. I have no clue why the "in" operator is triggering this future warning. Is this intentional? Ben Root On Sun, Sep 21, 2014 at 10:53 PM, Nathaniel Smith wrote: > On 22 Sep 2014 03:02, "Demitri Muna" wrote: > > > > > > On Sep 21, 2014, at 5:19 PM, Eric Firing wrote: > > > >> I think what you are missing is that the standard Python idiom for this > >> use case is "if self._some_array is None:". This will continue to > work, > >> regardless of whether the object being checked is an ndarray or any > >> other Python object. > > > > > > That's an alternative, but I think it's a subtle distinction that will > be lost on many users. I still think that this is something that can easily > trip up many people; it's not clear from looking at the code that this is > the behavior; it's "hidden". At the very least, I strongly suggest that the > warning point this out, e.g. > > > > "FutureWarning: comparison to `None` will result in an elementwise > object comparison in the future; use 'value is None' as an alternative." > > Making messages clearer is always welcome, and we devs aren't always in > the best position to do so because we're to close to the issues to see > which parts are confusing to outsiders - perhaps you'd like to submit a > pull request with this? > > > Assume: > > > > a = np.array([1, 2, 3, 4]) > > b = np.array([None, None, None, None]) > > > > What is the result of "a == None"? Is it "np.array([False, False, False, > False])"? > > After this change, yes. > > > What about the second case? Is the result of "b == None" -> > np.array([True, True, True, True])? > > Yes again. > > (Notice that this is also a subtle and confusing point for many users - > how many people realize that if they want to get the latter result they > have to write np.equal(b, None)?) > > > If so, then > > > > if (b == None): > > ... > > > > will always evaluate to "True" if b is "None" or *any* Numpy array, and > that's clearly unexpected behavior. > > No, that's not how numpy arrays interact with if statements. This is > independent of the handling of 'arr == None': 'if multi_element_array' is > always an error, because an if statement by definition requires a single > true/false decision (it can't execute both branches after all!), but a > multi-element array by definition contains multiple values that might have > contradictory truthiness. > > Currently, 'b == x' returns an array in every situation *except* when x > happens to be 'None'. After this change, 'b == x' will *always* return an > array, so 'if b == x' will always raise an error. > > > > > On Sep 21, 2014, at 9:30 PM, Benjamin Root wrote: > > > >> That being said, I do wonder about related situations where the lhs of > the equal sign might be an array, or it might be a None and you are > comparing against another numpy array. In those situations, you aren't > trying to compare against None, you are just checking if two objects are > equivalent. > > Benjamin, can you give a more concrete example? Right now the *only* time > == on arrays checks for equivalence is when the object being compared > against is None, in which case == pretends to be 'is' because of this > mysterious special case. In every other case it does a broadcasted ==, > which is very different. > > > Right. 
> > With this change, using "==" with numpy arrays now sometimes means
> > "are these equivalent" and other times "element-wise comparison".
>
> Err, you have this backwards :-). Right now == means element-wise
> comparison except in this one special case, where it doesn't. After
> the change, it will mean element-wise comparison consistently in all
> cases.
>
> > The potential for inadvertent bugs is far greater than what
> > convenience this redefinition of a very basic operator might offer.
> > Any scenario where
> >
> > (a == b) != (b == a)
> >
> > is asking for trouble.
>
> That would be unfortunate, yes, but fortunately it doesn't apply here
> :-). 'a == b' and 'b == a' currently always return the same thing, and
> there are no plans to change this - we'll be changing what both of
> them mean at the same time.
>
> -n

From sebastien.gouezel at univ-rennes1.fr  Sat Jan 31 15:53:53 2015
From: sebastien.gouezel at univ-rennes1.fr (Sebastien Gouezel)
Date: Sat, 31 Jan 2015 21:53:53 +0100
Subject: [Numpy-discussion] missing FloatingPointError for numpy on cygwin64
Message-ID: 

Dear all,

I tried to use numpy (version 1.9.1, installed by `pip install numpy`)
on cygwin64. I encountered the following weird bug:

>>> import numpy
>>> with numpy.errstate(all='raise'):
...     print 1/numpy.float64(0.0)
...
inf

I was expecting a FloatingPointError, but it didn't show up. Curiously,
with different numerical types (all the intxx types, or float128), I do
get the FloatingPointError. Same thing with the most recent git version,
or with 1.7.1 as provided by the precompiled cygwin package.

This behavior does not happen on cygwin32 (I always get the
FloatingPointError there). I wonder if there is something weird with my
config, or if this is a genuine reproducible bug. If so, where should I
start looking if I want to fix it? (I don't know anything about numpy's
code.)

Sebastien

From sebastian at sipsolutions.net  Sat Jan 31 16:59:08 2015
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sat, 31 Jan 2015 22:59:08 +0100
Subject: [Numpy-discussion] Numpy 'None' comparison FutureWarning
In-Reply-To: 
References: <541F40FC.6010105@hawaii.edu>
Message-ID: <1422741548.18416.4.camel@sebastian-t440>

On Sa, 2015-01-31 at 09:02 -0500, Benjamin Root wrote:
> Finally got off my butt and hunted down an example, and it was right
> under my nose in mplot3d.
>
> lib/python2.7/site-packages/mpl_toolkits/mplot3d/axes3d.py:1094:
> FutureWarning: comparison to `None` will result in an elementwise
> object comparison in the future.
>   if self.button_pressed in self._rotate_btn:
>
> self._rotate_btn is the 1d numpy array, and self.button_pressed will
> usually hold an integer (for which mouse button), but can be None if
> no mouse button was pressed. I have no clue why the "in" operator is
> triggering this future warning. Is this intentional?

If I remember right, the in operator just does `np.any(arr == other)`
in C code (which is actually a bit broken, in more ways than just
this...). So in this case the FutureWarning is admittedly bogus, since
the result won't actually change unless your array really *does*
include None, and then I hope you want that change ;).
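Roughly, that equivalence looks like the following small sketch
(illustrative only, assuming a 1.9-era NumPy; the actual C
implementation differs in detail):

import numpy as np

a = np.array([1, 2, 3])

print(2 in a)        # True, essentially np.any(a == 2)
print(None in a)     # goes through the same 'a == None' comparison,
                     # which is what triggers the FutureWarning here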
- Sebastian

> Ben Root
>
> [snip -- the earlier messages in this thread, quoted in full]
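To make the distinction discussed in this thread concrete, a short
sketch (illustrative only; what `arr == None` itself returns depends on
the NumPy version, which is exactly what the FutureWarning is about, so
this sticks to the forms recommended above):

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([None, None, None, None])

# The recommended identity test: works for any object and is
# unaffected by the change.
print(a is None)            # False

# The explicit elementwise comparison that '== None' will mean in the
# future (as Nathaniel notes above).
print(np.equal(b, None))    # elementwise: all True

# Once '==' is elementwise, 'if some_array == something' has no single
# truth value for a multi-element array, so the 'if' raises an error.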