From david.froger at gmail.com  Sat Oct  1 12:21:43 2011
From: david.froger at gmail.com (David Froger)
Date: Sat, 01 Oct 2011 18:35:43 +0200
Subject: [Numpy-discussion] iterate over multiple arrays
In-Reply-To: 
References: <1315810245-sup-1371@david-desktop>
	<1315896408-sup-2039@david-desktop>
Message-ID: <1317485908-sup-8346@david-desktop>

Thanks everybody for the different solutions proposed, I really
appreciate it. What about this solution? So simple that I didn't think
of it...

import numpy as np
from numpy import *

def f(arr):
    return arr*2

a = array( [1,1,1] )
b = array( [2,2,2] )
c = array( [3,3,3] )
d = array( [4,4,4] )

for x in (a,b,c,d):
    x[:] = x[:]*2
    #instead of: x = x*2

print a
print b
print c
print d

--

From shish at keba.be  Sat Oct  1 13:34:44 2011
From: shish at keba.be (Olivier Delalleau)
Date: Sat, 1 Oct 2011 13:34:44 -0400
Subject: [Numpy-discussion] iterate over multiple arrays
In-Reply-To: <1317485908-sup-8346@david-desktop>
References: <1315810245-sup-1371@david-desktop>
	<1315896408-sup-2039@david-desktop>
	<1317485908-sup-8346@david-desktop>
Message-ID: 

It'll work; it's equivalent to the suggestion I made in my previous post
with the f_inplace wrapper function (and it has the same drawback: numpy
will allocate temporary memory, which wouldn't be the case if f worked
in-place directly, e.g. implemented as "arr *= 2").

Note that you don't need to write x[:] * 2, you can write x * 2 directly.

-=- Olivier

2011/10/1 David Froger

> Thanks everybody for the different solutions proposed, I really
> appreciate it. What about this solution? So simple that I didn't think
> of it...
>
> import numpy as np
> from numpy import *
>
> def f(arr):
>     return arr*2
>
> a = array( [1,1,1] )
> b = array( [2,2,2] )
> c = array( [3,3,3] )
> d = array( [4,4,4] )
>
> for x in (a,b,c,d):
>     x[:] = x[:]*2
>     #instead of: x = x*2
>
> print a
> print b
> print c
> print d
> --
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Sat Oct  1 15:11:32 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 1 Oct 2011 13:11:32 -0600
Subject: [Numpy-discussion] iterate over multiple arrays
In-Reply-To: 
References: <1315810245-sup-1371@david-desktop>
	<1315896408-sup-2039@david-desktop>
	<1317485908-sup-8346@david-desktop>
Message-ID: 

On Sat, Oct 1, 2011 at 11:34 AM, Olivier Delalleau wrote:

> It'll work; it's equivalent to the suggestion I made in my previous post
> with the f_inplace wrapper function (and it has the same drawback: numpy
> will allocate temporary memory, which wouldn't be the case if f worked
> in-place directly, e.g. implemented as "arr *= 2").
>
> Note that you don't need to write x[:] * 2, you can write x * 2 directly.
>

Or even

x *= 2

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jsalvati at u.washington.edu  Sat Oct  1 16:45:27 2011
From: jsalvati at u.washington.edu (John Salvatier)
Date: Sat, 1 Oct 2011 13:45:27 -0700
Subject: [Numpy-discussion] nditer: possible to manually handle
	dimensions with different lengths?
In-Reply-To: 
References: 
Message-ID: 

I apologize, I picked a poor example of what I want to do. Your suggestion
would work for the example I provided, but not for a more complex example.
My actual task is something like a "group by" operation along a particular axis (with a known number of groups). Let me try again: What I would like to be able to do is to specify some of the iterator dimensions to be handled manually by me. For example lets say I have some kind of a 2d smoothing algorithm. If I start with an array of shape [a,b,c,d] and I'd like to do the 2d smoothing over the 2nd and 3rd dimensions, I'd like to be able to tell nditer to do normal broadcasting and iteration over the 1st and 4th dimensions but leave iteration over the 2nd and 3rd dimensions to me and my algorithm. Each iteration of nditer would give me a 2d array to which I apply my algorithm. This way I could write more arbitrary functions that operate on arrays and support broadcasting. Is clearer? On Fri, Sep 30, 2011 at 5:04 PM, Mark Wiebe wrote: > On Fri, Sep 30, 2011 at 8:03 AM, John Salvatier > wrote: > >> Using nditer, is it possible to manually handle dimensions with different >> lengths? >> >> For example, lets say I had an array A[5, 100] and I wanted to sample >> every 10 along the second axis so I would end up with an array B[5,10]. Is >> it possible to do this with nditer, handling the iteration over the second >> axis manually of course (probably in cython)? >> >> I want something like this (modified from >> http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#putting-the-inner-loop-in-cython >> ) >> >> @cython.boundscheck(False) >> def sum_squares_cy(arr): >> cdef np.ndarray[double] x >> cdef np.ndarray[double] y >> cdef int size >> cdef double value >> cdef int j >> >> axeslist = list(arr.shape) >> axeslist[1] = -1 >> >> out = zeros((arr.shape[0], 10)) >> it = np.nditer([arr, out], flags=['reduce_ok', 'external_loop', >> 'buffered', 'delay_bufalloc'], >> op_flags=[['readonly'], ['readwrite', 'no_broadcast']], >> op_axes=[None, axeslist], >> op_dtypes=['float64', 'float64']) >> it.operands[1][...] = 0 >> it.reset() >> for xarr, yarr in it: >> x = xarr >> y = yarr >> size = x.shape[0] >> j = 0 >> for i in range(size): >> #some magic here involving indexing into x[i] and y[j] >> return it.operands[1] >> >> Does this make sense? Is it possible to do? >> > > I'm not sure I understand precisely what you're asking. Maybe you could > reshape A to have shape [5, 10, 10], so that one of those 10's can match up > with the 10 in B, perhaps with the op_axes? > > -Mark > > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Oct 1 17:53:18 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 1 Oct 2011 14:53:18 -0700 Subject: [Numpy-discussion] nditer: possible to manually handle dimensions with different lengths? In-Reply-To: References: Message-ID: On Sat, Oct 1, 2011 at 1:45 PM, John Salvatier wrote: > I apologize, I picked a poor example of what I want to do. Your suggestion > would work for the example I provided, but not for a more complex example. > My actual task is something like a "group by" operation along a particular > axis (with a known number of groups). 
> > Let me try again: What I would like to be able to do is to specify some of > the iterator dimensions to be handled manually by me. For example lets say I > have some kind of a 2d smoothing algorithm. If I start with an array of > shape [a,b,c,d] and I'd like to do the 2d smoothing over the 2nd and 3rd > dimensions, I'd like to be able to tell nditer to do normal broadcasting and > iteration over the 1st and 4th dimensions but leave iteration over the 2nd > and 3rd dimensions to me and my algorithm. Each iteration of nditer would > give me a 2d array to which I apply my algorithm. This way I could write > more arbitrary functions that operate on arrays and support broadcasting. > > Is clearer? > Maybe this will work for you: In [15]: a = np.arange(2*3*4*5).reshape(2,3,4,5) In [16]: it0, it1 = np.nested_iters(a, [[0,3], [1,2]], flags=['multi_index']) In [17]: for x in it0: ....: print it1.itviews[0] ....: [[ 0 5 10 15] [20 25 30 35] [40 45 50 55]] [[ 1 6 11 16] [21 26 31 36] [41 46 51 56]] [[ 2 7 12 17] [22 27 32 37] [42 47 52 57]] [[ 3 8 13 18] [23 28 33 38] [43 48 53 58]] [[ 4 9 14 19] [24 29 34 39] [44 49 54 59]] [[ 60 65 70 75] [ 80 85 90 95] [100 105 110 115]] [[ 61 66 71 76] [ 81 86 91 96] [101 106 111 116]] [[ 62 67 72 77] [ 82 87 92 97] [102 107 112 117]] [[ 63 68 73 78] [ 83 88 93 98] [103 108 113 118]] [[ 64 69 74 79] [ 84 89 94 99] [104 109 114 119]] Cheers, Mark > > > On Fri, Sep 30, 2011 at 5:04 PM, Mark Wiebe wrote: > >> On Fri, Sep 30, 2011 at 8:03 AM, John Salvatier < >> jsalvati at u.washington.edu> wrote: >> >>> Using nditer, is it possible to manually handle dimensions with >>> different lengths? >>> >>> For example, lets say I had an array A[5, 100] and I wanted to sample >>> every 10 along the second axis so I would end up with an array B[5,10]. Is >>> it possible to do this with nditer, handling the iteration over the second >>> axis manually of course (probably in cython)? >>> >>> I want something like this (modified from >>> http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#putting-the-inner-loop-in-cython >>> ) >>> >>> @cython.boundscheck(False) >>> def sum_squares_cy(arr): >>> cdef np.ndarray[double] x >>> cdef np.ndarray[double] y >>> cdef int size >>> cdef double value >>> cdef int j >>> >>> axeslist = list(arr.shape) >>> axeslist[1] = -1 >>> >>> out = zeros((arr.shape[0], 10)) >>> it = np.nditer([arr, out], flags=['reduce_ok', 'external_loop', >>> 'buffered', 'delay_bufalloc'], >>> op_flags=[['readonly'], ['readwrite', 'no_broadcast']], >>> op_axes=[None, axeslist], >>> op_dtypes=['float64', 'float64']) >>> it.operands[1][...] = 0 >>> it.reset() >>> for xarr, yarr in it: >>> x = xarr >>> y = yarr >>> size = x.shape[0] >>> j = 0 >>> for i in range(size): >>> #some magic here involving indexing into x[i] and y[j] >>> return it.operands[1] >>> >>> Does this make sense? Is it possible to do? >>> >> >> I'm not sure I understand precisely what you're asking. Maybe you could >> reshape A to have shape [5, 10, 10], so that one of those 10's can match up >> with the 10 in B, perhaps with the op_axes? 
>> >> -Mark >> >> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Oct 1 22:59:09 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 1 Oct 2011 20:59:09 -0600 Subject: [Numpy-discussion] should the return type of matlib.reshape be ndarray or matrix? In-Reply-To: <4E83E16F.7050304@gmail.com> References: <4E83E16F.7050304@gmail.com> Message-ID: On Wed, Sep 28, 2011 at 9:09 PM, Alan G Isaac wrote: > Is this the intended behavior? > > >>> from numpy import matlib > >>> m = matlib.reshape([1,2],(2,1)) > >>> type(m) > > > For any 2d shape, I expected a matrix. > (And probably an exception if the shape is not 2d.) > > I think you are right. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From millman at berkeley.edu Sun Oct 2 14:01:14 2011 From: millman at berkeley.edu (Jarrod Millman) Date: Sun, 2 Oct 2011 11:01:14 -0700 Subject: [Numpy-discussion] [ANN] SciPy India 2011 Call for Presentations Message-ID: =============================== SciPy 2011 India Call for Papers =============================== The third `SciPy India Conference `_ will be held from December 4th through the 7th at the `Indian Institute of Technology, Bombay (IITB) `_ in Mumbai, Maharashtra India. At this conference, novel applications and breakthroughs made in the pursuit of science using Python are presented. Attended by leading figures from both academia and industry, it is an excellent opportunity to experience the cutting edge of scientific software development. The conference is followed by two days of tutorials and a code sprint, during which community experts provide training on several scientific Python packages. We invite you to take part by submitting a talk abstract on the conference website at: http://scipy.in Talk/Paper Submission --------------------- We solicit talks and accompanying papers (either formal academic or magazine-style articles) that discuss topics regarding scientific computing using Python, including applications, teaching, development and research. We welcome contributions from academia as well as industry. Keynote Speaker --------------- Eric Jones will deliver the keynote address this year. Eric has a broad background in engineering and software development and leads Enthought's product engineering and software design. Prior to co-founding Enthought, Eric worked with numerical electromagnetics and genetic optimization in the Department of Electrical Engineering at Duke University. He has taught numerous courses on the use of Python for scientific computing and serves as a member of the Python Software Foundation. He holds M.S. and Ph.D. degrees from Duke University in electrical engineering and a B.S.E. in mechanical engineering from Baylor University. Eric was the Keynote Speaker at SciPy US 2011. 
Important Dates --------------- October 26, 2011, Wednesday: Abstracts Due October 31, 2011, Monday: Schedule announced November 21, 2011, Monday: Proceedings paper submission due December 4-5, 2011, Sunday-Monday: Conference December 6-7 2011, Tuesday-Wednesday: Tutorials/Sprints Organizers ---------- * Jarrod Millman, Neuroscience Institute, UC Berkeley, USA (Conference Co-Chair) * Prabhu Ramachandran, Department of Aerospace Engineering, IIT Bombay, India (Conference Co-Chair) * FOSSEE Team From ater1980 at gmail.com Mon Oct 3 08:22:01 2011 From: ater1980 at gmail.com (Alex Ter-Sarkissov) Date: Mon, 3 Oct 2011 16:22:01 +0400 Subject: [Numpy-discussion] (no subject) Message-ID: I got a problem running NumPy in Eclipse. I recently installed PyDev, but after downloading NumPy the installation attempt failed since python 2.6 was not found. I've installed Python 2.7. Do I need to replace it with Python 2.6? -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Mon Oct 3 08:48:16 2011 From: shish at keba.be (Olivier Delalleau) Date: Mon, 3 Oct 2011 08:48:16 -0400 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: Message-ID: Sounds like you need to re-download NumPy, but the version for Python 2.7. -=- Olivier 2011/10/3 Alex Ter-Sarkissov > I got a problem running NumPy in Eclipse. I recently installed PyDev, but > after downloading NumPy the installation attempt failed since python 2.6 was > not found. I've installed Python 2.7. Do I need to replace it with Python > 2.6? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Mon Oct 3 12:03:14 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 3 Oct 2011 09:03:14 -0700 Subject: [Numpy-discussion] nditer: possible to manually handle dimensions with different lengths? In-Reply-To: References: Message-ID: Thanks mark! I think that's exactly what I'm looking for. We even had a previous discussion about this (oops!) ( http://mail.scipy.org/pipermail/numpy-discussion/2011-January/054421.html). I didn't find any documentation, I will try to add some once I understand how it works better. John On Sat, Oct 1, 2011 at 2:53 PM, Mark Wiebe wrote: > On Sat, Oct 1, 2011 at 1:45 PM, John Salvatier wrote: > >> I apologize, I picked a poor example of what I want to do. Your suggestion >> would work for the example I provided, but not for a more complex example. >> My actual task is something like a "group by" operation along a particular >> axis (with a known number of groups). >> >> Let me try again: What I would like to be able to do is to specify some of >> the iterator dimensions to be handled manually by me. For example lets say I >> have some kind of a 2d smoothing algorithm. If I start with an array of >> shape [a,b,c,d] and I'd like to do the 2d smoothing over the 2nd and 3rd >> dimensions, I'd like to be able to tell nditer to do normal broadcasting and >> iteration over the 1st and 4th dimensions but leave iteration over the 2nd >> and 3rd dimensions to me and my algorithm. Each iteration of nditer would >> give me a 2d array to which I apply my algorithm. This way I could write >> more arbitrary functions that operate on arrays and support broadcasting. >> >> Is clearer? 
>> > > Maybe this will work for you: > > In [15]: a = np.arange(2*3*4*5).reshape(2,3,4,5) > > In [16]: it0, it1 = np.nested_iters(a, [[0,3], [1,2]], > flags=['multi_index']) > > In [17]: for x in it0: > ....: print it1.itviews[0] > ....: > [[ 0 5 10 15] > [20 25 30 35] > [40 45 50 55]] > [[ 1 6 11 16] > [21 26 31 36] > [41 46 51 56]] > [[ 2 7 12 17] > [22 27 32 37] > [42 47 52 57]] > [[ 3 8 13 18] > [23 28 33 38] > [43 48 53 58]] > [[ 4 9 14 19] > [24 29 34 39] > [44 49 54 59]] > [[ 60 65 70 75] > [ 80 85 90 95] > [100 105 110 115]] > [[ 61 66 71 76] > [ 81 86 91 96] > [101 106 111 116]] > [[ 62 67 72 77] > [ 82 87 92 97] > [102 107 112 117]] > [[ 63 68 73 78] > [ 83 88 93 98] > [103 108 113 118]] > [[ 64 69 74 79] > [ 84 89 94 99] > [104 109 114 119]] > > Cheers, > Mark > > > > >> >> >> On Fri, Sep 30, 2011 at 5:04 PM, Mark Wiebe wrote: >> >>> On Fri, Sep 30, 2011 at 8:03 AM, John Salvatier < >>> jsalvati at u.washington.edu> wrote: >>> >>>> Using nditer, is it possible to manually handle dimensions with >>>> different lengths? >>>> >>>> For example, lets say I had an array A[5, 100] and I wanted to sample >>>> every 10 along the second axis so I would end up with an array B[5,10]. Is >>>> it possible to do this with nditer, handling the iteration over the second >>>> axis manually of course (probably in cython)? >>>> >>>> I want something like this (modified from >>>> http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#putting-the-inner-loop-in-cython >>>> ) >>>> >>>> @cython.boundscheck(False) >>>> def sum_squares_cy(arr): >>>> cdef np.ndarray[double] x >>>> cdef np.ndarray[double] y >>>> cdef int size >>>> cdef double value >>>> cdef int j >>>> >>>> axeslist = list(arr.shape) >>>> axeslist[1] = -1 >>>> >>>> out = zeros((arr.shape[0], 10)) >>>> it = np.nditer([arr, out], flags=['reduce_ok', 'external_loop', >>>> 'buffered', 'delay_bufalloc'], >>>> op_flags=[['readonly'], ['readwrite', 'no_broadcast']], >>>> op_axes=[None, axeslist], >>>> op_dtypes=['float64', 'float64']) >>>> it.operands[1][...] = 0 >>>> it.reset() >>>> for xarr, yarr in it: >>>> x = xarr >>>> y = yarr >>>> size = x.shape[0] >>>> j = 0 >>>> for i in range(size): >>>> #some magic here involving indexing into x[i] and y[j] >>>> return it.operands[1] >>>> >>>> Does this make sense? Is it possible to do? >>>> >>> >>> I'm not sure I understand precisely what you're asking. Maybe you could >>> reshape A to have shape [5, 10, 10], so that one of those 10's can match up >>> with the 10 in B, perhaps with the op_axes? >>> >>> -Mark >>> >>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
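
For reference, the 2d smoothing pattern John describes above might be
written with nested_iters roughly as follows. This is a minimal, untested
sketch: smooth2d is a hypothetical placeholder, not a function from the
thread, and the shapes are made up.

import numpy as np

def smooth2d(block):
    # stand-in for a real 2-d smoothing algorithm
    return block

a = np.arange(2*3*4*5, dtype=float).reshape(2, 3, 4, 5)
out = np.empty_like(a)

# outer iteration over axes 0 and 3; axes 1 and 2 are handled manually
it0, it1 = np.nested_iters([a, out], [[0, 3], [1, 2]],
                           op_flags=[['readonly'], ['readwrite']])
for _ in it0:
    src, dst = it1.itviews      # 2-d views of shape (3, 4)
    dst[...] = smooth2d(src)
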
From yanghatespam at gmail.com  Mon Oct  3 15:42:44 2011
From: yanghatespam at gmail.com (Yang Zhang)
Date: Mon, 3 Oct 2011 12:42:44 -0700
Subject: [Numpy-discussion] Long-standing issue with using numpy in
	embedded CPython
Message-ID: 

It turns out that there's a long-standing problem in numpy that
prevents it from being used in embedded CPython environments:

http://stackoverflow.com/questions/7592565/when-embedding-cpython-in-java-why-does-this-hang/7630992#7630992
http://mail.scipy.org/pipermail/numpy-discussion/2009-July/044046.html

Is there any fix or workaround for this?  Thanks.
-- 
Yang Zhang
http://yz.mit.edu/

From shish at keba.be  Mon Oct  3 15:51:05 2011
From: shish at keba.be (Olivier Delalleau)
Date: Mon, 3 Oct 2011 15:51:05 -0400
Subject: [Numpy-discussion] Long-standing issue with using numpy in
	embedded CPython
In-Reply-To: 
References: 
Message-ID: 

As far as a workaround is concerned, that scipy archive post says you can
disable threads in numpy.
Sorry I can't help more; I don't know much about how to bypass such GIL
issues.

-=- Olivier

2011/10/3 Yang Zhang

> It turns out that there's a long-standing problem in numpy that
> prevents it from being used in embedded CPython environments:
>
> http://stackoverflow.com/questions/7592565/when-embedding-cpython-in-java-why-does-this-hang/7630992#7630992
> http://mail.scipy.org/pipermail/numpy-discussion/2009-July/044046.html
> Is there any fix or workaround for this?  Thanks.
> --
> Yang Zhang
> http://yz.mit.edu/
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jsalvati at u.washington.edu  Mon Oct  3 17:03:02 2011
From: jsalvati at u.washington.edu (John Salvatier)
Date: Mon, 3 Oct 2011 14:03:02 -0700
Subject: [Numpy-discussion] nditer: possible to manually handle
	dimensions with different lengths?
In-Reply-To: 
References: 
Message-ID: 

Some observations and questions about nested_iters. Nested_iters seems to
require that all input arrays have the same number of dimensions (so you
will have to pad some input shapes with 1s). Is there a way to specify how
the axes are matched together, like op_axes does for nditer?
When I try to run the following program, @cython.boundscheck(False) def vars(vals, group, axis ): cdef np.ndarray[double, ndim = 2] values cdef np.ndarray[long long, ndim = 2] groups cdef np.ndarray[double, ndim = 2] outs cdef int size cdef double value cdef int i, j cdef long long cgroup cdef double min cdef double max cdef double open oshape = list(vals.shape) bins = len(np.unique(group)) oshape = oshape+[bins] oshape[axis] = 1 out = np.empty(tuple(oshape)) axes = range(vals.ndim) axes.remove(axis) gshape = [1] * len(oshape) gshape[axis] = len(group) group.shape = gshape vals = vals[...,np.newaxis] it0, it1 = np.nested_iters([vals,group, out], [axes, [axis,len(oshape) -1]], op_dtypes=['float64', 'int64', 'float64'], flags = ['multi_index', 'buffered']) size = vals.shape[axis] for x in it0: values, groups, outs = it1.itviews j = -1 for i in range(size): if cgroup != groups[i,0]: if j != -1: outs[0,j] = garmanklass(open, values[i,0], min, max) cgroup = groups[i,0] min = inf max = -inf open = values[i,0] j += 1 min = fmin(min, values[i,0]) max = fmax(max, values[i,0]) outs[0,j+1] = garmanklass(open, values[size -1], min, max) return out I get an error File "comp.pyx", line 58, in varscale.comp.vars (varscale\comp.c:1565) values, groups, outs = it1.itviews ValueError: cannot provide an iterator view when buffering is enabled Which I am not sure how to deal with. Any advice? What I am trying to do here is to do a "grouped" calculation (the group specified by the group argument) on the values along the given axis. I try to use nested_iter to iterate over the specified axis and a new axis (the length of the number of groups) separately so I can do my calculation. On Mon, Oct 3, 2011 at 9:03 AM, John Salvatier wrote: > Thanks mark! I think that's exactly what I'm looking for. We even had a > previous discussion about this (oops!) ( > http://mail.scipy.org/pipermail/numpy-discussion/2011-January/054421.html > ). > > I didn't find any documentation, I will try to add some once I understand > how it works better. > > John > > > On Sat, Oct 1, 2011 at 2:53 PM, Mark Wiebe wrote: > >> On Sat, Oct 1, 2011 at 1:45 PM, John Salvatier > > wrote: >> >>> I apologize, I picked a poor example of what I want to do. Your >>> suggestion would work for the example I provided, but not for a more complex >>> example. My actual task is something like a "group by" operation along a >>> particular axis (with a known number of groups). >>> >>> Let me try again: What I would like to be able to do is to specify some >>> of the iterator dimensions to be handled manually by me. For example lets >>> say I have some kind of a 2d smoothing algorithm. If I start with an array >>> of shape [a,b,c,d] and I'd like to do the 2d smoothing over the 2nd and 3rd >>> dimensions, I'd like to be able to tell nditer to do normal broadcasting and >>> iteration over the 1st and 4th dimensions but leave iteration over the 2nd >>> and 3rd dimensions to me and my algorithm. Each iteration of nditer would >>> give me a 2d array to which I apply my algorithm. This way I could write >>> more arbitrary functions that operate on arrays and support broadcasting. >>> >>> Is clearer? 
>>> >> >> Maybe this will work for you: >> >> In [15]: a = np.arange(2*3*4*5).reshape(2,3,4,5) >> >> In [16]: it0, it1 = np.nested_iters(a, [[0,3], [1,2]], >> flags=['multi_index']) >> >> In [17]: for x in it0: >> ....: print it1.itviews[0] >> ....: >> [[ 0 5 10 15] >> [20 25 30 35] >> [40 45 50 55]] >> [[ 1 6 11 16] >> [21 26 31 36] >> [41 46 51 56]] >> [[ 2 7 12 17] >> [22 27 32 37] >> [42 47 52 57]] >> [[ 3 8 13 18] >> [23 28 33 38] >> [43 48 53 58]] >> [[ 4 9 14 19] >> [24 29 34 39] >> [44 49 54 59]] >> [[ 60 65 70 75] >> [ 80 85 90 95] >> [100 105 110 115]] >> [[ 61 66 71 76] >> [ 81 86 91 96] >> [101 106 111 116]] >> [[ 62 67 72 77] >> [ 82 87 92 97] >> [102 107 112 117]] >> [[ 63 68 73 78] >> [ 83 88 93 98] >> [103 108 113 118]] >> [[ 64 69 74 79] >> [ 84 89 94 99] >> [104 109 114 119]] >> >> Cheers, >> Mark >> >> >> >> >>> >>> >>> On Fri, Sep 30, 2011 at 5:04 PM, Mark Wiebe wrote: >>> >>>> On Fri, Sep 30, 2011 at 8:03 AM, John Salvatier < >>>> jsalvati at u.washington.edu> wrote: >>>> >>>>> Using nditer, is it possible to manually handle dimensions with >>>>> different lengths? >>>>> >>>>> For example, lets say I had an array A[5, 100] and I wanted to sample >>>>> every 10 along the second axis so I would end up with an array B[5,10]. Is >>>>> it possible to do this with nditer, handling the iteration over the second >>>>> axis manually of course (probably in cython)? >>>>> >>>>> I want something like this (modified from >>>>> http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#putting-the-inner-loop-in-cython >>>>> ) >>>>> >>>>> @cython.boundscheck(False) >>>>> def sum_squares_cy(arr): >>>>> cdef np.ndarray[double] x >>>>> cdef np.ndarray[double] y >>>>> cdef int size >>>>> cdef double value >>>>> cdef int j >>>>> >>>>> axeslist = list(arr.shape) >>>>> axeslist[1] = -1 >>>>> >>>>> out = zeros((arr.shape[0], 10)) >>>>> it = np.nditer([arr, out], flags=['reduce_ok', 'external_loop', >>>>> 'buffered', 'delay_bufalloc'], >>>>> op_flags=[['readonly'], ['readwrite', 'no_broadcast']], >>>>> op_axes=[None, axeslist], >>>>> op_dtypes=['float64', 'float64']) >>>>> it.operands[1][...] = 0 >>>>> it.reset() >>>>> for xarr, yarr in it: >>>>> x = xarr >>>>> y = yarr >>>>> size = x.shape[0] >>>>> j = 0 >>>>> for i in range(size): >>>>> #some magic here involving indexing into x[i] and y[j] >>>>> return it.operands[1] >>>>> >>>>> Does this make sense? Is it possible to do? >>>>> >>>> >>>> I'm not sure I understand precisely what you're asking. Maybe you could >>>> reshape A to have shape [5, 10, 10], so that one of those 10's can match up >>>> with the 10 in B, perhaps with the op_axes? >>>> >>>> -Mark >>>> >>>> >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
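
Regarding the ValueError above: itviews hands out views into the operands'
actual memory, and such views cannot be provided while the 'buffered' flag
routes data through temporary buffers. One possible workaround, an
untested sketch reusing the names from the program above, is to cast the
operands up front and drop 'buffered' and op_dtypes:

# vals, group, out, axes, axis and oshape as in the program above
vals = np.asarray(vals, dtype=np.float64)
group = np.asarray(group, dtype=np.int64)

it0, it1 = np.nested_iters(
    [vals, group, out], [axes, [axis, len(oshape) - 1]],
    op_flags=[['readonly'], ['readonly'], ['readwrite']])

for x in it0:
    values, groups, outs = it1.itviews   # plain views, no buffering needed
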
From pengkui.luo at gmail.com  Mon Oct  3 18:59:48 2011
From: pengkui.luo at gmail.com (Pengkui Luo)
Date: Mon, 3 Oct 2011 17:59:48 -0500
Subject: [Numpy-discussion] should the return type of matlib.reshape be
	ndarray or matrix?
In-Reply-To: <4E83E16F.7050304@gmail.com>
References: <4E83E16F.7050304@gmail.com>
Message-ID: 

Most functions in numpy return ndarray by default. Use numpy.asmatrix() if
you want a matrix.

>>> from numpy import matlib, asmatrix
>>> m = matlib.reshape([1,2],(2,1))
>>> type(m)
<type 'numpy.ndarray'>
>>> type( asmatrix(m) )
<class 'numpy.matrixlib.defmatrix.matrix'>

--
Pengkui

On Wed, Sep 28, 2011 at 22:09, Alan G Isaac wrote:

> Is this the intended behavior?
>
> >>> from numpy import matlib
> >>> m = matlib.reshape([1,2],(2,1))
> >>> type(m)
> <type 'numpy.ndarray'>
>
> For any 2d shape, I expected a matrix.
> (And probably an exception if the shape is not 2d.)
>
> Thanks,
> Alan Isaac
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robince at gmail.com  Tue Oct  4 04:28:41 2011
From: robince at gmail.com (Robin)
Date: Tue, 4 Oct 2011 10:28:41 +0200
Subject: [Numpy-discussion] Long-standing issue with using numpy in
	embedded CPython
In-Reply-To: 
References: 
Message-ID: 

On Mon, Oct 3, 2011 at 9:42 PM, Yang Zhang wrote:
> It turns out that there's a long-standing problem in numpy that
> prevents it from being used in embedded CPython environments:

Just wanted to make the point for reference that in general Numpy does
work fine in (non-threaded) embedded CPython situations, see for
example pymex [1], which embeds Python + Numpy in a Matlab mex file and
works really well.

This seems to be a problem specific to Jepp.

Just wanted to mention it in case it puts someone off trying something
unnecessarily in the future.

Cheers

Robin

[1] https://github.com/kw/pymex

>
> http://stackoverflow.com/questions/7592565/when-embedding-cpython-in-java-why-does-this-hang/7630992#7630992
> http://mail.scipy.org/pipermail/numpy-discussion/2009-July/044046.html
> Is there any fix or workaround for this?  Thanks.
> --
> Yang Zhang
> http://yz.mit.edu/
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From andrea.gavana at gmail.com  Tue Oct  4 06:04:48 2011
From: andrea.gavana at gmail.com (Andrea Gavana)
Date: Tue, 4 Oct 2011 12:04:48 +0200
Subject: [Numpy-discussion] Crash on (un-orthodox) __import__
Message-ID: 

Hi All,

I was fiddling here and there with some code doing dynamic import of
stuff, and I noticed that this code:

import os
import sys

init_name = r"C:\Python27\Lib\site-packages\numpy\__init__.py"
directory, module_name = os.path.split(init_name)
main = os.path.splitext(module_name)[0]

sys.path.insert(0, os.path.normpath(directory))

# Crash here...
mainmod = __import__(main)

Produces a hard crash on Python (i.e., a dialog box with a "python.exe has
stopped working" message). I know I am not supposed to import stuff like
that, but I was curious to understand why Python should crash in this way.

This happens on Python 2.7.2 with Numpy 1.6.1 and Python 2.5.4 with Numpy
1.5.0.

Thank you for your suggestions :-D

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/ >>> import PyQt4.QtGui Traceback (most recent call last): File "", line 1, in ImportError: No module named PyQt4.QtGui >>> >>> import pygtk Traceback (most recent call last): File "", line 1, in ImportError: No module named pygtk >>> >>> import wx >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From W.Miah at uea.ac.uk Tue Oct 4 06:54:07 2011 From: W.Miah at uea.ac.uk (Miah Wadud Dr (ITCS)) Date: Tue, 4 Oct 2011 11:54:07 +0100 Subject: [Numpy-discussion] build errors Message-ID: Hello numpy users, I am trying to build numpy 1.6.1 and am having problems. It prints the following error message: gcc -pthread -shared build/temp.linux-x86_64-2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux-x86_64-2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64-2.4/numpy/core/_dotblas.so /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value collect2: ld returned 1 exit status /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value collect2: ld returned 1 exit status error: Command "gcc -pthread -shared build/temp.linux-x86_64-2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux-x86_64-2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64-2.4/numpy/core/_dotblas.so" failed with exit status 1 I do not know anything about the setup.py script, so do not know how to add the -fPIC switch. Any help will be greatly appreciated. Regards, ---------- Wadud Miah, High Performance Computing Systems Developer Research Computing Services, University of East Anglia Web: http://www.uea.ac.uk/~xca10fju/ Telephone: 01603 593856 Information Services ---------- From alan.isaac at gmail.com Tue Oct 4 07:49:44 2011 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 04 Oct 2011 07:49:44 -0400 Subject: [Numpy-discussion] should the return type of matlib.reshape be ndarray or matrix? In-Reply-To: References: <4E83E16F.7050304@gmail.com> Message-ID: <4E8AF2D8.4060408@gmail.com> On 10/3/2011 6:59 PM, Pengkui Luo wrote: > Most functions in numpy return ndarray by default. > Use numpy.asmatrix() if you want a matrix. Please note that the example is using matlib.reshape, not numpy.reshape. Alan Isaac From cournape at gmail.com Tue Oct 4 13:39:34 2011 From: cournape at gmail.com (David Cournapeau) Date: Tue, 4 Oct 2011 18:39:34 +0100 Subject: [Numpy-discussion] build errors In-Reply-To: References: Message-ID: On Tue, Oct 4, 2011 at 11:54 AM, Miah Wadud Dr (ITCS) wrote: > Hello numpy users, > > I am trying to build numpy 1.6.1 and am having problems. 
It prints the following error message:
>
> gcc -pthread -shared build/temp.linux-x86_64-2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux-x86_64-2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64-2.4/numpy/core/_dotblas.so
> /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
> /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value
> collect2: ld returned 1 exit status
> /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
> /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value
> collect2: ld returned 1 exit status
> error: Command "gcc -pthread -shared build/temp.linux-x86_64-2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux-x86_64-2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64-2.4/numpy/core/_dotblas.so" failed with exit status 1

Did you build Atlas by yourself? If so, it is most likely not usable
for shared libraries (mandatory for any python extension, including
numpy). You need to configure atlas with the option "-Fa alg -fPIC".

David

From yanghatespam at gmail.com  Tue Oct  4 15:05:20 2011
From: yanghatespam at gmail.com (Yang Zhang)
Date: Tue, 4 Oct 2011 12:05:20 -0700
Subject: [Numpy-discussion] Long-standing issue with using numpy in
	embedded CPython
In-Reply-To: 
References: 
Message-ID: 

On Tue, Oct 4, 2011 at 1:28 AM, Robin wrote:
> On Mon, Oct 3, 2011 at 9:42 PM, Yang Zhang wrote:
>> It turns out that there's a long-standing problem in numpy that
>> prevents it from being used in embedded CPython environments:
>
> Just wanted to make the point for reference that in general Numpy does
> work fine in (non-threaded) embedded CPython situations, see for
> example pymex [1], which embeds Python + Numpy in a Matlab mex file and
> works really well.
>
> This seems to be a problem specific to Jepp.
>
> Just wanted to mention it in case it puts someone off trying something
> unnecessarily in the future.

My (second-hand) understanding is that this is a problem with having
multiple CPython interpreters, which both Jepp and numpy utilize,
incompatibly - is that right? I.e., if either one were restricted to
using a single CPython interpreter, we wouldn't see this problem?

I'm curious how to disable threads in numpy (not an ideal solution).
Googling seems to point me to setting NPY_ALLOW_THREADS to
0....somewhere.

>
> Cheers
>
> Robin
>
> [1] https://github.com/kw/pymex
>
>>
>> http://stackoverflow.com/questions/7592565/when-embedding-cpython-in-java-why-does-this-hang/7630992#7630992
>> http://mail.scipy.org/pipermail/numpy-discussion/2009-July/044046.html
>> Is there any fix or workaround for this?  Thanks.
>> --
>> Yang Zhang
>> http://yz.mit.edu/
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
Yang Zhang
http://yz.mit.edu/

From newsletters at alexandreleray.com  Tue Oct  4 15:10:13 2011
From: newsletters at alexandreleray.com (Alexandre Leray)
Date: Tue, 04 Oct 2011 21:10:13 +0200
Subject: [Numpy-discussion] oblique text
Message-ID: <4E8B5A15.6000900@alexandreleray.com>

Dear all,

I'm trying to create oblique texts by reordering their letters. Here is
an example to illustrate this (to be displayed in a monospaced font):

>>> text = """\
... This
... is
... a
... test"""
>>> print(make_oblique(text))
T
i h
a s i
t     s
  e
    s
      t

So far I have this:

def make_oblique(text):
    words = text.splitlines()
    matrix = [[" " for i in xrange(len(words) + len(max(words)))] \
        for j in xrange(len(words) + len(max(words)))]
    for i, word in enumerate(words):
        for j, letter in enumerate(word):
            matrix[j + i][j] = letter
    matrix = map(lambda x: "".join(x), matrix)
    return "\n".join(matrix)

But it is not very flexible. For instance, I'd like to control the "line
spacing" (by adding extra spaces in between letters since it isn't real
lines). I just found numpy and I have the intuition that it could do the
job since it deals with matrices.

Am I right? If so, how would you do such a thing?

Thanks!

Alex

From pav at iki.fi  Tue Oct  4 16:22:48 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 04 Oct 2011 22:22:48 +0200
Subject: [Numpy-discussion] oblique text
In-Reply-To: <4E8B5A15.6000900@alexandreleray.com>
References: <4E8B5A15.6000900@alexandreleray.com>
Message-ID: 

04.10.2011 21:10, Alexandre Leray wrote:
[clip]
> But it is not very flexible. For instance, I'd like to control the "line
> spacing" (by adding extra spaces in between letters since it isn't real
> lines). I just found numpy and I have the intuition that it could do the
> job since it deals with matrices.
>
> Am I right? If so, how would you do such a thing?

import numpy as np

def make_oblique(words):
    n = max(2*len(word) for word in words) + len(words)
    canvas = np.zeros((n, n), dtype='S1')
    canvas[...] = ' '  # ye mighty FORTRAN, we beseech thee

    for j, word in enumerate(words):
        i = np.arange(len(word))
        canvas[i+j, 2*i] = list(word)

    canvas[:,-1] = '\n'
    return canvas.tostring().rstrip()

-- 
Pauli Virtanen

From teoliphant at gmail.com  Tue Oct  4 16:58:43 2011
From: teoliphant at gmail.com (Travis Oliphant)
Date: Tue, 4 Oct 2011 15:58:43 -0500
Subject: [Numpy-discussion] A Foundation for the support of NumPy and SciPy
Message-ID: 

Hi all,

At the recent US SciPy conference and at other times in the past I have
been approached about the possibility of creating a foundation to support
the development of SciPy and NumPy.

I know there are varying opinions about that, but I am generally very
supportive of the idea and would like to encourage it as much as I can.
It would be interesting to have a public discussion of the issues, but
these discussions should not clog the main list of either NumPy or SciPy.

As a result, a public mailing list has been set up for discussion of the
creation of a Foundation for the Advancement of Scientific, Technical,
and Engineering Computing Using High Level Abstractions (FASTECUHLA).
The list is fastecuhla at googlegroups.com This is a place-holder name that can be replaced if somebody comes up with a better one. Please sign up for that list if you would like to contribute to the discussion. The items to discuss include: * where to organize * what the purposes should be * who should be members * where should money come from * what other organizations exist that we could either piggy-back on or emulate * what are the pitfalls on starting a foundation to support NumPy and SciPy versus other approaches * who has time to participate in its organization and maintenance One important feature is that I see this foundation as a service opportunity and obligation and not as a "feather-in-the-cap" or something to join lightly. I'm hopeful that it can be a place where people and organizations can donate money and know that it will be going directly to further the core packages for Scientific Computing with Python. Thank you, -Travis From njs at pobox.com Tue Oct 4 19:36:44 2011 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 4 Oct 2011 16:36:44 -0700 Subject: [Numpy-discussion] A Foundation for the support of NumPy and SciPy In-Reply-To: References: Message-ID: [Does the group actually exist yet? Google says: "No groups match fastecuhla." Replying here instead...] I've been following discussions around non-profit incorporation for FOSS projects for about a decade (including some years on the internal mailing list for SPI Inc. -- Debian's non-profit foundation). My strong recommendation is that we not do it ourselves. Setting up our own non-profit takes an immense amount of energy, and keeping it going requires continuing to jump through annoying hoops on a regular basis (you must have a procedure for selecting a board; the board must meet on some regular schedule, achieve quorum, and regularly elect officers; each board meeting must have minutes produced and approved, you must file taxes on time, ...), and it's expensive to boot (you'll need a professional accountant, etc.). As a result, most projects that try going it on their own end up with a horrible mess sooner or later. It works okay if you're, say, Gnome, but most projects are not Gnome. But fortunately, this is a solved problem: there are several non-profit umbrella corporations that are set up to let experts take care of this nonsense and amortize the costs over multiple projects. The Software Freedom Conservancy is probably the most well put together: http://www.sfconservancy.org/overview/ http://www.sfconservancy.org/members/services/ http://sfconservancy.org/about/board/ Many large projects with complicated legal situations like Samba, Busybox, jQuery, Wine, Boost, ... have also chosen this approach: http://www.sfconservancy.org/members/current/ TL;DR: When it comes to legal matters: starting your own non-profit is to joining an existing umbrella non-profit as CVS is to git. (And in fact git is also a SF Conservancy member.) My $0.02, -- Nathaniel On Tue, Oct 4, 2011 at 1:58 PM, Travis Oliphant wrote: > Hi all, > > At the recent US SciPy conference and at other times in the past I have been approached about the possibility of creating a foundation to support the development of SciPy and NumPy. > > I know there are varying opinions about that, but I am generally very supportive of the idea and would like to encourage it as much as I can. ? It would be interesting to have a public discussion of the issues, but these discussions should not clog the main list of either NumPy or SciPy. 
> > As a result, there has been set up a public mailing list for discussion of the creation of a Foundation for the Advancement of Scientific, Technical, and Engineering Computing Using High Level Abstractions (FASTECUHLA). ? ? The list is fastecuhla at googlegroups.com > > This is a place-holder name that can be replaced if somebody comes up with a better one. ? ? Please sign up for that list if you would like to contribute to the discussion. > > The items to discuss include: > ? ? ? ?* where to organize > ? ? ? ?* what the purposes should be > ? ? ? ?* who should be members > ? ? ? ?* where should money come from > ? ? ? ?* what other organizations exist that we could either piggy-back on or emulate > ? ? ? ?* what are the pitfalls on starting a foundation to support NumPy and SciPy versus other approaches > ? ? ? ?* who has time to participate in its organization and maintenance > > One important feature is that I see this foundation as a service opportunity and obligation and not as a "feather-in-the-cap" or something to join lightly. ? I'm hopeful that it can be a place where people and organizations can donate money and know that it will be going directly to further the core packages for Scientific Computing with Python. > > Thank you, > > -Travis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Tue Oct 4 19:57:24 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 4 Oct 2011 17:57:24 -0600 Subject: [Numpy-discussion] A Foundation for the support of NumPy and SciPy In-Reply-To: References: Message-ID: On Tue, Oct 4, 2011 at 5:36 PM, Nathaniel Smith wrote: > [Does the group actually exist yet? Google says: "No groups match > fastecuhla." Replying here instead...] > > I've been following discussions around non-profit incorporation for > FOSS projects for about a decade (including some years on the internal > mailing list for SPI Inc. -- Debian's non-profit foundation). My > strong recommendation is that we not do it ourselves. Setting up our > own non-profit takes an immense amount of energy, and keeping it going > requires continuing to jump through annoying hoops on a regular basis > (you must have a procedure for selecting a board; the board must meet > on some regular schedule, achieve quorum, and regularly elect > officers; each board meeting must have minutes produced and approved, > you must file taxes on time, ...), and it's expensive to boot (you'll > need a professional accountant, etc.). As a result, most projects that > try going it on their own end up with a horrible mess sooner or later. > It works okay if you're, say, Gnome, but most projects are not Gnome. > > But fortunately, this is a solved problem: there are several > non-profit umbrella corporations that are set up to let experts take > care of this nonsense and amortize the costs over multiple projects. > The Software Freedom Conservancy is probably the most well put > together: > http://www.sfconservancy.org/overview/ > http://www.sfconservancy.org/members/services/ > http://sfconservancy.org/about/board/ > Many large projects with complicated legal situations like Samba, > Busybox, jQuery, Wine, Boost, ... have also chosen this approach: > http://www.sfconservancy.org/members/current/ > > TL;DR: When it comes to legal matters: starting your own non-profit is > to joining an existing umbrella non-profit as CVS is to git. 
(And in > fact git is also a SF Conservancy member.) > > My $0.02, > -- Nathaniel > All excellent points. Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Oct 4 20:12:35 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 Oct 2011 20:12:35 -0400 Subject: [Numpy-discussion] A Foundation for the support of NumPy and SciPy In-Reply-To: References: Message-ID: On Tue, Oct 4, 2011 at 7:36 PM, Nathaniel Smith wrote: > [Does the group actually exist yet? Google says: "No groups match > fastecuhla." Replying here instead...] https://groups.google.com/group/fastecuhla > > I've been following discussions around non-profit incorporation for > FOSS projects for about a decade (including some years on the internal > mailing list for SPI Inc. -- Debian's non-profit foundation). My > strong recommendation is that we not do it ourselves. Setting up our > own non-profit takes an immense amount of energy, and keeping it going > requires continuing to jump through annoying hoops on a regular basis > (you must have a procedure for selecting a board; the board must meet > on some regular schedule, achieve quorum, and regularly elect > officers; each board meeting must have minutes produced and approved, > you must file taxes on time, ...), and it's expensive to boot (you'll > need a professional accountant, etc.). As a result, most projects that > try going it on their own end up with a horrible mess sooner or later. > It works okay if you're, say, Gnome, but most projects are not Gnome. > > But fortunately, this is a solved problem: there are several > non-profit umbrella corporations that are set up to let experts take > care of this nonsense and amortize the costs over multiple projects. > The Software Freedom Conservancy is probably the most well put > together: > ? http://www.sfconservancy.org/overview/ > ? http://www.sfconservancy.org/members/services/ > ? http://sfconservancy.org/about/board/ > Many large projects with complicated legal situations like Samba, > Busybox, jQuery, Wine, Boost, ... have also chosen this approach: > ? http://www.sfconservancy.org/members/current/ > > TL;DR: When it comes to legal matters: starting your own non-profit is > to joining an existing umbrella non-profit as CVS is to git. (And in > fact git is also a SF Conservancy member.) > > My $0.02, > -- Nathaniel > > On Tue, Oct 4, 2011 at 1:58 PM, Travis Oliphant wrote: >> Hi all, >> >> At the recent US SciPy conference and at other times in the past I have been approached about the possibility of creating a foundation to support the development of SciPy and NumPy. >> >> I know there are varying opinions about that, but I am generally very supportive of the idea and would like to encourage it as much as I can. ? It would be interesting to have a public discussion of the issues, but these discussions should not clog the main list of either NumPy or SciPy. >> >> As a result, there has been set up a public mailing list for discussion of the creation of a Foundation for the Advancement of Scientific, Technical, and Engineering Computing Using High Level Abstractions (FASTECUHLA). ? ? The list is fastecuhla at googlegroups.com >> >> This is a place-holder name that can be replaced if somebody comes up with a better one. ? ? Please sign up for that list if you would like to contribute to the discussion. >> >> The items to discuss include: >> ? ? ? ?* where to organize >> ? ? ? ?* what the purposes should be >> ? ? ? 
?* who should be members >> ? ? ? ?* where should money come from >> ? ? ? ?* what other organizations exist that we could either piggy-back on or emulate >> ? ? ? ?* what are the pitfalls on starting a foundation to support NumPy and SciPy versus other approaches >> ? ? ? ?* who has time to participate in its organization and maintenance >> >> One important feature is that I see this foundation as a service opportunity and obligation and not as a "feather-in-the-cap" or something to join lightly. ? I'm hopeful that it can be a place where people and organizations can donate money and know that it will be going directly to further the core packages for Scientific Computing with Python. >> >> Thank you, >> >> -Travis >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From teoliphant at gmail.com Wed Oct 5 00:51:35 2011 From: teoliphant at gmail.com (Travis Oliphant) Date: Tue, 4 Oct 2011 23:51:35 -0500 Subject: [Numpy-discussion] A Foundation for the support of NumPy and SciPy In-Reply-To: References: Message-ID: <05E5E745-422D-44B9-B3BB-BB83F2793811@enthought.com> This is a great suggestion. Thank you! It is definitely worth careful consideration. -Travis On Oct 4, 2011, at 6:36 PM, Nathaniel Smith wrote: > [Does the group actually exist yet? Google says: "No groups match > fastecuhla." Replying here instead...] > > I've been following discussions around non-profit incorporation for > FOSS projects for about a decade (including some years on the internal > mailing list for SPI Inc. -- Debian's non-profit foundation). My > strong recommendation is that we not do it ourselves. Setting up our > own non-profit takes an immense amount of energy, and keeping it going > requires continuing to jump through annoying hoops on a regular basis > (you must have a procedure for selecting a board; the board must meet > on some regular schedule, achieve quorum, and regularly elect > officers; each board meeting must have minutes produced and approved, > you must file taxes on time, ...), and it's expensive to boot (you'll > need a professional accountant, etc.). As a result, most projects that > try going it on their own end up with a horrible mess sooner or later. > It works okay if you're, say, Gnome, but most projects are not Gnome. > > But fortunately, this is a solved problem: there are several > non-profit umbrella corporations that are set up to let experts take > care of this nonsense and amortize the costs over multiple projects. > The Software Freedom Conservancy is probably the most well put > together: > http://www.sfconservancy.org/overview/ > http://www.sfconservancy.org/members/services/ > http://sfconservancy.org/about/board/ > Many large projects with complicated legal situations like Samba, > Busybox, jQuery, Wine, Boost, ... have also chosen this approach: > http://www.sfconservancy.org/members/current/ > > TL;DR: When it comes to legal matters: starting your own non-profit is > to joining an existing umbrella non-profit as CVS is to git. (And in > fact git is also a SF Conservancy member.) 
> > My $0.02, > -- Nathaniel > > On Tue, Oct 4, 2011 at 1:58 PM, Travis Oliphant wrote: >> Hi all, >> >> At the recent US SciPy conference and at other times in the past I have been approached about the possibility of creating a foundation to support the development of SciPy and NumPy. >> >> I know there are varying opinions about that, but I am generally very supportive of the idea and would like to encourage it as much as I can. It would be interesting to have a public discussion of the issues, but these discussions should not clog the main list of either NumPy or SciPy. >> >> As a result, there has been set up a public mailing list for discussion of the creation of a Foundation for the Advancement of Scientific, Technical, and Engineering Computing Using High Level Abstractions (FASTECUHLA). The list is fastecuhla at googlegroups.com >> >> This is a place-holder name that can be replaced if somebody comes up with a better one. Please sign up for that list if you would like to contribute to the discussion. >> >> The items to discuss include: >> * where to organize >> * what the purposes should be >> * who should be members >> * where should money come from >> * what other organizations exist that we could either piggy-back on or emulate >> * what are the pitfalls on starting a foundation to support NumPy and SciPy versus other approaches >> * who has time to participate in its organization and maintenance >> >> One important feature is that I see this foundation as a service opportunity and obligation and not as a "feather-in-the-cap" or something to join lightly. I'm hopeful that it can be a place where people and organizations can donate money and know that it will be going directly to further the core packages for Scientific Computing with Python. >> >> Thank you, >> >> -Travis >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From jason-sage at creativetrax.com Wed Oct 5 02:51:15 2011 From: jason-sage at creativetrax.com (Jason Grout) Date: Wed, 05 Oct 2011 01:51:15 -0500 Subject: [Numpy-discussion] A Foundation for the support of NumPy and SciPy In-Reply-To: References: Message-ID: <4E8BFE63.40809@creativetrax.com> On 10/4/11 6:36 PM, Nathaniel Smith wrote: > TL;DR: When it comes to legal matters: starting your own non-profit is > to joining an existing umbrella non-profit as CVS is to git. (And in > fact git is also a SF Conservancy member.) Good point. William has a Sage Foundation set up through University of Washington, and UW (IIRC) handles all of these details. I think it has worked out well (though, of course, William is the one to ask). Jason From newsletters at alexandreleray.com Wed Oct 5 06:29:08 2011 From: newsletters at alexandreleray.com (Alexandre Leray) Date: Wed, 05 Oct 2011 12:29:08 +0200 Subject: [Numpy-discussion] oblique text In-Reply-To: References: <4E8B5A15.6000900@alexandreleray.com> Message-ID: <4E8C3174.3040703@alexandreleray.com> Thanks Pauli, exactly what I wanted! On 04/10/2011 22:22, Pauli Virtanen wrote: > 04.10.2011 21:10, Alexandre Leray kirjoitti: > [clip] >> But it is not very flexible. 
For instance, I'd like to control the "line >> spacing" (by adding extra spaces in between letters since it isn't real >> lines). I just found numpy and I have the intuition that it could do the >> job since it deals with matrices. >> >> Am I right? If so how would you do such a thing? > > import numpy as np > > def make_oblique(words): > n = max(2*len(word) for word in words) + len(words) > canvas = np.zeros((n, n), dtype='S1') > canvas[...] = ' ' # ye mighty FORTRAN, we beseech thee > > for j, word in enumerate(words): > i = np.arange(len(word)) > canvas[i+j, 2*i] = list(word) > > canvas[:,-1] = '\n' > return canvas.tostring().rstrip() > From W.Miah at uea.ac.uk Thu Oct 6 06:11:56 2011 From: W.Miah at uea.ac.uk (Miah Wadud Dr (ITCS)) Date: Thu, 6 Oct 2011 11:11:56 +0100 Subject: [Numpy-discussion] build errors In-Reply-To: References: Message-ID: Hi David, Thanks for your reply. Nope, I didn't build the ATLAS libraries myself and am trying to do that now. However, whenever I try to build the shared libraries using the configure command: [root at cn130 linux]# ../configure -Fa alg -fPIC --prefix=/gpfs/grace/atlas-3.8.4 it keeps building the static version. The ATLAS documentation stated that I need to provide the above flags to build the dynamic ones but this doesn't seem to work. Any help will be greatly appreciated. Thanks in advance. Regards, Wadud. >-----Original Message----- >From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- >bounces at scipy.org] On Behalf Of David Cournapeau >Sent: Tuesday, October 04, 2011 6:40 PM >To: Discussion of Numerical Python >Subject: Re: [Numpy-discussion] build errors > >On Tue, Oct 4, 2011 at 11:54 AM, Miah Wadud Dr (ITCS) wrote: >> Hello numpy users, >> >> I am trying to build numpy 1.6.1 and am having problems. It prints the >following error message: >> >> gcc -pthread -shared build/temp.linux-x86_64- >2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux-x86_64- >2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64- >2.4/numpy/core/_dotblas.so >> /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation >R_X86_64_32 against `a local symbol' can not be used when making a shared >object; recompile with -fPIC >> /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value >> collect2: ld returned 1 exit status >> /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation >R_X86_64_32 against `a local symbol' can not be used when making a shared >object; recompile with -fPIC >> /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value >> collect2: ld returned 1 exit status >> error: Command "gcc -pthread -shared build/temp.linux-x86_64- >2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux-x86_64- >2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64- >2.4/numpy/core/_dotblas.so" failed with exit status 1 > > >Did you build Atlas by yourself ? If so, it is most likely not usable >for shared libraries (mandatory for any python extension, including >bumpy). You need to configure atlas with the option "-Fa alg -fPIC". 
> >David >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion From ndbecker2 at gmail.com Thu Oct 6 08:08:15 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 06 Oct 2011 08:08:15 -0400 Subject: [Numpy-discussion] simple vector->matrix question Message-ID: Given a vector y, I want a matrix H whose rows are y - x0 y - x1 y - x2 ... where x_i are scalars Suggestion? From warren.weckesser at enthought.com Thu Oct 6 08:18:20 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 6 Oct 2011 07:18:20 -0500 Subject: [Numpy-discussion] simple vector->matrix question In-Reply-To: References: Message-ID: On Thu, Oct 6, 2011 at 7:08 AM, Neal Becker wrote: > Given a vector y, I want a matrix H whose rows are > > y - x0 > y - x1 > y - x2 > ... > > > where x_i are scalars > > Suggestion? > > In [15]: import numpy as np In [16]: y = np.array([10.0, 20.0, 30.0]) In [17]: x = np.array([0, 1, 2, 4]) In [18]: H = y - x[:, np.newaxis] In [19]: H Out[19]: array([[ 10., 20., 30.], [ 9., 19., 29.], [ 8., 18., 28.], [ 6., 16., 26.]]) Warren > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scipy at samueljohn.de Thu Oct 6 08:20:32 2011 From: scipy at samueljohn.de (Samuel John) Date: Thu, 6 Oct 2011 14:20:32 +0200 Subject: [Numpy-discussion] simple vector->matrix question In-Reply-To: References: Message-ID: <4743A867-C8D8-4AB0-BFBF-651E049A8FE8@samueljohn.de> import numpy # Say y is y = numpy.array([1,2,3]) Y = numpy.vstack([y,y,y,y]) # Y is array([[1, 2, 3], # [1, 2, 3], # [1, 2, 3], # [1, 2, 3]]) x = numpy.array([[0],[2],[4],[6]]) # a column-vector of your scalars x0, x1... Y - x Hope this is what you meant. cheers, Samuel On 06.10.2011, at 14:08, Neal Becker wrote: > Given a vector y, I want a matrix H whose rows are > > y - x0 > y - x1 > y - x2 > ... > > > where x_i are scalars From scipy at samueljohn.de Thu Oct 6 08:29:14 2011 From: scipy at samueljohn.de (Samuel John) Date: Thu, 6 Oct 2011 14:29:14 +0200 Subject: [Numpy-discussion] simple vector->matrix question In-Reply-To: References: Message-ID: <0FE42CB4-04C7-446F-952F-853FD12AD157@samueljohn.de> I just learned two things: 1. np.newaxis 2. Array dimension broadcasting rocks more than you think. The x[:, np.newaxis] might not be the most intuitive solution but it's great and powerful. Intuitive would be to have x.T to transform [0,1,2,4] into [[0],[1],[2],[4]]. Thanks Warren :-) Samuel On 06.10.2011, at 14:18, Warren Weckesser wrote: > > > On Thu, Oct 6, 2011 at 7:08 AM, Neal Becker wrote: > Given a vector y, I want a matrix H whose rows are > > y - x0 > y - x1 > y - x2 > ... > > > where x_i are scalars > > Suggestion? 
> > > > In [15]: import numpy as np > > In [16]: y = np.array([10.0, 20.0, 30.0]) > > In [17]: x = np.array([0, 1, 2, 4]) > > In [18]: H = y - x[:, np.newaxis] > > In [19]: H > Out[19]: > array([[ 10., 20., 30.], > [ 9., 19., 29.], > [ 8., 18., 28.], > [ 6., 16., 26.]]) > > > Warren > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From warren.weckesser at enthought.com Thu Oct 6 08:41:13 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 6 Oct 2011 07:41:13 -0500 Subject: [Numpy-discussion] simple vector->matrix question In-Reply-To: <0FE42CB4-04C7-446F-952F-853FD12AD157@samueljohn.de> References: <0FE42CB4-04C7-446F-952F-853FD12AD157@samueljohn.de> Message-ID: On Thu, Oct 6, 2011 at 7:29 AM, Samuel John wrote: > I just learned two things: > > 1. np.newaxis > 2. Array dimension broadcasting rocks more than you think. > > Yup. :) > > The x[:, np.newaxis] might not be the most intuitive solution but it's > great and powerful. > Intuitive would be to have x.T to transform [0,1,2,4] into > [[0],[1],[2],[4]]. > I agree, creating a new dimension by indexing with np.newaxis isn't the first thing I would guess if I didn't already know about it. An alternative is x.reshape(4,1) (or even better, x.reshape(-1,1) so it doesn't explicitly refer to the length of x). (Also, you probably noticed that transposing won't work, because x is one-dimensional. The transpose operation simply swaps dimensions, and with just one dimension there is nothing to swap; x.T is the same as x.) Warren > Thanks Warren :-) > Samuel > > On 06.10.2011, at 14:18, Warren Weckesser wrote: > > > > > > > On Thu, Oct 6, 2011 at 7:08 AM, Neal Becker wrote: > > Given a vector y, I want a matrix H whose rows are > > > > y - x0 > > y - x1 > > y - x2 > > ... > > > > > > where x_i are scalars > > > > Suggestion? > > > > > > > > In [15]: import numpy as np > > > > In [16]: y = np.array([10.0, 20.0, 30.0]) > > > > In [17]: x = np.array([0, 1, 2, 4]) > > > > In [18]: H = y - x[:, np.newaxis] > > > > In [19]: H > > Out[19]: > > array([[ 10., 20., 30.], > > [ 9., 19., 29.], > > [ 8., 18., 28.], > > [ 6., 16., 26.]]) > > > > > > Warren > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From W.Miah at uea.ac.uk Thu Oct 6 09:05:56 2011 From: W.Miah at uea.ac.uk (Miah Wadud Dr (ITCS)) Date: Thu, 6 Oct 2011 14:05:56 +0100 Subject: [Numpy-discussion] build errors In-Reply-To: References: Message-ID: Hi again, I have built the ATLAS dynamic shared libraries and now need to tell numpy to build against them which are located in a different location to where it expects them. Do you know how I can do that? 
The command I am using to build numpy is: python setup.py build --fcompiler=gnu95 but this looks in /usr/lib64/atlas and I need it to look in another location (/gpfs/grace/atlas-3.8.4). Thanks in advance, Regards. >-----Original Message----- >From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- >bounces at scipy.org] On Behalf Of Miah Wadud Dr (ITCS) >Sent: Thursday, October 06, 2011 11:12 AM >To: Discussion of Numerical Python >Subject: Re: [Numpy-discussion] build errors > >Hi David, > >Thanks for your reply. Nope, I didn't build the ATLAS libraries myself and am >trying to do that now. However, whenever I try to build the shared libraries >using the configure command: > >[root at cn130 linux]# ../configure -Fa alg -fPIC --prefix=/gpfs/grace/atlas-3.8.4 > >it keeps building the static version. The ATLAS documentation stated that I >need to provide the above flags to build the dynamic ones but this doesn't seem >to work. > >Any help will be greatly appreciated. Thanks in advance. > >Regards, >Wadud. > >>-----Original Message----- >>From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- >>bounces at scipy.org] On Behalf Of David Cournapeau >>Sent: Tuesday, October 04, 2011 6:40 PM >>To: Discussion of Numerical Python >>Subject: Re: [Numpy-discussion] build errors >> >>On Tue, Oct 4, 2011 at 11:54 AM, Miah Wadud Dr (ITCS) >wrote: >>> Hello numpy users, >>> >>> I am trying to build numpy 1.6.1 and am having problems. It prints the >>following error message: >>> >>> gcc -pthread -shared build/temp.linux-x86_64- >>2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux- >x86_64- >>2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64- >>2.4/numpy/core/_dotblas.so >>> /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation >>R_X86_64_32 against `a local symbol' can not be used when making a shared >>object; recompile with -fPIC >>> /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value >>> collect2: ld returned 1 exit status >>> /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation >>R_X86_64_32 against `a local symbol' can not be used when making a shared >>object; recompile with -fPIC >>> /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value >>> collect2: ld returned 1 exit status >>> error: Command "gcc -pthread -shared build/temp.linux-x86_64- >>2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux- >x86_64- >>2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64- >>2.4/numpy/core/_dotblas.so" failed with exit status 1 >> >> >>Did you build Atlas by yourself ? If so, it is most likely not usable >>for shared libraries (mandatory for any python extension, including >>bumpy). You need to configure atlas with the option "-Fa alg -fPIC". >> >>David >>_______________________________________________ >>NumPy-Discussion mailing list >>NumPy-Discussion at scipy.org >>http://mail.scipy.org/mailman/listinfo/numpy-discussion >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion From paul.anton.letnes at gmail.com Thu Oct 6 09:15:58 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Thu, 6 Oct 2011 15:15:58 +0200 Subject: [Numpy-discussion] build errors In-Reply-To: References: Message-ID: You can use the BLAS and LAPACK environment variables. export BLAS=/path/to/libatlas.so export LAPACK=/path/to/libatlas.so python setup.py build .... 
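(After such a build, a quick sanity check is to print numpy's build configuration and time a matrix product from Python; this is only a rough sketch -- the exact section names printed depend on the local setup, and /gpfs/grace/atlas-3.8.4 is the install path from this thread:)

import numpy as np
import time

np.show_config()   # should list atlas_* sections pointing at the intended
                   # install location, e.g. /gpfs/grace/atlas-3.8.4

# crude timing check: with ATLAS, dot() should be far faster than the
# unoptimized fallback
a = np.random.rand(1000, 1000)
t0 = time.time()
np.dot(a, a)
print 'dot took %.3f s' % (time.time() - t0)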
I've recently had problems with ATLAS solving equation systems incorrectly for certain inputs with no adequate explanation. Re-running the same simulation but linking to Goto2 BLAS/LAPACK instead solved the problem. Strange, yet worrisome. YMMV! Cheers Paul On Thu, Oct 6, 2011 at 3:05 PM, Miah Wadud Dr (ITCS) wrote: > Hi again, > > I have built the ATLAS dynamic shared libraries and now need to tell numpy to build against them which are located in a different location to where it expects them. Do you know how I can do that? The command I am using to build numpy is: > > python setup.py build --fcompiler=gnu95 > > but this looks in /usr/lib64/atlas and I need it to look in another location (/gpfs/grace/atlas-3.8.4). > > Thanks in advance, > Regards. > >>-----Original Message----- >>From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- >>bounces at scipy.org] On Behalf Of Miah Wadud Dr (ITCS) >>Sent: Thursday, October 06, 2011 11:12 AM >>To: Discussion of Numerical Python >>Subject: Re: [Numpy-discussion] build errors >> >>Hi David, >> >>Thanks for your reply. Nope, I didn't build the ATLAS libraries myself and am >>trying to do that now. However, whenever I try to build the shared libraries >>using the configure command: >> >>[root at cn130 linux]# ../configure -Fa alg -fPIC --prefix=/gpfs/grace/atlas-3.8.4 >> >>it keeps building the static version. The ATLAS documentation stated that I >>need to provide the above flags to build the dynamic ones but this doesn't seem >>to work. >> >>Any help will be greatly appreciated. Thanks in advance. >> >>Regards, >>Wadud. >> >>>-----Original Message----- >>>From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion- >>>bounces at scipy.org] On Behalf Of David Cournapeau >>>Sent: Tuesday, October 04, 2011 6:40 PM >>>To: Discussion of Numerical Python >>>Subject: Re: [Numpy-discussion] build errors >>> >>>On Tue, Oct 4, 2011 at 11:54 AM, Miah Wadud Dr (ITCS) >>wrote: >>>> Hello numpy users, >>>> >>>> I am trying to build numpy 1.6.1 and am having problems. It prints the >>>following error message: >>>> >>>> gcc -pthread -shared build/temp.linux-x86_64- >>>2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux- >>x86_64- >>>2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64- >>>2.4/numpy/core/_dotblas.so >>>> /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation >>>R_X86_64_32 against `a local symbol' can not be used when making a shared >>>object; recompile with -fPIC >>>> /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value >>>> collect2: ld returned 1 exit status >>>> /usr/bin/ld: /usr/lib64/atlas/libptcblas.a(cblas_dptgemm.o): relocation >>>R_X86_64_32 against `a local symbol' can not be used when making a shared >>>object; recompile with -fPIC >>>> /usr/lib64/atlas/libptcblas.a: could not read symbols: Bad value >>>> collect2: ld returned 1 exit status >>>> error: Command "gcc -pthread -shared build/temp.linux-x86_64- >>>2.4/numpy/core/blasdot/_dotblas.o -L/usr/lib64/atlas -Lbuild/temp.linux- >>x86_64- >>>2.4 -lptf77blas -lptcblas -latlas -o build/lib.linux-x86_64- >>>2.4/numpy/core/_dotblas.so" failed with exit status 1 >>> >>> >>>Did you build Atlas by yourself ? If so, it is most likely not usable >>>for shared libraries (mandatory for any python extension, including >>>bumpy). You need to configure atlas with the option "-Fa alg -fPIC". 
>>> >>>David >>>_______________________________________________ >>>NumPy-Discussion mailing list >>>NumPy-Discussion at scipy.org >>>http://mail.scipy.org/mailman/listinfo/numpy-discussion >>_______________________________________________ >>NumPy-Discussion mailing list >>NumPy-Discussion at scipy.org >>http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jsalvati at u.washington.edu Thu Oct 6 13:24:04 2011 From: jsalvati at u.washington.edu (John Salvatier) Date: Thu, 6 Oct 2011 10:24:04 -0700 Subject: [Numpy-discussion] nditer: possible to manually handle dimensions with different lengths? In-Reply-To: References: Message-ID: I ended up fixing my problem by removing the 'buffering' flag and adding the 'copy' flag to each of the input arrays. I think that nested_iters might be improved by an operand axes specification for each layer of nesting like nditer uses, though I suppose that 3 layers of nesting might be confusing for users. I get an "array too big error" on "values, groups, outs = it1.itviews" when the shape of the iterator is larger than ~(4728, 125285) even if each of the arrays should only have actual size along one dimension. Code: @cython.boundscheck(False) def groupwise_accumulate(vals, group, axis ): cdef np.ndarray[double, ndim = 2] values cdef np.ndarray[long, ndim = 2] groups cdef np.ndarray[double, ndim = 2] outs cdef int size cdef long g cdef int i, j #copy so that swaping the axis doesn't mess up the original arrays vals = vals.copy() group = group.copy() #add a dimension to match up with the new dimension and swap the given axis to the end vals.shape = vals.shape + (1,) vaxes = range(vals.ndim) vaxes.append(axis) vaxes.remove(axis) vals = np.transpose(vals, vaxes) vals = vals.copy() #the output should have the same shape as the values except along the #last two axes (which are the given axis and the new axis) oshape = list(vals.shape) bins = len(np.unique(group)) oshape[-1] = 1 oshape[-2] = bins out = np.empty(tuple(oshape)) #line up grouping with the given axis gshape = [1] * (len(oshape) - 1) + [vals.shape[-1]] group.shape = gshape #nested iterator should go along the last two axes axes = range(vals.ndim) axes0 = axes[:-2] axes1 = axes[-2:] it0, it1 = np.nested_iters([vals,group, out], [axes0, axes1], op_dtypes=['float64', 'int32', 'float64'], op_flags = [['readonly', 'copy'], ['readonly','copy'], ['readwrite']], flags = ['multi_index', 'reduce_ok' ]) size = vals.shape[-1] for x in it0: values, groups, outs = it1.itviews i = 0 j = 0 while i < size: g = groups[0,i] #accumulation initialization while i < size and groups[0,i] == g: #groupwise accumulation i += 1 outs[j,0] = calculation() j += 1 #swap back the new axis to the original location of the given axis out.shape = out.shape[:-1] oaxes = range(vals.ndim -1) oaxes.insert(axis, out.ndim-1) oaxes = oaxes[:-1] #remove the now reduced original given axis out = np.transpose(out, oaxes) return out On Mon, Oct 3, 2011 at 2:03 PM, John Salvatier wrote: > Some observations and questions about nested_iters. Nested_iters seems to > require that all input arrays have the same number of dimensions (so you > will have to pad some input shapes with 1s). Is there a way to specify how > the axes line are matched together like for nditer? 
> > > When I try to run the following program, > > @cython.boundscheck(False) > def vars(vals, group, axis ): > cdef np.ndarray[double, ndim = 2] values > cdef np.ndarray[long long, ndim = 2] groups > cdef np.ndarray[double, ndim = 2] outs > cdef int size > cdef double value > cdef int i, j > cdef long long cgroup > cdef double min > cdef double max > cdef double open > oshape = list(vals.shape) > bins = len(np.unique(group)) > oshape = oshape+[bins] > oshape[axis] = 1 > out = np.empty(tuple(oshape)) > axes = range(vals.ndim) > axes.remove(axis) > gshape = [1] * len(oshape) > gshape[axis] = len(group) > group.shape = gshape > vals = vals[...,np.newaxis] > it0, it1 = np.nested_iters([vals,group, out], > [axes, [axis,len(oshape) -1]], > op_dtypes=['float64', 'int64', 'float64'], > flags = ['multi_index', 'buffered']) > size = vals.shape[axis] > for x in it0: > values, groups, outs = it1.itviews > > j = -1 > for i in range(size): > if cgroup != groups[i,0]: > if j != -1: > outs[0,j] = garmanklass(open, values[i,0], min, max) > cgroup = groups[i,0] > min = inf > max = -inf > open = values[i,0] > j += 1 > > min = fmin(min, values[i,0]) > max = fmax(max, values[i,0]) > > outs[0,j+1] = garmanklass(open, values[size -1], min, max) > return out > > > I get an error > > File "comp.pyx", line 58, in varscale.comp.vars (varscale\comp.c:1565) > values, groups, outs = it1.itviews > ValueError: cannot provide an iterator view when buffering is enabled > > > Which I am not sure how to deal with. Any advice? > > What I am trying to do here is to do a "grouped" calculation (the group > specified by the group argument) on the values along the given axis. I try > to use nested_iter to iterate over the specified axis and a new axis (the > length of the number of groups) separately so I can do my calculation. > > On Mon, Oct 3, 2011 at 9:03 AM, John Salvatier wrote: > >> Thanks mark! I think that's exactly what I'm looking for. We even had a >> previous discussion about this (oops!) ( >> http://mail.scipy.org/pipermail/numpy-discussion/2011-January/054421.html >> ). >> >> I didn't find any documentation, I will try to add some once I understand >> how it works better. >> >> John >> >> >> On Sat, Oct 1, 2011 at 2:53 PM, Mark Wiebe wrote: >> >>> On Sat, Oct 1, 2011 at 1:45 PM, John Salvatier < >>> jsalvati at u.washington.edu> wrote: >>> >>>> I apologize, I picked a poor example of what I want to do. Your >>>> suggestion would work for the example I provided, but not for a more complex >>>> example. My actual task is something like a "group by" operation along a >>>> particular axis (with a known number of groups). >>>> >>>> Let me try again: What I would like to be able to do is to specify some >>>> of the iterator dimensions to be handled manually by me. For example lets >>>> say I have some kind of a 2d smoothing algorithm. If I start with an array >>>> of shape [a,b,c,d] and I'd like to do the 2d smoothing over the 2nd and 3rd >>>> dimensions, I'd like to be able to tell nditer to do normal broadcasting and >>>> iteration over the 1st and 4th dimensions but leave iteration over the 2nd >>>> and 3rd dimensions to me and my algorithm. Each iteration of nditer would >>>> give me a 2d array to which I apply my algorithm. This way I could write >>>> more arbitrary functions that operate on arrays and support broadcasting. >>>> >>>> Is clearer? 
>>>> >>> >>> Maybe this will work for you: >>> >>> In [15]: a = np.arange(2*3*4*5).reshape(2,3,4,5) >>> >>> In [16]: it0, it1 = np.nested_iters(a, [[0,3], [1,2]], >>> flags=['multi_index']) >>> >>> In [17]: for x in it0: >>> ....: print it1.itviews[0] >>> ....: >>> [[ 0 5 10 15] >>> [20 25 30 35] >>> [40 45 50 55]] >>> [[ 1 6 11 16] >>> [21 26 31 36] >>> [41 46 51 56]] >>> [[ 2 7 12 17] >>> [22 27 32 37] >>> [42 47 52 57]] >>> [[ 3 8 13 18] >>> [23 28 33 38] >>> [43 48 53 58]] >>> [[ 4 9 14 19] >>> [24 29 34 39] >>> [44 49 54 59]] >>> [[ 60 65 70 75] >>> [ 80 85 90 95] >>> [100 105 110 115]] >>> [[ 61 66 71 76] >>> [ 81 86 91 96] >>> [101 106 111 116]] >>> [[ 62 67 72 77] >>> [ 82 87 92 97] >>> [102 107 112 117]] >>> [[ 63 68 73 78] >>> [ 83 88 93 98] >>> [103 108 113 118]] >>> [[ 64 69 74 79] >>> [ 84 89 94 99] >>> [104 109 114 119]] >>> >>> Cheers, >>> Mark >>> >>> >>> >>> >>>> >>>> >>>> On Fri, Sep 30, 2011 at 5:04 PM, Mark Wiebe wrote: >>>> >>>>> On Fri, Sep 30, 2011 at 8:03 AM, John Salvatier < >>>>> jsalvati at u.washington.edu> wrote: >>>>> >>>>>> Using nditer, is it possible to manually handle dimensions with >>>>>> different lengths? >>>>>> >>>>>> For example, lets say I had an array A[5, 100] and I wanted to sample >>>>>> every 10 along the second axis so I would end up with an array B[5,10]. Is >>>>>> it possible to do this with nditer, handling the iteration over the second >>>>>> axis manually of course (probably in cython)? >>>>>> >>>>>> I want something like this (modified from >>>>>> http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#putting-the-inner-loop-in-cython >>>>>> ) >>>>>> >>>>>> @cython.boundscheck(False) >>>>>> def sum_squares_cy(arr): >>>>>> cdef np.ndarray[double] x >>>>>> cdef np.ndarray[double] y >>>>>> cdef int size >>>>>> cdef double value >>>>>> cdef int j >>>>>> >>>>>> axeslist = list(arr.shape) >>>>>> axeslist[1] = -1 >>>>>> >>>>>> out = zeros((arr.shape[0], 10)) >>>>>> it = np.nditer([arr, out], flags=['reduce_ok', 'external_loop', >>>>>> 'buffered', 'delay_bufalloc'], >>>>>> op_flags=[['readonly'], ['readwrite', >>>>>> 'no_broadcast']], >>>>>> op_axes=[None, axeslist], >>>>>> op_dtypes=['float64', 'float64']) >>>>>> it.operands[1][...] = 0 >>>>>> it.reset() >>>>>> for xarr, yarr in it: >>>>>> x = xarr >>>>>> y = yarr >>>>>> size = x.shape[0] >>>>>> j = 0 >>>>>> for i in range(size): >>>>>> #some magic here involving indexing into x[i] and y[j] >>>>>> return it.operands[1] >>>>>> >>>>>> Does this make sense? Is it possible to do? >>>>>> >>>>> >>>>> I'm not sure I understand precisely what you're asking. Maybe you >>>>> could reshape A to have shape [5, 10, 10], so that one of those 10's can >>>>> match up with the 10 in B, perhaps with the op_axes? 
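(As a concrete sketch of this reshape idea for the A[5, 100] -> B[5, 10] subsampling example quoted above, written with plain indexing rather than nditer/op_axes; the numbers are illustrative only:)

import numpy as np

A = np.arange(5 * 100.).reshape(5, 100)
# split the length-100 axis into 10 blocks of 10; taking element 0 of
# each block is the same as sampling every 10th column
B = A.reshape(5, 10, 10)[:, :, 0]
print B.shape                    # (5, 10)
print np.all(B == A[:, ::10])    # True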
>>>>> >>>>> -Mark >>>>> >>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pivanov314 at gmail.com Thu Oct 6 17:25:18 2011 From: pivanov314 at gmail.com (Paul Ivanov) Date: Thu, 6 Oct 2011 14:25:18 -0700 Subject: [Numpy-discussion] Crash on (un-orthodox) __import__ In-Reply-To: References: Message-ID: Hi Andrea, On Tue, Oct 4, 2011 at 3:04 AM, Andrea Gavana wrote: > Hi All, > ? ? I was fiddling here and there with some code doing dynamic import of > stuff, and I noticed that this code: > import os > import sys > init_name = r"C:\Python27\Lib\site-packages\numpy\__init__.py" > directory, module_name = os.path.split(init_name) > main = os.path.splitext(module_name)[0] > sys.path.insert(0, os.path.normpath(directory)) > # Crash here... > mainmod = __import__(main) in this case, your main is '__init__' and your directory is 'C:\Python27\Lib\site-packages\numpy' which is probably not what you intended. You should make directory 'C:\Python27\Lib\site-packages' and main into 'numpy' best, -- Paul Ivanov 314 address only used for lists, ?off-list direct email at: http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7 From robert.kern at gmail.com Fri Oct 7 00:33:01 2011 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 7 Oct 2011 00:33:01 -0400 Subject: [Numpy-discussion] Crash on (un-orthodox) __import__ In-Reply-To: References: Message-ID: On Thu, Oct 6, 2011 at 17:25, Paul Ivanov wrote: > Hi Andrea, > > On Tue, Oct 4, 2011 at 3:04 AM, Andrea Gavana wrote: >> Hi All, >> ? ? I was fiddling here and there with some code doing dynamic import of >> stuff, and I noticed that this code: >> import os >> import sys >> init_name = r"C:\Python27\Lib\site-packages\numpy\__init__.py" >> directory, module_name = os.path.split(init_name) >> main = os.path.splitext(module_name)[0] >> sys.path.insert(0, os.path.normpath(directory)) >> # Crash here... >> mainmod = __import__(main) > > > in this case, your main is '__init__' and your directory is > 'C:\Python27\Lib\site-packages\numpy' which is probably not what you > intended. You should make directory 'C:\Python27\Lib\site-packages' > and main into 'numpy' Still, it shouldn't segfault, and it's worth figuring out why it does. gdb has been mostly unenlightening for me since gdb won't let me navigate the traceback. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From hangenuit at gmail.com Fri Oct 7 01:40:53 2011 From: hangenuit at gmail.com (Han Genuit) Date: Fri, 7 Oct 2011 07:40:53 +0200 Subject: [Numpy-discussion] Crash on (un-orthodox) __import__ In-Reply-To: References: Message-ID: > Still, it shouldn't segfault, and it's worth figuring out why it does. > gdb has been mostly unenlightening for me since gdb won't let me > navigate the traceback. You could try faulthandler, it prints the (python) traceback after a crash: http://pypi.python.org/pypi/faulthandler/ From cgohlke at uci.edu Fri Oct 7 05:10:57 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Fri, 07 Oct 2011 02:10:57 -0700 Subject: [Numpy-discussion] Crash on (un-orthodox) __import__ In-Reply-To: References: Message-ID: <4E8EC221.5030809@uci.edu> On 10/6/2011 9:33 PM, Robert Kern wrote: > On Thu, Oct 6, 2011 at 17:25, Paul Ivanov wrote: >> Hi Andrea, >> >> On Tue, Oct 4, 2011 at 3:04 AM, Andrea Gavana wrote: >>> Hi All, >>> I was fiddling here and there with some code doing dynamic import of >>> stuff, and I noticed that this code: >>> import os >>> import sys >>> init_name = r"C:\Python27\Lib\site-packages\numpy\__init__.py" >>> directory, module_name = os.path.split(init_name) >>> main = os.path.splitext(module_name)[0] >>> sys.path.insert(0, os.path.normpath(directory)) >>> # Crash here... >>> mainmod = __import__(main) >> >> >> in this case, your main is '__init__' and your directory is >> 'C:\Python27\Lib\site-packages\numpy' which is probably not what you >> intended. You should make directory 'C:\Python27\Lib\site-packages' >> and main into 'numpy' > > Still, it shouldn't segfault, and it's worth figuring out why it does. > gdb has been mostly unenlightening for me since gdb won't let me > navigate the traceback. > This is the same crash that occurs when running `python -v -c"import __init__"` from within the site-packages/numpy directory. Several numpy Python and C extension modules are imported/executed twice. Besides fixing the segfault it might be worth preventing this type of import in numpy/__init__.py, for example: import sys as _sys if '__init__' in _sys.modules and _sys.modules['__init__'].__file__ == __file__: _sys.stderr.write("Use `import numpy` ... .\n") # or raise ImportError() else: ... del _sys The faulthandler output is: File "core\numerictypes.py", line 226 in _evalname File "core\numerictypes.py", line 247 in bitname File "core\numerictypes.py", line 307 in _add_aliases File "core\numerictypes.py", line 330 in File "core\__init__.py", line 8 in File "__init__.py", line 146 in File "", line 1 in The function call that crashes during the second import of core/numerictypes.py is `int("64")`. Christoph From chaoyuejoy at gmail.com Fri Oct 7 05:36:19 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 7 Oct 2011 11:36:19 +0200 Subject: [Numpy-discussion] how to count Nan occurrence in a ndarray? Message-ID: Dear all, I have an ndarray with dimension of 4X62500. is there anyway I can count the number of missing value (NaN)? because I want to know how many observations are missing? 
Thanks for any idea, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Fri Oct 7 05:36:53 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 7 Oct 2011 11:36:53 +0200 Subject: [Numpy-discussion] how to count Nan occurrence in a ndarray? In-Reply-To: References: Message-ID: The data is read from a .mat file. 2011/10/7 Chao YUE > Dear all, > > I have an ndarray with dimension of 4X62500. is there anyway I can count > the number of missing value (NaN)? because I want to know how many > observations are missing? > > Thanks for any idea, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Fri Oct 7 06:40:33 2011 From: shish at keba.be (Olivier Delalleau) Date: Fri, 7 Oct 2011 06:40:33 -0400 Subject: [Numpy-discussion] how to count Nan occurrence in a ndarray? In-Reply-To: References: Message-ID: You can use numpy.isnan(array).sum() -=- Olivier 2011/10/7 Chao YUE > The data is read from a .mat file. > > 2011/10/7 Chao YUE > >> Dear all, >> >> I have an ndarray with dimension of 4X62500. is there anyway I can count >> the number of missing value (NaN)? because I want to know how many >> observations are missing? >> >> Thanks for any idea, >> >> Chao >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> >> > > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
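(A small self-contained illustration of the numpy.isnan counting approach above; the array contents here are made up, and the axis keyword gives per-row counts when the rows are separate observation series:)

import numpy as np

a = np.arange(12.).reshape(4, 3)
a[0, 1] = np.nan
a[2, 2] = np.nan

print np.isnan(a).sum()          # total number of NaNs: 2
print np.isnan(a).sum(axis=1)    # NaNs per row: [1 0 1 0]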
URL: From chaoyuejoy at gmail.com Fri Oct 7 08:38:52 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 7 Oct 2011 14:38:52 +0200 Subject: [Numpy-discussion] how to count Nan occurrence in a ndarray? In-Reply-To: References: Message-ID: Thanks Olivier. Chao 2011/10/7 Olivier Delalleau > You can use numpy.isnan(array).sum() > > -=- Olivier > > 2011/10/7 Chao YUE > >> The data is read from a .mat file. >> >> 2011/10/7 Chao YUE >> >>> Dear all, >>> >>> I have an ndarray with dimension of 4X62500. is there anyway I can count >>> the number of missing value (NaN)? because I want to know how many >>> observations are missing? >>> >>> Thanks for any idea, >>> >>> Chao >>> >>> -- >>> >>> *********************************************************************************** >>> Chao YUE >>> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >>> UMR 1572 CEA-CNRS-UVSQ >>> Batiment 712 - Pe 119 >>> 91191 GIF Sur YVETTE Cedex >>> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >>> >>> ************************************************************************************ >>> >>> >> >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Sat Oct 8 10:45:06 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 8 Oct 2011 16:45:06 +0200 Subject: [Numpy-discussion] what python module to modify NetCDF data? Message-ID: Dear all, I want to change some variable values in a series of NetCDF file. Did anybody else did this before using python? Now I use pupynere for reading data from NetCDF files and making plots. but the document of pupynere for writing data to NetCDF file is quite simple and I still feel difficult to do this with pupynere. the NetCDF file I want to change is a global data (0.5X0.5d resolution, 360X720grid with 12 time steps) and have approx. 10 variables. I just want to change some points for a specific variable for all 12 time steps. I know it's possible use NCO ncap2 utility to do the job. but now I have some problem in using ncap2 within a shell script. I guess there is some easy way to use some python module to do the job? like mainly altering the data that need to change while let the others remaining intact? Any idea will be greatly appreciated. 
I wish you all a good weekend,

Chao

--
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
************************************************************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sole at esrf.fr Sat Oct 8 11:57:14 2011
From: sole at esrf.fr (Vicente Sole)
Date: Sat, 08 Oct 2011 17:57:14 +0200
Subject: [Numpy-discussion] what python module to modify NetCDF data?
In-Reply-To:
References:
Message-ID: <20111008175714.9p4msjkcckc8wgok@160.103.2.152>

Hi,

I have never seen a NetCDF file myself, but if your NetCDF file uses HDF5 as its format (possible since NetCDF 4, if I am not mistaken), you should be able to use h5py or PyTables to access and/or modify it.

Best regards,

Armando

Quoting Chao YUE :

> Dear all,
>
> I want to change some variable values in a series of NetCDF file. Did
> anybody else did this before using python?
> Now I use pupynere for reading data from NetCDF files and making plots. but
> the document of pupynere for writing data to NetCDF file is quite simple and
> I still feel difficult to do this with pupynere.
>
> the NetCDF file I want to change is a global data (0.5X0.5d resolution,
> 360X720grid with 12 time steps) and have approx. 10 variables. I just want
> to change some points for a specific
> variable for all 12 time steps. I know it's possible use NCO ncap2 utility
> to do the job. but now I have some problem in using ncap2 within a shell
> script.
> I guess there is some easy way to use some python module to do the job?
> like > > mainly altering the data that need to change while let the others > remaining > > intact? > > > > Any idea will be greatly appreciated. I will all a good weekend, > > > > Chao > Hi. Have a look to [1] and [2]. [1] http://code.google.com/p/netcdf4-python/ [2] http://www.scipy.org/doc/api_docs/SciPy.io.netcdf.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Oct 8 12:25:20 2011 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 8 Oct 2011 11:25:20 -0500 Subject: [Numpy-discussion] what python module to modify NetCDF data? In-Reply-To: References: <20111008175714.9p4msjkcckc8wgok@160.103.2.152> Message-ID: On Saturday, October 8, 2011, Kiko wrote: > Quoting Chao YUE : >> >> > Dear all, >> > >> > I want to change some variable values in a series of NetCDF file. Did >> > anybody else did this before using python? >> > Now I use pupynere for reading data from NetCDF files and making plots. but >> > the document of pupynere for writing data to NetCDF file is quite simple and >> > I still feel difficult to do this with pupynere. >> > >> > the NetCDF file I want to change is a global data (0.5X0.5d resolution, >> > 360X720grid with 12 time steps) and have approx. 10 variables. I just want >> > to change some points for a specific >> > variable for all 12 time steps. I know it's possible use NCO ncap2 utility >> > to do the job. but now I have some problem in using ncap2 within a shell >> > script. >> > I guess there is some easy way to use some python module to do the job? like >> > mainly altering the data that need to change while let the others remaining >> > intact? >> > >> > Any idea will be greatly appreciated. I will all a good weekend, >> > >> > Chao > > Hi. > > Have a look to [1] and [2]. > > [1] http://code.google.com/p/netcdf4-python/ > [2] http://www.scipy.org/doc/api_docs/SciPy.io.netcdf.html > Just a caveat about scipy.io.netcdf, it only reads and writes. It does not modify. So, what one has to do with that module is to load up all data, modify some of it, and then save it all back to a new file. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjhnson at gmail.com Sat Oct 8 21:44:41 2011 From: tjhnson at gmail.com (T J) Date: Sat, 8 Oct 2011 18:44:41 -0700 Subject: [Numpy-discussion] Non-rectangular neighborhoods Message-ID: While reading the documentation for the neighborhood iterator, it seems that it can only handle rectangular neighborhoods. Have I understood this correctly? If it is possible to do non-rectangular regions, could someone post an example/sketch of how to do this? From Chris.Barker at noaa.gov Sun Oct 9 02:20:11 2011 From: Chris.Barker at noaa.gov (Chris Barker) Date: Sat, 08 Oct 2011 23:20:11 -0700 Subject: [Numpy-discussion] what python module to modify NetCDF data? In-Reply-To: References: Message-ID: <4E913D1B.3080809@noaa.gov> On 10/8/11 7:45 AM, Chao YUE wrote: > I want to change some variable values in a series of NetCDF file. Did > anybody else did this before using python? I like the netcdf4 package -- very powerful and a pretty nice numpy-compatible API: http://code.google.com/p/netcdf4-python/ Check out the "examples" and "utils" directories for some code to take a look at. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From konrad.hinsen at fastmail.net Sun Oct 9 05:30:22 2011 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Sun, 9 Oct 2011 11:30:22 +0200 Subject: [Numpy-discussion] what python module to modify NetCDF data? In-Reply-To: <20111008175714.9p4msjkcckc8wgok@160.103.2.152> References: <20111008175714.9p4msjkcckc8wgok@160.103.2.152> Message-ID: <8D863376-7879-449B-B7C0-4D65931DAD28@fastmail.net> On 8 oct. 11, at 17:57, Vicente Sole wrote: > I have never seen myself a NetCDF file but if your NetCDF file is > using HDF5 as format (possible since NetCDF 4 if I am not mistaken), > you should be able to use h5py or PyTables to access and or modify it. I haven't tried this, but I don't think it's a good plan. PyTables can read arbitrary HDF5 files, but writes only a subset. The HDF5-based netCDF 4 format is also a subset of valid HDF5 files. It is unlikely that those two subsets are compatible, meaning that sooner or later PyTables will produce a file that netCDF 4 will not accept. h5py would be a better choice for an HDF5-based approach. However, there are two reasons for using a specific netCDF interface rather than HDF5: 1) There are still plenty of non-HDF5 netCDF files around, and netCDF4 continues to support them. 2) The netCDF interface is simpler than the HDF5 interface, and therefore easier to use, even if that difference is much less important in Python than in C or Fortran. A full netCDF interface has been available for many years as part of the ScientificPython package: http://dirac.cnrs-orleans.fr/ScientificPython/ScientificPythonManual/ Recent versions fully support netCDF4, including the HDF5-based formats. There are other Python interfaces to the netCDF libraries, but I haven't used them. Konrad. From chaoyuejoy at gmail.com Sun Oct 9 05:48:17 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sun, 9 Oct 2011 11:48:17 +0200 Subject: [Numpy-discussion] what python module to modify NetCDF data? In-Reply-To: <8D863376-7879-449B-B7C0-4D65931DAD28@fastmail.net> References: <20111008175714.9p4msjkcckc8wgok@160.103.2.152> <8D863376-7879-449B-B7C0-4D65931DAD28@fastmail.net> Message-ID: Thanks for all for very useful discussions. I am just on the beginning of modeling and for now I only use netCDF4 data. I think it might be a good idea to try netcdf4-pythonfirst. I think whitaker will continue to develop this package. But it gave me a good idea what python package people are using for netCDF data. Chao 2011/10/9 Konrad Hinsen > On 8 oct. 11, at 17:57, Vicente Sole wrote: > > > I have never seen myself a NetCDF file but if your NetCDF file is > > using HDF5 as format (possible since NetCDF 4 if I am not mistaken), > > you should be able to use h5py or PyTables to access and or modify it. > > I haven't tried this, but I don't think it's a good plan. PyTables can > read arbitrary HDF5 files, but writes only a subset. The HDF5-based > netCDF 4 format is also a subset of valid HDF5 files. It is unlikely > that those two subsets are compatible, meaning that sooner or later > PyTables will produce a file that netCDF 4 will not accept. h5py would > be a better choice for an HDF5-based approach. > > However, there are two reasons for using a specific netCDF interface > rather than HDF5: > > 1) There are still plenty of non-HDF5 netCDF files around, and netCDF4 > continues to support them. 
> 2) The netCDF interface is simpler than the HDF5 interface, and > therefore easier to use, even if that difference is much less > important in Python than in C or Fortran. > > A full netCDF interface has been available for many years as part of > the ScientificPython package: > > > http://dirac.cnrs-orleans.fr/ScientificPython/ScientificPythonManual/ > > Recent versions fully support netCDF4, including the HDF5-based formats. > > There are other Python interfaces to the netCDF libraries, but I > haven't used them. > > Konrad. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jean-louis at durrieu.ch Sun Oct 9 09:37:23 2011 From: jean-louis at durrieu.ch (Jean-Louis Durrieu) Date: Sun, 9 Oct 2011 15:37:23 +0200 Subject: [Numpy-discussion] numpy.lib.npyio.load Message-ID: Hi everyone, I was just wondering something: lately, I had to use the load function, to load arrays stored in npz files. During one session, I need to read quite a few times several files (or even the same files), for some model training. I however just found out that the batch processing I ran failed because of a "too many open files" problem. After checking, with lsof, it seems that the use of np.load(filename), where filename is a string (= path to the file), worked an unexpected way. When I do the following, in a ipython 0.11 session, with the --pylab option : In [1]: np.__version__ Out[1]: '1.6.1' In [2]: np.load Out[2]: In [3]: struc = np.load('path/to/file.npz') In [4]: ar1 = struc['ar1'] I would expect to have opened a file, read the array in it, and closed it. However, 'lsof' proved me wrong, and I found out that I need to explicitly do 'struc.close()' in order to close the file. While this is not a big issue, since one has to do that anyway when opening/closing files, I was a bit surprised that I had to do it when using the load function. Maybe this behaviour should be made explicit in the documentation? (at least, in the help(load), nothing is said about that, as far as I could read). I just wanted to share that, just in case someone was crossing the same type of issues. I hope this is going to be useful for others! Best regards, Jean-Louis Durrieu From cournape at gmail.com Mon Oct 10 04:15:04 2011 From: cournape at gmail.com (David Cournapeau) Date: Mon, 10 Oct 2011 09:15:04 +0100 Subject: [Numpy-discussion] numpy.lib.npyio.load In-Reply-To: References: Message-ID: Hi Jean-Louis, On Sun, Oct 9, 2011 at 2:37 PM, Jean-Louis Durrieu wrote: > Hi everyone, > > I was just wondering something: lately, I had to use the load function, to load arrays stored in npz files. > > During one session, I need to read quite a few times several files (or even the same files), for some model training. I however just found out that the batch processing I ran failed because of a "too many open files" problem. > > After checking, with lsof, it seems that the use of np.load(filename), where filename is a string (= path to the file), worked an unexpected way. 
When I do the following, in a ipython 0.11 session, with the --pylab option : > > In [1]: np.__version__ > Out[1]: '1.6.1' > > In [2]: np.load > Out[2]: > > In [3]: struc = np.load('path/to/file.npz') > > In [4]: ar1 = struc['ar1'] > > I would expect to have opened a file, read the array in it, and closed it. However, 'lsof' proved me wrong, and I found out that I need to explicitly do 'struc.close()' in order to close the file. This is a documentation bug. If you look into the sources of load, you will see that in the case of zipfile, a NpzFile instance is returned by load. This is a file-like object, and needs to be closed. The rationale is that it enables lazy-loading (not all arrays are loaded in memory, only the one you request). So for now, closing the returned NpzFile instance is the correct solution. I added a note about this in the load doc, and a context manager to NpzFile so you can also do (python >= 2.5 only): with load('yo.npz') as data: .... cheers, David From inconnu at list.ru Mon Oct 10 04:53:54 2011 From: inconnu at list.ru (Andrey N. Sobolev) Date: Mon, 10 Oct 2011 14:53:54 +0600 Subject: [Numpy-discussion] Lookup array Message-ID: <20111010145354.75d49b0e@w7-pericles.r249.physics.susu.ac.ru> Hi all, I have 2 arrays - A with the dimensions of 1000x4 and B with the dimensions of 5000x4. B doesn't (hopefully) contain any rows that are not in A. I need to create a lookup array C, the i-th value of which will be the index of B[i] in A. In the (very rare) case when B[i] is not in A C[i] should be equal to -1. Now I'm dealing with it using which: ... import numpy as np for i, row in enumerate(B): try: C[i] = np.which(np.all(A == row, axis = 1))[0][0] except (IndexError, ): C[i] = -1 ... but that's very slow (it consumes 70% of cpu time needed by the whole program). I guess that it's because of a slow pythonic loop, but I just can't get how to get rid of it. Any suggestions would be appreciated. Thanks in advance, Andrey -- Researcher, General and theoretical physics dept., South Ural State University 454080, Pr. Lenina, 76, Chelyabinsk, Russia Tel: +7 351 265-47-13 andrey at physics.susu.ac.ru From rjd4+numpy at cam.ac.uk Mon Oct 10 05:03:48 2011 From: rjd4+numpy at cam.ac.uk (Bob Dowling) Date: Mon, 10 Oct 2011 10:03:48 +0100 Subject: [Numpy-discussion] Lookup array In-Reply-To: <20111010145354.75d49b0e@w7-pericles.r249.physics.susu.ac.ru> References: <20111010145354.75d49b0e@w7-pericles.r249.physics.susu.ac.ru> Message-ID: <4E92B4F4.5070701@cam.ac.uk> On 10/10/11 09:53, Andrey N. Sobolev wrote: > I have 2 arrays - A with the dimensions of 1000x4 and B with the > dimensions of 5000x4. B doesn't (hopefully) contain any rows that are > not in A. I need to create a lookup array C, the i-th value of which > will be the index of B[i] in A. In the (very rare) case when B[i] is not > in A C[i] should be equal to -1. May we assume that there are no repeats in A? (i.e. no cases where two different indices are both valid?) -- Bob Dowling From inconnu at list.ru Mon Oct 10 05:17:41 2011 From: inconnu at list.ru (Andrey N. Sobolev) Date: Mon, 10 Oct 2011 15:17:41 +0600 Subject: [Numpy-discussion] Lookup array In-Reply-To: <4E92B4F4.5070701@cam.ac.uk> References: <20111010145354.75d49b0e@w7-pericles.r249.physics.susu.ac.ru> <4E92B4F4.5070701@cam.ac.uk> Message-ID: <20111010151741.438dd400@w7-pericles.r249.physics.susu.ac.ru> ? Mon, 10 Oct 2011 10:03:48 +0100 Bob Dowling ?????: > > On 10/10/11 09:53, Andrey N. 
Sobolev wrote: > > > I have 2 arrays - A with the dimensions of 1000x4 and B with the > > dimensions of 5000x4. B doesn't (hopefully) contain any rows that > > are not in A. I need to create a lookup array C, the i-th value of > > which will be the index of B[i] in A. In the (very rare) case when > > B[i] is not in A C[i] should be equal to -1. > > May we assume that there are no repeats in A? (i.e. no cases where > two different indices are both valid?) > Yes, rows in A are unique and sorted. One more typo found - instead of np.which in the previous e-mail it has to be np.where, I don't know what I thought about :) Thanks in advance! Andrey From stefan at sun.ac.za Mon Oct 10 06:59:48 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 10 Oct 2011 03:59:48 -0700 Subject: [Numpy-discussion] scikits.image 0.3 release Message-ID: Announcement: scikits.image 0.3 =============================== After a brief (!) absence, we're back with a new and shiny version of scikits.image, the image processing toolbox for SciPy. This release runs under all major operating systems where Python (>=2.6 or 3.x), NumPy and SciPy can be installed. For more information, visit our website http://scikits-image.org or the examples gallery at http://scikits-image.org/docs/0.3/auto_examples/ New Features ------------ - Shortest paths - Total variation denoising - Hough and probabilistic Hough transforms - Radon transform with reconstruction - Histogram of gradients - Morphology, including watershed, connected components - Faster homography transformations (rotations, zoom, etc.) - Image dtype conversion routines - Line drawing - Better image collection handling - Constant time median filter - Edge detection (canny, sobel, etc.) - IO: Freeimage, FITS, Qt and other image loaders; video support. - SIFT feature loader - Example data-sets ... as well as many bug fixes and minor updates. Contributors for this release ----------------------------- Martin Bergtholdt Luis Pedro Coelho Chris Colbert Damian Eads Dan Farmer Emmanuelle Gouillart Brian Holt Pieter Holtzhausen Thouis (Ray) Jones Lee Kamentsky Almar Klein Kyle Mandli Andreas Mueller Neil Muller Zachary Pincus James Turner Stefan van der Walt Gael Varoquaux Tony Yu From shish at keba.be Mon Oct 10 11:20:08 2011 From: shish at keba.be (Olivier Delalleau) Date: Mon, 10 Oct 2011 11:20:08 -0400 Subject: [Numpy-discussion] Lookup array In-Reply-To: <20111010151741.438dd400@w7-pericles.r249.physics.susu.ac.ru> References: <20111010145354.75d49b0e@w7-pericles.r249.physics.susu.ac.ru> <4E92B4F4.5070701@cam.ac.uk> <20111010151741.438dd400@w7-pericles.r249.physics.susu.ac.ru> Message-ID: 2011/10/10 Andrey N. Sobolev > ? Mon, 10 Oct 2011 10:03:48 +0100 > Bob Dowling ?????: > > > > > On 10/10/11 09:53, Andrey N. Sobolev wrote: > > > > > I have 2 arrays - A with the dimensions of 1000x4 and B with the > > > dimensions of 5000x4. B doesn't (hopefully) contain any rows that > > > are not in A. I need to create a lookup array C, the i-th value of > > > which will be the index of B[i] in A. In the (very rare) case when > > > B[i] is not in A C[i] should be equal to -1. > > > > May we assume that there are no repeats in A? (i.e. no cases where > > two different indices are both valid?) > > > > Yes, rows in A are unique and sorted. > One more typo found - instead of np.which in the previous e-mail it has > to be np.where, I don't know what I thought about :) > > Thanks in advance! 
> Andrey > The following doesn't use numpy but seems to be about 20x faster: A_rows = {} for i, row in enumerate(A): A_rows[tuple(row)] = i for i, row in enumerate(B): C[i] = A_rows.get(tuple(row), -1) -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From nitroamos at gmail.com Mon Oct 10 15:27:36 2011 From: nitroamos at gmail.com (Amos Anderson) Date: Mon, 10 Oct 2011 12:27:36 -0700 Subject: [Numpy-discussion] failure to build Message-ID: hello -- i'm trying to build numpy-1.6.1. so far, it works on several machines that i've tried, but now it's failing. I put some details on the machine below. I tried to change the compiler, but setup.py objected to the things I tried (--compiler=g++4, --compiler=icc). thanks for any advice! Amos. building 'numpy.core.multiarray' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC creating build/temp.linux-x86_64-2.7/numpy/core/src/multiarray compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' gcc: numpy/core/src/multiarray/multiarraymodule_onefile.c numpy/core/src/multiarray/descriptor.c: In function `_convert_divisor_to_multiple': numpy/core/src/multiarray/descriptor.c:606: warning: 'q' might be used uninitialized in this function numpy/core/src/multiarray/einsum.c.src: In function `float_sum_of_products_contig_outstride0_one': numpy/core/src/multiarray/einsum.c.src:852: error: unrecognizable insn: (insn:HI 440 228 481 14 /usr/lib/gcc/x86_64-redhat-linux/3.4.6/include/xmmintrin.h:915 (set (reg:SF 148) (vec_select:SF (reg/v:V4SF 67 [ accum_sse ]) (parallel [ (const_int 0 [0x0]) ]))) -1 (insn_list 213 (nil)) (nil)) numpy/core/src/multiarray/einsum.c.src:852: internal compiler error: in extract_insn, at recog.c:2083 Please submit a full bug report, with preprocessed source if appropriate. See for instructions. Preprocessed source stored into /tmp/ccWyirX3.out file, please attach this to your bugreport. numpy/core/src/multiarray/descriptor.c: In function `_convert_divisor_to_multiple': numpy/core/src/multiarray/descriptor.c:606: warning: 'q' might be used uninitialized in this function numpy/core/src/multiarray/einsum.c.src: In function `float_sum_of_products_contig_outstride0_one': numpy/core/src/multiarray/einsum.c.src:852: error: unrecognizable insn: (insn:HI 440 228 481 14 /usr/lib/gcc/x86_64-redhat-linux/3.4.6/include/xmmintrin.h:915 (set (reg:SF 148) (vec_select:SF (reg/v:V4SF 67 [ accum_sse ]) (parallel [ (const_int 0 [0x0]) ]))) -1 (insn_list 213 (nil)) (nil)) numpy/core/src/multiarray/einsum.c.src:852: internal compiler error: in extract_insn, at recog.c:2083 Please submit a full bug report, with preprocessed source if appropriate. See for instructions. Preprocessed source stored into /tmp/ccWyirX3.out file, please attach this to your bugreport. 
error: Command "gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c numpy/core/src/multiarray/multiarraymodule_onefile.c -o build/temp.linux-x86_64-2.7/numpy/core/src/multiarray/multiarraymodule_onefile.o" failed with exit status 1 cat /etc/redhat-release Red Hat Enterprise Linux AS release 4 (Nahant Update 4) $ svnversion . /repos/svn/trunk 4168 gcc: _configtest.c gcc -pthread _configtest.o -L/usr/lib64 -lf77blas -lcblas -latlas -o _configtest ATLAS version 3.7.11 built by root on Mon Jun 5 10:14:12 EDT 2006: UNAME : Linux intel1.lsf.platform.com 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux INSTFLG : MMDEF : /export/madison/src/roll/hpc/BUILD/ATLAS/CONFIG/ARCHS/P4E64SSE3/gcc/gemm ARCHDEF : /export/madison/src/roll/hpc/BUILD/ATLAS/CONFIG/ARCHS/P4E64SSE3/gcc/misc F2CDEFS : -DAdd__ -DStringSunStyle CACHEEDGE: 393216 F77 : /usr/bin/g77, version GNU Fortran (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2) F77FLAGS : -fomit-frame-pointer -O -m64 CC : /usr/bin/gcc, version gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2) CC FLAGS : -fomit-frame-pointer -O3 -funroll-all-loops -m64 MCC : /usr/bin/gcc, version gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2) MCCFLAGS : -fomit-frame-pointer -O -m64 success! From jesper.webmail at gmail.com Mon Oct 10 17:02:47 2011 From: jesper.webmail at gmail.com (Jesper Larsen) Date: Mon, 10 Oct 2011 23:02:47 +0200 Subject: [Numpy-discussion] np.where and broadcasting Message-ID: Hi numpy-users I have a 2d array of shape (ny, nx). I want to broadcast (copy) this array to a target array of shape (nt, nz, ny, nz) or (nt, ny, nx) so that the 2d array is repeated for each t and z index (corresponding to nt and nz). I am not sure how to do this (generic solution, for different target array shapes). Can anyone help? Best regards, Jesper From pav at iki.fi Mon Oct 10 17:14:04 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 10 Oct 2011 23:14:04 +0200 Subject: [Numpy-discussion] np.where and broadcasting In-Reply-To: References: Message-ID: 10.10.2011 23:02, Jesper Larsen kirjoitti: > Hi numpy-users > > I have a 2d array of shape (ny, nx). I want to broadcast (copy) this > array to a target array of shape (nt, nz, ny, nz) or (nt, ny, nx) so > that the 2d array is repeated for each t and z index (corresponding to > nt and nz). I am not sure how to do this (generic solution, for > different target array shapes). Can anyone help? assert a.shape == (nt, nz, ny, nx) or a.shape == (nt, ny, nx) assert b.shape == (ny, nx) a[...] 
= b From martin.raspaud at smhi.se Tue Oct 11 01:49:10 2011 From: martin.raspaud at smhi.se (Martin Raspaud) Date: Tue, 11 Oct 2011 07:49:10 +0200 Subject: [Numpy-discussion] kind of a matrix multiplication Message-ID: <4E93D8D6.7000803@smhi.se> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, I have a stack of vectors: v1 = np.arange(3) v2 = np.arange(3) + 3 stack = np.vstack(v1, v2) (now stack is : array([[0, 1, 2], [3, 4, 5]])) and a 3d matrix: mat = np.dstack((np.eye(3), np.eye(3) * 2)) (mat is now array([[[ 1., 2.], [ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 1., 2.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.], [ 1., 2.]]])) I'm looking for the operation needed to get the two (stacked) vectors array([[0, 1, 2], [6, 8, 10]])) or its transpose. I tried various combinations of tensor products, but I always get a result in 3 dimensions, while I just want two. Any suggestions ? Thanks, Martin -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) Comment: Using GnuPG with Red Hat - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJOk9jWAAoJEBdvyODiyJI4y30IAJu6YIHK+ED8pN5M2TFrEKj8 k/K22MjitlQ8wTFDxwc5xBRI+yoniqgAfpzWjdU3pc5MxzXRgbZrRZagYWjepZyI CtN/CHy+BfM8EPJulFeVcInAgo1pgfAhH4xwEakbu88XhKSgat1Y9xlNRcrohTUQ oBVd+DNmBYpEUAa0pDjkMYXM8vaJqzePZZGaviZxY0AY2MBDrbZN/z6t4u2Unajn 8X1vjCg/XfDbm9v7FK/52MUorAJinZRdHiWBTE9rOmAqjJxTBoFKkN+0FMTUk6Sj acJNjr5KFjl6o3JPxqU4jRfw1zFRO9BEouzosKfYcs/kLozNjTBmfZztg0np/dg= =Ykij -----END PGP SIGNATURE----- -------------- next part -------------- A non-text attachment was scrubbed... Name: martin_raspaud.vcf Type: text/x-vcard Size: 293 bytes Desc: not available URL: From inconnu at list.ru Tue Oct 11 03:39:05 2011 From: inconnu at list.ru (Andrey N. Sobolev) Date: Tue, 11 Oct 2011 13:39:05 +0600 Subject: [Numpy-discussion] Lookup array In-Reply-To: References: <20111010145354.75d49b0e@w7-pericles.r249.physics.susu.ac.ru> <4E92B4F4.5070701@cam.ac.uk> <20111010151741.438dd400@w7-pericles.r249.physics.susu.ac.ru> Message-ID: <20111011133905.788673fb@w7-pericles.r249.physics.susu.ac.ru> ? Mon, 10 Oct 2011 11:20:08 -0400 Olivier Delalleau ?????: > > The following doesn't use numpy but seems to be about 20x faster: > > A_rows = {} > for i, row in enumerate(A): > A_rows[tuple(row)] = i > for i, row in enumerate(B): > C[i] = A_rows.get(tuple(row), -1) > > -=- Olivier Thanks a lot, Olivier, that's makes my program like 3x faster. One lesson I can draw from this - don't try to use NumPy in situations it doesn't fit :) WBR, Andrey -- Researcher, General and theoretical physics dept., South Ural State University 454080, Pr. Lenina, 76, Chelyabinsk, Russia Tel: +7 351 265-47-13 andrey at physics.susu.ac.ru From cwg at falma.de Tue Oct 11 04:00:36 2011 From: cwg at falma.de (Christoph Groth) Date: Tue, 11 Oct 2011 10:00:36 +0200 Subject: [Numpy-discussion] speeding up operations on small vectors Message-ID: <87mxd8dkkb.fsf@falma.de> Dear numpy experts, I could not find a satisfying solution to the following problem, so I thought I would ask: In one part of a large program I have to deal a lot with small (2d or 3d) vectors and matrices, performing simple linear algebra operations with them (dot products and matrix multiplications). For several reasons I want the code to be in python. The most natural choice seemed to be to use numpy. 
However, the constant time cost when dealing with numpy arrays seems to
be immense, as demonstrated by this toy program:

****************************************************************
import numpy as np
from time import time

def points_tuple(radius):
    rr = radius**2
    def inside(point):
        return point[0]**2 + point[1]**2 < rr

    M = ((1, 0), (0, 1))
    for x in xrange(-radius, radius + 1):
        for y in xrange(-radius, radius + 1):
            point = (M[0][0] * x + M[0][1] * y,
                     M[1][0] * x + M[1][1] * y)
            if inside(point):
                yield point

def points_numpy(radius):
    rr = radius**2
    def inside(point):
        return np.dot(point, point) < rr

    M = np.identity(2, dtype=int)
    for x in xrange(-radius, radius + 1):
        for y in xrange(-radius, radius + 1):
            point = np.dot(M, (x, y))
            if inside(point):
                yield point

def main():
    r = 200
    for func in [points_tuple, points_numpy]:
        t = time()
        for point in func(r):
            pass
        print func.__name__, time() - t, 'seconds'

if __name__ == '__main__':
    main()
****************************************************************

On my trusty old machine the output (python 2.6, numpy 1.5.1) is:

points_tuple 0.36815404892 seconds
points_numpy 6.20338392258 seconds

I do not need C performance here, but the latter is definitely too slow.

In the real program it's not so easy (but possible) to use tuples
because the code is dimension-independent. Before considering writing my
own "small vector" module, I'd like to ask about other possible
solutions. Other people must have stumbled across this before!

Thanks,
Christoph

From martin.raspaud at smhi.se Tue Oct 11 05:48:51 2011
From: martin.raspaud at smhi.se (Martin Raspaud)
Date: Tue, 11 Oct 2011 11:48:51 +0200
Subject: [Numpy-discussion] kind of a matrix multiplication
In-Reply-To: <4E93D8D6.7000803@smhi.se>
References: <4E93D8D6.7000803@smhi.se>
Message-ID: <4E941103.1080805@smhi.se>

On 11/10/11 07:49, Martin Raspaud wrote:
> Hi all,
[...]
> I'm looking for the operation needed to get the two (stacked) vectors
> array([[0, 1, 2],
>        [6, 8, 10]])
> or its transpose.

Hi again,

Here is a solution I just found:

np.einsum("ik, jki->ij", stack, mat)
array([[  0.,   1.,   2.],
       [  6.,   8.,  10.]])

Best regards,
Martin

From shish at keba.be Tue Oct 11 07:12:34 2011
From: shish at keba.be (Olivier Delalleau)
Date: Tue, 11 Oct 2011 07:12:34 -0400
Subject: [Numpy-discussion] speeding up operations on small vectors
In-Reply-To: <87mxd8dkkb.fsf@falma.de>
References: <87mxd8dkkb.fsf@falma.de>
Message-ID: 

Here's a version that uses fewer Python loops and thus is faster. What
still takes time is the array creation (np.array(...)), I'm not sure
exactly why. It may be possible to speed it up.
def points_numpy(radius): rr = radius**2 M = np.identity(2, dtype=int) x_y = np.array(list(itertools.product(xrange(-radius, radius + 1), xrange(-radius, radius + 1)))) x_y_M = np.dot(x_y, M) is_inside = (x_y_M ** 2).sum(axis=1) < rr for point in x_y_M[is_inside]: yield point -=- Olivier 2011/10/11 Christoph Groth > Dear numpy experts, > > I could not find a satisfying solution to the following problem, so I > thought I would ask: > > In one part of a large program I have to deal a lot with small (2d or > 3d) vectors and matrices, performing simple linear algebra operations > with them (dot products and matrix multiplications). For several > reasons I want the code to be in python. > > The most natural choice seemed to be to use numpy. However, the > constant time cost when dealing with numpy arrays seems to be immense, > as demonstrated by this toy program: > > **************************************************************** > import numpy as np > from time import time > > def points_tuple(radius): > rr = radius**2 > def inside(point): > return point[0]**2 + point[1]**2 < rr > > M = ((1, 0), (0, 1)) > for x in xrange(-radius, radius + 1): > for y in xrange(-radius, radius + 1): > point = (M[0][0] * x + M[0][1] * y, > M[1][0] * x + M[1][1] * y) > if inside(point): > yield point > > def points_numpy(radius): > rr = radius**2 > def inside(point): > return np.dot(point, point) < rr > > M = np.identity(2, dtype=int) > for x in xrange(-radius, radius + 1): > for y in xrange(-radius, radius + 1): > point = np.dot(M, (x, y)) > if inside(point): > yield point > > def main(): > r = 200 > for func in [points_tuple, points_numpy]: > t = time() > for point in func(r): > pass > print func.__name__, time() - t, 'seconds' > > if __name__ == '__main__': > main() > **************************************************************** > > On my trusty old machine the output (python 2.6, numpy 1.5.1) is: > > points_tuple 0.36815404892 seconds > points_numpy 6.20338392258 seconds > > I do not need C performance here, but the latter is definitely too slow. > > In the real program it's not so easy (but possible) to use tuples > because the code is dimension-independent. Before considering writing > an own "small vector" module, I'd like about other possible solutions. > Other people must have stumbled across this before! > > Thanks, > Christoph > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cwg at falma.de Tue Oct 11 08:14:48 2011 From: cwg at falma.de (Christoph Groth) Date: Tue, 11 Oct 2011 14:14:48 +0200 Subject: [Numpy-discussion] speeding up operations on small vectors References: <87mxd8dkkb.fsf@falma.de> Message-ID: <87wrcb913b.fsf@falma.de> Olivier Delalleau writes: > Here's a version that uses less Python loops and thus is faster. What > still takes time is the array creation (np.array(...)), I'm not sure > exactly why. It may be possible to speed it up. Thank you for your suggestion. It doesn't help me however, because the algorithm I'm _really_ trying to speed up cannot be vectorized with numpy in the way you vectorized my toy example. Any other ideas? 
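One way to keep the tuple-based approach dimension-independent without
numpy is to spell the small products out in plain Python. A minimal
sketch (the helper names dot_small and matvec_small are made up for
illustration, not an existing API):

****************************************************************
# A minimal sketch of dimension-independent products on plain tuples,
# avoiding numpy's per-call overhead for tiny operands. The names
# dot_small and matvec_small are hypothetical.

def dot_small(u, v):
    # dot product of two equal-length sequences
    return sum(u[i] * v[i] for i in range(len(u)))

def matvec_small(M, v):
    # matrix-vector product; M is given as a tuple of row tuples
    return tuple(dot_small(row, v) for row in M)

# For example, matvec_small(((1, 0), (0, 1)), (3, 4)) returns (3, 4),
# so points_tuple above could use it for any dimension.
****************************************************************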
From pav at iki.fi Tue Oct 11 08:49:18 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Tue, 11 Oct 2011 14:49:18 +0200
Subject: [Numpy-discussion] speeding up operations on small vectors
In-Reply-To: <87wrcb913b.fsf@falma.de>
References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de>
Message-ID: 

11.10.2011 14:14, Christoph Groth wrote:
[clip]
> Thank you for your suggestion. It doesn't help me however, because the
> algorithm I'm _really_ trying to speed up cannot be vectorized with
> numpy in the way you vectorized my toy example.
>
> Any other ideas?

Reformulate the problem so that it can be vectorized. Without knowing
more about the actual algorithm you are trying to implement, it's not
easy to give more detailed help.

-- Pauli Virtanen

From shish at keba.be Tue Oct 11 08:59:46 2011
From: shish at keba.be (Olivier Delalleau)
Date: Tue, 11 Oct 2011 08:59:46 -0400
Subject: [Numpy-discussion] kind of a matrix multiplication
In-Reply-To: <4E93D8D6.7000803@smhi.se>
References: <4E93D8D6.7000803@smhi.se>
Message-ID: 

I don't really understand the operation you have in mind that should lead
to your desired result, so here's a way to get it that discards most of
mat's content (which does not seem to be needed to compute what you want):

(stack.T * mat[0, 0]).T

-=- Olivier

2011/10/11 Martin Raspaud

> Hi all,
>
> I have a stack of vectors:
>
> v1 = np.arange(3)
> v2 = np.arange(3) + 3
> stack = np.vstack((v1, v2))
>
> (now stack is:
> array([[0, 1, 2],
>        [3, 4, 5]]))
>
> and a 3d matrix:
>
> mat = np.dstack((np.eye(3), np.eye(3) * 2))
> (mat is now
> array([[[ 1.,  2.],
>         [ 0.,  0.],
>         [ 0.,  0.]],
>
>        [[ 0.,  0.],
>         [ 1.,  2.],
>         [ 0.,  0.]],
>
>        [[ 0.,  0.],
>         [ 0.,  0.],
>         [ 1.,  2.]]]))
>
> I'm looking for the operation needed to get the two (stacked) vectors
> array([[0, 1, 2],
>        [6, 8, 10]])
> or its transpose.
>
> I tried various combinations of tensor products, but I always get a
> result in 3 dimensions, while I just want two.
>
> Any suggestions?
>
> Thanks,
> Martin

From jeanluc.menut at free.fr Tue Oct 11 09:35:51 2011
From: jeanluc.menut at free.fr (Jean-Luc Menut)
Date: Tue, 11 Oct 2011 15:35:51 +0200
Subject: [Numpy-discussion] speeding up operations on small vectors
In-Reply-To: <87wrcb913b.fsf@falma.de>
References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de>
Message-ID: <4E944637.1010209@free.fr>

> Any other ideas?

I'm not an expert at all, but as far as I understand, if you cannot
vectorize your problem, numpy is not the best tool to use if speed
matters.
Of course it's not a realistic example, but a simple loop computing a cosine is 3-4 time slower using numpy cos than python math cos (which is already slower than IDL). From nwagner at iam.uni-stuttgart.de Tue Oct 11 10:34:50 2011 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 11 Oct 2011 16:34:50 +0200 Subject: [Numpy-discussion] genfromtxt Message-ID: Hi all, How do I use genfromtxt to read a file with the following lines 1 1 2.2592365264892578D+01 2 2 2.2592365264892578D+01 1 3 2.6666669845581055D+00 3 3 2.2592365264892578D+01 2 4 2.6666669845581055D+00 4 4 2.2592365264892578D+01 3 5 2.6666669845581055D+00 5 5 2.2592365264892578D+01 4 6 2.6666669845581055D+00 6 6 2.2592365264892578D+01 1 7 2.9814243316650391D+00 7 7 1.7259031295776367D+01 2 8 2.9814243316650391D+00 8 8 1.7259031295776367D+01 ... names =("i","j","v") A = np.genfromtxt('bmll.mtl',dtype=[('i','int'),('j','int'),('v','d')],names=names) V = A[:]['v'] >>> V array([ NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN]) yields NaN, while convertfunc = lambda x: x.replace('D','E') names =("i","j","v") A = np.genfromtxt('bmll.mtl',dtype=[('i','int'),('j','int'),('v','|S24')],names=names,converters={"v":convertfunc}) V = A[:]['v'].astype(float) >>> V array([ 22.59236526, 22.59236526, 2.66666698, 22.59236526, 2.66666698, 22.59236526, 2.66666698, 22.59236526, 2.66666698, 22.59236526, 2.98142433, 17.2590313 , 2.98142433, 17.2590313 , 2.98142433, 2.98142433, 2.66666698, 22.59236526, 2.98142433, 2.98142433, 2.66666698, 22.59236526, 2.98142433, 2.98142433, 2.66666698, 22.59236526, 2.98142433, 2.98142433, 2.66666698, 22.59236526, 2.98142433, 2.66666698, 17.2590313 , 2.98142433, 2.66666698, 17.2590313 ]) works fine. Nils From cimrman3 at ntc.zcu.cz Tue Oct 11 10:59:20 2011 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 11 Oct 2011 16:59:20 +0200 Subject: [Numpy-discussion] numpy.distutils quirk Message-ID: <4E9459C8.8080300@ntc.zcu.cz> Hi, I have now spent several hours hunting down a major slowdown of my code caused (apparently) by using config.add_library() for a reusable part of C source files instead of just config.add_extension(). The reason of the slowdown was different, but hard to discern, naming of options and silent ignoring of non-existing ones: add_library() : extra_compiler_args add_extension() : extra_compile_args Other build keys used for the same purpose also differ. A bug to be reported, or is this going to be solved by going bento? r. From cwg at falma.de Tue Oct 11 11:57:06 2011 From: cwg at falma.de (Christoph Groth) Date: Tue, 11 Oct 2011 17:57:06 +0200 Subject: [Numpy-discussion] speeding up operations on small vectors References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de> Message-ID: <87sjmz8qst.fsf@falma.de> Pauli Virtanen writes: >> Thank you for your suggestion. It doesn't help me however, because >> the algorithm I'm _really_ trying to speed up cannot be vectorized >> with numpy in the way you vectorized my toy example. >> >> Any other ideas? > > Reformulate the problem so that it can be vectorized. Without knowing > more about the actual algorithm you are trying to implement, it's not > easy to give more detailed help. My question was about ways to achieve a speedup without modifying the algorithm. 
I was hoping that there is some numpy-like library for python which for small arrays achieves a performance at least on par with the implementation using tuples. This should be possible technically. The actual problem I'm trying to solve is finding those points of a n-dimensional lattice which belong to an implicitly given shape. The input is a lattice (specified in the most simple case by n n-dimensional vectors, i.e. a n-by-n matrix), a starting point on that lattice, and a shape function which returns True if a point belongs to the shape, and False if it does not. The output is an iterable over the lattice points which belong to the shape. To generate the output, the algorithm (flood-fill) recursively examines the starting point and its neighbors, calling for each of them the shape function. There are various variants of this algorithm, but all of them rely on the same basic operations. To my knowledge, it is not possible to vectorize this algorithm using numpy. One can vectorize it if a bounding box for the shape is known in advance, but this is not very efficient as all the lattice points inside the bounding box are checked. Christoph From jsseabold at gmail.com Tue Oct 11 12:11:01 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 11 Oct 2011 12:11:01 -0400 Subject: [Numpy-discussion] speeding up operations on small vectors In-Reply-To: <87sjmz8qst.fsf@falma.de> References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de> <87sjmz8qst.fsf@falma.de> Message-ID: On Tue, Oct 11, 2011 at 11:57 AM, Christoph Groth wrote: > Pauli Virtanen writes: > >>> Thank you for your suggestion. ?It doesn't help me however, because >>> the algorithm I'm _really_ trying to speed up cannot be vectorized >>> with numpy in the way you vectorized my toy example. >>> >>> Any other ideas? >> >> Reformulate the problem so that it can be vectorized. Without knowing >> more about the actual algorithm you are trying to implement, it's not >> easy to give more detailed help. > > My question was about ways to achieve a speedup without modifying the > algorithm. ?I was hoping that there is some numpy-like library for > python which for small arrays achieves a performance at least on par > with the implementation using tuples. ?This should be possible > technically. So it's the dot function being called repeatedly on smallish arrays that's the bottleneck? I've run into this as well. See this thread [1]. You might gain some speed if you drop it down into Cython, some examples in that thread. If you're still up against it, you can try the C code that Fernando posted for fast matrix multiplication (I haven't yet), or you might be able to do well to use tokyo from Cython since Wes' has fixed it up [2]. I'd be very interested to hear if you achieve a great speed-up with cython+tokyo. Cheers, Skipper [1] http://mail.scipy.org/pipermail/scipy-user/2010-December/thread.html#27791 [2] https://github.com/wesm/tokyo From pav at iki.fi Tue Oct 11 12:20:39 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Oct 2011 18:20:39 +0200 Subject: [Numpy-discussion] speeding up operations on small vectors In-Reply-To: <87sjmz8qst.fsf@falma.de> References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de> <87sjmz8qst.fsf@falma.de> Message-ID: 11.10.2011 17:57, Christoph Groth kirjoitti: [clip] > My question was about ways to achieve a speedup without modifying the > algorithm. 
I was hoping that there is some numpy-like library for > python which for small arrays achieves a performance at least on par > with the implementation using tuples. This should be possible > technically. I'm not aware of such a library. Writing one e.g. with Cython should be quite straightforward, however. [clip] > To generate the output, the algorithm (flood-fill) recursively examines > the starting point and its neighbors, calling for each of them the shape > function. There are various variants of this algorithm, but all of them > rely on the same basic operations. > > To my knowledge, it is not possible to vectorize this algorithm using > numpy. One can vectorize it if a bounding box for the shape is known in > advance, but this is not very efficient as all the lattice points inside > the bounding box are checked. The only way to vectorize this I see is to do write the floodfill algorithm on rectangular supercells, so that the constant costs are amortized. Sounds a bit messy to do, though. -- Pauli Virtanen From shish at keba.be Tue Oct 11 12:26:49 2011 From: shish at keba.be (Olivier Delalleau) Date: Tue, 11 Oct 2011 12:26:49 -0400 Subject: [Numpy-discussion] speeding up operations on small vectors In-Reply-To: References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de> <87sjmz8qst.fsf@falma.de> Message-ID: 2011/10/11 Skipper Seabold > On Tue, Oct 11, 2011 at 11:57 AM, Christoph Groth wrote: > > Pauli Virtanen writes: > > > >>> Thank you for your suggestion. It doesn't help me however, because > >>> the algorithm I'm _really_ trying to speed up cannot be vectorized > >>> with numpy in the way you vectorized my toy example. > >>> > >>> Any other ideas? > >> > >> Reformulate the problem so that it can be vectorized. Without knowing > >> more about the actual algorithm you are trying to implement, it's not > >> easy to give more detailed help. > > > > My question was about ways to achieve a speedup without modifying the > > algorithm. I was hoping that there is some numpy-like library for > > python which for small arrays achieves a performance at least on par > > with the implementation using tuples. This should be possible > > technically. > > So it's the dot function being called repeatedly on smallish arrays > that's the bottleneck? I've run into this as well. See this thread > [1]. You might gain some speed if you drop it down into Cython, some > examples in that thread. If you're still up against it, you can try > the C code that Fernando posted for fast matrix multiplication (I > haven't yet), or you might be able to do well to use tokyo from Cython > since Wes' has fixed it up [2]. > > I'd be very interested to hear if you achieve a great speed-up with > cython+tokyo. > > Cheers, > > Skipper > > [1] > http://mail.scipy.org/pipermail/scipy-user/2010-December/thread.html#27791 > [2] https://github.com/wesm/tokyo > > Another idea would be to use Theano ( http://deeplearning.net/software/theano/). It's a bit overkill though and you would need to express most of your algorithm in a symbolic way to be able to take advantage of it. You would then be able to write your own C code to do the array operations that are too slow when relying on numpy. If you are interested in pursuing this direction though, let me know and I can give you a few pointers. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From derek at astro.physik.uni-goettingen.de Tue Oct 11 12:27:04 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 11 Oct 2011 18:27:04 +0200 Subject: [Numpy-discussion] genfromtxt In-Reply-To: References: Message-ID: Hi Nils, On 11 Oct 2011, at 16:34, Nils Wagner wrote: > How do I use genfromtxt to read a file with the following > lines > > 1 1 2.2592365264892578D+01 > 2 2 2.2592365264892578D+01 > 1 3 2.6666669845581055D+00 > 3 3 2.2592365264892578D+01 > 2 4 2.6666669845581055D+00 > 4 4 2.2592365264892578D+01 > 3 5 2.6666669845581055D+00 > 5 5 2.2592365264892578D+01 > 4 6 2.6666669845581055D+00 > 6 6 2.2592365264892578D+01 > 1 7 2.9814243316650391D+00 > 7 7 1.7259031295776367D+01 > 2 8 2.9814243316650391D+00 > 8 8 1.7259031295776367D+01 > ... > > > names =("i","j","v") > A = > np.genfromtxt('bmll.mtl',dtype=[('i','int'),('j','int'),('v','d')],names=names) > V = A[:]['v'] > >>>> V > array([ NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, > NaN, NaN, NaN, > NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, > NaN, NaN, NaN, > NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, > NaN, NaN, NaN, > NaN, NaN, NaN]) > > yields NaN, while > > convertfunc = lambda x: x.replace('D','E') > names =("i","j","v") > A = > np.genfromtxt('bmll.mtl',dtype=[('i','int'),('j','int'),('v','|S24')],names=names,converters={"v":convertfunc}) > V = A[:]['v'].astype(float) >>>> V > array([ 22.59236526, 22.59236526, 2.66666698, > 22.59236526, > 2.66666698, 22.59236526, 2.66666698, > 22.59236526, > 2.66666698, 22.59236526, 2.98142433, > 17.2590313 , > 2.98142433, 17.2590313 , 2.98142433, > 2.98142433, > 2.66666698, 22.59236526, 2.98142433, > 2.98142433, > 2.66666698, 22.59236526, 2.98142433, > 2.98142433, > 2.66666698, 22.59236526, 2.98142433, > 2.98142433, > 2.66666698, 22.59236526, 2.98142433, > 2.66666698, > 17.2590313 , 2.98142433, 2.66666698, > 17.2590313 ]) > > > works fine. took me a moment to figure out what the actual problem remaining was, but expect you'd prefer it to load directly into a float record? The problem is simply that the converter _replaces_ the default converter function (which would be float(x) in this case), rather than operating on top of it. Try instead convertfunc = lambda x: float(x.replace('D','E')) and you should be ready to use ('v', 'd') as dtype (BTW, specifying 'names' is redundant in the above example). This behaviour is only hinted at in the docstring example, so maybe the documentation should be clearer here. Cheers, Derek From cwg at falma.de Tue Oct 11 12:36:49 2011 From: cwg at falma.de (Christoph Groth) Date: Tue, 11 Oct 2011 18:36:49 +0200 Subject: [Numpy-discussion] speeding up operations on small vectors References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de> <87sjmz8qst.fsf@falma.de> Message-ID: <87ipnvzdr2.fsf@falma.de> >> My question was about ways to achieve a speedup without modifying the >> algorithm. I was hoping that there is some numpy-like library for >> python which for small arrays achieves a performance at least on par >> with the implementation using tuples. This should be possible >> technically. > > I'm not aware of such a library. Writing one e.g. with Cython should be > quite straightforward, however. That's what I'll probably end up doing... 
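To put a number on the per-call overhead such a module would avoid,
here is a small timing sketch (pure Python; the absolute figures will
of course vary with the machine and the numpy version):

****************************************************************
# A quick comparison of numpy.dot against inlined tuple arithmetic
# for a tiny 2x2 matrix-vector product. This is only a measurement
# sketch, not a benchmark of the real flood-fill code.

import timeit

setup = """
import numpy as np
M = np.identity(2)
v = np.array([1.0, 2.0])
Mt = ((1.0, 0.0), (0.0, 1.0))
vt = (1.0, 2.0)
"""

n = 100000
print 'np.dot :', timeit.timeit('np.dot(M, v)', setup=setup, number=n)
print 'tuples :', timeit.timeit(
    'tuple(Mt[i][0] * vt[0] + Mt[i][1] * vt[1] for i in (0, 1))',
    setup=setup, number=n)
****************************************************************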
From cwg at falma.de Tue Oct 11 12:41:36 2011 From: cwg at falma.de (Christoph Groth) Date: Tue, 11 Oct 2011 18:41:36 +0200 Subject: [Numpy-discussion] speeding up operations on small vectors References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de> <87sjmz8qst.fsf@falma.de> Message-ID: <87ehyjzdj3.fsf@falma.de> Skipper Seabold writes: > So it's the dot function being called repeatedly on smallish arrays > that's the bottleneck? I've run into this as well. See this thread > [1]. > (...) Thanks for the links. "tokyo" is interesting, though I fear the intermediate matrix size regime where it really makes a difference will be rather small. My concern is in really tiny vectors, where it's not even worth to call BLAS. > I'd be very interested to hear if you achieve a great speed-up with > cython+tokyo. I try to solve this problem in some way or other. I'll post here if I end up with something interesting. Christoph From jsseabold at gmail.com Tue Oct 11 13:06:01 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 11 Oct 2011 13:06:01 -0400 Subject: [Numpy-discussion] speeding up operations on small vectors In-Reply-To: <87ehyjzdj3.fsf@falma.de> References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de> <87sjmz8qst.fsf@falma.de> <87ehyjzdj3.fsf@falma.de> Message-ID: On Tue, Oct 11, 2011 at 12:41 PM, Christoph Groth wrote: > Skipper Seabold writes: > >> So it's the dot function being called repeatedly on smallish arrays >> that's the bottleneck? I've run into this as well. See this thread >> [1]. >> (...) > > Thanks for the links. ?"tokyo" is interesting, though I fear the > intermediate matrix size regime where it really makes a difference will > be rather small. ?My concern is in really tiny vectors, where it's not > even worth to call BLAS. > IIUC, it's not so much the BLAS that's helpful but avoiding the overhead in calling numpy.dot from cython. >> I'd be very interested to hear if you achieve a great speed-up with >> cython+tokyo. > > I try to solve this problem in some way or other. ?I'll post here if I > end up with something interesting. Please do. Skipper From bsouthey at gmail.com Tue Oct 11 13:29:41 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 11 Oct 2011 12:29:41 -0500 Subject: [Numpy-discussion] speeding up operations on small vectors In-Reply-To: References: <87mxd8dkkb.fsf@falma.de> <87wrcb913b.fsf@falma.de> <87sjmz8qst.fsf@falma.de> <87ehyjzdj3.fsf@falma.de> Message-ID: <4E947D05.3010106@gmail.com> On 10/11/2011 12:06 PM, Skipper Seabold wrote: > On Tue, Oct 11, 2011 at 12:41 PM, Christoph Groth wrote: >> Skipper Seabold writes: >> >>> So it's the dot function being called repeatedly on smallish arrays >>> that's the bottleneck? I've run into this as well. See this thread >>> [1]. >>> (...) >> Thanks for the links. "tokyo" is interesting, though I fear the >> intermediate matrix size regime where it really makes a difference will >> be rather small. My concern is in really tiny vectors, where it's not >> even worth to call BLAS. >> > IIUC, it's not so much the BLAS that's helpful but avoiding the > overhead in calling numpy.dot from cython. > >>> I'd be very interested to hear if you achieve a great speed-up with >>> cython+tokyo. >> I try to solve this problem in some way or other. I'll post here if I >> end up with something interesting. > Please do. 
> > Skipper > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion In the example, M is an identity 2 by 2 array. This creates a lot of overhead in creating arrays from a tuple followed by two dot operations. But the tuple code is not exactly equivalent because M is 'expanded' into a single dimension to avoid some of the unnecessary multiplications. Thus the tuple code is already a different algorithm than the numpy code so the comparison is not really correct. All that is needed here for looping over scalar values of x, y and radius is to evaluate (x*x + y*y) < radius**2 That could probably be done with array multiplication and broadcasting. Bruce From matthew.brett at gmail.com Tue Oct 11 14:00:31 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 14:00:31 -0400 Subject: [Numpy-discussion] Rounding to next lowest float Message-ID: Hi, Can anyone think of a clever way to round an integer to the next lowest integer represented in a particular floating point format? For example: In [247]: a = 2**25+3 This is out of range of the continuous integers representable by float32, hence: In [248]: print a, int(np.float32(a)) 33554435 33554436 But I want to round down (floor) the integer in float32. That is, in this case I want: >>> floor_exact(a, np.float32) 33554432 I can break the float into its parts to do it: https://github.com/matthew-brett/nibabel/blob/f687bfc88d1676a09fc76c968a346bc81e4d0d04/nibabel/floating.py but that's obviously rather ugly... Is there a simpler way? I'm sure there is and I haven't thought of it... Best, Matthew From matthew.brett at gmail.com Tue Oct 11 14:06:21 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 14:06:21 -0400 Subject: [Numpy-discussion] Nice float -> integer conversion? Message-ID: Hi, Have I missed a fast way of doing nice float to integer conversion? By nice I mean, rounding to the nearest integer, converting NaN to 0, inf, -inf to the max and min of the integer range? The astype method and cast functions don't do what I need here: In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16) Out[40]: array([1, 0, 0, 0], dtype=int16) In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf])) Out[41]: array([1, 0, 0, 0], dtype=int16) Have I missed something obvious? See y'all, Matthew From matthew.brett at gmail.com Tue Oct 11 14:17:47 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 14:17:47 -0400 Subject: [Numpy-discussion] float128 casting rounding as if it were float64 Message-ID: Hi, While struggling with floating point precision, I ran into this: In [52]: a = 2**54+3 In [53]: a Out[53]: 18014398509481987L In [54]: np.float128(a) Out[54]: 18014398509481988.0 In [55]: np.float128(a)-1 Out[55]: 18014398509481987.0 The line above tells us that float128 can exactly represent 2**54+3, but the line above that says that np.float128(2**54+3) rounds upwards as if it were a float64: In [59]: np.float64(a) Out[59]: 18014398509481988.0 In [60]: np.float64(a)-1 Out[60]: 18014398509481988.0 Similarly: In [66]: np.float128('1e308') Out[66]: 1.000000000000000011e+308 In [67]: np.float128('1e309') Out[67]: inf Is it possible that float64 is being used somewhere in float128 casting? 
Best, Matthew From matthew.brett at gmail.com Tue Oct 11 14:23:15 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 14:23:15 -0400 Subject: [Numpy-discussion] abs for max negative integers - desired behavior? Message-ID: Hi, I recently ran into this: In [68]: arr = np.array(-128, np.int8) In [69]: arr Out[69]: array(-128, dtype=int8) In [70]: np.abs(arr) Out[70]: -128 Of course, I can see why this happens, but it is still surprising, and it seems to me that it would be a confusing source of bugs, because of course it only happens for the maximum negative integer. One particular confusing result was: In [71]: np.allclose(arr, arr) Out[71]: False I wanted to ask whether this is the desired behavior, and whether it might be worth planning a change in the long term? Best, Matthew From matthew.brett at gmail.com Tue Oct 11 14:39:48 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 14:39:48 -0400 Subject: [Numpy-discussion] np.finfo().maxexp confusing Message-ID: Hi, I realize it is probably too late to do anything about this, but: In [72]: info = np.finfo(np.float32) In [73]: info.minexp Out[73]: -126 In [74]: info.maxexp Out[74]: 128 minexp is correct, in that 2**(-126) is the minimum value for the exponent part of float32. But maxexp is not correct, because 2**(127) is the maximum value for the float32 exponent part: http://en.wikipedia.org/wiki/Single_precision_floating-point_format There is the same maxexp+1 feature for the other float types. Is this a sufficiently quiet corner of the API that it might be changed in the future with suitable warnings? Best, Matthew From derek at astro.physik.uni-goettingen.de Tue Oct 11 15:06:25 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 11 Oct 2011 21:06:25 +0200 Subject: [Numpy-discussion] Nice float -> integer conversion? In-Reply-To: References: Message-ID: On 11 Oct 2011, at 20:06, Matthew Brett wrote: > Have I missed a fast way of doing nice float to integer conversion? > > By nice I mean, rounding to the nearest integer, converting NaN to 0, > inf, -inf to the max and min of the integer range? The astype method > and cast functions don't do what I need here: > > In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16) > Out[40]: array([1, 0, 0, 0], dtype=int16) > > In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf])) > Out[41]: array([1, 0, 0, 0], dtype=int16) > > Have I missed something obvious? np.[a]round comes closer to what you wish (is there consensus that NaN should map to 0?), but not quite there, and it's not really consistent either! In [42]: c = np.zeros(4, np.int16) In [43]: d = np.zeros(4, np.int32) In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c) Out[44]: array([2, 0, 0, 0], dtype=int16) In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d) Out[45]: array([ 2, -2147483648, -2147483648, -2147483648], dtype=int32) Perhaps a starting point to harmonise this behaviour and get it closer to your expectations (it still would not be really nice having to define the output array first, I guess)... Cheers, Derek From charlesr.harris at gmail.com Tue Oct 11 15:16:39 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 11 Oct 2011 13:16:39 -0600 Subject: [Numpy-discussion] abs for max negative integers - desired behavior? 
In-Reply-To: References: Message-ID: On Tue, Oct 11, 2011 at 12:23 PM, Matthew Brett wrote: > Hi, > > I recently ran into this: > > In [68]: arr = np.array(-128, np.int8) > > In [69]: arr > Out[69]: array(-128, dtype=int8) > > In [70]: np.abs(arr) > Out[70]: -128 > > This has come up for discussion before, but no consensus was ever reached. One solution is for abs to return an unsigned type, but then combining that with signed type of the same number of bits will cause both to be cast to higher precision. IIRC, matlab was said to return +127 as abs(-128), which, if true, is quite curious. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Oct 11 15:18:58 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 11 Oct 2011 15:18:58 -0400 Subject: [Numpy-discussion] Nice float -> integer conversion? In-Reply-To: References: Message-ID: On Tue, Oct 11, 2011 at 3:06 PM, Derek Homeier wrote: > On 11 Oct 2011, at 20:06, Matthew Brett wrote: > >> Have I missed a fast way of doing nice float to integer conversion? >> >> By nice I mean, rounding to the nearest integer, converting NaN to 0, >> inf, -inf to the max and min of the integer range? ?The astype method >> and cast functions don't do what I need here: >> >> In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16) >> Out[40]: array([1, 0, 0, 0], dtype=int16) >> >> In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf])) >> Out[41]: array([1, 0, 0, 0], dtype=int16) >> >> Have I missed something obvious? > > np.[a]round comes closer to what you wish (is there consensus > that NaN should map to 0?), but not quite there, and it's not really > consistent either! > > In [42]: c = np.zeros(4, np.int16) > In [43]: d = np.zeros(4, np.int32) > In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c) > Out[44]: array([2, 0, 0, 0], dtype=int16) > > In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d) > Out[45]: array([ ? ? ? ? ?2, -2147483648, -2147483648, -2147483648], dtype=int32) > > Perhaps a starting point to harmonise this behaviour and get it closer to > your expectations (it still would not be really nice having to define the > output array first, I guess)... what numpy is this? >>> np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16) array([ 1, -32768, -32768, -32768], dtype=int16) >>> np.__version__ '1.5.1' >>> a = np.ones(4, np.int16) >>> a[:]=np.array([1.6, np.nan, np.inf, -np.inf]) >>> a array([ 1, -32768, -32768, -32768], dtype=int16) I thought we get ValueError to avoid nan to zero bugs >>> a[2] = np.nan Traceback (most recent call last): File "", line 1, in a[2] = np.nan ValueError: cannot convert float NaN to integer Josef > > Cheers, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cjwilliams43 at gmail.com Tue Oct 11 15:20:24 2011 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Tue, 11 Oct 2011 15:20:24 -0400 Subject: [Numpy-discussion] Rounding to next lowest float In-Reply-To: References: Message-ID: <4E9496F8.3090704@gmail.com> If you are using integers, why not use Python's Long? Colin W. On 11/10/2011 2:00 PM, Matthew Brett wrote: > Hi, > > Can anyone think of a clever way to round an integer to the next > lowest integer represented in a particular floating point format? 
> > For example: > > In [247]: a = 2**25+3 > > This is out of range of the continuous integers representable by float32, hence: > > In [248]: print a, int(np.float32(a)) > 33554435 33554436 > > But I want to round down (floor) the integer in float32. That is, in > this case I want: > >>>> floor_exact(a, np.float32) > 33554432 > > I can break the float into its parts to do it: > > https://github.com/matthew-brett/nibabel/blob/f687bfc88d1676a09fc76c968a346bc81e4d0d04/nibabel/floating.py > > but that's obviously rather ugly... Is there a simpler way? I'm sure > there is and I haven't thought of it... > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Oct 11 15:32:13 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 15:32:13 -0400 Subject: [Numpy-discussion] Nice float -> integer conversion? In-Reply-To: References: Message-ID: Hi, On Tue, Oct 11, 2011 at 3:06 PM, Derek Homeier wrote: > On 11 Oct 2011, at 20:06, Matthew Brett wrote: > >> Have I missed a fast way of doing nice float to integer conversion? >> >> By nice I mean, rounding to the nearest integer, converting NaN to 0, >> inf, -inf to the max and min of the integer range? ?The astype method >> and cast functions don't do what I need here: >> >> In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16) >> Out[40]: array([1, 0, 0, 0], dtype=int16) >> >> In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf])) >> Out[41]: array([1, 0, 0, 0], dtype=int16) >> >> Have I missed something obvious? > > np.[a]round comes closer to what you wish (is there consensus > that NaN should map to 0?), but not quite there, and it's not really > consistent either! > > In [42]: c = np.zeros(4, np.int16) > In [43]: d = np.zeros(4, np.int32) > In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c) > Out[44]: array([2, 0, 0, 0], dtype=int16) > > In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d) > Out[45]: array([ ? ? ? ? ?2, -2147483648, -2147483648, -2147483648], dtype=int32) > > Perhaps a starting point to harmonise this behaviour and get it closer to > your expectations (it still would not be really nice having to define the > output array first, I guess)... Thanks - it hadn't occurred to me to try around with an output array - an interesting idea. But - isn't this different but just as bad? Best, Matthew From matthew.brett at gmail.com Tue Oct 11 15:36:23 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 15:36:23 -0400 Subject: [Numpy-discussion] abs for max negative integers - desired behavior? In-Reply-To: References: Message-ID: Hi, On Tue, Oct 11, 2011 at 3:16 PM, Charles R Harris wrote: > > > On Tue, Oct 11, 2011 at 12:23 PM, Matthew Brett > wrote: >> >> Hi, >> >> I recently ran into this: >> >> In [68]: arr = np.array(-128, np.int8) >> >> In [69]: arr >> Out[69]: array(-128, dtype=int8) >> >> In [70]: np.abs(arr) >> Out[70]: -128 >> > > This has come up for discussion before, but no consensus was ever reached. > One solution is for abs to return an unsigned type, but then combining that > with signed type of the same number of bits will cause both to be cast to > higher precision. IIRC, matlab was said to return +127 as abs(-128), which, > if true, is quite curious. 
Ah - sorry - I think I missed the previous discussion. The conversion to unsigned seemed like an great improvement. Are you saying that the cost down the line is an increase in memory use for arrays which are then combined with a signed type? That seems like a reasonable trade-off to me. Was that the main objection? See you, Matthew From matthew.brett at gmail.com Tue Oct 11 15:43:01 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 15:43:01 -0400 Subject: [Numpy-discussion] Rounding to next lowest float In-Reply-To: <4E9496F8.3090704@gmail.com> References: <4E9496F8.3090704@gmail.com> Message-ID: Hi, On Tue, Oct 11, 2011 at 3:20 PM, Colin J. Williams wrote: > If you are using integers, why not use Python's Long? You mean, why do I need to know the next lowest representable integer in a float type? It's because I have a floating point array that I'm converting to integers, and I'm trying to find the right threshold for clipping the floating point array, to stop this happening: In [84]: np.array([2**32], np.float).astype(np.int32) Out[84]: array([-2147483648]) I need then to clip my floating point input array at some maximum floating point value ``M``, such that in_type(M).astype(out_type) <= X, where X is the maximum of the out_type. Best, Matthew From matthew.brett at gmail.com Tue Oct 11 15:51:32 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 15:51:32 -0400 Subject: [Numpy-discussion] abs for max negative integers - desired behavior? In-Reply-To: References: Message-ID: Hi On Tue, Oct 11, 2011 at 3:16 PM, Charles R Harris wrote: > > > On Tue, Oct 11, 2011 at 12:23 PM, Matthew Brett > wrote: >> >> Hi, >> >> I recently ran into this: >> >> In [68]: arr = np.array(-128, np.int8) >> >> In [69]: arr >> Out[69]: array(-128, dtype=int8) >> >> In [70]: np.abs(arr) >> Out[70]: -128 >> > > This has come up for discussion before, but no consensus was ever reached. > One solution is for abs to return an unsigned type, but then combining that > with signed type of the same number of bits will cause both to be cast to > higher precision. IIRC, matlab was said to return +127 as abs(-128), which, > if true, is quite curious. octave-3.2.3:1> a = int8([-128, 127]) a = -128 127 octave-3.2.3:2> abs(a) ans = 127 127 Matlab is the same. That is curious... See you, Matthew From matthew.brett at gmail.com Tue Oct 11 16:55:50 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 16:55:50 -0400 Subject: [Numpy-discussion] np.finfo().maxexp confusing In-Reply-To: References: Message-ID: Hi, On Tue, Oct 11, 2011 at 2:39 PM, Matthew Brett wrote: > Hi, > > I realize it is probably too late to do anything about this, but: > > In [72]: info = np.finfo(np.float32) > > In [73]: info.minexp > Out[73]: -126 > > In [74]: info.maxexp > Out[74]: 128 > > minexp is correct, in that 2**(-126) is the minimum value for the > exponent part of float32. ?But maxexp is not correct, because 2**(127) > is the maximum value for the float32 exponent part: > > http://en.wikipedia.org/wiki/Single_precision_floating-point_format > > There is the same maxexp+1 feature for the other float types. > > Is this a sufficiently quiet corner of the API that it might be > changed in the future with suitable warnings? Ah - I just found this: http://docs.scipy.org/doc/numpy/reference/generated/numpy.MachAr.html#numpy.MachAr which explains that: maxexp int Smallest (positive) power of ibeta that causes overflow. 
which is - unpleasantly named, but at least, clearly documented - if not at the point of use. When I've got better internet access, I'll try and modify the finfo docstring to explain. See you, Matthew From derek at astro.physik.uni-goettingen.de Tue Oct 11 17:30:46 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 11 Oct 2011 23:30:46 +0200 Subject: [Numpy-discussion] Nice float -> integer conversion? In-Reply-To: References: Message-ID: <0FA9B288-AADE-40A2-B3B6-7F0BF1970BF6@astro.physik.uni-goettingen.de> On 11.10.2011, at 9:18PM, josef.pktd at gmail.com wrote: >> >> In [42]: c = np.zeros(4, np.int16) >> In [43]: d = np.zeros(4, np.int32) >> In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c) >> Out[44]: array([2, 0, 0, 0], dtype=int16) >> >> In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d) >> Out[45]: array([ 2, -2147483648, -2147483648, -2147483648], dtype=int32) >> >> Perhaps a starting point to harmonise this behaviour and get it closer to >> your expectations (it still would not be really nice having to define the >> output array first, I guess)... > > what numpy is this? > This was 1.6.1 I did suppress a RuntimeWarning that was raised on the first call, though: In [33]: np.around([1.67,np.nan,np.inf,-np.inf], decimals=1, out=d) /sw/lib/python2.7/site-packages/numpy/core/fromnumeric.py:37: RuntimeWarning: invalid value encountered in multiply result = getattr(asarray(obj),method)(*args, **kwds) >>>> np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16) > array([ 1, -32768, -32768, -32768], dtype=int16) >>>> np.__version__ > '1.5.1' >>>> a = np.ones(4, np.int16) >>>> a[:]=np.array([1.6, np.nan, np.inf, -np.inf]) >>>> a > array([ 1, -32768, -32768, -32768], dtype=int16) > > > I thought we get ValueError to avoid nan to zero bugs > >>>> a[2] = np.nan > Traceback (most recent call last): > File "", line 1, in > a[2] = np.nan > ValueError: cannot convert float NaN to integer On master, an integer out raises a TypeError for any float input - not sure I'd consider that an improvement? >>> np.__version__ '2.0.0.dev-8f689df' >>> np.around([1.6,-23.42, -13.98, 0.14], out=c) Traceback (most recent call last): File "", line 1, in File "/Users/derek/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2277, in around return _wrapit(a, 'round', decimals, out) File "/Users/derek/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 37, in _wrapit result = getattr(asarray(obj),method)(*args, **kwds) TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to provided output parameter (typecode 'h') according to the casting rule ?same_kind? I thought the NaN might have been dealt with first, before casting to int, but that doesn't seem to be the case (on master, again): >>> np.around([1.6,np.nan,np.inf,-np.inf]) array([ 2., nan, inf, -inf]) >>> np.around([1.6,np.nan,np.inf,-np.inf]).astype(np.int16) array([2, 0, 0, 0], dtype=int16) >>> np.around([1.6,np.nan,np.inf,-np.inf]).astype(np.int32) array([ 2, -2147483648, -2147483648, -2147483648], dtype=int32) Cheers, Derek From matthew.brett at gmail.com Tue Oct 11 17:37:16 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Oct 2011 17:37:16 -0400 Subject: [Numpy-discussion] Nice float -> integer conversion? 
In-Reply-To: <0FA9B288-AADE-40A2-B3B6-7F0BF1970BF6@astro.physik.uni-goettingen.de>
References: <0FA9B288-AADE-40A2-B3B6-7F0BF1970BF6@astro.physik.uni-goettingen.de>
Message-ID:

Hi,

On Tue, Oct 11, 2011 at 5:30 PM, Derek Homeier wrote:
> On 11.10.2011, at 9:18PM, josef.pktd at gmail.com wrote:
>>>
>>> In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c)
>>> Out[44]: array([2, 0, 0, 0], dtype=int16)
>>>
>>> In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d)
>>> Out[45]: array([          2, -2147483648, -2147483648, -2147483648], dtype=int32)
>>>
>>> Perhaps a starting point to harmonise this behaviour and get it closer to
>>> your expectations (it still would not be really nice having to define the
>>> output array first, I guess)...
>>
>> what numpy is this?
>>
> This was 1.6.1
> I did suppress a RuntimeWarning that was raised on the first call, though:
>
> In [33]: np.around([1.67,np.nan,np.inf,-np.inf], decimals=1, out=d)
> /sw/lib/python2.7/site-packages/numpy/core/fromnumeric.py:37: RuntimeWarning: invalid value encountered in multiply
>   result = getattr(asarray(obj),method)(*args, **kwds)
>
> On master, an integer out raises a TypeError for any float input - not sure I'd
> consider that an improvement?
>
> I thought the NaN might have been dealt with first, before casting to int,
> but that doesn't seem to be the case (on master, again):
>
>>>> np.around([1.6,np.nan,np.inf,-np.inf])
> array([  2.,  nan,  inf, -inf])
>>>> np.around([1.6,np.nan,np.inf,-np.inf]).astype(np.int16)
> array([2, 0, 0, 0], dtype=int16)
>>>> np.around([1.6,np.nan,np.inf,-np.inf]).astype(np.int32)
> array([          2, -2147483648, -2147483648, -2147483648], dtype=int32)
Just to whet the appetite:

In [85]: for t in np.sctypes['int'] + np.sctypes['uint']:
   ....:     print np.array([np.nan], float).astype(t)
   ....:
[0]
[0]
[-2147483648]
[-2147483648]
[-9223372036854775808]
[0]
[0]
[2147483648]
[2147483648]
[9223372036854775808]

In [89]: for t in np.sctypes['int'] + np.sctypes['uint']:
   ....:     print np.around(np.array([np.nan], float), out=np.zeros(1, t))
   ....:
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/numpy/core/fromnumeric.py:2278: RuntimeWarning: invalid value encountered in rint
  return round(decimals, out)
[0]
[0]
[-2147483648]
[-2147483648]
[-9223372036854775808]
[0]
[0]
[2147483648]
[2147483648]
[9223372036854775808]

In [86]: np.__version__
Out[86]: '1.6.1'

Maybe it would be good to have a np.nice_round function?

See you,

Matthew

From ben.root at ou.edu  Tue Oct 11 19:13:06 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Tue, 11 Oct 2011 18:13:06 -0500
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To:
References:
Message-ID:

On Tue, Oct 11, 2011 at 2:51 PM, Matthew Brett wrote:

> octave-3.2.3:1> a = int8([-128, 127])
> a =
>
>  -128  127
>
> octave-3.2.3:2> abs(a)
> ans =
>
>  127  127
>
> Matlab is the same.  That is curious...
>
> See you,
>
> Matthew

Well, it _is_ only off by 0.78%. That should be good enough for government
work, right?

Ben Root

From ben.root at ou.edu  Tue Oct 11 19:32:35 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Tue, 11 Oct 2011 18:32:35 -0500
Subject: [Numpy-discussion] Nice float -> integer conversion?
In-Reply-To:
References:
Message-ID:

On Tue, Oct 11, 2011 at 2:06 PM, Derek Homeier <
derek at astro.physik.uni-goettingen.de> wrote:

> On 11 Oct 2011, at 20:06, Matthew Brett wrote:
>
> > Have I missed a fast way of doing nice float to integer conversion?
> >
> > By nice I mean, rounding to the nearest integer, converting NaN to 0,
> > inf, -inf to the max and min of the integer range?  The astype method
> > and cast functions don't do what I need here:
> >
> > In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16)
> > Out[40]: array([1, 0, 0, 0], dtype=int16)
> >
> > In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf]))
> > Out[41]: array([1, 0, 0, 0], dtype=int16)
> >
> > Have I missed something obvious?
>
> np.[a]round comes closer to what you wish (is there consensus
> that NaN should map to 0?), but not quite there, and it's not really
> consistent either!
>

In a way, there is already consensus in the code. np.nan_to_num() by
default converts nans to zero, and the infinities go to very large and very
small.
>>> np.set_printoptions(precision=8)
>>> x = np.array([np.inf, -np.inf, np.nan, -128, 128])
>>> np.nan_to_num(x)
array([  1.79769313e+308,  -1.79769313e+308,   0.00000000e+000,
        -1.28000000e+002,   1.28000000e+002])

Ben Root
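Putting Ben's observation together with the clipping discussed earlier, a
rough sketch of the np.nice_round helper Matthew floated above might look
like this. The name is hypothetical - no such function exists in numpy -
and for 64-bit integer targets the maximum is not exactly representable as
a float, so the clip threshold needs the more careful treatment raised in
the "Rounding to next lowest float" thread:

import numpy as np

def nice_round(arr, out_type):
    # Hypothetical helper: round to nearest, send nan to 0, and clip
    # +/-inf (and out-of-range values) to the limits of out_type,
    # avoiding the wrap-around seen with a bare astype().
    info = np.iinfo(out_type)
    arr = np.nan_to_num(np.asarray(arr, dtype=np.float64))
    arr = np.clip(np.around(arr), info.min, info.max)
    return arr.astype(out_type)

>>> nice_round([1.6, np.nan, np.inf, -np.inf], np.int16)
array([     2,      0,  32767, -32768], dtype=int16)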
From josef.pktd at gmail.com  Tue Oct 11 19:33:44 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 11 Oct 2011 19:33:44 -0400
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To:
References:
Message-ID:

On Tue, Oct 11, 2011 at 7:13 PM, Benjamin Root wrote:
> On Tue, Oct 11, 2011 at 2:51 PM, Matthew Brett wrote:
>>
>> octave-3.2.3:2> abs(a)
>> ans =
>>
>>  127  127
>>
>> Matlab is the same.  That is curious...
>
> Well, it _is_ only off by 0.78%. That should be good enough for government
> work, right?

So, which government is using numpy, only off by 200%

Josef

From ben.root at ou.edu  Tue Oct 11 20:05:51 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Tue, 11 Oct 2011 19:05:51 -0500
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To:
References:
Message-ID:

On Tue, Oct 11, 2011 at 6:33 PM, <josef.pktd at gmail.com> wrote:
> > Well, it _is_ only off by 0.78%. That should be good enough for government
> > work, right?
>
> So, which government is using numpy, only off by 200%
>
> Josef

Not government, but maybe Lockheed-Martin when they were doing that Mars
probe?  "What? It was negative?  Well, that explains why it went down, not
up!"

::rimshot::  Thank you folks!  I will be here all week!

Ben Root

From cournape at gmail.com  Wed Oct 12 03:46:16 2011
From: cournape at gmail.com (David Cournapeau)
Date: Wed, 12 Oct 2011 08:46:16 +0100
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To:
References:
Message-ID:

On Tue, Oct 11, 2011 at 8:16 PM, Charles R Harris wrote:
>
> This has come up for discussion before, but no consensus was ever reached.
> One solution is for abs to return an unsigned type, but then combining that
> with signed type of the same number of bits will cause both to be cast to
> higher precision. IIRC, matlab was said to return +127 as abs(-128), which,
> if true, is quite curious.

In C, abs(INT_MIN) is undefined, so both 127 and -128 work :)

David

From sole at esrf.fr  Wed Oct 12 04:18:19 2011
From: sole at esrf.fr (=?ISO-8859-1?Q?=22V=2E_Armando_Sol=E9=22?=)
Date: Wed, 12 Oct 2011 10:18:19 +0200
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To:
References:
Message-ID: <4E954D4B.7040905@esrf.fr>

From a pure user perspective, I would not expect the abs function to
return a negative number. Returning +127 plus a warning the first time
that happens seems to me a good compromise.

Armando

On 12/10/2011 09:46, David Cournapeau wrote:
> In C, abs(INT_MIN) is undefined, so both 127 and -128 work :)
>
> David

From cournape at gmail.com  Wed Oct 12 04:46:23 2011
From: cournape at gmail.com (David Cournapeau)
Date: Wed, 12 Oct 2011 09:46:23 +0100
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To: <4E954D4B.7040905@esrf.fr>
References: <4E954D4B.7040905@esrf.fr>
Message-ID:

On Wed, Oct 12, 2011 at 9:18 AM, "V. Armando Solé" wrote:
> From a pure user perspective, I would not expect the abs function to
> return a negative number. Returning +127 plus a warning the first time
> that happens seems to me a good compromise.

I guess the question is what's the common context to use small
integers in the first place. If it is to save memory, then upcasting
may not be the best solution.
I may be wrong, but if you decide to use
those types in the first place, you need to know about overflows. Abs
is just one of them (dividing by -1 is another, although this one
actually raises an exception).

Detecting it may be costly, but this would need benchmarking.

That being said, without context, I don't find 127 a better solution than -128.

cheers,

David

From sole at esrf.fr  Wed Oct 12 05:20:31 2011
From: sole at esrf.fr (=?UTF-8?B?IlYuIEFybWFuZG8gU29sw6ki?=)
Date: Wed, 12 Oct 2011 11:20:31 +0200
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To:
References: <4E954D4B.7040905@esrf.fr>
Message-ID: <4E955BDF.3010704@esrf.fr>

On 12/10/2011 10:46, David Cournapeau wrote:
> I guess the question is what's the common context to use small
> integers in the first place. If it is to save memory, then upcasting
> may not be the best solution. I may be wrong, but if you decide to use
> those types in the first place, you need to know about overflows. Abs
> is just one of them (dividing by -1 is another, although this one
> actually raises an exception).
>
> Detecting it may be costly, but this would need benchmarking.
>
> That being said, without context, I don't find 127 a better solution than -128.

Well that choice is just based on getting the closest positive number to
the true value (128). The context can be anything, for instance you
could be using a look up table based on the result of an integer
operation ...

In terms of cost, it would imply to evaluate the cost of something like:

a = abs(x);
if (a < 0) {a = MAX_INT;}  /* clamp the overflowed abs(MIN_INT) to the largest positive value */
return a;

Basically it is the cost of the evaluation of an if condition, since the
content of the block (with or without warning) will not be executed very
often.

I find that even raising an exception is better than returning a
negative number as result of the abs function.

Anyways, I have just tested numpy.array([129], dtype=numpy.int8) and I
have got the array as [-127] when I was expecting a sort of unsafe cast
error/warning. I guess I will just stop here. In any case, I am very
grateful to the mailing list and the original poster for exposing this
behavior so that I can keep it in mind.

Best regards,

Armando

From daniele at grinta.net  Wed Oct 12 05:28:51 2011
From: daniele at grinta.net (Daniele Nicolodi)
Date: Wed, 12 Oct 2011 11:28:51 +0200
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To:
References:
Message-ID: <4E955DD3.3050300@grinta.net>

On 11/10/11 21:16, Charles R Harris wrote:
> IIRC, matlab was said to return
> +127 as abs(-128), which, if true, is quite curious.

I just checked and this is indeed the case in Matlab 7.10.0 R2010a:

>> abs(int8(-128))

ans =

  127

Cheers,
--
Daniele

From cournape at gmail.com  Wed Oct 12 08:31:53 2011
From: cournape at gmail.com (David Cournapeau)
Date: Wed, 12 Oct 2011 13:31:53 +0100
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To: <4E955BDF.3010704@esrf.fr>
References: <4E954D4B.7040905@esrf.fr> <4E955BDF.3010704@esrf.fr>
Message-ID:

On 10/12/11, "V. Armando Solé" wrote:
> On 12/10/2011 10:46, David Cournapeau wrote:
>> On Wed, Oct 12, 2011 at 9:18 AM, "V. Armando Solé"
wrote:
>>> From a pure user perspective, I would not expect the abs function to
>>> return a negative number. Returning +127 plus a warning the first time
>>> that happens seems to me a good compromise.
>> I guess the question is what's the common context to use small
>> integers in the first place. If it is to save memory, then upcasting
>> may not be the best solution. I may be wrong, but if you decide to use
>> those types in the first place, you need to know about overflows.
>>
>> Detecting it may be costly, but this would need benchmarking.
>>
>> That being said, without context, I don't find 127 a better solution than
>> -128.
>
> Well that choice is just based on getting the closest positive number to
> the true value (128). The context can be anything, for instance you
> could be using a look up table based on the result of an integer
> operation ...
>
> In terms of cost, it would imply to evaluate the cost of something like:
>
> a = abs(x);
> if (a < 0) {a = MAX_INT;}
> return a;

Yes, this is costly: it adds a branch to a trivial operation. I did
some preliminary benchmarks (would need confirmation when I have more
than one minute to spend on this):

int8, 2**16 long array. Before check: 16 us. After check: 92 us. 5-6 times slower.
int8, 2**24 long array. Before check: 20 ms. After check: 30 ms. 30 % slower.

There is also the issue of signaling the error in the ufunc machinery.
I forgot whether this is possible at that level.

cheers,

David

From charlesr.harris at gmail.com  Wed Oct 12 11:24:23 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 12 Oct 2011 09:24:23 -0600
Subject: [Numpy-discussion] float128 casting rounding as if it were float64
In-Reply-To:
References:
Message-ID:

On Tue, Oct 11, 2011 at 12:17 PM, Matthew Brett wrote:

> Hi,
>
> While struggling with floating point precision, I ran into this:
>
> In [52]: a = 2**54+3
>
> In [53]: a
> Out[53]: 18014398509481987L
>
> In [54]: np.float128(a)
> Out[54]: 18014398509481988.0
>
> In [55]: np.float128(a)-1
> Out[55]: 18014398509481987.0
>
> The line above tells us that float128 can exactly represent 2**54+3,
> but the line above that says that np.float128(2**54+3) rounds upwards
> as if it were a float64:
>
> In [59]: np.float64(a)
> Out[59]: 18014398509481988.0
>
> In [60]: np.float64(a)-1
> Out[60]: 18014398509481988.0
>
> Similarly:
>
> In [66]: np.float128('1e308')
> Out[66]: 1.000000000000000011e+308
>
> In [67]: np.float128('1e309')
> Out[67]: inf
>
> Is it possible that float64 is being used somewhere in float128 casting?
>

The problem is probably in specifying the values. Python doesn't support
long double and I expect python integers to be converted to doubles, then
cast to long double. The only way to get around this is probably using
string representations of the numbers, and I don't know how
well/consistently numpy does that at the moment. If it calls python to do
the job, then double is probably what is returned.

It doesn't help on my system:

In [1]: float128("18014398509481987.0")
Out[1]: 18014398509481988.0

Chuck

From linus.junden at gmail.com  Thu Oct 13 09:03:17 2011
From: linus.junden at gmail.com (=?ISO-8859-1?Q?Linus_Jund=E9n?=)
Date: Thu, 13 Oct 2011 15:03:17 +0200
Subject: [Numpy-discussion] NumPy foundations
Message-ID:

Hello everyone!
I am about to make a NumPy presentation for my colleagues in about a
week. I want to tell them something about the history of the library
and what kind of code it relies on. While researching and preparing
for this presentation I found it very hard to find information about
the origins of the numeric code. I might of course dive into the
source code itself but thought that someone else might have the same
questions. That's what the list is for, right?

Is NumPy based on some external code like e.g. BLAS, LAPACK etc or is
it coded from scratch? Anyone out there that can settle the question?

Regards
Linus Jundén
Umeå University, Sweden

From chaoyuejoy at gmail.com  Thu Oct 13 09:09:23 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Thu, 13 Oct 2011 15:09:23 +0200
Subject: [Numpy-discussion] NumPy foundations
In-Reply-To:
References:
Message-ID:

I think in the attached numpy guide (I downloaded it some days ago from
the numpy website) by Travis E. Oliphant, he talks about some of the
development of numpy, and how scipy and Numeric evolved into numpy (if I
am not wrong here....). Hope it can be of a little help.

Chao

2011/10/13 Linus Jundén <linus.junden at gmail.com>

> Hello everyone!
>
> I am about to make a NumPy presentation for my colleagues in about a
> week. I want to tell them something about the history of the library
> and what kind of code it relies on.
>
> Is NumPy based on some external code like e.g. BLAS, LAPACK etc or is
> it coded from scratch? Anyone out there that can settle the question?

--
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
************************************************************************************
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Guide to NumPy.pdf
Type: application/pdf
Size: 2148630 bytes
Desc: not available

From chaoyuejoy at gmail.com  Thu Oct 13 09:14:43 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Thu, 13 Oct 2011 15:14:43 +0200
Subject: [Numpy-discussion] how to list all the values in a ndarray without repeat (like filter in excel)
Message-ID:

Dear all,

If I have a ndarray like array([1,2,3,2,3,1,1,1,2,2,....,2,2,3]),
containing some values that are flags for data quality, how can I list
all the values in this array, like doing a descriptive statistics?
I guess I should use Scipy statistics?
Thanks for any ideas.
Chao

From ben.root at ou.edu  Thu Oct 13 10:05:27 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Thu, 13 Oct 2011 09:05:27 -0500
Subject: [Numpy-discussion] NumPy foundations
In-Reply-To:
References:
Message-ID:

On Thursday, October 13, 2011, Linus Jundén wrote:
> Hello everyone!
>
> I am about to make a NumPy presentation for my colleagues in about a
> week. I want to tell them something about the history of the library
> and what kind of code it relies on.
>
> Is NumPy based on some external code like e.g. BLAS, LAPACK etc or is
> it coded from scratch? Anyone out there that can settle the question?
>
> Regards
> Linus Jundén
> Umeå University, Sweden

Travis O. gave a good presentation about this at scipy 2010.  Is there a
recording somewhere?

Ben Root

From josef.pktd at gmail.com  Thu Oct 13 10:09:11 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 13 Oct 2011 10:09:11 -0400
Subject: [Numpy-discussion] how to list all the values in a ndarray without repeat (like filter in excel)
In-Reply-To:
References:
Message-ID:

On Thu, Oct 13, 2011 at 9:14 AM, Chao YUE wrote:
> Dear all,
>
> If I have a ndarray like array([1,2,3,2,3,1,1,1,2,2,....,2,2,3]),
> containing some values that are flags for data quality, how can I list
> all the values in this array, like doing a descriptive statistics?
> I guess I should use Scipy statistics?
> Thanks for any ideas.

Not sure what you are asking for

np.unique to get unique values

selecting by mask: mask = (myarr == 3),  arr[mask]

or statistics of another variable by groups/flags:
np.bincount(arr, weights=myvar)

Josef
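A short demonstration of josef's two suggestions (the array and the
weights variable here are made up for illustration):

>>> import numpy as np
>>> arr = np.array([1, 2, 3, 2, 3, 1, 1, 1, 2, 2, 2, 2, 3])
>>> np.unique(arr)                   # the distinct flag values
array([1, 2, 3])
>>> np.bincount(arr)                 # how often each flag occurs
array([0, 4, 6, 3])
>>> myvar = np.arange(13.0)
>>> np.bincount(arr, weights=myvar)  # sum of myvar within each flag group
array([  0.,  18.,  42.,  18.])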
From ben.root at ou.edu  Thu Oct 13 10:09:19 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Thu, 13 Oct 2011 09:09:19 -0500
Subject: [Numpy-discussion] how to list all the values in a ndarray without repeat (like filter in excel)
In-Reply-To:
References:
Message-ID:

On Thursday, October 13, 2011, Chao YUE wrote:
> Dear all,
>
> If I have a ndarray like array([1,2,3,2,3,1,1,1,2,2,....,2,2,3]),
> containing some values that are flags for data quality, how can I list
> all the values in this array, like doing a descriptive statistics?
> I guess I should use Scipy statistics?
> Thanks for any ideas.
>
> Chao
>

Would np.unique() do the job?

Ben Root

From chaoyuejoy at gmail.com  Thu Oct 13 10:20:19 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Thu, 13 Oct 2011 16:20:19 +0200
Subject: [Numpy-discussion] how to list all the values in a ndarray without repeat (like filter in excel)
In-Reply-To:
References:
Message-ID:

Yes, np.unique() is exactly what I want. Thanks.

Chao

2011/10/13 Benjamin Root <ben.root at ou.edu>

> Would np.unique() do the job?
>
> Ben Root

From Chris.Barker at noaa.gov  Thu Oct 13 11:53:47 2011
From: Chris.Barker at noaa.gov (Chris.Barker)
Date: Thu, 13 Oct 2011 08:53:47 -0700
Subject: [Numpy-discussion] NumPy foundations
In-Reply-To:
References:
Message-ID: <4E97098B.4000503@noaa.gov>

On 10/13/11 6:03 AM, Linus Jundén wrote:
> I am about to make a NumPy presentation for my colleagues in about a
> week. I want to tell them something about the history of the library
> and what kind of code it relies on.
> Is NumPy based on some external code like e.g. BLAS, LAPACK etc or is
> it coded from scratch? Anyone out there that can settle the question?

It was coded from scratch -- though it does have hooks to BLAS and LAPACK
for linear algebra operations. It was originally written by Jim Hugunin,
who later went on to write Jython, and then IronPython. It doesn't look
like he updates his web page often, but you should find some good stuff
here:

http://hugunin.net/index.html

As you seem to know, the current numpy code base evolved from the
original "Numeric" code, also informed by the "numarray" fork.

Here is some intro text from "Numerical Python: An Open Source Project",
Sept 7, 2001:

"""
Numerical Python is the outgrowth of a long collaborative design process
carried out by the Matrix SIG of the Python Software Activity (PSA). Jim
Hugunin, while a graduate student at MIT, wrote most of the code and
initial documentation. When Jim joined CNRI and began working on
JPython, he didn't have the time to maintain Numerical Python so Paul
Dubois at LLNL agreed to become the maintainer of Numerical Python.
David Ascher, working as a consultant to LLNL, wrote most of this
document, incorporating contributions from Konrad Hinsen and Travis
Oliphant, both of whom are major contributors to Numerical Python.
"""

I have a paper copy still, but managed to find it on the web, too:

[http://dsnra.jpl.nasa.gov/software/Python/numpydoc/index.html]

That's the oldest form of the doc I could find quickly.
-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chaoyuejoy at gmail.com  Thu Oct 13 12:13:12 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Thu, 13 Oct 2011 18:13:12 +0200
Subject: [Numpy-discussion] ndarray with double comparison
Message-ID:

Dear all,

Sorry for this stupid question, but I cannot find it in the numpy tutorial
or on google. Suppose I have a=np.arange(11).

In [32]: a < 8
Out[32]:
array([ True,  True,  True,  True,  True,  True,  True,  True, False,
       False, False], dtype=bool)

In [34]: a > 4
Out[34]:
array([False, False, False, False, False,  True,  True,  True,  True,
        True,  True], dtype=bool)

How can I have a boolean index like 4 < a < 8?
np.where(a>4 and a<8), or plainly typing "a>4 and a<8", doesn't work.

Thanks,

Chao

From gokhansever at gmail.com  Thu Oct 13 12:16:41 2011
From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=)
Date: Thu, 13 Oct 2011 10:16:41 -0600
Subject: [Numpy-discussion] ndarray with double comparison
In-Reply-To:
References:
Message-ID:

On Thu, Oct 13, 2011 at 10:13 AM, Chao YUE wrote:

> How can I have a boolean index like 4 < a < 8?
> np.where(a>4 and a<8), or plainly typing "a>4 and a<8", doesn't work.

I1 a=np.arange(11)

I2 a[(a<8) & (a>4)]
O2 array([5, 6, 7])

--
Gökhan

From ben.root at ou.edu  Thu Oct 13 12:22:03 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Thu, 13 Oct 2011 11:22:03 -0500
Subject: [Numpy-discussion] ndarray with double comparison
In-Reply-To:
References:
Message-ID:

On Thu, Oct 13, 2011 at 11:13 AM, Chao YUE wrote:

> How can I have a boolean index like 4 < a < 8?
>

Unfortunately, you can't use "and", "or", "not" keywords with boolean
arrays because numpy can't overload them. Instead, use the bitwise
operators: '&', '|', and '~'. Be careful, though, because operator
precedence is different for bitwise operators than for the boolean
keywords. I am in the habit of always wrapping my boolean expressions in
parentheses, just in case.

(a > 4) & (a < 8)

is what you want.
Note that "a > 4 & a < 8" would be evaluated in a different order -- "4 & a" would be first. I hope that helps! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Thu Oct 13 12:32:15 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 13 Oct 2011 18:32:15 +0200 Subject: [Numpy-discussion] ndarray with double comparison In-Reply-To: References: Message-ID: Thanks. I starts to use python do some real data processing and has bunch of questions. Chao 2011/10/13 Benjamin Root > On Thu, Oct 13, 2011 at 11:13 AM, Chao YUE wrote: > >> Dear all, >> >> sorry for this stupid question but I cannot find it in numpy tutorial or >> google. >> suppose I have a=np.arange(11). >> >> In [32]: a < 8 >> Out[32]: >> array([ True, True, True, True, True, True, True, True, False, >> False, False], dtype=bool) >> >> In [34]: a > 4 >> Out[34]: >> array([False, False, False, False, False, True, True, True, True, >> True, True], dtype=bool) >> >> how can I have boolean index like 4 < a < 8 >> np.where(a>4 and a<8);or plainly input "a>4 and a<8" doesn't work. >> >> thanks, >> >> Chao >> >> > Unfortunately, you can't use "and", "or", "not" keywords with boolean > arrays because numpy can't overload them. Instead, use the bitwise > operators: '&', '|', and '~'. Be careful, though, because of operator > precedence is different for bitwise operators than the boolean keywords. I > am in the habit of always wrapping my boolean expressions in parentheses, > just in case. > > (a > 4) & (a < 8) > > is what you want. Note that "a > 4 & a < 8" would be evaluated in a > different order -- "4 & a" would be first. > > I hope that helps! > > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.shivers at gmail.com Thu Oct 13 12:50:12 2011 From: marc.shivers at gmail.com (Marc Shivers) Date: Thu, 13 Oct 2011 12:50:12 -0400 Subject: [Numpy-discussion] ndarray with double comparison In-Reply-To: References: Message-ID: you could use bitwise comparison with paretheses: In [8]: (a>4)&(a<8) Out[8]: array([False, False, False, False, False, True, True, True, False, False, False], dtype=bool) On Thu, Oct 13, 2011 at 12:13 PM, Chao YUE wrote: > Dear all, > > sorry for this stupid question but I cannot find it in numpy tutorial or > google. > suppose I have a=np.arange(11). > > In [32]: a < 8 > Out[32]: > array([ True, True, True, True, True, True, True, True, False, > False, False], dtype=bool) > > In [34]: a > 4 > Out[34]: > array([False, False, False, False, False, True, True, True, True, > True, True], dtype=bool) > > how can I have boolean index like 4 < a < 8 > np.where(a>4 and a<8);or plainly input "a>4 and a<8" doesn't work. 
From chaoyuejoy at gmail.com  Thu Oct 13 12:32:15 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Thu, 13 Oct 2011 18:32:15 +0200
Subject: [Numpy-discussion] ndarray with double comparison
In-Reply-To:
References:
Message-ID:

Thanks. I'm starting to use Python for some real data processing and have a
bunch of questions.

Chao

2011/10/13 Benjamin Root <ben.root at ou.edu>

> Unfortunately, you can't use "and", "or", "not" keywords with boolean
> arrays because numpy can't overload them. Instead, use the bitwise
> operators: '&', '|', and '~'.
>
> (a > 4) & (a < 8)
>
> is what you want. Note that "a > 4 & a < 8" would be evaluated in a
> different order -- "4 & a" would be first.
>
> I hope that helps!
>
> Ben Root

From marc.shivers at gmail.com  Thu Oct 13 12:50:12 2011
From: marc.shivers at gmail.com (Marc Shivers)
Date: Thu, 13 Oct 2011 12:50:12 -0400
Subject: [Numpy-discussion] ndarray with double comparison
In-Reply-To:
References:
Message-ID:

You could use bitwise comparison with parentheses:

In [8]: (a>4)&(a<8)
Out[8]:
array([False, False, False, False, False,  True,  True,  True, False,
       False, False], dtype=bool)

On Thu, Oct 13, 2011 at 12:13 PM, Chao YUE wrote:

> How can I have a boolean index like 4 < a < 8?

From chaoyuejoy at gmail.com  Thu Oct 13 13:17:48 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Thu, 13 Oct 2011 19:17:48 +0200
Subject: [Numpy-discussion] the axis parameter in the np.ma.concatenate is not working?
Message-ID:

Dear all,

I use numpy version 1.5.1, which is installed by default when I do sudo
apt-get install numpy on ubuntu 11.04, but it seems that for
np.ma.concatenate(arrays, axis), the axis parameter is not working?

In [460]: a=np.arange(10)

In [461]: a=np.ma.masked_array(a,a<3)

In [462]: a
Out[462]:
masked_array(data = [-- -- -- 3 4 5 6 7 8 9],
             mask = [ True  True  True False False False False False False False],
       fill_value = 999999)

In [463]: b=np.arange(10)

In [464]: b=np.ma.masked_array(a,b>7)

In [465]: b
Out[465]:
masked_array(data = [-- -- -- 3 4 5 6 7 -- --],
             mask = [ True  True  True False False False False False  True  True],
       fill_value = 999999)

In [466]: c=np.ma.concatenate((a,b),axis=0)

In [467]: c
Out[467]:
masked_array(data = [-- -- -- 3 4 5 6 7 8 9 -- -- -- 3 4 5 6 7 -- --],
             mask = [ True  True  True False False False False False False False  True  True
   True False False False False False  True  True],
       fill_value = 999999)

In [468]: c.shape
Out[468]: (20,)

In [469]: c=np.ma.concatenate((a,b),axis=1)

In [470]: c.shape
Out[470]: (20,)

cheers,

Chao

From josef.pktd at gmail.com  Thu Oct 13 14:15:01 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 13 Oct 2011 14:15:01 -0400
Subject: [Numpy-discussion] the axis parameter in the np.ma.concatenate is not working?
In-Reply-To:
References:
Message-ID:

On Thu, Oct 13, 2011 at 1:17 PM, Chao YUE wrote:
> Dear all,
>
> I use numpy version 1.5.1, which is installed by default when I do sudo
> apt-get install numpy on ubuntu 11.04, but it seems that for
> np.ma.concatenate(arrays, axis), the axis parameter is not working?
>
> In [465]: b
> Out[465]:
> masked_array(data = [-- -- -- 3 4 5 6 7 -- --],
>              mask = [ True  True  True False False False False False  True  True],
>        fill_value = 999999)
>
> In [466]: c=np.ma.concatenate((a,b),axis=0)
>
> In [468]: c.shape
> Out[468]: (20,)
>
> In [469]: c=np.ma.concatenate((a,b),axis=1)

maybe you want numpy.ma.column_stack

for concatenate you need to add extra axis first
something like
c=np.ma.concatenate((a[:,None], b[:,None]),axis=1)
(not tested)

Josef

> In [470]: c.shape
> Out[470]: (20,)
>
> cheers,
>
> Chao
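For the record, a sketch of both options josef suggests, applied to the a
and b masked arrays defined above (the axis=1 call needs the arrays to be
2-d first, which is why the extra axis is added):

>>> c = np.ma.concatenate((a[:, None], b[:, None]), axis=1)
>>> c.shape
(10, 2)
>>> np.ma.column_stack((a, b)).shape
(10, 2)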
From bsouthey at gmail.com  Thu Oct 13 14:31:10 2011
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 13 Oct 2011 13:31:10 -0500
Subject: [Numpy-discussion] NumPy foundations
In-Reply-To: <4E97098B.4000503@noaa.gov>
References: <4E97098B.4000503@noaa.gov>
Message-ID: <4E972E6E.4000607@gmail.com>

On 10/13/2011 10:53 AM, Chris.Barker wrote:
> On 10/13/11 6:03 AM, Linus Jundén wrote:
>> I am about to make a NumPy presentation for my colleagues in about a
>> week. I want to tell them something about the history of the library
>> and what kind of code it relies on.
>> Is NumPy based on some external code like e.g. BLAS, LAPACK etc or is
>> it coded from scratch? Anyone out there that can settle the question?
> It was coded from scratch -- though it does have hooks to BLAS and LAPACK
> for linear algebra operations. [...]
> That's the oldest form of the doc I could find quickly.
>
> -Chris

A view of the history can be found at:
http://www.scipy.org/History_of_SciPy/

I thought Paul DuBois had more on this as I only managed to find this:
http://web.archive.org/web/20010410225234/http://pfdubois.com/numpy/

It is not clear whether Numerical-15 could link to external lapack
libraries, but Numeric 16 (29-Aug-2000) onwards could. You can find
Numerical-15.3.tgz (08-May-2000) or later Numeric versions on the web
(sourceforge only has Numeric 24 onwards):
http://ftp.heanet.ie/mirrors/sourceforge/n/project/nu/numpy/OldFiles/

Bruce

From cwg at falma.de  Thu Oct 13 14:58:25 2011
From: cwg at falma.de (Christoph Groth)
Date: Thu, 13 Oct 2011 20:58:25 +0200
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
Message-ID: <8739ew90ry.fsf@falma.de>

Hello,

Is it just me who thinks that matplotlib is ugly and a pain to use?  So
far I haven't found a decent alternative usable from within python.  (I
haven't tried all the packages out there.)  I'm mostly interested in 2d
plots.  Who is happy enough with a numpy-compatible plotting package to
recommend it?

Thanks,

Christoph

A few things I can't stand about matplotlib:

* It works as a state machine.  (There is an OO-API, too, but it's ugly
  and cumbersome to use, and most examples use the state machine mode
  or, even worse, a mixture of OO and global state.)

* It uses inches by default.  (I propose to switch to nails: 1 nail = 3
  digits = 2 1/4 inches = 1/16 yard.)

* subplot(211) (ugh!)

* Concepts are named in a confusing way.  ("ax = subplot(112)" anyone?)

From jkington at wisc.edu  Thu Oct 13 15:21:57 2011
From: jkington at wisc.edu (Joe Kington)
Date: Thu, 13 Oct 2011 14:21:57 -0500
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: <8739ew90ry.fsf@falma.de>
References: <8739ew90ry.fsf@falma.de>
Message-ID:

Have a look at Chaco: http://code.enthought.com/chaco/  If you're wanting a
more pythonic api, it's a good choice.

Personally, I still prefer matplotlib.

You don't ever need to touch the state machine interface.

The OO interface is slightly un-pythonic, but it's hardly clunky. I think
you're referring to one of the webpage examples of it which avoids _any_
convenience functions. You can still use the convenience functions without
having to rely on the state machine in any way. E.g.:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(ncols=4)

for ax in axes:
    ax.plot(range(10))

plt.show()

All in all, matplotlib deliberately tries to mimic matlab for a lot of the
conventions. This is mostly to make it easier to switch if you're already
familiar with matlab.

To each his own, but for better or worse, matplotlib is the most widely
used plotting library for python. It's worth getting a bit more familiar
with, if nothing else just to see past some of the rough edges.

-Joe

On Thu, Oct 13, 2011 at 1:58 PM, Christoph Groth wrote:

> Is it just me who thinks that matplotlib is ugly and a pain to use?
From rowen at uw.edu  Thu Oct 13 17:18:00 2011
From: rowen at uw.edu (Russell E. Owen)
Date: Thu, 13 Oct 2011 14:18:00 -0700
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
References: <8739ew90ry.fsf@falma.de>
Message-ID:

In article <8739ew90ry.fsf at falma.de>, Christoph Groth wrote:

> Is it just me who thinks that matplotlib is ugly and a pain to use?  So
> far I haven't found a decent alternative usable from within python.  (I
> haven't tried all the packages out there.)  I'm mostly interested in 2d
> plots.  Who is happy enough with a numpy-compatible plotting package to
> recommend it?

I know folks who like HippoDraw and use it instead of matplotlib due to
its speed. Veusz sounds promising. Both use Qt as a back end.

I've not used either because I need Tcl/TK as a back end for much of my
work.

-- Russell

From zachary.pincus at yale.edu  Thu Oct 13 17:21:47 2011
From: zachary.pincus at yale.edu (Zachary Pincus)
Date: Thu, 13 Oct 2011 17:21:47 -0400
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To:
References: <8739ew90ry.fsf@falma.de>
Message-ID: <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>

I keep meaning to use matplotlib as well, but every time I try I also get
really turned off by the matlabish interface in the examples. I get that
it's a selling point for matlab refugees, but I find it counterintuitive
in the same way Christoph seems to.

I'm glad to hear the OO interface isn't as clunky as it looks on some of
the doc pages, though. This is good news. Can anyone point out any good
tutorials/docs on using matplotlib idiomatically via its OO interface?

Zach

On Oct 13, 2011, at 3:21 PM, Joe Kington wrote:

> Have a look at Chaco: http://code.enthought.com/chaco/  If you're
> wanting a more pythonic api, it's a good choice.
>
> Personally, I still prefer matplotlib.
>
> You don't ever need to touch the state machine interface.
From jsalvati at u.washington.edu  Thu Oct 13 17:31:04 2011
From: jsalvati at u.washington.edu (John Salvatier)
Date: Thu, 13 Oct 2011 14:31:04 -0700
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>
References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>
Message-ID:

I second that request.

On Thu, Oct 13, 2011 at 2:21 PM, Zachary Pincus wrote:

> I'm glad to hear the OO interface isn't as clunky as it looks on some of
> the doc pages, though. This is good news. Can anyone point out any good
> tutorials/docs on using matplotlib idiomatically via its OO interface?
>
> Zach
From jdh2358 at gmail.com  Thu Oct 13 17:39:08 2011
From: jdh2358 at gmail.com (John Hunter)
Date: Thu, 13 Oct 2011 16:39:08 -0500
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>
References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>
Message-ID: <3B9D05F4-ABC3-4870-ADA9-1C83613B8C9E@gmail.com>

On Oct 13, 2011, at 4:21 PM, Zachary Pincus wrote:

> I'm glad to hear the OO interface isn't as clunky as it looks on some of
> the doc pages, though. This is good news. Can anyone point out any good
> tutorials/docs on using matplotlib idiomatically via its OO interface?

I would start with these examples:

http://matplotlib.sourceforge.net/examples/api/index.html

These examples use pyplot only for figure generation, mostly because this
is the easiest way to get a Figure instance correctly wired across user
interface toolkits, but use the API for everything else.

And this tutorial, which explains the central object hierarchy:

http://matplotlib.sourceforge.net/users/artists.html

For a deeper dive, these tutorials may be of interest too:

http://matplotlib.sourceforge.net/users/transforms_tutorial.html

http://matplotlib.sourceforge.net/users/path_tutorial.html

http://matplotlib.sourceforge.net/users/event_handling.html

From jsalvati at u.washington.edu  Thu Oct 13 17:53:32 2011
From: jsalvati at u.washington.edu (John Salvatier)
Date: Thu, 13 Oct 2011 14:53:32 -0700
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: <3B9D05F4-ABC3-4870-ADA9-1C83613B8C9E@gmail.com>
References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu> <3B9D05F4-ABC3-4870-ADA9-1C83613B8C9E@gmail.com>
Message-ID:

Thank you John, those are looking useful.

From gokhansever at gmail.com  Thu Oct 13 18:03:49 2011
From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=)
Date: Thu, 13 Oct 2011 16:03:49 -0600
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>
References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>
Message-ID:

On Thu, Oct 13, 2011 at 3:21 PM, Zachary Pincus wrote:

> I keep meaning to use matplotlib as well, but every time I try I also get
> really turned off by the matlabish interface in the examples. I get that
> it's a selling point for matlab refugees, but I find it counterintuitive
> in the same way Christoph seems to.
> >
> > I'm glad to hear the OO interface isn't as clunky as it looks on some of
> > the doc pages, though. This is good news. Can anyone point out any good
> > tutorials/docs on using matplotlib idiomatically via its OO interface?
> >
>
> I would start with these examples
>
> http://matplotlib.sourceforge.net/examples/api/index.html
>
> These examples use pyplot only for figure generation, mostly because this
> is the easiest way to get a Figure instance correctly wired across user
> interface toolkits, but use the API for everything else.
>
> And this tutorial, which explains the central object hierarchy:
>
> http://matplotlib.sourceforge.net/users/artists.html
>
> For a deeper dive, these tutorials may be of interest too:
>
> http://matplotlib.sourceforge.net/users/transforms_tutorial.html
>
> http://matplotlib.sourceforge.net/users/path_tutorial.html
>
> http://matplotlib.sourceforge.net/users/event_handling.html
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gokhansever at gmail.com Thu Oct 13 18:03:49 2011
From: gokhansever at gmail.com (Gökhan Sever)
Date: Thu, 13 Oct 2011 16:03:49 -0600
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>
References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>
Message-ID:

On Thu, Oct 13, 2011 at 3:21 PM, Zachary Pincus wrote:

> I keep meaning to use matplotlib as well, but every time I try I also get
> really turned off by the matlabish interface in the examples. I get that
> it's a selling point for matlab refugees, but I find it counterintuitive in
> the same way Christoph seems to.
>
> I'm glad to hear the OO interface isn't as clunky as it looks on some of
> the doc pages, though. This is good news. Can anyone point out any good
> tutorials/docs on using matplotlib idiomatically via its OO interface?
>
> Zach

I think, IPython is great for interaction with the OO interface of the
matlab. Just starting simple with:

fig=plt.figure()
ax=plt.gca()

and keep tabbing ax., fig. or any object you create on the canvas
.tab to get its methods and attributes. Another approach is start with the
pylab interface and query detailed help/code with ?? in IPython (e.g.
plt.xlabel??)

-- 
Gökhan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gokhansever at gmail.com Thu Oct 13 18:06:21 2011
From: gokhansever at gmail.com (Gökhan Sever)
Date: Thu, 13 Oct 2011 16:06:21 -0600
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu>
Message-ID:

On Thu, Oct 13, 2011 at 4:03 PM, Gökhan Sever wrote:

>
> I think, IPython is great for interaction with the OO interface of the
> matlab. Just starting simple with:
>
> fig=plt.figure()
> ax=plt.gca()
> and keep tabbing ax., fig. or any object you create on the canvas
> .tab to get its methods and attributes. Another approach is start with the
> pylab interface and query detailed help/code with ?? in IPython (e.g.
> plt.xlabel??)
>

Sorry s/matlab/matplotlib. I am not sure if matlab has IPython like
interface to introspect objects. Definitely IDL doesn't.

-- 
Gökhan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ben.root at ou.edu Thu Oct 13 18:15:34 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Thu, 13 Oct 2011 17:15:34 -0500
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu> <3B9D05F4-ABC3-4870-ADA9-1C83613B8C9E@gmail.com>
Message-ID:

On Thursday, October 13, 2011, John Salvatier wrote:
> Thank you John, those are looking useful.
>

I have been working to improve the docs. One of the frustrating things
about the docs is the information overload in some places and the lack of
information elsewhere. Further, the rigidity of the docs has not been
helpful.

The docs really present themselves in 2 ways: the gallery/examples and the
APIs. The latter is geared for the devs while the former is geared for
helping users with the question "how do I make a plot look like this?".

Myself and other developers would greatly appreciate help from the
community to point out which examples are too confusing or out of date. We
would also greatly welcome all critiques, suggestions, and comments on the
docs. Of course, what would be welcomed even more are patches!

Cheers!
Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gokhansever at gmail.com Thu Oct 13 18:22:10 2011
From: gokhansever at gmail.com (Gökhan Sever)
Date: Thu, 13 Oct 2011 16:22:10 -0600
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu> <3B9D05F4-ABC3-4870-ADA9-1C83613B8C9E@gmail.com>
Message-ID:

On Thu, Oct 13, 2011 at 4:15 PM, Benjamin Root wrote:

> Myself and other developers would greatly appreciate help from the
> community to point out which examples are too confusing or out of date. We
>

It would be nice to have a social interface for the mpl gallery like the
one similar to the R-gallery
[http://www.r-bloggers.com/the-r-graph-gallery-goes-social/]

-- 
Gökhan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jsseabold at gmail.com Thu Oct 13 18:25:36 2011
From: jsseabold at gmail.com (Skipper Seabold)
Date: Thu, 13 Oct 2011 18:25:36 -0400
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu> <3B9D05F4-ABC3-4870-ADA9-1C83613B8C9E@gmail.com>
Message-ID:

On Thu, Oct 13, 2011 at 6:22 PM, Gökhan Sever wrote:
>
> On Thu, Oct 13, 2011 at 4:15 PM, Benjamin Root wrote:
>>
>> Myself and other developers would greatly appreciate help from the
>> community to point out which examples are too confusing or out of date. We
>
> It would be nice to have a social interface for the mpl gallery like the one
> similar to the R-gallery
> [http://www.r-bloggers.com/the-r-graph-gallery-goes-social/]

Big +1. Just yesterday I wanted to add some cool "notes to self"
plots. IIRC there was a lightning talk at SciPy conference two summers
ago about starting a web site just like this. Don't know what happened
though.
Skipper

From efiring at hawaii.edu Thu Oct 13 18:36:59 2011
From: efiring at hawaii.edu (Eric Firing)
Date: Thu, 13 Oct 2011 12:36:59 -1000
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu> <3B9D05F4-ABC3-4870-ADA9-1C83613B8C9E@gmail.com>
Message-ID: <4E97680B.5000509@hawaii.edu>

On 10/13/2011 12:22 PM, Gökhan Sever wrote:
>
> On Thu, Oct 13, 2011 at 4:15 PM, Benjamin Root
> wrote:
>
>     Myself and other developers would greatly appreciate help from the
>     community to point out which examples are too confusing or out of
>     date. We
>
> It would be nice to have a social interface for the mpl gallery like the
> one similar to the R-gallery
> [http://www.r-bloggers.com/the-r-graph-gallery-goes-social/]

I think that the priority should go towards massive pruning,
organization, and cleanup of the gallery. This would be a great project
for a new contributor to mpl.

Eric

>
> --
> Gökhan

From jdh2358 at gmail.com Thu Oct 13 18:42:50 2011
From: jdh2358 at gmail.com (John Hunter)
Date: Thu, 13 Oct 2011 17:42:50 -0500
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: <4E97680B.5000509@hawaii.edu>
References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu> <3B9D05F4-ABC3-4870-ADA9-1C83613B8C9E@gmail.com> <4E97680B.5000509@hawaii.edu>
Message-ID:

On Thu, Oct 13, 2011 at 5:36 PM, Eric Firing wrote:

>> It would be nice to have a social interface for the mpl gallery like the
>> one similar to the R-gallery
>> [http://www.r-bloggers.com/the-r-graph-gallery-goes-social/]
>
> I think that the priority should go towards massive pruning,
> organization, and cleanup of the gallery. This would be a great project
> for a new contributor to mpl.

So as to not hijack poor Christoph's thread, who after all is looking
for mpl alternatives, and not to abuse numpy-discussion's bandwidth
with mpl issues, I have opened an issue on the github tracker as an
"open thread" to register gripes and suggestions for mpl doc and
gallery improvements, particularly as it regards API usage

https://github.com/matplotlib/matplotlib/issues/524

JDH

From josef.pktd at gmail.com Thu Oct 13 19:16:36 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 13 Oct 2011 19:16:36 -0400
Subject: [Numpy-discussion] wanted: decent matplotlib alternative
In-Reply-To: References: <8739ew90ry.fsf@falma.de> <067C12E0-C018-4575-A06E-E79494F2C2DC@yale.edu> <3B9D05F4-ABC3-4870-ADA9-1C83613B8C9E@gmail.com> <4E97680B.5000509@hawaii.edu>
Message-ID:

On Thu, Oct 13, 2011 at 6:42 PM, John Hunter wrote:
> On Thu, Oct 13, 2011 at 5:36 PM, Eric Firing wrote:
>
>>> It would be nice to have a social interface for the mpl gallery like the
>>> one similar to the R-gallery
>>> [http://www.r-bloggers.com/the-r-graph-gallery-goes-social/]
>>
>> I think that the priority should go towards massive pruning,
>> organization, and cleanup of the gallery. This would be a great project
>> for a new contributor to mpl.
> > So as to not hijack poor Christoph's thread, who after all is looking
> > for mpl alternatives, and not to abuse numpy-discussion's bandwidth
> > with mpl issues, I have opened an issue on the github tracker as an
> > "open thread" to register gripes and suggestions for mpl doc and
> > gallery improvements, particularly as it regards API usage
> >
> > https://github.com/matplotlib/matplotlib/issues/524

I started, but I would like to mention that I'm very happy with the
pyplot interface (even if sometimes it takes a bit of time to figure
out which options to use).

Josef

>
> JDH
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From chaoyuejoy at gmail.com Fri Oct 14 04:30:51 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Fri, 14 Oct 2011 10:30:51 +0200
Subject: [Numpy-discussion] the axis parameter in the np.ma.concatenate is not working?
In-Reply-To: References: Message-ID:

Thanks Josef, you're right. Could you explain to me what's the
difference between

In [4]: a=np.arange(10)

In [5]: a.shape
Out[5]: (10,)

and

In [6]: a=np.arange(10).reshape(10,1)

In [7]: a.shape
Out[7]: (10, 1)

(10,) means the first a is only a one-dimensional ndarray, but the
(10, 1) means the second a is a two-dimensional ndarray?

another question, if I have

In [70]: f
Out[70]:
masked_array(data =
 [[0 0]
 [1 1]
 [2 2]
 [3 3]
 [-- 4]
 [5 5]
 [6 6]
 [7 --]
 [8 8]
 [9 9]],
             mask =
 [[False False]
 [False False]
 [False False]
 [False False]
 [ True False]
 [False False]
 [False False]
 [False  True]
 [False False]
 [False False]],
       fill_value = 999999)

but when I do

In [71]: f.data
Out[71]:
array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5],
       [6, 6],
       [7, 7],
       [8, 8],
       [9, 9]])

it still shows the original value, so what's the usage of fill_value
in masked array? can I set a fill_value as np.nan?

Thanks,

Chao

2011/10/13

> On Thu, Oct 13, 2011 at 1:17 PM, Chao YUE wrote:
> > Dear all,
> >
> > I use numpy version 1.5.1 which is installed by default when I do sudo
> > apt-get install numpy on ubuntu 11.04.
> > but it seems that for np.ma.concatenate(arrays, axis), the axis parameter
> is
> > not working?
> > > > In [460]: a=np.arange(10) > > > > In [461]: a=np.ma.masked_array(a,a<3) > > > > In [462]: a > > Out[462]: > > masked_array(data = [-- -- -- 3 4 5 6 7 8 9], > > mask = [ True True True False False False False False > False > > False], > > fill_value = 999999) > > > > > > In [463]: b=np.arange(10) > > > > In [464]: b=np.ma.masked_array(a,b>7) > > > > In [465]: b > > Out[465]: > > masked_array(data = [-- -- -- 3 4 5 6 7 -- --], > > mask = [ True True True False False False False False > True > > True], > > fill_value = 999999) > > > > > > In [466]: c=np.ma.concatenate((a,b),axis=0) > > > > In [467]: c > > Out[467]: > > masked_array(data = [-- -- -- 3 4 5 6 7 8 9 -- -- -- 3 4 5 6 7 -- --], > > mask = [ True True True False False False False False > False > > False True True > > True False False False False False True True], > > fill_value = 999999) > > > > > > In [468]: c.shape > > Out[468]: (20,) > > > > In [469]: c=np.ma.concatenate((a,b),axis=1) > > maybe you want numpy.ma.column_stack > > for concatenate you need to add extra axis first > > something like > c=np.ma.concatenate((a[:,None], b[:,None]),axis=1) (not tested) > > Josef > > > > > In [470]: c.shape > > Out[470]: (20,) > > > > cheers, > > > > Chao > > > > -- > > > *********************************************************************************** > > Chao YUE > > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > > UMR 1572 CEA-CNRS-UVSQ > > Batiment 712 - Pe 119 > > 91191 GIF Sur YVETTE Cedex > > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > > ************************************************************************************ > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Fri Oct 14 04:33:48 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 14 Oct 2011 10:33:48 +0200 Subject: [Numpy-discussion] the difference of np.nan, np.NaN, np.NAN? Message-ID: Dear all, is there any difference between np.nan, np.NaN and np.NAN? they really confuse me.... they are all Not a Number? In [75]: np.nan==np.NaN Out[75]: False In [77]: np.NaN==np.NAN Out[77]: False Thanks a lot, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Oct 14 05:04:46 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 14 Oct 2011 05:04:46 -0400 Subject: [Numpy-discussion] the difference of np.nan, np.NaN, np.NAN? 
In-Reply-To: References: Message-ID: Hi, On Fri, Oct 14, 2011 at 4:33 AM, Chao YUE wrote: > Dear all, > > is there any difference between np.nan, np.NaN and np.NAN? they really > confuse me.... > they are all Not a Number? > > In [75]: np.nan==np.NaN > Out[75]: False > > In [77]: np.NaN==np.NAN > Out[77]: False The nan value is not equal to itself: In [70]: np.nan == np.nan Out[70]: False See: http://en.wikipedia.org/wiki/NaN But: In [71]: np.isnan(np.nan) Out[71]: True In [72]: np.isnan(np.NAN) Out[72]: True In [73]: np.isnan(np.NaN) Out[73]: True Best, Matthew From chaoyuejoy at gmail.com Fri Oct 14 05:14:07 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 14 Oct 2011 11:14:07 +0200 Subject: [Numpy-discussion] the difference of np.nan, np.NaN, np.NAN? In-Reply-To: References: Message-ID: good answer.... 2011/10/14 Matthew Brett > Hi, > > On Fri, Oct 14, 2011 at 4:33 AM, Chao YUE wrote: > > Dear all, > > > > is there any difference between np.nan, np.NaN and np.NAN? they really > > confuse me.... > > they are all Not a Number? > > > > In [75]: np.nan==np.NaN > > Out[75]: False > > > > In [77]: np.NaN==np.NAN > > Out[77]: False > > The nan value is not equal to itself: > > In [70]: np.nan == np.nan > Out[70]: False > > See: > > http://en.wikipedia.org/wiki/NaN > > But: > > In [71]: np.isnan(np.nan) > Out[71]: True > > In [72]: np.isnan(np.NAN) > Out[72]: True > > In [73]: np.isnan(np.NaN) > Out[73]: True > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Fri Oct 14 06:53:35 2011 From: shish at keba.be (Olivier Delalleau) Date: Fri, 14 Oct 2011 06:53:35 -0400 Subject: [Numpy-discussion] the difference of np.nan, np.NaN, np.NAN? In-Reply-To: References: Message-ID: 2011/10/14 Matthew Brett > Hi, > > On Fri, Oct 14, 2011 at 4:33 AM, Chao YUE wrote: > > Dear all, > > > > is there any difference between np.nan, np.NaN and np.NAN? they really > > confuse me.... > > they are all Not a Number? > > > > In [75]: np.nan==np.NaN > > Out[75]: False > > > > In [77]: np.NaN==np.NAN > > Out[77]: False > > The nan value is not equal to itself: > > In [70]: np.nan == np.nan > Out[70]: False > > See: > > http://en.wikipedia.org/wiki/NaN > > But: > > In [71]: np.isnan(np.nan) > Out[71]: True > > In [72]: np.isnan(np.NAN) > Out[72]: True > > In [73]: np.isnan(np.NaN) > Out[73]: True > > Best, > > Matthew > Also on my computer: >>> numpy.nan is numpy.NAN True >>> numpy.nan is numpy.NaN True So they really are the same. But you shouldn't rely on this (always use numpy.isnan to test for nan-ness). -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Oct 14 07:44:29 2011 From: cournape at gmail.com (David Cournapeau) Date: Fri, 14 Oct 2011 12:44:29 +0100 Subject: [Numpy-discussion] the difference of np.nan, np.NaN, np.NAN? 
In-Reply-To: References: Message-ID: On Fri, Oct 14, 2011 at 11:53 AM, Olivier Delalleau wrote: > 2011/10/14 Matthew Brett >> >> Hi, >> >> On Fri, Oct 14, 2011 at 4:33 AM, Chao YUE wrote: >> > Dear all, >> > >> > is there any difference between np.nan, np.NaN and np.NAN? they really >> > confuse me.... >> > they are all Not a Number? >> > >> > In [75]: np.nan==np.NaN >> > Out[75]: False >> > >> > In [77]: np.NaN==np.NAN >> > Out[77]: False >> >> The nan value is not equal to itself: >> >> In [70]: np.nan == np.nan >> Out[70]: False >> >> See: >> >> http://en.wikipedia.org/wiki/NaN >> >> But: >> >> In [71]: np.isnan(np.nan) >> Out[71]: True >> >> In [72]: np.isnan(np.NAN) >> Out[72]: True >> >> In [73]: np.isnan(np.NaN) >> Out[73]: True >> >> Best, >> >> Matthew > > Also on my computer: >>>> numpy.nan is numpy.NAN > True >>>> numpy.nan is numpy.NaN > True > > So they really are the same. But you shouldn't rely on this (always use > numpy.isnan to test for nan-ness). They are the same, just not equal to each other (or even to themselves). As for the different NaN names in numpy namespace, I think this is for historical reason (maybe because python itself used to print nan differently on different platforms, although this should not be the case anymore since python 2.6). David From ndbecker2 at gmail.com Fri Oct 14 08:04:39 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 14 Oct 2011 08:04:39 -0400 Subject: [Numpy-discussion] yet another indexing question Message-ID: suppose I have: In [10]: u Out[10]: array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]) And I have a vector v: v = np.array ((0,1,0,1,0)) I want to form an output vector which selects items from u where v is the index of the row of u to be selected. In the above example, I want: w = [0,6,2,8,4] I can't seem to find a syntax that does this. Now, more importantly, I need the result to be a reference to the original array (not a copy), because I'm going to use it on the LHS of an assignment. Is this possible? From warren.weckesser at enthought.com Fri Oct 14 08:59:55 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 14 Oct 2011 07:59:55 -0500 Subject: [Numpy-discussion] yet another indexing question In-Reply-To: References: Message-ID: On Fri, Oct 14, 2011 at 7:04 AM, Neal Becker wrote: > suppose I have: > > In [10]: u > Out[10]: > array([[0, 1, 2, 3, 4], > [5, 6, 7, 8, 9]]) > > And I have a vector v: > v = np.array ((0,1,0,1,0)) > > I want to form an output vector which selects items from u where v is the > index > of the row of u to be selected. > > In the above example, I want: > > w = [0,6,2,8,4] > > I can't seem to find a syntax that does this. > > Now, more importantly, I need the result to be a reference to the original > array > (not a copy), because I'm going to use it on the LHS of an assignment. Is > this > possible? > In [27]: a Out[27]: array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]) In [28]: v = array([0,1,0,1,0]) In [29]: a[v,range(5)] Out[29]: array([0, 6, 2, 8, 4]) In [30]: a[v,range(5)] = 99 In [31]: a Out[31]: array([[99, 1, 99, 3, 99], [ 5, 99, 7, 99, 9]]) In line [29], the result is a copy, *not* a reference. In [30], however, the assignment does write 99 into a. Warren > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From jeanluc.menut at free.fr Fri Oct 14 09:06:27 2011
From: jeanluc.menut at free.fr (Jean-Luc Menut)
Date: Fri, 14 Oct 2011 15:06:27 +0200
Subject: [Numpy-discussion] yet another indexing question
In-Reply-To: References: Message-ID: <4E9833D3.9030302@free.fr>

What about
a=arange(len(v))
w=u[v,a]
?

From silva at lma.cnrs-mrs.fr Fri Oct 14 09:51:38 2011
From: silva at lma.cnrs-mrs.fr (Fabrice Silva)
Date: Fri, 14 Oct 2011 15:51:38 +0200
Subject: [Numpy-discussion] yet another indexing question
In-Reply-To: References: Message-ID: <1318600298.19437.3.camel@lma-98.cnrs-mrs.fr>

Le vendredi 14 octobre 2011 à 08:04 -0400, Neal Becker a écrit :
> suppose I have:
>
> In [10]: u
> Out[10]:
> array([[0, 1, 2, 3, 4],
>        [5, 6, 7, 8, 9]])
>
> And I have a vector v:
> v = np.array ((0,1,0,1,0))
>
> I want to form an output vector which selects items from u where v is the index
> of the row of u to be selected.
>
> In the above example, I want:
>
> w = [0,6,2,8,4]
>
> I can't seem to find a syntax that does this.
>
> Now, more importantly, I need the result to be a reference to the original array
> (not a copy), because I'm going to use it on the LHS of an assignment. Is this
> possible?

What about np.where?
http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

w = np.where(v, u[1], u[0])

if you may want to have more than two options (more than two lines for
u), then np.choose may be more appropriate
http://docs.scipy.org/doc/numpy/reference/generated/numpy.choose.html

-- 
Fabrice Silva

From alan.isaac at gmail.com Fri Oct 14 10:24:59 2011
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Fri, 14 Oct 2011 10:24:59 -0400
Subject: [Numpy-discussion] simulate AR
Message-ID: <4E98463B.8070503@gmail.com>

As a simple example, if I have y0 and a white noise series e,
what is the best way to produce a series y such that y[t] = 0.9*y[t-1] + e[t]
for t=1,2,...?

1. How can I best simulate an autoregressive process using NumPy?

2. With SciPy, it looks like I could do this as
e[0] = y0
signal.lfilter((1,),(1,-0.9),e)
Am I overlooking similar (or substitute) functionality in NumPy?

Thanks,
Alan Isaac

From ndbecker2 at gmail.com Fri Oct 14 10:33:27 2011
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 14 Oct 2011 10:33:27 -0400
Subject: [Numpy-discussion] yet another indexing question
References: <1318600298.19437.3.camel@lma-98.cnrs-mrs.fr>
Message-ID:

Fabrice Silva wrote:

> Le vendredi 14 octobre 2011 à 08:04 -0400, Neal Becker a écrit :
>> suppose I have:
>>
>> In [10]: u
>> Out[10]:
>> array([[0, 1, 2, 3, 4],
>>        [5, 6, 7, 8, 9]])
>>
>> And I have a vector v:
>> v = np.array ((0,1,0,1,0))
>>
>> I want to form an output vector which selects items from u where v is the
>> index of the row of u to be selected.
>>
>> In the above example, I want:
>>
>> w = [0,6,2,8,4]
>>
>> I can't seem to find a syntax that does this.
>>
>> Now, more importantly, I need the result to be a reference to the original
>> array
>> (not a copy), because I'm going to use it on the LHS of an assignment. Is
>> this possible?
>
> What about np.where?
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
>
> w = np.where(v, u[1], u[0])
>
> if you may want to have more than two options (more than two lines for
> u), then np.choose may be more appropriate
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.choose.html

Will np.choose result in a lval?
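For reference, a minimal sketch of the trade-off (worth double-checking
against your NumPy version): np.choose returns a freshly allocated array,
so it cannot be assigned through, while fancy indexing on the left-hand
side of an assignment does write into the original array, as in Warren's
example above.

>>> import numpy as np
>>> u = np.array([[0, 1, 2, 3, 4],
...               [5, 6, 7, 8, 9]])
>>> v = np.array([0, 1, 0, 1, 0])
>>> np.choose(v, u)            # picks row v[i] at column i, but returns a copy
array([0, 6, 2, 8, 4])
>>> u[v, np.arange(5)] = 99    # fancy indexing on the LHS writes into u
>>> u
array([[99,  1, 99,  3, 99],
       [ 5, 99,  7, 99,  9]])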
From josef.pktd at gmail.com Fri Oct 14 10:49:04 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 14 Oct 2011 10:49:04 -0400
Subject: [Numpy-discussion] simulate AR
In-Reply-To: <4E98463B.8070503@gmail.com>
References: <4E98463B.8070503@gmail.com>
Message-ID:

On Fri, Oct 14, 2011 at 10:24 AM, Alan G Isaac wrote:
> As a simple example, if I have y0 and a white noise series e,
> what is the best way to produce a series y such that y[t] = 0.9*y[t-1] + e[t]
> for t=1,2,...?
>
> 1. How can I best simulate an autoregressive process using NumPy?
>
> 2. With SciPy, it looks like I could do this as
> e[0] = y0
> signal.lfilter((1,),(1,-0.9),e)
> Am I overlooking similar (or substitute) functionality in NumPy?

I don't think so. At least I didn't find anything in numpy for this.
An MA process would be a convolution, but for simulating AR I only
found signal.lfilter. (unless numpy has gained extra features that I
don't have in 1.5)

Except, I think it's possible to do it with fft, if you want to
fft-inverse-convolve (?)
But simulating an ARMA with fft was much slower than lfilter in my
short experimentation with it.

Josef

>
> Thanks,
> Alan Isaac
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From silva at lma.cnrs-mrs.fr Fri Oct 14 11:56:30 2011
From: silva at lma.cnrs-mrs.fr (Fabrice Silva)
Date: Fri, 14 Oct 2011 17:56:30 +0200
Subject: [Numpy-discussion] simulate AR
In-Reply-To: References: <4E98463B.8070503@gmail.com>
Message-ID: <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr>

Le vendredi 14 octobre 2011 à 10:49 -0400, josef.pktd at gmail.com a
écrit :
> On Fri, Oct 14, 2011 at 10:24 AM, Alan G Isaac wrote:
> > As a simple example, if I have y0 and a white noise series e,
> > what is the best way to produce a series y such that y[t] = 0.9*y[t-1] + e[t]
> > for t=1,2,...?
> >
> > 1. How can I best simulate an autoregressive process using NumPy?
> >
> > 2. With SciPy, it looks like I could do this as
> > e[0] = y0
> > signal.lfilter((1,),(1,-0.9),e)
> > Am I overlooking similar (or substitute) functionality in NumPy?
>
> I don't think so. At least I didn't find anything in numpy for this.
> An MA process would be a convolution, but for simulating AR I only
> found signal.lfilter. (unless numpy has gained extra features that I
> don't have in 1.5)
>
> Except, I think it's possible to do it with fft, if you want to
> fft-inverse-convolve (?)
> But simulating an ARMA with fft was much slower than lfilter in my
> short experimentation with it.

About speed comparison between lfilter, convolve, etc...
http://www.scipy.org/Cookbook/ApplyFIRFilter

-- 
Fabrice Silva
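To make the lfilter route concrete, a minimal sketch for the AR(1)
example (the variable names here are made up for illustration; it uses
Alan's e[0] = y0 trick to seed the recursion):

import numpy as np
from scipy import signal

noise = np.random.randn(100)   # the white noise series e[t]
e = noise.copy()
e[0] = 1.5                     # = y0, seeds y[0] through the first input
y = signal.lfilter([1.0], [1.0, -0.9], e)
# the recursion y[t] = 0.9*y[t-1] + e[t] then holds for t >= 1:
assert np.allclose(y[1:], 0.9 * y[:-1] + noise[1:])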
From radimrehurek at seznam.cz Fri Oct 14 11:58:33 2011
From: radimrehurek at seznam.cz (Radim)
Date: Fri, 14 Oct 2011 08:58:33 -0700 (PDT)
Subject: [Numpy-discussion] statistics in python
In-Reply-To: References: Message-ID: <840e8d0d-9711-4ccd-a392-5827fcad57b2@e37g2000yqa.googlegroups.com>

Hi Rense (cross-posting to the numpy mailing list because these guys
are awesome),

On Oct 13, 10:01 pm, Rense Lange wrote:
> I have potentially millions of tuples and
> I want to create frequency distributions conditional on the values of
> discrete variables v1, v2, ... (e.g. the sums for boys vs. girls), or
> combinations thereof (poor boys, poor girls, rich boys, rich girls).
> Very few of the v1 x v2 x ... combinations might actually occur. Also,
> it is sometimes necessary to combine different data sets.
>
> Should I just use some DB system (and if so, which one is best within
> Python), or are there sparse matrix methods that are to be preferred?

yes, that sounds like a job for a database. Sqlite is built-in (=part
of standard Python library: `import sqlite3`).

Numpy supports structured arrays (records) as well:

>>> import numpy
>>> dt = numpy.dtype([('name', 'a10'), ('wealth', numpy.int32), ('sex', 'a1')])
>>> x = numpy.array([('Mary', 10000, 'F'), ('John', 1000, 'M'), ('unknown', -1, '?')], dtype=dt)
>>> print x[(x['wealth'] > 5000) & (x['sex'] == 'F')] # print records for all rich girls
[('Mary', 10000, 'F')]

so perhaps that could also fit your bill. There are no indexes but
it's more pleasant to work with, imo.

Note that "potentially millions of records" is not particularly big,
so as long as you don't have too many variables, some in-memory db
should be ok and will save you from headaches of dealing with complex
db setups.

I have also heard very good things about pytables, though i've never
used it myself (gensim uses plain float matrices), you can have a look
there: http://pytables.org

HTH,
Radim

> > On Oct 12, 10:01 pm, Rense Lange wrote:
> > > Gensim must be storing data very efficiently, and I need to do something
> > > similar for another application. Can you tell me what Python programming
> > > approach was used in Gensim, and are there perhaps particular sections in
> > > the Gensim code that I should be looking at for inspiration and examples?
>
> > that question is too broad. What kind of data and application do you
> > have? What sort of efficiency are you after? (fast access/little disk
> > space/fast load time/...?)
>
> > Radim

From josef.pktd at gmail.com Fri Oct 14 12:21:43 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 14 Oct 2011 12:21:43 -0400
Subject: [Numpy-discussion] simulate AR
In-Reply-To: <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr>
References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr>
Message-ID:

On Fri, Oct 14, 2011 at 11:56 AM, Fabrice Silva wrote:
> Le vendredi 14 octobre 2011 à 10:49 -0400, josef.pktd at gmail.com a
> écrit :
>> On Fri, Oct 14, 2011 at 10:24 AM, Alan G Isaac wrote:
>> > As a simple example, if I have y0 and a white noise series e,
>> > what is the best way to produce a series y such that y[t] = 0.9*y[t-1] + e[t]
>> > for t=1,2,...?
>> >
>> > 1. How can I best simulate an autoregressive process using NumPy?
>> >
>> > 2. With SciPy, it looks like I could do this as
>> > e[0] = y0
>> > signal.lfilter((1,),(1,-0.9),e)
>> > Am I overlooking similar (or substitute) functionality in NumPy?
>>
>> I don't think so. At least I didn't find anything in numpy for this.
>> An MA process would be a convolution, but for simulating AR I only
>> found signal.lfilter. (unless numpy has gained extra features that I
>> don't have in 1.5)
>>
>> Except, I think it's possible to do it with fft, if you want to
>> fft-inverse-convolve (?)
>> But simulating an ARMA with fft was much slower than lfilter in my
>> short experimentation with it.
>
> About speed comparison between lfilter, convolve, etc...
> http://www.scipy.org/Cookbook/ApplyFIRFilter

One other way to simulate the AR is to get the (truncated)
MA-representation, and then convolve can be used, as in
ApplyFIRFilter.

numpy polynomials can be used to invert the AR-polynomial (with a bit
of juggling.)
Josef > > -- > Fabrice Silva > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From alan.isaac at gmail.com Fri Oct 14 12:49:43 2011 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 14 Oct 2011 12:49:43 -0400 Subject: [Numpy-discussion] simulate AR In-Reply-To: References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> Message-ID: <4E986827.7000701@gmail.com> On 10/14/2011 12:21 PM, josef.pktd at gmail.com wrote: > One other way to simulate the AR is to get the (truncated) > MA-representation, and then convolve can be used Assuming stationarity ... Alan From josef.pktd at gmail.com Fri Oct 14 13:22:17 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 Oct 2011 13:22:17 -0400 Subject: [Numpy-discussion] simulate AR In-Reply-To: <4E986827.7000701@gmail.com> References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> <4E986827.7000701@gmail.com> Message-ID: On Fri, Oct 14, 2011 at 12:49 PM, Alan G Isaac wrote: > On 10/14/2011 12:21 PM, josef.pktd at gmail.com wrote: >> One other way to simulate the AR is to get the (truncated) >> MA-representation, and then convolve can be used > > > Assuming stationarity ... maybe ? If it's integrated, then you need a starting point and cumsum might still work. (like in a random walk) No idea about seasonal integration, it would require too much thinking (not tested) Josef > > Alan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From alan.isaac at gmail.com Fri Oct 14 13:26:34 2011 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 14 Oct 2011 13:26:34 -0400 Subject: [Numpy-discussion] simulate AR In-Reply-To: References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> <4E986827.7000701@gmail.com> Message-ID: <4E9870CA.9080705@gmail.com> >> Assuming stationarity ... On 10/14/2011 1:22 PM, josef.pktd at gmail.com wrote: > maybe ? I just meant that the MA approximation is not reliable for a non-stationary AR. E.g., http://www.jstor.org/stable/2348631 Cheers, Alan From josef.pktd at gmail.com Fri Oct 14 13:42:18 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 Oct 2011 13:42:18 -0400 Subject: [Numpy-discussion] simulate AR In-Reply-To: <4E9870CA.9080705@gmail.com> References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> <4E986827.7000701@gmail.com> <4E9870CA.9080705@gmail.com> Message-ID: On Fri, Oct 14, 2011 at 1:26 PM, Alan G Isaac wrote: >>> Assuming stationarity ... > > On 10/14/2011 1:22 PM, josef.pktd at gmail.com wrote: >> maybe ? > > I just meant that the MA approximation is > not reliable for a non-stationary AR. > E.g., http://www.jstor.org/stable/2348631 section 5: simulating an ARIMA: simulate stationary ARMA, then cumsum it. I guess, this only applies to simple integrated processes, where we can split it up into ar(L)(1-L) y_t with ar(L) a stationary polynomials. (besides seasonal integration, I haven't seen or used any other non-stationary AR processes.) If I remember correctly, signal.lfilter doesn't require stationarity, but handling of the starting values is a bit difficult. 
Josef

>
> Cheers,
> Alan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From alan.isaac at gmail.com Fri Oct 14 14:18:29 2011
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Fri, 14 Oct 2011 14:18:29 -0400
Subject: [Numpy-discussion] simulate AR
In-Reply-To: References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> <4E986827.7000701@gmail.com> <4E9870CA.9080705@gmail.com>
Message-ID: <4E987CF5.9060906@gmail.com>

On 10/14/2011 1:42 PM, josef.pktd at gmail.com wrote:
> If I remember correctly, signal.lfilter doesn't require stationarity,
> but handling of the starting values is a bit difficult.

Hmm. Yes.
AR(1) is trivial, but how do you handle higher orders?

Thanks,
Alan

From josef.pktd at gmail.com Fri Oct 14 14:29:46 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 14 Oct 2011 14:29:46 -0400
Subject: [Numpy-discussion] simulate AR
In-Reply-To: <4E987CF5.9060906@gmail.com>
References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> <4E986827.7000701@gmail.com> <4E9870CA.9080705@gmail.com> <4E987CF5.9060906@gmail.com>
Message-ID:

On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac wrote:
> On 10/14/2011 1:42 PM, josef.pktd at gmail.com wrote:
>> If I remember correctly, signal.lfilter doesn't require stationarity,
>> but handling of the starting values is a bit difficult.
>
> Hmm. Yes.
> AR(1) is trivial, but how do you handle higher orders?

I would have to look for it.
You can invert the stationary part of the AR polynomial with the numpy
polynomial classes using division. The main thing is to pad with
enough zeros corresponding to the truncation that you want. And in the
old classes to watch out because the order is reversed high to low
instead of low to high or the other way around.

I switched to using mostly lfilter, but I guess the statsmodels
sandbox (and the mailing list) still has my "playing with ARMA
polynomials" code.
(I think it might be pretty useful for teaching. I wished I had the
functions to calculate some examples when I learned this.)

Josef

>
> Thanks,
> Alan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From jsseabold at gmail.com Fri Oct 14 14:39:03 2011
From: jsseabold at gmail.com (Skipper Seabold)
Date: Fri, 14 Oct 2011 14:39:03 -0400
Subject: [Numpy-discussion] simulate AR
In-Reply-To: <4E987CF5.9060906@gmail.com>
References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> <4E986827.7000701@gmail.com> <4E9870CA.9080705@gmail.com> <4E987CF5.9060906@gmail.com>
Message-ID:

On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac wrote:
>
> On 10/14/2011 1:42 PM, josef.pktd at gmail.com wrote:
> > If I remember correctly, signal.lfilter doesn't require stationarity,
> > but handling of the starting values is a bit difficult.
>
> Hmm. Yes.
> AR(1) is trivial, but how do you handle higher orders?
>

Not sure if this is what you're after, but here I go the other way
signal -> noise with known initial values of an ARMA(p,q) process.
Here I want to set it such that the first p error terms are zero, I
had to solve for the zi that make this so

https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/arima_model.py#L295

This is me talking to myself about this.

http://thread.gmane.org/gmane.comp.python.scientific.user/27162/focus=27162

Skipper
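A hedged sketch of one way to supply starting values for a higher-order
AR with lfilter, via lfiltic (the ordering of the y argument is assumed
to be y[-1], y[-2], ...; worth verifying against the scipy docs):

import numpy as np
from scipy import signal

ar = [1.0, -0.9, 0.2]   # AR(2): y[t] = 0.9*y[t-1] - 0.2*y[t-2] + e[t]
e = np.random.randn(1000)
# build the filter state equivalent to past outputs y[-1]=0.5, y[-2]=-0.3
zi = signal.lfiltic([1.0], ar, y=[0.5, -0.3])
y, zf = signal.lfilter([1.0], ar, e, zi=zi)   # zf is the state after the run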
From josef.pktd at gmail.com Fri Oct 14 14:46:52 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 14 Oct 2011 14:46:52 -0400
Subject: [Numpy-discussion] simulate AR
In-Reply-To: References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> <4E986827.7000701@gmail.com> <4E9870CA.9080705@gmail.com> <4E987CF5.9060906@gmail.com>
Message-ID:

On Fri, Oct 14, 2011 at 2:39 PM, Skipper Seabold wrote:
> On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac wrote:
>>
>> On 10/14/2011 1:42 PM, josef.pktd at gmail.com wrote:
>> > If I remember correctly, signal.lfilter doesn't require stationarity,
>> > but handling of the starting values is a bit difficult.
>>
>> Hmm. Yes.
>> AR(1) is trivial, but how do you handle higher orders?
>>
>
> Not sure if this is what you're after, but here I go the other way
> signal -> noise with known initial values of an ARMA(p,q) process.
> Here I want to set it such that the first p error terms are zero, I
> had to solve for the zi that make this so
>
> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/arima_model.py#L295
>
> This is me talking to myself about this.
>
> http://thread.gmane.org/gmane.comp.python.scientific.user/27162/focus=27162

with two more simultaneous threads on the statsmodels mailing list. :)

Josef

>
> Skipper
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From josef.pktd at gmail.com Fri Oct 14 14:59:32 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 14 Oct 2011 14:59:32 -0400
Subject: [Numpy-discussion] simulate AR
In-Reply-To: References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> <4E986827.7000701@gmail.com> <4E9870CA.9080705@gmail.com> <4E987CF5.9060906@gmail.com>
Message-ID:

On Fri, Oct 14, 2011 at 2:29 PM, wrote:
> On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac wrote:
>> On 10/14/2011 1:42 PM, josef.pktd at gmail.com wrote:
>>> If I remember correctly, signal.lfilter doesn't require stationarity,
>>> but handling of the starting values is a bit difficult.
>>
>> Hmm. Yes.
>> AR(1) is trivial, but how do you handle higher orders?
>
> I would have to look for it.
> You can invert the stationary part of the AR polynomial with the numpy
> polynomial classes using division. The main thing is to pad with
> enough zeros corresponding to the truncation that you want. And in the
> old classes to watch out because the order is reversed high to low
> instead of low to high or the other way around.

I found it in the examples folder (pure numpy)
https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/examples/tsa/lagpolynomial.py

>>> ar = LagPolynomial([1, -0.8])
>>> ma = LagPolynomial([1])
>>> ma.div(ar)
Polynomial([ 1.
, 0.8], [-1., 1.]) >>> ma.div(ar, maxlag=50) Polynomial([ 1.00000000e+00, 8.00000000e-01, 6.40000000e-01, 5.12000000e-01, 4.09600000e-01, 3.27680000e-01, 2.62144000e-01, 2.09715200e-01, 1.67772160e-01, 1.34217728e-01, 1.07374182e-01, 8.58993459e-02, 6.87194767e-02, 5.49755814e-02, 4.39804651e-02, 3.51843721e-02, 2.81474977e-02, 2.25179981e-02, 1.80143985e-02, 1.44115188e-02, 1.15292150e-02, 9.22337204e-03, 7.37869763e-03, 5.90295810e-03, 4.72236648e-03, 3.77789319e-03, 3.02231455e-03, 2.41785164e-03, 1.93428131e-03, 1.54742505e-03, 1.23794004e-03, 9.90352031e-04, 7.92281625e-04, 6.33825300e-04, 5.07060240e-04, 4.05648192e-04, 3.24518554e-04, 2.59614843e-04, 2.07691874e-04, 1.66153499e-04, 1.32922800e-04, 1.06338240e-04, 8.50705917e-05, 6.80564734e-05, 5.44451787e-05, 4.35561430e-05, 3.48449144e-05, 2.78759315e-05, 2.23007452e-05], [-1., 1.]) >>> ar = LagPolynomial([1, -0.8, 0.2, 0.1, 0.1]) >>> ma.div(ar, maxlag=50) Polynomial([ 1.00000000e+00, 8.00000000e-01, 4.40000000e-01, 9.20000000e-02, -1.94400000e-01, -2.97920000e-01, -2.52656000e-01, -1.32300800e-01, -6.07744000e-03, 7.66558080e-02, 1.01035814e-01, 7.93353139e-02, 3.62032515e-02, -4.67362386e-03, -2.90166622e-02, -3.38324615e-02, -2.44155995e-02, -9.39695872e-03, 3.65046531e-03, 1.06245701e-02, 1.11508188e-02, 7.37039040e-03, 2.23864501e-03, -1.86070097e-03, -3.78841070e-03, -3.61949191e-03, -2.17570579e-03, -4.51755084e-04, 8.14527351e-04, 1.32149267e-03, 1.15703475e-03, 6.25052041e-04, 5.50326804e-05, -3.28837006e-04, -4.52284820e-04, -3.64068927e-04, -1.73417745e-04, 1.21917720e-05, 1.26072341e-04, 1.52168186e-04, 1.12642678e-04, 4.58540937e-05, -1.36693133e-05, -4.65873557e-05, -5.03856990e-05, -3.42095661e-05], [-1., 1.]) no guarantees, I don't remember how much I tested these things, but I spent a lot of time doing it 3 or 4 different ways. Josef > > I switched to using mostly lfilter, but I guess the statsmodels > sandbox (and the mailing list) still has my "playing with ARMA > polynomials" code. > (I think it might be pretty useful for teaching. I wished I had the > functions to calculate some examples when I learned this.) > > Josef >> >> Thanks, >> Alan >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > From josef.pktd at gmail.com Fri Oct 14 16:45:29 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 14 Oct 2011 16:45:29 -0400 Subject: [Numpy-discussion] simulate AR In-Reply-To: References: <4E98463B.8070503@gmail.com> <1318607790.19437.4.camel@lma-98.cnrs-mrs.fr> <4E986827.7000701@gmail.com> <4E9870CA.9080705@gmail.com> <4E987CF5.9060906@gmail.com> Message-ID: On Fri, Oct 14, 2011 at 2:59 PM, wrote: > On Fri, Oct 14, 2011 at 2:29 PM, ? wrote: >> On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac wrote: >>> On 10/14/2011 1:42 PM, josef.pktd at gmail.com wrote: >>>> If I remember correctly, signal.lfilter doesn't require stationarity, >>>> but handling of the starting values is a bit difficult. >>> >>> >>> Hmm. ?Yes. >>> AR(1) is trivial, but how do you handle higher orders? >> >> I would have to look for it. >> You can invert the stationary part of the AR polynomial with the numpy >> polynomial classes using division. The main thing is to pad with >> enough zeros corresponding to the truncation that you want. And in the >> old classes to watch out because the order is reversed high to low >> instead of low to high or the other way around. 
>
> I found it in the examples folder (pure numpy)
> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/examples/tsa/lagpolynomial.py
>
>>>> ar = LagPolynomial([1, -0.8])
>>>> ma = LagPolynomial([1])
>>>> ma.div(ar)
> Polynomial([ 1. ,  0.8], [-1.,  1.])
>>>> ma.div(ar, maxlag=50)
> Polynomial([  1.00000000e+00,   8.00000000e-01,   6.40000000e-01,
>          ...
>          2.23007452e-05], [-1.,  1.])
>
>>>> ar = LagPolynomial([1, -0.8, 0.2, 0.1, 0.1])
>>>> ma.div(ar, maxlag=50)
> Polynomial([  1.00000000e+00,   8.00000000e-01,   4.40000000e-01,
>          ...
>         -3.42095661e-05], [-1.,  1.])

I just realized that my LagPolynomial has a filter method

>>> marep = ma.div(ar, maxlag=50)
>>> u = np.random.randn(5000)
>>> x = marep.filter(u)[1000:]
>>> import scikits.statsmodels.api as sm
>>> sm.tsa.AR(x).fit(4, trend='nc').params
array([ 0.80183437, -0.22098967, -0.08484519, -0.12590277])
>>> #true (different convention)
>>> -np.array([1, -0.8, 0.2, 0.1, 0.1])[1:]
array([ 0.8, -0.2, -0.1, -0.1])

not bad, if the sample is large enough. I don't remember what the numpy
polynomials use under the hood (maybe convolve)

> no guarantees, I don't remember how much I tested these things, but I
> spent a lot of time doing it 3 or 4 different ways.

Josef

> Josef
>
>> I switched to using mostly lfilter, but I guess the statsmodels
>> sandbox (and the mailing list) still has my "playing with ARMA
>> polynomials" code.
>> (I think it might be pretty useful for teaching. I wished I had the
>> functions to calculate some examples when I learned this.)
>>
>> Josef
>>>
>>> Thanks,
>>> Alan
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
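Since the MA side is just a convolution, a numpy-only sketch of the
truncated-MA simulation (marep_coefs is assumed to hold the coefficients
of the expansion above as a plain array, e.g. marep.coef; the truncation
error depends on how fast they decay):

import numpy as np

e = np.random.randn(5000)
# causal FIR filtering by convolution; the [:len(e)] slice keeps the
# output aligned with the input, equivalent to lfilter(marep_coefs, [1], e)
x = np.convolve(e, marep_coefs)[:len(e)]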
>> >> Josef >>> >>> Thanks, >>> Alan >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> > From neilcrighton at gmail.com Sat Oct 15 07:05:14 2011 From: neilcrighton at gmail.com (Neil) Date: Sat, 15 Oct 2011 11:05:14 +0000 (UTC) Subject: [Numpy-discussion] ndarray with double comparison References: Message-ID: Marc Shivers gmail.com> writes: > > you could use bitwise comparison with paretheses:? In [8]: (a>4)&(a<8)Out[8]: array([False, False, False, False, False,? True,? True,? True, False,?????? False, False], dtype=bool) > For cases like this I find it very useful to define a function between() - e.g. https://bitbucket.org/nhmc/pyserpens/src/4e2cc9b656ae/utilities.py#cl-88 Then you can use between(a, 4, 8) instead of (4 < a) & (a < 8), which I find less readable and more difficult to type. Neil From chaoyuejoy at gmail.com Sat Oct 15 07:30:26 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 15 Oct 2011 13:30:26 +0200 Subject: [Numpy-discussion] ndarray with double comparison In-Reply-To: References: Message-ID: Thanks. quite useful!! Chao 2011/10/15 Neil > Marc Shivers gmail.com> writes: > > > > > you could use bitwise comparison with paretheses: In [8]: > (a>4)&(a<8)Out[8]: > array([False, False, False, False, False, True, True, True, False, > False, False], dtype=bool) > > > > For cases like this I find it very useful to define a function between() - > e.g. > https://bitbucket.org/nhmc/pyserpens/src/4e2cc9b656ae/utilities.py#cl-88 > > Then you can use > > between(a, 4, 8) > > instead of > > (4 < a) & (a < 8), > > which I find less readable and more difficult to type. > > Neil > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Oct 15 14:12:41 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 15 Oct 2011 14:12:41 -0400 Subject: [Numpy-discussion] Float128 integer comparison Message-ID: Hi, Continuing the exploration of float128 - can anyone explain this behavior? 
From matthew.brett at gmail.com Sat Oct 15 14:12:41 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 15 Oct 2011 14:12:41 -0400
Subject: [Numpy-discussion] Float128 integer comparison
Message-ID:

Hi,

Continuing the exploration of float128 - can anyone explain this behavior?

>>> np.float64(9223372036854775808.0) == 9223372036854775808L
True
>>> np.float128(9223372036854775808.0) == 9223372036854775808L
False
>>> int(np.float128(9223372036854775808.0)) == 9223372036854775808L
True
>>> np.round(np.float128(9223372036854775808.0)) == np.float128(9223372036854775808.0)
True

Thanks for any pointers,

Best,

Matthew

From matthew.brett at gmail.com Sat Oct 15 14:54:33 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 15 Oct 2011 14:54:33 -0400
Subject: [Numpy-discussion] float128 casting rounding as if it were float64
In-Reply-To: References: Message-ID:

Hi,

On Wed, Oct 12, 2011 at 11:24 AM, Charles R Harris wrote:
>
> On Tue, Oct 11, 2011 at 12:17 PM, Matthew Brett
> wrote:
>>
>> Hi,
>>
>> While struggling with floating point precision, I ran into this:
>>
>> In [52]: a = 2**54+3
>>
>> In [53]: a
>> Out[53]: 18014398509481987L
>>
>> In [54]: np.float128(a)
>> Out[54]: 18014398509481988.0
>>
>> In [55]: np.float128(a)-1
>> Out[55]: 18014398509481987.0
>>
>> The line above tells us that float128 can exactly represent 2**54+3,
>> but the line above that says that np.float128(2**54+3) rounds upwards
>> as if it were a float64:
>>
>> In [59]: np.float64(a)
>> Out[59]: 18014398509481988.0
>>
>> In [60]: np.float64(a)-1
>> Out[60]: 18014398509481988.0
>>
>> Similarly:
>>
>> In [66]: np.float128('1e308')
>> Out[66]: 1.000000000000000011e+308
>>
>> In [67]: np.float128('1e309')
>> Out[67]: inf
>>
>> Is it possible that float64 is being used somewhere in float128 casting?
>>
>
> The problem is probably in specifying the values. Python doesn't support
> long double and I expect python integers to be converted to doubles, then
> cast to long double.

Presumably our (numpy) casting function receives the python integer,
and therefore it's us who are doing the conversion? If so, surely that
is a bug?

> The only way to get around this is probably using
> string representations of the numbers, and I don't know how
> well/consistently numpy does that at the moment. If it calls python to do
> the job, then double is probably what is returned. It doesn't help on my
> system:
>
> In [1]: float128("18014398509481987.0")
> Out[1]: 18014398509481988.0

Note though that:

>> In [66]: np.float128('1e308')
>> Out[66]: 1.000000000000000011e+308
>>
>> In [67]: np.float128('1e309')
>> Out[67]: inf

and I infer that we (numpy) are using float64 for converting the
strings; that seems to me to be the likely explanation of both
phenomena - do you agree?

See you,

Matthew

From matthew.brett at gmail.com Sat Oct 15 15:00:45 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 15 Oct 2011 15:00:45 -0400
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To: References: <4E954D4B.7040905@esrf.fr> <4E955BDF.3010704@esrf.fr>
Message-ID:

Hi,

On Wed, Oct 12, 2011 at 8:31 AM, David Cournapeau wrote:
> On 10/12/11, "V. Armando Solé" wrote:
>> On 12/10/2011 10:46, David Cournapeau wrote:
>>> On Wed, Oct 12, 2011 at 9:18 AM, "V. Armando Solé" wrote:
>>>> From a pure user perspective, I would not expect the abs function to
>>>> return a negative number. Returning +127 plus a warning the first time
>>>> that happens seems to me a good compromise.
>>> I guess the question is what's the common context to use small
>>> integers in the first place. If it is to save memory, then upcasting
>>> may not be the best solution. I may be wrong, but if you decide to use
>>> those types in the first place, you need to know about overflows.
>>> Abs is just one of them (dividing by -1 is another, although this one
>>> actually raises an exception).
>>>
>>> Detecting it may be costly, but this would need benchmarking.
>>>
>>> That being said, without context, I don't find 127 a better solution than
>>> -128.
>>
>> Well that choice is just based on getting the closest positive number to
>> the true value (128). The context can be anything, for instance you
>> could be using a look up table based on the result of an integer
>> operation ...
>>
>> In terms of cost, it would imply to evaluate the cost of something like:
>>
>> a = abs(x);
>> if (a < 0) {a -= MIN_INT;}
>> return a;
>
> Yes, this is costly: it adds a branch to a trivial operation. I did
> some preliminary benchmarks (would need confirmation when I have more
> than one minute to spend on this):
>
> int8, 2**16 long array. Before check: 16 us. After check: 92 us. 5-6
> times slower
> int8, 2**24 long array. Before check: 20ms. After check: 30ms. 30 % slower.
>
> There is also the issue of signaling the error in the ufunc machinery.
> I forgot whether this is possible at that level.

I suppose that returning the equivalent uint type would be of zero
cost though?

I don't think the problem should be relegated to 'people should know
about this' because this is a problem for any signed integer type, and
it can lead to nasty errors which people are unlikely to test for.

See you,

Matthew
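A small sketch of the returning-the-equivalent-uint idea, done by hand
with a temporary widening (the dtype choices are illustrative only):

>>> import numpy as np
>>> a = np.array([-128, -1, 5], dtype=np.int8)
>>> np.abs(a)                  # -128 wraps back to -128
array([-128,    1,    5], dtype=int8)
>>> np.abs(a.astype(np.int16)).astype(np.uint8)   # widen, abs, view as unsigned
array([128,   1,   5], dtype=uint8)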
1.28000000e+002]) Right - but - we'd still need to round, and take care of the nasty issue of thresholding: >>> x = np.array([np.inf, -np.inf, np.nan, -128, 128]) >>> x array([ inf, -inf, nan, -128., 128.]) >>> nnx = np.nan_to_num(x) >>> nnx array([ 1.79769313e+308, -1.79769313e+308, 0.00000000e+000, -1.28000000e+002, 1.28000000e+002]) >>> np.rint(nnx).astype(np.int8) array([ 0, 0, 0, -128, -128], dtype=int8) So, I think nice_round would look something like:

def nice_round(arr, out_type):
    in_type = arr.dtype.type
    mx = floor_exact(np.iinfo(out_type).max, in_type)
    mn = floor_exact(np.iinfo(out_type).min, in_type)
    nans = np.isnan(arr)
    out = np.rint(np.clip(arr, mn, mx)).astype(out_type)
    out[nans] = 0
    return out

with floor_exact being something like: https://github.com/matthew-brett/nibabel/blob/range-dtype-conversions/nibabel/floating.py See you, Matthew
From sourceforge.numpy at user.fastmail.fm Sat Oct 15 15:21:54 2011 From: sourceforge.numpy at user.fastmail.fm (Hugo Gagnon) Date: Sat, 15 Oct 2011 15:21:54 -0400 Subject: [Numpy-discussion] Printing individual array elements with at least 15 significant digits Message-ID: <1318706514.2447.140660986326665@webmail.messagingengine.com> Hello, I need to print individual elements of a float64 array to a text file. However in the file I only get 12 significant digits, the same as with: >>> a = np.zeros(3) >>> a.fill(1./3) >>> print a[0] 0.333333333333 >>> len(str(a[0])) - 2 12 whereas >>> len(repr(a[0])) - 2 17 which makes more sense since I am expecting at least 15 significant digits? So how can I print a np.float64 with at least 15 significant digits (without repr!)?
From derek at astro.physik.uni-goettingen.de Sat Oct 15 15:34:10 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Sat, 15 Oct 2011 21:34:10 +0200 Subject: [Numpy-discussion] Printing individual array elements with at least 15 significant digits In-Reply-To: <1318706514.2447.140660986326665@webmail.messagingengine.com> References: <1318706514.2447.140660986326665@webmail.messagingengine.com> Message-ID: <088CBB2F-63C5-49CE-9380-101E91FAFC6B@astro.physik.uni-goettingen.de> On 15.10.2011, at 9:21PM, Hugo Gagnon wrote: > I need to print individual elements of a float64 array to a text file. > However in the file I only get 12 significant digits, the same as with: > >>>> a = np.zeros(3) >>>> a.fill(1./3) >>>> print a[0] > 0.333333333333 >>>> len(str(a[0])) - 2 > 12 > > whereas > >>>> len(repr(a[0])) - 2 > 17 > > which makes more sense since I am expecting at least 15 significant > digits? > > So how can I print a np.float64 with at least 15 significant digits > (without repr!)? You mean like >>> '%.15e' % (1./3) '3.333333333333333e-01' ? If you are using e.g. savetxt to print to the file, you can specify the format the same way (actually the default for savetxt is already "%.18e", which should satisfy your demands). HTH, Derek
From aronne.merrelli at gmail.com Sat Oct 15 15:42:11 2011 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Sat, 15 Oct 2011 14:42:11 -0500 Subject: [Numpy-discussion] Float128 integer comparison In-Reply-To: References: Message-ID: On Sat, Oct 15, 2011 at 1:12 PM, Matthew Brett wrote: > Hi, > > Continuing the exploration of float128 - can anyone explain this behavior?
> > >>> np.float64(9223372036854775808.0) == 9223372036854775808L > True > >>> np.float128(9223372036854775808.0) == 9223372036854775808L > False > >>> int(np.float128(9223372036854775808.0)) == 9223372036854775808L > True > >>> np.round(np.float128(9223372036854775808.0)) == > np.float128(9223372036854775808.0) > True > > I know little about numpy internals, but while fiddling with this, I noticed a possible clue: >>> np.float128(9223372036854775808.0) == 9223372036854775808L False >>> np.float128(4611686018427387904.0) == 4611686018427387904L True >>> np.float128(9223372036854775808.0) - 9223372036854775808L Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unsupported operand type(s) for -: 'numpy.float128' and 'long' >>> np.float128(4611686018427387904.0) - 4611686018427387904L 0.0 My speculation - 9223372036854775808L is the first integer that is too big to fit into a signed 64 bit integer. Python is OK with this but that means it must be containing that value in some more complicated object. Since you don't get the type error between float64() and long: >>> np.float64(9223372036854775808.0) - 9223372036854775808L 0.0 Maybe there are some unimplemented pieces in numpy for dealing with operations between float128 and python "arbitrary longs"? I could see the == test just producing false in that case, because it defaults back to some object equality test which isn't actually looking at the numbers. Aronne
From derek at astro.physik.uni-goettingen.de Sat Oct 15 16:34:18 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Sat, 15 Oct 2011 22:34:18 +0200 Subject: [Numpy-discussion] Float128 integer comparison In-Reply-To: References: Message-ID: <35AAFB94-08D2-47C1-A066-7C8B2CFD2961@astro.physik.uni-goettingen.de> On 15.10.2011, at 9:42PM, Aronne Merrelli wrote: > > On Sat, Oct 15, 2011 at 1:12 PM, Matthew Brett wrote: > Hi, > > Continuing the exploration of float128 - can anyone explain this behavior? > > >>> np.float64(9223372036854775808.0) == 9223372036854775808L > True > >>> np.float128(9223372036854775808.0) == 9223372036854775808L > False > >>> int(np.float128(9223372036854775808.0)) == 9223372036854775808L > True > >>> np.round(np.float128(9223372036854775808.0)) == np.float128(9223372036854775808.0) > True > > > I know little about numpy internals, but while fiddling with this, I noticed a possible clue: > > >>> np.float128(9223372036854775808.0) == 9223372036854775808L > False > >>> np.float128(4611686018427387904.0) == 4611686018427387904L > True > >>> np.float128(9223372036854775808.0) - 9223372036854775808L > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: unsupported operand type(s) for -: 'numpy.float128' and 'long' > >>> np.float128(4611686018427387904.0) - 4611686018427387904L > 0.0 > > > My speculation - 9223372036854775808L is the first integer that is too big to fit into a signed 64 bit integer. Python is OK with this but that means it must be containing that value in some more complicated object. Since you don't get the type error between float64() and long: > > >>> np.float64(9223372036854775808.0) - 9223372036854775808L > 0.0 > > Maybe there are some unimplemented pieces in numpy for dealing with operations between float128 and python "arbitrary longs"? I could see the == test just producing false in that case, because it defaults back to some object equality test which isn't actually looking at the numbers.
That seems to make sense, since even upcasting from a np.float64 still lets the test fail: >>> np.float128(np.float64(9223372036854775808.0)) == 9223372036854775808L False while >>> np.float128(9223372036854775808.0) == np.uint64(9223372036854775808L) True and >>> np.float128(9223372036854775809) == np.uint64(9223372036854775809L) False >>> np.float128(np.uint64(9223372036854775809L)) == np.uint64(9223372036854775809L) True Showing again that the normal casting to, or reading in of, a np.float128 internally inevitably calls the python float(), as already suggested in one of the parallel threads (I think this also came up with some of the tests for precision) - leading to different results than when you can convert from a np.int64 - this makes the outcome look even weirder: >>> np.float128(9223372036854775807.0) - np.float128(np.int64(9223372036854775807)) 1.0 >>> np.float128(9223372036854775296.0) - np.float128(np.int64(9223372036854775807)) 1.0 >>> np.float128(9223372036854775295.0) - np.float128(np.int64(9223372036854775807)) -1023.0 >>> np.float128(np.int64(9223372036854775296)) - np.float128(np.int64(9223372036854775807)) -511.0 simply due to the nearest np.float64 always being equal to MAX_INT64 in the two first cases above (or anything in between)... Cheers, Derek
From matthew.brett at gmail.com Sat Oct 15 19:29:39 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 15 Oct 2011 16:29:39 -0700 Subject: [Numpy-discussion] float128 in fact float80 Message-ID: Hi, After getting rather confused, I concluded that float128 on a couple of Intel systems I have, is in fact an 80 bit extended precision number: http://en.wikipedia.org/wiki/Extended_precision >>> np.finfo(np.float128).nmant 63 >>> np.finfo(np.float128).nexp 15 That is rather confusing. What is the rationale for calling this float128? It is not IEEE 754 float128, and yet it seems to claim so. Best, Matthew
From alanf333 at yahoo.com Sat Oct 15 20:59:42 2011 From: alanf333 at yahoo.com (Alan Frankel) Date: Sat, 15 Oct 2011 17:59:42 -0700 (PDT) Subject: [Numpy-discussion] NumPy example list is corrupted In-Reply-To: References: Message-ID: <1318726782.72878.YahooMailNeo@web161919.mail.bf1.yahoo.com> I've been editing the "Tentative NumPy Tutorial" and occasionally referring to the "NumPy Example List" ( http://www.scipy.org/Numpy_Example_List ). In the process, I think I mistakenly corrupted the NumPy Example List. Since the website does not offer any wiki-type functionality for reverting changes or referring to a history of changes, there doesn't seem to be a way for me to fix the problem. Nor is there a "Contact Us" link on the page. I tried the scipy channel on irc/freenode, but no one was there. I eventually filed a bug (#1963), then later decided that maybe this mailing list would be another and perhaps better way to report the problem. In addition to fixing the immediate problem, it would be nice if someone could give registered users the ability to revert changes to the page as well. That should prevent problems like this in the future. Thanks, Alan -------------- next part -------------- An embedded message was scrubbed...
From: Alan Frankel Subject: NumPy example list is corrupted Date: Sat, 15 Oct 2011 17:40:26 -0700 (PDT) Size: 3578 URL: From charlesr.harris at gmail.com Sat Oct 15 22:34:45 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 15 Oct 2011 20:34:45 -0600 Subject: [Numpy-discussion] float128 casting rounding as if it were float64 In-Reply-To: References: Message-ID: On Sat, Oct 15, 2011 at 12:54 PM, Matthew Brett wrote: > Hi, > > On Wed, Oct 12, 2011 at 11:24 AM, Charles R Harris > wrote: > > > > > > On Tue, Oct 11, 2011 at 12:17 PM, Matthew Brett > > > wrote: > >> > >> Hi, > >> > >> While struggling with floating point precision, I ran into this: > >> > >> In [52]: a = 2**54+3 > >> > >> In [53]: a > >> Out[53]: 18014398509481987L > >> > >> In [54]: np.float128(a) > >> Out[54]: 18014398509481988.0 > >> > >> In [55]: np.float128(a)-1 > >> Out[55]: 18014398509481987.0 > >> > >> The line above tells us that float128 can exactly represent 2**54+3, > >> but the line above that says that np.float128(2**54+3) rounds upwards > >> as if it were a float64: > >> > >> In [59]: np.float64(a) > >> Out[59]: 18014398509481988.0 > >> > >> In [60]: np.float64(a)-1 > >> Out[60]: 18014398509481988.0 > >> > >> Similarly: > >> > >> In [66]: np.float128('1e308') > >> Out[66]: 1.000000000000000011e+308 > >> > >> In [67]: np.float128('1e309') > >> Out[67]: inf > >> > >> Is it possible that float64 is being used somewhere in float128 casting? > >> > > > > The problem is probably in specifying the values. Python doesn't support > > long double and I expect python integers to be converted to doubles, then > > cast to long double. > > Presumably our (numpy) casting function receives the python integer, > and therefore its > us who are doing the conversion? If so, surely that is a bug? > > I believe the numpy casting function is supplied by Python. Python integers aren't C types. > > The only way to get around this is probably using > > string representations of the numbers, and I don't know how > > well/consistently numpy does that at the moment. If it calls python to do > > the job, then double is probably what is returned. It doesn't help on my > > system: > > > > In [1]: float128("18014398509481987.0") > > Out[1]: 18014398509481988.0 > > Note though that: > > >> In [66]: np.float128('1e308') > >> Out[66]: 1.000000000000000011e+308 > >> > >> In [67]: np.float128('1e309') > >> Out[67]: inf > > and I infer that we (numpy) are using float64 for converting the > strings; that seems to me to be the likely explanation of both > phenomena - do you agree? > > I expect we are using Python to convert the strings, unicode and all that. However, I would need to check. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Sun Oct 16 02:04:57 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Sat, 15 Oct 2011 23:04:57 -0700 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: Message-ID: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> On 32 bit systems it consumes 96 bits (3 x 32). and hence float96 On 64 bit machines it consumes 128 bits (2x64). The variable size is set for an efficient addressing, while the calculation in hardware is carried in the 80 bits FPU (x87) registers. 
Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Matthew Brett [matthew.brett at gmail.com] Sent: 16 October 2011 01:29 To: Discussion of Numerical Python Subject: [Numpy-discussion] float128 in fact float80 Hi, After getting rather confused, I concluded that float128 on a couple of Intel systems I have, is in fact an 80 bit extended precision number: http://en.wikipedia.org/wiki/Extended_precision >>> np.finfo(np.float128).nmant 63 >>> np.finfo(np.float128).nexp 15 That is rather confusing. What is the rationale for calling this float128? It is not IEEE 754 float128, and yet it seems to claim so. Best, Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sun Oct 16 03:04:37 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 16 Oct 2011 00:04:37 -0700 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: Hi, On Sat, Oct 15, 2011 at 11:04 PM, Nadav Horesh wrote: > On 32 bit systems it consumes 96 bits (3 x 32). and hence float96 > On 64 bit machines it consumes 128 bits (2x64). > The variable size is set for an efficient addressing, while the calculation in hardware is carried in the 80 bits FPU (x87) registers. Right - but the problem here is that it is very confusing. There is something called binary128 in the IEEE standard, and what numpy has is not that. float16, float32 and float64 are all IEEE standards called binary16, binary32 and binary64. Thus it was natural for me to assume wrongly that float128 was the IEEE standard. I'd therefore assume that it could store all the integers up to 2**113 exactly, and so on. On the other hand, if I found out that the float80 dtype in fact took 128 bits of storage, I'd rightly conclude that the data were being padded out with zeros and not be very surprised. I'd also I think find it easier to understand what was going on if there were float80 types on 32-bit and 64-bit, but they had different itemsizes. If there was one float80 type (with different itemsizes on 32, 64 bit) then I would not have to write guard try .. excepts around my use of the types to keep compatible across platforms. So float80 on both platforms seems like the less confusing option to me. Best, Matthew From cournape at gmail.com Sun Oct 16 03:28:08 2011 From: cournape at gmail.com (David Cournapeau) Date: Sun, 16 Oct 2011 08:28:08 +0100 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: On Sun, Oct 16, 2011 at 8:04 AM, Matthew Brett wrote: > Hi, > > On Sat, Oct 15, 2011 at 11:04 PM, Nadav Horesh wrote: >> On 32 bit systems it consumes 96 bits (3 x 32). and hence float96 >> On 64 bit machines it consumes 128 bits (2x64). >> The variable size is set for an efficient addressing, while the calculation in hardware is carried in the 80 bits FPU (x87) registers. > > Right - but the problem here is that it is very confusing. ?There is > something called binary128 in the IEEE standard, and what numpy has is > not that. ?float16, float32 and float64 are all IEEE standards called > binary16, binary32 and binary64. 
This one is easy: few CPU support the 128 bits float specified in IEEE standard (the only ones I know are the expensive IBM ones). Then there are the cases where it is implemented in software (SPARC use the double-pair IIRC). So you would need binar80, binary96, binary128, binary128_double_pair, etc... That would be a nightmare to support, and also not portable: what does binary80 become on ppc ? What does binary96 become on 32 bits Intel ? Or on windows (where long double is the same as double for visual studio) ? binary128 should only be thought as a (bad) synonym to np.longdouble. David From matthew.brett at gmail.com Sun Oct 16 03:33:22 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 16 Oct 2011 00:33:22 -0700 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: Hi, On Sun, Oct 16, 2011 at 12:28 AM, David Cournapeau wrote: > On Sun, Oct 16, 2011 at 8:04 AM, Matthew Brett wrote: >> Hi, >> >> On Sat, Oct 15, 2011 at 11:04 PM, Nadav Horesh wrote: >>> On 32 bit systems it consumes 96 bits (3 x 32). and hence float96 >>> On 64 bit machines it consumes 128 bits (2x64). >>> The variable size is set for an efficient addressing, while the calculation in hardware is carried in the 80 bits FPU (x87) registers. >> >> Right - but the problem here is that it is very confusing. ?There is >> something called binary128 in the IEEE standard, and what numpy has is >> not that. ?float16, float32 and float64 are all IEEE standards called >> binary16, binary32 and binary64. > > This one is easy: few CPU support the 128 bits float specified in IEEE > standard (the only ones I know are the expensive IBM ones). Then there > are the cases where it is implemented in software (SPARC use the > double-pair IIRC). > > So you would need binar80, binary96, binary128, binary128_double_pair, > etc... That would be a nightmare to support, and also not portable: > what does binary80 become on ppc ? What does binary96 become on 32 > bits Intel ? Or on windows (where long double is the same as double > for visual studio) ? > > binary128 should only be thought as a (bad) synonym to np.longdouble. What would be the nightmare to support - the different names for the different precisions? How many do we support in fact? Apart from float80? Is there some reason the support burden is less by naming lots of different precisions the same? See you, Matthew From cournape at gmail.com Sun Oct 16 04:18:21 2011 From: cournape at gmail.com (David Cournapeau) Date: Sun, 16 Oct 2011 09:18:21 +0100 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: On Sun, Oct 16, 2011 at 8:33 AM, Matthew Brett wrote: > Hi, > > On Sun, Oct 16, 2011 at 12:28 AM, David Cournapeau wrote: >> On Sun, Oct 16, 2011 at 8:04 AM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Oct 15, 2011 at 11:04 PM, Nadav Horesh wrote: >>>> On 32 bit systems it consumes 96 bits (3 x 32). and hence float96 >>>> On 64 bit machines it consumes 128 bits (2x64). >>>> The variable size is set for an efficient addressing, while the calculation in hardware is carried in the 80 bits FPU (x87) registers. >>> >>> Right - but the problem here is that it is very confusing. ?There is >>> something called binary128 in the IEEE standard, and what numpy has is >>> not that. ?float16, float32 and float64 are all IEEE standards called >>> binary16, binary32 and binary64. 
>> >> This one is easy: few CPU support the 128 bits float specified in IEEE >> standard (the only ones I know are the expensive IBM ones). Then there >> are the cases where it is implemented in software (SPARC use the >> double-pair IIRC). >> >> So you would need binar80, binary96, binary128, binary128_double_pair, >> etc... That would be a nightmare to support, and also not portable: >> what does binary80 become on ppc ? What does binary96 become on 32 >> bits Intel ? Or on windows (where long double is the same as double >> for visual studio) ? >> >> binary128 should only be thought as a (bad) synonym to np.longdouble. > > What would be the nightmare to support - the different names for the > different precisions? Well, if you have an array of np.float80, what does it do on ppc, or windows, or solaris ? You will have a myriad of incompatible formats, and the only thing you gained by naming them differently is that you lose the ability of using the code on different platforms. The alternative is to implement in software a quadruple precision number. Using extended precision is fundamentally non-portable on today's CPU. David From robert.kern at gmail.com Sun Oct 16 04:59:30 2011 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 16 Oct 2011 09:59:30 +0100 Subject: [Numpy-discussion] NumPy example list is corrupted In-Reply-To: <1318726782.72878.YahooMailNeo@web161919.mail.bf1.yahoo.com> References: <1318726782.72878.YahooMailNeo@web161919.mail.bf1.yahoo.com> Message-ID: On Sun, Oct 16, 2011 at 01:59, Alan Frankel wrote: > I've been editing the "Tentative NumPy? Tutorial" and occasionally referring to the "NumPy? Example List" ( http://www.scipy.org/Numpy_Example_List ). In the process, I think I mistakenly corrupted the NumPy Example List. Since the website does not offer any wiki-type > > functionality for reverting changes or referring to a history of changes, there doesn't seem to be a way for me to fix the problem. http://www.scipy.org/Numpy_Example_List?action=info -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From alanf333 at yahoo.com Sun Oct 16 08:25:04 2011 From: alanf333 at yahoo.com (Alan Frankel) Date: Sun, 16 Oct 2011 05:25:04 -0700 (PDT) Subject: [Numpy-discussion] NumPy example list is corrupted In-Reply-To: References: <1318726782.72878.YahooMailNeo@web161919.mail.bf1.yahoo.com> Message-ID: <1318767904.77243.YahooMailNeo@web161907.mail.bf1.yahoo.com> Thanks a lot, Robert. That should enable me to do what I want to do. Unfortunately, I now get internal server errors whenever I try to commit a change. The message instructs me to send an e-mail to root at enthought.com, which I'll do separately. Couldn't someone add to the page an icon, or even a static link, that links to the "action=info" URL? Who's the right person to ask? Thanks again, Alan ----- Original Message ----- From: Robert Kern To: Discussion of Numerical Python Cc: Sent: Sunday, October 16, 2011 4:59 AM Subject: Re: [Numpy-discussion] NumPy example list is corrupted On Sun, Oct 16, 2011 at 01:59, Alan Frankel wrote: > I've been editing the "Tentative NumPy? Tutorial" and occasionally referring to the "NumPy? Example List" ( http://www.scipy.org/Numpy_Example_List ). In the process, I think I mistakenly corrupted the NumPy Example List. 
Since the website does not offer any wiki-type > > functionality for reverting changes or referring to a history of changes, there doesn't seem to be a way for me to fix the problem. http://www.scipy.org/Numpy_Example_List?action=info -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From alanf333 at yahoo.com Sun Oct 16 08:30:30 2011 From: alanf333 at yahoo.com (Alan Frankel) Date: Sun, 16 Oct 2011 05:30:30 -0700 (PDT) Subject: [Numpy-discussion] NumPy example list is corrupted In-Reply-To: <1318767904.77243.YahooMailNeo@web161907.mail.bf1.yahoo.com> References: <1318726782.72878.YahooMailNeo@web161919.mail.bf1.yahoo.com> <1318767904.77243.YahooMailNeo@web161907.mail.bf1.yahoo.com> Message-ID: <1318768230.94683.YahooMailNeo@web161902.mail.bf1.yahoo.com> Looks like my change went through after all, despite the internal server error message. Alan ----- Original Message ----- From: Alan Frankel To: Discussion of Numerical Python Cc: Sent: Sunday, October 16, 2011 8:25 AM Subject: Re: [Numpy-discussion] NumPy example list is corrupted Thanks a lot, Robert. That should enable me to do what I want to do. Unfortunately, I now get internal server errors whenever I try to commit a change. The message instructs me to send an e-mail to root at enthought.com, which I'll do separately. Couldn't someone add to the page an icon, or even a static link, that links to the "action=info" URL? Who's the right person to ask? Thanks again, Alan ----- Original Message ----- From: Robert Kern To: Discussion of Numerical Python Cc: Sent: Sunday, October 16, 2011 4:59 AM Subject: Re: [Numpy-discussion] NumPy example list is corrupted On Sun, Oct 16, 2011 at 01:59, Alan Frankel wrote: > I've been editing the "Tentative NumPy? Tutorial" and occasionally referring to the "NumPy? Example List" ( http://www.scipy.org/Numpy_Example_List ). In the process, I think I mistakenly corrupted the NumPy Example List. Since the website does not offer any wiki-type > > functionality for reverting changes or referring to a history of changes, there doesn't seem to be a way for me to fix the problem. http://www.scipy.org/Numpy_Example_List?action=info -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
From pav at iki.fi Sun Oct 16 09:52:13 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 16 Oct 2011 15:52:13 +0200 Subject: [Numpy-discussion] NumPy example list is corrupted In-Reply-To: <1318767904.77243.YahooMailNeo@web161907.mail.bf1.yahoo.com> References: <1318726782.72878.YahooMailNeo@web161919.mail.bf1.yahoo.com> <1318767904.77243.YahooMailNeo@web161907.mail.bf1.yahoo.com> Message-ID: (16.10.2011 14:25), Alan Frankel wrote: [clip] > Couldn't someone add to the page an icon, > or even a static link, that links to the "action=info" URL? > Who's the right person to ask? It's already there --- it's as part of standard Moinmoin wiki configuration --- the one with the letter 'i' in it on the upper right corner of the page. -- Pauli Virtanen
From tsyu80 at gmail.com Sun Oct 16 12:39:13 2011 From: tsyu80 at gmail.com (Tony Yu) Date: Sun, 16 Oct 2011 12:39:13 -0400 Subject: [Numpy-discussion] Type checking inconsistency Message-ID: Hi, I noticed a type-checking inconsistency between assignments using slicing and fancy-indexing. The first will happily cast on assignment (regardless of type), while the second will throw a type error if there's reason to believe the casting will be unsafe. I'm not sure which would be the "correct" behavior, but the inconsistency is surprising. Best, -Tony Example: >>> import numpy as np >>> a = np.arange(10) >>> b = np.ones(10, dtype=np.uint8) # this runs without error >>> b[:5] = a[:5] >>> mask = a < 5 >>> b[mask] = b[mask] TypeError: array cannot be safely cast to required type
From tsyu80 at gmail.com Sun Oct 16 12:48:58 2011 From: tsyu80 at gmail.com (Tony Yu) Date: Sun, 16 Oct 2011 12:48:58 -0400 Subject: [Numpy-discussion] Type checking inconsistency In-Reply-To: References: Message-ID: On Sun, Oct 16, 2011 at 12:39 PM, Tony Yu wrote: > Hi, > > I noticed a type-checking inconsistency between assignments using slicing > and fancy-indexing. The first will happily cast on assignment (regardless of > type), while the second will throw a type error if there's reason to believe > the casting will be unsafe. I'm not sure which would be the "correct" > behavior, but the inconsistency is surprising. > > Best, > -Tony > > Example: > > >>> import numpy as np > >>> a = np.arange(10) > >>> b = np.ones(10, dtype=np.uint8) > > # this runs without error > >>> b[:5] = a[:5] > > >>> mask = a < 5 > >>> b[mask] = b[mask] > TypeError: array cannot be safely cast to required type > > And I just noticed that 1D arrays behave differently than 2D arrays. If you replace the above definitions of a, b with: >>> a = np.arange(10)[:, np.newaxis] >>> b = np.ones((10, 1), dtype=np.uint8) The rest of the code will run without error.
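A minimal sketch, not from the thread, of a way to sidestep the inconsistency: make the cast explicit, so the assignment behaves the same under either indexing style and on any version. np.can_cast is standard NumPy; the rest is illustrative.

>>> import numpy as np
>>> a = np.arange(10)
>>> b = np.ones(10, dtype=np.uint8)
>>> np.can_cast(a.dtype, b.dtype)   # int64 -> uint8 is not a safe cast
False
>>> mask = a < 5
>>> b[mask] = a[mask].astype(b.dtype)   # explicit cast, version-independent
>>> b
array([0, 1, 2, 3, 4, 1, 1, 1, 1, 1], dtype=uint8)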
From pav at iki.fi Sun Oct 16 12:49:16 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 16 Oct 2011 18:49:16 +0200 Subject: [Numpy-discussion] Type checking inconsistency In-Reply-To: References: Message-ID: (16.10.2011 18:39), Tony Yu wrote: > >>> import numpy as np > >>> a = np.arange(10) > >>> b = np.ones(10, dtype=np.uint8) > > # this runs without error > >>> b[:5] = a[:5] > > >>> mask = a < 5 > >>> b[mask] = b[mask] > TypeError: array cannot be safely cast to required type Seems to be fixed in Git master >>> import numpy as np >>> a = np.arange(10) >>> b = np.ones(10, dtype=np.uint8) >>> mask = a < 5 >>> b[mask] = b[mask] >>> b[mask] = a[mask] >>> np.__version__ '2.0.0.dev-1dc1877' -- Pauli Virtanen
From tsyu80 at gmail.com Sun Oct 16 13:02:53 2011 From: tsyu80 at gmail.com (Tony Yu) Date: Sun, 16 Oct 2011 13:02:53 -0400 Subject: [Numpy-discussion] Type checking inconsistency In-Reply-To: References: Message-ID: On Sun, Oct 16, 2011 at 12:49 PM, Pauli Virtanen wrote: > (16.10.2011 18:39), Tony Yu wrote: > > >>> import numpy as np > > >>> a = np.arange(10) > > >>> b = np.ones(10, dtype=np.uint8) > > > > # this runs without error > > >>> b[:5] = a[:5] > > > > >>> mask = a < 5 > > >>> b[mask] = b[mask] > > TypeError: array cannot be safely cast to required type > > Seems to be fixed in Git master > > >>> import numpy as np > >>> a = np.arange(10) > >>> b = np.ones(10, dtype=np.uint8) > >>> mask = a < 5 > >>> b[mask] = b[mask] > >>> b[mask] = a[mask] > >>> np.__version__ > '2.0.0.dev-1dc1877' > (I see you noticed the typo in my original example: b --> a). Agreed, I'm getting this error with an old master. I just tried master and it worked fine, but the maintenance branch ('1.6.2.dev-396dbb9') does still have this issue.
From alanf333 at yahoo.com Sun Oct 16 14:29:21 2011 From: alanf333 at yahoo.com (Alan Frankel) Date: Sun, 16 Oct 2011 11:29:21 -0700 (PDT) Subject: [Numpy-discussion] NumPy example list is corrupted Message-ID: <1318789761.86670.YahooMailNeo@web161910.mail.bf1.yahoo.com> I see it now. Thanks! Alan >>Couldn't someone add to the page an icon, >> or even a static link, that links to the "action=info" URL? >> Who's the right person to ask? >It's already there --- it's as part of standard Moinmoin wiki >configuration --- the one with the letter 'i' in it on the upper right >corner of the page. >-- >Pauli Virtanen
From matthew.brett at gmail.com Sun Oct 16 14:40:57 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 16 Oct 2011 11:40:57 -0700 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: Hi, On Sun, Oct 16, 2011 at 1:18 AM, David Cournapeau wrote: > On Sun, Oct 16, 2011 at 8:33 AM, Matthew Brett wrote: >> Hi, >> >> On Sun, Oct 16, 2011 at 12:28 AM, David Cournapeau wrote: >>> On Sun, Oct 16, 2011 at 8:04 AM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Oct 15, 2011 at 11:04 PM, Nadav Horesh wrote: >>>>> On 32 bit systems it consumes 96 bits (3 x 32). and hence float96 >>>>> On 64 bit machines it consumes 128 bits (2x64). >>>>> The variable size is set for an efficient addressing, while the calculation in hardware is carried in the 80 bits FPU (x87) registers. >>>> >>>> Right - but the problem here is that it is very confusing. There is >>>> something called binary128 in the IEEE standard, and what numpy has is >>>> not that.
?float16, float32 and float64 are all IEEE standards called >>>> binary16, binary32 and binary64. >>> >>> This one is easy: few CPU support the 128 bits float specified in IEEE >>> standard (the only ones I know are the expensive IBM ones). Then there >>> are the cases where it is implemented in software (SPARC use the >>> double-pair IIRC). >>> >>> So you would need binar80, binary96, binary128, binary128_double_pair, >>> etc... That would be a nightmare to support, and also not portable: >>> what does binary80 become on ppc ? What does binary96 become on 32 >>> bits Intel ? Or on windows (where long double is the same as double >>> for visual studio) ? >>> >>> binary128 should only be thought as a (bad) synonym to np.longdouble. >> >> What would be the nightmare to support - the different names for the >> different precisions? > > Well, if you have an array of np.float80, what does it do on ppc, or > windows, or solaris ? You will have a myriad of incompatible formats, > and the only thing you gained by naming them differently is that you > lose the ability of using the code on different platforms. The > alternative is to implement in software a quadruple precision number. The thing you gain by naming them correctly is the person using the format knows what it is. If we use float64 we know what that is. If we are using float128, we've got no idea what it is. I had actually guessed that numpy had some software emulation for IEEE float128. I don't know how I would have known otherwise. Obviously what I'm proposing is that the names follow the precisions of the numbers, not the itemsize. If what we actually have is something that is sometimes called float128, sometimes float96, that is always what C thinks of as long double, then surely the best option would be: float80 floatLD for intel 32 and 64 bit, and then floatPPC floatLD for whatever PPC has, and so on. See you, Matthew From matthew.brett at gmail.com Sun Oct 16 14:41:48 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 16 Oct 2011 11:41:48 -0700 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: Hi, On Sun, Oct 16, 2011 at 11:40 AM, Matthew Brett wrote: > Hi, > > On Sun, Oct 16, 2011 at 1:18 AM, David Cournapeau wrote: >> On Sun, Oct 16, 2011 at 8:33 AM, Matthew Brett wrote: >>> Hi, >>> >>> On Sun, Oct 16, 2011 at 12:28 AM, David Cournapeau wrote: >>>> On Sun, Oct 16, 2011 at 8:04 AM, Matthew Brett wrote: >>>>> Hi, >>>>> >>>>> On Sat, Oct 15, 2011 at 11:04 PM, Nadav Horesh wrote: >>>>>> On 32 bit systems it consumes 96 bits (3 x 32). and hence float96 >>>>>> On 64 bit machines it consumes 128 bits (2x64). >>>>>> The variable size is set for an efficient addressing, while the calculation in hardware is carried in the 80 bits FPU (x87) registers. >>>>> >>>>> Right - but the problem here is that it is very confusing. ?There is >>>>> something called binary128 in the IEEE standard, and what numpy has is >>>>> not that. ?float16, float32 and float64 are all IEEE standards called >>>>> binary16, binary32 and binary64. >>>> >>>> This one is easy: few CPU support the 128 bits float specified in IEEE >>>> standard (the only ones I know are the expensive IBM ones). Then there >>>> are the cases where it is implemented in software (SPARC use the >>>> double-pair IIRC). >>>> >>>> So you would need binar80, binary96, binary128, binary128_double_pair, >>>> etc... 
That would be a nightmare to support, and also not portable: >>>> what does binary80 become on ppc ? What does binary96 become on 32 >>>> bits Intel ? Or on windows (where long double is the same as double >>>> for visual studio) ? >>>> >>>> binary128 should only be thought as a (bad) synonym to np.longdouble. >>> >>> What would be the nightmare to support - the different names for the >>> different precisions? >> >> Well, if you have an array of np.float80, what does it do on ppc, or >> windows, or solaris ? You will have a myriad of incompatible formats, >> and the only thing you gained by naming them differently is that you >> lose the ability of using the code on different platforms. The >> alternative is to implement in software a quadruple precision number. > > The thing you gain by naming them correctly is the person using the > format knows what it is. > > If we use float64 we know what that is. ?If we are using float128, > we've got no idea what it is. > > I had actually guessed that numpy had some software emulation for IEEE > float128. ? I don't know how I would have known otherwise. > > Obviously what I'm proposing is that the names follow the precisions > of the numbers, not the itemsize. > > If what we actually have is something that is sometimes called > float128, sometimes float96, that is always what C thinks of as long > double, then surely the best option ?would be: > > float80 > floatLD > > for intel 32 and 64 bit, and then > > floatPPC > floatLD Sorry - I missed out: Where floatLD is float80, floatPPC is floatLD Matthew From cournape at gmail.com Sun Oct 16 17:11:44 2011 From: cournape at gmail.com (David Cournapeau) Date: Sun, 16 Oct 2011 22:11:44 +0100 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: On Sun, Oct 16, 2011 at 7:40 PM, Matthew Brett wrote: > Hi, > If we use float64 we know what that is. ?If we are using float128, > we've got no idea what it is. I think there is no arguing here: the ideal solution would be to follow what happens with 32 and 64 bits reprensentations. But this is impossible on today's architectures because the 2008 version of the IEEE 754 standard is not supported yet. > > I had actually guessed that numpy had some software emulation for IEEE > float128. ? I don't know how I would have known otherwise. > > Obviously what I'm proposing is that the names follow the precisions > of the numbers, not the itemsize. > > If what we actually have is something that is sometimes called > float128, sometimes float96, that is always what C thinks of as long > double, then surely the best option ?would be: > > float80 > floatLD > > for intel 32 and 64 bit, and then > > floatPPC > floatLD > > for whatever PPC has, and so on. If all you want is a common name, there is already one: np.longdouble. This is an alias for the more platform-specific name. cheers, David From matthew.brett at gmail.com Sun Oct 16 18:04:51 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 16 Oct 2011 15:04:51 -0700 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: Hi, On Sun, Oct 16, 2011 at 2:11 PM, David Cournapeau wrote: > On Sun, Oct 16, 2011 at 7:40 PM, Matthew Brett wrote: >> Hi, > >> If we use float64 we know what that is. ?If we are using float128, >> we've got no idea what it is. 
> > I think there is no arguing here: the ideal solution would be to > follow what happens with 32 and 64 bits representations. But this is > impossible on today's architectures because the 2008 version of the > IEEE 754 standard is not supported yet. If we agree that float128 is a bad name for something that isn't IEEE binary128, and there is already a longdouble type (thanks for pointing that out), then what about: Deprecating float128 / float96 as names Preferring longdouble for cross-platform == fairly big float of some sort Specific names according to format (float80 etc) ? See you, Matthew
From njs at pobox.com Sun Oct 16 18:16:13 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 16 Oct 2011 15:16:13 -0700 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: On Sun, Oct 16, 2011 at 3:04 PM, Matthew Brett wrote: > If we agree that float128 is a bad name for something that isn't IEEE > binary128, and there is already a longdouble type (thanks for pointing > that out), then what about: > > Deprecating float128 / float96 as names > Preferring longdouble for cross-platform == fairly big float of some sort +1 I understand the argument that you don't want to call it "float80" because not all machines support a float80 type. But I don't understand why we would solve that problem by making up two *more* names (float96, float128) that describe types that *no* machines actually support... this is incredibly confusing. I guess the question is, how do we deprecate a top-level name inside the np namespace? > Specific names according to format (float80 etc) ? This part doesn't even seem necessary right now -- we could always add it later if machines start supporting multiple >64-bit float types at once, and in the mean time it doesn't add much. You can always use finfo if you're curious what longdouble means locally? -- Nathaniel
From charlesr.harris at gmail.com Sun Oct 16 19:29:24 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Oct 2011 17:29:24 -0600 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: On Sun, Oct 16, 2011 at 4:16 PM, Nathaniel Smith wrote: > On Sun, Oct 16, 2011 at 3:04 PM, Matthew Brett > wrote: > > If we agree that float128 is a bad name for something that isn't IEEE > > binary128, and there is already a longdouble type (thanks for pointing > > that out), then what about: > > > > Deprecating float128 / float96 as names > > Preferring longdouble for cross-platform == fairly big float of some sort > > +1 > > I understand the argument that you don't want to call it "float80" > because not all machines support a float80 type. But I don't > understand why we would solve that problem by making up two *more* > names (float96, float128) that describe types that *no* machines > actually support... this is incredibly confusing. > Well, float128 and float96 aren't interchangeable across architectures because of the different alignments, C long double isn't portable either, and float80 doesn't seem to be available anywhere. What concerns me is the difference between extended and quad precision, both of which can occupy 128 bits. I've complained about that for several years now, but as to extended precision, just don't use it. It will never be portable. Chuck
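A quick check, not from the thread, of what long double means on a particular build; the values shown assume Intel 80-bit extended precision, and on an MSVC build, where long double is plain double, the two eps values would coincide:

>>> import numpy as np
>>> np.finfo(np.float64).eps
2.2204460492503131e-16
>>> np.finfo(np.longdouble).eps
1.084202172485504434e-19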
From njs at pobox.com Sun Oct 16 20:13:48 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 16 Oct 2011 17:13:48 -0700 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: On Sun, Oct 16, 2011 at 4:29 PM, Charles R Harris wrote: > On Sun, Oct 16, 2011 at 4:16 PM, Nathaniel Smith wrote: >> I understand the argument that you don't want to call it "float80" >> because not all machines support a float80 type. But I don't >> understand why we would solve that problem by making up two *more* >> names (float96, float128) that describe types that *no* machines >> actually support... this is incredibly confusing. > > Well, float128 and float96 aren't interchangeable across architectures > because of the different alignments, C long double isn't portable either, > and float80 doesn't seem to be available anywhere. What concerns me is the > difference between extended and quad precision, both of which can occupy 128 > bits. I've complained about that for several years now, but as to extended > precision, just don't use it. It will never be portable. I think part of the confusion here is about when a type is named like 'float', does 'N' refer to the size of the data or to the minimum alignment? I have a strong intuition that it should be the former, and I assume Matthew does too. If we have a data structure like struct { uint8_t flags; void * data; } then 'flags' will actually get 32 or 64 bits of space... but we would never, ever refer to it as a uint32 or a uint64! I know these extended precision types are even weirder because the compiler will insert that padding unconditionally, but the intuition still stands, and obviously some proportion of the userbase will share it. If our API makes smart people like Matthew spend a week going around in circles, then our API is dangerously broken! The solution is just to call it 'longdouble', which clearly communicates 'this does some quirky thing that depends on your C compiler and architecture'. -- Nathaniel
From charlesr.harris at gmail.com Sun Oct 16 21:22:29 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Oct 2011 19:22:29 -0600 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: On Sun, Oct 16, 2011 at 6:13 PM, Nathaniel Smith wrote: > On Sun, Oct 16, 2011 at 4:29 PM, Charles R Harris > wrote: > > On Sun, Oct 16, 2011 at 4:16 PM, Nathaniel Smith wrote: > >> I understand the argument that you don't want to call it "float80" > >> because not all machines support a float80 type. But I don't > >> understand why we would solve that problem by making up two *more* > >> names (float96, float128) that describe types that *no* machines > >> actually support... this is incredibly confusing. > > > > Well, float128 and float96 aren't interchangeable across architectures > > because of the different alignments, C long double isn't portable either, > > and float80 doesn't seem to be available anywhere. What concerns me is > the > > difference between extended and quad precision, both of which can occupy > 128 > > bits. I've complained about that for several years now, but as to > extended > > precision, just don't use it. It will never be portable. > > I think part of the confusion here is about when a type is named like > 'float', does 'N' refer to the size of the data or to the minimum > alignment?
I have a strong intuition that it should be the former, and > I assume Matthew does too. If we have a data structure like > struct { uint8_t flags; void * data; } > We need both in theory, in practice floats and doubles are pretty well defined these days, but long doubles depend on architecture and compiler for alignment, and even for representation in the case of PPC. I don't regard these facts as obscure if one is familiar with floating point, but most folks aren't and I agree that it can be misleading if one assumes that types and storage space are strongly coupled. This also ties in to the problem with ints and longs, which may both be int32 despite having different C names. > then 'flags' will actually get 32 or 64 bits of space... but we would > never, ever refer to it as a uint32 or a uint64! I know these extended > precision types are even weirder because the compiler will insert that > padding unconditionally, but the intuition still stands, and obviously > some proportion of the userbase will share it. > > If our API makes smart people like Matthew spend a week going around > in circles, then our API is dangerously broken! > > I think "dangerously" is a bit overly dramatic. > The solution is just to call it 'longdouble', which clearly > communicates 'this does some quirky thing that depends on your C > compiler and architecture'. > > Well, I don't know. If someone is unfamiliar with floats I would expect they would post a complaint about bugs if a file of longdouble type written on a 32 bit system couldn't be read on a 64 bit system. It might be better to somehow combine both the ieee type and the storage alignment. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Oct 16 22:02:04 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Oct 2011 20:02:04 -0600 Subject: [Numpy-discussion] Type checking inconsistency In-Reply-To: References: Message-ID: On Sun, Oct 16, 2011 at 11:02 AM, Tony Yu wrote: > On Sun, Oct 16, 2011 at 12:49 PM, Pauli Virtanen wrote: > >> (16.10.2011 18:39), Tony Yu wrote: >> > >>> import numpy as np >> > >>> a = np.arange(10) >> > >>> b = np.ones(10, dtype=np.uint8) >> > >> > # this runs without error >> > >>> b[:5] = a[:5] >> > >> > >>> mask = a < 5 >> > >>> b[mask] = b[mask] >> > TypeError: array cannot be safely cast to required type >> >> Seems to be fixed in Git master >> >> >>> import numpy as np >> >>> a = np.arange(10) >> >>> b = np.ones(10, dtype=np.uint8) >> >>> mask = a < 5 >> >>> b[mask] = b[mask] >> >>> b[mask] = a[mask] >> >>> np.__version__ >> '2.0.0.dev-1dc1877' >> > > (I see you noticed the typo in my original example: b --> a). Agreed, I'm > getting this error with an old master. I just tried master and it worked > fine, but the maintenance branch ('1.6.2.dev-396dbb9') does still have this > issue. > 1.6.2 hasn't been kept up to date. I suspect 1.7.0 will be the next release. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
From d.s.seljebotn at astro.uio.no Mon Oct 17 04:20:06 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 17 Oct 2011 10:20:06 +0200 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> Message-ID: <4E9BE536.70901@astro.uio.no> On 10/17/2011 03:22 AM, Charles R Harris wrote: > > > On Sun, Oct 16, 2011 at 6:13 PM, Nathaniel Smith > wrote: > > The solution is just to call it 'longdouble', which clearly > communicates 'this does some quirky thing that depends on your C > compiler and architecture'. > > > Well, I don't know. If someone is unfamiliar with floats I would expect > they would post a complaint about bugs if a file of longdouble type > written on a 32 bit system couldn't be read on a 64 bit system. It might > be better to somehow combine both the ieee type and the storage alignment. np.float80_96 np.float80_128 ? Dag Sverre
From zdoor at xs4all.nl Thu Oct 13 06:59:02 2011 From: zdoor at xs4all.nl (Alex van der Spek) Date: Thu, 13 Oct 2011 10:59:02 +0000 (UTC) Subject: [Numpy-discussion] dtyping with .astype() Message-ID: Beginner's question? I have this dictionary dtypes of names and types: >>> dtypes {'names': ['col1', 'col2', 'col3', 'col4', 'col5'], 'formats': [<type 'numpy.float16'>, <type 'numpy.float16'>, <type 'numpy.float16'>, <type 'numpy.float16'>, <type 'numpy.float16'>]} and this array y >>> y array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24], [25, 26, 27, 28, 29], [30, 31, 32, 33, 34], [35, 36, 37, 38, 39], [40, 41, 42, 43, 44], [45, 46, 47, 48, 49]]) But: >>> z = y.astype(dtypes) gives me a confusing result. I only asked to name the columns and change their types to half precision floats. What am I missing? How to do this? Thank you in advance, Alex van der Spek
From pav at iki.fi Mon Oct 17 06:17:43 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 17 Oct 2011 12:17:43 +0200 Subject: [Numpy-discussion] dtyping with .astype() In-Reply-To: References: Message-ID: 13.10.2011 12:59, Alex van der Spek kirjoitti: > gives me a confusing result. I only asked to name the columns and change their > types to half precision floats. Structured arrays shouldn't be thought of as an array with named columns, as they are somewhat different. > What am I missing? How to do this? np.rec.fromarrays(arr.T, dtype=dt)
From charlesr.harris at gmail.com Mon Oct 17 08:59:52 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 17 Oct 2011 06:59:52 -0600 Subject: [Numpy-discussion] float128 in fact float80 In-Reply-To: <4E9BE536.70901@astro.uio.no> References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local> <4E9BE536.70901@astro.uio.no> Message-ID: On Mon, Oct 17, 2011 at 2:20 AM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > On 10/17/2011 03:22 AM, Charles R Harris wrote: > > > > > > On Sun, Oct 16, 2011 at 6:13 PM, Nathaniel Smith > > wrote: > > > > The solution is just to call it 'longdouble', which clearly > > communicates 'this does some quirky thing that depends on your C > > compiler and architecture'. > > > > > > Well, I don't know. If someone is unfamiliar with floats I would expect > > they would post a complaint about bugs if a file of longdouble type > > written on a 32 bit system couldn't be read on a 64 bit system. It might > > be better to somehow combine both the ieee type and the storage > alignment. > > np.float80_96 > np.float80_128 > Heh, my thoughts too. Chuck
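A concrete sketch of that naming idea, assuming an Intel machine (other platforms will differ): the itemsize reports the padded storage, while finfo reports the bits that carry information, which is how float96 and float128 can both be the same 80-bit extended type underneath.

>>> import numpy as np
>>> np.dtype(np.longdouble).itemsize * 8   # 96 on 32-bit builds, 128 on x86-64
128
>>> np.finfo(np.longdouble).nmant, np.finfo(np.longdouble).nexp
(63, 15)   # sign + 15 exponent bits + 64 significand bits = 80 bits in use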
From josef.pktd at gmail.com Mon Oct 17 09:48:23 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 Oct 2011 09:48:23 -0400 Subject: [Numpy-discussion] dtyping with .astype() In-Reply-To: References: Message-ID: On Mon, Oct 17, 2011 at 6:17 AM, Pauli Virtanen wrote: > 13.10.2011 12:59, Alex van der Spek kirjoitti: >> gives me a confusing result. I only asked to name the columns and change their >> types to half precision floats. > > Structured arrays shouldn't be thought of as an array with named columns, > as they are somewhat different. > >> What am I missing? How to do this? > > np.rec.fromarrays(arr.T, dtype=dt) y.astype(float16).view(dt) ? Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
From pav at iki.fi Mon Oct 17 10:18:48 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 17 Oct 2011 16:18:48 +0200 Subject: [Numpy-discussion] dtyping with .astype() In-Reply-To: References: Message-ID: 17.10.2011 15:48, josef.pktd at gmail.com kirjoitti: > On Mon, Oct 17, 2011 at 6:17 AM, Pauli Virtanen wrote: [clip] >>> What am I missing? How to do this? >> >> np.rec.fromarrays(arr.T, dtype=dt) > > y.astype(float16).view(dt) I think this will give surprises if the original array is not in C-order. Pauli
From josef.pktd at gmail.com Mon Oct 17 13:04:44 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 Oct 2011 13:04:44 -0400 Subject: [Numpy-discussion] dtyping with .astype() In-Reply-To: References: Message-ID: On Mon, Oct 17, 2011 at 10:18 AM, Pauli Virtanen wrote: > 17.10.2011 15:48, josef.pktd at gmail.com kirjoitti: >> On Mon, Oct 17, 2011 at 6:17 AM, Pauli Virtanen wrote: > [clip] >>>> What am I missing? How to do this? >>> >>> np.rec.fromarrays(arr.T, dtype=dt) >> >> y.astype(float16).view(dt) > > I think this will give surprises if the original array is not in C-order. I forgot about those: dangerous if the array is square, otherwise it raises an error if it is in F-order. Maybe np.asarray(y, np.float16, order='C').view(dt) if I don't like record arrays Josef > > Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
From Chris.Barker at noaa.gov Mon Oct 17 13:13:32 2011 From: Chris.Barker at noaa.gov (Chris.Barker) Date: Mon, 17 Oct 2011 10:13:32 -0700 Subject: [Numpy-discussion] yet another indexing question In-Reply-To: References: Message-ID: <4E9C623C.2050407@noaa.gov> On 10/14/11 5:04 AM, Neal Becker wrote: > suppose I have: > > In [10]: u > Out[10]: > array([[0, 1, 2, 3, 4], > [5, 6, 7, 8, 9]]) > > And I have a vector v: > v = np.array ((0,1,0,1,0)) > > I want to form an output vector which selects items from u where v is the index > of the row of u to be selected. > Now, more importantly, I need the result to be a reference to the original array > (not a copy), because I'm going to use it on the LHS of an assignment. Is this > possible? No, it's not. numpy arrays need to be describable with regular strides -- when selecting arbitrary elements from an array, there is no way to describe the resulting array as regular strides into the same data block as the original. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov
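A follow-up sketch, not part of Chris's reply: although no view exists, the same fancy selection can be used directly as the assignment target, which covers the LHS case without needing a reference:

>>> import numpy as np
>>> u = np.arange(10).reshape(2, 5)
>>> v = np.array((0, 1, 0, 1, 0))
>>> cols = np.arange(u.shape[1])
>>> u[v, cols]                # the selected items
array([0, 6, 2, 8, 4])
>>> u[v, cols] = -1           # assigns into the original array
>>> u
array([[-1,  1, -1,  3, -1],
       [ 5, -1,  7, -1,  9]])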
From tjhnson at gmail.com Mon Oct 17 14:19:10 2011
From: tjhnson at gmail.com (T J)
Date: Mon, 17 Oct 2011 11:19:10 -0700
Subject: [Numpy-discussion] Example Usage of Neighborhood Iterator in Cython
Message-ID:

I recently put together a Cython example which uses the neighborhood
iterator. It was trickier than I thought it would be, so I thought to
share it with the community. The function takes a 1-dimensional array
and returns a 2-dimensional array of neighborhoods in the original
array. This is somewhat similar to the functionality provided by
segment_axis (http://projects.scipy.org/numpy/ticket/901), but I believe
this is slightly different in that the neighborhood can extend to the
left of the current index (assuming circular boundaries). Keep in mind
that this is just an example, and normal usage probably is not concerned
with creating a new array.

External link: http://codepad.org/czRIzXQl

--------------

import numpy as np
cimport numpy as np

cdef extern from "numpy/arrayobject.h":

    ctypedef extern class numpy.flatiter [object PyArrayIterObject]:
        cdef int nd_m1
        cdef np.npy_intp index, size
        cdef np.ndarray ao
        cdef char *dataptr

    # This isn't exposed to the Python API.
    # So we can't use the same approach we used to define flatiter
    ctypedef struct PyArrayNeighborhoodIterObject:
        int nd_m1
        np.npy_intp index, size
        np.PyArrayObject *ao   # note the change from np.ndarray
        char *dataptr

    object PyArray_NeighborhoodIterNew(flatiter it, np.npy_intp* bounds,
                                       int mode, np.ndarray fill_value)
    int PyArrayNeighborhoodIter_Next(PyArrayNeighborhoodIterObject *it)
    int PyArrayNeighborhoodIter_Reset(PyArrayNeighborhoodIterObject *it)

    object PyArray_IterNew(object arr)
    void PyArray_ITER_NEXT(flatiter it)
    np.npy_intp PyArray_SIZE(np.ndarray arr)

    cdef enum:
        NPY_NEIGHBORHOOD_ITER_ZERO_PADDING,
        NPY_NEIGHBORHOOD_ITER_ONE_PADDING,
        NPY_NEIGHBORHOOD_ITER_CONSTANT_PADDING,
        NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING,
        NPY_NEIGHBORHOOD_ITER_MIRROR_PADDING

np.import_array()

def windowed(np.ndarray[np.int_t, ndim=1] arr, bounds):

    cdef flatiter iterx = PyArray_IterNew(arr)
    cdef np.npy_intp size = PyArray_SIZE(arr)
    cdef np.npy_intp* boundsPtr = [bounds[0], bounds[1]]
    cdef int hoodSize = bounds[1] - bounds[0] + 1

    # Create the Python object and keep a reference to it
    cdef object niterx_ = PyArray_NeighborhoodIterNew(iterx,
        boundsPtr, NPY_NEIGHBORHOOD_ITER_CIRCULAR_PADDING, None)
    cdef PyArrayNeighborhoodIterObject *niterx = \
        <PyArrayNeighborhoodIterObject *> niterx_

    cdef int i,j
    cdef np.ndarray[np.int_t, ndim=2] hoods

    hoods = np.empty((arr.shape[0], hoodSize), dtype=np.int)
    for i in range(iterx.size):
        for j in range(niterx.size):
            hoods[i,j] = (<np.int_t *> niterx.dataptr)[0]
            PyArrayNeighborhoodIter_Next(niterx)
        PyArray_ITER_NEXT(iterx)
        PyArrayNeighborhoodIter_Reset(niterx)
    return hoods

def test():
    x = np.arange(10)
    print x
    print
    print windowed(x, [-1, 3])
    print
    print windowed(x, [-2, 2])

----------

If you run test(), this is what you should see:

[0 1 2 3 4 5 6 7 8 9]

[[9 0 1 2 3]
 [0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]
 [4 5 6 7 8]
 [5 6 7 8 9]
 [6 7 8 9 0]
 [7 8 9 0 1]
 [8 9 0 1 2]]

[[8 9 0 1 2]
 [9 0 1 2 3]
 [0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]
 [4 5 6 7 8]
 [5 6 7 8 9]
 [6 7 8 9 0]
 [7 8 9 0 1]]

windowed(x, [0, 2]) is almost like segment_axis(x, 3, 2, end='wrap').
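For anyone who wants to compile the example, a hypothetical build script along the usual Cython lines should work; the module name windowed and the file name windowed.pyx are assumptions, not part of the original post.

# Build with: python setup.py build_ext --inplace
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

setup(cmdclass={'build_ext': build_ext},
      ext_modules=[Extension('windowed', ['windowed.pyx'],
                             include_dirs=[numpy.get_include()])])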
From matthew.brett at gmail.com Mon Oct 17 14:39:50 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 17 Oct 2011 11:39:50 -0700
Subject: [Numpy-discussion] float128 in fact float80
In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261A2E4362@VA3DIAXVS361.RED001.local>
Message-ID:

Hi,

On Sun, Oct 16, 2011 at 6:22 PM, Charles R Harris wrote:
>
> On Sun, Oct 16, 2011 at 6:13 PM, Nathaniel Smith wrote:
>>
>> On Sun, Oct 16, 2011 at 4:29 PM, Charles R Harris wrote:
>> > On Sun, Oct 16, 2011 at 4:16 PM, Nathaniel Smith wrote:
>> >> I understand the argument that you don't want to call it "float80"
>> >> because not all machines support a float80 type. But I don't
>> >> understand why we would solve that problem by making up two *more*
>> >> names (float96, float128) that describe types that *no* machines
>> >> actually support... this is incredibly confusing.
>> >
>> > Well, float128 and float96 aren't interchangeable across architectures
>> > because of the different alignments, C long double isn't portable
>> > either, and float80 doesn't seem to be available anywhere. What
>> > concerns me is the difference between extended and quad precision,
>> > both of which can occupy 128 bits. I've complained about that for
>> > several years now, but as to extended precision, just don't use it.
>> > It will never be portable.
>>
>> I think part of the confusion here is about when a type is named like
>> 'float', does 'N' refer to the size of the data or to the minimum
>> alignment? I have a strong intuition that it should be the former, and
>> I assume Matthew does too. If we have a data structure like
>>   struct { uint8_t flags; void * data; }
>
> We need both in theory, in practice floats and doubles are pretty well
> defined these days, but long doubles depend on architecture and compiler
> for alignment, and even for representation in the case of PPC. I don't
> regard these facts as obscure if one is familiar with floating point,
> but most folks aren't and I agree that it can be misleading if one
> assumes that types and storage space are strongly coupled. This also
> ties in to the problem with ints and longs, which may both be int32
> despite having different C names.
>
>> then 'flags' will actually get 32 or 64 bits of space... but we would
>> never, ever refer to it as a uint32 or a uint64! I know these extended
>> precision types are even weirder because the compiler will insert that
>> padding unconditionally, but the intuition still stands, and obviously
>> some proportion of the userbase will share it.
>>
>> If our API makes smart people like Matthew spend a week going around
>> in circles, then our API is dangerously broken!
>
> I think "dangerously" is a bit overly dramatic.
>
>> The solution is just to call it 'longdouble', which clearly
>> communicates 'this does some quirky thing that depends on your C
>> compiler and architecture'.
>
> Well, I don't know. If someone is unfamiliar with floats I would expect
> they would post a complaint about bugs if a file of longdouble type
> written on a 32 bit system couldn't be read on a 64 bit system. It might
> be better to somehow combine both the ieee type and the storage
> alignment.

David was pointing out that e.g. np.float128 could be a different thing
on SPARC, PPC and Intel, so it seems to me that the float128 is a false
friend if we think it at all likely that people will use platforms other
than Intel.
Personally, if I saw 'longdouble' as a datatype, it would not surprise
me if it wasn't portable across platforms, including 32 and 64 bit.
float80_96 and float80_128 seem fine to me, but it would also be good to
suggest longdouble as the default name to use for the platform-specific
higher-precision datatype, to make code portable across platforms.

See you,

Matthew

From e.antero.tammi at gmail.com Mon Oct 17 15:45:46 2011
From: e.antero.tammi at gmail.com (eat)
Date: Mon, 17 Oct 2011 22:45:46 +0300
Subject: [Numpy-discussion] Example Usage of Neighborhood Iterator in Cython
In-Reply-To: References: Message-ID:

Hi,

On Mon, Oct 17, 2011 at 9:19 PM, T J wrote:
> I recently put together a Cython example which uses the neighborhood
> iterator.
[clip]

Just wondering what are the main benefits of your approach, compared to
simply:

In []: a= arange(5)
In []: n= 10
In []: b= arange(n)[:, None]

In []: mod(a+ roll(b, 1), n)
Out[]:
array([[9, 0, 1, 2, 3],
       [0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8],
       [5, 6, 7, 8, 9],
       [6, 7, 8, 9, 0],
       [7, 8, 9, 0, 1],
       [8, 9, 0, 1, 2]])

In []: mod(a+ roll(b, 2), n)
Out[]:
array([[8, 9, 0, 1, 2],
       [9, 0, 1, 2, 3],
       [0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8],
       [5, 6, 7, 8, 9],
       [6, 7, 8, 9, 0],
       [7, 8, 9, 0, 1]])

Regards,
eat

From tjhnson at gmail.com Mon Oct 17 17:16:18 2011
From: tjhnson at gmail.com (T J)
Date: Mon, 17 Oct 2011 14:16:18 -0700
Subject: [Numpy-discussion] Example Usage of Neighborhood Iterator in Cython
In-Reply-To: References: Message-ID:

On Mon, Oct 17, 2011 at 12:45 PM, eat wrote:
> Just wondering what are the main benefits of your approach, compared to
> simply:

As I hinted, my goal was not to construct a "practical" example, but
rather, to demonstrate how to use the neighborhood iterator in Cython.
Roll and mod are quite nice. :) Now imagine working with higher
dimensional arrays with more exotic neighborhoods (like the letter X).

From nadavh at visionsense.com Tue Oct 18 04:37:58 2011
From: nadavh at visionsense.com (Nadav Horesh)
Date: Tue, 18 Oct 2011 01:37:58 -0700
Subject: [Numpy-discussion] Example Usage of Neighborhood Iterator in Cython
In-Reply-To: References: Message-ID: <26FC23E7C398A64083C980D16001012D261A2E4364@VA3DIAXVS361.RED001.local>

Just in time! I was just working on a cythonic replacement to
ndimage.generic_filter (well, I took a short two-year break in the
middle).

Thank you very much,

   Nadav.

________________________________________
From: numpy-discussion-bounces at scipy.org On Behalf Of T J
Sent: 17 October 2011 23:16
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Example Usage of Neighborhood Iterator in Cython
[clip]
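For comparison, the same circular windows can also be built with a single fancy-indexing step. This is a sketch only; windowed_take is a made-up helper that follows the [lo, hi] bounds convention of the Cython example above.

import numpy as np

def windowed_take(x, lo, hi):
    # row i holds x[(i+lo) % n], ..., x[(i+hi) % n]
    n = x.shape[0]
    idx = (np.arange(n)[:, None] + np.arange(lo, hi + 1)) % n
    return x[idx]

print windowed_take(np.arange(10), -1, 3)   # same as windowed(x, [-1, 3])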
From chaoyuejoy at gmail.com Tue Oct 18 07:44:00 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 13:44:00 +0200
Subject: [Numpy-discussion] index the last several members of a ndarray
Message-ID:

Dear all,

if I have

In [395]: a
Out[395]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [396]: a[...,-1]
Out[396]: array([4, 9])

In [397]: a[...,-4:-1]
Out[397]:
array([[1, 2, 3],
       [6, 7, 8]])

In [398]: a[...,-4:0]
Out[398]: array([], shape=(2, 0), dtype=int64)

how can I pick up something like:
array([[1, 2, 3, 4],
       [6, 7, 8, 9]])

I want to index the final 4 columns. I cannot figure out how to do this.

Thanks for any help,

Chao

From jeanluc.menut at free.fr Tue Oct 18 07:51:24 2011
From: jeanluc.menut at free.fr (Jean-Luc Menut)
Date: Tue, 18 Oct 2011 13:51:24 +0200
Subject: [Numpy-discussion] index the last several members of a ndarray
In-Reply-To: References: Message-ID: <4E9D683C.7010107@free.fr>

> how can I pick up something like:
> array([[1, 2, 3, 4],
>        [6, 7, 8, 9]])

I'm not sure I understand, shouldn't a[:,1:] be sufficient?
Did I miss something in your message?

From eric at depagne.org Tue Oct 18 07:56:33 2011
From: eric at depagne.org (Éric Depagne)
Date: Tue, 18 Oct 2011 13:56:33 +0200
Subject: [Numpy-discussion] index the last several members of a ndarray
In-Reply-To: References: Message-ID: <201110181356.33737.eric@depagne.org>

On Tuesday 18 October 2011 13:44:00, Chao YUE wrote:
[clip]

Don't you want to do:

In [1]: a[:,-4:]
Out[1]:
array([[1, 2, 3, 4],
       [6, 7, 8, 9]])

Éric.

From chaoyuejoy at gmail.com Tue Oct 18 07:56:48 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 13:56:48 +0200
Subject: [Numpy-discussion] index the last several members of a ndarray
In-Reply-To: <4E9D683C.7010107@free.fr> References: <4E9D683C.7010107@free.fr> Message-ID:

Thanks Jean-Luc,

I just want the last several numbers by indexing from the end.
In [400]: b=np.arange(20).reshape(2,10)

In [401]: b
Out[401]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])

I want something like b[...,(index from the end using negative numbers)]
to get:
array([[ 6,  7,  8,  9],
       [16, 17, 18, 19]])

but it's strange that if you use b[...,-1], you get:
In [402]: b[...,-1]
Out[402]: array([ 9, 19])

and if you use b[...,-4:-1], the last column is dropped:
In [403]: b[...,-4:-1]
Out[403]:
array([[ 6,  7,  8],
       [16, 17, 18]])

so I cannot see how to write a slice with negative indices that keeps the
last column. I don't know if I am more clear this time....

Chao

2011/10/18 Jean-Luc Menut
>> how can I pick up something like:
>> array([[1, 2, 3, 4],
>>        [6, 7, 8, 9]])
>
> I'm not sure I understand, shouldn't a[:,1:] be sufficient?
> Did I miss something in your message?

From chaoyuejoy at gmail.com Tue Oct 18 07:58:42 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 13:58:42 +0200
Subject: [Numpy-discussion] index the last several members of a ndarray
In-Reply-To: References: <4E9D683C.7010107@free.fr> Message-ID:

you are right Eric,

In [405]: b[...,-4:]
Out[405]:
array([[ 6,  7,  8,  9],
       [16, 17, 18, 19]])

cheers,

Chao

2011/10/18 Chao YUE
[clip]

From chaoyuejoy at gmail.com Tue Oct 18 08:44:12 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 14:44:12 +0200
Subject: [Numpy-discussion] np.ma.mean is not working?
Message-ID:

Dear all,

previously I thought np.ma.mean() would automatically filter out the
masked (missing) values, but it's not?

In [489]: a=np.arange(20.).reshape(2,10)

In [490]: a=np.ma.masked_array(a,(a==2)|(a==5)|(a==11)|(a==18),fill_value=np.nan)

In [491]: a
Out[491]:
masked_array(data =
 [[0.0 1.0 -- 3.0 4.0 -- 6.0 7.0 8.0 9.0]
 [10.0 -- 12.0 13.0 14.0 15.0 16.0 17.0 -- 19.0]],
             mask =
 [[False False  True False False  True False False False False]
 [False  True False False False False False False  True False]],
       fill_value = nan)

In [492]: a.mean(0)
Out[492]:
masked_array(data = [5.0 1.0 12.0 8.0 9.0 15.0 11.0 12.0 8.0 14.0],
             mask = [False False False False False False False False False False],
       fill_value = 1e+20)

In [494]: np.ma.mean(a,0)
Out[494]:
masked_array(data = [5.0 1.0 12.0 8.0 9.0 15.0 11.0 12.0 8.0 14.0],
             mask = [False False False False False False False False False False],
       fill_value = 1e+20)

In [495]: np.ma.mean(a,0)==a.mean(0)
Out[495]:
masked_array(data = [ True  True  True  True  True  True  True  True  True  True],
             mask = False,
       fill_value = True)

only with a.filled().mean(0) do I get the result I want:

In [496]: a.filled().mean(0)
Out[496]: array([  5.,  NaN,  NaN,   8.,   9.,  NaN,  11.,  12.,  NaN,  14.])

I am doing this because I am trying to use a small function from the web
to do a moving average of the data:

import numpy as np
def rolling_window(a, window):
    if window < 1:
        raise ValueError, "`window` must be at least 1."
    if window > a.shape[-1]:
        raise ValueError, "`window` is too long."
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def move_ave(a,window):
    temp=rolling_window(a,window)
    pre=int(window)/2
    post=int(window)-pre-1
    return np.concatenate((a[...,0:pre],np.mean(temp,-1),a[...,-post:]),axis=-1)

In [489]: a=np.arange(20.).reshape(2,10)

In [499]: move_ave(a,4)
Out[499]:
masked_array(data =
 [[  0.    1.    1.5   2.5   3.5   4.5   5.5   6.5   7.5   9. ]
 [ 10.   11.   11.5  12.5  13.5  14.5  15.5  16.5  17.5  19. ]],
             mask =
 False,
       fill_value = 1e+20)

thanks,

Chao
From chaoyuejoy at gmail.com Tue Oct 18 08:49:16 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 14:49:16 +0200
Subject: [Numpy-discussion] anyway to check a ndarray is a mased array or not?
Message-ID:

Just one more question, how can I check a ndarray is a masked array or not?

Chao

From shish at keba.be Tue Oct 18 09:00:26 2011
From: shish at keba.be (Olivier Delalleau)
Date: Tue, 18 Oct 2011 09:00:26 -0400
Subject: [Numpy-discussion] anyway to check a ndarray is a mased array or not?
In-Reply-To: References: Message-ID:

You could simply check if it has a 'mask' attribute. You can also check
if it is an instance of numpy.ma.core.MaskedArray.

-=- Olivier

On 18 October 2011 08:49, Chao YUE wrote:
> Just one more question, how can I check a ndarray is a masked array or not?
[clip]

From chaoyuejoy at gmail.com Tue Oct 18 09:07:02 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 15:07:02 +0200
Subject: [Numpy-discussion] anyway to check a ndarray is a mased array or not?
In-Reply-To: References: Message-ID:

Thanks Olivier. But I don't know how I can check it in the code (not in
an interactive mode)? I would like:

if ndarray a is a masked array:
    code 1
else
    code 2

thanks again,

Chao

2011/10/18 Olivier Delalleau
[clip]

From shish at keba.be Tue Oct 18 09:10:51 2011
From: shish at keba.be (Olivier Delalleau)
Date: Tue, 18 Oct 2011 09:10:51 -0400
Subject: [Numpy-discussion] np.ma.mean is not working?
In-Reply-To: References: Message-ID:

As far as I can tell ma.mean() is working as expected here: it computes
the mean only over non-masked values.
If you want to get rid of any mean that was computed over a series
containing masked values you can do:

b = a.mean(0)
b.mask[a.mask.any(0)] = True

Then b will be:

masked_array(data = [5.0 -- -- 8.0 9.0 -- 11.0 12.0 -- 14.0],
             mask = [False  True  True False False  True False False  True False],
       fill_value = 1e+20)

-=- Olivier

2011/10/18 Chao YUE
[clip]
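A self-contained sketch of that mask-propagation recipe; the example array below is invented for illustration:

import numpy as np

a = np.ma.masked_array(np.arange(20.).reshape(2, 10))
a[0, 2] = np.ma.masked
a[1, 5] = np.ma.masked

b = a.mean(0)           # per-column means over the unmasked values only
b.mask = a.mask.any(0)  # hide any column that had missing data
print b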
From shish at keba.be Tue Oct 18 09:13:14 2011
From: shish at keba.be (Olivier Delalleau)
Date: Tue, 18 Oct 2011 09:13:14 -0400
Subject: [Numpy-discussion] anyway to check a ndarray is a mased array or not?
In-Reply-To: References: Message-ID:

if hasattr(a, 'mask'):  # or: if isinstance(a, numpy.ma.core.MaskedArray)
    code 1
else
    code 2

-=- Olivier

2011/10/18 Chao YUE
> Thanks Olivier. But I don't know how I can check it in the code (not in
> an interactive mode)?
[clip]

From chaoyuejoy at gmail.com Tue Oct 18 09:40:56 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 15:40:56 +0200
Subject: [Numpy-discussion] anyway to check a ndarray is a mased array or not?
In-Reply-To: References: Message-ID:

really cool, thanks.

Chao

2011/10/18 Olivier Delalleau
[clip]
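Wrapped into a function, the check looks like this; describe is a made-up name, and np.ma.isMaskedArray(a) is an equivalent spelling of the isinstance test:

import numpy as np

def describe(a):
    # branch on whether a carries a mask
    if isinstance(a, np.ma.MaskedArray):
        return 'masked array'
    return 'plain ndarray'

print describe(np.arange(3))
print describe(np.ma.masked_equal(np.arange(3), 1))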
From scott.sinclair.za at gmail.com Tue Oct 18 09:58:27 2011
From: scott.sinclair.za at gmail.com (Scott Sinclair)
Date: Tue, 18 Oct 2011 15:58:27 +0200
Subject: [Numpy-discussion] index the last several members of a ndarray
In-Reply-To: References: <4E9D683C.7010107@free.fr> Message-ID:

On 18 October 2011 13:56, Chao YUE wrote:
> but it's strange that if you use b[...,-1],
> you get:
> In [402]: b[...,-1]
> Out[402]: array([ 9, 19])
>
> if use b[...,-4:-1],
> you get:
> Out[403]:
> array([[ 6,  7,  8],
>        [16, 17, 18]])

That's because you're mixing two different indexing constructs. In the
first case, you're using direct indexing, so you get the values in b at
the index you specify.

In the second example you're using slicing syntax, where you get the
values in b at the range of indices starting with -4 and ending *one
before* -1, i.e. the last value taken is b[..., -2].

Here's a simpler example:

In [1]: a = range(5)

In [2]: a
Out[2]: [0, 1, 2, 3, 4]

In [3]: a[0]
Out[3]: 0

In [4]: a[2]
Out[4]: 2

In [5]: a[0:2]
Out[5]: [0, 1]

In [6]: a[-3]
Out[6]: 2

In [7]: a[-1]
Out[7]: 4

In [8]: a[-3:-1]
Out[8]: [2, 3]

Cheers,
Scott

From chaoyuejoy at gmail.com Tue Oct 18 10:12:12 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 16:12:12 +0200
Subject: [Numpy-discussion] np.ma.mean is not working?
In-Reply-To: References: Message-ID:

thanks. Olivier. I see.
Chao

2011/10/18 Olivier Delalleau
[clip]

From chaoyuejoy at gmail.com Tue Oct 18 10:14:03 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 16:14:03 +0200
Subject: [Numpy-discussion] index the last several members of a ndarray
In-Reply-To: References: <4E9D683C.7010107@free.fr> Message-ID:

thanks Scott. very good explanation.

cheers,

Chao

2011/10/18 Scott Sinclair
[clip]
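The stop index is the key point of the exchange above: a stop of -1 excludes the final column, while omitting the stop keeps it. A two-line illustration:

import numpy as np

b = np.arange(20).reshape(2, 10)
print b[..., -4:-1]   # columns -4, -3, -2 only
print b[..., -4:]     # columns -4 through the last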
From bsouthey at gmail.com Tue Oct 18 10:21:14 2011
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 18 Oct 2011 09:21:14 -0500
Subject: [Numpy-discussion] np.ma.mean is not working?
In-Reply-To: References: Message-ID: <4E9D8B5A.9070002@gmail.com>

On 10/18/2011 09:12 AM, Chao YUE wrote:
> thanks. Olivier. I see.
[clip]

Look at pandas for your rolling window functionality:
http://pandas.sourceforge.net

"Time series-specific functionality: date range generation and frequency
conversion, moving window statistics, moving window linear regressions,
date shifting and lagging, etc."

Bruce

From efiring at hawaii.edu Tue Oct 18 12:57:25 2011
From: efiring at hawaii.edu (Eric Firing)
Date: Tue, 18 Oct 2011 06:57:25 -1000
Subject: [Numpy-discussion] anyway to check a ndarray is a mased array or not?
In-Reply-To: References: Message-ID: <4E9DAFF5.4070707@hawaii.edu>

On 10/18/2011 03:13 AM, Olivier Delalleau wrote:
> if hasattr(a, 'mask'):  # or: if isinstance(a, numpy.ma.core.MaskedArray)

or

if numpy.ma.isMA(a):

or

if numpy.ma.isMaskedArray(a):

Eric

> code 1
> else
> code 2
[clip]

From chaoyuejoy at gmail.com Tue Oct 18 13:02:29 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 19:02:29 +0200
Subject: [Numpy-discussion] np.ma.mean is not working?
In-Reply-To: <4E9D8B5A.9070002@gmail.com> References: <4E9D8B5A.9070002@gmail.com> Message-ID:

Thanks Bruce.

2011/10/18 Bruce Southey
[clip]
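For the original goal, a moving average that skips masked values, the same as_strided trick from the thread can be applied to the data and the mask together. This is only a sketch: masked_move_ave is a made-up name, the edge padding done by move_ave is omitted, and the input is assumed to be a masked array.

import numpy as np

def masked_move_ave(a, window):
    data = a.data
    mask = np.ma.getmaskarray(a)
    shape = data.shape[:-1] + (data.shape[-1] - window + 1, window)
    w = np.lib.stride_tricks.as_strided(data, shape,
                                        data.strides + (data.strides[-1],))
    m = np.lib.stride_tricks.as_strided(mask, shape,
                                        mask.strides + (mask.strides[-1],))
    # ma.mean over the window axis ignores the masked samples
    return np.ma.masked_array(w, m).mean(-1)

print masked_move_ave(np.ma.masked_equal(np.arange(10.), 3.), 4)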
From nouiz at nouiz.org Tue Oct 18 14:34:38 2011
From: nouiz at nouiz.org (Frédéric Bastien)
Date: Tue, 18 Oct 2011 14:34:38 -0400
Subject: [Numpy-discussion] abs for max negative integers - desired behavior?
In-Reply-To: References: <4E954D4B.7040905@esrf.fr> <4E955BDF.3010704@esrf.fr> Message-ID:

What about a parameter that allows selecting the option the user wants?
It would select between uint, upcasted_int, -MAX and +MAX. This way, at
least it will be documented, and users who care will have the choice.

Personally, when the option is available, I would prefer the safe
version, uint, but I understand that is not everybody's position.

Frédéric Bastien

On Sat, Oct 15, 2011 at 3:00 PM, Matthew Brett wrote:
> Hi,
>
> On Wed, Oct 12, 2011 at 8:31 AM, David Cournapeau wrote:
>> On 10/12/11, "V. Armando Solé" wrote:
>>> On 12/10/2011 10:46, David Cournapeau wrote:
>>>> On Wed, Oct 12, 2011 at 9:18 AM, "V. Armando Solé" wrote:
>>>>> From a pure user perspective, I would not expect the abs function to
>>>>> return a negative number. Returning +127 plus a warning the first time
>>>>> that happens seems to me a good compromise.
>>>> I guess the question is what's the common context to use small
>>>> integers in the first place. If it is to save memory, then upcasting
>>>> may not be the best solution. I may be wrong, but if you decide to use
>>>> those types in the first place, you need to know about overflows. Abs
>>>> is just one of them (dividing by -1 is another, although this one
>>>> actually raises an exception).
>>>>
>>>> Detecting it may be costly, but this would need benchmarking.
>>>>
>>>> That being said, without context, I don't find 127 a better solution
>>>> than -128.
>>>
>>> Well that choice is just based on getting the closest positive number
>>> to the true value (128). The context can be anything, for instance you
>>> could be using a look up table based on the result of an integer
>>> operation ...
>>>
>>> In terms of cost, it would imply to evaluate the cost of something like:
>>>
>>> a = abs(x);
>>> if (a < 0) {a -= MIN_INT;}
>>> return a;
>>
>> Yes, this is costly: it adds a branch to a trivial operation. I did
>> some preliminary benchmarks (would need confirmation when I have more
>> than one minute to spend on this):
>>
>> int8, 2**16 long array. Before check: 16 us. After check: 92 us. 5-6
>> times slower
>> int8, 2**24 long array. Before check: 20ms. After check: 30ms. 30 % slower.
>>
>> There is also the issue of signaling the error in the ufunc machinery.
>> I forgot whether this is possible at that level.
>
> I suppose that returning the equivalent uint type would be of zero cost
> though?
>
> I don't think the problem should be relegated to 'people should know
> about this' because this is a problem for any signed integer type, and
> it can lead to nasty errors which people are unlikely to test for.
>
> See you,
>
> Matthew
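For concreteness, the overflow under discussion takes two lines to demonstrate (int8 chosen to match the benchmarks quoted above):

import numpy as np

x = np.array([-128, 100], dtype=np.int8)
print np.abs(x)                    # [-128  100]: the wraparound at issue
print np.abs(x.astype(np.int16))   # [ 128  100]: upcasting avoids it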
From chaoyuejoy at gmail.com Tue Oct 18 17:25:06 2011
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 18 Oct 2011 23:25:06 +0200
Subject: [Numpy-discussion] np.ma.mean is not working?
In-Reply-To: <4E9D8B5A.9070002@gmail.com>
References: <4E9D8B5A.9070002@gmail.com>
Message-ID: 

I would say pandas is really cool. More people need to know it. And we
should have better documentation.

cheers,

Chao

2011/10/18 Bruce Southey

> **
> On 10/18/2011 09:12 AM, Chao YUE wrote:
>
> thanks. Olivier. I see.
>
> Chao
>
> 2011/10/18 Olivier Delalleau
>
>> As far as I can tell ma.mean() is working as expected here: it computes
>> the mean only over non-masked values.
>> If you want to get rid of any mean that was computed over a series
>> containing a masked value you can do:
>>
>> b = a.mean(0)
>> b.mask[a.mask.any(0)] = True
>>
>> Then b will be:
>>
>> masked_array(data = [5.0 -- -- 8.0 9.0 -- 11.0 12.0 -- 14.0],
>>              mask = [False True True False False True False False True
>> False],
>>        fill_value = 1e+20)
>>
>> -=- Olivier
>>
>> 2011/10/18 Chao YUE
>>
>>> Dear all,
>>>
>>> previously I think np.ma.mean() will automatically filter the masked
>>> (missing) value but it's not?
>>> In [489]: a=np.arange(20.).reshape(2,10)
>>>
>>> In [490]:
>>> a=np.ma.masked_array(a,(a==2)|(a==5)|(a==11)|(a==18),fill_value=np.nan)
>>>
>>> In [491]: a
>>> Out[491]:
>>> masked_array(data =
>>> [[0.0 1.0 -- 3.0 4.0 -- 6.0 7.0 8.0 9.0]
>>> [10.0 -- 12.0 13.0 14.0 15.0 16.0 17.0 -- 19.0]],
>>> mask =
>>> [[False False True False False True False False False False]
>>> [False True False False False False False False True False]],
>>> fill_value = nan)
>>>
>>> In [492]: a.mean(0)
>>> Out[492]:
>>> masked_array(data = [5.0 1.0 12.0 8.0 9.0 15.0 11.0 12.0 8.0 14.0],
>>> mask = [False False False False False False False False
>>> False False],
>>> fill_value = 1e+20)
>>>
>>> In [494]: np.ma.mean(a,0)
>>> Out[494]:
>>> masked_array(data = [5.0 1.0 12.0 8.0 9.0 15.0 11.0 12.0 8.0 14.0],
>>> mask = [False False False False False False False False
>>> False False],
>>> fill_value = 1e+20)
>>>
>>> In [495]: np.ma.mean(a,0)==a.mean(0)
>>> Out[495]:
>>> masked_array(data = [ True True True True True True True True
>>> True True],
>>> mask = False,
>>> fill_value = True)
>>>
>>> only by using a.filled().mean(0) can I get the result I want:
>>> In [496]: a.filled().mean(0)
>>> Out[496]: array([ 5., NaN, NaN, 8., 9., NaN, 11., 12., NaN,
>>> 14.])
>>>
>>> I am doing this because I tried to have a small function from the web to
>>> do moving average for data:
>>>
>>> import numpy as np
>>> def rolling_window(a, window):
>>>     if window < 1:
>>>         raise ValueError, "`window` must be at least 1."
>>>     if window > a.shape[-1]:
>>>         raise ValueError, "`window` is too long."
>>>     shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
>>>     strides = a.strides + (a.strides[-1],)
>>>     return np.lib.stride_tricks.as_strided(a, shape=shape,
>>> strides=strides)
>>>
>>> def move_ave(a,window):
>>>     temp=rolling_window(a,window)
>>>     pre=int(window)/2
>>>     post=int(window)-pre-1
>>>     return
>>> np.concatenate((a[...,0:pre],np.mean(temp,-1),a[...,-post:]),axis=-1)
>>>
>>>
>>> In [489]: a=np.arange(20.).reshape(2,10)
>>>
>>> In [499]: move_ave(a,4)
>>> Out[499]:
>>> masked_array(data =
>>> [[ 0. 1. 1.5 2.5 3.5 4.5 5.5 6.5 7.5 9. ]
>>> [ 10. 11. 11.5 12.5 13.5 14.5 15.5 16.5 17.5 19. ]],
>>> mask =
>>> False,
>>> fill_value = 1e+20)
>>>
>>> thanks,
>>>
>>> Chao
>>>
>>> --
>>>
>>> ***********************************************************************************
>>> Chao YUE
>>> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
>>> UMR 1572 CEA-CNRS-UVSQ
>>> Batiment 712 - Pe 119
>>> 91191 GIF Sur YVETTE Cedex
>>> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
>>>
>>> ************************************************************************************
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
>
> --
>
> ***********************************************************************************
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
>
> ************************************************************************************
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> Looked at pandas for your rolling window functionality:
> http://pandas.sourceforge.net
>
> *"Time series*-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc."
>
> Bruce
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

--

***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16

************************************************************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
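A minimal sketch tying Olivier's mask trick to the rolling_window function
quoted above (an editorial illustration, assuming rolling_window is defined
exactly as in the thread; the variable names here are made up):

import numpy as np

a = np.ma.masked_array(np.arange(10.),
                       mask=[0, 0, 1, 0, 0, 0, 0, 1, 0, 0])

# as_strided ignores the mask, so window the data and the mask separately
wd = rolling_window(a.data, 4)
wm = rolling_window(np.ma.getmaskarray(a), 4)
win = np.ma.masked_array(wd, wm)

avg = win.mean(-1)                  # mean over the valid samples in each window
# or, Olivier's stricter variant: mask any window touching a masked sample
avg_strict = np.ma.masked_array(avg, wm.any(-1))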
From ralf.gommers at googlemail.com Tue Oct 18 16:13:43 2011
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Tue, 18 Oct 2011 22:13:43 +0200
Subject: [Numpy-discussion] numpy.distutils quirk
In-Reply-To: <4E9459C8.8080300@ntc.zcu.cz>
References: <4E9459C8.8080300@ntc.zcu.cz>
Message-ID: 

On Tue, Oct 11, 2011 at 4:59 PM, Robert Cimrman wrote:

> Hi,
>
> I have now spent several hours hunting down a major slowdown of my code
> caused (apparently) by using config.add_library() for a reusable part of C
> source files instead of just config.add_extension().
>
> The reason of the slowdown was different, but hard to discern, naming of
> options and silent ignoring of non-existing ones:
>
> add_library() : extra_compiler_args
> add_extension() : extra_compile_args
>
> Other build keys used for the same purpose also differ.
>
> A bug to be reported, or is this going to be solved by going bento?
>
> Bento will use saner names I hope, but filing a bug can't hurt. We're still
fixing distutils issues.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
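To make the mismatch Robert describes concrete, a hedged sketch of a
setup.py fragment (the file and target names are invented; the two keyword
spellings are the ones from his report):

def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('example', parent_package, top_path)
    # add_library() wants extra_compiler_args ...
    config.add_library('mylib', sources=['mylib.c'],
                       extra_compiler_args=['-O3'])
    # ... while add_extension() wants extra_compile_args; per the report,
    # using the wrong spelling in either call is silently ignored
    config.add_extension('myext', sources=['myext.c'],
                         libraries=['mylib'],
                         extra_compile_args=['-O3'])
    return config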
From nwagner at iam.uni-stuttgart.de Wed Oct 19 07:58:58 2011
From: nwagner at iam.uni-stuttgart.de (Nils Wagner)
Date: Wed, 19 Oct 2011 13:58:58 +0200
Subject: [Numpy-discussion] Evaluate bivariate polynomials
Message-ID: 

Hi all,

how do I evaluate a bivariate polynomial

p(x,y)=c_0 + c_1 x + c_2 y +c_3 x**2 + c_4 x*y+ c_5 y**2 +
c_6 x**3 + c_7 x**2*y + c_8 x*y**2+c_9*y**3 + \dots

in numpy ?

In case of univariate polynomials I can use np.polyval.

Any pointer would be appreciated.

Nils

From mail.till at gmx.de Wed Oct 19 09:56:17 2011
From: mail.till at gmx.de (Till Stensitzki)
Date: Wed, 19 Oct 2011 13:56:17 +0000 (UTC)
Subject: [Numpy-discussion] Boolean indexing of 2d-Array not intuitive
Message-ID: 

Hi,
for a description of the problem see here:

http://stackoverflow.com/questions/7820809/understanding-weird-boolean-2d-array-indexing-behavior-in-numpy

I really think that the current way of handling two boolean indices is
misleading. Is there any reason behind that?

greetings
Till
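For context, a small demonstration of the behaviour Till is asking about
(plain NumPy fancy-indexing semantics, not code from the thread): two
boolean indices are each converted to integer index arrays and then paired
up element by element, rather than selecting a sub-block:

import numpy as np

a = np.arange(12).reshape(3, 4)
rows = np.array([True, False, True])
cols = np.array([True, False, True, False])

print a[rows, cols]          # [ 0 10] -- masks become [0, 2] and [0, 2],
                             # which are zipped together
print a[np.ix_(rows, cols)]  # [[ 0  2]
                             #  [ 8 10]] -- np.ix_ gives the cross product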
From charlesr.harris at gmail.com Wed Oct 19 10:00:27 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 19 Oct 2011 08:00:27 -0600
Subject: [Numpy-discussion] Evaluate bivariate polynomials
In-Reply-To: 
References: 
Message-ID: 

On Wed, Oct 19, 2011 at 5:58 AM, Nils Wagner wrote:

> Hi all,
>
> how do I evaluate a bivariate polynomial
>
> p(x,y)=c_0 + c_1 x + c_2 y +c_3 x**2 + c_4 x*y+ c_5 y**2 +
> c_6 x**3 + c_7 x**2*y + c_8 x*y**2+c_9*y**3 + \dots
>
> in numpy ?
>
> In case of univariate polynomials I can use np.polyval.
>
> Any pointer would be appreciated.
>

Here's a version for Bernstein polynomials that you could adapt. It's
possible to fool with the 2d version to have it evaluate on the grid
defined by x,y. As is, it evaluates on the x,y pairs. The coefficient c is
a rectangular array.

def bval(x, c):
    x = np.asarray(x)
    c = np.asarray(c)
    if len(c) == 1:
        c = c*np.ones(x.shape)
    else:
        t = 1 - x
        for i in range(len(c) - 1):
            c = c[:-1]*t + c[1:]*x
    return c[0]

def bval2d(x, y, c):
    f = bval(y, bval(x, c[...,None]))
    return f

I use Bernstein polynomials for non-linear least squares for numerical
reasons and because they tend to work better with numerical
differentiation.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
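A quick sanity check of the bval/bval2d sketch above, added in editing (it
assumes the two functions exactly as Chuck posted them): Bernstein basis
functions sum to one, so an all-ones coefficient array must evaluate to 1
everywhere, and a degree-1 patch with corner values p(i,j) = i + j must
reduce to x + y:

import numpy as np

x = np.linspace(0., 1., 7)
y = np.linspace(0., 1., 7)

print np.allclose(bval2d(x, y, np.ones((4, 5))), 1.0)  # True (partition of unity)

c = np.array([[0., 1.],
              [1., 2.]])
print np.allclose(bval2d(x, y, c), x + y)              # True (bilinear patch)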
From cimrman3 at ntc.zcu.cz Wed Oct 19 10:54:45 2011
From: cimrman3 at ntc.zcu.cz (Robert Cimrman)
Date: Wed, 19 Oct 2011 16:54:45 +0200
Subject: [Numpy-discussion] numpy.distutils quirk
In-Reply-To: 
References: <4E9459C8.8080300@ntc.zcu.cz>
Message-ID: <4E9EE4B5.9020301@ntc.zcu.cz>

On 10/18/11 22:13, Ralf Gommers wrote:
> On Tue, Oct 11, 2011 at 4:59 PM, Robert Cimrman wrote:
>
>> Hi,
>>
>> I have now spent several hours hunting down a major slowdown of my code
>> caused (apparently) by using config.add_library() for a reusable part of C
>> source files instead of just config.add_extension().
>>
>> The reason of the slowdown was different, but hard to discern, naming of
>> options and silent ignoring of non-existing ones:
>>
>> add_library() : extra_compiler_args
>> add_extension() : extra_compile_args
>>
>> Other build keys used for the same purpose also differ.
>>
>> A bug to be reported, or is this going to be solved by going bento?
>>
> Bento will use saner names I hope, but filing a bug can't hurt. We're still
> fixing distutils issues.

Ok. I am getting internal server error at
http://projects.scipy.org/numpy/newticket now - I will wait some time and try
again.

r.

From cimrman3 at ntc.zcu.cz Wed Oct 19 11:08:23 2011
From: cimrman3 at ntc.zcu.cz (Robert Cimrman)
Date: Wed, 19 Oct 2011 17:08:23 +0200
Subject: [Numpy-discussion] numpy.distutils quirk
In-Reply-To: <4E9EE4B5.9020301@ntc.zcu.cz>
References: <4E9459C8.8080300@ntc.zcu.cz> <4E9EE4B5.9020301@ntc.zcu.cz>
Message-ID: <4E9EE7E7.6000606@ntc.zcu.cz>

On 10/19/11 16:54, Robert Cimrman wrote:
> On 10/18/11 22:13, Ralf Gommers wrote:
>> On Tue, Oct 11, 2011 at 4:59 PM, Robert Cimrman wrote:
>>
>>> Hi,
>>>
>>> I have now spent several hours hunting down a major slowdown of my code
>>> caused (apparently) by using config.add_library() for a reusable part of C
>>> source files instead of just config.add_extension().
>>>
>>> The reason of the slowdown was different, but hard to discern, naming of
>>> options and silent ignoring of non-existing ones:
>>>
>>> add_library() : extra_compiler_args
>>> add_extension() : extra_compile_args
>>>
>>> Other build keys used for the same purpose also differ.
>>>
>>> A bug to be reported, or is this going to be solved by going bento?
>>>
>
>> Bento will use saner names I hope, but filing a bug can't hurt. We're still
>> fixing distutils issues.
>
> Ok. I am getting internal server error at
> http://projects.scipy.org/numpy/newticket now - I will wait some time and try
> again.

got through: http://projects.scipy.org/numpy/ticket/1965

regards,
r.

From akshar.bhosale at gmail.com Sun Oct 23 01:54:26 2011
From: akshar.bhosale at gmail.com (akshar bhosale)
Date: Sun, 23 Oct 2011 11:24:26 +0530
Subject: [Numpy-discussion] libmkl_lapack error in numpy
Message-ID: 

i am getting following error.
python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]],
numpy.complex128).T.I.H'
MKL FATAL ERROR: Cannot load libmkl_lapack.so

have installed numpy 1.6.0 with python 2.6.
i have intel cluster toolkit installed on my system. (11/069 version and
mkl 10.3). i have machine having intel xeon processor and rhel 5.2 x86_64
platform. i am trying with intel compilers.
if i do
python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]],
numpy.complex128).T.I.H'
python: symbol lookup error:
/opt/intel/Compiler/11.0/069/mkl/lib/em64/libmkl_lapack.so: undefined
symbol: mkl_lapack_zgeqrf
my site.cfg is :
####################
[mkl]
mkl_libs = mkl_def, mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_mc
lapack_libs = mkl_lapack95_lp64
library_dirs = /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/lib/intel64/
include_dirs = /opt/intel/Compiler/11.0/069/mkl/include:/opt/intel/Compiler/11.0/069/include/
####################
and intelcompiler.py is :
############################
from distutils.unixccompiler import UnixCCompiler
from numpy.distutils.exec_command import find_executable
import sys

class IntelCCompiler(UnixCCompiler):
    """ A modified Intel compiler compatible with an gcc built Python."""
    compiler_type = 'intel'
    cc_exe = 'icc'
    cc_args = 'fPIC'

    def __init__ (self, verbose=0, dry_run=0, force=0):
        sys.exit(0)
        UnixCCompiler.__init__ (self, verbose,dry_run, force)
        self.cc_exe = 'icc -fPIC '
        compiler = self.cc_exe
        self.set_executables(compiler=compiler,
                             compiler_so=compiler,
                             compiler_cxx=compiler,
                             linker_exe=compiler,
                             linker_so=compiler + ' -shared -lstdc++')

class IntelItaniumCCompiler(IntelCCompiler):
    compiler_type = 'intele'

    # On Itanium, the Intel Compiler used to be called ecc, let's search for
    # it (now it's also icc, so ecc is last in the search).
    for cc_exe in map(find_executable,['icc','ecc']):
        if cc_exe:
            break

class IntelEM64TCCompiler(UnixCCompiler):
    """ A modified Intel x86_64 compiler compatible with a 64bit gcc built
    Python.
    """
    compiler_type = 'intelem'
    cc_exe = 'icc -m64 -fPIC'
    cc_args = "-fPIC -openmp"
    def __init__ (self, verbose=0, dry_run=0, force=0):
        UnixCCompiler.__init__ (self, verbose,dry_run, force)
        self.cc_exe = 'icc -m64 -fPIC -openmp '
        compiler = self.cc_exe
        self.set_executables(compiler=compiler,
                             compiler_so=compiler,
                             compiler_cxx=compiler,
                             linker_exe=compiler,
                             linker_so=compiler + ' -shared -lstdc++')
##########################
LD_LIBRARY_PATH is :
#########################
/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/scalasca-1.3.3/lib:/opt/PBS/lib:/opt/intel/mpi/lib64:/opt/maui/lib:/opt/jdk1.6.0_23/lib:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/usr/local/lib
#########################

---------- Forwarded message ----------
From: akshar bhosale
To: numpy-discussion-request at scipy.org
Date: Sun, 23 Oct 2011 11:19:27 +0530
Subject: libmkl_lapack error
Hi,
i am getting following error.
python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]],
numpy.complex128).T.I.H'
MKL FATAL ERROR: Cannot load libmkl_lapack.so

have installed numpy 1.6.0 with python 2.6.

i have intel cluster toolkit installed on my system. (11/069 version and
mkl 10.3). i have machine having intel xeon processor and rhel 5.2 x86_64
platform. i am trying with intel compilers.
if i do

python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]],
numpy.complex128).T.I.H'
python: symbol lookup error:
/opt/intel/Compiler/11.0/069/mkl/lib/em64/libmkl_lapack.so: undefined
symbol: mkl_lapack_zgeqrf

my site.cfg is :
####################
[mkl]

mkl_libs = mkl_def, mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_mc
lapack_libs = mkl_lapack95_lp64

library_dirs = /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/lib/intel64/
include_dirs = /opt/intel/Compiler/11.0/069/mkl/include:/opt/intel/Compiler/11.0/069/include/
####################
and intelcompiler.py is :
############################
from distutils.unixccompiler import UnixCCompiler
from numpy.distutils.exec_command import find_executable
import sys

class IntelCCompiler(UnixCCompiler):
    """ A modified Intel compiler compatible with an gcc built Python."""
    compiler_type = 'intel'
    cc_exe = 'icc'
    cc_args = 'fPIC'

    def __init__ (self, verbose=0, dry_run=0, force=0):
        sys.exit(0)
        UnixCCompiler.__init__ (self, verbose,dry_run, force)
        self.cc_exe = 'icc -fPIC '
        compiler = self.cc_exe
        self.set_executables(compiler=compiler,
                             compiler_so=compiler,
                             compiler_cxx=compiler,
                             linker_exe=compiler,
                             linker_so=compiler + ' -shared -lstdc++')

class IntelItaniumCCompiler(IntelCCompiler):
    compiler_type = 'intele'

    # On Itanium, the Intel Compiler used to be called ecc, let's search for
    # it (now it's also icc, so ecc is last in the search).
    for cc_exe in map(find_executable,['icc','ecc']):
        if cc_exe:
            break

class IntelEM64TCCompiler(UnixCCompiler):
    """ A modified Intel x86_64 compiler compatible with a 64bit gcc built
    Python.
    """
    compiler_type = 'intelem'
    cc_exe = 'icc -m64 -fPIC'
    cc_args = "-fPIC -openmp"
    def __init__ (self, verbose=0, dry_run=0, force=0):
        UnixCCompiler.__init__ (self, verbose,dry_run, force)
        self.cc_exe = 'icc -m64 -fPIC -openmp '
        compiler = self.cc_exe
        self.set_executables(compiler=compiler,
                             compiler_so=compiler,
                             compiler_cxx=compiler,
                             linker_exe=compiler,
                             linker_so=compiler + ' -shared -lstdc++')
##########################
LD_LIBRARY_PATH is :
#########################
/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/scalasca-1.3.3/lib:/opt/PBS/lib:/opt/intel/mpi/lib64:/opt/maui/lib:/opt/jdk1.6.0_23/lib:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/usr/local/lib
#########################
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
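An editorial aside for anyone debugging a similar failure: the library can
be probed directly from Python before involving numpy. The path below is
taken from the config above and may need adjusting -- note that the error
message says .../lib/em64/... while the config files say .../lib/em64t/...:

import ctypes

# an OSError here points at LD_LIBRARY_PATH or missing dependent MKL
# libraries rather than at numpy itself
ctypes.CDLL('/opt/intel/Compiler/11.0/069/mkl/lib/em64t/libmkl_lapack.so')

import numpy
numpy.show_config()  # shows which libraries numpy was actually built against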
""" compiler_type = 'intelem' cc_exe = 'icc -m64 -fPIC' cc_args = "-fPIC -openmp" def __init__ (self, verbose=0, dry_run=0, force=0): UnixCCompiler.__init__ (self, verbose,dry_run, force) self.cc_exe = 'icc -m64 -fPIC -openmp ' compiler = self.cc_exe self.set_executables(compiler=compiler, compiler_so=compiler, compiler_cxx=compiler, linker_exe=compiler, linker_so=compiler + ' -shared -lstdc++') ########################## LD_LIBRARY_PATH is : ######################### /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/scalasca-1.3.3/lib:/opt/PBS/lib:/opt/intel/mpi/lib64:/opt/maui/lib:/opt/jdk1.6.0_23/lib:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/usr/local/lib ######################### -------------- next part -------------- An HTML attachment was scrubbed... URL: From akshar.bhosale at gmail.com Sun Oct 23 03:35:07 2011 From: akshar.bhosale at gmail.com (akshar bhosale) Date: Sun, 23 Oct 2011 13:05:07 +0530 Subject: [Numpy-discussion] libmkl_lapack error in numpy In-Reply-To: References: Message-ID: Hi, libmkl_lapack.so is added in site.cfg and now the matrix function is not giving an error, but numpy.test hangs. On Sun, Oct 23, 2011 at 11:24 AM, akshar bhosale wrote: > i am getting following error. > python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]], > numpy.complex128).T.I.H' > MKL FATAL ERROR: Cannot load libmkl_lapack.so > > have installed numpy 1.6.0 with python 2.6. > i have intel cluster toolkit installed on my system. (11/069 version and > mlk=10.3). i have machine having intel xeon processor and rhel 5.2 x86_64 > platform. i am trying with intel compilers. 
> if i do > python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]], > numpy.complex128).T.I.H' > python: symbol lookup error: > /opt/intel/Compiler/11.0/069/ > mkl/lib/em64/libmkl_lapack.so: undefined > symbol: mkl_lapack_zgeqrf > my site.cfg is : > #################### > [mkl] > mkl_libs =3D mkl_def, mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_mc > > - Ignored: > lapack_libs =3D mkl_lapack95_lp64 > > library_dirs =3D /opt/intel/Compiler/11.0/069/ > mkl/lib/em64t:/opt/intel/Compiler/11.0/069/lib/intel64/ > include_dirs =3D > > /opt/intel/Compiler/11.0/069/mkl/include:/opt/intel/Compiler/11.0/069/inclu= > de/ > #################### > and intelcompiler.py is : > ############################ > from distutils.unixccompiler import UnixCCompiler > from numpy.distutils.exec_command import find_executable > import sys > > class IntelCCompiler(UnixCCompiler): > """ A modified Intel compiler compatible with an gcc built > Python.""" > compiler_type =3D 'intel' > cc_exe =3D 'icc' > cc_args =3D 'fPIC' > > def __init__ (self, verbose=3D0, dry_run=3D0, force=3D0): > sys.exit(0) > UnixCCompiler.__init__ (self, verbose,dry_run, force) > self.cc_exe =3D 'icc -fPIC ' > compiler =3D self.cc_exe > self.set_executables(compiler=3Dcompiler, > compiler_so=3Dcompiler, > compiler_cxx=3Dcompiler, > linker_exe=3Dcompiler, > linker_so=3Dcompiler + ' -shared -lstdc++') > > class IntelItaniumCCompiler(IntelCCompiler): > compiler_type =3D 'intele' > > # On Itanium, the Intel Compiler used to be called ecc, let's search > fo= > r > # it (now it's also icc, so ecc is last in the search). > for cc_exe in map(find_executable,['icc','ecc']): > if cc_exe: > break > > class IntelEM64TCCompiler(UnixCCompiler): > """ A modified Intel x86_64 compiler compatible with a 64bit gcc > built > Python. > """ > compiler_type =3D 'intelem' > cc_exe =3D 'icc -m64 -fPIC' > cc_args =3D "-fPIC -openmp" > def __init__ (self, verbose=3D0, dry_run=3D0, force=3D0): > UnixCCompiler.__init__ (self, verbose,dry_run, force) > self.cc_exe =3D 'icc -m64 -fPIC -openmp ' > compiler =3D self.cc_exe > self.set_executables(compiler=3Dcompiler, > compiler_so=3Dcompiler, > compiler_cxx=3Dcompiler, > linker_exe=3Dcompiler, > linker_so=3Dcompiler + ' -shared -lstdc++') > ########################## > LD_LIBRARY_PATH is : > ######################### > > /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/scalasca-1.3.3/lib:/opt/PBS= > > /lib:/opt/intel/mpi/lib64:/opt/maui/lib:/opt/jdk1.6.0_23/lib:/opt/intel/Com= > > piler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em6= > > 4t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/l= > > ib:/opt/intel/Compiler/11.0/069/lib/intel64:/opt/intel/Compiler/11.0/069/ip= > > p/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Com= > > piler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Com= > piler/11.0/069/lib/intel64:/usr/local/lib > ######################### > > - Done. > > > > ---------- Forwarded message ---------- > From: akshar bhosale > To: numpy-discussion-request at scipy.org > Date: Sun, 23 Oct 2011 11:19:27 +0530 > Subject: libmkl_lapack error > Hi, > i am getting following error. > python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]], > numpy.complex128).T.I.H' > MKL FATAL ERROR: Cannot load libmkl_lapack.so > > have installed numpy 1.6.0 with python 2.6. > > i have intel cluster toolkit installed on my system. (11/069 version and > mkl 10.3). 
i have machine having intel xeon processor and rhel 5.2 x86_64 > platform. i am trying with intel compilers. > if i do > > python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]], > numpy.complex128).T.I.H' > python: symbol lookup error: > /opt/intel/Compiler/11.0/069/mkl/lib/em64/libmkl_lapack.so: undefined > symbol: mkl_lapack_zgeqrf > > my site.cfg is : > #################### > [mkl] > > mkl_libs = mkl_def, mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_mc > lapack_libs = mkl_lapack95_lp64 > > library_dirs = /opt/intel/Compiler/11.0/069/ > mkl/lib/em64t:/opt/intel/Compiler/11.0/069/lib/intel64/ > include_dirs = > /opt/intel/Compiler/11.0/069/mkl/include:/opt/intel/Compiler/11.0/069/include/ > #################### > and intelcompiler.py is : > ############################ > from distutils.unixccompiler import UnixCCompiler > from numpy.distutils.exec_command import find_executable > import sys > > class IntelCCompiler(UnixCCompiler): > """ A modified Intel compiler compatible with an gcc built Python.""" > compiler_type = 'intel' > cc_exe = 'icc' > cc_args = 'fPIC' > > def __init__ (self, verbose=0, dry_run=0, force=0): > sys.exit(0) > UnixCCompiler.__init__ (self, verbose,dry_run, force) > self.cc_exe = 'icc -fPIC ' > compiler = self.cc_exe > self.set_executables(compiler=compiler, > compiler_so=compiler, > compiler_cxx=compiler, > linker_exe=compiler, > linker_so=compiler + ' -shared -lstdc++') > > class IntelItaniumCCompiler(IntelCCompiler): > compiler_type = 'intele' > > # On Itanium, the Intel Compiler used to be called ecc, let's search > for > # it (now it's also icc, so ecc is last in the search). > for cc_exe in map(find_executable,['icc','ecc']): > if cc_exe: > break > > class IntelEM64TCCompiler(UnixCCompiler): > """ A modified Intel x86_64 compiler compatible with a 64bit gcc built > Python. > """ > compiler_type = 'intelem' > cc_exe = 'icc -m64 -fPIC' > cc_args = "-fPIC -openmp" > def __init__ (self, verbose=0, dry_run=0, force=0): > UnixCCompiler.__init__ (self, verbose,dry_run, force) > self.cc_exe = 'icc -m64 -fPIC -openmp ' > compiler = self.cc_exe > self.set_executables(compiler=compiler, > compiler_so=compiler, > compiler_cxx=compiler, > linker_exe=compiler, > linker_so=compiler + ' -shared -lstdc++') > ########################## > LD_LIBRARY_PATH is : > ######################### > > /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/scalasca-1.3.3/lib:/opt/PBS/lib:/opt/intel/mpi/lib64:/opt/maui/lib:/opt/jdk1.6.0_23/lib:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/usr/local/lib > ######################### > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From akshar.bhosale at gmail.com Sun Oct 23 05:45:07 2011
From: akshar.bhosale at gmail.com (akshar bhosale)
Date: Sun, 23 Oct 2011 15:15:07 +0530
Subject: [Numpy-discussion] [SciPy-Dev] numpy.test hangs
In-Reply-To: 
References: 
Message-ID: 

Hi,

i changed site.cfg :
####site.cfg####
[mkl]
mkl_libs = mkl_def, mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_mc
lapack_libs = mkl_lapack95_lp64, mkl_lapack, mkl_scalapack_ilp64,
mkl_scalapack_lp64
#lapack_libs =mkl_lapack,mkl_scalapack_ilp64,mkl_scalapack_lp64,mkl_lapack95_lp64
library_dirs = /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/lib/intel64/
include_dirs = /opt/intel/Compiler/11.0/069/mkl/include:/opt/intel/Compiler/11.0/069/include/

and now numpy.test hangs at :

python
Python 2.6 (r26:66714, May 29 2011, 15:10:47)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.test(verbose=3)
Running unit tests for numpy
NumPy version 1.6.0
NumPy is installed in /home/aksharb/Python-2.6/lib/python2.6/site-packages/numpy
Python version 2.6 (r26:66714, May 29 2011, 15:10:47) [GCC 4.1.2 20071124 (Red Hat 4.1.2-42)]
nose version 1.0.0
nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext']
nose.selector: INFO: /home/external/unipune/gadre/Python-2.6/lib/python2.6/site-packages/numpy/core/multiarray.so is executable; skipped
nose.selector: INFO: /home/external/unipune/gadre/Python-2.6/lib/python2.6/site-packages/numpy/core/scalarmath.so is executable; skipped
nose.selector: INFO: /home/external/unipune/gadre/Python-2.6/lib/python2.6/site-packages/numpy/core/umath.so is executable; skipped
nose.selector: INFO: /home/external/unipune/gadre/Python-2.6/lib/python2.6/site-packages/numpy/core/multiarray_tests.so is executable; skipped
nose.selector: INFO: /home/external/unipune/gadre/Python-2.6/lib/python2.6/site-packages/numpy/core/umath_tests.so is executable; skipped
nose.selector: INFO: /home/external/unipune/gadre/Python-2.6/lib/python2.6/site-packages/numpy/fft/fftpack_lite.so is executable; skipped
nose.selector: INFO: /home/external/unipune/gadre/Python-2.6/lib/python2.6/site-packages/numpy/linalg/lapack_lite.so is executable; skipped
nose.selector: INFO: /home/external/unipune/gadre/Python-2.6/lib/python2.6/site-packages/numpy/random/mtrand.so is executable; skipped
test_api.test_fastCopyAndTranspose ... ok
test_arrayprint.TestArrayRepr.test_nan_inf ... ok
test_str (test_arrayprint.TestComplexArray) ... ok
Ticket 844. ... ok
test_blasdot.test_blasdot_used ... ok
test_blasdot.test_dot_2args ... ok
test_blasdot.test_dot_3args ... ok
test_blasdot.test_dot_3args_errors ... ok
.
.
.
Test if an appropriate exception is raised when passing bad values to ... ok
Test whether equivalent subarray dtypes hash the same. ... ok
Test whether different subarray dtypes hash differently. ... ok
Test some data types that are equal ... ok
Test some more complicated cases that shouldn't be equal ... ok
Test some simple cases that shouldn't be equal ... ok
test_single_subarray (test_dtype.TestSubarray) ... ok
test_einsum_errors (test_einsum.TestEinSum) ... ok
test_einsum_sums_cfloat128 (test_einsum.TestEinSum) ...
#######################

On Sat, Oct 22, 2011 at 1:46 PM, wrote:

> This is a members-only list. Your message has been automatically
> rejected, since it came from a non-member's email address. Please
> make sure to use the email account that you used to join this list.
>
>
>
> ---------- Forwarded message ----------
> From: akshar bhosale
> To: SciPy Developers List , Discussion of Numerical
> Python
> Date: Sat, 22 Oct 2011 13:46:49 +0530
> Subject: Re: [SciPy-Dev] numpy.test hangs
> Hi,
>
> python
> Python 2.6 (r26:66714, May 29 2011, 15:10:47)
> [GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy
> >>> numpy.show_config()
> lapack_opt_info:
>     libraries = ['mkl_lapack95_lp64', 'mkl_def', 'mkl_intel_lp64',
> 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'pthread']
>     library_dirs = ['/opt/intel/Compiler/11.0/069/mkl/lib/em64t']
>     define_macros = [('SCIPY_MKL_H', None)]
>     include_dirs = ['/opt/intel/Compiler/11.0/069/mkl/include',
> '/opt/intel/Compiler/11.0/069/include/']
> blas_opt_info:
>     libraries = ['mkl_def', 'mkl_intel_lp64', 'mkl_intel_thread',
> 'mkl_core', 'mkl_mc', 'pthread']
>     library_dirs = ['/opt/intel/Compiler/11.0/069/mkl/lib/em64t']
>     define_macros = [('SCIPY_MKL_H', None)]
>     include_dirs = ['/opt/intel/Compiler/11.0/069/mkl/include',
> '/opt/intel/Compiler/11.0/069/include/']
> lapack_mkl_info:
>     libraries = ['mkl_lapack95_lp64', 'mkl_def', 'mkl_intel_lp64',
> 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'pthread']
>     library_dirs = ['/opt/intel/Compiler/11.0/069/mkl/lib/em64t']
>     define_macros = [('SCIPY_MKL_H', None)]
>     include_dirs = ['/opt/intel/Compiler/11.0/069/mkl/include',
> '/opt/intel/Compiler/11.0/069/include/']
> blas_mkl_info:
>     libraries = ['mkl_def', 'mkl_intel_lp64', 'mkl_intel_thread',
> 'mkl_core', 'mkl_mc', 'pthread']
>     library_dirs = ['/opt/intel/Compiler/11.0/069/mkl/lib/em64t']
>     define_macros = [('SCIPY_MKL_H', None)]
>     include_dirs = ['/opt/intel/Compiler/11.0/069/mkl/include',
> '/opt/intel/Compiler/11.0/069/include/']
> mkl_info:
>     libraries = ['mkl_def', 'mkl_intel_lp64', 'mkl_intel_thread',
> 'mkl_core', 'mkl_mc', 'pthread']
>     library_dirs = ['/opt/intel/Compiler/11.0/069/mkl/lib/em64t']
>     define_macros = [('SCIPY_MKL_H', None)]
>     include_dirs = ['/opt/intel/Compiler/11.0/069/mkl/include',
> '/opt/intel/Compiler/11.0/069/include/']
>
> Akshar
>
>
> On Sat, Oct 22, 2011 at 11:54 AM, akshar bhosale
> wrote:
>
>> yes sure..
>> i have intel cluster toolkit installed on my system. (11/069 version and
>> mkl 10.3). i have machine having intel xeon processor and rhel 5.2 x86_64
>> platform. i am trying with intel compilers.
>> if i do
>>
>> python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]],
>> numpy.complex128).T.I.H'
>> python: symbol lookup error:
>> /opt/intel/Compiler/11.0/069/mkl/lib/em64/libmkl_lapack.so: undefined
>> symbol: mkl_lapack_zgeqrf
>>
>> my site.cfg is :
>> ####################
>> [mkl]
>>
>> mkl_libs = mkl_def, mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_mc
>> lapack_libs = mkl_lapack95_lp64
>>
>> library_dirs = /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/lib/intel64/
>> include_dirs = /opt/intel/Compiler/11.0/069/mkl/include:/opt/intel/Compiler/11.0/069/include/
>> ####################
>> and intelcompiler.py is :
>> ############################
>> from distutils.unixccompiler import UnixCCompiler
>> from numpy.distutils.exec_command import find_executable
>> import sys
>>
>> class IntelCCompiler(UnixCCompiler):
>>     """ A modified Intel compiler compatible with an gcc built Python."""
>>     compiler_type = 'intel'
>>     cc_exe = 'icc'
>>     cc_args = 'fPIC'
>>
>>     def __init__ (self, verbose=0, dry_run=0, force=0):
>>         sys.exit(0)
>>         UnixCCompiler.__init__ (self, verbose,dry_run, force)
>>         self.cc_exe = 'icc -fPIC '
>>         compiler = self.cc_exe
>>         self.set_executables(compiler=compiler,
>>                              compiler_so=compiler,
>>                              compiler_cxx=compiler,
>>                              linker_exe=compiler,
>>                              linker_so=compiler + ' -shared -lstdc++')
>>
>> class IntelItaniumCCompiler(IntelCCompiler):
>>     compiler_type = 'intele'
>>
>>     # On Itanium, the Intel Compiler used to be called ecc, let's search
>> for
>>     # it (now it's also icc, so ecc is last in the search).
>>     for cc_exe in map(find_executable,['icc','ecc']):
>>         if cc_exe:
>>             break
>>
>> class IntelEM64TCCompiler(UnixCCompiler):
>>     """ A modified Intel x86_64 compiler compatible with a 64bit gcc built
>>     Python.
>>     """
>>     compiler_type = 'intelem'
>>     cc_exe = 'icc -m64 -fPIC'
>>     cc_args = "-fPIC -openmp"
>>     def __init__ (self, verbose=0, dry_run=0, force=0):
>>         UnixCCompiler.__init__ (self, verbose,dry_run, force)
>>         self.cc_exe = 'icc -m64 -fPIC -openmp '
>>         compiler = self.cc_exe
>>         self.set_executables(compiler=compiler,
>>                              compiler_so=compiler,
>>                              compiler_cxx=compiler,
>>                              linker_exe=compiler,
>>                              linker_so=compiler + ' -shared -lstdc++')
>> ##########################
>> LD_LIBRARY_PATH is :
>> #########################
>> /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/scalasca-1.3.3/lib:/opt/PBS/lib:/opt/intel/mpi/lib64:/opt/maui/lib:/opt/jdk1.6.0_23/lib:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/usr/local/lib
>> #########################
>>
>> -AKSHAR
>>
>>
>>
>> On Sat, Oct 22, 2011 at 11:32 AM, Charles R Harris <
>> charlesr.harris at gmail.com> wrote:
>>
>>>
>>>
>>> On Fri, Oct 21, 2011 at 11:49 PM, akshar bhosale <
>>> akshar.bhosale at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> unfortunately 1.6.1 also hangs on the same place. Can i move ahead with
>>>> installing scipy?
>>>>
>>>>
>>> Hmm. Well, give scipy a try, but it would be nice to know what the
>>> problem is with einsum. I'm thinking compiler, GCC 4.1.2 might be a bit
>>> old, but it could easily be something else. Can you give us more information
>>> about your system?
>>>
>>> Chuck
>>>
>>>>
>>>> On Sat, Oct 22, 2011 at 12:19 AM, Charles R Harris <
>>>> charlesr.harris at gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Oct 21, 2011 at 5:25 AM, akshar bhosale <
>>>>> akshar.bhosale at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> does this mean that numpy is not configured properly or i can ignore
>>>>>> this and go ahead with scipy installation?
>>>>>>
>>>>>
>>>>> Scipy will probably work, but you should really install numpy 1.6.1
>>>>> instead of 1.6.0.
>>>>>
>>>>>
>>>>>
>>>>> Chuck
>>>>>
>>>>> _______________________________________________
>>>>> SciPy-Dev mailing list
>>>>> SciPy-Dev at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> SciPy-Dev mailing list
>>>> SciPy-Dev at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>>
>>>>
>>>
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>
>>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From nadavh at visionsense.com Sun Oct 23 09:21:43 2011
From: nadavh at visionsense.com (Nadav Horesh)
Date: Sun, 23 Oct 2011 06:21:43 -0700
Subject: [Numpy-discussion] array iterators and cython prange
Message-ID: <26FC23E7C398A64083C980D16001012D261BA8F04A@VA3DIAXVS361.RED001.local>

I coded a bilateral filter class in cython based on numpy's neighborhood
iterator (thanks to T. J's code example). I tried to parallelise the code
by replacing the standard loop (commented line 150) by a prange loop (line
151). The result is a series of compilation errors, mainly due to the use
of iterators. Is there an *easy* way to work with numpy iterators while
the GIL is released?

Platform: numpy 1.6.1 on python 2.7.2 and cython 0.15.1
system: gcc on linux

  Nadav
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: newbil.pyx
URL: 
From njs at pobox.com Sun Oct 23 14:21:21 2011
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 23 Oct 2011 11:21:21 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
Message-ID: 

Hi all,

I was surprised today to notice that Mark's NA mask support appears to
have been merged into numpy master and is described in the draft
release notes[1]. My surprise is because merging it to mainline
without any discussion on the list seems to contradict what Travis
wrote in July, that it was being developed as an experiment and
explicitly *not* intended to be merged without further discussion:

"Basically, because there is not consensus and in fact a strong and
reasonable opposition to specific points, Mark's NEP as proposed
cannot be accepted in its entirety right now. However, I believe an
implementation of his NEP is useful and will be instructive in
resolving the issues and so I have instructed him to spend Enthought
time on the implementation. Any changes that need to be made to the
API before it is accepted into a released form of NumPy can still be
made even after most of the implementation is completed as far as I
understand it."[2]

Can anyone explain what the plan is here? Is the idea to continue the
discussion and rework the API while it is in master, delaying the next
release for as long as it takes to achieve consensus? Or is there some
mysterious git thing going on where "master" is actually an
experimental branch and the real mainline development is happening
somewhere else? Or something else I'm not thinking of? Please help me
understand.

-- Nathaniel

[1] https://github.com/numpy/numpy/blob/master/doc/release/2.0.0-notes.rst
[2] http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057579.html

From pav at iki.fi Sun Oct 23 14:52:24 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 23 Oct 2011 20:52:24 +0200
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: 
Message-ID: 

23.10.2011 20:21, Nathaniel Smith wrote:
> I was surprised today to notice that Mark's NA mask support appears to
> have been merged into numpy master and is described in the draft
> release notes[1]. My surprise is because merging it to mainline
> without any discussion on the list seems to contradict what Travis
> wrote in July, that it was being developed as an experiment and
> explicitly *not* intended to be merged without further discussion:

FWIW, the changes did not go in through a "back door", but through a
pull request:

https://github.com/numpy/numpy/pull/141

Whether issues with the API were resolved or not before merging, I
don't know. (One can also ask whether it would be a good idea to
forward noise from the pull requests to the ML.)

[clip]
> Can anyone explain what the plan is here? Is the idea to continue the
> discussion and rework the API while it is in master, delaying the next
> release for as long as it takes to achieve consensus? Or is there some
> mysterious git thing going on where "master" is actually an
> experimental branch and the real mainline development is happening
> somewhere else? Or something else I'm not thinking of? Please help me
> understand.

No, master is supposed to be the integration branch with only finished
stuff in it.

-- 
Pauli Virtanen

From matthew.brett at gmail.com Sun Oct 23 14:54:17 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 23 Oct 2011 11:54:17 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote:
> Hi all,
>
> I was surprised today to notice that Mark's NA mask support appears to
> have been merged into numpy master and is described in the draft
> release notes[1]. My surprise is because merging it to mainline
> without any discussion on the list seems to contradict what Travis
> wrote in July, that it was being developed as an experiment and
> explicitly *not* intended to be merged without further discussion:
>
> "Basically, because there is not consensus and in fact a strong and
> reasonable opposition to specific points, Mark's NEP as proposed
> cannot be accepted in its entirety right now. However, I believe an
> implementation of his NEP is useful and will be instructive in
> resolving the issues and so I have instructed him to spend Enthought
> time on the implementation. Any changes that need to be made to the
> API before it is accepted into a released form of NumPy can still be
> made even after most of the implementation is completed as far as I
> understand it."[2]
>
> Can anyone explain what the plan is here? Is the idea to continue the
> discussion and rework the API while it is in master, delaying the next
> release for as long as it takes to achieve consensus? Or is there some
> mysterious git thing going on where "master" is actually an
> experimental branch and the real mainline development is happening
> somewhere else? Or something else I'm not thinking of? Please help me
> understand.

I don't know about you, but watching the development from a distance
it became increasingly clear to me that this would happen. I'm sure
you've had the experience as I have, of mixing several desirable
changes into the same set of commits, and it's hard work to avoid
this. I imagine this is what happened with Mark's MA changes.

The result is actually an extension of the problems of the original
discussion, which is a feeling that we the community do not have a say
in the development.

I think this email might be a plea to the numpy steering group, and to
Travis in particular, to see if we can use a discussion of this series
of events to decide on a good way to proceed in future.

See you,

Matthew

From charlesr.harris at gmail.com Sun Oct 23 15:53:56 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 23 Oct 2011 13:53:56 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: 
Message-ID: 

On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett wrote:

> Hi,
>
> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote:
> > Hi all,
> >
> > I was surprised today to notice that Mark's NA mask support appears to
> > have been merged into numpy master and is described in the draft
> > release notes[1]. My surprise is because merging it to mainline
> > without any discussion on the list seems to contradict what Travis
> > wrote in July, that it was being developed as an experiment and
> > explicitly *not* intended to be merged without further discussion:
> >
> > "Basically, because there is not consensus and in fact a strong and
> > reasonable opposition to specific points, Mark's NEP as proposed
> > cannot be accepted in its entirety right now. However, I believe an
> > implementation of his NEP is useful and will be instructive in
> > resolving the issues and so I have instructed him to spend Enthought
> > time on the implementation. Any changes that need to be made to the
> > API before it is accepted into a released form of NumPy can still be
> > made even after most of the implementation is completed as far as I
> > understand it."[2]
> >
> > Can anyone explain what the plan is here? Is the idea to continue the
> > discussion and rework the API while it is in master, delaying the next
> > release for as long as it takes to achieve consensus? Or is there some
> > mysterious git thing going on where "master" is actually an
> > experimental branch and the real mainline development is happening
> > somewhere else? Or something else I'm not thinking of? Please help me
> > understand.
>
> I don't know about you, but watching the development from a distance
> it became increasingly clear to me that this would happen. I'm sure
> you've had the experience as I have, of mixing several desirable
> changes into the same set of commits, and it's hard work to avoid
> this. I imagine this is what happened with Mark's MA changes.
>
> The result is actually an extension of the problems of the original
> discussion, which is a feeling that we the community do not have a say
> in the development.
>
> I think this email might be a plea to the numpy steering group, and to
> Travis in particular, to see if we can use a discussion of this series
> of events to decide on a good way to proceed in future.
>
Oh come, people had plenty to say, you and Nathaniel in particular. Mark
pointed to the pull request, anyone who was interested could comment on it,
Benjamin Root did so, for instance. The fact things didn't go the way you
wanted doesn't indicate insufficient discussion. And you are certainly
welcome to put together an alternative and put up a pull request.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matthew.brett at gmail.com Sun Oct 23 15:58:57 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 23 Oct 2011 12:58:57 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris wrote:
>
>
> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett
> wrote:
>>
>> Hi,
>>
>> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote:
>> > Hi all,
>> >
>> > I was surprised today to notice that Mark's NA mask support appears to
>> > have been merged into numpy master and is described in the draft
>> > release notes[1]. My surprise is because merging it to mainline
>> > without any discussion on the list seems to contradict what Travis
>> > wrote in July, that it was being developed as an experiment and
>> > explicitly *not* intended to be merged without further discussion:
>> >
>> > "Basically, because there is not consensus and in fact a strong and
>> > reasonable opposition to specific points, Mark's NEP as proposed
>> > cannot be accepted in its entirety right now. However, I believe an
>> > implementation of his NEP is useful and will be instructive in
>> > resolving the issues and so I have instructed him to spend Enthought
>> > time on the implementation. Any changes that need to be made to the
>> > API before it is accepted into a released form of NumPy can still be
>> > made even after most of the implementation is completed as far as I
>> > understand it."[2]
>> >
>> > Can anyone explain what the plan is here? Is the idea to continue the
>> > discussion and rework the API while it is in master, delaying the next
>> > release for as long as it takes to achieve consensus? Or is there some
>> > mysterious git thing going on where "master" is actually an
>> > experimental branch and the real mainline development is happening
>> > somewhere else? Or something else I'm not thinking of? Please help me
>> > understand.
>>
>> I don't know about you, but watching the development from a distance
>> it became increasingly clear to me that this would happen. I'm sure
>> you've had the experience as I have, of mixing several desirable
>> changes into the same set of commits, and it's hard work to avoid
>> this. I imagine this is what happened with Mark's MA changes.
>>
>> The result is actually an extension of the problems of the original
>> discussion, which is a feeling that we the community do not have a say
>> in the development.
>>
>> I think this email might be a plea to the numpy steering group, and to
>> Travis in particular, to see if we can use a discussion of this series
>> of events to decide on a good way to proceed in future.
>>
>
> Oh come, people had plenty to say, you and Nathaniel in particular. Mark
> pointed to the pull request, anyone who was interested could comment on it,
> Benjamin Root did so, for instance. The fact things didn't go the way you
> wanted doesn't indicate insufficient discussion. And you are certainly
> welcome to put together an alternative and put up a pull request.

I was also guessing that something like this would be the reply to
Nathaniel's post.

I think this reply is rude because it implies some sort of sour-grapes
from Nathaniel, when he is politely referring back to an explicit
reassurance from Travis.

I was trying to avoid this sort of thing by concentrating on thinking
about what to do in future.

Best,

Matthew

From charlesr.harris at gmail.com Sun Oct 23 16:05:24 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 23 Oct 2011 14:05:24 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: 
Message-ID: 

On Sun, Oct 23, 2011 at 12:21 PM, Nathaniel Smith wrote:

> Hi all,
>
> I was surprised today to notice that Mark's NA mask support appears to
> have been merged into numpy master and is described in the draft
> release notes[1]. My surprise is because merging it to mainline
> without any discussion on the list seems to contradict what Travis
> wrote in July, that it was being developed as an experiment and
> explicitly *not* intended to be merged without further discussion:
>
> "Basically, because there is not consensus and in fact a strong and
> reasonable opposition to specific points, Mark's NEP as proposed
> cannot be accepted in its entirety right now. However, I believe an
> implementation of his NEP is useful and will be instructive in
> resolving the issues and so I have instructed him to spend Enthought
> time on the implementation. Any changes that need to be made to the
> API before it is accepted into a released form of NumPy can still be
> made even after most of the implementation is completed as far as I
> understand it."[2]
>
> Can anyone explain what the plan is here? Is the idea to continue the
> discussion and rework the API while it is in master, delaying the next
> release for as long as it takes to achieve consensus? Or is there some
> mysterious git thing going on where "master" is actually an
> experimental branch and the real mainline development is happening
> somewhere else? Or something else I'm not thinking of? Please help me
> understand.
>
No, it's in and has been for a while. You should spend some time with it
and make specific suggestions for improvement.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Sun Oct 23 16:07:46 2011
From: robert.kern at gmail.com (Robert Kern)
Date: Sun, 23 Oct 2011 21:07:46 +0100
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: 
Message-ID: 

On Sun, Oct 23, 2011 at 20:58, Matthew Brett wrote:
> Hi,
>
> On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
> wrote:
>>
>>
>> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett
>> wrote:
>>>
>>> Hi,
>>>
>>> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote:
>>> > Hi all,
>>> >
>>> > I was surprised today to notice that Mark's NA mask support appears to
>>> > have been merged into numpy master and is described in the draft
>>> > release notes[1]. My surprise is because merging it to mainline
>>> > without any discussion on the list seems to contradict what Travis
>>> > wrote in July, that it was being developed as an experiment and
>>> > explicitly *not* intended to be merged without further discussion:
>>> >
>>> > "Basically, because there is not consensus and in fact a strong and
>>> > reasonable opposition to specific points, Mark's NEP as proposed
>>> > cannot be accepted in its entirety right now. However, I believe an
>>> > implementation of his NEP is useful and will be instructive in
>>> > resolving the issues and so I have instructed him to spend Enthought
>>> > time on the implementation. Any changes that need to be made to the
>>> > API before it is accepted into a released form of NumPy can still be
>>> > made even after most of the implementation is completed as far as I
>>> > understand it."[2]
>>> >
>>> > Can anyone explain what the plan is here? Is the idea to continue the
>>> > discussion and rework the API while it is in master, delaying the next
>>> > release for as long as it takes to achieve consensus? Or is there some
>>> > mysterious git thing going on where "master" is actually an
>>> > experimental branch and the real mainline development is happening
>>> > somewhere else? Or something else I'm not thinking of? Please help me
>>> > understand.
>>>
>>> I don't know about you, but watching the development from a distance
>>> it became increasingly clear to me that this would happen. I'm sure
>>> you've had the experience as I have, of mixing several desirable
>>> changes into the same set of commits, and it's hard work to avoid
>>> this. I imagine this is what happened with Mark's MA changes.
>>>
>>> The result is actually an extension of the problems of the original
>>> discussion, which is a feeling that we the community do not have a say
>>> in the development.
>>>
>>> I think this email might be a plea to the numpy steering group, and to
>>> Travis in particular, to see if we can use a discussion of this series
>>> of events to decide on a good way to proceed in future.
>>>
>>
>> Oh come, people had plenty to say, you and Nathaniel in particular. Mark
>> pointed to the pull request, anyone who was interested could comment on it,
>> Benjamin Root did so, for instance. The fact things didn't go the way you
>> wanted doesn't indicate insufficient discussion. And you are certainly
>> welcome to put together an alternative and put up a pull request.
>
> I was also guessing that something like this would be the reply to
> Nathaniel's post.

But it wasn't. It was a reply to your message.

> I think this reply is rude because it implies some sort of sour-grapes
> from Nathaniel, when he is politely referring back to an explicit
> reassurance from Travis.

What Travis assured did happen, just on the pull request (on which
everyone's input was requested and where most "should this be merged?"
discussions are *meant* to happen) rather than on the mailing list.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

From matthew.brett at gmail.com Sun Oct 23 16:12:41 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 23 Oct 2011 13:12:41 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Sun, Oct 23, 2011 at 1:07 PM, Robert Kern wrote:
> On Sun, Oct 23, 2011 at 20:58, Matthew Brett wrote:
>> Hi,
>>
>> On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
>> wrote:
>>>
>>>
>>> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote:
>>>> > Hi all,
>>>> >
>>>> > I was surprised today to notice that Mark's NA mask support appears to
>>>> > have been merged into numpy master and is described in the draft
>>>> > release notes[1]. My surprise is because merging it to mainline
>>>> > without any discussion on the list seems to contradict what Travis
>>>> > wrote in July, that it was being developed as an experiment and
>>>> > explicitly *not* intended to be merged without further discussion:
>>>> >
>>>> > "Basically, because there is not consensus and in fact a strong and
>>>> > reasonable opposition to specific points, Mark's NEP as proposed
>>>> > cannot be accepted in its entirety right now. However, I believe an
>>>> > implementation of his NEP is useful and will be instructive in
>>>> > resolving the issues and so I have instructed him to spend Enthought
>>>> > time on the implementation. Any changes that need to be made to the
>>>> > API before it is accepted into a released form of NumPy can still be
>>>> > made even after most of the implementation is completed as far as I
>>>> > understand it."[2]
>>>> >
>>>> > Can anyone explain what the plan is here? Is the idea to continue the
>>>> > discussion and rework the API while it is in master, delaying the next
>>>> > release for as long as it takes to achieve consensus? Or is there some
>>>> > mysterious git thing going on where "master" is actually an
>>>> > experimental branch and the real mainline development is happening
>>>> > somewhere else? Or something else I'm not thinking of? Please help me
>>>> > understand.
>>>>
>>>> I don't know about you, but watching the development from a distance
>>>> it became increasingly clear to me that this would happen. I'm sure
>>>> you've had the experience as I have, of mixing several desirable
>>>> changes into the same set of commits, and it's hard work to avoid
>>>> this. I imagine this is what happened with Mark's MA changes.
>>>>
>>>> The result is actually an extension of the problems of the original
>>>> discussion, which is a feeling that we the community do not have a say
>>>> in the development.
>>>>
>>>> I think this email might be a plea to the numpy steering group, and to
>>>> Travis in particular, to see if we can use a discussion of this series
>>>> of events to decide on a good way to proceed in future.
>>>>
>>>
>>> Oh come, people had plenty to say, you and Nathaniel in particular. Mark
>>> pointed to the pull request, anyone who was interested could comment on it,
>>> Benjamin Root did so, for instance. The fact things didn't go the way you
>>> wanted doesn't indicate insufficient discussion. And you are certainly
>>> welcome to put together an alternative and put up a pull request.
>>
>> I was also guessing that something like this would be the reply to
>> Nathaniel's post.
>
> But it wasn't. It was a reply to your message.

If you read the message again I think you will see that, although it
is addressed to me, it is referring to Nathaniel's question which was,
'Why was this not discussed as promised'. My post was 'This was
obviously going to happen and that is a problem, do you all agree and
what can we do about it?'.

>> I think this reply is rude because it implies some sort of sour-grapes
>> from Nathaniel, when he is politely referring back to an explicit
>> reassurance from Travis.
>
> What Travis assured did happen, just on the pull request (on which
> everyone's input was requested and where most "should this be merged?"
> discussions are *meant* to happen) rather than on the mailing list.

It just isn't reasonable to ask for high-level API discussions on the
pull-request in this situation.

Unless Travis tells me he did mean that, I can only assume that he
didn't and he meant that we would revisit the high-level mailing list
discussions - on the mailing list.

Best,

Matthew

From sransom at nrao.edu Sun Oct 23 16:16:46 2011
From: sransom at nrao.edu (Scott Ransom)
Date: Sun, 23 Oct 2011 16:16:46 -0400
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: 
Message-ID: <4EA4762E.9000406@nrao.edu>

On 10/23/2011 04:07 PM, Robert Kern wrote:
> On Sun, Oct 23, 2011 at 20:58, Matthew Brett wrote:
>> Hi,
>>
>> On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris
>> wrote:
>>>
>>>
>>> On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Sun, Oct 23, 2011 at 11:21 AM, Nathaniel Smith wrote:
>>>>> Hi all,
>>>>>
>>>>> I was surprised today to notice that Mark's NA mask support appears to
>>>>> have been merged into numpy master and is described in the draft
>>>>> release notes[1]. My surprise is because merging it to mainline
>>>>> without any discussion on the list seems to contradict what Travis
>>>>> wrote in July, that it was being developed as an experiment and
>>>>> explicitly *not* intended to be merged without further discussion:
>>>>>
>>>>> "Basically, because there is not consensus and in fact a strong and
>>>>> reasonable opposition to specific points, Mark's NEP as proposed
>>>>> cannot be accepted in its entirety right now. However, I believe an
>>>>> implementation of his NEP is useful and will be instructive in
>>>>> resolving the issues and so I have instructed him to spend Enthought
>>>>> time on the implementation. Any changes that need to be made to the
>>>>> API before it is accepted into a released form of NumPy can still be
>>>>> made even after most of the implementation is completed as far as I
>>>>> understand it."[2]
>>>>>
>>>>> Can anyone explain what the plan is here? Is the idea to continue the
>>>>> discussion and rework the API while it is in master, delaying the next
>>>>> release for as long as it takes to achieve consensus? Or is there some
>>>>> mysterious git thing going on where "master" is actually an
>>>>> experimental branch and the real mainline development is happening
>>>>> somewhere else? Or something else I'm not thinking of? Please help me
>>>>> understand.
>>>>
>>>> I don't know about you, but watching the development from a distance
>>>> it became increasingly clear to me that this would happen. I'm sure
>>>> you've had the experience as I have, of mixing several desirable
>>>> changes into the same set of commits, and it's hard work to avoid
>>>> this. I imagine this is what happened with Mark's MA changes.
>>>>
>>>> The result is actually an extension of the problems of the original
>>>> discussion, which is a feeling that we the community do not have a say
>>>> in the development.
>>>>
>>>> I think this email might be a plea to the numpy steering group, and to
>>>> Travis in particular, to see if we can use a discussion of this series
>>>> of events to decide on a good way to proceed in future.
>>>>
>>>
>>> Oh come, people had plenty to say, you and Nathaniel in particular. Mark
>>> pointed to the pull request, anyone who was interested could comment on it,
>>> Benjamin Root did so, for instance. The fact things didn't go the way you
>>> wanted doesn't indicate insufficient discussion. And you are certainly
>>> welcome to put together an alternative and put up a pull request.
>>
>> I was also guessing that something like this would be the reply to
>> Nathaniel's post.
> > But it wasn't. It was a reply to your message. > >> I think this reply is rude because it implies some sort of sour-grapes >> from Nathaniel, when he is politely referring back to an explicit >> reassurance from Travis. > > What Travis assured did happen, just on the pull request (on which > everyone's input was requested and where most "should this be merged?" > discussions are *meant* to happen) rather than on the mailing list. Except that for a project with a large user community (like numpy), you will _not_ get the feedback you are looking for on github pull-request pages. That's because most users do not look at detailed developer related things like pull requests. But they do read the mailing list. I don't use these features so I don't have a dog in this fight. But potentially controversial changes really should be discussed on the mailing list rather than on pull requests (and yes, I know that there was a lot of discussion about this stuff some months ago). Scott -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From njs at pobox.com Sun Oct 23 16:49:43 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 23 Oct 2011 13:49:43 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: References: Message-ID: On Sun, Oct 23, 2011 at 12:53 PM, Charles R Harris wrote: > On Sun, Oct 23, 2011 at 12:54 PM, Matthew Brett > wrote: >> I think this email might be a plea to the numpy steering group, and to >> Travis in particular, to see if we can use a discussion of this series >> of events to decide on a good way to proceed in future. > > Oh come, people had plenty to say, you and Nathaniel in particular.? Mark > pointed to the pull request, anyone who was interested could comment on it, Ah, this helps answer my initial question -- I can see how you might have thought things were more resolved if you thought that we were aware of the pull request and chose not to participate. That's a reasonable source of confusion. But I (and presumably others) were unaware of the pull request, because it turns out that actually Mark did *not* point to the pull request, at least in email to either me or numpy-discussion. As far as I can tell, the first time that pull request has ever been mentioned on the list is in Pauli's email today. (I did worry I might have missed it, so I just double-checked the archives for August 18-August 27, which is the time period the pull request was open, and couldn't find anything there.) (Also, for the record, I'd ask that next time you want to make sure that there has been sufficient discussion on a controversial feature that has "strong and reasonable opposition", you make more of an effort to make sure that the relevant stakeholders are aware...?) > Benjamin Root did so, for instance. The fact things didn't go the way you > wanted doesn't indicate insufficient discussion. And you are certainly > welcome to put together an alternative and put up a pull request. In the interests of not turning this into a game of procedural brinksmanship, can we agree that the point of pull requests and such is to make sure that code which ends up in numpy releases generally matches what the community wants? Obviously the community has not reached a consensus on this code and API, so I'll prepare a pull request to temporarily revert the change, and we can work from there. 
-- Nathaniel

From ben.root at ou.edu  Sun Oct 23 17:28:01 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Sun, 23 Oct 2011 16:28:01 -0500
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References:
Message-ID:

On Sunday, October 23, 2011, Nathaniel Smith wrote:
> [...]
> In the interests of not turning this into a game of procedural
> brinksmanship, can we agree that the point of pull requests and such is
> to make sure that code which ends up in numpy releases generally matches
> what the community wants? Obviously the community has not reached a
> consensus on this code and API, so I'll prepare a pull request to
> temporarily revert the change, and we can work from there.

The discussion started on Mark's branches, which were referred to several
times in emails (that's how I started). When it reached a particular
level of maturity, a pull request was made and additional work went into
it. The initial discussion happened for quite a while.

Plus, my understanding is that it isn't the full NEP, but the core parts
(but I haven't checked in a while).

Ben Root

From efiring at hawaii.edu  Sun Oct 23 17:29:29 2011
From: efiring at hawaii.edu (Eric Firing)
Date: Sun, 23 Oct 2011 11:29:29 -1000
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References:
Message-ID: <4EA48739.4010206@hawaii.edu>

On 10/23/2011 10:49 AM, Nathaniel Smith wrote:
> But I (and presumably others) were unaware of the pull request, because
> it turns out that actually Mark did *not* point to the pull request, at
> least in email to either me or numpy-discussion. As far as I can tell,
> the first time that pull request has ever been mentioned on the list is
> in Pauli's email today. (I did worry I might have missed it, so I just
> double-checked the archives for August 18-August 27, which is the time
> period the pull request was open, and couldn't find anything there.)

Ideally, Mark's message announcing that his branch was ready for testing
(a message that started a thread of constructive comment) would have
mentioned the pull request:
http://www.mail-archive.com/numpy-discussion at scipy.org/msg33151.html

Ultimately, though, the numpy core developers must decide what goes in
and what does not. Consensus is desirable but may not always be possible
or optimal, especially if "consensus" is interpreted as "unanimity".
There is a risk in deciding to accept a major change, but it is mitigated
by the ability to make future changes, and it is a risk that must be
taken if progress is to be made. As a numpy user, I was pleased to see
Travis make the decision that Mark should get on with the coding, and I
was pleased to see Charles make the decision to merge the pull request.

Eric

From charlesr.harris at gmail.com  Sun Oct 23 17:42:42 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 23 Oct 2011 15:42:42 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References:
Message-ID:

On Sun, Oct 23, 2011 at 3:28 PM, Benjamin Root wrote:
> [...]
> The discussion started on Mark's branches, which were referred to
> several times in emails (that's how I started). When it reached a
> particular level of maturity, a pull request was made and additional
> work went into it. The initial discussion happened for quite a while.
>
> Plus, my understanding is that it isn't the full NEP, but the core parts
> (but I haven't checked in a while).

In its current state, it is a working implementation that can be used to
explore the API. Bit patterns are missing and the masks are handled at
the iterator level rather than in the low-level ufunc loops, so it isn't
particularly fast. IIRC, Mark was careful to leave some hooks for further
development and also set things up so that in the future masks could be
adapted to allow different mask values with different interpretations.

Chuck

From njs at pobox.com  Sun Oct 23 18:34:27 2011
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 23 Oct 2011 15:34:27 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: <4EA48739.4010206@hawaii.edu>
References: <4EA48739.4010206@hawaii.edu>
Message-ID:

On Sun, Oct 23, 2011 at 2:29 PM, Eric Firing wrote:
> Ultimately, though, the numpy core developers must decide what goes in
> and what does not. Consensus is desirable but may not always be possible
> or optimal, especially if "consensus" is interpreted as "unanimity".
> There is a risk in deciding to accept a major change, but it is
> mitigated by the ability to make future changes, and it is a risk that
> must be taken if progress is to be made. As a numpy user, I was pleased
> to see Travis make the decision that Mark should get on with the coding,
> and I was pleased to see Charles make the decision to merge the pull
> request.

Well, let's not jump to conclusions -- this is why I wrote an email
asking questions in the first place :-). Consensus certainly does not
mean unanimity, but yes, of course, sometimes disagreements are
irreconcilable. As a benevolent dictator[1] on other projects I've been
stuck dealing with some of these myself. But of the two core numpy
developers who have been most involved in this, Charles has just stated
that he thought there had been more discussion than had actually
occurred, and Travis described a "reasonable opposition", so it's not at
all clear to me that the core developers have decided that no consensus
is possible and they simply have to step in.

(And in general, irreconcilable differences are quite rare in FOSS
projects... e.g., I remember the Subversion folks set up a voting
procedure to handle these cases, and then the only time they used it in
like a 5 year period was to settle an argument about code formatting.
Insisting on consensus really does mostly work, even though it does
often take longer than one would like. And in this case I do think we
can come up with an API that will make everyone happy, but that Mark's
current API probably can't be incrementally evolved to become that API.)
So if there's been an executive decision then I can live with it, but I'd
like to see someone say that before I assume it's true. It's just as
likely that there was confusion, or Charles jumped the gun, or whatever,
and that consensus is still useful and desired in this case. I hope so.

[1] https://secure.wikimedia.org/wikipedia/en/wiki/Benevolent_Dictator_For_Life

-- Nathaniel

From efiring at hawaii.edu  Sun Oct 23 20:07:00 2011
From: efiring at hawaii.edu (Eric Firing)
Date: Sun, 23 Oct 2011 14:07:00 -1000
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu>
Message-ID: <4EA4AC24.5060805@hawaii.edu>

On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
> like. And in this case I do think we can come up with an API that will
> make everyone happy, but that Mark's current API probably can't be
> incrementally evolved to become that API.)

No one could object to coming up with an API that makes everyone happy,
provided that it actually gets coded up, tested, and is found to be fast
and maintainable. When you say the API probably can't be evolved, do you
mean that the underlying implementation also has to be redone? And if so,
who will do it, and when?

Eric

From wesmckinn at gmail.com  Mon Oct 24 01:23:51 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 24 Oct 2011 01:23:51 -0400
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: <4EA4AC24.5060805@hawaii.edu>
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu>
Message-ID:

On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing wrote:
> [...]
> No one could object to coming up with an API that makes everyone happy,
> provided that it actually gets coded up, tested, and is found to be fast
> and maintainable. When you say the API probably can't be evolved, do you
> mean that the underlying implementation also has to be redone? And if
> so, who will do it, and when?

I personally am a bit apprehensive as I am worried about the masked array
abstraction "leaking" through to users of pandas, something which I
simply will not accept (why I decided against using numpy.ma early on,
that + performance problems). Basically if having an understanding of
masked arrays is a prerequisite for using pandas, the whole thing is DOA
to me as it undermines the usability arguments I've been making about
switching to Python (from R) for data analysis and statistical computing.

Performance is also a concern, but based on prior discussions it seems a
great deal can be done there.

- Wes

From nadavh at visionsense.com  Mon Oct 24 01:57:28 2011
From: nadavh at visionsense.com (Nadav Horesh)
Date: Sun, 23 Oct 2011 22:57:28 -0700
Subject: [Numpy-discussion] neighborhood iterator speed
Message-ID: <26FC23E7C398A64083C980D16001012D261BA8F04D@VA3DIAXVS361.RED001.local>

I am trying to replace an old code (bilateral filter) that relies on
ndimage.generic_filter with the neighborhood iterator. In the old code,
generic_filter generates a contiguous copy of the neighborhood, thus the
(cython) code could use a C loop to iterate over the neighbourhood copy.
In the new code version, PyArrayNeighborhoodIter_Next must be called to
retrieve every neighbourhood item. The results of rough benchmarking to
compare bilateral filtering on a 1000x1000 array:

Old code (ndimage.generic_filter):  16.5 sec
New code (neighborhood iteration):  60.5 sec
New code with PyArrayNeighborhoodIter_Next omitted: 1.5 sec

* The last benchmark is not "real" since the omitted call is a must. It
  just demonstrates the iterator overhead.
* I assume the main overhead in the old code is the Python function
  callback process. There are instructions in the manual on how to wrap a
  C code for a faster callback, but I would rather use the neighbourhood
  iterator as I consider it more generic.

If PyArrayNeighborhoodIter_Reset could (optionally) copy the relevant
data (as generic_filter does) it would provide a major speed up in many
cases.

Nadav
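For reference, the generic_filter approach described above is wired up
roughly as in the following sketch -- the callback body and the kernel
size here are placeholders, not the actual bilateral code:

import numpy as np
from scipy import ndimage

def callback(buf):
    # buf is a flat, contiguous copy of the neighbourhood, so plain
    # vectorized numpy (or a C loop in cython) can run over it directly
    return buf.mean()   # stand-in for the bilateral computation

image = np.random.rand(1000, 1000)
result = ndimage.generic_filter(image, callback, size=5, mode='mirror')

The per-call cost of entering the Python callback is what dominates the
16.5 sec figure; the copy itself is cheap.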
From cournape at gmail.com  Mon Oct 24 03:37:53 2011
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 24 Oct 2011 08:37:53 +0100
Subject: [Numpy-discussion] neighborhood iterator speed
In-Reply-To: <26FC23E7C398A64083C980D16001012D261BA8F04D@VA3DIAXVS361.RED001.local>
References: <26FC23E7C398A64083C980D16001012D261BA8F04D@VA3DIAXVS361.RED001.local>
Message-ID:

On Mon, Oct 24, 2011 at 6:57 AM, Nadav Horesh wrote:
> I am trying to replace an old code (bilateral filter) that relies on
> ndimage.generic_filter with the neighborhood iterator. [...]
> * I assume the main overhead in the old code is the Python function
>   callback process. [...]

I am afraid the cost is unavoidable: you are really trading cpu for
memory. When using PyArrayNeighborhoodIter_Next, there is a loop with a
conditional within, and I don't think those can easily be avoided without
losing genericity.

Which mode are you using when creating the neighborhood iterator ?

There used to be a PyArrayNeighborhoodIter_Next2d; I don't know why I
commented it out. You could try it to see if you can get faster.

> If PyArrayNeighborhoodIter_Reset could (optionally) copy the relevant
> data (as generic_filter does) it would provide a major speed up in many
> cases.

Optionally copying may be an option, but it would make more sense to do
it at creation time than during reset, no ? Something like a binary and
with the current mode flag,

cheers,

David

From nadavh at visionsense.com  Mon Oct 24 05:48:10 2011
From: nadavh at visionsense.com (Nadav Horesh)
Date: Mon, 24 Oct 2011 02:48:10 -0700
Subject: [Numpy-discussion] neighborhood iterator speed
In-Reply-To:
References: <26FC23E7C398A64083C980D16001012D261BA8F04D@VA3DIAXVS361.RED001.local>
Message-ID: <26FC23E7C398A64083C980D16001012D261B9F1C27@VA3DIAXVS361.RED001.local>

* Iterator mode: Mirror. Does the mode make a huge difference?
* I can not find any reference to PyArrayNeighborhoodIter_Next2d; where
  can I find it?
* I think that making a copy on reset is needed (maybe in addition to at
  creation time), since there is a reset for every change of the parent
  iterator, and only after this change can the neighborhood be
  determined.
* What do you think about the following idea?
    * A neighbourhood iterator generator that also accepts a buffer to
      copy the neighbourhood into.
    * A reset function that would refill the buffer after each parent
      iterator modification.

Nadav

-----Original Message-----
From: numpy-discussion-bounces at scipy.org On Behalf Of David Cournapeau
Sent: Monday, October 24, 2011 9:38 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] neighborhood iterator speed

[...]

From xscript at gmx.net  Mon Oct 24 07:19:28 2011
From: xscript at gmx.net (Lluís)
Date: Mon, 24 Oct 2011 13:19:28 +0200
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: (Nathaniel Smith's message of "Sun, 23 Oct 2011 11:21:21 -0700")
References:
Message-ID: <87fwiiljr2.fsf@ginnungagap.bsc.es>

Nathaniel Smith writes:
[...]
> Is the idea to continue the discussion and rework the API while it is in
> master, delaying the next release for as long as it takes to achieve
> consensus?
Well, for those who missed it, I think the first thing to do should be to
carefully read and discuss the contents of:

https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst

Lluis

-- 
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth

From cournape at gmail.com  Mon Oct 24 07:56:42 2011
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 24 Oct 2011 12:56:42 +0100
Subject: [Numpy-discussion] neighborhood iterator speed
In-Reply-To: <26FC23E7C398A64083C980D16001012D261B9F1C27@VA3DIAXVS361.RED001.local>
References: <26FC23E7C398A64083C980D16001012D261BA8F04D@VA3DIAXVS361.RED001.local> <26FC23E7C398A64083C980D16001012D261B9F1C27@VA3DIAXVS361.RED001.local>
Message-ID:

On Mon, Oct 24, 2011 at 10:48 AM, Nadav Horesh wrote:
> * Iterator mode: Mirror. Does the mode make a huge difference?

It could, at least in principle. The underlying translate function is
called often enough that a slight difference can be significant.

> * I can not find any reference to PyArrayNeighborhoodIter_Next2d; where
>   can I find it?

I think it would look like:

static NPY_INLINE int
PyArrayNeighborhoodIter_Next2d(PyArrayNeighborhoodIterObject* iter)
{
    _PyArrayNeighborhoodIter_IncrCoord2d(iter);
    iter->dataptr = iter->translate((PyArrayIterObject*)iter, iter->coordinates);
    return 0;
}

The ...IncrCoord2d macro avoids one loop, which may be useful (or not).
The big issue here is the translate method call that cannot be inlined
because of the "polymorphism" of the neighborhood iterator. But the only
way to avoid this would be to have many different iterators so that the
underlying translate function is known. Copying the data makes the call
to translate unnecessary (but adds the penalty of one more conditional on
every PyArrayNeighborhoodIter_Next).

> * I think that making a copy on reset is needed (maybe in addition to at
>   creation time), since there is a reset for every change of the parent
>   iterator, and only after this change can the neighborhood be
>   determined.

you're right of course, I forgot about the parent iterator.

> * What do you think about the following idea?
>     * A neighbourhood iterator generator that also accepts a buffer to
>       copy the neighbourhood into.
>     * A reset function that would refill the buffer after each parent
>       iterator modification.

The issue with giving the buffer is that one needs to be careful about
the size and all. What's your use case for passing the buffer ?

David
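At the Python level, the copy-into-a-buffer idea amounts to something
like the sketch below (1-d only, and assuming the iterator's mirror mode
behaves like np.pad's 'symmetric' padding, which is a guess):

import numpy as np

def neighborhoods_copied(arr, radius):
    # pad once, then every neighbourhood is a cheap contiguous slice
    padded = np.pad(arr, radius, mode='symmetric')
    for i in range(arr.shape[0]):
        yield padded[i:i + 2 * radius + 1]

# e.g. list(neighborhoods_copied(np.array([1, 2, 3]), 1))
# -> [array([1, 1, 2]), array([1, 2, 3]), array([2, 3, 3])]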
-----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of David Cournapeau Sent: Monday, October 24, 2011 1:57 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] neighborhood iterator speed On Mon, Oct 24, 2011 at 10:48 AM, Nadav Horesh wrote: > * Iterator mode: Mirror. Does the mode make a huge difference? It could, at least in principle. The underlying translate function is called often enough that a slight different can be significant. > * I can not find any reference to PyArrayNeightborhoodIter_Next2d, where can I find it? I think it would look like: static NPY_INLINE int PyArrayNeighborhoodIter_Next2d(PyArrayNeighborhoodIterObject* iter) { _PyArrayNeighborhoodIter_IncrCoord2d(iter); iter->dataptr = iter->translate((PyArrayIterObject*)iter, iter->coordinates); return 0; } The ...IncrCoord2 macro avoid one loop, which may be useful (or not). The big issue here is the translate method call that cannot be inlined because of the "polymorphism" of neighborhood iterator. But the only way to avoid this would be to have many different iterators so that the underlying translate function is known. Copying the data makes the call to translate unnecessary (but adds the penalty of one more conditional on every PyArrayNeighborhood_Next. > * I think that making a copy on reset is (maybe in addition to the creation), since there is a reset for every change of the parent iterator, and after this change, the neighborhood can be determined. you're right of course, I forgot about the parent iterator. > * What do you think about the following idea? > * A neighbourhood iterator generator that accepts also a buffer to copy in the neighbourhood. > * A reset function that would refill the buffer after each parent iterator modification The issue with giving the buffer is that one needs to be carefull about the size and all. What's your usecase to pass the buffer ? David _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion __________ Information from ESET NOD32 Antivirus, version of virus signature database 4628 (20091122) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com From opossumnano at gmail.com Mon Oct 24 09:57:32 2011 From: opossumnano at gmail.com (Tiziano Zito) Date: Mon, 24 Oct 2011 15:57:32 +0200 Subject: [Numpy-discussion] ANN: MDP 3.2 released! Message-ID: <20111024135732.GE28362@tulpenbaum.cognition.tu-berlin.de> We are glad to announce release 3.2 of the Modular toolkit for Data Processing (MDP). MDP is a Python library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software. The base of available algorithms includes signal processing methods (Principal Component Analysis, Independent Component Analysis, Slow Feature Analysis), manifold learning methods ([Hessian] Locally Linear Embedding), several classifiers, probabilistic methods (Factor Analysis, RBM), data pre-processing methods, and many others. What's new in version 3.2? 
--------------------------
- improved sklearn wrappers
- update sklearn, shogun, and pp wrappers to new versions
- do not leave temporary files around after testing
- refactoring and cleaning up of HTML exporting features
- improve export of signature and doc-string to public methods
- fixed and updated FastICANode to closely resemble the original Matlab
  version (thanks to Ben Willmore)
- support for new numpy version
- new NeuralGasNode (thanks to Michael Schmuker)
- several bug fixes and improvements

We recommend all users to upgrade.

Resources
---------
Download: http://sourceforge.net/projects/mdp-toolkit/files
Homepage: http://mdp-toolkit.sourceforge.net
Mailing list: http://lists.sourceforge.net/mailman/listinfo/mdp-toolkit-users

Acknowledgments
---------------
We thank the contributors to this release: Michael Schmuker, Ben
Willmore.

The MDP developers,
Pietro Berkes
Zbigniew Jędrzejewski-Szmek
Rike-Benjamin Schuppner
Niko Wilbert
Tiziano Zito

From cournape at gmail.com  Mon Oct 24 10:04:27 2011
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 24 Oct 2011 15:04:27 +0100
Subject: [Numpy-discussion] neighborhood iterator speed
In-Reply-To: <26FC23E7C398A64083C980D16001012D261B9F1C6B@VA3DIAXVS361.RED001.local>
References: <26FC23E7C398A64083C980D16001012D261BA8F04D@VA3DIAXVS361.RED001.local> <26FC23E7C398A64083C980D16001012D261B9F1C27@VA3DIAXVS361.RED001.local> <26FC23E7C398A64083C980D16001012D261B9F1C6B@VA3DIAXVS361.RED001.local>
Message-ID:

On Mon, Oct 24, 2011 at 1:23 PM, Nadav Horesh wrote:
> * I'll try to implement the 2D iterator as far as my programming
>   expertise goes. It might take a few days.

I am pretty sure the code is in the history, if you are patient enough to
look for it in the git history. I can't remember why I removed it (maybe
because it was not faster ?).

> * There is a risk in providing a buffer pointer, and for my (and
>   probably most) use cases it is better for the iterator constructor to
>   provide it. I was thinking about the possibility of giving the
>   iterator a shared-memory pointer, to open a door for multiprocessing.
>   Maybe it is better instead to provide a contiguous ndarray object to
>   enable a sanity check.

One could ask for an optional buffer (if NULL -> auto-allocation). But I
would need a more detailed explanation about what you are trying to do to
warrant changing the API here.

cheers,

David

From charlesr.harris at gmail.com  Mon Oct 24 10:40:40 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 24 Oct 2011 08:40:40 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu>
Message-ID:

On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney wrote:
> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing wrote:
> > [...]
> > No one could object to coming up with an API that makes everyone
> > happy, provided that it actually gets coded up, tested, and is found
> > to be fast and maintainable. When you say the API probably can't be
> > evolved, do you mean that the underlying implementation also has to be
> > redone? And if so, who will do it, and when?
> > Eric
>
> I personally am a bit apprehensive as I am worried about the masked
> array abstraction "leaking" through to users of pandas, something which
> I simply will not accept (why I decided against using numpy.ma early on,
> that + performance problems). Basically if having an understanding of
> masked arrays is a prerequisite for using pandas, the whole thing is DOA
> to me as it undermines the usability arguments I've been making about
> switching to Python (from R) for data analysis and statistical
> computing.

The missing data functionality looks far more like R than numpy.ma.

Chuck

From charlesr.harris at gmail.com  Mon Oct 24 10:54:57 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 24 Oct 2011 08:54:57 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu>
Message-ID:

On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris wrote:
> [...]
> The missing data functionality looks far more like R than numpy.ma.

For instance

In [8]: a = arange(5, maskna=1)

In [9]: a[2] = np.NA

In [10]: a.mean()
Out[10]: NA(dtype='float64')

In [11]: a.mean(skipna=1)
Out[11]: 2.0

In [12]: a = arange(5)

In [13]: b = a.view(maskna=1)

In [14]: a.mean()
Out[14]: 2.0

In [15]: b[2] = np.NA

In [16]: b.mean()
Out[16]: NA(dtype='float64')

In [17]: b.mean(skipna=1)
Out[17]: 2.0

Chuck
From pav at iki.fi  Mon Oct 24 11:31:01 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Mon, 24 Oct 2011 17:31:01 +0200
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu>
Message-ID:

24.10.2011 16:40, Charles R Harris kirjoitti:
[clip]
> The missing data functionality looks far more like R than numpy.ma

... and masked arrays must be explicitly requested by the user [1].

The MA stuff can "leak through" only if the user makes use of a library
that returns masked results (or explicitly creates masked arrays), but as
far as I understand that's about the same situation as with np.ma.

.. [1] http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html

-- 
Pauli Virtanen

From nadavh at visionsense.com  Mon Oct 24 11:34:57 2011
From: nadavh at visionsense.com (Nadav Horesh)
Date: Mon, 24 Oct 2011 08:34:57 -0700
Subject: [Numpy-discussion] neighborhood iterator speed
In-Reply-To:
References: <26FC23E7C398A64083C980D16001012D261BA8F04D@VA3DIAXVS361.RED001.local> <26FC23E7C398A64083C980D16001012D261B9F1C27@VA3DIAXVS361.RED001.local> <26FC23E7C398A64083C980D16001012D261B9F1C6B@VA3DIAXVS361.RED001.local>
Message-ID: <26FC23E7C398A64083C980D16001012D261B9F1DFA@VA3DIAXVS361.RED001.local>

My use case is a bilateral filter: a convolution-like filter used mainly
in image processing, which may use relatively large convolution kernels
(on the order of 50x50). I would like to run the inner loop (iteration
over the neighbourhood) with direct indexing (in cython code) rather than
using the slow iterator, in order to save time.

A separate issue is the new cython parallel loop that raises the need for
GIL-free numpy iterators (I might be wrong though). Anyway, it is not
urgent for me.

Nadav

-----Original Message-----
From: numpy-discussion-bounces at scipy.org On Behalf Of David Cournapeau
Sent: Monday, October 24, 2011 4:04 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] neighborhood iterator speed

[...]
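The per-pixel computation of such a bilateral filter looks roughly like
the sketch below -- this follows the usual textbook definition (spatial
Gaussian times range Gaussian), not Nadav's actual code:

import numpy as np

def bilateral_pixel(window, spatial_w, sigma_r):
    # window: (k, k) neighbourhood; spatial_w: precomputed (k, k)
    # spatial Gaussian weights; sigma_r: range (intensity) sigma
    center = window[window.shape[0] // 2, window.shape[1] // 2]
    range_w = np.exp(-(window - center) ** 2 / (2.0 * sigma_r ** 2))
    w = spatial_w * range_w
    return (w * window).sum() / w.sum()

With a 50x50 kernel this inner loop touches 2500 items per pixel, which
is why per-item iterator overhead dominates the timings above.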
From wesmckinn at gmail.com  Mon Oct 24 13:12:15 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 24 Oct 2011 13:12:15 -0400
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu>
Message-ID:

On Mon, Oct 24, 2011 at 10:54 AM, Charles R Harris wrote:
> [...]
>> The missing data functionality looks far more like R than numpy.ma.
>
> For instance
>
> In [8]: a = arange(5, maskna=1)
>
> In [9]: a[2] = np.NA
>
> In [10]: a.mean()
> Out[10]: NA(dtype='float64')
>
> In [11]: a.mean(skipna=1)
> Out[11]: 2.0
>
> [...]

I don't really agree with you.

Some sample R code:

> arr <- rnorm(10)
> arr[5:8] <- NA
> arr
 [1]  0.6451460 -1.1285552  0.6869828  0.4018868         NA         NA
 [7]         NA         NA  0.3322803 -1.9201257

In your examples you had to pass maskna=True -- I suppose that my only
recourse would be to make sure that every array inside a DataFrame, for
example, has maskna=True set. I'll have to look in more detail and see if
it's feasible/desirable. There's a memory cost to pay, but you can't get
the functionality for free. I may just end up sticking with NaN as it's
worked pretty well so far the last few years -- it's an impure solution
but one with reasonably good performance characteristics in the places
that matter.
From charlesr.harris at gmail.com  Mon Oct 24 13:48:40 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 24 Oct 2011 11:48:40 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu>
Message-ID:

On Mon, Oct 24, 2011 at 11:12 AM, Wes McKinney wrote:
> [...]
> In your examples you had to pass maskna=True -- I suppose that my only
> recourse would be to make sure that every array inside a DataFrame, for
> example, has maskna=True set. I'll have to look in more detail and see
> if it's feasible/desirable. There's a memory cost to pay, but you can't
> get the functionality for free. I may just end up sticking with NaN as
> it's worked pretty well so far the last few years -- it's an impure
> solution but one with reasonably good performance characteristics in
> the places that matter.

It might be useful to have a way of setting global defaults, or something
like a with statement. These are the sort of things that can be adjusted
based on experience. For instance, I'm thinking skipna=1 is the natural
default for the masked arrays.

Chuck
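No such with-statement API exists; purely as a sketch of what one could
look like (all names here are made up), a stdlib context manager over a
module-level default would suffice:

import contextlib

_na_defaults = {'skipna': False}

@contextlib.contextmanager
def na_defaults(**kwargs):
    # temporarily override the module-level NA defaults
    saved = dict(_na_defaults)
    _na_defaults.update(kwargs)
    try:
        yield
    finally:
        _na_defaults.clear()
        _na_defaults.update(saved)

# hypothetical usage:
# with na_defaults(skipna=True):
#     ...reductions in this block would skip NA by default...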
From xscript at gmx.net  Mon Oct 24 14:09:37 2011
From: xscript at gmx.net (Lluís)
Date: Mon, 24 Oct 2011 20:09:37 +0200
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: (Charles R. Harris's message of "Mon, 24 Oct 2011 11:48:40 -0600")
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu>
Message-ID: <87pqhmjm72.fsf@ginnungagap.bsc.es>

Charles R Harris writes:
[...]
> It might be useful to have a way of setting global defaults, or
> something like a with statement. These are the sort of things that can
> be adjusted based on experience. For instance, I'm thinking skipna=1 is
> the natural default for the masked arrays.

I already raised this concern during the initial discussions, and Mark
came up with a nice solution. Instead of having an additional stateful
global interface that code would have to check in addition to the
"skipna" argument, you can have a simple function that takes and/or
constructs an ndarray and redefines its ufunc wrapper to always set the
"skipna = True" argument.

Lluis

-- 
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth
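One simple way to get roughly that effect is sketched below -- this is
written against the experimental maskna API (where reductions take a
skipna argument) and is not Mark's actual implementation, which hooks the
ufunc machinery instead:

import numpy as np

class SkipNAArray(np.ndarray):
    # reductions on this subclass default to skipna=True
    def mean(self, *args, **kwargs):
        kwargs.setdefault('skipna', True)
        return np.ndarray.mean(self, *args, **kwargs)
    # ...and likewise for sum, std, etc.

def skipna_by_default(arr):
    return np.asarray(arr).view(SkipNAArray)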
From tmp50 at ukr.net Mon Oct 24 14:16:39 2011 From: tmp50 at ukr.net (Dmitrey) Date: Mon, 24 Oct 2011 21:16:39 +0300 Subject: [Numpy-discussion] [ANN] Multifactor analysis tool for experiment planning Message-ID: <2669.1319480199.16946892812193628160@ffe15.ukr.net> Hi all, a new OpenOpt feature is available: a multifactor analysis tool for experiment planning (in physics, chemistry, biology etc). It is based on the numerical optimization solver BOBYQA, released in 2009 by Michael J.D. Powell, and has an easy and convenient GUI frontend, written in Python + tkinter. Maybe other (alternative) engines will be available in the future. See its webpage for details. Regards, Dmitrey. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Mon Oct 24 15:01:46 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Mon, 24 Oct 2011 12:01:46 -0700 Subject: [Numpy-discussion] neighborhood iterator speed In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261BA8F04D@VA3DIAXVS361.RED001.local> <26FC23E7C398A64083C980D16001012D261B9F1C27@VA3DIAXVS361.RED001.local> <26FC23E7C398A64083C980D16001012D261B9F1C6B@VA3DIAXVS361.RED001.local>, Message-ID: <26FC23E7C398A64083C980D16001012D261BA8F04F@VA3DIAXVS361.RED001.local> I found the 2d iterator definition active in numpy 1.6.1. I'll test it. Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of David Cournapeau [cournape at gmail.com] Sent: 24 October 2011 16:04 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] neighborhood iterator speed On Mon, Oct 24, 2011 at 1:23 PM, Nadav Horesh wrote: > * I'll try to implement the 2D iterator as far as my programming expertise goes. It might take a few days. I am pretty sure the code is in the history, if you are patient enough to look for it in git history. I can't remember why I removed it (maybe because it was not faster ?). > > * There is a risk in providing a buffer pointer, and for my (and probably most) use cases it is better for the iterator constructor to provide it. I was thinking about the possibility to give the iterator a shared memory pointer, to open a door for multiprocessing. Maybe it is better instead to provide a contiguous ndarray object to enable a sanity check. One could ask for an optional buffer (if NULL -> auto-allocation). But I would need a more detailed explanation about what you are trying to do to warrant changing the API here. cheers, David _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From voong.david at gmail.com Mon Oct 24 18:54:33 2011 From: voong.david at gmail.com (David Voong) Date: Tue, 25 Oct 2011 00:54:33 +0200 Subject: [Numpy-discussion] numpy.matrix subclassing Message-ID: Hi guys, I have a question regarding subclassing of the numpy.matrix class. I read through the wiki page, http://docs.scipy.org/doc/numpy/user/basics.subclassing.html and tried to subclass numpy.matrix. I find that if I override the __finalize_array__ method I have problems using the sum method and get the following error, Traceback (most recent call last): File "test.py", line 61, in print (a * b).sum() File "/afs/ cern.ch/user/d/dvoong/programs/lib/python2.6/site-packages/numpy/matrixlib/defmatrix.py", line 435, in sum return N.ndarray.sum(self, axis, dtype, out)._align(axis) File "/afs/ cern.ch/user/d/dvoong/programs/lib/python2.6/site-packages/numpy/matrixlib/defmatrix.py", line 370, in _align return self[0,0] File "/afs/ cern.ch/user/d/dvoong/programs/lib/python2.6/site-packages/numpy/matrixlib/defmatrix.py", line 305, in __getitem__ out = N.ndarray.__getitem__(self, index) IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index Can anyone help? It's my first time on this mailing list so apologies if this is not the right place to discuss this. Regards David Voong -------------- next part -------------- An HTML attachment was scrubbed... URL: From akshar.bhosale at gmail.com Mon Oct 24 20:30:33 2011 From: akshar.bhosale at gmail.com (akshar bhosale) Date: Tue, 25 Oct 2011 06:00:33 +0530 Subject: [Numpy-discussion] Fwd: libmkl_lapack error in numpy In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: akshar bhosale Date: Sun, Oct 23, 2011 at 1:05 PM Subject: Re: libmkl_lapack error in numpy To: numpy-discussion at scipy.org Hi, libmkl_lapack.so is added in site.cfg and now the matrix function is not giving an error, but numpy.test hangs. On Sun, Oct 23, 2011 at 11:24 AM, akshar bhosale wrote: > i am getting following error. > python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]], > numpy.complex128).T.I.H' > MKL FATAL ERROR: Cannot load libmkl_lapack.so > > have installed numpy 1.6.0 with python 2.6. > i have intel cluster toolkit installed on my system. (11/069 version and > mkl=10.3). i have machine having intel xeon processor and rhel 5.2 x86_64 > platform. i am trying with intel compilers.
> if i do > python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]], > numpy.complex128).T.I.H' > python: symbol lookup error: > /opt/intel/Compiler/11.0/069/ > mkl/lib/em64/libmkl_lapack.so: undefined > symbol: mkl_lapack_zgeqrf > my site.cfg is : > #################### > [mkl] > mkl_libs = mkl_def, mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_mc > lapack_libs = mkl_lapack95_lp64 > > library_dirs = /opt/intel/Compiler/11.0/069/ > mkl/lib/em64t:/opt/intel/Compiler/11.0/069/lib/intel64/ > include_dirs = > /opt/intel/Compiler/11.0/069/mkl/include:/opt/intel/Compiler/11.0/069/include/ > #################### > and intelcompiler.py is : > ############################ > from distutils.unixccompiler import UnixCCompiler > from numpy.distutils.exec_command import find_executable > import sys > > class IntelCCompiler(UnixCCompiler): > """ A modified Intel compiler compatible with an gcc built > Python.""" > compiler_type = 'intel' > cc_exe = 'icc' > cc_args = 'fPIC' > > def __init__ (self, verbose=0, dry_run=0, force=0): > sys.exit(0) > UnixCCompiler.__init__ (self, verbose,dry_run, force) > self.cc_exe = 'icc -fPIC ' > compiler = self.cc_exe > self.set_executables(compiler=compiler, > compiler_so=compiler, > compiler_cxx=compiler, > linker_exe=compiler, > linker_so=compiler + ' -shared -lstdc++') > > class IntelItaniumCCompiler(IntelCCompiler): > compiler_type = 'intele' > > # On Itanium, the Intel Compiler used to be called ecc, let's search for > # it (now it's also icc, so ecc is last in the search). > for cc_exe in map(find_executable,['icc','ecc']): > if cc_exe: > break > > class IntelEM64TCCompiler(UnixCCompiler): > """ A modified Intel x86_64 compiler compatible with a 64bit gcc built > Python. > """ > compiler_type = 'intelem' > cc_exe = 'icc -m64 -fPIC' > cc_args = "-fPIC -openmp" > def __init__ (self, verbose=0, dry_run=0, force=0): > UnixCCompiler.__init__ (self, verbose,dry_run, force) > self.cc_exe = 'icc -m64 -fPIC -openmp ' > compiler = self.cc_exe > self.set_executables(compiler=compiler, > compiler_so=compiler, > compiler_cxx=compiler, > linker_exe=compiler, > linker_so=compiler + ' -shared -lstdc++') > ########################## > LD_LIBRARY_PATH is : > ######################### > /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/scalasca-1.3.3/lib:/opt/PBS/lib:/opt/intel/mpi/lib64:/opt/maui/lib:/opt/jdk1.6.0_23/lib:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/usr/local/lib > ######################### > > ---------- Forwarded message ---------- > From: akshar bhosale > To: numpy-discussion-request at scipy.org > Date: Sun, 23 Oct 2011 11:19:27 +0530 > Subject: libmkl_lapack error > Hi, > i am getting following error. > python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]], > numpy.complex128).T.I.H' > MKL FATAL ERROR: Cannot load libmkl_lapack.so > > have installed numpy 1.6.0 with python 2.6. > > i have intel cluster toolkit installed on my system. (11/069 version and > mkl 10.3).
i have machine having intel xeon processor and rhel 5.2 x86_64 > platform. i am trying with intel compilers. > if i do > > python -c 'import numpy;numpy.matrix([[1, 5, 10], [1.0, 3j, 4]], > numpy.complex128).T.I.H' > python: symbol lookup error: > /opt/intel/Compiler/11.0/069/mkl/lib/em64/libmkl_lapack.so: undefined > symbol: mkl_lapack_zgeqrf > > my site.cfg is : > #################### > [mkl] > > mkl_libs = mkl_def, mkl_intel_lp64, mkl_intel_thread, mkl_core, mkl_mc > lapack_libs = mkl_lapack95_lp64 > > library_dirs = /opt/intel/Compiler/11.0/069/ > mkl/lib/em64t:/opt/intel/Compiler/11.0/069/lib/intel64/ > include_dirs = > /opt/intel/Compiler/11.0/069/mkl/include:/opt/intel/Compiler/11.0/069/include/ > #################### > and intelcompiler.py is : > ############################ > from distutils.unixccompiler import UnixCCompiler > from numpy.distutils.exec_command import find_executable > import sys > > class IntelCCompiler(UnixCCompiler): > """ A modified Intel compiler compatible with an gcc built Python.""" > compiler_type = 'intel' > cc_exe = 'icc' > cc_args = 'fPIC' > > def __init__ (self, verbose=0, dry_run=0, force=0): > sys.exit(0) > UnixCCompiler.__init__ (self, verbose,dry_run, force) > self.cc_exe = 'icc -fPIC ' > compiler = self.cc_exe > self.set_executables(compiler=compiler, > compiler_so=compiler, > compiler_cxx=compiler, > linker_exe=compiler, > linker_so=compiler + ' -shared -lstdc++') > > class IntelItaniumCCompiler(IntelCCompiler): > compiler_type = 'intele' > > # On Itanium, the Intel Compiler used to be called ecc, let's search > for > # it (now it's also icc, so ecc is last in the search). > for cc_exe in map(find_executable,['icc','ecc']): > if cc_exe: > break > > class IntelEM64TCCompiler(UnixCCompiler): > """ A modified Intel x86_64 compiler compatible with a 64bit gcc built > Python. > """ > compiler_type = 'intelem' > cc_exe = 'icc -m64 -fPIC' > cc_args = "-fPIC -openmp" > def __init__ (self, verbose=0, dry_run=0, force=0): > UnixCCompiler.__init__ (self, verbose,dry_run, force) > self.cc_exe = 'icc -m64 -fPIC -openmp ' > compiler = self.cc_exe > self.set_executables(compiler=compiler, > compiler_so=compiler, > compiler_cxx=compiler, > linker_exe=compiler, > linker_so=compiler + ' -shared -lstdc++') > ########################## > LD_LIBRARY_PATH is : > ######################### > > /opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/scalasca-1.3.3/lib:/opt/PBS/lib:/opt/intel/mpi/lib64:/opt/maui/lib:/opt/jdk1.6.0_23/lib:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/opt/intel/Compiler/11.0/069/ipp/em64t/sharedlib:/opt/intel/Compiler/11.0/069/mkl/lib/em64t:/opt/intel/Compiler/11.0/069/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib:/opt/intel/Compiler/11.0/069/lib/intel64:/usr/local/lib > ######################### > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Oct 25 00:59:25 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 24 Oct 2011 21:59:25 -0700 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? 
Message-ID: Hi, I just ran into this on a PPC machine: In [1]: import numpy as np In [2]: np.__version__ Out[2]: '2.0.0.dev-4daf949' In [3]: res = np.longdouble(2)**64 In [4]: res Out[4]: 18446744073709551616.0 In [5]: 2**64 Out[5]: 18446744073709551616L In [6]: res-1 Out[6]: 36893488147419103231.0 Same for numpy 1.4.1. I don't have a SPARC to test on but I believe it's the same double-double type? See you, Matthew From hangenuit at gmail.com Tue Oct 25 02:44:37 2011 From: hangenuit at gmail.com (Han Genuit) Date: Tue, 25 Oct 2011 08:44:37 +0200 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: <87pqhmjm72.fsf@ginnungagap.bsc.es> References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> Message-ID: Well, if I may have a say, I think that an open source project is especially open when users as developers can contribute to the code base and can participate in discussions on how to improve the existing designs and ideas. I do not think a project is open when it crumbles down into politics.. I have seen a lot of work done by Mark especially to ensure that everyone had a say in what he was doing, up to the point where this might not be fun anymore. And from what I can see at the time, which was back in August, everyone has had plenty of opportunity to discuss or contribute to the specific changes that were made. This was an open contribution to the NumPy code, not some cooked up shady business by high and mighty developers and I, for one, am happy with how it turned out. From matthew.brett at gmail.com Tue Oct 25 03:15:32 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 00:15:32 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> Message-ID: Hi, On Mon, Oct 24, 2011 at 11:44 PM, Han Genuit wrote: > Well, if I may have a say, I think that an open source project is > especially open when users as developers can contribute to the code > base and can participate in discussions on how to improve the existing > designs and ideas. I do not think a project is open when it crumbles > down into politics.. I have seen a lot of work done by Mark especially > to ensure that everyone had a say in what he was doing, up to the > point where this might not be fun anymore. And from what I can see at > the time, which was back in August, everyone has had plenty of > opportunity to discuss or contribute to the specific changes that were > made. > > This was an open contribution to the NumPy code, not some cooked up > shady business by high and mighty developers and I, for one, am happy > with how it turned out. I'm afraid I find this whole thread very unpleasant. I have the odd impression of being back at high school. Some of the big kids are pushing me around and then the other kids join in. It didn't have to be this way. Someone could have replied like this to Nathaniel: "Oh - yes - I'm sorry - we actually had the discussion on the pull request. Looking back, I see that we didn't flag this up on the mailing list and maybe we should have. Thanks for pointing that out. Maybe we could start another discussion of the API in view of the changes that have gone in". But that didn't happen. 
Best, Matthew From pav at iki.fi Tue Oct 25 05:43:08 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 25 Oct 2011 11:43:08 +0200 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: 25.10.2011 06:59, Matthew Brett kirjoitti: > res = np.longdouble(2)**64 > res-1 > 36893488147419103231.0 Can you check if long double works properly (not a given) in C on that platform: long double x; x = powl(2, 64); x -= 1; printf("%g %Lg\n", (double)x, x); or, in case the platform doesn't have powl: long double x; x = pow(2, 64); x -= 1; printf("%g %Lg\n", (double)x, x); From nadavh at visionsense.com Tue Oct 25 07:14:52 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 25 Oct 2011 04:14:52 -0700 Subject: [Numpy-discussion] neighborhood iterator speed In-Reply-To: <26FC23E7C398A64083C980D16001012D261BA8F04F@VA3DIAXVS361.RED001.local> References: <26FC23E7C398A64083C980D16001012D261BA8F04D@VA3DIAXVS361.RED001.local> <26FC23E7C398A64083C980D16001012D261B9F1C27@VA3DIAXVS361.RED001.local> <26FC23E7C398A64083C980D16001012D261B9F1C6B@VA3DIAXVS361.RED001.local>, <26FC23E7C398A64083C980D16001012D261BA8F04F@VA3DIAXVS361.RED001.local> Message-ID: <26FC23E7C398A64083C980D16001012D261B9F2151@VA3DIAXVS361.RED001.local> Finally managed to use PyArrayNeighborhoodIter_Next2D with numpy 1.5.0 (in numpy 1.6 it doesn't get along with halffloat). Benchmark results (not the same computer and parameters I used in the previous benchmark): 1. ...Next2D (zero padding, it doesn't accept mirror padding): 10 sec 2. ...Next (zero padding): 53 sec 3. ...Next (mirror padding): 128 sec Remarks: 1. I did not check the validity of the results 2. Mirror padding is preferable for my specific case. What does it mean for the potential of neighbourhood iterator acceleration? Nadav. -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Nadav Horesh Sent: Monday, October 24, 2011 9:02 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] neighborhood iterator speed I found the 2d iterator definition active in numpy 1.6.1. I'll test it. Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of David Cournapeau [cournape at gmail.com] Sent: 24 October 2011 16:04 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] neighborhood iterator speed On Mon, Oct 24, 2011 at 1:23 PM, Nadav Horesh wrote: > * I'll try to implement the 2D iterator as far as my programming expertise goes. It might take a few days. I am pretty sure the code is in the history, if you are patient enough to look for it in git history. I can't remember why I removed it (maybe because it was not faster ?). > > * There is a risk in providing a buffer pointer, and for my (and probably most) use cases it is better for the iterator constructor to provide it. I was thinking about the possibility to give the iterator a shared memory pointer, to open a door for multiprocessing. Maybe it is better instead to provide a contiguous ndarray object to enable a sanity check. One could ask for an optional buffer (if NULL -> auto-allocation). But I would need a more detailed explanation about what you are trying to do to warrant changing the API here.
cheers, David _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Tue Oct 25 10:31:13 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Oct 2011 08:31:13 -0600 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: On Mon, Oct 24, 2011 at 10:59 PM, Matthew Brett wrote: > Hi, > > I just ran into this on a PPC machine: > > In [1]: import numpy as np > > In [2]: np.__version__ > Out[2]: '2.0.0.dev-4daf949' > > In [3]: res = np.longdouble(2)**64 > > In [4]: res > Out[4]: 18446744073709551616.0 > > In [5]: 2**64 > Out[5]: 18446744073709551616L > > In [6]: res-1 > Out[6]: 36893488147419103231.0 > > Same for numpy 1.4.1. > > I don't have a SPARC to test on but I believe it's the same double-double > type? > > The PPC uses two doubles to represent long doubles, the SPARC uses software emulation of ieee quad precision for long doubles, very different. The subtraction of 1 working like multiplication by two is strange, perhaps the one is getting subtracted from the exponent somehow? It would be interesting to see if the same problem happens in pure c. As a workaround, can I ask what you are trying to do with the long doubles? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL:
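A quick arithmetic check of that guess, with plain Python integers (illustrative only):

    # The misprinted value is exactly 2**65 - 1: the high double of the
    # double-double pair (2**64) comes out doubled, while the low-order
    # correction (-1) is still applied.
    print 36893488147419103231 == 2 * 2**64 - 1   # True
    print 2**64 - 1                               # 18446744073709551615 (expected)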
From xscript at gmx.net Tue Oct 25 11:04:31 2011 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Tue, 25 Oct 2011 17:04:31 +0200 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: (Matthew Brett's message of "Tue, 25 Oct 2011 00:15:32 -0700") References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> Message-ID: <87hb2xf6yo.fsf@ginnungagap.bsc.es> Matthew Brett writes: > I'm afraid I find this whole thread very unpleasant. > I have the odd impression of being back at high school. Some of the > big kids are pushing me around and then the other kids join in. > It didn't have to be this way. > Someone could have replied like this to Nathaniel: > "Oh - yes - I'm sorry - we actually had the discussion on the pull > request. Looking back, I see that we didn't flag this up on the > mailing list and maybe we should have. Thanks for pointing that out. > Maybe we could start another discussion of the API in view of the > changes that have gone in". > But that didn't happen. Well, I really thought that all the interested parties would take a look at [1]. While it's true that the pull requests are not obvious if you're not using the functionalities of the github web (or unless announced in this list), I think that Mark's announcement was precisely directed at having a new round of discussions after having some code to play around with and see how intuitive or counter-intuitive the implemented concepts could be. [1] http://old.nabble.com/NA-masks-for-NumPy-are-ready-to-test-td32291024.html Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From aronne.merrelli at gmail.com Tue Oct 25 11:24:59 2011 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Tue, 25 Oct 2011 10:24:59 -0500 Subject: [Numpy-discussion] numpy.matrix subclassing In-Reply-To: References: Message-ID: On Mon, Oct 24, 2011 at 5:54 PM, David Voong wrote: > Hi guys, > > I have a question regarding subclassing of the numpy.matrix class. > > I read through the wiki page, > http://docs.scipy.org/doc/numpy/user/basics.subclassing.html > > and tried to subclass numpy.matrix, I find that if I override the > __finalize_array__ method I have problems using the sum method and get the > following error, > > > Traceback (most recent call last): > File "test.py", line 61, in > print (a * b).sum() > File "/afs/ > cern.ch/user/d/dvoong/programs/lib/python2.6/site-packages/numpy/matrixlib/defmatrix.py", > line 435, in sum > return N.ndarray.sum(self, axis, dtype, out)._align(axis) > File "/afs/ > cern.ch/user/d/dvoong/programs/lib/python2.6/site-packages/numpy/matrixlib/defmatrix.py", > line 370, in _align > return self[0,0] > File "/afs/ > cern.ch/user/d/dvoong/programs/lib/python2.6/site-packages/numpy/matrixlib/defmatrix.py", > line 305, in __getitem__ > out = N.ndarray.__getitem__(self, index) > IndexError: 0-d arrays can only use a single () or a list of newaxes (and a > single ...) as an index > > > Hi, Thanks for asking this - I'm also trying to subclass np.matrix and running into similar problems; I never generally need to sum my vectors so this wasn't a problem I had noticed thus far. Anyway, for np.matrix, there are definitely particular issues beyond what is described on the array subclassing wiki. I think I have a workaround, based on struggling with my own subclass. This is really a hack since I'm not sure how some parts of matrix actually work, so if someone has a better solution please speak up! You didn't give details on the actual subclass, but I can recreate the error with the following minimal example (testing with Numpy 1.6.1 inside EPD 7.1): class MatSubClass1(np.matrix): def __new__(cls, input_array): obj = np.asarray(input_array).view(cls) return obj def __array_finalize__(self, obj): pass def __array_wrap__(self, out_arr, context=None): return np.ndarray.__array_wrap__(self, out_arr, context) In [2]: m1 = MatSubClass1( [[2,0],[1,1]] ) In [3]: m1.sum() ... IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index The problem is that the __array_finalize__ of the matrix class needs to get called, to preserve dimensions (matrix should always have 2 dimensions). You can't just add the matrix __array_finalize__ because the initial call happens when you create the object, in which case obj is a ndarray object, not a matrix. So, check to see that obj is a matrix first before calling it. In addition, there is some undocumented _getitem attribute inside matrix, and I do not know what it does.
If you just set that attribute during __new__, you get something that seems to work: class MatSubClass2(np.matrix): def __new__(cls, input_array): obj = np.asarray(input_array).view(cls) obj._getitem = False return obj def __array_finalize__(self, obj): if isinstance(obj, np.matrix): np.matrix.__array_finalize__(self, obj) def __array_wrap__(self, out_arr, context=None): return np.ndarray.__array_wrap__(self, out_arr, context) In [4]: m2 = MatSubClass2( [[2,0],[1,1]] ) In [5]: m2.sum(), m2.sum(0), m2.sum(1) Out[5]: (4, matrix([[3, 1]]), matrix([[2], [2]])) HTH, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Oct 25 13:44:33 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 10:44:33 -0700 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: Hi, On Tue, Oct 25, 2011 at 7:31 AM, Charles R Harris wrote: > > > On Mon, Oct 24, 2011 at 10:59 PM, Matthew Brett > wrote: >> >> Hi, >> >> I just ran into this on a PPC machine: >> >> In [1]: import numpy as np >> >> In [2]: np.__version__ >> Out[2]: '2.0.0.dev-4daf949' >> >> In [3]: res = np.longdouble(2)**64 >> >> In [4]: res >> Out[4]: 18446744073709551616.0 >> >> In [5]: 2**64 >> Out[5]: 18446744073709551616L >> >> In [6]: res-1 >> Out[6]: 36893488147419103231.0 >> >> Same for numpy 1.4.1. >> >> I don't have a SPARC to test on but I believe it's the same double-double >> type? >> > > The PPC uses two doubles to represent long doubles, the SPARC uses software > emulation of ieee quad precision for long doubles, very different. Yes, thanks - I read more after my post. I guess from this: http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/128bit_long_double_floating-point_datatype.htm that AIX does use double-double. > The > subtraction of 1 working like multiplication by two is strange, perhaps the > one is getting subtracted from the exponent somehow? It would be interesting > to see if the same problem happens in pure c. > > As a workaround, can I ask what you are trying to do with the long doubles? I was trying to use them as an intermediate format for high-precision floating point calculations, before converting to integers. Best, Matthew From matthew.brett at gmail.com Tue Oct 25 13:45:02 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 10:45:02 -0700 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: Hi, On Tue, Oct 25, 2011 at 2:43 AM, Pauli Virtanen wrote: > 25.10.2011 06:59, Matthew Brett kirjoitti: >> res = np.longdouble(2)**64 >> res-1 >> 36893488147419103231.0 > > Can you check if long double works properly (not a given) in C on that > platform: > > long double x; > x = powl(2, 64); > x -= 1; > printf("%g %Lg\n", (double)x, x); > > or, in case the platform doesn't have powl: > > long double x; > x = pow(2, 64); > x -= 1; > printf("%g %Lg\n", (double)x, x); Both the same as numpy: [mb312 at jerry ~]$ gcc test.c test.c: In function 'main': test.c:5: warning: incompatible implicit declaration of built-in function 'powl' [mb312 at jerry ~]$ ./a.out 1.84467e+19 3.68935e+19 Thanks, Matthew From charlesr.harris at gmail.com Tue Oct 25 13:52:36 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Oct 2011 11:52:36 -0600 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: On Tue, Oct 25, 2011 at 11:45 AM, Matthew Brett wrote: > Hi, > > On Tue, Oct 25, 2011 at 2:43 AM, Pauli Virtanen wrote: > > 25.10.2011 06:59, Matthew Brett kirjoitti: > >> res = np.longdouble(2)**64 > >> res-1 > >> 36893488147419103231.0 > > > > Can you check if long double works properly (not a given) in C on that > > platform: > > > > long double x; > > x = powl(2, 64); > > x -= 1; > > printf("%g %Lg\n", (double)x, x); > > > > or, in case the platform doesn't have powl: > > > > long double x; > > x = pow(2, 64); > > x -= 1; > > printf("%g %Lg\n", (double)x, x); > > Both the same as numpy: > > [mb312 at jerry ~]$ gcc test.c > test.c: In function 'main': > test.c:5: warning: incompatible implicit declaration of built-in function > 'powl' > I think implicit here means that the arguments and the return values are treated as integers. Did you #include <math.h>? > [mb312 at jerry ~]$ ./a.out > 1.84467e+19 3.68935e+19 > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Oct 25 14:03:35 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 11:03:35 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: <87hb2xf6yo.fsf@ginnungagap.bsc.es> References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> Message-ID: Hi, On Tue, Oct 25, 2011 at 8:04 AM, Lluís wrote: > Matthew Brett writes: >> I'm afraid I find this whole thread very unpleasant. > >> I have the odd impression of being back at high school. Some of the >> big kids are pushing me around and then the other kids join in. > >> It didn't have to be this way. > >> Someone could have replied like this to Nathaniel: > >> "Oh - yes - I'm sorry - we actually had the discussion on the pull >> request. Looking back, I see that we didn't flag this up on the >> mailing list and maybe we should have. Thanks for pointing that out. >> Maybe we could start another discussion of the API in view of the >> changes that have gone in". > >> But that didn't happen. > > Well, I really thought that all the interested parties would take a look at [1]. > > While it's true that the pull requests are not obvious if you're not using the > functionalities of the github web (or unless announced in this list), I think > that Mark's announcement was precisely directed at having a new round of > discussions after having some code to play around with and see how intuitive or > counter-intuitive the implemented concepts could be. I just wanted to be clear what I meant. The key point is not whether or not the pull-request or request for testing was in fact the right place for the discussion that Travis suggested. I guess you can argue that either way. I'd say no, but I can see how you would disagree on that. The key point is - how much do we value constructive disagreement?
If we do value constructive disagreement then we'll go out of our way to talk through the points of contention, and make sure that the people who disagree, especially the minority, feel that they have been fully heard. If we don't value constructive disagreement then we'll let the other side know that further disagreement will be taken as a sign of bad faith. Now - what do you see here? I see the second and that worries me. Best, Matthew From matthew.brett at gmail.com Tue Oct 25 14:05:40 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 11:05:40 -0700 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: Hi, On Tue, Oct 25, 2011 at 10:52 AM, Charles R Harris wrote: > > > On Tue, Oct 25, 2011 at 11:45 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Tue, Oct 25, 2011 at 2:43 AM, Pauli Virtanen wrote: >> > 25.10.2011 06:59, Matthew Brett kirjoitti: >> >> res = np.longdouble(2)**64 >> >> res-1 >> >> 36893488147419103231.0 >> > >> > Can you check if long double works properly (not a given) in C on that >> > platform: >> > >> > long double x; >> > x = powl(2, 64); >> > x -= 1; >> > printf("%g %Lg\n", (double)x, x); >> > >> > or, in case the platform doesn't have powl: >> > >> > long double x; >> > x = pow(2, 64); >> > x -= 1; >> > printf("%g %Lg\n", (double)x, x); >> >> Both the same as numpy: >> >> [mb312 at jerry ~]$ gcc test.c >> test.c: In function 'main': >> test.c:5: warning: incompatible implicit declaration of built-in function >> 'powl' > > I think implicit here means that the arguments and the return values > are treated as integers. Did you #include <math.h>? Ah - you've detected my severe ignorance of c. But with math.h, the result is the same, #include <stdio.h> #include <math.h> int main(int argc, char* argv) { long double x; x = pow(2, 64); x -= 1; printf("%g %Lg\n", (double)x, x); } See you, Matthew From matthew.brett at gmail.com Tue Oct 25 14:08:32 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 11:08:32 -0700 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: On Tue, Oct 25, 2011 at 11:05 AM, Matthew Brett wrote: > Hi, > > On Tue, Oct 25, 2011 at 10:52 AM, Charles R Harris > wrote: >> >> >> On Tue, Oct 25, 2011 at 11:45 AM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> On Tue, Oct 25, 2011 at 2:43 AM, Pauli Virtanen wrote: >>> > 25.10.2011 06:59, Matthew Brett kirjoitti: >>> >> res = np.longdouble(2)**64 >>> >> res-1 >>> >> 36893488147419103231.0 >>> > >>> > Can you check if long double works properly (not a given) in C on that >>> > platform: >>> > >>> > long double x; >>> > x = powl(2, 64); >>> > x -= 1; >>> > printf("%g %Lg\n", (double)x, x); >>> > >>> > or, in case the platform doesn't have powl: >>> > >>> > long double x; >>> > x = pow(2, 64); >>> > x -= 1; >>> > printf("%g %Lg\n", (double)x, x); >>> >>> Both the same as numpy: >>> >>> [mb312 at jerry ~]$ gcc test.c >>> test.c: In function 'main': >>> test.c:5: warning: incompatible implicit declaration of built-in function >>> 'powl' >> >> I think implicit here means that the arguments and the return values >> are treated as integers. Did you #include <math.h>? > > Ah - you've detected my severe ignorance of c. But with math.h, the > result is the same, > > #include <stdio.h> > #include <math.h> > > int main(int argc, char* argv) { > long double x; > x = pow(2, 64); > x -= 1; > printf("%g %Lg\n", (double)x, x); > } By the way - if you want a login to this machine, let me know - it's always on and we're using it as a buildslave already. Matthew From pav at iki.fi Tue Oct 25 14:14:48 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 25 Oct 2011 20:14:48 +0200 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: 25.10.2011 19:45, Matthew Brett kirjoitti: [clip] >> or, in case the platform doesn't have powl: >> >> long double x; >> x = pow(2, 64); >> x -= 1; >> printf("%g %Lg\n", (double)x, x); > > Both the same as numpy: > > [mb312 at jerry ~]$ gcc test.c > test.c: In function 'main': > test.c:5: warning: incompatible implicit declaration of built-in function 'powl' > [mb312 at jerry ~]$ ./a.out > 1.84467e+19 3.68935e+19 This result may indicate that it's the *printing* of long doubles that's broken. Note how the value cast as double prints the correct result, whereas the %Lg format code gives something wrong. Can you try to check this by doing something like: - do some set of calculations using np.longdouble in Numpy (that requires the extra accuracy) - at the end, cast the result back to double -- Pauli Virtanen From ben.root at ou.edu Tue Oct 25 14:24:42 2011 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 25 Oct 2011 13:24:42 -0500 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> Message-ID: On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett wrote: > Hi, > > On Tue, Oct 25, 2011 at 8:04 AM, Lluís wrote: > > Matthew Brett writes: > >> I'm afraid I find this whole thread very unpleasant. > > > >> I have the odd impression of being back at high school. Some of the > >> big kids are pushing me around and then the other kids join in. > > > >> It didn't have to be this way. > > > >> Someone could have replied like this to Nathaniel: > > > >> "Oh - yes - I'm sorry - we actually had the discussion on the pull > >> request. Looking back, I see that we didn't flag this up on the > >> mailing list and maybe we should have. Thanks for pointing that out. > >> Maybe we could start another discussion of the API in view of the > >> changes that have gone in". > > > >> But that didn't happen. > > > > Well, I really thought that all the interested parties would take a look > at [1]. > > > > While it's true that the pull requests are not obvious if you're not > using the > > functionalities of the github web (or unless announced in this list), I > think > > that Mark's announcement was precisely directed at having a new round of > > discussions after having some code to play around with and see how > intuitive or > > counter-intuitive the implemented concepts could be. > > I just wanted to be clear what I meant. > > The key point is not whether or not the pull-request or request for > testing was in fact the right place for the discussion that Travis > suggested. I guess you can argue that either way. I'd say no, but > I can see how you would disagree on that. > > This is getting very meta... a disagreement about the disagreement. > The key point is - how much do we value constructive disagreement? > > Personally, I value it very much. My impression of the discussion we all had at the beginning was that the needs of the two distinct communities (R-users and masked array users) were both heard and largely addressed.
Aspects of both approaches were used, and the final result is, IMHO, inspired and elegant. Is it perfect? No. Are there ways to improve it? Absolutely, and I fully expect that to happen. > If we do value constructive disagreement then we'll go out of our way > to talk through the points of contention, and make sure that the > people who disagree, especially the minority, feel that they have been > fully heard. > > If we don't value constructive disagreement then we'll let the other > side know that further disagreement will be taken as a sign of bad > faith. > > Now - what do you see here? I see the second and that worries me. > It is disappointing that you choose not to participate in the thread linked above or in the pull request itself. If I remember correctly, you were working on finishing up your dissertation, so I fully understand the time constraints involved there. However, the pull request and the email notification are the de facto method of staging and discussing changes in any development project. No objections were raised in that pull request, so it went in after some time passed. To hold off the merge, all one would need to do is fire off a quick comment requesting a delay to have a chance to review the pull request. Luckily, git is a VCS, so we are fully capable of reverting any necessary changes if warranted. If you have any concerns or suggestions for changes in the current implementation, feel free to raise them and open additional pull requests. There is no "ganging up" here or any other subterfuge. Tell us exactly what your issues are with the current setup, provide example code demonstrating the issues, and we can certainly discuss ways to improve this. Remember, we *all* have a common agreement here. NumPy needs better support for missing data (in whatever form). Let's work from that assumption and make NumPy a better library to use for everybody! Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Oct 25 14:29:06 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 11:29:06 -0700 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: Hi, On Tue, Oct 25, 2011 at 11:14 AM, Pauli Virtanen wrote: > 25.10.2011 19:45, Matthew Brett kirjoitti: > [clip] >>> or, in case the platform doesn't have powl: >>> >>> long double x; >>> x = pow(2, 64); >>> x -= 1; >>> printf("%g %Lg\n", (double)x, x); >> >> Both the same as numpy: >> >> [mb312 at jerry ~]$ gcc test.c >> test.c: In function 'main': >> test.c:5: warning: incompatible implicit declaration of built-in function 'powl' >> [mb312 at jerry ~]$ ./a.out >> 1.84467e+19 3.68935e+19 > > This result may indicate that it's the *printing* of long doubles that's > broken. Note how the value cast as double prints the correct result, > whereas the %Lg format code gives something wrong. Ah - sorry - I see now what you were trying to do. > Can you try to check this by doing something like: > > - do some set of calculations using np.longdouble in Numpy > (that requires the extra accuracy) > > - at the end, cast the result back to double In [1]: import numpy as np In [2]: res = np.longdouble(2)**64 In [6]: res / 2**32 Out[6]: 4294967296.0 In [7]: (res-1) / 2**32 Out[7]: 8589934591.9999999998 In [8]: np.float((res-1) / 2**32) Out[8]: 4294967296.0 In [9]: np.float((res) / 2**32) Out[9]: 4294967296.0 Thanks, Matthew From derek at astro.physik.uni-goettingen.de Tue Oct 25 15:01:10 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 25 Oct 2011 21:01:10 +0200 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: <50A02334-B626-47AA-B543-A1866BCFCEAE@astro.physik.uni-goettingen.de> On 25 Oct 2011, at 20:05, Matthew Brett wrote: >>> Both the same as numpy: >>> >>> [mb312 at jerry ~]$ gcc test.c >>> test.c: In function 'main': >>> test.c:5: warning: incompatible implicit declaration of built-in function >>> 'powl' >> >> I think implicit here means that the arguments and the return values >> are treated as integers. Did you #include <math.h>? > > Ah - you've detected my severe ignorance of c. But with math.h, the > result is the same, > > #include <stdio.h> > #include <math.h> > > int main(int argc, char* argv) { > long double x; > x = pow(2, 64); > x -= 1; > printf("%g %Lg\n", (double)x, x); > } What system/compiler is this? I am getting ./ldouble 1.84467e+19 1.84467e+19 and >>> res = np.longdouble(2)**64 >>> res 18446744073709551616.0 >>> 2**64 18446744073709551616L >>> res-1 18446744073709551615.0 >>> np.__version__ '1.6.1' as well as with >>> np.__version__ '2.0.0.dev-3d06f02' [yes, not very up to date] and for all gcc versions /usr/bin/gcc -v Using built-in specs. Target: powerpc-apple-darwin9 Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --program-prefix= --host=powerpc-apple-darwin9 --target=powerpc-apple-darwin9 Thread model: posix gcc version 4.0.1 (Apple Inc. build 5493) to /sw/bin/gcc-fsf-4.6 -v Using built-in specs. COLLECT_GCC=/sw/bin/gcc-fsf-4.6 COLLECT_LTO_WRAPPER=/sw/lib/gcc4.6/libexec/gcc/powerpc-apple-darwin9.8.0/4.6.1/lto-wrapper Target: powerpc-apple-darwin9.8.0 Configured with: ../gcc-4.6.1/configure --prefix=/sw --prefix=/sw/lib/gcc4.6 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.6/info --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.6 --enable-cloog-backend=isl --disable-libjava-multilib --disable-libquadmath Thread model: posix gcc version 4.6.1 (GCC) uname -a Darwin osiris.astro.physik.uni-goettingen.de 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC Power Macintosh Cheers, Derek From matthew.brett at gmail.com Tue Oct 25 15:04:52 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 12:04:52 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> Message-ID: Hi, On Tue, Oct 25, 2011 at 11:24 AM, Benjamin Root wrote: > On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Tue, Oct 25, 2011 at 8:04 AM, Lluís wrote: >> > Matthew Brett writes: >> >> I'm afraid I find this whole thread very unpleasant. >> > >> >> I have the odd impression of being back at high school. Some of the >> >> big kids are pushing me around and then the other kids join in. >> > >> >> It didn't have to be this way. >> > >> >> Someone could have replied like this to Nathaniel: >> > >> >> "Oh - yes - I'm sorry - we actually had the discussion on the pull >> >> request. Looking back, I see that we didn't flag this up on the >> >> mailing list and maybe we should have. Thanks for pointing that out. >> >> Maybe we could start another discussion of the API in view of the >> >> changes that have gone in". >> > >> >> But that didn't happen. >> > >> > Well, I really thought that all the interested parties would take a look >> > at [1]. >> > >> > While it's true that the pull requests are not obvious if you're not >> > using the >> > functionalities of the github web (or unless announced in this list), I >> > think >> > that Mark's announcement was precisely directed at having a new round of >> > discussions after having some code to play around with and see how >> > intuitive or >> > counter-intuitive the implemented concepts could be. >> >> I just wanted to be clear what I meant. >> >> The key point is not whether or not the pull-request or request for >> testing was in fact the right place for the discussion that Travis >> suggested. I guess you can argue that either way. I'd say no, but >> I can see how you would disagree on that. >> > > This is getting very meta... a disagreement about the disagreement. Yes, the important point is a social one. The other points are details. >> The key point is - how much do we value constructive disagreement? >> > > Personally, I value it very much. Well - I think everyone believes that they value constructive discussion, but the question is, what happens when people really disagree? > My impression of the discussion we all > had at the beginning was that the needs of the two distinct communities > (R-users and masked array users) were both heard and largely addressed. > Aspects of both approaches were used, and the final result is, IMHO, > inspired and elegant. Is it perfect? No. Are there ways to improve it? > Absolutely, and I fully expect that to happen. To be clear once more, I personally feel we don't need to discuss: 1) Whether Mark did a good job on the code (I have high bias to imagine so). 2) Whether something along these lines would be good to have in numpy. >> If we do value constructive disagreement then we'll go out of our way >> to talk through the points of contention, and make sure that the >> people who disagree, especially the minority, feel that they have been >> fully heard. >> >> If we don't value constructive disagreement then we'll let the other >> side know that further disagreement will be taken as a sign of bad >> faith. >> >> Now - what do you see here? I see the second and that worries me. >> > > It is disappointing that you choose not to participate in the thread linked > above or in the pull request itself. If I remember correctly, you were > working on finishing up your dissertation, so I fully understand the time > constraints involved there. However, the pull request and the email > notification are the de facto method of staging and discussing changes in any > development project. No objections were raised in that pull request, so it > went in after some time passed. To hold off the merge, all one would need > to do is fire off a quick comment requesting a delay to have a chance to > review the pull request. I think the pull-request was not the right vehicle for the discussion, you think it was, that's fine, I don't think we need to rehearse that. My question (if you are answering my question) is: if you put yourself in my or Nathaniel's shoes, would you feel that you had been warmly encouraged to express disagreement, or would you feel something else? > Luckily, git is a VCS, so we are fully capable of reverting any necessary > changes if warranted. If you have any concerns or suggestions for changes > in the current implementation, feel free to raise them and open additional > pull requests. There is no "ganging up" here or any other subterfuge. Tell > us exactly what your issues are with the current setup, provide example code > demonstrating the issues, and we can certainly discuss ways to improve this. Has the situation changed since the counter-NEP that Nathaniel and I wrote up? > Remember, we *all* have a common agreement here. NumPy needs better support > for missing data (in whatever form). Let's work from that assumption and > make NumPy a better library to use for everybody! I remember walking past a church in a small town in the California desert. It had a sign outside saying 'People who are busy rowing do not have time to rock the boat'. This seemed to me a total failure to understand the New Testament, but also a recipe for organizational disaster. See you, Matthew From matthew.brett at gmail.com Tue Oct 25 15:13:39 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 12:13:39 -0700 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: <50A02334-B626-47AA-B543-A1866BCFCEAE@astro.physik.uni-goettingen.de> References: <50A02334-B626-47AA-B543-A1866BCFCEAE@astro.physik.uni-goettingen.de> Message-ID: Hi, On Tue, Oct 25, 2011 at 12:01 PM, Derek Homeier wrote: > On 25 Oct 2011, at 20:05, Matthew Brett wrote: > >>>> Both the same as numpy: >>>> >>>> [mb312 at jerry ~]$ gcc test.c >>>> test.c: In function 'main': >>>> test.c:5: warning: incompatible implicit declaration of built-in function >>>> 'powl' >>> >>> I think implicit here means that the arguments and the return values >>> are treated as integers. Did you #include <math.h>? >> >> Ah - you've detected my severe ignorance of c. But with math.h, the >> result is the same, >> >> #include <stdio.h> >> #include <math.h> >> >> int main(int argc, char* argv) { >> long double x; >> x = pow(2, 64); >> x -= 1; >> printf("%g %Lg\n", (double)x, x); >> } > > What system/compiler is this? I am getting > ./ldouble > 1.84467e+19 1.84467e+19 > > and > >>>> res = np.longdouble(2)**64 >>>> res > 18446744073709551616.0 >>>> 2**64 > 18446744073709551616L >>>> res-1 > 18446744073709551615.0 >>>> np.__version__ > '1.6.1' > > as well as with > >>>> np.__version__ > '2.0.0.dev-3d06f02' > [yes, not very up to date] > > and for all gcc versions > /usr/bin/gcc -v > Using built-in specs.
> Target: powerpc-apple-darwin9 > Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --program-prefix= --host=powerpc-apple-darwin9 --target=powerpc-apple-darwin9 > Thread model: posix > gcc version 4.0.1 (Apple Inc. build 5493) > > to > > /sw/bin/gcc-fsf-4.6 -v > Using built-in specs. > COLLECT_GCC=/sw/bin/gcc-fsf-4.6 > COLLECT_LTO_WRAPPER=/sw/lib/gcc4.6/libexec/gcc/powerpc-apple-darwin9.8.0/4.6.1/lto-wrapper > Target: powerpc-apple-darwin9.8.0 > Configured with: ../gcc-4.6.1/configure --prefix=/sw --prefix=/sw/lib/gcc4.6 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.6/info --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.6 --enable-cloog-backend=isl --disable-libjava-multilib --disable-libquadmath > Thread model: posix > gcc version 4.6.1 (GCC) > > uname -a > Darwin osiris.astro.physik.uni-goettingen.de 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC Power Macintosh mb312 at jerry ~]$ gcc -v Using built-in specs. Target: powerpc-apple-darwin8 Configured with: /var/tmp/gcc/gcc-5370~2/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=powerpc-apple-darwin8 --host=powerpc-apple-darwin8 --target=powerpc-apple-darwin8 Thread model: posix gcc version 4.0.1 (Apple Computer, Inc. build 5370) [mb312 at jerry ~]$ uname -a Darwin jerry.bic.berkeley.edu 8.11.0 Darwin Kernel Version 8.11.0: Wed Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC Power Macintosh powerpc Best, Matthew From pav at iki.fi Tue Oct 25 15:14:50 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 25 Oct 2011 21:14:50 +0200 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: 25.10.2011 20:29, Matthew Brett kirjoitti: [clip] > In [7]: (res-1) / 2**32 > Out[7]: 8589934591.9999999998 > > In [8]: np.float((res-1) / 2**32) > Out[8]: 4294967296.0 Looks like a bug in the C library installed on the machine, then. It's either in wontfix territory for us, or in the "cast to doubles before formatting" one. In the latter case, one would have to maintain a list of broken C libraries (ugh). -- Pauli Virtanen From derek at astro.physik.uni-goettingen.de Tue Oct 25 15:18:38 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 25 Oct 2011 21:18:38 +0200 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: <10C4EFD5-B407-48AC-B8FF-E864A12A19BB@astro.physik.uni-goettingen.de> Hi, On 25 Oct 2011, at 21:14, Pauli Virtanen wrote: > 25.10.2011 20:29, Matthew Brett kirjoitti: > [clip] >> In [7]: (res-1) / 2**32 >> Out[7]: 8589934591.9999999998 >> >> In [8]: np.float((res-1) / 2**32) >> Out[8]: 4294967296.0 > > Looks like a bug in the C library installed on the machine, then. > > It's either in wontfix territory for us, or in the "cast to doubles > before formatting" one. 
In the latter case, one would have to maintain a > list of broken C libraries (ugh). As it appears to be a "Tiger-only" problem, probably the former? On 25 Oct 2011, at 21:13, Matthew Brett wrote: > [mb312 at jerry ~]$ uname -a > Darwin jerry.bic.berkeley.edu 8.11.0 Darwin Kernel Version 8.11.0: Wed > Oct 10 18:26:00 PDT 2007; root:xnu-792.24.17~1/RELEASE_PPC Power > Macintosh powerpc Cheers, Derek From matthew.brett at gmail.com Tue Oct 25 15:22:06 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 12:22:06 -0700 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: Hi, On Tue, Oct 25, 2011 at 12:14 PM, Pauli Virtanen wrote: > 25.10.2011 20:29, Matthew Brett kirjoitti: > [clip] >> In [7]: (res-1) / 2**32 >> Out[7]: 8589934591.9999999998 >> >> In [8]: np.float((res-1) / 2**32) >> Out[8]: 4294967296.0 > > Looks like a bug in the C library installed on the machine, then. > > It's either in wontfix territory for us, or in the "cast to doubles > before formatting" one. In the latter case, one would have to maintain a > list of broken C libraries (ugh). How about a check at import time and a warning when printing? Is that hard to do? See you, Matthew
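A rough sketch of what such an import-time check could look like (hypothetical -- numpy has no such mechanism, the function name is invented, and the test value comes from earlier in this thread):

    import warnings
    import numpy as np

    def longdouble_printing_is_broken():
        # 2**64 - 1 needs the extra long double precision; on the broken
        # Tiger/PPC C library it prints as 36893488147419103231.0 instead
        # of 18446744073709551615.0.  Compare the printed text with the
        # value cast to double, which is known to print correctly.
        x = np.longdouble(2) ** 64 - 1
        return float(str(x)) != float(x)

    if longdouble_printing_is_broken():
        warnings.warn("longdouble formatting is broken on this platform; "
                      "printed values may be wrong")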
> > --Massimo > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xscript at gmx.net Tue Oct 25 16:08:59 2011 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Tue, 25 Oct 2011 22:08:59 +0200 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: (Matthew Brett's message of "Tue, 25 Oct 2011 12:04:52 -0700") References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> Message-ID: <87mxcobzqc.fsf@ginnungagap.bsc.es> Matthew Brett writes: [...] >>> If we do value constructive disagreement then we'll go out of our way >>> to talk through the points of contention, and make sure that the >>> people who disagree, especially the minority, feel that they have been >>> fully heard. >>> >>> If we don't value constructive disagreement then we'll let the other >>> side know that further disagreement will be taken as a sign of bad >>> faith. >>> >>> Now - what do you see here? ?I see the second and that worries me. >>> >> >> It is disappointing that you choose not to participate in the thread linked >> above or in the pull request itself.? If I remember correctly, you were >> working on finishing up your dissertation, so I fully understand the time >> constraints involved there.? However, the pull request and the email >> notification is the de facto method of staging and discussing changes in any >> development project.? No objections were raised in that pull request, so it >> went in after some time passed.? To hold off the merge, all one would need >> to do is fire off a quick comment requesting a delay to have a chance to >> review the pull request. > I think the pull-request was not the right vehicle for the discussion, > you think it was, that's fine, I don't think we need to rehearse that. > My question (if you are answering my question) is: if you put yourself > in my or Nathaniel's shoes, would you feel that you had been warmly > encouraged to express disagreement, or would you feel something else. I sense (bear with me, my senses are not very sharp) that you feel your concerns have not been addressed, and thus the sensation that features you disagreed upon were sneaked through a silent pull request. And yes, the initial discussions were too heated on some moments (me included), but that does not imply that the current state is ignoring the concerns everybody raised. >> Luckily, git is a VCS, so we are fully capable of reverting any necessary >> changes if warranted.? If you have any concerns or suggestions for changes >> in the current implementation, feel free to raise them and open additional >> pull requests.? There is no "ganging up" here or any other subterfuge.? Tell >> us exactly what are your issues with the current setup, provide example code >> demonstrating the issues, and we can certainly discuss ways to improve this. > Has the situation changed since the counter-NEP that Nathaniel and I wrote up? I couldn't find the link, but AFAIR the main concerns were: - Using bit patterns as a more efficient missing data mechanism that is compatible with third-party binary libraries. 
As the NEP says, although not implemented (due to lack of time), bit patterns are a desirable extension that will be able to coexist with masks while providing a single and consistent Python and C API for both bit patterns and masks. - Being able to expose the non-destructive nature of masks. There is only one very specific path leading to such behaviour [1], so users not interested in it should never inadvertently fall into its use (aka, they don't even need to know about it). [1] http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html#creating-na-masked-views If we agree that it is reasonable to think that the concerns in the "counter-NEP" have been addressed in the current implementation, then I think it is not unreasonable to take the silence to Mark's mail and the pull request as a green light. Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From oliphant at enthought.com Tue Oct 25 17:56:53 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Tue, 25 Oct 2011 16:56:53 -0500 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> Message-ID: It is a shame that Nathaniel and perhaps Matthew do not feel like their voice was heard. I wish I could have participated more fully in some of the discussions. I don't know if I could have really helped, but I would have liked to have tried to perhaps work alongside Mark to integrate some of the other ideas that had been expressed during the discussion. Unfortunately, I was traveling in NYC most of the time that Mark was working on this project and did not get a chance to interact with him as much as I would have liked. My view is that we didn't get quite to where I thought we would get, nor where I think we could be. I think Nathaniel and Matthew provided very specific feedback that was helpful in understanding other perspectives of a difficult problem. In particular, I really wanted bit-patterns implemented. However, I also understand that Mark did quite a bit of work and altered his original designs quite a bit in response to community feedback. I wasn't a major part of the pull request discussion, nor did I merge the changes, but I support Charles if he reviewed the code and felt like it was the right thing to do. I likely would have done the same thing rather than let Mark Wiebe's work languish. Merging Mark's code does not mean there is not more work to be done, but it is consistent with the reality that currently development on NumPy happens when people have the time to do it. I have not seen anything to convince me that there is not still time to make specific API changes that address some of the concerns. Perhaps, Nathaniel and or Matthew could summarize their concerns again and if desired submit a pull request to revert the changes. However, there is a definite bias against removing working code unless the arguments are very strong and receive a lot of support from others. Thank you for continuing to voice your opinions even when it may feel that the tide is against you. My view is that we only learn from people who disagree with us. 
Best regards, -Travis On Oct 25, 2011, at 1:24 PM, Benjamin Root wrote: > On Tue, Oct 25, 2011 at 1:03 PM, Matthew Brett wrote: > Hi, > > On Tue, Oct 25, 2011 at 8:04 AM, Llu?s wrote: > > Matthew Brett writes: > >> I'm afraid I find this whole thread very unpleasant. > > > >> I have the odd impression of being back at high school. Some of the > >> big kids are pushing me around and then the other kids join in. > > > >> It didn't have to be this way. > > > >> Someone could have replied like this to Nathaniel: > > > >> "Oh - yes - I'm sorry - we actually had the discussion on the pull > >> request. Looking back, I see that we didn't flag this up on the > >> mailing list and maybe we should have. Thanks for pointing that out. > >> Maybe we could start another discussion of the API in view of the > >> changes that have gone in". > > > >> But that didn't happen. > > > > Well, I really thought that all the interested parties would take a look at [1]. > > > > While it's true that the pull requests are not obvious if you're not using the > > functionalities of the github web (or unless announced in this list), I think > > that Mark's announcement was precisely directed at having a new round of > > discussions after having some code to play around with and see how intuitive or > > counter-intuitive the implemented concepts could be. > > I just wanted to be clear what I meant. > > The key point is not whether or not the pull-request or request for > testing was in fact the right place for the discussion that Travis > suggested. I guess you can argue that either way. I'd say no, but > I can see how you would disagree on that. > > > This is getting very meta... a disagreement about the disagreement. > > The key point is - how much do we value constructive disagreement? > > > Personally, I value it very much. My impression of the discussion we all had at the beginning was that the needs of the two distinct communities (R-users and masked array users) were both heard and largely addressed. Aspects of both approaches were used, and the final result is, IMHO, inspired and elegant. Is it perfect? No. Are there ways to improve it? Absolutely, and I fully expect that to happen. > > If we do value constructive disagreement then we'll go out of our way > to talk through the points of contention, and make sure that the > people who disagree, especially the minority, feel that they have been > fully heard. > > If we don't value constructive disagreement then we'll let the other > side know that further disagreement will be taken as a sign of bad > faith. > > Now - what do you see here? I see the second and that worries me. > > > It is disappointing that you choose not to participate in the thread linked above or in the pull request itself. If I remember correctly, you were working on finishing up your dissertation, so I fully understand the time constraints involved there. However, the pull request and the email notification is the de facto method of staging and discussing changes in any development project. No objections were raised in that pull request, so it went in after some time passed. To hold off the merge, all one would need to do is fire off a quick comment requesting a delay to have a chance to review the pull request. > > Luckily, git is a VCS, so we are fully capable of reverting any necessary changes if warranted. If you have any concerns or suggestions for changes in the current implementation, feel free to raise them and open additional pull requests. 
There is no "ganging up" here or any other subterfuge. Tell us exactly what are your issues with the current setup, provide example code demonstrating the issues, and we can certainly discuss ways to improve this. > > Remember, we *all* have a common agreement here. NumPy needs better support for missing data (in whatever form). Let's work from that assumption and make NumPy a better library to use for everybody! > > Cheers! > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Oct 25 17:58:53 2011 From: cournape at gmail.com (David Cournapeau) Date: Tue, 25 Oct 2011 22:58:53 +0100 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: On Tue, Oct 25, 2011 at 8:22 PM, Matthew Brett wrote: > Hi, > > On Tue, Oct 25, 2011 at 12:14 PM, Pauli Virtanen wrote: >> 25.10.2011 20:29, Matthew Brett kirjoitti: >> [clip] >>> In [7]: (res-1) / 2**32 >>> Out[7]: 8589934591.9999999998 >>> >>> In [8]: np.float((res-1) / 2**32) >>> Out[8]: 4294967296.0 >> >> Looks like a bug in the C library installed on the machine, then. >> >> It's either in wontfix territory for us, or in the "cast to doubles >> before formatting" one. In the latter case, one would have to maintain a >> list of broken C libraries (ugh). > > How about a check at import time and a warning when printing? ?Is that > hard to do? That's fragile IMO. I think that Chuck summed it well: long double are not portable, don't use them unless you have to or you can rely on platform-specificities. I would rather spend some time on implementing/integrating portable quad precision in software, cheers, David From matthew.brett at gmail.com Tue Oct 25 19:49:01 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 16:49:01 -0700 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: Hi, On Tue, Oct 25, 2011 at 2:58 PM, David Cournapeau wrote: > On Tue, Oct 25, 2011 at 8:22 PM, Matthew Brett wrote: >> Hi, >> >> On Tue, Oct 25, 2011 at 12:14 PM, Pauli Virtanen wrote: >>> 25.10.2011 20:29, Matthew Brett kirjoitti: >>> [clip] >>>> In [7]: (res-1) / 2**32 >>>> Out[7]: 8589934591.9999999998 >>>> >>>> In [8]: np.float((res-1) / 2**32) >>>> Out[8]: 4294967296.0 >>> >>> Looks like a bug in the C library installed on the machine, then. >>> >>> It's either in wontfix territory for us, or in the "cast to doubles >>> before formatting" one. In the latter case, one would have to maintain a >>> list of broken C libraries (ugh). >> >> How about a check at import time and a warning when printing? ?Is that >> hard to do? > > That's fragile IMO. I think that Chuck summed it well: long double are > not portable, don't use them unless you have to or you can rely on > platform-specificities. That reminds me of the old joke about the Irishman giving directions - "If I were you, I wouldn't start from here". > I would rather spend some time on implementing/integrating portable > quad precision in software, I guess from your answer that such a warning would be complicated to implement, and if that's the case, I can imagine it would be low priority. 
See you, Matthew From matthew.brett at gmail.com Tue Oct 25 20:02:21 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 25 Oct 2011 17:02:21 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> Message-ID: Hi, Thank you for your gracious email. On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant wrote: > It is a shame that Nathaniel and perhaps Matthew do not feel like their > voice was heard. ? I wish I could have participated more fully in some of > the discussions. ?I don't know if I could have really helped, but I would > have liked to have tried to perhaps work alongside Mark to integrate some of > the other ideas that had been expressed during the discussion. > Unfortunately, ?I was traveling in NYC most of the time that Mark was > working on this project and did not get a chance to interact with him as > much as I would have liked. > My view is that we didn't get quite to where I thought we would get, nor > where I think we could be. ?I think Nathaniel and Matthew provided very > specific feedback that was helpful in understanding other perspectives of a > difficult problem. ? ? In particular, I really wanted bit-patterns > implemented. ? ?However, I also understand that Mark did quite a bit of work > and altered his original designs quite a bit in response to community > feedback. ? I wasn't a major part of the pull request discussion, nor did I > merge the changes, but I support Charles if he reviewed the code and felt > like it was the right thing to do. ?I likely would have done the same thing > rather than let Mark Wiebe's work languish. > Merging Mark's code does not mean there is not more work to be done, but it > is consistent with the reality that currently development on NumPy happens > when people have the time to do it. ? ?I have not seen anything to convince > me that there is not still time to make specific API changes that address > some of the concerns. > Perhaps, Nathaniel and or Matthew could summarize their concerns again and > if desired submit a pull request to revert the changes. ? However, there is > a definite bias against removing working code unless the arguments are very > strong and receive a lot of support from others. Honestly - I am not sure whether there is any interest now, in the arguments we made before. If there is, who is interested? I mean, past politeness. I wasn't trying to restart that discussion, because I didn't know what good it could do. At first I was hoping that we could ask whether there was a better way of dealing with disagreements like this. Later it seemed to me that the atmosphere was getting bad, and I wanted to say that because I thought it was important. > Thank you for continuing to voice your opinions even when it may feel that > the tide is against you. ? My view is that we only learn from people who > disagree with us. Thank you for saying that. I hope that y'all will tell me if I am making it harder for you to disagree, and I am sorry if I did so here. Best, Matthew From massimodisasha at gmail.com Tue Oct 25 20:56:35 2011 From: massimodisasha at gmail.com (Massimo Di Stefano) Date: Tue, 25 Oct 2011 20:56:35 -0400 Subject: [Numpy-discussion] skip lines at the end of file with loadtxt In-Reply-To: References: Message-ID: Many thanks Oliver! i missed it in the description, works great :-) --Massimo. 
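P.S. for the archive, the working version as a minimal sketch -- the
skip_footer count of 4 here is an assumption about how long the trailing
description in that particular file is:

import urllib
import numpy as np

# fetch the data file, then parse it while dropping the one-line
# header and the free-text description at the end
urllib.urlretrieve('http://www.cdc.noaa.gov/Correlation/amon.us.long.data',
                   'file.txt')
a = np.genfromtxt('file.txt', skip_header=1, skip_footer=4)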
On 25 Oct 2011, at 15:33, Olivier Delalleau wrote:

> Maybe try genfromtxt instead of loadtxt, it has a skip_footer option.
>
> -=- Olivier
>
> 2011/10/25 Massimo Di Stefano 
> I'm trying to generate an array reading a txt file from the internet.
> my target is to use python instead of matlab, to replace these steps in matlab :
>
> url=['http://www.cdc.noaa.gov/Correlation/amon.us.long.data'];
> urlwrite(url,'file.txt');
>
> I'm using this code :
>
> urllib.urlretrieve('http://www.cdc.noaa.gov/Correlation/amon.us.long.data', 'file.txt')
> a = np.loadtxt('file.txt', skiprows=1)
>
> but it fails because of the txt description at the end of the file,
>
> do you know if there is a way to skip X lines at the end,
>
> something like "skipmultiplerows='1,-4'" (to skip the first and the last 4 rows in the file)
>
> or do I have to use some sort of string manipulation (readlines?) instead ?
>
> Thanks!
>
> --Massimo
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oliphant at enthought.com  Tue Oct 25 22:56:26 2011
From: oliphant at enthought.com (Travis Oliphant)
Date: Tue, 25 Oct 2011 21:56:26 -0500
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: <4EA48739.4010206@hawaii.edu>
	<4EA4AC24.5060805@hawaii.edu>
	<87pqhmjm72.fsf@ginnungagap.bsc.es>
	<87hb2xf6yo.fsf@ginnungagap.bsc.es>
Message-ID: <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com>

So, I am very interested in making sure I remember the details of the counterproposal. What I recall is that you wanted to be able to differentiate between a "bit-pattern" mask and a boolean-array mask in the API. I believe currently even when bit-pattern masks are implemented the difference will be "hidden" from the user on the Python level.

I am sure to be missing other parts of the discussion as I have been in and out of it.

Thanks,

-Travis

On Oct 25, 2011, at 7:02 PM, Matthew Brett wrote:

> Hi,
>
> Thank you for your gracious email.
>
> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant wrote:
>> It is a shame that Nathaniel and perhaps Matthew do not feel like their
>> voice was heard. I wish I could have participated more fully in some of
>> the discussions. I don't know if I could have really helped, but I would
>> have liked to have tried to perhaps work alongside Mark to integrate some of
>> the other ideas that had been expressed during the discussion.
>> Unfortunately, I was traveling in NYC most of the time that Mark was
>> working on this project and did not get a chance to interact with him as
>> much as I would have liked.
>> My view is that we didn't get quite to where I thought we would get, nor
>> where I think we could be. I think Nathaniel and Matthew provided very
>> specific feedback that was helpful in understanding other perspectives of a
>> difficult problem. In particular, I really wanted bit-patterns
>> implemented. However, I also understand that Mark did quite a bit of work
>> and altered his original designs quite a bit in response to community
>> feedback.
I wasn't a major part of the pull request discussion, nor did I >> merge the changes, but I support Charles if he reviewed the code and felt >> like it was the right thing to do. I likely would have done the same thing >> rather than let Mark Wiebe's work languish. >> Merging Mark's code does not mean there is not more work to be done, but it >> is consistent with the reality that currently development on NumPy happens >> when people have the time to do it. I have not seen anything to convince >> me that there is not still time to make specific API changes that address >> some of the concerns. >> Perhaps, Nathaniel and or Matthew could summarize their concerns again and >> if desired submit a pull request to revert the changes. However, there is >> a definite bias against removing working code unless the arguments are very >> strong and receive a lot of support from others. > > Honestly - I am not sure whether there is any interest now, in the > arguments we made before. If there is, who is interested? I mean, > past politeness. > > I wasn't trying to restart that discussion, because I didn't know what > good it could do. At first I was hoping that we could ask whether > there was a better way of dealing with disagreements like this. > Later it seemed to me that the atmosphere was getting bad, and I > wanted to say that because I thought it was important. > >> Thank you for continuing to voice your opinions even when it may feel that >> the tide is against you. My view is that we only learn from people who >> disagree with us. > > Thank you for saying that. I hope that y'all will tell me if I am > making it harder for you to disagree, and I am sorry if I did so > here. > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From efiring at hawaii.edu Wed Oct 26 00:30:17 2011 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 25 Oct 2011 18:30:17 -1000 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> References: <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> Message-ID: <4EA78CD9.4020307@hawaii.edu> On 10/25/2011 04:56 PM, Travis Oliphant wrote: > So, I am very interested in making sure I remember the details of the > counterproposal. What I recall is that you wanted to be able to > differentiate between a "bit-pattern" mask and a boolean-array mask > in the API. I believe currently even when bit-pattern masks are > implemented the difference will be "hidden" from the user on the > Python level. > > I am sure to be missing other parts of the discussion as I have been > in and out of it. > > Thanks, > > -Travis The alternative-NEP is here: https://gist.github.com/1056379/ One thread of discussion is here: http://www.mail-archive.com/numpy-discussion at scipy.org/msg32268.html and continued here: http://www.mail-archive.com/numpy-discussion at scipy.org/msg32371.html Eric From hangenuit at gmail.com Wed Oct 26 02:02:51 2011 From: hangenuit at gmail.com (Han Genuit) Date: Wed, 26 Oct 2011 08:02:51 +0200 Subject: [Numpy-discussion] NA masks in the next numpy release? 
In-Reply-To: <4EA78CD9.4020307@hawaii.edu> References: <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <4EA78CD9.4020307@hawaii.edu> Message-ID: There is also: Missing/accumulating data http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057406.html An NA compromise idea -- many-NA http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057408.html NEPaNEP lessons - was: alterNEP http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057435.html NA/Missing Data Conference Call Summary http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057474.html HPC missing data - was: NA/Missing Data Conference Call Summary http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057482.html using the same vocabulary for missing value ideas http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057485.html towards a more productive missing values/masked arrays discussion... http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057511.html miniNEP1: where= argument for ufuncs http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057513.html miniNEP 2: NA support via special dtypes http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057542.html Missing Data development plan http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057567.html Missing Values Discussion http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057579.html NA masks for NumPy are ready to test http://mail.scipy.org/pipermail/numpy-discussion/2011-August/058103.html From grove.steyn at gmail.com Wed Oct 26 03:09:22 2011 From: grove.steyn at gmail.com (=?utf-8?b?R3JvdsOp?=) Date: Wed, 26 Oct 2011 07:09:22 +0000 (UTC) Subject: [Numpy-discussion] np.in1d() capacity limit? Message-ID: I have picked up a strange limit to np.in1d(): ---------- b Out[100]: array(['2007-01-01T02:30:00+0200', '2007-01-01T03:00:00+0200', '2007-01-01T03:30:00+0200', ..., '2008-01-01T01:00:00+0200', '2008-01-01T01:30:00+0200', '2008-01-01T02:00:00+0200'], dtype='datetime64[s]') b.shape Out[101]: (17520,) a = b[0:42] np.in1d(b,a) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /home/grove/ in () ----> 1 np.in1d(b,a) /usr/local/lib/python2.6/dist-packages/numpy/lib/arraysetops.pyc in in1d(ar1, ar2, assume_unique) 338 # here. The values from the first array should always come before 339 # the values from the second array. --> 340 order = ar.argsort(kind='mergesort') 341 sar = ar[order] 342 equal_adj = (sar[1:] == sar[:-1]) TypeError: requested sort not available for type But this works: a = b[0:41] np.in1d(b,a) Out[105]: array([ True, True, True, ..., False, False, False], dtype=bool) --------- In other words the limit seems to be 41 elements for a. Is this a bug or am I getting something wrong? Grov? From cournape at gmail.com Wed Oct 26 03:33:59 2011 From: cournape at gmail.com (David Cournapeau) Date: Wed, 26 Oct 2011 08:33:59 +0100 Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken? In-Reply-To: References: Message-ID: On Wed, Oct 26, 2011 at 12:49 AM, Matthew Brett wrote: > That reminds me of the old joke about the Irishman giving directions - > "If I were you, I wouldn't start from here". 
Sounds about accurate :)

>
>> I would rather spend some time on implementing/integrating portable
>> quad precision in software,
>
> I guess from your answer that such a warning would be complicated to
> implement, and if that's the case, I can imagine it would be low
> priority.

No, it would not be hard to implement, but I don't think it is a nice
solution, because it is very platform dependent, so it will likely
bitrot as it won't appear on usual platforms.

David

From njs at pobox.com  Wed Oct 26 04:07:01 2011
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 26 Oct 2011 01:07:01 -0700
Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
In-Reply-To: 
References: 
Message-ID: 

On Tue, Oct 25, 2011 at 4:49 PM, Matthew Brett wrote:
> I guess from your answer that such a warning would be complicated to
> implement, and if that's the case, I can imagine it would be low
> priority.

I assume the problem is more that it would be a weirdo check that
becomes a maintenance burden ("what is this doing here? Do we still
need it? who knows?") than that it would be hard to do.

You can easily do it yourself as a workaround...

if not str(np.longdouble(2)**64 - 1).startswith("1844"):
  warn("Printing of longdoubles is fubared! Beware! Beware!")

-- Nathaniel

From pav at iki.fi  Wed Oct 26 06:00:00 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Wed, 26 Oct 2011 12:00:00 +0200
Subject: [Numpy-discussion] np.in1d() capacity limit?
In-Reply-To: 
References: 
Message-ID: 

26.10.2011 09:09, Grové kirjoitti:
> I have picked up a strange limit to np.in1d():
> ----------
>
> b
> Out[100]:
> array(['2007-01-01T02:30:00+0200', '2007-01-01T03:00:00+0200',
> '2007-01-01T03:30:00+0200', ..., '2008-01-01T01:00:00+0200',
> '2008-01-01T01:30:00+0200', '2008-01-01T02:00:00+0200'],
> dtype='datetime64[s]')
>
> b.shape
> Out[101]: (17520,)
>
> a = b[0:42]
>
> np.in1d(b,a)
> ---------------------------------------------------------------------------
> TypeError Traceback (most recent call last)
[clip]
> In other words the limit seems to be 41 elements for a. Is this a bug or am I
> getting something wrong?

The problem here seems to be that argsort (or only the mergesort?) for
datetime datatypes is not implemented. There's a faster code path that
is triggered for small selection arrays, and that does not require
argsort, and that's why the error occurs in only some of the cases.

-- Pauli Virtanen

From matthew.brett at gmail.com  Wed Oct 26 11:52:32 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Wed, 26 Oct 2011 08:52:32 -0700
Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Wed, Oct 26, 2011 at 1:07 AM, Nathaniel Smith wrote:
> On Tue, Oct 25, 2011 at 4:49 PM, Matthew Brett wrote:
>> I guess from your answer that such a warning would be complicated to
>> implement, and if that's the case, I can imagine it would be low
>> priority.
>
> I assume the problem is more that it would be a weirdo check that
> becomes a maintenance burden ("what is this doing here? Do we still
> need it? who knows?") than that it would be hard to do.
>
> You can easily do it yourself as a workaround...
>
> if not str(np.longdouble(2)**64 - 1).startswith("1844"):
>   warn("Printing of longdoubles is fubared! Beware! Beware!")

Thanks - yes - I was only thinking of someone like me getting confused
and thinking badly of us if they run into this.
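For anyone who wants the guard in their own code in the meantime,
Nathaniel's one-liner could be wrapped up like this -- just a sketch,
with an illustrative function name and warning message:

import warnings
import numpy as np

def check_longdouble_repr():
    # 2**64 - 1 == 18446744073709551615 is exactly representable in the
    # PPC double-double format, so a healthy C library should print a
    # string starting with "1844"
    if not str(np.longdouble(2)**64 - 1).startswith("1844"):
        warnings.warn("str/repr of np.longdouble looks broken on this "
                      "platform; values may print truncated to double "
                      "precision")

check_longdouble_repr()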
See you,

Matthew

From matthew.brett at gmail.com  Wed Oct 26 11:56:16 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Wed, 26 Oct 2011 08:56:16 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com>
References: <4EA48739.4010206@hawaii.edu>
	<4EA4AC24.5060805@hawaii.edu>
	<87pqhmjm72.fsf@ginnungagap.bsc.es>
	<87hb2xf6yo.fsf@ginnungagap.bsc.es>
	<0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com>
Message-ID: 

Hi,

On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant wrote:
> So, I am very interested in making sure I remember the details of the
> counterproposal. What I recall is that you wanted to be able to
> differentiate between a "bit-pattern" mask and a boolean-array mask in
> the API. I believe currently even when bit-pattern masks are implemented
> the difference will be "hidden" from the user on the Python level.
>
> I am sure to be missing other parts of the discussion as I have been in
> and out of it.

Nathaniel - are you online today? Do you have time to review the
current implementation and see if it affects the initial discussion?

I'm running around most of today but I should have time to do some
thinking later this afternoon CA time.

See you,

Matthew

From pav at iki.fi  Wed Oct 26 12:41:05 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Wed, 26 Oct 2011 18:41:05 +0200
Subject: [Numpy-discussion] float128 / longdouble on PPC - is it broken?
In-Reply-To: 
References: 
Message-ID: 

26.10.2011 10:07, Nathaniel Smith kirjoitti:
> On Tue, Oct 25, 2011 at 4:49 PM, Matthew Brett wrote:
>> I guess from your answer that such a warning would be complicated to
>> implement, and if that's the case, I can imagine it would be low
>> priority.
>
> I assume the problem is more that it would be a weirdo check that
> becomes a maintenance burden ("what is this doing here? Do we still
> need it? who knows?") than that it would be hard to do.

This check should be done compile-time, and set a flag like
BROKEN_LONG_DOUBLE_FORMATTING for compilation. IIRC, we already have
some related workarounds in the code (for Windows).

The main argument is, (i) it's not our bug, (ii) adding workarounds for
broken platforms should be weighed having the commonness of the issue
in mind. But I'll admit that someone could have got into work and
implemented the workaround in the same time it has taken to write mails
to this thread :)

Pauli

From cournape at gmail.com  Thu Oct 27 09:02:04 2011
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 27 Oct 2011 14:02:04 +0100
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
Message-ID: 

Hi,

I was wondering if we could finally move to a more recent version of
compilers for official win32 installers. This would of course concern
the next release cycle, not the ones where beta/rc are already in
progress.

Basically, the pros:
 - we will have to move at some point
 - gcc 4.* seem less buggy, especially C++ and fortran.
 - no need to maintain msvcr90 voodoo
The cons:
 - it will most likely break the ABI
 - we need to recompile atlas (but I can take care of it)
 - the biggest: it is difficult to combine gfortran with visual
studio (more exactly you cannot link gfortran runtime to a visual
studio executable). The only solution I could think of would be to
recompile the gfortran runtime with Visual Studio, which for some
reason does not sound very appealing :)

Thoughts ?

cheers,

David

From numpy-discussion at maubp.freeserve.co.uk  Thu Oct 27 09:16:10 2011
From: numpy-discussion at maubp.freeserve.co.uk (Peter)
Date: Thu, 27 Oct 2011 14:16:10 +0100
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
In-Reply-To: 
References: 
Message-ID: 

On Thu, Oct 27, 2011 at 2:02 PM, David Cournapeau wrote:
>
> Hi,
>
> I was wondering if we could finally move to a more recent version of
> compilers for official win32 installers. This would of course concern
> the next release cycle, not the ones where beta/rc are already in
> progress.
>
> Basically, the pros:
> - we will have to move at some point
> - gcc 4.* seem less buggy, especially C++ and fortran.
> - no need to maintain msvcr90 voodoo
> The cons:
> - it will most likely break the ABI
> - we need to recompile atlas (but I can take care of it)
> - the biggest: it is difficult to combine gfortran with visual
> studio (more exactly you cannot link gfortran runtime to a visual
> studio executable). The only solution I could think of would be to
> recompile the gfortran runtime with Visual Studio, which for some
> reason does not sound very appealing :)
>
> Thoughts ?

Does this make any difference for producing 64bit Windows
installers?

Peter

From cournape at gmail.com  Thu Oct 27 10:04:13 2011
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 27 Oct 2011 15:04:13 +0100
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
In-Reply-To: 
References: 
Message-ID: 

On Thu, Oct 27, 2011 at 2:16 PM, Peter wrote:
> On Thu, Oct 27, 2011 at 2:02 PM, David Cournapeau wrote:
>>
>> Hi,
>>
>> I was wondering if we could finally move to a more recent version of
>> compilers for official win32 installers. This would of course concern
>> the next release cycle, not the ones where beta/rc are already in
>> progress.
>>
>> Basically, the pros:
>> - we will have to move at some point
>> - gcc 4.* seem less buggy, especially C++ and fortran.
>> - no need to maintain msvcr90 voodoo
>> The cons:
>> - it will most likely break the ABI
>> - we need to recompile atlas (but I can take care of it)
>> - the biggest: it is difficult to combine gfortran with visual
>> studio (more exactly you cannot link gfortran runtime to a visual
>> studio executable). The only solution I could think of would be to
>> recompile the gfortran runtime with Visual Studio, which for some
>> reason does not sound very appealing :)
>>
>> Thoughts ?
>
> Does this make any difference for producing 64bit Windows
> installers?

I have not tried in a long time to compile numpy/scipy with mingw 64
bits, but being able to use the same gcc line on 32 bits (4.*, 3.* does
not support 64 bits) should not hurt.

cheers,

David

From Jim.Vickroy at noaa.gov  Thu Oct 27 10:15:11 2011
From: Jim.Vickroy at noaa.gov (Jim Vickroy)
Date: Thu, 27 Oct 2011 08:15:11 -0600
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
In-Reply-To: 
References: 
Message-ID: <4EA9676F.70701@noaa.gov>

On 10/27/2011 7:02 AM, David Cournapeau wrote:
> Hi,
>
> I was wondering if we could finally move to a more recent version of
> compilers for official win32 installers. This would of course concern
> the next release cycle, not the ones where beta/rc are already in
> progress.
>
> Basically, the pros:
> - we will have to move at some point
> - gcc 4.* seem less buggy, especially C++ and fortran.
> - no need to maintain msvcr90 voodoo
> The cons:
> - it will most likely break the ABI
> - we need to recompile atlas (but I can take care of it)
> - the biggest: it is difficult to combine gfortran with visual
> studio (more exactly you cannot link gfortran runtime to a visual
> studio executable). The only solution I could think of would be to
> recompile the gfortran runtime with Visual Studio, which for some
> reason does not sound very appealing :)
>
> Thoughts ?
>
> cheers,
>
> David
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi David,

What is the "msvcr90 voodoo" you are referring to?

Thanks for your great efforts on this project.

-- jv

From cournape at gmail.com  Thu Oct 27 10:21:06 2011
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 27 Oct 2011 15:21:06 +0100
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
In-Reply-To: <4EA9676F.70701@noaa.gov>
References: <4EA9676F.70701@noaa.gov>
Message-ID: 

On Thu, Oct 27, 2011 at 3:15 PM, Jim Vickroy wrote:
>
> Hi David,
>
> What is the "msvcr90 voodoo" you are referring to?

gcc 3.* versions don't have stubs to link against recent versions of
MS C runtime, so we have to build them by ourselves. 4.x series don't
have this issue,

cheers,

David

From josef.pktd at gmail.com  Thu Oct 27 12:18:30 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 27 Oct 2011 12:18:30 -0400
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
In-Reply-To: 
References: 
Message-ID: 

On Thu, Oct 27, 2011 at 9:02 AM, David Cournapeau wrote:
> Hi,
>
> I was wondering if we could finally move to a more recent version of
> compilers for official win32 installers. This would of course concern
> the next release cycle, not the ones where beta/rc are already in
> progress.
>
> Basically, the pros:
> - we will have to move at some point
> - gcc 4.* seem less buggy, especially C++ and fortran.
> - no need to maintain msvcr90 voodoo
> The cons:
> - it will most likely break the ABI
> - we need to recompile atlas (but I can take care of it)
> - the biggest: it is difficult to combine gfortran with visual
> studio (more exactly you cannot link gfortran runtime to a visual
> studio executable). The only solution I could think of would be to
> recompile the gfortran runtime with Visual Studio, which for some
> reason does not sound very appealing :)

What does the last mean in practice? (definition of linking in this case?)
If numpy and scipy are compiled with MingW gcc 4.*, then it cannot be
used with the standard python?

Or does it just mean we cannot combine fortran extensions that are
build with MingW with extension build with visual studio?

another example: would Matplotlib compiled against visual studio work
with a new MingW compiled numpy? I guess that's what the ABI break
would prevent?

Since we will have to update MingW sooner or later anyway, I'm in
favor of doing it. And given the comments on the mailing list about
the Linux transition to gfortran, I expect that the transition will
take some time.

Thanks for your successful effort that installation on Windows was
without problems for years for a user like me.

Josef

>
> Thoughts ?
>
> cheers,
>
> David
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From ralf.gommers at googlemail.com  Thu Oct 27 12:19:46 2011
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Thu, 27 Oct 2011 18:19:46 +0200
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
In-Reply-To: 
References: 
Message-ID: 

Hi David,

On Thu, Oct 27, 2011 at 3:02 PM, David Cournapeau wrote:

> Hi,
>
> I was wondering if we could finally move to a more recent version of
> compilers for official win32 installers. This would of course concern
> the next release cycle, not the ones where beta/rc are already in
> progress.
>
> Basically, the pros:
> - we will have to move at some point
> - gcc 4.* seem less buggy, especially C++ and fortran.
> - no need to maintain msvcr90 voodoo
> The cons:
> - it will most likely break the ABI
> - we need to recompile atlas (but I can take care of it)
> - the biggest: it is difficult to combine gfortran with visual
> studio (more exactly you cannot link gfortran runtime to a visual
> studio executable). The only solution I could think of would be to
> recompile the gfortran runtime with Visual Studio, which for some
> reason does not sound very appealing :)
>

To get the datetime changes to work with MinGW, we already concluded that
building with 4.x is more or less required (without recognizing some of
the points you list above). Changes to mingw32ccompiler to fix compilation
with 4.x went in in https://github.com/numpy/numpy/pull/156. It would be
good if you could check those.

It probably makes sense to make this move for numpy 1.7. If this breaks
the ABI then it would be easiest to make numpy 1.7 the minimum required
version for scipy 0.11.

The gfortran + VS issue sounds painful though.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cournape at gmail.com  Thu Oct 27 12:34:28 2011
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 27 Oct 2011 17:34:28 +0100
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
In-Reply-To: 
References: 
Message-ID: 

On Thu, Oct 27, 2011 at 5:18 PM, wrote:
> On Thu, Oct 27, 2011 at 9:02 AM, David Cournapeau wrote:
>> Hi,
>>
>> I was wondering if we could finally move to a more recent version of
>> compilers for official win32 installers. This would of course concern
>> the next release cycle, not the ones where beta/rc are already in
>> progress.
>>
>> Basically, the pros:
>> - we will have to move at some point
>> - gcc 4.* seem less buggy, especially C++ and fortran.
>> - no need to maintain msvcr90 voodoo
>> The cons:
>> - it will most likely break the ABI
>> - we need to recompile atlas (but I can take care of it)
>> - the biggest: it is difficult to combine gfortran with visual
>> studio (more exactly you cannot link gfortran runtime to a visual
>> studio executable). The only solution I could think of would be to
>> recompile the gfortran runtime with Visual Studio, which for some
>> reason does not sound very appealing :)
>
> What does the last mean in practice? (definition of linking in this case?)
> If numpy and scipy are compiled with MingW gcc 4.*, then it cannot be
> used with the standard python?

It would of course work with the standard python, it would be rather
useless otherwise :)

The main difference is that you could build numpy/scipy with MS
compilers for C/C++ code and use g77 for the fortran part.
This has always been hackish, but kinda worked. Not so much with gfortran anymore. IOW, the choice for people building extensions on top of numpy becomes full MingW or full MS/Intel compilers. > Or does it just mean we cannot combine fortran extensions that are > build with MingW with extension build with visual studio? I hope that you could mix them as long as the extension does not contain any fortran code. > > another example: would Matplotlib compiled against visual studio work > with a new MingW compiled numpy? I guess that's what the ABI break > would prevent? This, I am not so sure, this would need testing. > Since we will have to update MingW sooner or later anyway, I'm in > favor of doing it. And given the comments on the mailing list about > the Linux transition to gfortran, I expect that the transition will > take some time. > > Thanks for your successful effort that installation on Windows was > without problems for years for a user like me. You're welcome ! David From matthew.brett at gmail.com Thu Oct 27 20:31:52 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Oct 2011 17:31:52 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> Message-ID: Hi, On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant wrote: > So, I am very interested in making sure I remember the details of the counterproposal. ? ?What I recall is that you wanted to be able to differentiate between a "bit-pattern" mask and a boolean-array mask in the API. ? I believe currently even when bit-pattern masks are implemented the difference will be "hidden" from the user on the Python level. > > I am sure to be missing other parts of the discussion as I have been in and out of it. The ideas -------------- The question that we were addressing in the alter-NEP was: should missing values implemented as bitpatterns appear to be the same as missing values implemented with masks? We said no, and Mark said yes. To restate the argument in brief; Nathaniel and I and some others thought that there were two separable ideas in play: 1) A value that is finally and completely missing. == ABSENT 2) A value that we would like to ignore for the moment but might want back at some future time == IGNORED (I'm using the adjectives ABSENT and IGNORED here to be short for the objects 'absent value' and 'ignored value'. This is to distinguish from the verbs below). We thought bitpatterns were a good match for the former, and masking was a good match for the latter. We all agreed there were two things you might like to do with values that were missing in both senses above: A) PROPAGATE; V + 1 == V B) SKIP; K + 1 == 1 (Note verbs for the behaviors). I believe the original np.ma masked arrays always SKIP. In [2]: a = np.ma.masked_array? In [3]: a = np.ma.masked_array([99, 2], mask=[True, False]) In [4]: a Out[4]: masked_array(data = [-- 2], mask = [ True False], fill_value = 999999) In [5]: a.sum() Out[5]: 2 There was some discussion as to whether there was a reason to think that ABSENT should always or by default PROPAGATE, and IGNORED should always or by default SKIP. Chuck is referring to this idea when he said further up this thread: > For instance, I'm thinking skipna=1 is the natural default for the masked arrays. 
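To make the two verbs concrete before looking at the implementation,
here is a small sketch -- the np.ma lines run on any numpy, while the
maskna lines assume the new missing-data build discussed below:

import numpy as np

# np.ma SKIPs: reductions behave as if the masked values were not there
m = np.ma.masked_array([99, 2], mask=[True, False])
print(m.sum())             # -> 2

# the new masked arrays PROPAGATE by default, and SKIP only on request
a = np.array([99, 2], maskna=True)
a[0] = np.NA
print(a.sum())             # -> NA
print(a.sum(skipna=True))  # -> 2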
The current implementation --------------------------------------- What we have now is an implementation of masked arrays, but more tightly integrated into the numpy core. In our language we have an implementation of IGNORED that is tuned to be nearly indistinguishable from the behavior we are expecting of ABSENT. Specifically, once you have done this: In [9]: a = np.array([99, 2], maskna=True) you can get something representing the mask: In [11]: np.isna(a) Out[11]: array([False, False], dtype=bool) but I believe there is no way of setting the mask directly. In order to set the mask, you have to do what looks like an assignment: In [12]: a[0] = np.NA In [14]: a Out[14]: array([NA, 2]) In fact, what has happened is the mask has changed, but the underlying value has not: In [18]: orig = np.array([99, 2]) In [19]: a = orig.view(maskna=True) In [20]: a[0] = np.NA In [21]: a Out[21]: array([NA, 2]) In [22]: orig Out[22]: array([99, 2]) This is different from real assignment: In [23]: a[0] = 0 In [24]: a Out[24]: array([0, 2], maskna=True) In [25]: orig Out[25]: array([0, 2]) Some effort has gone into making it difficult to pull off the mask: In [30]: a.view(np.int64) Out[30]: array([NA, 2]) In [31]: a.view(np.int64).flags Out[31]: C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : False MASKNA : True OWNMASKNA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [32]: a.astype(np.int64) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /home/mb312/ in () ----> 1 a.astype(np.int64) ValueError: Cannot assign NA to an array which does not support NAs The default behavior of the masked values is PROPAGATE, but they can be individually made to SKIP: In [28]: a.sum() # PROPAGATE Out[28]: NA(dtype='int64') In [29]: a.sum(skipna=True) # SKIP Out[29]: 2 Where's the beef? ------------------------- I personally still think that it is confusing to fuse the concept of: 1) Masked arrays 2) Arrays with bitpattern codes for missing and the concepts of A) ABSENT and B) IGNORED Consequences for current code -------------------------------------------- Specifically, it still seems to me to make sense to prefer this: >> a = np.array([99, 2[, masking=True) >> a.mask [ True, True ] >> a.sum() 101 >> a.mask[0] = False >> a.sum() 2 It might make sense, as Chuck suggests, to change the default to 'skipna=True', and I'd further suggest renaming np.NA to np.IGNORED and 'skipna' to skipignored' for clarity. I still think the pseudo-assignment: In [20]: a[0] = np.NA is confusing, and should be removed. Later, should we ever have bitpatterns, there would be something like np.ABSENT. This of course would make sense for assignment: In [20]: a[0] = np.ABSENT There would be another keyword argument 'skipabsent=False' such that, when this is False, the ABSENT values propagate. Honestly, I think that NA should be a synonym for ABSENT, and so should be removed until the dust has settled, and restored as (np.NA == np.ABSENT) And I think, these two ideas, of masking / IGNORED and bitpattern / ABSENT, would be much easier to explain. That's my best shot. Matthew From oliphant at enthought.com Thu Oct 27 21:16:24 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 27 Oct 2011 20:16:24 -0500 Subject: [Numpy-discussion] NA masks in the next numpy release? 
In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> Message-ID: <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> That is a pretty good explanation. I find myself convinced by Matthew's arguments. I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already). What is the counter-argument to this proposal? -Travis On Oct 27, 2011, at 7:31 PM, Matthew Brett wrote: > Hi, > > On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant wrote: >> So, I am very interested in making sure I remember the details of the counterproposal. What I recall is that you wanted to be able to differentiate between a "bit-pattern" mask and a boolean-array mask in the API. I believe currently even when bit-pattern masks are implemented the difference will be "hidden" from the user on the Python level. >> >> I am sure to be missing other parts of the discussion as I have been in and out of it. > > The ideas > -------------- > > The question that we were addressing in the alter-NEP was: should > missing values implemented as bitpatterns appear to be the same as > missing values implemented with masks? We said no, and Mark said yes. > > To restate the argument in brief; Nathaniel and I and some others > thought that there were two separable ideas in play: > > 1) A value that is finally and completely missing. == ABSENT > 2) A value that we would like to ignore for the moment but might want > back at some future time == IGNORED > > (I'm using the adjectives ABSENT and IGNORED here to be short for the > objects 'absent value' and 'ignored value'. This is to distinguish > from the verbs below). > > We thought bitpatterns were a good match for the former, and masking > was a good match for the latter. > > We all agreed there were two things you might like to do with values > that were missing in both senses above: > > A) PROPAGATE; V + 1 == V > B) SKIP; K + 1 == 1 > > (Note verbs for the behaviors). > > I believe the original np.ma masked arrays always SKIP. > > In [2]: a = np.ma.masked_array? > In [3]: a = np.ma.masked_array([99, 2], mask=[True, False]) > In [4]: a > Out[4]: > masked_array(data = [-- 2], > mask = [ True False], > fill_value = 999999) > In [5]: a.sum() > Out[5]: 2 > > There was some discussion as to whether there was a reason to think > that ABSENT should always or by default PROPAGATE, and IGNORED should > always or by default SKIP. Chuck is referring to this idea when he > said further up this thread: > >> For instance, I'm thinking skipna=1 is the natural default for the masked arrays. > > The current implementation > --------------------------------------- > > What we have now is an implementation of masked arrays, but more > tightly integrated into the numpy core. In our language we have an > implementation of IGNORED that is tuned to be nearly indistinguishable > from the behavior we are expecting of ABSENT. > > Specifically, once you have done this: > > In [9]: a = np.array([99, 2], maskna=True) > > you can get something representing the mask: > > In [11]: np.isna(a) > Out[11]: array([False, False], dtype=bool) > > but I believe there is no way of setting the mask directly. 
In order > to set the mask, you have to do what looks like an assignment: > > In [12]: a[0] = np.NA > In [14]: a > Out[14]: array([NA, 2]) > > In fact, what has happened is the mask has changed, but the underlying > value has not: > > In [18]: orig = np.array([99, 2]) > > In [19]: a = orig.view(maskna=True) > > In [20]: a[0] = np.NA > > In [21]: a > Out[21]: array([NA, 2]) > > In [22]: orig > Out[22]: array([99, 2]) > > This is different from real assignment: > > In [23]: a[0] = 0 > > In [24]: a > Out[24]: array([0, 2], maskna=True) > > In [25]: orig > Out[25]: array([0, 2]) > > Some effort has gone into making it difficult to pull off the mask: > > In [30]: a.view(np.int64) > Out[30]: array([NA, 2]) > > In [31]: a.view(np.int64).flags > Out[31]: > C_CONTIGUOUS : True > F_CONTIGUOUS : True > OWNDATA : False > MASKNA : True > OWNMASKNA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [32]: a.astype(np.int64) > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > /home/mb312/ in () > ----> 1 a.astype(np.int64) > > ValueError: Cannot assign NA to an array which does not support NAs > > The default behavior of the masked values is PROPAGATE, but they can > be individually made to SKIP: > > In [28]: a.sum() # PROPAGATE > Out[28]: NA(dtype='int64') > > In [29]: a.sum(skipna=True) # SKIP > Out[29]: 2 > > Where's the beef? > ------------------------- > > I personally still think that it is confusing to fuse the concept of: > > 1) Masked arrays > 2) Arrays with bitpattern codes for missing > > and the concepts of > > A) ABSENT and > B) IGNORED > > Consequences for current code > -------------------------------------------- > > Specifically, it still seems to me to make sense to prefer this: > >>> a = np.array([99, 2[, masking=True) >>> a.mask > [ True, True ] >>> a.sum() > 101 >>> a.mask[0] = False >>> a.sum() > 2 > > It might make sense, as Chuck suggests, to change the default to > 'skipna=True', and I'd further suggest renaming np.NA to np.IGNORED > and 'skipna' to skipignored' for clarity. > > I still think the pseudo-assignment: > > In [20]: a[0] = np.NA > > is confusing, and should be removed. > > Later, should we ever have bitpatterns, there would be something like > np.ABSENT. This of course would make sense for assignment: > > In [20]: a[0] = np.ABSENT > > There would be another keyword argument 'skipabsent=False' such that, > when this is False, the ABSENT values propagate. > > Honestly, I think that NA should be a synonym for ABSENT, and so > should be removed until the dust has settled, and restored as (np.NA > == np.ABSENT) > > And I think, these two ideas, of masking / IGNORED and bitpattern / > ABSENT, would be much easier to explain. > > That's my best shot. > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From charlesr.harris at gmail.com Thu Oct 27 22:08:37 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 27 Oct 2011 20:08:37 -0600 Subject: [Numpy-discussion] NA masks in the next numpy release? 
In-Reply-To: <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID:

On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant wrote:
> That is a pretty good explanation. I find myself convinced by Matthew's
> arguments. I think that being able to separate ABSENT from IGNORED is a
> good idea. I also like being able to control SKIP and PROPAGATE (but I
> think the current implementation allows this already).
>
> What is the counter-argument to this proposal?
>

What exactly do you find convincing? The current masks propagate by default:

In [1]: a = ones(5, maskna=1)

In [2]: a[2] = NA

In [3]: a
Out[3]: array([ 1., 1., NA, 1., 1.])

In [4]: a + 1
Out[4]: array([ 2., 2., NA, 2., 2.])

In [5]: a[2] = 10

In [6]: a
Out[6]: array([ 1., 1., 10., 1., 1.], maskna=True)

I don't see an essential difference between the implementation using masks and one using bit patterns: when attached to the original array, the mask just adds a bit pattern by extending all the types by one byte, an approach that easily extends to all existing and future types, which is why Mark went that way for the first implementation given the time available. The masks are hidden because folks wanted something that behaved more like R and also because of the desire to combine the missing, ignore, and later possibly bit patterns in a unified manner. Note that the pseudo assignment was also meant to look like R. Adding true bit patterns to numpy isn't trivial and I believe Mark was thinking of parametrized types for that.

The main problems I see with masks are unified storage and possibly memory use. The rest is just behavior and desired API, and that can be adjusted within the current implementation. There is nothing essentially masky about masks.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From oliphant at enthought.com Thu Oct 27 22:51:20 2011
From: oliphant at enthought.com (Travis Oliphant)
Date: Thu, 27 Oct 2011 21:51:20 -0500
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID: <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com>

As I mentioned, I find the ability to separate an ABSENT idea from an IGNORED idea convincing. In other words, I think distinguishing between masks and bit-patterns is not just an implementation detail, but provides a useful concept for multiple use-cases.

I understand exactly what it would take to add bit-patterns to NumPy. I also understand what Mark did and agree that it is possible to add Matthew's idea to the current code-base. I think it is worth exploring.

-Travis

On Oct 27, 2011, at 9:08 PM, Charles R Harris wrote:
>
>
> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant wrote:
> That is a pretty good explanation. I find myself convinced by Matthew's arguments. I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already).
>
> What is the counter-argument to this proposal?
> > > What exactly do you find convincing? The current masks propagate by default: > > In [1]: a = ones(5, maskna=1) > > In [2]: a[2] = NA > > In [3]: a > Out[3]: array([ 1., 1., NA, 1., 1.]) > > In [4]: a + 1 > Out[4]: array([ 2., 2., NA, 2., 2.]) > > In [5]: a[2] = 10 > > In [5]: a > Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True) > > > I don't see an essential difference between the implementation using masks and one using bit patterns, the mask when attached to the original array just adds a bit pattern by extending all the types by one byte, an approach that easily extends to all existing and future types, which is why Mark went that way for the first implementation given the time available. The masks are hidden because folks wanted something that behaved more like R and also because of the desire to combine the missing, ignore, and later possibly bit patterns in a unified manner. Note that the pseudo assignment was also meant to look like R. Adding true bit patterns to numpy isn't trivial and I believe Mark was thinking of parametrized types for that. > > The main problems I see with masks are unified storage and possibly memory use. The rest is just behavor and desired API and that can be adjusted within the current implementation. There is nothing essentially masky about masks. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri Oct 28 01:56:22 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 28 Oct 2011 00:56:22 -0500 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> Message-ID: On Thursday, October 27, 2011, Charles R Harris wrote: > > > On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant wrote: >> >> That is a pretty good explanation. I find myself convinced by Matthew's arguments. I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already). >> >> What is the counter-argument to this proposal? >> > > What exactly do you find convincing? The current masks propagate by default: > > In [1]: a = ones(5, maskna=1) > > In [2]: a[2] = NA > > In [3]: a > Out[3]: array([ 1., 1., NA, 1., 1.]) > > In [4]: a + 1 > Out[4]: array([ 2., 2., NA, 2., 2.]) > > In [5]: a[2] = 10 > > In [5]: a > Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True) > > > I don't see an essential difference between the implementation using masks and one using bit patterns, the mask when attached to the original array just adds a bit pattern by extending all the types by one byte, an approach that easily extends to all existing and future types, which is why Mark went that way for the first implementation given the time available. The masks are hidden because folks wanted something that behaved more like R and also because of the desire to combine the missing, ignore, and later possibly bit patterns in a unified manner. Note that the pseudo assignment was also meant to look like R. 
Adding true bit patterns to numpy isn't trivial and I believe Mark was thinking of parametrized types for that.
> > The main problems I see with masks are unified storage and possibly memory use. The rest is just behavior and desired API and that can be adjusted within the current implementation. There is nothing essentially masky about masks.
> > Chuck
> >

I think Chuck sums it up quite nicely. The implementation detail about using masks versus bit patterns can still be discussed and addressed. Personally, I just don't see how parameterized dtypes would be easier to use than the pseudo assignment.

The elegance of Mark's solution was to consider the treatment of missing data in a unified manner. This puts missing data in a more prominent spot for extension builders, which should greatly improve support throughout the ecosystem. By letting there be a single missing data framework (instead of two), all that users need to figure out is when they want nan-like behavior (propagate) or to be more like masks (skip). Numpy takes care of the rest. There is a reason why I like using masked arrays: I don't have to use nansum in my library functions to guard against the possibility of receiving nans. Duck-typing is a good thing.

My argument against separating IGNORE and PROPAGATE is that it becomes too tempting to want to mix these in an array, but the desired behavior would likely become ambiguous.

There is one other problem that I just thought of that I don't think has been outlined in either NEP. What if I perform an operation between an array set up with propagate NAs and an array with skip NAs?

cheers,
Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hangenuit at gmail.com Fri Oct 28 02:22:00 2011
From: hangenuit at gmail.com (Han Genuit)
Date: Fri, 28 Oct 2011 08:22:00 +0200
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID:

There is a way to assign whole masks in the current implementation:

>>> a = np.arange(9, maskna=True).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> mask = np.array([[False, False, True], [False, True, False], [True, False, True]])
>>> np.copyto(a, np.NA, where=mask)
>>> a
array([[0, 1, NA],
       [3, NA, 5],
       [NA, 7, NA]])

I think the "ValueError: Cannot assign NA to an array which does not support NAs" when trying to copy an array with a mask to an array without a mask is a bug.

>>> a = np.arange(9, maskna=True).reshape((3,3))
>>> a.flags.maskna
True
>>> b = a.copy(maskna=False)
>>> b.flags.maskna
False

It should be possible to remove a mask when copying an array.

From ben.root at ou.edu Fri Oct 28 02:45:50 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 28 Oct 2011 01:45:50 -0500
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID:

> It should be possible to remove a mask when copying an array.
>

This was a concession on the part of those pushing for masks. Eventually, I ended up realizing that it resulted in a stronger design.
Consider the following:

foo(a[4:10])

Should function foo be able to access the rest of array "a", even though it was given only a part of it? Of course not! Now, if one considers masking as a form of advanced slicing, then it wouldn't make sense for foo() to be able to access parts it wasn't given.

That being said, this is where NumPy array views come into play. You can create a view of the original data, add masks to the view, and still have access to all of the original data, unmasked.

Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hangenuit at gmail.com Fri Oct 28 04:45:09 2011
From: hangenuit at gmail.com (Han Genuit)
Date: Fri, 28 Oct 2011 10:45:09 +0200
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID:

Yes, to elaborate further on that, you can also create multiple masked views, each with its own mask properties. It would be ambiguous to mix a bit-pattern NA together with standard NAs in the same mask, but you can make different specialized masked views on the same data.

Also, I like the short and concise abbreviation for 'Not Applicable', NA. It has more common uses than IGNORE. (See also here: http://www.johndcook.com/R_language_for_programmers.html#missing)

Concerning the assignment, it is a bit implicit, I agree, but the representation and application of masks is also implicit. I think you only have to know that NA will be a mask assignment and not a data assignment.

From stefan at sun.ac.za Fri Oct 28 05:11:40 2011
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Fri, 28 Oct 2011 02:11:40 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com>
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com>
Message-ID:

Hi all,

On Thu, Oct 27, 2011 at 7:51 PM, Travis Oliphant wrote:
> I understand exactly what it would take to add bit-patterns to NumPy. I
> also understand what Mark did and agree that it is possible to add Matthew's
> idea to the current code-base. I think it is worth exploring.

Another data point: I've been spending some time on scikits-image recently, and although masked values would be highly useful in that context, the cost of doubling memory use (for uint8 images, e.g.) is too high. Many users with large data sets (and I think almost all researchers working on >2D data would be included here as well) may have the same problem.

So, while I applaud the efforts made to include a masked array implementation, I'd like to ask that: 1) we are mindful that any design decisions taken before the next release should not *preclude* the implementation of bit-masks (with, hopefully, a shared interface) and 2) that we make a concerted effort to implement the bitmask mode of operation as soon as possible. The NEP stated that both would be implemented, and I understand that due to lack of time a pragmatic call had to be made--but that was, in my opinion, one of its strong features.
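To make the memory numbers concrete, here is a minimal sketch in plain NumPy (np.packbits is used only to illustrate what a packed bit-mask would cost; it is not part of the proposed NA API):

import numpy as np

img = np.zeros((4096, 4096), dtype=np.uint8)     # 16 MiB of pixel data
byte_mask = np.zeros(img.shape, dtype=np.uint8)  # +16 MiB: memory doubles
# a packed mask stores 8 mask bits per byte instead
bit_mask = np.packbits(np.zeros(img.shape, dtype=np.uint8))

print(img.nbytes, byte_mask.nbytes, bit_mask.nbytes)
# 16777216 16777216 2097152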
Regards
Stéfan

From gael.varoquaux at normalesup.org Fri Oct 28 06:03:42 2011
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 28 Oct 2011 12:03:42 +0200
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID: <20111028100342.GE26218@phare.normalesup.org>

On Fri, Oct 28, 2011 at 10:45:09AM +0200, Han Genuit wrote:
> Also, I like the short and concise abbreviation for 'Not Applicable',
> NA. It has more common uses than IGNORE.
> (See also here:
> http://www.johndcook.com/R_language_for_programmers.html#missing)

That's a very R-centric point of view: you know what NA stands for, thus you find it meaningful. I can tell you that when I work with naive users, I keep having to explain that NA stands for 'not available', whereas IGNORE is at least somewhat explicit. Acronyms are a curse for communication, and they tend to be very domain-specific.

My two euro-cents (a rising currency, now that it has been saved by our generous leaders)

G

From charlesr.harris at gmail.com Fri Oct 28 10:35:47 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 28 Oct 2011 08:35:47 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com>
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com>
Message-ID:

On Thu, Oct 27, 2011 at 8:51 PM, Travis Oliphant wrote:
> As I mentioned, I find the ability to separate an ABSENT idea from an
> IGNORED idea convincing. In other words, I think distinguishing between
> masks and bit-patterns is not just an implementation detail, but provides a
> useful concept for multiple use-cases.
>
> I understand exactly what it would take to add bit-patterns to NumPy. I
> also understand what Mark did and agree that it is possible to add Matthew's
> idea to the current code-base. I think it is worth exploring.
>

A masked view can be considered as simply a mask on the viewed data. I agree that in that case it might be nicer to have some operations that are only allowed for views, such as taking a view with a mask from somewhere else rather than having to set it up with assignments. It might also be useful if masked values in a view could be exposed without assigning to the underlying value, perhaps with an np.EXPOSE assignment. But I think these operations could be implemented on top of the current code, although we might want an additional flag.

Space saving can be addressed with bit masks. Unified storage can be addressed by bit patterns that get translated between stored data and numpy arrays with NA. So on and so forth. As people begin to use the current implementation I hope that they offer feedback as to what they discover so that the API and implementation can mature into something widely useful.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Chris.Barker at noaa.gov Fri Oct 28 12:21:46 2011
From: Chris.Barker at noaa.gov (Chris.Barker)
Date: Fri, 28 Oct 2011 09:21:46 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com>
References: <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com>
Message-ID: <4EAAD69A.1040600@noaa.gov>

On 10/27/11 7:51 PM, Travis Oliphant wrote:
> As I mentioned, I find the ability to separate an ABSENT idea from an
> IGNORED idea convincing. In other words, I think distinguishing between
> masks and bit-patterns is not just an implementation detail, but
> provides a useful concept for multiple use-cases.

Exactly -- while one can implement ABSENT with a mask, one cannot implement IGNORE with a bit-pattern. So it is not an implementation detail.

I also think bit-patterns are a bit of a dead end:

- there is only a standard for one data type family: i.e. NaN for IEEE float types

- So we would be coming up with our own standard (or adopting an existing one, but I don't think there is one widely supported) for other types. This means:
  1) a lot of work to do
  2) a binary format incompatible with other code, compilers, etc. This is a BIG deal -- a major strength of numpy is that it serves as a wrapper for a data block that is compatible with C, Fortran or whatever code -- special bit patterns would make this a lot harder.

We also talked about the fact that an 8-bit mask provides the ability to carry other information in the mask -- not just "missing" or "ignored", but a handful of other possible reasons for masking. I think that has a lot of possibilities.

On 10/28/11 2:11 AM, Stéfan van der Walt wrote:
> Another data point: I've been spending some time on scikits-image
> recently, and although masked values would be highly useful in that
> context, the cost of doubling memory use (for uint8 images, e.g.) is
> too high.

> 2) that we make a concerted effort to implement the bitmask mode of
> operation as soon as possible.

I wonder if that might be handled as a scikits-image extension, rather than core numpy? Is there a standard bit pattern for missing data in images? -- it's presumably quite important to maintain binary compatibility with image formats, processing tools, etc.

I guess what I'm getting at is that special bit-pattern implementations may be domain-specific.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov

From matthew.brett at gmail.com Fri Oct 28 13:39:07 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 28 Oct 2011 10:39:07 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID:

Hi,

On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root wrote:
>
>
> On Thursday, October 27, 2011, Charles R Harris
> wrote:
>>
>>
>> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant
>> wrote:
>>>
>>> That is a pretty good explanation. I find myself convinced by Matthew's
>>> arguments. I think that being able to separate ABSENT from IGNORED is a
>>> good idea. I also like being able to control SKIP and PROPAGATE (but I
>>> think the current implementation allows this already).
>>> >>> What is the counter-argument to this proposal? >>> >> >> What exactly do you find convincing? The current masks propagate by >> default: >> >> In [1]: a = ones(5, maskna=1) >> >> In [2]: a[2] = NA >> >> In [3]: a >> Out[3]: array([ 1.,? 1.,? NA,? 1.,? 1.]) >> >> In [4]: a + 1 >> Out[4]: array([ 2.,? 2.,? NA,? 2.,? 2.]) >> >> In [5]: a[2] = 10 >> >> In [5]: a >> Out[5]: array([? 1.,?? 1.,? 10.,?? 1.,?? 1.], maskna=True) >> >> >> I don't see an essential difference between the implementation using masks >> and one using bit patterns, the mask when attached to the original array >> just adds a bit pattern by extending all the types by one byte, an approach >> that easily extends to all existing and future types, which is why Mark went >> that way for the first implementation given the time available. The masks >> are hidden because folks wanted something that behaved more like R and also >> because of the desire to combine the missing, ignore, and later possibly bit >> patterns in a unified manner. Note that the pseudo assignment was also meant >> to look like R. Adding true bit patterns to numpy isn't trivial and I >> believe Mark was thinking of parametrized types for that. >> >> The main problems I see with masks are unified storage and possibly memory >> use. The rest is just behavor and desired API and that can be adjusted >> within the current implementation. There is nothing essentially masky about >> masks. >> >> Chuck >> >> > > I ?think chuck sums it up quite nicely. ?The implementation detail about > using mask versus bit patterns can still be discussed and addressed. > Personally, I just don't see how parameterized dtypes would be easier to use > than the pseudo assignment. > > The elegance of mark's solution was to consider the treatment of missing > data in a unified manner. ?This puts missing data in a more prominent spot > for extension builders, which should greatly improve support throughout the > ecosystem. Are extension builders then required to use the numpy C API to get their data? Speaking as an extension builder, I would rather you gave me the mask and the bitpattern information and let me do that myself. > By letting there be a single missing data framework (instead of > two) all that users need to figure out is when they want nan-like behavior > (propagate) or to be more like masks (skip). ?Numpy takes care of the rest. > ?There is a reason why I like using masked arrays because I don't have to > use nansum in my library functions to guard against the possibility of > receiving nans. ?Duck-typing is a good thing. > > My argument against separating IGNORE and PROPAGATE is that it becomes too > tempting to want to mix these in an array, but the desired behavior would > likely become ambiguous.. > > There is one other proplem that I just thought of that I don't think has > been outlined in either NEP. ?What if I perform an operation between an > array set up with propagate NAs and an array with skip NAs? 
These are explicitly covered in the alterNEP: https://gist.github.com/1056379/ Best, Matthew From millman at berkeley.edu Fri Oct 28 13:40:36 2011 From: millman at berkeley.edu (Jarrod Millman) Date: Fri, 28 Oct 2011 10:40:36 -0700 Subject: [Numpy-discussion] [ANN] SciPy India 2011 Abstracts due November 2nd Message-ID: ========================== SciPy 2011 Call for Papers ========================== The third `SciPy India Conference `_ will be held from December 4th through the 7th at the `Indian Institute of Technology, Bombay (IITB) `_ in Mumbai, Maharashtra India. At this conference, novel applications and breakthroughs made in the pursuit of science using Python are presented. Attended by leading figures from both academia and industry, it is an excellent opportunity to experience the cutting edge of scientific software development. The conference is followed by two days of tutorials and a code sprint, during which community experts provide training on several scientific Python packages. We invite you to take part by submitting a talk abstract on the conference website at: http://scipy.in Talk/Paper Submission ========================== We solicit talks and accompanying papers (either formal academic or magazine-style articles) that discuss topics regarding scientific computing using Python, including applications, teaching, development and research. We welcome contributions from academia as well as industry. Important Dates ========================== November 2, 2011, Wednesday: Abstracts Due November 7, 2011, Monday: Schedule announced November 28, 2011, Monday: Proceedings paper submission due December 4-5, 2011, Sunday-Monday: Conference December 6-7 2011, Tuesday-Wednesday: Tutorials/Sprints Organizers ========================== * Jarrod Millman, Neuroscience Institute, UC Berkeley, USA (Conference Co-Chair) * Prabhu Ramachandran, Department of Aerospace Engineering, IIT Bombay, India (Conference Co-Chair) * FOSSEE Team From matthew.brett at gmail.com Fri Oct 28 13:49:31 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 28 Oct 2011 10:49:31 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: <4EAAD69A.1040600@noaa.gov> References: <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> <4EAAD69A.1040600@noaa.gov> Message-ID: Hi, On Fri, Oct 28, 2011 at 9:21 AM, Chris.Barker wrote: > On 10/27/11 7:51 PM, Travis Oliphant wrote: >> As I mentioned. I find the ability to separate an ABSENT idea from an >> IGNORED idea convincing. In other words, I think distinguishing between >> masks and bit-patterns is not just an implementation detail, but >> provides a useful concept for multiple use-cases. > > Exactly -- while one can implement ABSENT with a mask, one can not > implement IGNORE with a bit-pattern. So it is not an implementation detail. > > I also think bit-patterns are a bit of a dead end: > > - there is only a standard for one data type family: i.e. NaN for ieee > float types > > - So we would be coming up with our own standard (or adopting an > existing one, but I don't think there is one widely supported) for other > types. This means: > ? 1) a lot of work to do Largest possible negative integer for ints / largest integer for uints / not allowed for bool? > ? 2) a binary format incompatible with other code, compilers, etc. 
This
> is a BIG deal -- a major strength of numpy is that it serves as a
> wrapper for a data block that is compatible with C, Fortran or whatever
> code -- special bit patterns would make this a lot harder.

Extension code is going to get harder. At the moment, as far as I understand it, our extension code can receive a masked array and (without an explicit check from us) ignore the mask and process all the values. Then you're in the unfortunate situation of caring what's under the mask. Bitpatterns would - I imagine - be safer in that respect, in that they would be new dtypes and thus extension code would by default reject them as unknown.

> We also talked about the fact that an 8-bit mask provides the ability to
> carry other information in the mask -- not just "missing" or "ignored",
> but a handful of other possible reasons for masking. I think that has a
> lot of possibilities.
>
> On 10/28/11 2:11 AM, Stéfan van der Walt wrote:
>> Another data point: I've been spending some time on scikits-image
>> recently, and although masked values would be highly useful in that
>> context, the cost of doubling memory use (for uint8 images, e.g.) is
>> too high.
>
>> 2) that we make a concerted effort to implement the bitmask mode of
>> operation as soon as possible.
>
> I wonder if that might be handled as a scikits-image extension, rather
> than core numpy?

I think Stefan and Nathaniel and Gary Strangman and others are saying we don't want to pay the price of a large memory hike for masking. I suspect that Nathaniel is right, and that a large majority of those of us who want 'missing data' functionality, also want what we've called ABSENT missing values, and care about memory.

See you,

Matthew

From strang at nmr.mgh.harvard.edu Fri Oct 28 13:58:22 2011
From: strang at nmr.mgh.harvard.edu (Gary Strangman)
Date: Fri, 28 Oct 2011 13:58:22 -0400 (EDT)
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> <4EAAD69A.1040600@noaa.gov>
Message-ID:

>> I wonder if that might be handled as a scikits-image extension, rather
>> than core numpy?
>
> I think Stefan and Nathaniel and Gary Strangman and others are saying
> we don't want to pay the price of a large memory hike for masking. I
> suspect that Nathaniel is right, and that a large majority of those of
> us who want 'missing data' functionality, also want what we've called
> ABSENT missing values, and care about memory.

FWIW, Matthew correctly interprets my concerns. I also have very large non-image datasets, so pushing the problem into a more custom extension (esp. one focused on images) doesn't help me much.

-best
Gary

From ben.root at ou.edu Fri Oct 28 14:16:52 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 28 Oct 2011 13:16:52 -0500
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> Message-ID: On Fri, Oct 28, 2011 at 12:39 PM, Matthew Brett wrote: > Hi, > > On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root wrote: > > > > > > On Thursday, October 27, 2011, Charles R Harris < > charlesr.harris at gmail.com> > > wrote: > >> > >> > >> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant < > oliphant at enthought.com> > >> wrote: > >>> > >>> That is a pretty good explanation. I find myself convinced by > Matthew's > >>> arguments. I think that being able to separate ABSENT from IGNORED > is a > >>> good idea. I also like being able to control SKIP and PROPAGATE (but > I > >>> think the current implementation allows this already). > >>> > >>> What is the counter-argument to this proposal? > >>> > >> > >> What exactly do you find convincing? The current masks propagate by > >> default: > >> > >> In [1]: a = ones(5, maskna=1) > >> > >> In [2]: a[2] = NA > >> > >> In [3]: a > >> Out[3]: array([ 1., 1., NA, 1., 1.]) > >> > >> In [4]: a + 1 > >> Out[4]: array([ 2., 2., NA, 2., 2.]) > >> > >> In [5]: a[2] = 10 > >> > >> In [5]: a > >> Out[5]: array([ 1., 1., 10., 1., 1.], maskna=True) > >> > >> > >> I don't see an essential difference between the implementation using > masks > >> and one using bit patterns, the mask when attached to the original array > >> just adds a bit pattern by extending all the types by one byte, an > approach > >> that easily extends to all existing and future types, which is why Mark > went > >> that way for the first implementation given the time available. The > masks > >> are hidden because folks wanted something that behaved more like R and > also > >> because of the desire to combine the missing, ignore, and later possibly > bit > >> patterns in a unified manner. Note that the pseudo assignment was also > meant > >> to look like R. Adding true bit patterns to numpy isn't trivial and I > >> believe Mark was thinking of parametrized types for that. > >> > >> The main problems I see with masks are unified storage and possibly > memory > >> use. The rest is just behavor and desired API and that can be adjusted > >> within the current implementation. There is nothing essentially masky > about > >> masks. > >> > >> Chuck > >> > >> > > > > I think chuck sums it up quite nicely. The implementation detail about > > using mask versus bit patterns can still be discussed and addressed. > > Personally, I just don't see how parameterized dtypes would be easier to > use > > than the pseudo assignment. > > > > The elegance of mark's solution was to consider the treatment of missing > > data in a unified manner. This puts missing data in a more prominent > spot > > for extension builders, which should greatly improve support throughout > the > > ecosystem. > > Are extension builders then required to use the numpy C API to get > their data? Speaking as an extension builder, I would rather you gave > me the mask and the bitpattern information and let me do that myself. > > Forgive me, I wasn't clear. What I am speaking of is more about a typical human failing. If a programmer for a module never encounters masked arrays, then when they code up a function to operate on numpy data, it is quite likely that they would never take it into consideration. 
Notice the prolific use of "np.asarray()" even within the numpy codebase, which destroys masked arrays. However, by making missing data support more integral to the core of numpy, it is far more likely that a programmer would take it into consideration when designing their algorithm, or at least explicitly document that their module does not support missing data. Both NEPs do this by making missing data front-and-center. However, my belief is that Mark's approach is easier to comprehend and is cleaner. Cleaner features mean that they are more likely to be used.

> > By letting there be a single missing data framework (instead of
> > two) all that users need to figure out is when they want nan-like
> behavior
> > (propagate) or to be more like masks (skip). Numpy takes care of the
> rest.
> > There is a reason why I like using masked arrays because I don't have to
> > use nansum in my library functions to guard against the possibility of
> > receiving nans. Duck-typing is a good thing.
> >
> > My argument against separating IGNORE and PROPAGATE is that it becomes too
> > tempting to want to mix these in an array, but the desired behavior would
> > likely become ambiguous.
> >
> > There is one other problem that I just thought of that I don't think has
> > been outlined in either NEP. What if I perform an operation between an
> > array set up with propagate NAs and an array with skip NAs?
>
> These are explicitly covered in the alterNEP:
>
> https://gist.github.com/1056379/
>

Sort of. You speak of reduction operations for a single array with a mix of NA and IGNOREs. I guess in that case, it wouldn't make a difference for element-wise operations between two arrays (plus adding the NAs propagate harder rule). Although, what if skipna=True? I guess I would feel better seeing explicit examples for different combinations of settings (plus, how would one set those for math operators?). In this case, I have a problem with this mixed situation. I would think that IGNORE + NA = IGNORE, because if you are skipping it, then it is skipped, regardless of the other side of the operator. (precedence: a masked array summed against an array of NaNs).

Looking back over Mark's NEP, I see he does cover the issue I am talking about: "The design of this NEP does not distinguish between NAs that come from an NA mask or NAs that come from an NA dtype. Both of these get treated equivalently in computations, with masks dominating over NA dtypes". However, he goes on about the possibility of multi-NA being able to control the effects more directly.

Cheers,
Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ben.root at ou.edu Fri Oct 28 14:20:23 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 28 Oct 2011 13:20:23 -0500
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> <4EAAD69A.1040600@noaa.gov>
Message-ID:

On Fri, Oct 28, 2011 at 12:58 PM, Gary Strangman wrote:
>
>
>> I wonder if that might be handled as a scikits-image extension, rather
>> than core numpy?
>
> I think Stefan and Nathaniel and Gary Strangman and others are saying
> we don't want to pay the price of a large memory hike for masking.
I > > suspect that Nathaniel is right, and that a large majority of those of > > us who want 'missing data' functionality, also want what we've called > > ABSENT missing values, and care about memory. > > FWIW, Matthew correctly interprets my concerns. I also have very large > non-image datasets, so pushing the problem into a more custom extension > (esp. one focused on images) doesn't help me much. > > -best > Gary > > I would wonder if the masks could benefit from the approach used for the "carray" (compressed arrays) project? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Oct 28 14:37:38 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 28 Oct 2011 11:37:38 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> Message-ID: Hi, On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root wrote: > On Fri, Oct 28, 2011 at 12:39 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Thu, Oct 27, 2011 at 10:56 PM, Benjamin Root wrote: >> > >> > >> > On Thursday, October 27, 2011, Charles R Harris >> > >> > wrote: >> >> >> >> >> >> On Thu, Oct 27, 2011 at 7:16 PM, Travis Oliphant >> >> >> >> wrote: >> >>> >> >>> That is a pretty good explanation. ? I find myself convinced by >> >>> Matthew's >> >>> arguments. ? ?I think that being able to separate ABSENT from IGNORED >> >>> is a >> >>> good idea. ? I also like being able to control SKIP and PROPAGATE (but >> >>> I >> >>> think the current implementation allows this already). >> >>> >> >>> What is the counter-argument to this proposal? >> >>> >> >> >> >> What exactly do you find convincing? The current masks propagate by >> >> default: >> >> >> >> In [1]: a = ones(5, maskna=1) >> >> >> >> In [2]: a[2] = NA >> >> >> >> In [3]: a >> >> Out[3]: array([ 1.,? 1.,? NA,? 1.,? 1.]) >> >> >> >> In [4]: a + 1 >> >> Out[4]: array([ 2.,? 2.,? NA,? 2.,? 2.]) >> >> >> >> In [5]: a[2] = 10 >> >> >> >> In [5]: a >> >> Out[5]: array([? 1.,?? 1.,? 10.,?? 1.,?? 1.], maskna=True) >> >> >> >> >> >> I don't see an essential difference between the implementation using >> >> masks >> >> and one using bit patterns, the mask when attached to the original >> >> array >> >> just adds a bit pattern by extending all the types by one byte, an >> >> approach >> >> that easily extends to all existing and future types, which is why Mark >> >> went >> >> that way for the first implementation given the time available. The >> >> masks >> >> are hidden because folks wanted something that behaved more like R and >> >> also >> >> because of the desire to combine the missing, ignore, and later >> >> possibly bit >> >> patterns in a unified manner. Note that the pseudo assignment was also >> >> meant >> >> to look like R. Adding true bit patterns to numpy isn't trivial and I >> >> believe Mark was thinking of parametrized types for that. >> >> >> >> The main problems I see with masks are unified storage and possibly >> >> memory >> >> use. The rest is just behavor and desired API and that can be adjusted >> >> within the current implementation. There is nothing essentially masky >> >> about >> >> masks. >> >> >> >> Chuck >> >> >> >> >> > >> > I ?think chuck sums it up quite nicely. 
?The implementation detail about >> > using mask versus bit patterns can still be discussed and addressed. >> > Personally, I just don't see how parameterized dtypes would be easier to >> > use >> > than the pseudo assignment. >> > >> > The elegance of mark's solution was to consider the treatment of missing >> > data in a unified manner. ?This puts missing data in a more prominent >> > spot >> > for extension builders, which should greatly improve support throughout >> > the >> > ecosystem. >> >> Are extension builders then required to use the numpy C API to get >> their data? ?Speaking as an extension builder, I would rather you gave >> me the mask and the bitpattern information and let me do that myself. >> > > Forgive me, I wasn't clear.? What I am speaking of is more about a typical > human failing.? If a programmer for a module never encounters masked arrays, > then when they code up a function to operate on numpy data, it is quite > likely that they would never take it into consideration.? Notice the > prolific use of "np.asarray()" even within the numpy codebase, which > destroys masked arrays. Hmm - that sounds like it could cause some surprises. So, what you were saying was just that it was good that masked arrays were now closer to the core? That's reasonable, but I don't think it's relevant to the current discussion. I think we all agree it is nice to have masked arrays in the core. > However, by making missing data support more integral into the core of > numpy, then it is far more likely that a programmer would take it into > consideration when designing their algorithm, or at least explicitly > document that their module does not support missing data.? Both NEPs does > this by making missing data front-and-center.? However, my belief is that > Mark's approach is easier to comprehend and is cleaner.? Cleaner features > means that it is more likely to be used. The main motivation for the alterNEP was our strong feeling that separating ABSENT and IGNORE was easier to comprehend and cleaner. I think it would be hard to argue that the aterNEP idea is not more explicit. >> > By letting there be a single missing data framework (instead of >> > two) all that users need to figure out is when they want nan-like >> > behavior >> > (propagate) or to be more like masks (skip). ?Numpy takes care of the >> > rest. >> > ?There is a reason why I like using masked arrays because I don't have >> > to >> > use nansum in my library functions to guard against the possibility of >> > receiving nans. ?Duck-typing is a good thing. >> > >> > My argument against separating IGNORE and PROPAGATE is that it becomes >> > too >> > tempting to want to mix these in an array, but the desired behavior >> > would >> > likely become ambiguous.. >> > >> > There is one other proplem that I just thought of that I don't think has >> > been outlined in either NEP. ?What if I perform an operation between an >> > array set up with propagate NAs and an array with skip NAs? >> >> These are explicitly covered in the alterNEP: >> >> https://gist.github.com/1056379/ >> > > Sort of.? You speak of reduction operations for a single array with a mix of > NA and IGNOREs.? I guess in that case, it wouldn't make a difference for > element-wise operations between two arrays (plus adding the NAs propagate > harder rule).? Although, what if skipna=True?? I guess I would feel better > seeing explicit examples for different combinations of settings (plus, how > would one set those for math operators?).? 
In this case, I have a problem > with this mixed situation.? I would think that IGNORE + NA = IGNORE, because > if you are skipping it, then it is skipped, regardless of the other side of > the operator.? (precedence: a masked array summed against an array of NANs). I'm using IGNORED as a type of value. What you do to that value depends on what you said to do to that value. You might want to SKIP that type of value, or PROPAGATE. If you said to 'skip' IGNORED but 'propagate' ABSENT, then IGNORED + ABSENT == ABSENT. I think it isn't ambiguous, but I'm happy to be corrected. Best, Matthew From xscript at gmx.net Fri Oct 28 15:15:01 2011 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Fri, 28 Oct 2011 21:15:01 +0200 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> (Travis Oliphant's message of "Thu, 27 Oct 2011 21:51:20 -0500") References: <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> Message-ID: <87obx1lyh6.fsf@ginnungagap.bsc.es> I haven't actually tested the code, but AFAIK the following is a short overview with examples of how the two orthogonal feature axis (ABSENT/IGNORE and PROPAGATE/SKIP) are related and how it all is supposed to work. I have never talked to Mark or anybody else in this list (that is, outside of this list), so I may well be mistaken. Thus, sorry if there are any inaccuracies and/or if you are already aware of what I'm describing here. So please tell me if this has helped clarify why I (and I hope others) think the implementation mechanism is independent of the semantics. Lluis ABSENT vs IGNORE ================ Travis Oliphant writes: > As I mentioned. I find the ability to separate an ABSENT idea from an IGNORED idea convincing. In other words, I think distinguishing between masks > and bit-patterns is not just an implementation detail, but provides a useful concept for multiple use-cases. I think it's an implementation detail as long as you have two clear ways of separating them. Summarizing: let's forget for a moment that "mask" has a meaning in english: - "maskna" corresponds to ABSENT - "ownmaskna" corresponds to IGNORED The problem here is that of the two implementation mechanisms (masks and bitpatterns), only the first can provide both semantics. Let's start with an array that already supports NAs: In [1]: a = np.array([1, 2, 3], maskna = True) ABSENT (destructive NA assignment) ---------------------------------- Once you assign NA, even if you're using NA masks, the value seems to be lost forever (i.e., the assignment is destructive regardless of the value): In [2]: b = a.view() In [3]: c = a.view(maskna = True) In [4]: b[0] = np.NA In [5]: a Out[5]: array([NA, 2, 3]) In [6]: b Out[6]: array([NA, 2, 3]) In [7]: c Out[7]: array([NA, 2, 3]) This is the default behaviour, and is probably what the regular user expects by what has been learned from previous uses of the "view" method. Note that here "maskna" acts as an idempotent operation. Once an array has the "maskna" property, all its views will transitively (and destructively) use it. Also note that an array copy will make a copy of both "regular" data and NA values, as expected. 
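For comparison, today's float NaN -- a true bit pattern -- shows the same destructive, view-transitive behaviour in released NumPy (a small sketch):

import numpy as np

a = np.array([1., 2., 3.])
b = a.view()
b[0] = np.nan   # writes through the shared buffer
print(a)        # [ nan   2.   3.] -- the old value is gone for every view, as with ABSENT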
IGNORED (non-destructive NA assignment)
---------------------------------------

But you can also have non-destructive NA assignments, although *only* if you explicitly (and thus purposefully) ask for it -> ownmaskna

In [8]: b = a.view(ownmaskna = True)
In [9]: b[1] = np.NA
In [10]: a
Out[10]: array([NA, 2, 3])
In [11]: b
Out[11]: array([NA, NA, 3])
In [12]: a[2] = np.NA
In [13]: a
Out[13]: array([NA, 2, NA])
In [14]: b
Out[14]: array([NA, NA, 3])

The mask is a copy:

In [15]: a[0] = 1
In [16]: a
Out[16]: array([1, 2, NA])
In [17]: b
Out[17]: array([NA, NA, 3])

But the data itself is not (aka, non-NA values are *always* destructive, but I think this is out of the scope of this discussion):

In [18]: a[0] = -10
In [19]: a[2] = -30
In [20]: a
Out[20]: array([-10, 2, -30], maskna = True)
In [21]: b
Out[21]: array([NA, NA, -30])

The dark corner
---------------

The only potential misunderstanding can be the creation of an NA-masked array from a "regular" array. This is precisely why I put this case at the end, as it seems to break the intuition some people have about assignment being always destructive (unless you explicitly ask for IGNORED, which is not the case):

In [22]: a = np.array([1, 2, 3])
In [23]: b = a.view(maskna = True)
In [24]: b[0] = np.NA
In [25]: a
Out[25]: array([1, 2, 3])
In [26]: b
Out[26]: array([NA, 2, 3])

This is in fact a corner case, and there is no obvious (and efficient!) way to handle it. As "a" is just a "regular" array, and has no support for any type of NA values (neither masks nor bit-patterns), assignments to any of its views cannot, in any case, be destructive.

Note that the previous holds true because it currently is a design decision to forbid the in-flight conversion from "regular" to "NA-enabled" arrays. In fact I forgot that, when reading the docs in [1], I thought that a slight change could make it all feel more consistent: the view of a regular array can have NA values only if "ownmaskna" is used (IGNORED/non-destructive NA assignments), and will give an error if "maskna" is used (as in entry [23] above).

[1] http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html#creating-na-masked-views

PROPAGATE vs SKIP
=================

I've also read some comments regarding this. Maybe I didn't explain myself correctly in previous mails, or maybe I just misunderstood other people's mails (which might not be about this at all).

PROPAGATE
---------

All ufuncs in ndarray propagate NA values. Note that ABSENT (destructive NA-assignment) is also a default, so we could say that the default is R-like behaviour (AFAIK).

SKIP
----

You have a different array type (let's call it skip_array), where all ufuncs do *not* propagate NA values.

Middle-ground
-------------

For the sake of code maintainability (and the specific needs one might have on a per-ufunc basis), in fact you only have one type of ndarray that supports both PROPAGATE and SKIP with the very same NA values. This can be controlled on a per-ufunc basis through the "skipna" argument that is present on all ufuncs, so that ndarray defaults to "skipna = False" and skip_array defaults to "skipna = True". The latter is done by simply defining an ndarray subclass that provides a ufunc wrapper like this (fake code):

class skip_array (np.ndarray):
    ...
    def __ufunc_wrap__ (ufunc, *args, **kwargs):
        kwargs["skipna"] = True
        return ufunc(*args, **kwargs)

There are other ways of doing it, but IMHO how it can be done doesn't matter right now.
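To make the middle-ground concrete, a minimal sketch of the per-call control (this assumes the NA branch, with the maskna/skipna spellings used earlier in this thread):

import numpy as np

a = np.array([1., 2., 3.], maskna=True)
a[1] = np.NA

print(np.sum(a))               # NA  -- PROPAGATE is the default
print(np.sum(a, skipna=True))  # 4.0 -- SKIP, selected per ufunc call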
-- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From stefan at sun.ac.za Fri Oct 28 15:35:32 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 28 Oct 2011 12:35:32 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> Message-ID: On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root wrote: > this by making missing data front-and-center.? However, my belief is that > Mark's approach is easier to comprehend and is cleaner.? Cleaner features > means that it is more likely to be used. Cleaner features may be easier to adopt, but whether they are used or not depends on whether they address the problem in hand. The implementation as it stands essentially gives us a faster and more integrated version of numpy.ma; but it has become clear from this conversation that such an approach overlooks a very common subset of masked-related problems. We should be concerned about memory use; we often don't have too much of it, and accessing it is slow. Would it be workable to store 8 mask bits per byte instead? I don't think it should impact on the speed much, and we can always generate a full mask for the user on request. Regards St?fan From ben.root at ou.edu Fri Oct 28 15:47:33 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 28 Oct 2011 14:47:33 -0500 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> Message-ID: 2011/10/28 St?fan van der Walt > On Fri, Oct 28, 2011 at 11:16 AM, Benjamin Root wrote: > > this by making missing data front-and-center. However, my belief is that > > Mark's approach is easier to comprehend and is cleaner. Cleaner features > > means that it is more likely to be used. > > Cleaner features may be easier to adopt, but whether they are used or > not depends on whether they address the problem in hand. The > implementation as it stands essentially gives us a faster and more > integrated version of numpy.ma; but it has become clear from this > conversation that such an approach overlooks a very common subset of > masked-related problems. > > Which are...? (given the history of this discussion, let's not assume anything is clear). > We should be concerned about memory use; we often don't have too much > of it, and accessing it is slow. > > Would it be workable to store 8 mask bits per byte instead? I don't > think it should impact on the speed much, and we can always generate a > full mask for the user on request. > > I suggested such an idea a while back. This is part of the reason why Mark decided that the masks should not be exposed for direct access in case it is decided that masks could be implemented that way. I have a vague recollection of him commenting about some tests he did along that route, but I don't remember it. Cheers, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Fri Oct 28 16:02:04 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 28 Oct 2011 13:02:04 -0700 Subject: [Numpy-discussion] NA masks in the next numpy release? In-Reply-To: <87obx1lyh6.fsf@ginnungagap.bsc.es> References: <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> <87obx1lyh6.fsf@ginnungagap.bsc.es> Message-ID: Hi, On Fri, Oct 28, 2011 at 12:15 PM, Llu?s wrote: > Summarizing: let's forget for a moment that "mask" has a meaning in english: This is at the core of the problem. You and I know what's really going on - there's a mask over the data. But in what follows we're going to try and pretend that is not what is going on. The result is something that is rather hard to understand, and, when you do understand it, it's surprising and inconvenient. This is all because we tried to conceal what was really going on. > ? ? ? ? ? ? - "maskna" corresponds to ABSENT > ? ? ? ? ? ? - "ownmaskna" corresponds to IGNORED > > The problem here is that of the two implementation mechanisms (masks and > bitpatterns), only the first can provide both semantics. But let's be clear. The current masked array implementation is made so it looks like ABSENT, and makes IGNORED hard to get to. > Let's start with an array that already supports NAs: > > In [1]: a = np.array([1, 2, 3], maskna = True) > > > > ABSENT (destructive NA assignment) > ---------------------------------- > > Once you assign NA, even if you're using NA masks, the value seems to be lost > forever (i.e., the assignment is destructive regardless of the value): > > In [2]: b = a.view() > In [3]: c = a.view(maskna = True) > In [4]: b[0] = np.NA > In [5]: a > Out[5]: array([NA, 2, 3]) > In [6]: b > Out[6]: array([NA, 2, 3]) > In [7]: c > Out[7]: array([NA, 2, 3]) Right - the mask (fundamentally an IGNORED signal) is pretending to implement ABSENT. But - as you point out below - I'm pasting it here - in fact it's IGNORED. > In [21]: a = np.array([1, 2, 3]) > Out[21]: array([1, 2, 3]) > In [22]: b = a.view(maskna = True) > In [23]: b[0] = np.NA > In [24]: a > Out[24]: array([1, 2, 3]) > In [25]: b > Out[25]: array([NA, 2, 3]) But now - I've done this: >>> a = np.array([99, 100, 3], maskna=True) >>> a[0:2] = np.NA You and I know that I've got an array with values [99, 100, 3] and a mask with values [False, False, True]. So maybe I'd like to see what happens if I take off the mask from the second value. I know that's what I want to do, but I don't know how to do it, because you won't let me manipulate the mask, because I'm not allowed to know that the NA values come from the mask. The alterNEP is just saying - please - be straight with me. If you're doing masking, show me the mask, and don't try and hide that there are stored values underneath. 
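For what it's worth, numpy.ma already shows what that explicitness looks like -- the mask is an ordinary, inspectable, writable array (a sketch with released NumPy):

import numpy as np

a = np.ma.masked_array([99, 100, 3], mask=[True, True, False])
print(a)           # [-- -- 3]
a.mask[1] = False  # unmask the second value explicitly
print(a)           # [-- 100 3]
print(a.data)      # [ 99 100   3] -- the stored values stay visible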
Best,
Matthew

From stefan at sun.ac.za Fri Oct 28 16:05:32 2011
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Fri, 28 Oct 2011 13:05:32 -0700
Subject: [Numpy-discussion] skip lines at the end of file with loadtxt
In-Reply-To:
References:
Message-ID:

On Tue, Oct 25, 2011 at 12:28 PM, Massimo Di Stefano wrote:
> urllib.urlretrieve('http://www.cdc.noaa.gov/Correlation/amon.us.long.data',
> 'file.txt')
> a = np.loadtxt('file.txt', skiprows=1)
>
> but it fails because of the txt description at the end of the file,

It's always hard to stop reading before the end of a file, since we don't know when that's going to happen; I guess it would require a buffered approach.

Fortunately, there is an easy workaround in your case. All the text lines start with " ", so simply do:

np.loadtxt("amon.us.long.data", comments=" ")

You should see a speedup of about 3 times over genfromtxt (loadtxt does much less under the hood).

Stéfan
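Assembling the whole workaround with the download step from Massimo's original message (a sketch only - the local filename is arbitrary, and whether skiprows is still needed depends on the file's header line):

import urllib
import numpy as np

# fetch the data file once, then parse it locally
urllib.urlretrieve('http://www.cdc.noaa.gov/Correlation/amon.us.long.data',
                   'amon.us.long.data')

# per Stefan's observation, the trailing description lines start with " ",
# so treating " " as the comment character makes loadtxt drop them
a = np.loadtxt('amon.us.long.data', comments=' ')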
From stefan at sun.ac.za Fri Oct 28 16:13:12 2011
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Fri, 28 Oct 2011 13:13:12 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID:

On Fri, Oct 28, 2011 at 12:47 PM, Benjamin Root wrote:
>
> 2011/10/28 Stéfan van der Walt
>> The
>> implementation as it stands essentially gives us a faster and more
>> integrated version of numpy.ma; but it has become clear from this
>> conversation that such an approach overlooks a very common subset of
>> mask-related problems.
>>
> Which are...? (given the history of this discussion, let's not assume
> anything is clear).

The case where the number of elements in the array vastly outnumbers the number of masked elements. (Images, 3D volumes, large time-series, tables, etc.)

E.g., if you are taking measurements from a sensor, but once in a blue moon the sensor messes up, you simply want to mark those values as missing, but you do not want to allocate a whole new chunk of memory to do so.

I had a chat with JB Poline this morning, who mentioned that sparse matrix storage of the mask may also be an option. Those containers typically trade off insertion vs. lookup speeds, so I'm not sure whether it'd be feasible, but I like the idea.

Stéfan

From ben.root at ou.edu Fri Oct 28 16:14:59 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 28 Oct 2011 15:14:59 -0500
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> <87obx1lyh6.fsf@ginnungagap.bsc.es>
Message-ID:

On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett wrote:
>
> You and I know that I've got an array with values [99, 100, 3] and a
> mask with values [False, False, True]. So maybe I'd like to see what
> happens if I take off the mask from the second value. I know that's
> what I want to do, but I don't know how to do it, because you won't
> let me manipulate the mask, because I'm not allowed to know that the
> NA values come from the mask.
>
> The alterNEP is just saying - please - be straight with me. If
> you're doing masking, show me the mask, and don't try and hide that
> there are stored values underneath.

Considering that you have admitted before to not regularly using masked arrays, I seriously doubt that you would be able to judge whether this is a significant detriment or not. The entire point I have been making is that Mark's implementation is not the same as the current masked arrays. Instead, it is a cleaner, more mature implementation that gets rid of extraneous "features". Instead of fussing around with a mask directly in the array, the user of masked arrays should now consider the use of views as the masks. It works beautifully because it builds on a well-documented and well-understood feature of numpy.

Of course, when you look at the feature in your way, with those expectations, then I would agree that it might be confusing. But given that this is a completely new feature, we have the opportunity to properly document it and show how to rethink a user's preconceptions of masked arrays. Users can keep the original array as a plain array and have mask1, mask2, mask3, etc., as separate views. It is a completely different way to think of masked arrays, and considering that masked arrays are not widely used in other toolkits, I think we can be free to change the paradigm.

Further, there is no reason why we can't keep numpy.ma around for backwards compatibility and for those who "just don't get it".

Cheers,
Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefan at sun.ac.za Fri Oct 28 16:21:27 2011
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Fri, 28 Oct 2011 13:21:27 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> <87obx1lyh6.fsf@ginnungagap.bsc.es>
Message-ID:

On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root wrote:
> Considering that you have admitted before to not regularly using masked
> arrays, I seriously doubt that you would be able to judge whether this is a
> significant detriment or not.

Let's not be unreasonable; Matthew has a valid concern (maybe from experience in teaching numpy): once the machinery under the hood becomes opaque, it becomes much harder to use numpy intuitively.

Stéfan

From matthew.brett at gmail.com Fri Oct 28 16:22:28 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 28 Oct 2011 13:22:28 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> <87obx1lyh6.fsf@ginnungagap.bsc.es>
Message-ID:

Hi,

On Fri, Oct 28, 2011 at 1:14 PM, Benjamin Root wrote:
>
> On Fri, Oct 28, 2011 at 3:02 PM, Matthew Brett wrote:
>>
>> You and I know that I've got an array with values [99, 100, 3] and a
>> mask with values [False, False, True]. So maybe I'd like to see what
>> happens if I take off the mask from the second value.
I know that's
>> what I want to do, but I don't know how to do it, because you won't
>> let me manipulate the mask, because I'm not allowed to know that the
>> NA values come from the mask.
>>
>> The alterNEP is just saying - please - be straight with me. If
>> you're doing masking, show me the mask, and don't try and hide that
>> there are stored values underneath.
>>
> Considering that you have admitted before to not regularly using masked
> arrays, I seriously doubt that you would be able to judge whether this
> is a significant detriment or not. The entire point I have been making is
> that Mark's implementation is not the same as the current masked arrays.
> Instead, it is a cleaner, more mature implementation that gets rid of
> extraneous "features".

This may explain why we don't seem to be getting anywhere. I am sure that Mark's implementation of masking is great. We're not talking about that. We're talking about whether it's a good idea to make masking look as though it is implementing the ABSENT idea. That's what I think is confusing, and that's the conversation I have been trying to pursue.

Best,
Matthew

From charlesr.harris at gmail.com Fri Oct 28 16:27:42 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 28 Oct 2011 14:27:42 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA48739.4010206@hawaii.edu> <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID:

2011/10/28 Stéfan van der Walt
> On Fri, Oct 28, 2011 at 12:47 PM, Benjamin Root wrote:
> >
> > 2011/10/28 Stéfan van der Walt
> >> The
> >> implementation as it stands essentially gives us a faster and more
> >> integrated version of numpy.ma; but it has become clear from this
> >> conversation that such an approach overlooks a very common subset of
> >> mask-related problems.
> >>
> > Which are...? (given the history of this discussion, let's not assume
> > anything is clear).
>
> The case where the number of elements in the array vastly outnumbers
> the number of masked elements. (Images, 3D volumes, large
> time-series, tables, etc.)
>
> E.g., if you are taking measurements from a sensor, but once in a blue
> moon the sensor messes up, you simply want to mark those values as
> missing, but you do not want to allocate a whole new chunk of memory
> to do so.
>
> I had a chat with JB Poline this morning, who mentioned that sparse
> matrix storage of the mask may also be an option. Those containers
> typically trade off insertion vs. lookup speeds, so I'm not sure
> whether it'd be feasible, but I like the idea.

I think simple run length encoding might work well with masks.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
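A toy illustration of the run-length idea (pure Python plus NumPy; the function names are invented for this example, and a real version would keep the runs in C structures):

import numpy as np

def rle_encode(mask):
    # record (start, length) for each run of masked elements
    edges = np.flatnonzero(np.diff(np.r_[0, mask.astype(np.int8), 0]))
    return zip(edges[::2], edges[1::2] - edges[::2])

def rle_decode(runs, n):
    mask = np.zeros(n, dtype=bool)
    for start, length in runs:
        mask[start:start + length] = True
    return mask

mask = np.array([0, 0, 1, 1, 1, 0, 1, 0, 0, 1], dtype=bool)
runs = rle_encode(mask)
print runs                                         # [(2, 3), (6, 1), (9, 1)]
print (rle_decode(runs, mask.size) == mask).all()  # True

Like the sparse-mask idea, this wins when masked elements are rare or clustered, and loses when the mask alternates densely.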
From ben.root at ou.edu Fri Oct 28 16:52:35 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 28 Oct 2011 15:52:35 -0500
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> <87obx1lyh6.fsf@ginnungagap.bsc.es>
Message-ID:

On Fri, Oct 28, 2011 at 3:22 PM, Matthew Brett wrote:
> [snip]
>
> This may explain why we don't seem to be getting anywhere. I am sure
> that Mark's implementation of masking is great. We're not talking
> about that. We're talking about whether it's a good idea to make
> masking look as though it is implementing the ABSENT idea. That's
> what I think is confusing, and that's the conversation I have been
> trying to pursue.

Sorry if I came across too strongly there. No disrespect was intended.

Personally, I think we are getting somewhere. We have been whittling away what it is that we do agree upon, and have begun to specify *exactly* what it is that we disagree on. I understand your concern, and -- like I said in my previous email -- it makes sense given the perspective numpy.ma users have had up to now. But I re-raise my point about the need to re-think masked arrays. If we consider masks as advanced slicing or boolean indexing, then being unable to access the underlying values actually makes a lot of sense.

Consider it a contract when I pass a set of data with only certain values exposed. Because I passed the data with only those values exposed, it must have been entirely my intention to let the function know of only those values. It would be a violation of that contract if the function obtained those masked values. If I want to communicate both the original values and a particular mask, then I pass the array and a view with a particular mask. Maybe it would be helpful for an array never to have its own mask; rather, only views could carry masks?

In conclusion, I submit that this is largely a problem that can be solved with the proper documentation. New users who never used numpy.ma do not have to concern themselves with the old way of thinking and are simply taught what masked arrays "are". Meanwhile, a special section of the documentation should be made that teaches numpy.ma users how masked arrays "should be".

Cheers!
Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
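One way to picture the views-carry-masks idea with today's tools: numpy.ma arrays constructed with copy=False share the parent array's memory, so each behaves like a differently-masked view of the same plain array (a sketch of the proposed usage pattern only, not of any actual new API):

import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, 3.0, 4.0])   # the plain array owns the values
view1 = ma.masked_array(data, mask=[True, False, False, False], copy=False)
view2 = ma.masked_array(data, mask=[False, False, True, True], copy=False)

data[1] = 99.0   # ...and every masked "view" sees the change
print view1      # [-- 99.0 3.0 4.0]
print view2      # [1.0 99.0 -- --]

A function handed view2 can honor the contract Ben describes by only ever touching the exposed values.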
From matthew.brett at gmail.com Fri Oct 28 17:01:52 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 28 Oct 2011 14:01:52 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To:
References: <4EA4AC24.5060805@hawaii.edu> <87pqhmjm72.fsf@ginnungagap.bsc.es> <87hb2xf6yo.fsf@ginnungagap.bsc.es> <0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com> <7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com> <4A15979D-CC34-402B-9DBD-2B8AC7AFA044@enthought.com> <87obx1lyh6.fsf@ginnungagap.bsc.es>
Message-ID:

Hi,

On Fri, Oct 28, 2011 at 1:52 PM, Benjamin Root wrote:
> [snip]
>
> Sorry if I came across too strongly there. No disrespect was intended.

I wasn't worried about the disrespect. It's just that I feel the discussion has not been to the point.

> Personally, I think we are getting somewhere. We have been whittling away
> what it is that we do agree upon, and have begun to specify *exactly* what
> it is that we disagree on. I understand your concern, and -- like I said
> in my previous email -- it makes sense given the perspective numpy.ma
> users have had up to now.

But I'm not a numpy.ma user, I'm just someone who knows that what you are doing is masking out values. The fact that I do not use numpy.ma points out that it's possible to find this highly counter-intuitive without prior bias.

> But I re-raise my point about the need to re-think masked arrays. If we
> consider masks as advanced slicing or boolean indexing, then being unable
> to access the underlying values actually makes a lot of sense.
>
> Consider it a contract when I pass a set of data with only certain values
> exposed.
> Because I passed the data with only those values exposed, it must have
> been entirely my intention to let the function know of only those
> values. It would be a violation of that contract if the function obtained
> those masked values. If I want to communicate both the original values and
> a particular mask, then I pass the array and a view with a particular mask.

This is the old discussion about what Python users expect. I think they expect to be treated as adults. That is, breaking the contract should not be easy to do by accident, but it should be allowed.

> Maybe it would be helpful for an array never to have its own mask; rather,
> only views could carry masks?
>
> In conclusion, I submit that this is largely a problem that can be solved
> with the proper documentation. New users who never used numpy.ma do not
> have to concern themselves with the old way of thinking and are simply
> taught what masked arrays "are". Meanwhile, a special section of the
> documentation should be made that teaches numpy.ma users how masked arrays
> "should be".

I don't think documentation will solve it. In a way, the ideal user is someone who doesn't know what's going on, because, for a while, they may not realize that when they thought they were doing assignment, in fact they were doing masking. Unfortunately, I suspect almost everyone using these things will start to realize that, and then they will start getting confused. I find it confusing, and I believe myself to understand the issues pretty well, and to be of ordinary numpy-user comprehension powers.

See you,
Matthew

From njs at pobox.com Fri Oct 28 17:16:11 2011
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 28 Oct 2011 14:16:11 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
Message-ID:

On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant wrote:
> I think Nathaniel and Matthew provided very
> specific feedback that was helpful in understanding other perspectives of a
> difficult problem. In particular, I really wanted bit-patterns
> implemented. However, I also understand that Mark did quite a bit of work
> and altered his original designs quite a bit in response to community
> feedback. I wasn't a major part of the pull request discussion, nor did I
> merge the changes, but I support Charles if he reviewed the code and felt
> like it was the right thing to do. I likely would have done the same thing
> rather than let Mark Wiebe's work languish.

My connectivity is spotty this week, so I'll stay out of the technical discussion for now, but I want to share a story.

Maybe a year ago now, Jonathan Taylor and I were debating what the best API for describing statistical models would be -- whether we wanted something like R's "formulas" (which I supported), or another approach based on sympy (his idea). To summarize, I thought his API was confusing, pointlessly complicated, and didn't actually solve the problem; he thought R-style formulas were superficially simpler but hopelessly confused and inconsistent underneath. Now, obviously, I was right and he was wrong. Well, obvious to me, anyway... ;-) But it wasn't like I could just wave a wand and make his arguments go away, no matter how annoying and wrong-headed I thought they were... I could write all the code I wanted but no-one would use it unless I could convince them it's actually the right solution, so I had to engage with him, and dig deep into his arguments.
What I discovered was that (as I thought) R-style formulas *do* have a solid theoretical basis -- but (as he thought) all the existing implementations *are* broken and inconsistent! I'm still not sure I can actually convince Jonathan to go my way, but, because of his stubbornness, I had to invent a better way of handling these formulas, and so my library[1] is actually the first implementation of these things that has a rigorous theory behind it, and in the process it avoids two fundamental, decades-old bugs in R. (And I'm not sure the R folks can fix either of them at this point without breaking a ton of code, since they both have API consequences.)

--

It's extremely common for healthy FOSS projects to insist on consensus for almost all decisions, where consensus means something like "every interested party has a veto"[2]. This seems counterintuitive, because if everyone's vetoing all the time, how does anything get done? The trick is that if anyone *can* veto, then vetoes turn out to actually be very rare. Everyone knows that they can't just ignore alternative points of view -- they have to engage with them if they want to get anything done. So you get buy-in on features early, and no vetoes are necessary. And by forcing people to engage with each other, like me with Jonathan, you get better designs.

But what about the cost of all that code that doesn't get merged, or written, because everyone's spending all this time debating instead? Better designs are nice and all, but how does that justify letting working code languish?

The greatest risk for a FOSS project is that people will ignore you. Projects and features live and die by community buy-in. Consider the "NA mask" feature right now. It works (at least the parts of it that are implemented). It's in mainline. But IIRC, Pierre said last time that he doesn't think the current design will help him improve or replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring this feature in favor of his library pandas' current hacky NA support. Members of the neuroimaging crowd are saying that the memory overhead is too high and the benefits too marginal, so they'll stick with NaNs. Together these folks make up a huge proportion of this feature's target audience. So what have we actually accomplished by merging this to mainline? Are we going to be stuck supporting a feature that only a fraction of the target audience actually uses? (Maybe they're being dumb, but if people are ignoring your code for dumb reasons... they're still ignoring your code.)

The consensus rule forces everyone to do the hardest and riskiest part -- building buy-in -- up front. Because you *have* to do it sooner or later, and doing it sooner doesn't just generate better designs. It drastically reduces the risk of ending up in a huge trainwreck.

--

In my story at the beginning, I wished I had a magic wand to skip this annoying debate and political stuff. But giving it to me would have been a bad idea. I think that's what went wrong with the NA discussion in the first place. Mark's an excellent programmer, and he tried his best to act in the good of everyone in the project -- but in the end, he did have a wand like that. He didn't have that sense that he *had* to get everyone on board (even the people who were saying dumb things), or he'd just be wasting his time. He didn't ask Pierre if the NA design would actually work for numpy.ma's purposes -- I did.
But my ideas aren't really the important thing. The alter-NEP was my attempt to find common ground between the different needs people were bringing up, so we could discuss whether it would work for people or not. I'm not wedded to anything in it. But this is a complicated issue with a lot of conflicting interests, and we need to find something that actually does work for everyone (or as large a subset as is practical). So here's what I think we should do: 1) I will submit a pull request backing Mark's NA work out of mainline, for now. (This is more or less done, I just need to get it onto github, see above re: connectivity) 2) I will also put together a new branch containing that work, rebased against current mainline, so it doesn't get lost. (Ditto.) 3) And we'll decide what to do with it *after* we hammer out a design that the various NA-supporting groups all find convincing. Or at least a design for some of the less controversial pieces (like the 'where=' ufunc argument?), get those merged, and then iterate incrementally. What do you all think? And in any case, thanks for reading, -- Nathaniel [1] https://github.com/charlton/charlton [2] For example, this is written into the Apache voting procedure: https://www.apache.org/foundation/voting.html (it's the "code modification" rules that are relevant). And as usual, Karl Fogel has more useful discussion: http://producingoss.com/en/consensus-democracy.html (see esp. the "When to vote" section, which is entirely about how to avoid voting) From matthew.brett at gmail.com Fri Oct 28 17:32:20 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 28 Oct 2011 14:32:20 -0700 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) In-Reply-To: References: Message-ID: Hi, On Fri, Oct 28, 2011 at 2:16 PM, Nathaniel Smith wrote: > On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant wrote: >> I think Nathaniel and Matthew provided very >> specific feedback that was helpful in understanding other perspectives of a >> difficult problem. ? ? In particular, I really wanted bit-patterns >> implemented. ? ?However, I also understand that Mark did quite a bit of work >> and altered his original designs quite a bit in response to community >> feedback. ? I wasn't a major part of the pull request discussion, nor did I >> merge the changes, but I support Charles if he reviewed the code and felt >> like it was the right thing to do. ?I likely would have done the same thing >> rather than let Mark Wiebe's work languish. > > My connectivity is spotty this week, so I'll stay out of the technical > discussion for now, but I want to share a story. > > Maybe a year ago now, Jonathan Taylor and I were debating what the > best API for describing statistical models would be -- whether we > wanted something like R's "formulas" (which I supported), or another > approach based on sympy (his idea). To summarize, I thought his API > was confusing, pointlessly complicated, and didn't actually solve the > problem; he thought R-style formulas were superficially simpler but > hopelessly confused and inconsistent underneath. Now, obviously, I was > right and he was wrong. Well, obvious to me, anyway... ;-) But it > wasn't like I could just wave a wand and make his arguments go away, > no matter how annoying and wrong-headed I thought they were... I could > write all the code I wanted but no-one would use it unless I could > convince them it's actually the right solution, so I had to engage > with him, and dig deep into his arguments. 
> > What I discovered was that (as I thought) R-style formulas *do* have a > solid theoretical basis -- but (as he thought) all the existing > implementations *are* broken and inconsistent! I'm still not sure I > can actually convince Jonathan to go my way, but, because of his > stubbornness, I had to invent a better way of handling these formulas, > and so my library[1] is actually the first implementation of these > things that has a rigorous theory behind it, and in the process it > avoids two fundamental, decades-old bugs in R. (And I'm not sure the R > folks can fix either of them at this point without breaking a ton of > code, since they both have API consequences.) > > -- > > It's extremely common for healthy FOSS projects to insist on consensus > for almost all decisions, where consensus means something like "every > interested party has a veto"[2]. This seems counterintuitive, because > if everyone's vetoing all the time, how does anything get done? The > trick is that if anyone *can* veto, then vetoes turn out to actually > be very rare. Everyone knows that they can't just ignore alternative > points of view -- they have to engage with them if they want to get > anything done. So you get buy-in on features early, and no vetoes are > necessary. And by forcing people to engage with each other, like me > with Jonathan, you get better designs. > > But what about the cost of all that code that doesn't get merged, or > written, because everyone's spending all this time debating instead? > Better designs are nice and all, but how does that justify letting > working code languish? > > The greatest risk for a FOSS project is that people will ignore you. > Projects and features live and die by community buy-in. Consider the > "NA mask" feature right now. It works (at least the parts of it that > are implemented). It's in mainline. But IIRC, Pierre said last time > that he doesn't think the current design will help him improve or > replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring > this feature in favor of his library pandas' current hacky NA support. > Members of the neuroimaging crowd are saying that the memory overhead > is too high and the benefits too marginal, so they'll stick with NaNs. > Together these folk a huge proportion of the this feature's target > audience. So what have we actually accomplished by merging this to > mainline? Are we going to be stuck supporting a feature that only a > fraction of the target audience actually uses? (Maybe they're being > dumb, but if people are ignoring your code for dumb reasons... they're > still ignoring your code.) > > The consensus rule forces everyone to do the hardest and riskiest part > -- building buy-in -- up front. Because you *have* to do it sooner or > later, and doing it sooner doesn't just generate better designs. It > drastically reduces the risk of ending up in a huge trainwreck. > > -- > > In my story at the beginning, I wished I had a magic wand to skip this > annoying debate and political stuff. But giving it to me would have > been a bad idea. I think that's went wrong with the NA discussion in > the first place. Mark's an excellent programmer, and he tried his best > to act in the good of everyone in the project -- but in the end, he > did have a wand like that. He didn't have that sense that he *had* to > get everyone on board (even the people who were saying dumb things), > or he'd just be wasting his time. He didn't ask Pierre if the NA > design would actually work for numpy.ma's purposes -- I did. 
> > You may have noticed that I do have some ideas for about how NA > support should work. But my ideas aren't really the important thing. > The alter-NEP was my attempt to find common ground between the > different needs people were bringing up, so we could discuss whether > it would work for people or not. I'm not wedded to anything in it. But > this is a complicated issue with a lot of conflicting interests, and > we need to find something that actually does work for everyone (or as > large a subset as is practical). > > So here's what I think we should do: > ?1) I will submit a pull request backing Mark's NA work out of > mainline, for now. (This is more or less done, I just need to get it > onto github, see above re: connectivity) > ?2) I will also put together a new branch containing that work, > rebased against current mainline, so it doesn't get lost. (Ditto.) > ?3) And we'll decide what to do with it *after* we hammer out a > design that the various NA-supporting groups all find convincing. Or > at least a design for some of the less controversial pieces (like the > 'where=' ufunc argument?), get those merged, and then iterate > incrementally. > > What do you all think? Nice post - thank you. I agree that we may have a problem with - process. I mean, maybe there is not much agreement on what the process for these kinds of discussions should be - and therefore - we can't point to some constitution or similar to say - hey - wait - we're not doing it right. It seems to me - from my technical reply to Travis - that it would be reasonable to keep Mark's implementation of masked arrays, but with some minor modifications to keep IGNORED (implemented) separable conceptually from ABSENT (not implemented). Maybe the discussion could be about those modifications? Specifically, where do you feel the points of disagreement are, after the masking idea has become clearly an implementation of IGNORED? I guess you also don't much care if the IGNORED default behavior is PROPAGATE or SKIP. I had thought about what would happen to numpy.ma - and I would really like to know what Pierre would need for this implementation to allow him to replace numpy.ma. See you, Matthew From matthew.brett at gmail.com Fri Oct 28 17:40:09 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 28 Oct 2011 14:40:09 -0700 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) In-Reply-To: References: Message-ID: On Fri, Oct 28, 2011 at 2:32 PM, Matthew Brett wrote: > Hi, > > On Fri, Oct 28, 2011 at 2:16 PM, Nathaniel Smith wrote: >> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant wrote: >>> I think Nathaniel and Matthew provided very >>> specific feedback that was helpful in understanding other perspectives of a >>> difficult problem. ? ? In particular, I really wanted bit-patterns >>> implemented. ? ?However, I also understand that Mark did quite a bit of work >>> and altered his original designs quite a bit in response to community >>> feedback. ? I wasn't a major part of the pull request discussion, nor did I >>> merge the changes, but I support Charles if he reviewed the code and felt >>> like it was the right thing to do. ?I likely would have done the same thing >>> rather than let Mark Wiebe's work languish. >> >> My connectivity is spotty this week, so I'll stay out of the technical >> discussion for now, but I want to share a story. 
>> >> Maybe a year ago now, Jonathan Taylor and I were debating what the >> best API for describing statistical models would be -- whether we >> wanted something like R's "formulas" (which I supported), or another >> approach based on sympy (his idea). To summarize, I thought his API >> was confusing, pointlessly complicated, and didn't actually solve the >> problem; he thought R-style formulas were superficially simpler but >> hopelessly confused and inconsistent underneath. Now, obviously, I was >> right and he was wrong. Well, obvious to me, anyway... ;-) But it >> wasn't like I could just wave a wand and make his arguments go away, >> no matter how annoying and wrong-headed I thought they were... I could >> write all the code I wanted but no-one would use it unless I could >> convince them it's actually the right solution, so I had to engage >> with him, and dig deep into his arguments. >> >> What I discovered was that (as I thought) R-style formulas *do* have a >> solid theoretical basis -- but (as he thought) all the existing >> implementations *are* broken and inconsistent! I'm still not sure I >> can actually convince Jonathan to go my way, but, because of his >> stubbornness, I had to invent a better way of handling these formulas, >> and so my library[1] is actually the first implementation of these >> things that has a rigorous theory behind it, and in the process it >> avoids two fundamental, decades-old bugs in R. (And I'm not sure the R >> folks can fix either of them at this point without breaking a ton of >> code, since they both have API consequences.) >> >> -- >> >> It's extremely common for healthy FOSS projects to insist on consensus >> for almost all decisions, where consensus means something like "every >> interested party has a veto"[2]. This seems counterintuitive, because >> if everyone's vetoing all the time, how does anything get done? The >> trick is that if anyone *can* veto, then vetoes turn out to actually >> be very rare. Everyone knows that they can't just ignore alternative >> points of view -- they have to engage with them if they want to get >> anything done. So you get buy-in on features early, and no vetoes are >> necessary. And by forcing people to engage with each other, like me >> with Jonathan, you get better designs. >> >> But what about the cost of all that code that doesn't get merged, or >> written, because everyone's spending all this time debating instead? >> Better designs are nice and all, but how does that justify letting >> working code languish? >> >> The greatest risk for a FOSS project is that people will ignore you. >> Projects and features live and die by community buy-in. Consider the >> "NA mask" feature right now. It works (at least the parts of it that >> are implemented). It's in mainline. But IIRC, Pierre said last time >> that he doesn't think the current design will help him improve or >> replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring >> this feature in favor of his library pandas' current hacky NA support. >> Members of the neuroimaging crowd are saying that the memory overhead >> is too high and the benefits too marginal, so they'll stick with NaNs. >> Together these folk a huge proportion of the this feature's target >> audience. So what have we actually accomplished by merging this to >> mainline? Are we going to be stuck supporting a feature that only a >> fraction of the target audience actually uses? (Maybe they're being >> dumb, but if people are ignoring your code for dumb reasons... 
they're >> still ignoring your code.) >> >> The consensus rule forces everyone to do the hardest and riskiest part >> -- building buy-in -- up front. Because you *have* to do it sooner or >> later, and doing it sooner doesn't just generate better designs. It >> drastically reduces the risk of ending up in a huge trainwreck. >> >> -- >> >> In my story at the beginning, I wished I had a magic wand to skip this >> annoying debate and political stuff. But giving it to me would have >> been a bad idea. I think that's went wrong with the NA discussion in >> the first place. Mark's an excellent programmer, and he tried his best >> to act in the good of everyone in the project -- but in the end, he >> did have a wand like that. He didn't have that sense that he *had* to >> get everyone on board (even the people who were saying dumb things), >> or he'd just be wasting his time. He didn't ask Pierre if the NA >> design would actually work for numpy.ma's purposes -- I did. >> >> You may have noticed that I do have some ideas for about how NA >> support should work. But my ideas aren't really the important thing. >> The alter-NEP was my attempt to find common ground between the >> different needs people were bringing up, so we could discuss whether >> it would work for people or not. I'm not wedded to anything in it. But >> this is a complicated issue with a lot of conflicting interests, and >> we need to find something that actually does work for everyone (or as >> large a subset as is practical). >> >> So here's what I think we should do: >> ?1) I will submit a pull request backing Mark's NA work out of >> mainline, for now. (This is more or less done, I just need to get it >> onto github, see above re: connectivity) >> ?2) I will also put together a new branch containing that work, >> rebased against current mainline, so it doesn't get lost. (Ditto.) >> ?3) And we'll decide what to do with it *after* we hammer out a >> design that the various NA-supporting groups all find convincing. Or >> at least a design for some of the less controversial pieces (like the >> 'where=' ufunc argument?), get those merged, and then iterate >> incrementally. >> >> What do you all think? > > Nice post - thank you. > > I agree that we may have a problem with - process. ?I mean, maybe > there is not much agreement on what the process for these kinds of > discussions should be - and therefore - we can't point to some > constitution or similar to say - hey - wait - we're not doing it > right. Your post reminded me of this: http://en.wikipedia.org/wiki/Rough_consensus It does depend on having something like a committee and a chairperson though. See you, Matthew From charlesr.harris at gmail.com Fri Oct 28 17:41:57 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 28 Oct 2011 15:41:57 -0600 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) In-Reply-To: References: Message-ID: On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith wrote: > On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant > wrote: > > I think Nathaniel and Matthew provided very > > specific feedback that was helpful in understanding other perspectives of > a > > difficult problem. In particular, I really wanted bit-patterns > > implemented. However, I also understand that Mark did quite a bit of > work > > and altered his original designs quite a bit in response to community > > feedback. 
From charlesr.harris at gmail.com Fri Oct 28 17:41:57 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 28 Oct 2011 15:41:57 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith wrote:
> [snip]
>
> So here's what I think we should do:
>  1) I will submit a pull request backing Mark's NA work out of
> mainline, for now. [...]
>
> What do you all think?

Why don't you and Matthew work up an alternative implementation so we can compare the two?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From matthew.brett at gmail.com Fri Oct 28 17:43:07 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 28 Oct 2011 14:43:07 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris wrote:
> [snip]
>
> Why don't you and Matthew work up an alternative implementation so we can
> compare the two?

Do you have comments on the changes I suggested?

Best,
Matthew
In particular, I really wanted bit-patterns >>> > implemented. ? ?However, I also understand that Mark did quite a bit of >>> > work >>> > and altered his original designs quite a bit in response to community >>> > feedback. ? I wasn't a major part of the pull request discussion, nor >>> > did I >>> > merge the changes, but I support Charles if he reviewed the code and >>> > felt >>> > like it was the right thing to do. ?I likely would have done the same >>> > thing >>> > rather than let Mark Wiebe's work languish. >>> >>> My connectivity is spotty this week, so I'll stay out of the technical >>> discussion for now, but I want to share a story. >>> >>> Maybe a year ago now, Jonathan Taylor and I were debating what the >>> best API for describing statistical models would be -- whether we >>> wanted something like R's "formulas" (which I supported), or another >>> approach based on sympy (his idea). To summarize, I thought his API >>> was confusing, pointlessly complicated, and didn't actually solve the >>> problem; he thought R-style formulas were superficially simpler but >>> hopelessly confused and inconsistent underneath. Now, obviously, I was >>> right and he was wrong. Well, obvious to me, anyway... ;-) But it >>> wasn't like I could just wave a wand and make his arguments go away, >>> no matter how annoying and wrong-headed I thought they were... I could >>> write all the code I wanted but no-one would use it unless I could >>> convince them it's actually the right solution, so I had to engage >>> with him, and dig deep into his arguments. >>> >>> What I discovered was that (as I thought) R-style formulas *do* have a >>> solid theoretical basis -- but (as he thought) all the existing >>> implementations *are* broken and inconsistent! I'm still not sure I >>> can actually convince Jonathan to go my way, but, because of his >>> stubbornness, I had to invent a better way of handling these formulas, >>> and so my library[1] is actually the first implementation of these >>> things that has a rigorous theory behind it, and in the process it >>> avoids two fundamental, decades-old bugs in R. (And I'm not sure the R >>> folks can fix either of them at this point without breaking a ton of >>> code, since they both have API consequences.) >>> >>> -- >>> >>> It's extremely common for healthy FOSS projects to insist on consensus >>> for almost all decisions, where consensus means something like "every >>> interested party has a veto"[2]. This seems counterintuitive, because >>> if everyone's vetoing all the time, how does anything get done? The >>> trick is that if anyone *can* veto, then vetoes turn out to actually >>> be very rare. Everyone knows that they can't just ignore alternative >>> points of view -- they have to engage with them if they want to get >>> anything done. So you get buy-in on features early, and no vetoes are >>> necessary. And by forcing people to engage with each other, like me >>> with Jonathan, you get better designs. >>> >>> But what about the cost of all that code that doesn't get merged, or >>> written, because everyone's spending all this time debating instead? >>> Better designs are nice and all, but how does that justify letting >>> working code languish? >>> >>> The greatest risk for a FOSS project is that people will ignore you. >>> Projects and features live and die by community buy-in. Consider the >>> "NA mask" feature right now. It works (at least the parts of it that >>> are implemented). It's in mainline. 
But IIRC, Pierre said last time >>> that he doesn't think the current design will help him improve or >>> replace numpy.ma. Up-thread, Wes McKinney is leaning towards ignoring >>> this feature in favor of his library pandas' current hacky NA support. >>> Members of the neuroimaging crowd are saying that the memory overhead >>> is too high and the benefits too marginal, so they'll stick with NaNs. >>> Together these folk a huge proportion of the this feature's target >>> audience. So what have we actually accomplished by merging this to >>> mainline? Are we going to be stuck supporting a feature that only a >>> fraction of the target audience actually uses? (Maybe they're being >>> dumb, but if people are ignoring your code for dumb reasons... they're >>> still ignoring your code.) >>> >>> The consensus rule forces everyone to do the hardest and riskiest part >>> -- building buy-in -- up front. Because you *have* to do it sooner or >>> later, and doing it sooner doesn't just generate better designs. It >>> drastically reduces the risk of ending up in a huge trainwreck. >>> >>> -- >>> >>> In my story at the beginning, I wished I had a magic wand to skip this >>> annoying debate and political stuff. But giving it to me would have >>> been a bad idea. I think that's went wrong with the NA discussion in >>> the first place. Mark's an excellent programmer, and he tried his best >>> to act in the good of everyone in the project -- but in the end, he >>> did have a wand like that. He didn't have that sense that he *had* to >>> get everyone on board (even the people who were saying dumb things), >>> or he'd just be wasting his time. He didn't ask Pierre if the NA >>> design would actually work for numpy.ma's purposes -- I did. >>> >>> You may have noticed that I do have some ideas for about how NA >>> support should work. But my ideas aren't really the important thing. >>> The alter-NEP was my attempt to find common ground between the >>> different needs people were bringing up, so we could discuss whether >>> it would work for people or not. I'm not wedded to anything in it. But >>> this is a complicated issue with a lot of conflicting interests, and >>> we need to find something that actually does work for everyone (or as >>> large a subset as is practical). >>> >>> So here's what I think we should do: >>> ?1) I will submit a pull request backing Mark's NA work out of >>> mainline, for now. (This is more or less done, I just need to get it >>> onto github, see above re: connectivity) >>> ?2) I will also put together a new branch containing that work, >>> rebased against current mainline, so it doesn't get lost. (Ditto.) >>> ?3) And we'll decide what to do with it *after* we hammer out a >>> design that the various NA-supporting groups all find convincing. Or >>> at least a design for some of the less controversial pieces (like the >>> 'where=' ufunc argument?), get those merged, and then iterate >>> incrementally. >>> >>> What do you all think? >>> >> >> Why don't you and Matthew work up an alternative implementation so we can >> compare the two? > > Do you have comments on the changes I suggested? Sorry - this was too short and a little rude. I'm sorry. I was reacting to what I perceived as intolerance for discussing the issues, and I may be wrong in that perception. I think what Nathaniel is saying, is that it is not in the best interests of numpy to push through code where there is not good agreement. 
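Of the pieces Nathaniel lists, the 'where=' ufunc argument is the most
concrete. As a minimal sketch of the semantics being proposed - the
keyword did not exist in numpy at the time of this thread, so the exact
signature here is an assumption:

import numpy as np

# Sketch of the proposed 'where=' ufunc argument: apply the operation
# only where the mask is True, leaving the other elements of 'out'
# untouched.
a = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([True, False, True, False])

out = np.zeros_like(a)
np.multiply(a, 2.0, out=out, where=mask)
print(out)   # [ 2.  0.  6.  0.] - unselected slots keep their old value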
In reverting the change, he is, I think, appealing for a
commitment to that process, for the good of numpy.

I have in the past taken some of your remarks to imply that if someone
is prepared to write code then that overrides most potential
disagreement.

The reason I think Nathaniel is the more right is because most of us,
I believe, do honestly have the interests of numpy at heart, and want
to fully understand the problem, and are prepared to be proven wrong.
In that situation, in my experience of writing code at least, by far
the most fruitful way to proceed is by letting all voices be heard.
On the other hand, if the rule becomes 'unless I see an implementation
I'm not listening to you' - then we lose the great benefits, to the
code, of having what is fundamentally a good and strong community.

Best,

Matthew

From charlesr.harris at gmail.com  Fri Oct 28 18:14:53 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 28 Oct 2011 16:14:53 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett wrote:
> [snip]
>
> The reason I think Nathaniel is the more right is because most of us,
> I believe, do honestly have the interests of numpy at heart, and want
> to fully understand the problem, and are prepared to be proven wrong.
> In that situation, in my experience of writing code at least, by far
> the most fruitful way to proceed is by letting all voices be heard.
> On the other hand, if the rule becomes 'unless I see an implementation
> I'm not listening to you' - then we lose the great benefits, to the
> code, of having what is fundamentally a good and strong community.

Matthew, the problem I have is that it seems that you and Nathaniel won't
be satisfied unless things are done *your* way. To use your terminology,
that comes across as a lack of respect for the rest of us. In order to
reach consensus, some folks are going to have to give. I think Mark gave
a lot, I don't see that from the two of you. Wanting reversion at this
point, even when Nathaniel doesn't seem to have used the current
implementation much -- if any -- might be considered arrogant by some.
Asking that you put some skin in the game by devoting substantial time
to an alternate implementation doesn't strike me as out of line.

Chuck

From ben.root at ou.edu  Fri Oct 28 18:21:41 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 28 Oct 2011 17:21:41 -0500
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Friday, October 28, 2011, Matthew Brett wrote:
> [snip]
>
> The reason I think Nathaniel is the more right is because most of us,
> I believe, do honestly have the interests of numpy at heart, and want
> to fully understand the problem, and are prepared to be proven wrong.
> In that situation, in my experience of writing code at least, by far
> the most fruitful way to proceed is by letting all voices be heard.

Maybe an alternative implementation isn't really needed. It seemed to me
that most of the current implementation isn't too far off the mark. There
are just key portions missing or that might need to be modified.

The space issue was never ignored, and Mark left room for that to be
addressed. Parameterized dtypes can still be added (and that isn't all
that different from multi-NA). Perhaps I could be convinced of having
np.MA assignments mean "ignore" and np.NA mean "absent". How far off are
we really from consensus?

Although, I still think that ignore + absent = ignore

Cheers!
Ben Root
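Ben's "ignore" versus "absent" distinction can be made concrete with
today's tools. A rough sketch, noting that np.NA and np.MA are proposal
names from this thread rather than existing numpy API:

import numpy as np
import numpy.ma as ma

# "ignore" modeled as a mask over data that is still present (numpy.ma);
# "absent" modeled destructively with a NaN bit pattern.
x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
x.data[2] = np.nan    # element 2 is now "absent": the 3.0 is gone

x.mask[1] = False     # "stop ignoring" works: the 2.0 was never lost
print(x[1])           # 2.0
print(x.data[2])      # nan - nothing can bring the 3.0 back

Under the rule Ben suggests, an operation that combines an ignored
element with an absent one would itself produce an ignored element.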
From matthew.brett at gmail.com  Fri Oct 28 18:37:45 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 28 Oct 2011 15:37:45 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris wrote:
> [snip]
>
> Matthew, the problem I have is that it seems that you and Nathaniel
> won't be satisfied unless things are done *your* way. To use your
> terminology, that comes across as a lack of respect for the rest of us.
> In order to reach consensus, some folks are going to have to give.

No, that's not what Nathaniel and I are saying at all. Nathaniel was
pointing to links for projects that care that everyone agrees before
they go ahead.

In saying that we are insisting on our way, you are saying, implicitly,
'I am not going to negotiate'. What Nathaniel is asking for (I agree) is
a commitment to negotiate to agreement when there is substantial
disagreement.

Best,

Matthew

From stefan at sun.ac.za  Fri Oct 28 18:43:20 2011
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Fri, 28 Oct 2011 15:43:20 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Fri, Oct 28, 2011 at 3:21 PM, Benjamin Root wrote:
> The space issue was never ignored, and Mark left room for that to be
> addressed. Parameterized dtypes can still be added (and that isn't all
> that different from multi-NA). Perhaps I could be convinced of having
> np.MA assignments mean "ignore" and np.NA mean "absent". How far off
> are we really from consensus?

Do you know whether Mark is around? I think his feedback would be
useful at this point; having written the code, he'll be able to
evaluate some of the technical suggestions made.

Stéfan

From charlesr.harris at gmail.com  Fri Oct 28 18:49:11 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 28 Oct 2011 16:49:11 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

2011/10/28 Stéfan van der Walt:
> Do you know whether Mark is around? I think his feedback would be
> useful at this point; having written the code, he'll be able to
> evaluate some of the technical suggestions made.

Yes, Mark is around, but I assume he is interested in his school work at
this point. And he might not be inclined to get back into this
particular discussion.
I don't feel he was treated very well by some last time around.

Chuck

From Chris.Barker at noaa.gov  Fri Oct 28 19:05:43 2011
From: Chris.Barker at noaa.gov (Chris.Barker)
Date: Fri, 28 Oct 2011 16:05:43 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: <87hb2xf6yo.fsf@ginnungagap.bsc.es>
	<0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com>
	<7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
Message-ID: <4EAB3547.5080009@noaa.gov>

On 10/28/11 11:37 AM, Matthew Brett wrote:
> The main motivation for the alterNEP was our strong feeling that
> separating ABSENT and IGNORE was easier to comprehend and cleaner.

I don't know about easier to comprehend, or cleaner, but it is more
feature-full.

I see two issues here:

1) being able to distinguish between "ignore" and "not valid" -- and
being able to stop ignoring an ignored value.

This could quite easily be accomplished with a mask approach -- indeed,
with 8 bits, you could have 8 different possible masked states (not that
I'm suggesting that, at least not in core numpy).

However, with a bit-pattern approach, you simply can't implement
"ignore". Once it's been set, the previous value is lost.

2) data size: a full mask takes extra space, sometimes a substantial
amount -- so a bit-pattern approach would be nice.

I like the idea (that I think Mark attempted to implement) that the
implementation should be hidden from the user - not necessarily entirely
hidden, but subtle enough that the casual user wouldn't need to care
about it.

In that case, I think if we could decide that we want both "ignore" and
"not valid" (and it seems there is a fair bit of interest in that), then
we can proceed with a mask-based approach, and develop an API that makes
as little reference to the mask as possible.

Then a bit-pattern approach could be developed that uses the same API --
it would not have the "ignore" option at all, but would be the same for
the "not valid" option.

When I write this it seems entirely too complicated for both the
developers and users, but maybe it's not -- it could be analogous to
what we have now: arrays can be Fortran or C ordered, contiguous or not,
be views on other arrays or not. To really make numpy dance, you need to
understand all that, but you can also do a whole lot, and write a lot of
generic code, without digging into that.

If we do all that, maybe there could be a sparse mask implementation,
etc. as well.

Maybe I'm dreaming, though...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
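Chris's point about multiple masked states is easy to demonstrate: a
plain uint8 array alongside the data can encode several states in one
byte per element, and un-ignoring is always possible because the payload
is untouched. A sketch, with state names made up for illustration:

import numpy as np

VALID, IGNORED, NOT_VALID = 0, 1, 2   # up to 256 states fit in a uint8

data = np.array([1.0, 2.0, 3.0, 4.0])
state = np.array([VALID, IGNORED, NOT_VALID, VALID], dtype=np.uint8)

print(data[state == VALID].sum())   # 5.0 - both special states skipped

state[1] = VALID                    # stop ignoring: the 2.0 was never lost
print(data[state == VALID].sum())   # 7.0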
From matthew.brett at gmail.com  Fri Oct 28 19:09:49 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 28 Oct 2011 16:09:49 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Fri, Oct 28, 2011 at 3:49 PM, Charles R Harris wrote:
> [snip]
>
> Yes, Mark is around, but I assume he is interested in his school work
> at this point. And he might not be inclined to get back into this
> particular discussion. I don't feel he was treated very well by some
> last time around.

We have not always been good at separating the concept of disagreement
from that of rudeness.

As I've said before, one form of rudeness (and not disagreement) is
ignoring people.

We should all be careful to point out - respectfully, and with reasons -
when we find our colleagues' replies (or non-replies) to be rude,
because rudeness is very bad for the spirit of open discussion.

Best,

Matthew

From charlesr.harris at gmail.com  Fri Oct 28 19:19:23 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 28 Oct 2011 17:19:23 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: <4EAB3547.5080009@noaa.gov>
References: <87hb2xf6yo.fsf@ginnungagap.bsc.es>
	<0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com>
	<7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
	<4EAB3547.5080009@noaa.gov>
Message-ID: 

On Fri, Oct 28, 2011 at 5:05 PM, Chris.Barker wrote:
> I like the idea (that I think Mark attempted to implement) that the
> implementation should be hidden from the user - not necessarily
> entirely hidden, but subtle enough that the casual user wouldn't need
> to care about it.

I believe the main reason it is hidden from the user is so that the
implementation can be changed without impacting existing applications.

What I would like to see at this point is folks trying out the software
and asking questions on the list like: "I want to do A and tried B, which
didn't work. Any suggestions?" In short, I want people to actually use
the software to see what issues arise so that we can fix things up.

Memory use is a known problem. One way to start addressing it might be
to implement a "bit" arraytype. It might even be possible to prototype
that on top of the existing types. Views make bit arrays a bit more
interesting ;)

Chuck
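A bit arraytype of the kind Chuck describes can in fact be prototyped
on top of existing types with np.packbits, storing one boolean per bit
for an eightfold saving over a byte mask. A minimal sketch:

import numpy as np

# One boolean per bit: an eighth the memory of a bool/byte mask.
mask = np.array([True, False, True, True, False,
                 False, True, False, True, True])

packed = np.packbits(mask)                          # 2 bytes, not 10
restored = np.unpackbits(packed)[:mask.size].astype(bool)

assert (restored == mask).all()
print(mask.nbytes, packed.nbytes)                   # 10 2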
From charlesr.harris at gmail.com  Fri Oct 28 19:21:21 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 28 Oct 2011 17:21:21 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Fri, Oct 28, 2011 at 5:09 PM, Matthew Brett wrote:
> [snip]
>
> We should all be careful to point out - respectfully, and with reasons -
> when we find our colleagues' replies (or non-replies) to be rude,
> because rudeness is very bad for the spirit of open discussion.

Trying things out in preparation for discussion is also a mark of respect.
Have you worked with the current implementation?

Chuck

From ralf.gommers at googlemail.com  Fri Oct 28 19:21:44 2011
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 29 Oct 2011 01:21:44 +0200
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett wrote:
> [snip]
>
> No, that's not what Nathaniel and I are saying at all. Nathaniel was
> pointing to links for projects that care that everyone agrees before
> they go ahead.

It looked to me like there was a serious intent to come to an agreement,
or at least to come closer together. The discussion in the summer was
going around in circles though, and was too abstract and complex to
follow. Therefore Mark's choice of implementing something and then asking
for feedback made sense to me.

> In saying that we are insisting on our way, you are saying, implicitly,
> 'I am not going to negotiate'.

That is only your interpretation. The observation that Mark compromised
quite a bit while you didn't seems largely correct to me.

> What Nathaniel is asking for (I agree) is a commitment to negotiate to
> agreement when there is substantial disagreement.

That commitment would of course be good. However, even if that were
possible before writing code and everyone agreed that the ideas of you
and Nathaniel should be implemented in full, it's still not clear that
either of you would be willing to write any code. Agreement without code
still doesn't help us very much.

Ralf
From stefan at sun.ac.za  Fri Oct 28 19:25:23 2011
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Fri, 28 Oct 2011 16:25:23 -0700
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: <87hb2xf6yo.fsf@ginnungagap.bsc.es>
	<0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com>
	<7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
	<4EAB3547.5080009@noaa.gov>
Message-ID: 

On Fri, Oct 28, 2011 at 4:19 PM, Charles R Harris wrote:
> Memory use is a known problem. One way to start addressing it might be
> to implement a "bit" arraytype. It might even be possible to prototype
> that on top of the existing types. Views make bit arrays a bit more
> interesting ;)

Since 1/8 can be represented exactly in floating point, I guess it's
technically possible to support non-integer strides?

Stéfan

From matthew.brett at gmail.com  Fri Oct 28 19:26:13 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 28 Oct 2011 16:26:13 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Fri, Oct 28, 2011 at 4:21 PM, Charles R Harris wrote:
> [snip]
>
> Trying things out in preparation for discussion is also a mark of
> respect. Have you worked with the current implementation?

OK - this seems to me to be rude. Why? Because you have presumably
already read what my concerns were, and my discussion of the current
implementation in my reply to Travis. You haven't made any effort to
point out to me where I may be wrong or failing to understand. I infer
that you are merely saying 'go away and come back later'. And that is
rude.

Best,

Matthew

From charlesr.harris at gmail.com  Fri Oct 28 19:30:06 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 28 Oct 2011 17:30:06 -0600
Subject: [Numpy-discussion] NA masks in the next numpy release?
In-Reply-To: 
References: <87hb2xf6yo.fsf@ginnungagap.bsc.es>
	<0BFEB008-245E-40F2-8EE8-2246CF6A9EA4@enthought.com>
	<7D748F56-4F6D-408A-9716-F16F0029ED11@enthought.com>
	<4EAB3547.5080009@noaa.gov>
Message-ID: 

2011/10/28 Stéfan van der Walt:
> Since 1/8 can be represented exactly in floating point, I guess it's
> technically possible to support non-integer strides?

I think the same effect could be obtained with fixed point integers,
i.e., the last three bits are the fractional part.

Chuck
From matthew.brett at gmail.com Fri Oct 28 19:37:42 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 28 Oct 2011 16:37:42 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers wrote:
> It looked to me like there was a serious intent to come to an agreement, or
> at least closer together. The discussion in the summer was going around in
> circles though, and was too abstract and complex to follow. Therefore Mark's
> choice of implementing something and then asking for feedback made sense to
> me.

I should point out that the implementation hasn't - as far as I can
see - changed the discussion. The discussion was about the API.
Implementations are useful for agreed APIs because they can point out
where the API does not make sense or cannot be implemented. In this
case, the API Mark said he was going to implement - he did implement -
at least as far as I can see. Again, I'm happy to be corrected.

>> In saying that we are insisting on our way, you are saying, implicitly,
>> 'I am not going to negotiate'.
>
> That is only your interpretation. The observation that Mark compromised
> quite a bit while you didn't seems largely correct to me.

The problem here stems from our inability to work towards agreement,
rather than standing on set positions. I set out what changes I think
would make the current implementation OK. Can we please, please have
a discussion about those points instead of trying to argue about who
has given more ground.

> That commitment would of course be good. However, even if that were possible
> before writing code and everyone agreed that the ideas of you and Nathaniel
> should be implemented in full, it's still not clear that either of you would
> be willing to write any code. Agreement without code still doesn't help us
> very much.

I'm going to return to Nathaniel's point - it is a highly valuable
thing to set ourselves the target of resolving substantial discussions
by consensus. The route you are endorsing here is 'implementor wins'.
We don't need to do it that way. We're a mature, sensible bunch of
adults who can talk out the issues until we agree they are ready for
implementation, and then implement. That's all Nathaniel is saying. I
think he's obviously right, and I'm sad that it isn't as clear to
y'all as it is to me.

Best,

Matthew
From ben.root at ou.edu Fri Oct 28 19:53:35 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 28 Oct 2011 18:53:35 -0500
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Friday, October 28, 2011, Matthew Brett wrote:
> I'm going to return to Nathaniel's point - it is a highly valuable
> thing to set ourselves the target of resolving substantial discussions
> by consensus. The route you are endorsing here is 'implementor wins'.
> We don't need to do it that way. We're a mature, sensible bunch of
> adults who can talk out the issues until we agree they are ready for
> implementation, and then implement. That's all Nathaniel is saying. I
> think he's obviously right, and I'm sad that it isn't as clear to
> y'all as it is to me.

Everyone, can we please not do this?! I had enough of adults doing
finger pointing back over the summer during the whole debt ceiling
debate. I think we can all agree that we are better than the US
Congress?

Forget about rudeness or decision processes.

I will start by saying that I am willing to separate ignore and
absent, but only on the write side of things. On read, I want a single
way to identify the missing values. I also want only a single way to
perform calculations (either skip or propagate).

An indicator of success would be that people stop using NaNs and magic
numbers (-9999, anyone?) and we could even deprecate nansum(), or at
least strongly suggest in its docs to use NA.

Cheers!
Ben Root
From wesmckinn at gmail.com Fri Oct 28 20:45:55 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Fri, 28 Oct 2011 20:45:55 -0400
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root wrote:
> I will start by saying that I am willing to separate ignore and
> absent, but only on the write side of things. On read, I want a single
> way to identify the missing values. I also want only a single way to
> perform calculations (either skip or propagate).
>
> An indicator of success would be that people stop using NaNs and magic
> numbers (-9999, anyone?) and we could even deprecate nansum(), or at
> least strongly suggest in its docs to use NA.

Well, I haven't completely made up my mind yet - I will have to do
some more prototyping and playing (and potentially have some of my
users eat the differently-flavored dogfood) - but I'm really not very
satisfied with the API at the moment. I'm mainly worried about the
abstraction leaking through to pandas users (this is a pretty large
group of people, judging by # of downloads).

The basic position I'm in is that I'm trying to push Python into a new
space, namely mainstream data analysis and statistical computing, one
that is solidly occupied by R and other well-known players. My target
users are not computer scientists. They are not going to invest in
understanding dtypes very deeply or the internals of ndarray. In fact
I've spent a great deal of effort making it so that pandas users can
be productive and successful while having very little understanding of
NumPy. Yes, I essentially "protect" my users from NumPy, because using
it well requires a level of sophistication that I think is unfair to
demand of people. This might seem totally bizarre to some of you, but
it is simply the state of affairs. So far I have been successful
because more people are using Python and pandas to do things that they
used to do in R. The NA concept in R is dead simple, and I don't see
why we are incapable of implementing something that is just as dead
simple. To us - "the scipy elite", let's call us - it seems simple:
"oh, just pass an extra flag to all my array constructors!" But this,
along with the masked array concept, is going to have two likely
outcomes:

1) Create a great deal more complication in my already very large
codebase

and/or

2) Force pandas users to understand the new masked arrays after I've
carefully made it so they can be largely ignorant of NumPy

The mostly-NaN-based solution I've cobbled together and tweaked over
the last 42 months actually *works really well*, amazingly, with
relatively little cost in code complexity. Having found a reasonably
stable equilibrium, I'm extremely resistant to upsetting the balance.

So I don't know. After watching these threads bounce back and forth,
I'm frankly not all that hopeful about a solution arising that
actually addresses my needs.

best,
Wes
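The mostly-NaN-based scheme Wes describes (and the magic numbers Ben
mentions) reduces to something like the following - a simplified sketch
of the general idea, not of pandas internals:

import numpy as np

x = np.array([1.0, 2.0, -9999.0, 4.0])  # -9999 used as a missing marker
x[x == -9999] = np.nan                  # normalize to NaN on the way in

mask = np.isnan(x)         # the single read-side test
print x[~mask].sum()       # 7.0 -- "skip" semantics
print np.nansum(x)         # 7.0 -- the nansum() spelling Ben would deprecate
print x.sum()              # nan -- "propagate" semantics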
From matthew.brett at gmail.com Fri Oct 28 20:47:01 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 28 Oct 2011 17:47:01 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Fri, Oct 28, 2011 at 4:53 PM, Benjamin Root wrote:
> Everyone, can we please not do this?! I had enough of adults doing
> finger pointing back over the summer during the whole debt ceiling
> debate. I think we can all agree that we are better than the US
> Congress?

Yes, please.

> Forget about rudeness or decision processes.

No, that's a common mistake: to assume that any conversation about
things which aren't technical is not important. Nathaniel's point is
important. Rudeness is important. The reason we've got into this mess
is that we clearly don't have an agreed way of making decisions.
That's why countries and open-source projects have constitutions, so
this doesn't happen.

> I will start by saying that I am willing to separate ignore and
> absent, but only on the write side of things. On read, I want a single
> way to identify the missing values. I also want only a single way to
> perform calculations (either skip or propagate).

Thank you - that is very helpful.

Are you saying that you'd be OK setting missing values like this?

>>> a.mask[0:2] = False

For the read side, do you mean you're OK with this

>>> a.isna()

to identify the missing values, as is currently the case? Or something
else?

If so, then I think we're very close, it's just a discussion about
names.

> An indicator of success would be that people stop using NaNs and magic
> numbers (-9999, anyone?) and we could even deprecate nansum(), or at
> least strongly suggest in its docs to use NA.

That is an excellent benchmark.

Best,

Matthew
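For reference, numpy.ma already spells essentially these operations
today, though with the opposite convention (True in the mask means
"ignored"). A short sketch of existing numpy.ma behavior, not of the
proposed API:

import numpy as np
import numpy.ma as ma

a = ma.array([1.0, 2.0, 3.0], mask=[False, False, False])
a.mask[0:2] = True           # write side: mark the first two as ignored

print ma.getmaskarray(a)     # [ True  True False] -- the read side
print a                      # [-- -- 3.0]
print a.sum()                # 3.0 -- masked values are skipped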
From charlesr.harris at gmail.com Fri Oct 28 21:32:19 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 28 Oct 2011 19:32:19 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Fri, Oct 28, 2011 at 6:45 PM, Wes McKinney wrote:
> The mostly-NaN-based solution I've cobbled together and tweaked over
> the last 42 months actually *works really well*, amazingly, with
> relatively little cost in code complexity. Having found a reasonably
> stable equilibrium, I'm extremely resistant to upsetting the balance.
>
> So I don't know. After watching these threads bounce back and forth,
> I'm frankly not all that hopeful about a solution arising that
> actually addresses my needs.

But Wes, what *are* your needs? You keep saying this, but we need
examples of how you want to operate and how numpy fails. As to dtypes,
internals, and all that, I don't see any of that in the current
implementation, unless you mean the maskna and skipna keywords. I
believe someone on the previous thread mentioned a way to deal with
that.

Chuck

From njs at pobox.com Fri Oct 28 22:49:05 2011
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 28 Oct 2011 19:49:05 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris wrote:
> Matthew, the problem I have is that it seems that you and Nathaniel
> won't be satisfied unless things are done *your* way.

Hi Charles,

I'm sorry if I've given this impression, and I know it's easy to feel
this way in a contentious discussion. I've even been tempted to
conclude the same about you, based on some of those emails in the last
discussion where you told us that we should only give feedback by
critiquing specific paragraphs of Mark's docs, even though our issues
were with the whole architecture he was suggesting. But I'd like to
believe that that isn't true of you, and in return, I can only point
out the following things:

1) I've actually made a number of different suggestions and attempts
to find ways to compromise (e.g., the "NA concepts" discussion, the
alter-NEP that folded in a design for "ignored" values to try and
satisfy that constituency even though I wouldn't use them myself, and
on the conference call trying to find a subset of features that we
could all agree on to implement first). I don't *want* my proposals
implemented unless everyone else finds them persuasive.

2) This is why in my message I'm *not* advocating that we implement
NAs according to my proposals; I'm advocating that you get just as
much of a veto power on my proposals as I do on yours.

Let's be honest: we both know, all else being equal, we'd both rather
not deal with the other right now, and might prefer not to hang out
socially. But if we want this NA stuff to actually work and be used,
then we need to find a way to work together despite that.

Peace?
--
Nathaniel

From ben.root at ou.edu Fri Oct 28 23:38:45 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 28 Oct 2011 22:38:45 -0500
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

Matt,

On Friday, October 28, 2011, Matthew Brett wrote:
>> Forget about rudeness or decision processes.
>
> No, that's a common mistake: to assume that any conversation about
> things which aren't technical is not important. Nathaniel's point is
> important. Rudeness is important. The reason we've got into this mess
> is that we clearly don't have an agreed way of making decisions.
> That's why countries and open-source projects have constitutions, so
> this doesn't happen.

Don't get me wrong. In general, you are right. And maybe we all should
discuss something to that effect for numpy. But I would rather do that
when there isn't such contention and tempers. As for allegations of
rudeness, I believe that we are actually so close to consensus that I
immediately wanted to squelch any sort of meta-meta-disagreements
about who was being rude to whom. As a quick band-aid, anybody who
felt slighted by me gets a drink on me at the next scipy conference.
From this point on, let's institute a 10 minute rule -- write your
email, wait ten minutes, read it again and edit it.

> Are you saying that you'd be OK setting missing values like this?
>
>>>> a.mask[0:2] = False

Probably not that far, because that would be an attribute that may or
may not exist. Rather, I might like the idea of an NA that "always"
means absent (and destroys - even through views), and MA (or some
other name) which always means ignore (and has the masking behavior
with views). This makes specific behaviors tied distinctly to specific
objects.

> For the read side, do you mean you're OK with this
>
>>>> a.isna()
>
> to identify the missing values, as is currently the case? Or something
> else?

Yes. A missing value is a missing value, regardless of it being absent
or marked as ignored. But it is a bit more subtle than that. I should
just be able to add two arrays together and the "data should know what
to do". When the core ufuncs get this right (like min, max, sum,
cumsum, diff, etc.), then I don't have to do much to prepare higher
level funcs for missing data.

> If so, then I think we're very close, it's just a discussion about
> names.

And what does ignore + absent equal? ;-)

>> An indicator of success would be that people stop using NaNs and magic
>> numbers (-9999, anyone?) and we could even deprecate nansum(), or at
>> least strongly suggest in its docs to use NA.
>
> That is an excellent benchmark.

Cheers,
Ben Root
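Using the spellings already named in this thread (the maskna and
skipna keywords, np.NA, and the isna test), Ben's single-read-side
rule comes out roughly as follows. This is a sketch of the development
branch's API as described here, not a final interface, and the np.MA
"ignore" flavor is purely hypothetical:

import numpy as np

a = np.array([1.0, 2.0, 3.0], maskna=True)
a[0] = np.NA             # write side: mark a value as missing
# a[1] = np.MA           # hypothetical "ignore" flavor; np.MA does not exist

print np.isna(a)         # [ True False False] -- the single read-side test
print np.sum(a)          # NA -- missing values propagate by default
print np.sum(a, skipna=True)   # 5.0 -- or are skipped instead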
From jason-sage at creativetrax.com Sat Oct 29 00:08:06 2011
From: jason-sage at creativetrax.com (Jason Grout)
Date: Fri, 28 Oct 2011 23:08:06 -0500
Subject: [Numpy-discussion] consensus
In-Reply-To: 
References: 
Message-ID: <4EAB7C26.1030106@creativetrax.com>

On 10/28/11 10:38 PM, Benjamin Root wrote:
> I might like the idea of an NA that "always" means absent (and destroys -
> even through views), and MA (or some other name) which always means
> ignore (and has the masking behavior with views).

I should point out that if I'm dictating code to someone (e.g.,
teaching, or helping someone verbally), it's going to be hard to
distinguish between the verbal sounds of "NA" and "MA".

And from a lurker (me), thanks for the discussion. I find it very
interesting to read.

Thanks,

Jason Grout

From hangenuit at gmail.com Sat Oct 29 02:11:27 2011
From: hangenuit at gmail.com (Han Genuit)
Date: Sat, 29 Oct 2011 08:11:27 +0200
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

Hi, instead of putting up a pull request that reverts all the 25000
lines of code that have been written to support an NA mask, why don't
you set up a pull request that uses the current code base to implement
your own ideas on how it should work?

From ralf.gommers at googlemail.com Sat Oct 29 05:22:17 2011
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 29 Oct 2011 11:22:17 +0200
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Sat, Oct 29, 2011 at 3:32 AM, Charles R Harris wrote:
> But Wes, what *are* your needs? You keep saying this, but we need
> examples of how you want to operate and how numpy fails. As to dtypes,
> internals, and all that, I don't see any of that in the current
> implementation, unless you mean the maskna and skipna keywords. I
> believe someone on the previous thread mentioned a way to deal with
> that.

>From the release notes I just learned that skipna is basically the
same as in R: "R's parameter rm.na=T is spelled skipna=True in NumPy."
It provides a good summary of the current status in master:
https://github.com/numpy/numpy/blob/master/doc/release/2.0.0-notes.rst

Ralf
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ralf.gommers at googlemail.com  Sat Oct 29 06:26:19 2011
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 29 Oct 2011 12:26:19 +0200
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett wrote:

> Hi,
>
> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers wrote:
> >
> > On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett wrote:
> >>
> >> Hi,
> >>
> >> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris wrote:
> >>
> >> No, that's not what Nathaniel and I are saying at all. Nathaniel was
> >> pointing to links for projects that care that everyone agrees before
> >> they go ahead.
> >
> > It looked to me like there was a serious intent to come to an
> > agreement, or at least closer together. The discussion in the summer
> > was going around in circles though, and was too abstract and complex
> > to follow. Therefore Mark's choice of implementing something and then
> > asking for feedback made sense to me.
>
> I should point out that the implementation hasn't - as far as I can
> see - changed the discussion. The discussion was about the API.
> Implementations are useful for agreed APIs because they can point out
> where the API does not make sense or cannot be implemented. In this
> case, the API Mark said he was going to implement - he did implement -
> at least as far as I can see. Again, I'm happy to be corrected.

Implementations can also help the discussion along, by allowing people to
try out some of the proposed changes. It also allows to construct examples
that show weaknesses, possibly to be solved by an alternative API. Maybe
you can hold the complete history of this topic in your head and
comprehend it, but for me it would be very helpful if someone said:
- here's my dataset
- this is what I want to do with it
- this is the best I can do with the current implementation
- here's how API X would allow me to solve this better or simpler
This can be done much better with actual data and an actual implementation
than with a design proposal. You seem to disagree with this statement.
That's fine. I would hope though that you recognize that concrete examples
help people like me, and construct one or two to help us out.

> >> In saying that we are insisting on our way, you are saying,
> >> implicitly, 'I am not going to negotiate'.
> >
> > That is only your interpretation. The observation that Mark compromised
> > quite a bit while you didn't seems largely correct to me.
>
> The problem here stems from our inability to work towards agreement,
> rather than standing on set positions. I set out what changes I think
> would make the current implementation OK. Can we please, please have
> a discussion about those points instead of trying to argue about who
> has given more ground.
>
> > That commitment would of course be good. However, even if that were
> > possible before writing code and everyone agreed that the ideas of you
> > and Nathaniel should be implemented in full, it's still not clear that
> > either of you would be willing to write any code. Agreement without
> > code still doesn't help us very much.
>
> I'm going to return to Nathaniel's point - it is a highly valuable
> thing to set ourselves the target of resolving substantial discussions
> by consensus. The route you are endorsing here is 'implementor
> wins'.

I'm not. All I want to point out is that design and implementation are not
completely separated either.

> We don't need to do it that way. We're a mature sensible
> bunch of adults

Agreed:)

> who can talk out the issues until we agree they are
> ready for implementation, and then implement.

The history of this discussion doesn't suggest it is straightforward to
get a design right first time. It's a complex subject.

The second part of your statement, "and then implement", sounds so simple.
The reality is that there are only a handful of developers who have done a
significant amount of work on the numpy core in the last two years. I
haven't seen anyone saying they are planning to implement (part of)
whatever design the outcome of this discussion will be. I don't think it's
strange to keep this in mind to some extent.

Regards,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From antonio.valentino at tiscali.it  Sat Oct 29 13:26:35 2011
From: antonio.valentino at tiscali.it (Antonio Valentino)
Date: Sat, 29 Oct 2011 19:26:35 +0200
Subject: [Numpy-discussion] ANN: PyTables 2.3.1 released
Message-ID: <4EAC374B.3000607@tiscali.it>

===========================
 Announcing PyTables 2.3.1
===========================

We are happy to announce PyTables 2.3.1.

This is a bugfix release. Upgrading is recommended for users that are
running PyTables in production environments.

What's new
==========

This release includes a small number of changes. It only fixes a couple of
bugs that are considered serious even if they should not impact a large
number of users:

- :issue:`113` caused installation of PyTables 2.3 to fail on hosts with
  multiple python versions installed.
- :issue:`111` prevented scalar datasets of UnImplemented types from being
  read.

In case you want to know more in detail what has changed in this version,
have a look at:
http://pytables.github.com/release_notes.html

You can download a source package with generated PDF and HTML docs, as
well as binaries for Windows, from:
http://sourceforge.net/projects/pytables/files/pytables/@VERSION@

For an on-line version of the manual, visit:
http://pytables.github.com/usersguide/index.html

What it is?
===========

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data, with support for
full 64-bit file addressing. PyTables runs on top of the HDF5 library and
the NumPy package to achieve maximum throughput and convenient use.
PyTables includes OPSI, a new indexing technology that makes it possible
to perform data lookups in tables exceeding 10 gigarows (10**10 rows) in
less than one tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions. See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.
Most specially, a lot of kudos go to the HDF5 and NumPy (and numarray!)
makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Team

From efiring at hawaii.edu  Sat Oct 29 14:14:47 2011
From: efiring at hawaii.edu (Eric Firing)
Date: Sat, 29 Oct 2011 08:14:47 -1000
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID: <4EAC4297.3010004@hawaii.edu>

On 10/29/2011 12:26 AM, Ralf Gommers wrote:
> The history of this discussion doesn't suggest it is straightforward to
> get a design right first time. It's a complex subject.
>
> The second part of your statement, "and then implement", sounds so
> simple. The reality is that there are only a handful of developers who
> have done a significant amount of work on the numpy core in the last two
> years. I haven't seen anyone saying they are planning to implement (part
> of) whatever design the outcome of this discussion will be. I don't
> think it's strange to keep this in mind to some extent.

...including the fact that last summer, Mark had a brief one-time
opportunity to contribute major NA code. I expect that even if some
modifications are made to what he contributed, letting him get on with it
will turn out to have been the right move.

Apparently Travis hopes to put in a burst of coding in 2012:
http://technicaldiscovery.blogspot.com/2011/10/thoughts-on-porting-numpy-to-pypy.html
Go to the section "NumPy will be evolving rapidly over the coming years".
Note that "missing data bit-patterns" is on his list, consistent with his
most recent messages.

Eric

From wesmckinn at gmail.com  Sat Oct 29 14:14:54 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Sat, 29 Oct 2011 14:14:54 -0400
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Fri, Oct 28, 2011 at 9:32 PM, Charles R Harris wrote:
>
> On Fri, Oct 28, 2011 at 6:45 PM, Wes McKinney wrote:
>>
>> On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root wrote:
>> >
>> > On Friday, October 28, 2011, Matthew Brett wrote:
>> >> Hi,
>> >>
>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers wrote:
>> >>>
>> >>> On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris wrote:
>> >>>> >
>> >>>> > On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett wrote:
>> >>>> >>
>> >>>> >> Hi,
>> >>>> >>
>> >>>> >> On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett wrote:
>> >>>> >> > Hi,
>> >>>> >> >
>> >>>> >> > On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris wrote:
>> >>>> >> >>
>> >>>> >> >> On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith wrote:
>> >>>> >> >>>
>> >>>> >> >>> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant wrote:
>> >>> > I think Nathaniel and Matthew provided very specific feedback
>> >>> > that was helpful in understanding other perspectives of a
>> >>> > difficult problem. In particular, I really wanted bit-patterns
>> >>> > implemented.
>> >>> > However, I also understand that Mark did quite a bit of work
>> >>> > and altered his original designs quite a bit in response to
>> >>> > community feedback. I wasn't a major part of the pull request
>> >>> > discussion, nor did I merge the changes, but I support Charles
>> >>> > if he reviewed the code and felt like it was the right thing to
>> >>> > do. I likely would have done the same thing rather than let
>> >>> > Mark Wiebe's work languish.
>> >>>
>> >>> My connectivity is spotty this week, so I'll stay out of the
>> >>> technical discussion for now, but I want to share a story.
>> >>>
>> >>> Maybe a year ago now, Jonathan Taylor and I were debating what the
>> >>> best API for describing statistical models would be -- whether we
>> >>> wanted something like R's "formulas" (which I supported), or another
>> >>> approach based on sympy (his idea). To summarize, I thought his API
>> >>> was confusing, pointlessly complicated, and didn't actually solve
>> >>> the problem; he thought R-style formulas were superficially simpler
>> >>> but hopelessly confused and inconsistent underneath. Now, obviously,
>> >>> I was right and he was wrong. Well, obvious to me, anyway... ;-) But
>> >>> it wasn't like I could just wave a wand and make his arguments go
>> >>> away, no
>> >>
>> >> I should point out that the implementation hasn't - as far as I can
>> >> see - changed the discussion. The discussion was about the API.
>> >> Implementations are useful for agreed APIs because they can point out
>> >> where the API does not make sense or cannot be implemented. In this
>> >> case, the API Mark said he was going to implement - he did implement -
>> >> at least as far as I can see. Again, I'm happy to be corrected.
>> >>
>> >>>> In saying that we are insisting on our way, you are saying,
>> >>>> implicitly, 'I am not going to negotiate'.
>> >>>
>> >>> That is only your interpretation. The observation that Mark
>> >>> compromised quite a bit while you didn't seems largely correct to me.
>> >>
>> >> The problem here stems from our inability to work towards agreement,
>> >> rather than standing on set positions. I set out what changes I think
>> >> would make the current implementation OK. Can we please, please have
>> >> a discussion about those points instead of trying to argue about who
>> >> has given more ground.
>> >>
>> >>> That commitment would of course be good. However, even if that were
>> >>> possible before writing code and everyone agreed that the ideas of
>> >>> you and Nathaniel should be implemented in full, it's still not
>> >>> clear that either of you would be willing to write any code.
>> >>> Agreement without code still doesn't help us very much.
>> >>
>> >> I'm going to return to Nathaniel's point - it is a highly valuable
>> >> thing to set ourselves the target of resolving substantial discussions
>> >> by consensus. The route you are endorsing here is 'implementor
>> >> wins'. We don't need to do it that way. We're a mature sensible
>> >> bunch of adults who can talk out the issues until we agree they are
>> >> ready for implementation, and then implement. That's all Nathaniel is
>> >> saying. I think he's obviously right, and I'm sad that it isn't as
>> >> clear to y'all as it is to me.
>> >>
>> >> Best,
>> >>
>> >> Matthew
>> >
>> > Everyone, can we please not do this?! I had enough of adults doing
>> > finger pointing back over the summer during the whole debt ceiling
>> > debate. I think we can all agree that we are better than the US
>> > congress?
>> >
>> > Forget about rudeness or decision processes.
>> >
>> > I will start by saying that I am willing to separate ignore and absent,
>> > but only on the write side of things. On read, I want a single way to
>> > identify the missing values. I also want only a single way to perform
>> > calculations (either skip or propagate).
>> >
>> > An indicator of success would be that people stop using NaNs and magic
>> > numbers (-9999, anyone?) and we could even deprecate nansum(), or at
>> > least strongly suggest in its docs to use NA.
>>
>> Well, I haven't completely made up my mind yet, will have to do some
>> more prototyping and playing (and potentially have some of my users
>> eat the differently-flavored dogfood), but I'm really not very
>> satisfied with the API at the moment. I'm mainly worried about the
>> abstraction leaking through to pandas users (this is a pretty large
>> group of people judging by # of downloads).
>>
>> The basic position I'm in is that I'm trying to push Python into a new
>> space, namely mainstream data analysis and statistical computing, one
>> that is solidly occupied by R and other such well-known players. My
>> target users are not computer scientists. They are not going to invest
>> in understanding dtypes very deeply or the internals of ndarray. In
>> fact I've spent a great deal of effort making it so that pandas users
>> can be productive and successful while having very little
>> understanding of NumPy. Yes, I essentially "protect" my users from
>> NumPy because using it well requires a certain level of sophistication
>> that I think is unfair to demand of people. This might seem totally
>> bizarre to some of you but it is simply the state of affairs. So far I
>> have been successful because more people are using Python and pandas
>> to do things that they used to do in R. The NA concept in R is dead
>> simple and I don't see why we are incapable of also implementing
>> something that is just as dead simple. To we, the scipy elite let's
>> call us, it seems simple: "oh, just pass an extra flag to all my array
>> constructors!" But this along with the masked array concept is going
>> to have two likely outcomes:
>>
>> 1) Create a great deal more complication in my already very large
>> codebase
>>
>> and/or
>>
>> 2) force pandas users to understand the new masked arrays after I've
>> carefully made it so they can be largely ignorant of NumPy
>>
>> The mostly-NaN-based solution I've cobbled together and tweaked over
>> the last 42 months actually *works really well*, amazingly, with
>> relatively little cost in code complexity.
>> Having found a reasonably stable equilibrium I'm extremely resistant
>> to upset the balance.
>>
>> So I don't know. After watching these threads bounce back and forth
>> I'm frankly not all that hopeful about a solution arising that
>> actually addresses my needs.
>
> But Wes, what *are* your needs? You keep saying this, but we need
> examples of how you want to operate and how numpy fails. As to dtypes,
> internals, and all that, I don't see any of that in the current
> implementation, unless you mean the maskna and skipna keywords. I
> believe someone on the previous thread mentioned a way to deal with
> that.
>
> Chuck

Here are my needs:

1) How NAs are implemented cannot be end user visible. Having to pass
maskna=True is a problem. I suppose a solution is to set the flag to
true on every array inside of pandas so the user never knows (you
mentioned someone else had some other solution, I could go back and
dig it up?) There's a sketch of what I mean below.

2) Performance: I can't accept more than say 2x overhead in floating
point array operations (binary ops or reductions). Last time I checked
we were a long way away from that

3) Implementation of NA-aware algorithms in Cython. A lot of pandas is
about moving data around. Bit patterns would make life a lot easier
because the code wouldn't have to change (much). But with masked
arrays I'll have to move both data and mask values. Not the end of the
world but is just the price you pay, I guess.

Things in R are a bit simpler re: bit patterns because there's only
double, integer, string (character), and boolean dtypes, whereas NumPy
has the whole C type hierarchy. So I can appreciate that doing bit
patterns across all the dtypes would be really hard.
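To make 1) concrete - a rough sketch only, where _wrap_ndarray is a
hypothetical pandas-internal helper, and I'm assuming flags.maskna and
view(maskna=True) behave the way the current docs describe:

import numpy as np

def _wrap_ndarray(values):
    # hypothetical pandas-internal helper: every array entering a
    # Series/DataFrame would have to pass through something like this
    # so that users never need to know the maskna flag exists
    arr = np.asarray(values)
    if not arr.flags.maskna:
        # attach an NA mask without copying the data
        arr = arr.view(maskna=True)
    return arr

And the same treatment for every constructor and every function that
hands back a new ndarray.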
In any case, I recognize that the current implementation will be
useful to a lot of people, but it may not meet my performance and
usability requirements. As I said, the solution I've cooked up has
worked well so far, and since it isn't a major pain point I may just
adopt the "ain't broke, don't fix" attitude and focus my efforts on
building new features. "Practicality beats purity", I suppose

- W

> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From matthew.brett at gmail.com  Sat Oct 29 14:43:15 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 11:43:15 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Fri, Oct 28, 2011 at 8:38 PM, Benjamin Root wrote:
> Matt,
>
> On Friday, October 28, 2011, Matthew Brett wrote:
>>
>>> Forget about rudeness or decision processes.
>>
>> No, that's a common mistake, which is to assume that any conversation
>> about things which aren't technical, is not important. Nathaniel's
>> point is important. Rudeness is important. The reason we've got into
>> this mess is because we clearly don't have an agreed way of making
>> decisions. That's why countries and open-source projects have
>> constitutions, so this doesn't happen.
>
> Don't get me wrong. In general, you are right. And maybe we all should
> discuss something to that effect for numpy. But I would rather do that
> when there isn't such contention and tempers.

That's a reasonable point.

> As for allegations of rudeness, I believe that we are actually very
> close to consensus that I immediately wanted to squelch any sort of
> meta-meta-disagreements about who was being rude to who. As a quick
> band-aide, anybody who felt slighted by me gets a drink on me at the
> next scipy conference. From this point on, let's institute a 10 minute
> rule -- write your email, wait ten minutes, read it again and edit it.

Good offer. I make the same one.

>>> I will start by saying that I am willing to separate ignore and
>>> absent, but only on the write side of things. On read, I want a
>>> single way to identify the missing values. I also want only a single
>>> way to perform calculations (either skip or propagate).
>>
>> Thank you - that is very helpful.
>>
>> Are you saying that you'd be OK setting missing values like this?
>>
>>>>> a.mask[0:2] = False
>
> Probably not that far, because that would be an attribute that may or
> may not exist. Rather, I might like the idea of a NA to "always" mean
> absent (and destroys - even through views), and MA (or some other name)
> which always means ignore (and has the masking behavior with views).
> This makes specific behaviors tied distinctly to specific objects.

Ah - yes - thank you. I think you and I at least have somewhere to go
for agreement, but, I don't know how to work towards a numpy-wide
agreement. Do you have any thoughts?

>> For the read side, do you mean you're OK with this
>>
>>>>> a.isna()
>>
>> To identify the missing values, as is currently the case? Or something
>> else?
>
> Yes. A missing value is a missing value, regardless of it being absent
> or marked as ignored. But it is a bit more subtle than that. I should
> just be able to add two arrays together and the "data should know what
> to do". When the core ufuncs get this right (like min, max, sum,
> cumsum, diff, etc), then I don't have to do much to prepare higher
> level funcs for missing data.
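For reference, the read side as it stands in master looks something like
this (a sketch - I'm assuming np.NA, isna and skipna behave as in the
current docs):

>>> import numpy as np
>>> a = np.array([1., 2., 3.], maskna=True)
>>> a[0] = np.NA                 # mark the first element as missing
>>> a.isna()
array([ True, False, False], dtype=bool)
>>> np.sum(a)                    # missing values propagate by default
NA(dtype='float64')
>>> np.sum(a, skipna=True)
5.0

So on the read side there is already just the one way to ask.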
>> If so, then I think we're very close, it's just a discussion about
>> names.
>
> And what does ignore + absent equals. ;-)

ignore + absent == special_value_of_some_sort :)

Just joking,

See you,

Matthew

From charlesr.harris at gmail.com  Sat Oct 29 14:43:41 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 12:43:41 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Sat, Oct 29, 2011 at 12:14 PM, Wes McKinney wrote:

[...]

> >> Well, I haven't completely made up my mind yet, will have to do some
> >> more prototyping and playing (and potentially have some of my users
> >> eat the differently-flavored dogfood), but I'm really not very
> >> satisfied with the API at the moment.
I'm mainly worried about the > >> abstraction leaking through to pandas users (this is a pretty large > >> group of people judging by # of downloads). > >> > >> The basic position I'm in is that I'm trying to push Python into a new > >> space, namely mainstream data analysis and statistical computing, one > >> that is solidly occupied by R and other such well-known players. My > >> target users are not computer scientists. They are not going to invest > >> in understanding dtypes very deeply or the internals of ndarray. In > >> fact I've spent a great deal of effort making it so that pandas users > >> can be productive and successful while having very little > >> understanding of NumPy. Yes, I essentially "protect" my users from > >> NumPy because using it well requires a certain level of sophistication > >> that I think is unfair to demand of people. This might seem totally > >> bizarre to some of you but it is simply the state of affairs. So far I > >> have been successful because more people are using Python and pandas > >> to do things that they used to do in R. The NA concept in R is dead > >> simple and I don't see why we are incapable of also implementing > >> something that is just as dead simple. To we, the scipy elite let's > >> call us, it seems simple: "oh, just pass an extra flag to all my array > >> constructors!" But this along with the masked array concept is going > >> to have two likely outcomes: > >> > >> 1) Create a great deal more complication in my already very large > codebase > >> > >> and/or > >> > >> 2) force pandas users to understand the new masked arrays after I've > >> carefully made it so they can be largely ignorant of NumPy > >> > >> The mostly-NaN-based solution I've cobbled together and tweaked over > >> the last 42 months actually *works really well*, amazingly, with > >> relatively little cost in code complexity. Having found a reasonably > >> stable equilibrium I'm extremely resistant to upset the balance. > >> > >> So I don't know. After watching these threads bounce back and forth > >> I'm frankly not all that hopeful about a solution arising that > >> actually addresses my needs. > > > > But Wes, what *are* your needs? You keep saying this, but we need > examples > > of how you want to operate and how numpy fails. As to dtypes, internals, > and > > all that, I don't see any of that in the current implementation, unless > you > > mean the maskna and skipna keywords. I believe someone on the previous > > thread mentioned a way to deal with that. > > > > Chuck > > > > Here are my needs: > > 1) How NAs are implemented cannot be end user visible. Having to pass > maskna=True is a problem. I suppose a solution is to set the flag to > true on every array inside of pandas so the user never knows (you > mentioned someone else had some other solution, i could go back and > dig it up?) > I believe it was Eric Firing who mentioned that he raised this question during development and Mark offered a potential solution. What ever that solution was, we should take a look at implementing it. > 2) Performance: I can't accept more than say 2x overhead in floating > point array operations (binary ops or reductions). Last time I checked > we were a long way away from that > > Known problem, and probably fixable by pushing things down into the inner ufunc loops. What we have at the moment is a prototype for testing the API and that is what we need feedback on. > 3) Implementation of NA-aware algorithms in Cython. A lot of pandas is > about moving data around. 
Bit patterns would make life a lot easier
> because the code wouldn't have to change (much). But with masked
> arrays I'll have to move both data and mask values. Not the end of the
> world but is just the price you pay, I guess.

Agree that this is a problem, along with memory usage. One solution is to
have a way to translate to bit patterns for export/import. Note that in
the wild some data sets come with separate masks, sometimes several for
different conditions, so the current implementation would work better for
those. We need to support several options here.

> Things in R are a bit simpler re: bit patterns because there's only
> double, integer, string (character), and boolean dtypes, whereas NumPy
> has the whole C type hierarchy. So I can appreciate that doing bit
> patterns across all the dtypes would be really hard.

We could maybe limit it to float types, strings, and booleans, maybe dates
also. I think integers are problematical, for instance a uint8 255 turns
up in 8 bit images and means saturated, not missing.
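Just to put numbers on that - a sketch of the counting argument, nothing
from the actual code. R can hide NA_real_ in one particular NaN bit
pattern because doubles have NaN payloads to spare, and it gives up
INT_MIN for NA_integer_. A uint8 has no such spare room:

import numpy as np

# doubles: many distinct NaN bit patterns, so reserving one as NA costs
# no real values - R's NA_real_ is exactly such a payload
# uint8: all 256 values can be legitimate data (255 = saturated pixel),
# so there is no value left to sacrifice as a missing-value marker
ii = np.iinfo(np.uint8)
print(ii.max - ii.min + 1)   # 256 - every bit pattern already taken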
> In any case, I recognize that the current implementation will be
> useful to a lot of people, but it may not meet my performance and
> usability requirements. As I said, the solution I've cooked up has
> worked well so far, and since it isn't a major pain point I may just
> adopt the "ain't broke, don't fix" attitude and focus my efforts on
> building new features. "Practicality beats purity", I suppose

That's perfectly reasonable. It would still help if you gave examples of
use cases where the current API doesn't work for you. I don't see much
difference between code using nan's and code using NA at the API level
apart from the maskna/skipna keywords.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From matthew.brett at gmail.com  Sat Oct 29 15:04:29 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 12:04:29 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers wrote:

[...]

> Implementations can also help the discussion along, by allowing people
> to try out some of the proposed changes. It also allows to construct
> examples that show weaknesses, possibly to be solved by an alternative
> API. Maybe you can hold the complete history of this topic in your head
> and comprehend it, but for me it would be very helpful if someone said:
> - here's my dataset
> - this is what I want to do with it
> - this is the best I can do with the current implementation
> - here's how API X would allow me to solve this better or simpler
> This can be done much better with actual data and an actual
> implementation than with a design proposal. You seem to disagree with
> this statement. That's fine. I would hope though that you recognize
> that concrete examples help people like me, and construct one or two to
> help us out.

That's what use-cases are for in designing APIs. There are examples
of use in the NEP:

https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst

the alterNEP:

https://gist.github.com/1056379

and my longer email to Travis:

http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored

Mark has done a nice job of documentation:

http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html

If you want to understand what the alterNEP case is, I'd suggest the
email, just because it's the most recent and I think the terminology
is slightly clearer.

Doing the same examples on a larger array won't make the point easier
to understand. The discussion is about what the right concepts are,
and you can help by looking at the snippets of code in those
documents, and deciding for yourself whether you think the current
masking / NA implementation seems natural and easy to explain, or
rather forced and difficult to explain, and then email back trying to
explain your impression (which is not always easy).

> I'm not. All I want to point out is that design and implementation are
> not completely separated either.

No, they often interact. I was trying to explain why, in this case,
the implementation hasn't changed the issues substantially, as far as
I can see. If you think otherwise, then that is helpful information,
because you can feed back about where the initial discussion has been
overtaken by the implementation, and so we can strip down the
discussion to its essential parts.

>> We don't need to do it that way. We're a mature sensible
>> bunch of adults
>
> Agreed:)

Ah - if only it was that easy :)

>> who can talk out the issues until we agree they are
>> ready for implementation, and then implement.
>
> The history of this discussion doesn't suggest it is straightforward to
> get a design right first time. It's a complex subject.

Right - and it's more complex when only some of the people involved
are interested in the discussion coming to a resolution. That's
Nathaniel's point - that although it seems inefficient, working
towards a good resolution of big issues like this is very valuable in
getting the ideas right.

> The second part of your statement, "and then implement", sounds so
> simple. The reality is that there are only a handful of developers who
> have done a significant amount of work on the numpy core in the last
> two years. I haven't seen anyone saying they are planning to implement
> (part of) whatever design the outcome of this discussion will be. I
> don't think it's strange to keep this in mind to some extent.

No, but consensus building is a little bit all or none. I guess we'd
all like consensus, but then sometimes, as Nathaniel points out, it is
inconvenient and annoying. If we have no stated commitment to
consensus, at some unpredictable point in the discussion, those who
are implementing will - obviously - just duck out and do the
implementation. I would do that, I guess. Maybe I have done in the
projects I'm involved in. The question Nathaniel is raising, and me
too, in a less coherent way, is - is that fine? Does it matter that
we are short-cutting through substantial discussions? Is that really
- in the long term - a more efficient way of building both the code
and the community?

Best,

Matthew

From charlesr.harris at gmail.com  Sat Oct 29 15:19:04 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 13:19:04 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Sat, Oct 29, 2011 at 1:04 PM, Matthew Brett wrote:
[...]

> No, but consensus building is a little bit all or none. I guess we'd
> all like consensus, but then sometimes, as Nathaniel points out, it is
> inconvenient and annoying. If we have no stated commitment to
> consensus, at some unpredictable point in the discussion, those who
> are implementing will - obviously - just duck out and do the
> implementation. I would do that, I guess. Maybe I have done in the
> projects I'm involved in. The question Nathaniel is raising, and me
> too, in a less coherent way, is - is that fine? Does it matter that
> we are short-cutting through substantial discussions? Is that really
> - in the long term - a more efficient way of building both the code
> and the community?

Who is counted in building a consensus? I tend to pay attention to those
who have made consistent contributions over the years, reviewed code,
fixed bugs, and have generally been active in numpy development. In any
group participation is important, people who just walk in the door and
demand things be done their way aren't going to get a lot of respect. I'll
happily listen to politely expressed feedback, especially if the feedback
comes from someone who shows up to work, but that hasn't been my
impression of the disagreements in this case. Heck, Nathaniel wasn't even
tracking the Numpy pull requests or Mark's repository. That doesn't spell
"participant" in my dictionary.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From matthew.brett at gmail.com  Sat Oct 29 15:26:48 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 12:26:48 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Sat, Oct 29, 2011 at 12:19 PM, Charles R Harris wrote:

[...]
> Who is counted in building a consensus? I tend to pay attention to those
> who have made consistent contributions over the years, reviewed code,
> fixed bugs, and have generally been active in numpy development. In any
> group participation is important, people who just walk in the door and
> demand things be done their way aren't going to get a lot of respect.
> I'll happily listen to politely expressed feedback, especially if the
> feedback comes from someone who shows up to work, but that hasn't been
> my impression of the disagreements in this case. Heck, Nathaniel wasn't
> even tracking the Numpy pull requests or Mark's repository. That doesn't
> spell "participant" in my dictionary.

I'm sorry, I am not obeying Ben's 10 minute rule. This is a very
important point you are making, which is that those who write the code
have the final say. Is it fair to say that your responses show that you
don't think either Nathaniel or I have much of a say?

It's fair to say I haven't contributed much code to numpy. I could
imagine some sort of voting system in which the voting is weighted by
lines of code contributed. I suspect you are thinking of an implicit
version of such a system, continuously employed. But Nathaniel's point
is that other projects have gone out of their way to avoid voting. To
quote from:

http://producingoss.com/en/consensus-democracy.html

"In general, taking a vote should be very rare - a last resort for when
all other options have failed. Don't think of voting as a great way to
resolve debates. It isn't. It ends discussion, and thereby ends creative
thinking about the problem. As long as discussion continues, there is
the possibility that someone will come up with a new solution everyone
likes."

Best,

Matthew

From ben.root at ou.edu  Sat Oct 29 15:41:04 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Sat, 29 Oct 2011 14:41:04 -0500
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Saturday, October 29, 2011, Charles R Harris wrote:
>
> Who is counted in building a consensus? I tend to pay attention to those
> who have made consistent contributions over the years, reviewed code,
> fixed bugs, and have generally been active in numpy development. In any
> group participation is important, people who just walk in the door and
> demand things be done their way aren't going to get a lot of respect.
> I'll happily listen to politely expressed feedback, especially if the
> feedback comes from someone who shows up to work, but that hasn't been
> my impression of the disagreements in this case. Heck, Nathaniel wasn't
> even tracking the Numpy pull requests or Mark's repository. That doesn't
> spell "participant" in my dictionary.
>
> Chuck
>

This is a very good point, but I would highly caution against alienating
anybody here.
Frankly, I am surprised how much my opinion has been taken here given
the very little numpy code I have submitted (I think maybe two or three
patches).  The Numpy community is far more than just those who use the
core library. There is pandas, bottleneck, mpl, the scikits, and much
more.  Numpy would be nearly useless without them, and certainly vice
versa.

We are all indebted to each other for our works.  We must never lose
that perspective.

We all seem to have a different set of assumptions of how development
should work.  Each project follows its own workflow.  Numpy should be
free to adopt its own procedures, and we are free to discuss them.

I do agree with Chuck that he shouldn't have to make a written
invitation to each and every person to review each pull.  However, maybe
some work can be done to bring the pull request and issues discussion
down to the mailing list. I would like to do something similar with mpl.

As for voting rights, let's make that a separate discussion.

Ben Root

From charlesr.harris at gmail.com  Sat Oct 29 15:41:48 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 13:41:48 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Sat, Oct 29, 2011 at 1:26 PM, Matthew Brett wrote:

> [...]
>
> It's fair to say I haven't contributed much code to numpy.

But you have contributed some, which immediately gives you more
credibility.

> I could imagine some sort of voting system for which the voting is
> weighted by lines of code contributed.

Mark has been the man over the last year.
By comparison, the rest of us have just been diddling around.

> I suspect you are thinking of an implicit version of such a system,
> continuously employed.
>
> But Nathaniel's point is that other projects have gone out of their
> way to avoid voting.  To quote from:
>
> http://producingoss.com/en/consensus-democracy.html
>
> [...]

As Ralf pointed out, the core developers are a small handful at the
moment. Now in one sense that presents an opportunity: anyone who has
the time and inclination to contribute code and review pull requests is
going to make an impact and rapidly gain influence. In a sense,
leadership in the numpy community is up for grabs. But before you can
claim the kingdom, there is the small matter of completing a quest or
two.

Chuck

From charlesr.harris at gmail.com  Sat Oct 29 16:05:06 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 14:05:06 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Sat, Oct 29, 2011 at 1:41 PM, Benjamin Root wrote:

> On Saturday, October 29, 2011, Charles R Harris wrote:
> > [...]
>
> This is a very good point, but I would highly caution against
> alienating anybody here.  Frankly, I am surprised how much my opinion
> has been taken here given the very little numpy code I have submitted
> (I think maybe two or three patches).  The Numpy community is far more
> than just those who use the core library. There is pandas, bottleneck,
> mpl, the scikits, and much more.  Numpy would be nearly useless without
> them, and certainly vice versa.

I was quite impressed by your comments on Mark's work, I thought they
were excellent. It doesn't really take much to make an impact in a small
community overburdened by work.

> We are all indebted to each other for our works.  We must never lose
> that perspective.
>
> [...]
>
> I do agree with Chuck that he shouldn't have to make a written
> invitation to each and every person to review each pull.  However,
> maybe some work can be done to bring the pull request and issues
> discussion down to the mailing list. I would like to do something
> similar with mpl.
> As for voting rights, let's make that a separate discussion.

With such a small community, I'd rather avoid the whole voting thing if
possible.

Chuck

From matthew.brett at gmail.com  Sat Oct 29 16:28:22 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 13:28:22 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Sat, Oct 29, 2011 at 12:41 PM, Charles R Harris wrote:

> [...]
>
>> It's fair to say I haven't contributed much code to numpy.
>
> But you have contributed some, which immediately gives you more
> credibility.
>
>> I could imagine some sort of voting system for which the voting is
>> weighted by lines of code contributed.
>
> Mark has been the man over the last year. By comparison, the rest of
> us have just been diddling around.
>
>> I suspect you are thinking of an implicit version of such a system,
>> continuously employed.
>>
>> But Nathaniel's point is that other projects have gone out of their
>> way to avoid voting.
>> To quote from:
>>
>> http://producingoss.com/en/consensus-democracy.html
>>
>> [...]
>
> As Ralf pointed out, the core developers are a small handful at the
> moment. Now in one sense that presents an opportunity: anyone who has
> the time and inclination to contribute code and review pull requests is
> going to make an impact and rapidly gain influence. In a sense,
> leadership in the numpy community is up for grabs. But before you can
> claim the kingdom, there is the small matter of completing a quest or
> two.

Yes, this is well-put - but I think I am asking for a less feudal model
of decision making.  The model you are offering is one of power - where
power is acquired by code contributions.  I suppose this model is
attractive if you don't believe that it is generally possible to achieve
an agreed solution through general and open discussion.  The more
effective model is democratic, that is, we have faith in each other to
be reasonable and to negotiate in the best interests of the project, and
we use measures of influence as an absolute last resort, and even then,
this influence should be determined on explicit grounds (such as
agreement across the group, number of lines committed or some other
thing).

Best,

Matthew

From matthew.brett at gmail.com  Sat Oct 29 16:32:04 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 13:32:04 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Sat, Oct 29, 2011 at 1:05 PM, Charles R Harris wrote:

> [...]
>
>> As for voting rights, let's make that a separate discussion.
>
> With such a small community, I'd rather avoid the whole voting thing
> if possible.

But, if there is one thing worse than voting, it is implicit voting.
Implicit voting is where you ignore people who you don't think should
have a voice.  Unless I'm mistaken, that's what you are suggesting
should be the norm.

Best,

Matthew

From ralf.gommers at googlemail.com  Sat Oct 29 16:44:36 2011
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 29 Oct 2011 22:44:36 +0200
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett wrote:

> [...]
>
> > I would hope though that you recognize that concrete examples help
> > people like me, and construct one or two to help us out.
>
> That's what use-cases are for in designing APIs.  There are examples
> of use in the NEP:
>
> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>
> the alterNEP:
>
> https://gist.github.com/1056379
>
> and my longer email to Travis:
>
> http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored
>
> Mark has done a nice job of documentation:
>
> http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
>
> If you want to understand what the alterNEP case is, I'd suggest the
> email, just because it's the most recent and I think the terminology
> is slightly clearer.
>
> Doing the same examples on a larger array won't make the point easier
> to understand.  The discussion is about what the right concepts are,
> and you can help by looking at the snippets of code in those
> documents, and deciding for yourself whether you think the current
> masking / NA implementation seems natural and easy to explain, or
> rather forced and difficult to explain, and then email back trying to
> explain your impression (which is not always easy).

If you seriously believe that looking at a few snippets is as helpful
and instructive as being able to play around with them in IPython and
modify them, then I guess we won't make progress in this part of the
discussion. You're just telling me to go back and re-read things I'd
already read.

OK, update: I took Ben's 10 minutes to go back and read the reference
doc and your email again, just in case. The current implementation still
seems natural to me to explain. It fits my use-cases. Perhaps that's
different for you because you and I deal with different kinds of data. I
don't have to explicitly treat absent and ignored data differently;
those two are actually mixed and indistinguishable already in much of my
data. Therefore the current implementation works well for me, having to
make a distinction would be a needless complication.

Ralf
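As a concrete aside on the absent-versus-ignored vocabulary in this
exchange: numpy.ma, which exists in released numpy today, already shows
what "ignored" (as opposed to "absent") behaviour means. A minimal
sketch using only stable APIs and made-up values:

    import numpy as np
    import numpy.ma as ma

    a = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

    # "ignored": the value survives behind the mask and can be recovered
    a.sum()            # -> 4.0, the masked element is skipped
    a.mask[1] = False  # unmasking reveals the stored value again
    a[1]               # -> 2.0

    # "absent" semantics would mean there is no value left to recover:
    # once an element is destructively set to NA, it stays NA unless a
    # new value is assigned.

This is only an analogy for the semantics under discussion, not the
proposed NA API itself.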
From matthew.brett at gmail.com  Sat Oct 29 16:48:47 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 13:48:47 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers wrote:

> [...]
>
> If you seriously believe that looking at a few snippets is as helpful
> and instructive as being able to play around with them in IPython and
> modify them, then I guess we won't make progress in this part of the
> discussion. You're just telling me to go back and re-read things I'd
> already read.

The snippets are in ipython or doctest format - aren't they?

> OK, update: I took Ben's 10 minutes to go back and read the reference
> doc and your email again, just in case. [...]
> Therefore the current implementation works well for me, having to make
> a distinction would be a needless complication.

OK - I'm not sure that contributes much to the discussion, because the
problem is being able to explain to each other in detail why one
solution is preferable to another.  To follow your own advice, you'd
post some code snippets showing how you'd see the two ideas playing out
and why one is clearer than the other.

Best,

Matthew

From jkramar at gmail.com  Sat Oct 29 17:11:40 2011
From: jkramar at gmail.com (Janos)
Date: Sat, 29 Oct 2011 21:11:40 +0000 (UTC)
Subject: [Numpy-discussion] Conditional random variables
References: <4E127333.4010903@theo.to> <4E1320B9.3020007@theo.to> <4E133B29.1020103@theo.to>
Message-ID:

Ted To <theo.to> writes:
>
> On 07/05/2011 11:07 AM, josef.pktd at gmail.com wrote:
> > For example sample x>=u and then sample y>=u-x. That's two univariate
> > normal samples.
>
> Ah, that's what I was looking for!  Many thanks!
>

You need to be careful, though - if you just sample x|x>=u and then
sample y|y>=u-x then you'll get the wrong distribution, unless x|x>=u
has the same distribution as x|x+y>=u, which is false.

What you should actually do if you want draws from (x,y)|x+y>=u is first
sample (x+y)|(x+y)>=u, then x|x+y, and then compute y=(x+y)-x.

If x~N(mu_x, sigma_x^2) and y~N(mu_y, sigma_y^2) with correlation rho,
then x+y~N(mu_x+mu_y, sigma_s^2), where
sigma_s^2 = sigma_x^2+sigma_y^2+2*rho*sigma_x*sigma_y, and
x|x+y ~ N(mu_x + r*(sigma_x/sigma_s)*(x+y-mu_x-mu_y), sigma_x^2*(1-r^2)),
where r = cor(x,x+y) = (1+(1-rho^2)*(rho+sigma_x/sigma_y)^(-2))^(-1/2).
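For concreteness, a minimal NumPy/SciPy sketch of this two-step sampler
(the function name and arguments are illustrative, not from the thread;
it assumes scipy.stats.truncnorm is available for the truncated draw of
s = x + y):

    import numpy as np
    from scipy.stats import truncnorm

    def sample_conditional(u, mu_x, mu_y, sigma_x, sigma_y, rho,
                           size=1000):
        # distribution of s = x + y
        mu_s = mu_x + mu_y
        sigma_s = np.sqrt(sigma_x**2 + sigma_y**2
                          + 2*rho*sigma_x*sigma_y)
        # step 1: draw s | s >= u (truncnorm takes its bounds in
        # standard units, i.e. relative to loc and scale)
        a = (u - mu_s) / sigma_s
        s = truncnorm.rvs(a, np.inf, loc=mu_s, scale=sigma_s, size=size)
        # step 2: draw x | s from the conditional normal given above
        r = (sigma_x + rho*sigma_y) / sigma_s   # cor(x, x + y)
        x = np.random.normal(mu_x + r*(sigma_x/sigma_s)*(s - mu_s),
                             sigma_x*np.sqrt(1 - r**2))
        # step 3: y is then determined by y = s - x
        return x, s - x

A quick sanity check is that x + y >= u holds for every returned draw,
and that for rho = 0 with equal variances x and y have the same
conditional distribution.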
From matthew.brett at gmail.com  Sat Oct 29 17:36:08 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 14:36:08 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett wrote:

> [...]
>
>> If you seriously believe that looking at a few snippets is as helpful
>> and instructive as being able to play around with them in IPython and
>> modify them, then I guess we won't make progress in this part of the
>> discussion.
>
> The snippets are in ipython or doctest format - aren't they?

Oops - 10 minute rule.  Now I see that you mean that you can't
experiment with the alternative implementation without working code.

That's true, but I am hoping that the difference between - say:

a[0:2] = np.NA

and

a.mask[0:2] = False

would be easy enough to imagine.  If it isn't then, let me know,
preferably with something like "I can't see exactly how the following
[code snippet] would work in your conception of the problem" - and then
I can either try and give fake examples, or write a mock up.

Best,

Matthew
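To make the two spellings concrete, a hypothetical session under each
proposal might look like the following. The maskna=True flag and np.NA
are from the NA-masked development build documented earlier in the
thread; the a.mask attribute is the alterNEP's proposed spelling and
does not exist as an ndarray attribute:

    import numpy as np

    # NEP / current implementation: masking is spelled like assignment
    a = np.array([1.0, 2.0, 3.0], maskna=True)  # NA-masked dev build
    a[0:2] = np.NA       # elements 0 and 1 now read back as NA

    # alterNEP proposal (hypothetical API): the mask is a separate,
    # explicit attribute, so hiding a value never looks like storing one
    b = np.array([1.0, 2.0, 3.0])
    b.mask[0:2] = False  # illustrative only; plain ndarrays have no .mask

Whether the second spelling is worth a separate attribute is exactly the
API question under discussion.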
From ralf.gommers at googlemail.com  Sat Oct 29 17:48:03 2011
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 29 Oct 2011 23:48:03 +0200
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett wrote:

> [...]
>
> Oops - 10 minute rule.  Now I see that you mean that you can't
> experiment with the alternative implementation without working code.

Indeed.

> That's true, but I am hoping that the difference between - say:
>
> a[0:2] = np.NA
>
> and
>
> a.mask[0:2] = False
>
> would be easy enough to imagine.

It is in this case. I agree the explicit ``a.mask`` is clearer. This is
a quite specific point that could be improved in the current
implementation. It doesn't require ripping everything out.

Ralf

From matthew.brett at gmail.com  Sat Oct 29 17:55:56 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 14:55:56 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers wrote:

> [...]
>
> It is in this case. I agree the explicit ``a.mask`` is clearer. This
> is a quite specific point that could be improved in the current
> implementation.

Thanks - this is helpful.

> It doesn't require ripping everything out.

Nathaniel wasn't proposing 'ripping everything out' - but backing off
until consensus has been reached.  That's different.  If you think
we should not do that, and you are interested, please say why.
Second - I was proposing that we do indeed keep the code in the
codebase but discuss adaptations that could achieve consensus.

See you,

Matthew

From charlesr.harris at gmail.com  Sat Oct 29 17:59:43 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 15:59:43 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett wrote:

> [...]
>
> > Thanks - this is helpful. > > > It doesn't require ripping everything out. > > Nathaniel wasn't proposing 'ripping everything out' - but backing off > until consensus has been reached. That's different. If you think > we should not do that, and you are interested, please say why. > Second - I was proposing that we do indeed keep the code in the > codebase but discuss adaptations that could achieve consensus. > > I'm much opposed to ripping the current code out. It isn't like it is (known to be) buggy, nor has anyone made the case that it isn't a basis on which build other options. It also smacks of gratuitous violence committed by someone yet to make a positive contribution to the project. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Sat Oct 29 18:02:16 2011 From: shish at keba.be (Olivier Delalleau) Date: Sat, 29 Oct 2011 18:02:16 -0400 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) In-Reply-To: References: Message-ID: 2011/10/29 Ralf Gommers > > > On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett wrote: > >> Hi, >> >> On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett >> wrote: >> > Hi, >> > >> > On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers >> > wrote: >> >> >> >> >> >> On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett < >> matthew.brett at gmail.com> >> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers >> >>> wrote: >> >>> > >> >>> > >> >>> > On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett < >> matthew.brett at gmail.com> >> >>> > wrote: >> >>> >> >> >>> >> Hi, >> >>> >> >> >>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers >> >>> >> wrote: >> >>> >> > >> >>> >> > >> >>> >> > On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett >> >>> >> > >> >>> >> > wrote: >> >>> >> >> >> >>> >> >> Hi, >> >>> >> >> >> >>> >> >> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris >> >>> >> >> wrote: >> >>> >> >> >> >> >>> >> >> >> >>> >> >> No, that's not what Nathaniel and I are saying at all. Nathaniel >> was >> >>> >> >> pointing to links for projects that care that everyone agrees >> before >> >>> >> >> they go ahead. >> >>> >> > >> >>> >> > It looked to me like there was a serious intent to come to an >> >>> >> > agreement, >> >>> >> > or >> >>> >> > at least closer together. The discussion in the summer was going >> >>> >> > around >> >>> >> > in >> >>> >> > circles though, and was too abstract and complex to follow. >> Therefore >> >>> >> > Mark's >> >>> >> > choice of implementing something and then asking for feedback >> made >> >>> >> > sense >> >>> >> > to >> >>> >> > me. >> >>> >> >> >>> >> I should point out that the implementation hasn't - as far as I can >> >>> >> see - changed the discussion. The discussion was about the API. >> >>> >> >> >>> >> Implementations are useful for agreed APIs because they can point >> out >> >>> >> where the API does not make sense or cannot be implemented. In >> this >> >>> >> case, the API Mark said he was going to implement - he did >> implement - >> >>> >> at least as far as I can see. Again, I'm happy to be corrected. >> >>> > >> >>> > Implementations can also help the discussion along, by allowing >> people >> >>> > to >> >>> > try out some of the proposed changes. It also allows to construct >> >>> > examples >> >>> > that show weaknesses, possibly to be solved by an alternative API. 
>> Maybe >> >>> > you >> >>> > can hold the complete history of this topic in your head and >> comprehend >> >>> > it, >> >>> > but for me it would be very helpful if someone said: >> >>> > - here's my dataset >> >>> > - this is what I want to do with it >> >>> > - this is the best I can do with the current implementation >> >>> > - here's how API X would allow me to solve this better or simpler >> >>> > This can be done much better with actual data and an actual >> >>> > implementation >> >>> > than with a design proposal. You seem to disagree with this >> statement. >> >>> > That's fine. I would hope though that you recognize that concrete >> >>> > examples >> >>> > help people like me, and construct one or two to help us out. >> >>> That's what use-cases are for in designing APIs. There are examples >> >>> of use in the NEP: >> >>> >> >>> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst >> >>> >> >>> the alterNEP: >> >>> >> >>> https://gist.github.com/1056379 >> >>> >> >>> and my longer email to Travis: >> >>> >> >>> >> >>> >> http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored >> >>> >> >>> Mark has done a nice job of documentation: >> >>> >> >>> http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html >> >>> >> >>> If you want to understand what the alterNEP case is, I'd suggest the >> >>> email, just because it's the most recent and I think the terminology >> >>> is slightly clearer. >> >>> >> >>> Doing the same examples on a larger array won't make the point easier >> >>> to understand. The discussion is about what the right concepts are, >> >>> and you can help by looking at the snippets of code in those >> >>> documents, and deciding for yourself whether you think the current >> >>> masking / NA implementation seems natural and easy to explain, or >> >>> rather forced and difficult to explain, and then email back trying to >> >>> explain your impression (which is not always easy). >> >> >> >> If you seriously believe that looking at a few snippets is as helpful >> and >> >> instructive as being able to play around with them in IPython and >> modify >> >> them, then I guess we won't make progress in this part of the >> discussion. >> >> You're just telling me to go back and re-read things I'd already read. >> > >> > The snippets are in ipython or doctest format - aren't they? >> >> Oops - 10 minute rule. Now I see that you mean that you can't >> experiment with the alternative implementation without working code. >> > > Indeed. > > >> That's true, but I am hoping that the difference between - say: >> >> a[0:2] = np.NA >> >> and >> >> a.mask[0:2] = False >> >> would be easy enough to imagine. > > > It is in this case. I agree the explicit ``a.mask`` is clearer. This is a > quite specific point that could be improved in the current implementation. > It doesn't require ripping everything out. > > Ralf > I haven't been following the discussion closely, but wouldn't it be instead: a.mask[0:2] = True? It's something that I actually find a bit difficult to get right in the current numpy.ma implementation: I would find more intuitive to have True for "valid" data, and False for invalid / missing / ... I realize how the implementation makes sense (and is appropriate given that the name is "mask"), but I just thought I'd point this out... even if it's just me ;) -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... 
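A quick illustration of the numpy.ma convention in question - in standard
numpy.ma (not the NA-mask branch under discussion), True in the mask marks
an element as masked *out*, which is why Olivier's correction above is
right for numpy.ma:

>>> import numpy as np
>>> a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
>>> print a        # True in the mask -> that element is masked out
[1.0 -- 3.0]
>>> a.sum()        # masked elements are ignored in reductions
4.0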
From charlesr.harris at gmail.com Sat Oct 29 18:10:21 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 16:10:21 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Sat, Oct 29, 2011 at 4:02 PM, Olivier Delalleau wrote:

> [...]
> It's something that I actually find a bit difficult to get right in the
> current numpy.ma implementation: I would find it more intuitive to have
> True for "valid" data, and False for invalid / missing / ... I realize
> how the implementation makes sense (and is appropriate given that the
> name is "mask"), but I just thought I'd point this out... even if it's
> just me ;)

Well, there is the problem of replacing an unknown value by a known value,
and then you would have to clear the mask also. However, I do appreciate
this sort of feedback from actual use. We need more in order to see what
are real sticking points and to separate the usual frustrations of
learning new stuff from the more serious problem of inadequate API. If
enough people start giving feedback we might want to set up some way to
track it.

Chuck

From efiring at hawaii.edu Sat Oct 29 18:47:30 2011
From: efiring at hawaii.edu (Eric Firing)
Date: Sat, 29 Oct 2011 12:47:30 -1000
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: <4EAC8282.9070708@hawaii.edu>

On 10/29/2011 12:02 PM, Olivier Delalleau wrote:
> I haven't been following the discussion closely, but wouldn't it be instead:
> a.mask[0:2] = True?

That would be consistent with numpy.ma and the opposite of Mark's
implementation.

I can live with either, but I much prefer the numpy.ma version because
it fits with the use of bit-flags for editing data; set bit 1 if it
fails check A, set bit 2 if it fails check B, etc.  So, if it evaluates
as True, there is a problem, and the value is masked *out*.

Similarly, in Mark's implementation, 7 bits are available for a payload
to describe what kind of masking is meant.  This seems more consistent
with True as masked (or NA) than with False as masked.

Eric

From matthew.brett at gmail.com Sat Oct 29 18:55:21 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 15:55:21 -0700
Subject: [Numpy-discussion] Large numbers into float128
Message-ID: 

Hi,

Can anyone think of a good way to set a float128 value to an arbitrarily
large number?

As in

v = int_to_float128(some_value)

?

I'm trying things like

v = np.float128(2**64+2)

but, because (in other threads) the float128 seems to be going through
float64 on assignment, this loses precision, so although 2**64+2 is
representable in float128, in fact I get:

In [35]: np.float128(2**64+2)
Out[35]: 18446744073709551616.0

In [36]: 2**64+2
Out[36]: 18446744073709551618L

So - can anyone think of another way to assign values to float128 that
will keep the precision?

Thanks a lot,

Matthew

From charlesr.harris at gmail.com Sat Oct 29 18:57:22 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 16:57:22 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: <4EAC8282.9070708@hawaii.edu>
References: <4EAC8282.9070708@hawaii.edu>
Message-ID: 

On Sat, Oct 29, 2011 at 4:47 PM, Eric Firing wrote:

> [...]
> Similarly, in Mark's implementation, 7 bits are available for a payload
> to describe what kind of masking is meant.  This seems more consistent
> with True as masked (or NA) than with False as masked.

I wouldn't rely on the 7 bits yet. Mark left them available to keep open
possible future use, but didn't implement anything using them yet. If
memory use turns out to exclude whole sectors of application we will have
to go to bit masks.

Chuck
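On the float128 question above: one possible workaround, sketched under
the assumption reported in that message (that scalar assignment
round-trips through float64), is to build the value from chunks that are
exactly representable in float64. The name int_to_float128 is taken from
the question; the implementation below is only an illustrative sketch,
not a tested numpy recipe, and np.float128 itself is platform-dependent:

import numpy as np

def int_to_float128(n):
    # Build a float128 from a Python integer 32 bits at a time.
    # Each 32-bit chunk is exact in float64, so it survives the
    # float64 round-trip; the accumulation happens in float128.
    result = np.float128(0)
    neg = n < 0
    n = abs(n)
    shift = 0
    while n:
        chunk = n & 0xFFFFFFFF  # low 32 bits, exact in float64
        result = result + np.float128(chunk) * np.float128(2.0) ** shift
        n >>= 32
        shift += 32
    return -result if neg else result

With this, int_to_float128(2**64 + 2) should print with the trailing
...18 intact on platforms where float128 is the 80-bit extended type,
since that value fits in a 64-bit significand.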
From matthew.brett at gmail.com Sat Oct 29 19:11:57 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 16:11:57 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris wrote:

> I'm much opposed to ripping the current code out.

You are repeating the loaded phrase 'ripping the current code out' and
thus making the discussion less sensible and more hostile.

> It isn't like it is (known to be) buggy, nor has anyone made the case
> that it isn't a basis on which to build other options. It also smacks of
> gratuitous violence committed by someone yet to make a positive
> contribution to the project.

This is cheap, rude, and silly.  All I can see from Nathaniel is a
reasonable, fair attempt to discuss the code.  He proposed backing off
the code in good faith.  You are emphatically, and, in my view
childishly, ignoring the substantial points he is making, and asserting
over and over that he deserves no hearing because he has not contributed
code.  This is a terribly destructive way to work.  If I were a new
developer reading this, I would conclude that I had better be damn
careful which side I'm on before I express my opinion, otherwise I'm
going to be made to feel like I don't exist by the other people on the
project.  That is miserable, it is silly, and it's the wrong way to do
business.

Best,

Matthew

From hangenuit at gmail.com Sat Oct 29 19:13:18 2011
From: hangenuit at gmail.com (Han Genuit)
Date: Sun, 30 Oct 2011 01:13:18 +0200
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: <4EAC8282.9070708@hawaii.edu>
References: <4EAC8282.9070708@hawaii.edu>
Message-ID: 

On Sun, Oct 30, 2011 at 12:47 AM, Eric Firing wrote:
> I can live with either, but I much prefer the numpy.ma version because
> it fits with the use of bit-flags for editing data; set bit 1 if it
> fails check A, set bit 2 if it fails check B, etc.  So, if it evaluates
> as True, there is a problem, and the value is masked *out*.

I think in Mark's implementation it works the same:

>>> a = np.arange(3, maskna=True)
>>> a[1] = np.NA
>>> a
array([0, NA, 2])
>>> np.isna(a)
array([False,  True, False], dtype=bool)

This is more consistent than using False to represent an NA mask, I agree.
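A small sketch of the bit-flag editing workflow Eric describes, feeding
the flags into a standard numpy.ma mask; the data values, thresholds, and
flag meanings here are invented purely for illustration:

import numpy as np

data = np.array([1.0, 2.0, 99.0, -5.0])
flags = np.zeros(data.shape, dtype=np.uint8)

flags[data > 10.0] |= 0x01  # bit 1: failed range check A
flags[data < 0.0] |= 0x02   # bit 2: failed sign check B

# Any nonzero flag means "there is a problem", so it masks the value out.
masked = np.ma.array(data, mask=(flags != 0))
print masked
# [1.0 2.0 -- --]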
In-Reply-To: 
References: 
Message-ID: 

On Sat, Oct 29, 2011 at 5:11 PM, Matthew Brett wrote:

> [...]
> That is miserable, it is silly, and it's the wrong way to do business.
>
> Best,
>
> Matthew

From charlesr.harris at gmail.com Sat Oct 29 19:18:29 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 17:18:29 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: 
References: 
Message-ID: 

On Sat, Oct 29, 2011 at 5:11 PM, Matthew Brett wrote:

> This is cheap, rude, and silly.  All I can see from Nathaniel is a
> reasonable, fair attempt to discuss the code.  He proposed backing off
> the code in good faith.  You are emphatically, and, in my view
> childishly, ignoring the substantial points he is making, and asserting
> over and over that he deserves no hearing because he has not
> contributed code.

Sorry Matthew, but Nathaniel's interaction comes across to me as arrogant,
and your constant use of terms like childish, destructive to the
community, etc. come across as manipulative. I can live with the words,
but you aren't doing much to get this developer on your side.

Chuck

From ben.root at ou.edu Sat Oct 29 19:24:11 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Sat, 29 Oct 2011 18:24:11 -0500
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: References: Message-ID: On Saturday, October 29, 2011, Matthew Brett wrote: > Hi, > > On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris > wrote: >> >> >> On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers >>> wrote: >>> > >>> > >>> > On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett >>> > >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett >>> >> >>> >> wrote: >>> >> > Hi, >>> >> > >>> >> > On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers >>> >> > wrote: >>> >> >> >>> >> >> >>> >> >> On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett >>> >> >> >>> >> >> wrote: >>> >> >>> >>> >> >>> Hi, >>> >> >>> >>> >> >>> On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers >>> >> >>> wrote: >>> >> >>> > >>> >> >>> > >>> >> >>> > On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett >>> >> >>> > >>> >> >>> > wrote: >>> >> >>> >> >>> >> >>> >> Hi, >>> >> >>> >> >>> >> >>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers >>> >> >>> >> wrote: >>> >> >>> >> > >>> >> >>> >> > >>> >> >>> >> > On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett >>> >> >>> >> > >>> >> >>> >> > wrote: >>> >> >>> >> >> >>> >> >>> >> >> Hi, >>> >> >>> >> >> >>> >> >>> >> >> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris >>> >> >>> >> >> wrote: >>> >> >>> >> >> >> >>> >> >>> >> >> >>> >> >>> >> >> No, that's not what Nathaniel and I are saying at all. >>> >> >>> >> >> Nathaniel >>> >> >>> >> >> was >>> >> >>> >> >> pointing to links for projects that care that everyone agrees >>> >> >>> >> >> before >>> >> >>> >> >> they go ahead. >>> >> >>> >> > >>> >> >>> >> > It looked to me like there was a serious intent to come to an >>> >> >>> >> > agreement, >>> >> >>> >> > or >>> >> >>> >> > at least closer together. The discussion in the summer was >>> >> >>> >> > going >>> >> >>> >> > around >>> >> >>> >> > in >>> >> >>> >> > circles though, and was too abstract and complex to follow. >>> You are repeating the loaded phrase 'ripping the current code out' and > thus making the discussion less sensible and more hostile. > >> It isn't like it is (known >> to be) buggy, nor has anyone made the case that it isn't a basis on which >> build other options. It also smacks of gratuitous violence committed by >> someone yet to make a positive contribution to the project. > > This is cheap, rude, and silly. All I can see from Nathaniel is a > reasonable, fair attempt to discuss the code. He proposed backing off > the code in good faith. You are emphatically, and, in my view > childishly, ignoring the substantial points he is making, and > asserting over and over that he deserves no hearing because he has not > contributed code. This is a terribly destructive way to work. If I > was a new developer reading this, I would conclude, that I had better > be damn careful which side I'm on, before I express my opinion, > otherwise I'm going to be made to feel like I don't exist by the other > people on the project. That is miserable, it is silly, and it's the > wrong way to do business. > > Best, > > Matthew > /me blows whistle. Personal foul against defense! Personal foul against offense! Penalties offset! Repeat first down. 10 minute rule, please. Ben Root P.S. - as a bit of evidence against the idea that chuck doesnt consider opinions from non-contributors, I haven't felt ignored during this whole discussion, yet I don't think that anyone had an expectation of me to produce code. 
However, to have an expectation that counter-proposals come with code might
be a bit unfair, because the people offering a counter-proposal may not have
the resources available that the project had with Mark.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From matthew.brett at gmail.com Sat Oct 29 19:24:25 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 16:24:25 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: References: Message-ID: 
Hi,

On Sat, Oct 29, 2011 at 4:18 PM, Charles R Harris wrote:
>
> On Sat, Oct 29, 2011 at 5:11 PM, Matthew Brett wrote:
>>
>> Hi,
>>
>> On Sat, Oct 29, 2011 at 2:59 PM, Charles R Harris wrote:
>> >
>> > On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers wrote:
>> >> >
>> >> > On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers wrote:
>> >> >> >>
>> >> >> >> On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett wrote:
>> >> >> >>>
>> >> >> >>> Hi,
>> >> >> >>>
>> >> >> >>> On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers wrote:
>> >> >> >>> >
>> >> >> >>> > On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett wrote:
>> >> >> >>> >>
>> >> >> >>> >> Hi,
>> >> >> >>> >>
>> >> >> >>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers wrote:
>> >> >> >>> >> >
>> >> >> >>> >> > On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett wrote:
>> >> >> >>> >> >>
>> >> >> >>> >> >> Hi,
>> >> >> >>> >> >>
>> >> >> >>> >> >> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris wrote:
>> >> >> >>> >> >>
>> >> >> >>> >> >> No, that's not what Nathaniel and I are saying at all.
>> >> >> >>> >> >> Nathaniel was pointing to links for projects that care
>> >> >> >>> >> >> that everyone agrees before they go ahead.
>> >> >> >>> >> >
>> >> >> >>> >> > It looked to me like there was a serious intent to come to
>> >> >> >>> >> > an agreement, or at least closer together. The discussion
>> >> >> >>> >> > in the summer was going around in circles though, and was
>> >> >> >>> >> > too abstract and complex to follow. Therefore Mark's choice
>> >> >> >>> >> > of implementing something and then asking for feedback made
>> >> >> >>> >> > sense to me.
>> >> >> >>> >>
>> >> >> >>> >> I should point out that the implementation hasn't - as far as
>> >> >> >>> >> I can see - changed the discussion. The discussion was about
>> >> >> >>> >> the API.
>> >> >> >>> >>
>> >> >> >>> >> Implementations are useful for agreed APIs because they can
>> >> >> >>> >> point out where the API does not make sense or cannot be
>> >> >> >>> >> implemented. In this case, the API Mark said he was going to
>> >> >> >>> >> implement - he did implement - at least as far as I can see.
>> >> >> >>> >> Again, I'm happy to be corrected.
>> >> >> >>> >
>> >> >> >>> > Implementations can also help the discussion along, by
>> >> >> >>> > allowing people to try out some of the proposed changes. It
>> >> >> >>> > also allows us to construct examples that show weaknesses,
>> >> >> >>> > possibly to be solved by an alternative API. Maybe you can
>> >> >> >>> > hold the complete history of this topic in your head and
>> >> >> >>> > comprehend it, but for me it would be very helpful if someone
>> >> >> >>> > said:
>> >> >> >>> > - here's my dataset
>> >> >> >>> > - this is what I want to do with it
>> >> >> >>> > - this is the best I can do with the current implementation
>> >> >> >>> > - here's how API X would allow me to solve this better or
>> >> >> >>> >   simpler
>> >> >> >>> > This can be done much better with actual data and an actual
>> >> >> >>> > implementation than with a design proposal. You seem to
>> >> >> >>> > disagree with this statement. That's fine. I would hope though
>> >> >> >>> > that you recognize that concrete examples help people like me,
>> >> >> >>> > and construct one or two to help us out.
>> >> >> >>>
>> >> >> >>> That's what use-cases are for in designing APIs. There are
>> >> >> >>> examples of use in the NEP:
>> >> >> >>>
>> >> >> >>> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>> >> >> >>>
>> >> >> >>> the alterNEP:
>> >> >> >>>
>> >> >> >>> https://gist.github.com/1056379
>> >> >> >>>
>> >> >> >>> and my longer email to Travis:
>> >> >> >>>
>> >> >> >>> http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored
>> >> >> >>>
>> >> >> >>> Mark has done a nice job of documentation:
>> >> >> >>>
>> >> >> >>> http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
>> >> >> >>>
>> >> >> >>> If you want to understand what the alterNEP case is, I'd suggest
>> >> >> >>> the email, just because it's the most recent and I think the
>> >> >> >>> terminology is slightly clearer.
>> >> >> >>>
>> >> >> >>> Doing the same examples on a larger array won't make the point
>> >> >> >>> easier to understand. The discussion is about what the right
>> >> >> >>> concepts are, and you can help by looking at the snippets of
>> >> >> >>> code in those documents, and deciding for yourself whether you
>> >> >> >>> think the current masking / NA implementation seems natural and
>> >> >> >>> easy to explain, or rather forced and difficult to explain, and
>> >> >> >>> then email back trying to explain your impression (which is not
>> >> >> >>> always easy).
>> >> >> >>
>> >> >> >> If you seriously believe that looking at a few snippets is as
>> >> >> >> helpful and instructive as being able to play around with them in
>> >> >> >> IPython and modify them, then I guess we won't make progress in
>> >> >> >> this part of the discussion. You're just telling me to go back
>> >> >> >> and re-read things I'd already read.
>> >> >> >
>> >> >> > The snippets are in ipython or doctest format - aren't they?
>> >> >>
>> >> >> Oops - 10 minute rule. Now I see that you mean that you can't
>> >> >> experiment with the alternative implementation without working code.
>> >> >
>> >> > Indeed.
>> >> >
>> >> >> That's true, but I am hoping that the difference between - say:
>> >> >>
>> >> >> a[0:2] = np.NA
>> >> >>
>> >> >> and
>> >> >>
>> >> >> a.mask[0:2] = False
>> >> >>
>> >> >> would be easy enough to imagine.
>> >> >
>> >> > It is in this case. I agree the explicit ``a.mask`` is clearer. This
>> >> > is a quite specific point that could be improved in the current
>> >> > implementation.
>> >>
>> >> Thanks - this is helpful.
>> >>
>> >> > It doesn't require ripping everything out.
>> >>
>> >> Nathaniel wasn't proposing 'ripping everything out' - but backing off
>> >> until consensus has been reached. That's different. If you think we
>> >> should not do that, and you are interested, please say why. Second -
>> >> I was proposing that we do indeed keep the code in the codebase but
>> >> discuss adaptations that could achieve consensus.
>> >
>> > I'm much opposed to ripping the current code out.
>>
>> You are repeating the loaded phrase 'ripping the current code out' and
>> thus making the discussion less sensible and more hostile.
>>
>> > It isn't like it is (known to be) buggy, nor has anyone made the case
>> > that it isn't a basis on which to build other options. It also smacks
>> > of gratuitous violence committed by someone yet to make a positive
>> > contribution to the project.
>>
>> This is cheap, rude, and silly. All I can see from Nathaniel is a
>> reasonable, fair attempt to discuss the code. He proposed backing off
>> the code in good faith. You are emphatically, and, in my view
>> childishly, ignoring the substantial points he is making, and asserting
>> over and over that he deserves no hearing because he has not contributed
>> code.
>
> Sorry Matthew, but Nathaniel's interaction comes across to me as
> arrogant, and your constant use of terms like childish, destructive to
> the community, etc. come across as manipulative.

I don't know what 'manipulative' means here. Can you explain?

> I can live with the words, but you aren't doing much to get this
> developer on your side.

No, I am not trying to get you on my side, because I don't believe in
sides, and, unless you tell me otherwise, I think you believe in an
implicit model of decision making that is bad for numpy. I will
willingly and enthusiastically buy you a drink at scipy - but I believe
you are wrong in the way that you have approached this discussion, and I
believe that the model you are using, and its opposite, are of central
importance to the health of our shared discussions in the future.

Best,

Matthew
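To make the comparison above concrete, here is a minimal sketch of the two
spellings, with the masked-NA side taken from Mark's NEP and docs linked
above, and the explicit-mask side rendered with numpy.ma. This is
illustrative only - the alterNEP's proposed mask is not numpy.ma's - and
note that numpy.ma's convention is inverted relative to the example above:
True means masked *out*.

import numpy as np
import numpy.ma as ma

# NEP / maskna style: NA is assigned through the array itself,
# and the mask stays implicit.
a = np.array([1.0, 2.0, 3.0], maskna=True)
a[0:2] = np.NA
print np.isna(a)          # [ True  True False]

# Explicit-mask style, sketched with numpy.ma: the mask is a separate,
# directly addressable array. True here means masked *out*.
b = ma.masked_array([1.0, 2.0, 3.0], mask=[False, False, False])
b.mask[0:2] = True
print b                   # [-- -- 3.0]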
From matthew.brett at gmail.com Sat Oct 29 19:27:50 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 16:27:50 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: References: Message-ID: 
Hi,

On Sat, Oct 29, 2011 at 4:24 PM, Benjamin Root wrote:
[snip]
> /me blows whistle. Personal foul against defense! Personal foul against
> offense! Penalties offset! Repeat first down.

Is that right? I think I'm calling Charles on giving Nathaniel the
silent treatment. Am I wrong to do that?
Is that not true? See you, Matthew From hangenuit at gmail.com Sat Oct 29 19:28:54 2011 From: hangenuit at gmail.com (Han Genuit) Date: Sun, 30 Oct 2011 01:28:54 +0200 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) In-Reply-To: References: Message-ID: To be honest, you have been slandering a lot, also in previous discussions, to get what you wanted. This is not a healthy way of discussion, nor does it help in any way. There have been many people willing to listen and agree with you on points; and this is exactly what discussion is all about, but where they might agree on some, they might disagree on others. When you start pulling the - people who won't listen to me are evil - card, it might have some effect the first time, but the second and third time they see what's coming.. o/ From matthew.brett at gmail.com Sat Oct 29 19:30:46 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 29 Oct 2011 16:30:46 -0700 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) In-Reply-To: References: Message-ID: Hi, On Sat, Oct 29, 2011 at 4:28 PM, Han Genuit wrote: > To be honest, you have been slandering a lot, also in previous > discussions, to get what you wanted. This is not a healthy way of > discussion, nor does it help in any way. That's a severe accusation. Please quote something I said that was false, or unfair. See you, Matthew From matthew.brett at gmail.com Sat Oct 29 20:20:31 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 29 Oct 2011 17:20:31 -0700 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) In-Reply-To: References: Message-ID: Hi, On Sat, Oct 29, 2011 at 11:14 AM, Wes McKinney wrote: > On Fri, Oct 28, 2011 at 9:32 PM, Charles R Harris > wrote: >> >> >> On Fri, Oct 28, 2011 at 6:45 PM, Wes McKinney wrote: >>> >>> On Fri, Oct 28, 2011 at 7:53 PM, Benjamin Root wrote: >>> > >>> > >>> > On Friday, October 28, 2011, Matthew Brett >>> > wrote: >>> >> Hi, >>> >> >>> >> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers >>> >> wrote: >>> >>> >>> >>> >>> >>> On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett >>> >>> >>> >>> wrote: >>> >>>> >>> >>>> Hi, >>> >>>> >>> >>>> On Fri, Oct 28, 2011 at 3:14 PM, Charles R Harris >>> >>>> wrote: >>> >>>> > >>> >>>> > >>> >>>> > On Fri, Oct 28, 2011 at 3:56 PM, Matthew Brett >>> >>>> > >>> >>>> > wrote: >>> >>>> >> >>> >>>> >> Hi, >>> >>>> >> >>> >>>> >> On Fri, Oct 28, 2011 at 2:43 PM, Matthew Brett >>> >>>> >> >>> >>>> >> wrote: >>> >>>> >> > Hi, >>> >>>> >> > >>> >>>> >> > On Fri, Oct 28, 2011 at 2:41 PM, Charles R Harris >>> >>>> >> > wrote: >>> >>>> >> >> >>> >>>> >> >> >>> >>>> >> >> On Fri, Oct 28, 2011 at 3:16 PM, Nathaniel Smith >>> >>>> >> >> >>> >>>> >> >> wrote: >>> >>>> >> >>> >>> >>>> >> >>> On Tue, Oct 25, 2011 at 2:56 PM, Travis Oliphant >>> >>>> >> >>> >>> >>>> >> >>> wrote: >>> >>>> >> >>> > I think Nathaniel and Matthew provided very >>> >>>> >> >>> > specific feedback that was helpful in understanding other >>> >>>> >> >>> > perspectives >>> >>>> >> >>> > of a >>> >>>> >> >>> > difficult problem. ? ? In particular, I really wanted >>> >>>> >> >>> > bit-patterns >>> >>>> >> >>> > implemented. ? ?However, I also understand that Mark did >>> >>>> >> >>> > quite >>> >>>> >> >>> > a >>> >>>> >> >>> > bit >>> >>>> >> >>> > of >>> >>>> >> >>> > work >>> >>>> >> >>> > and altered his original designs quite a bit in response to >>> >>>> >> >>> > community >>> >>>> >> >>> > feedback. ? 
I wasn't a major part of the pull request >>> >>>> >> >>> > discussion, >>> >>>> >> >>> > nor >>> >>>> >> >>> > did I >>> >>>> >> >>> > merge the changes, but I support Charles if he reviewed the >>> >>>> >> >>> > code >>> >>>> >> >>> > and >>> >>>> >> >>> > felt >>> >>>> >> >>> > like it was the right thing to do. ?I likely would have done >>> >>>> >> >>> > the >>> >>>> >> >>> > same >>> >>>> >> >>> > thing >>> >>>> >> >>> > rather than let Mark Wiebe's work languish. >>> >>>> >> >>> >>> >>>> >> >>> My connectivity is spotty this week, so I'll stay out of the >>> >>>> >> >>> technical >>> >>>> >> >>> discussion for now, but I want to share a story. >>> >>>> >> >>> >>> >>>> >> >>> Maybe a year ago now, Jonathan Taylor and I were debating what >>> >>>> >> >>> the >>> >>>> >> >>> best API for describing statistical models would be -- whether >>> >>>> >> >>> we >>> >>>> >> >>> wanted something like R's "formulas" (which I supported), or >>> >>>> >> >>> another >>> >>>> >> >>> approach based on sympy (his idea). To summarize, I thought >>> >>>> >> >>> his >>> >>>> >> >>> API >>> >>>> >> >>> was confusing, pointlessly complicated, and didn't actually >>> >>>> >> >>> solve >>> >>>> >> >>> the >>> >>>> >> >>> problem; he thought R-style formulas were superficially >>> >>>> >> >>> simpler >>> >>>> >> >>> but >>> >>>> >> >>> hopelessly confused and inconsistent underneath. Now, >>> >>>> >> >>> obviously, >>> >>>> >> >>> I >>> >>>> >> >>> was >>> >>>> >> >>> right and he was wrong. Well, obvious to me, anyway... ;-) But >>> >>>> >> >>> it >>> >>>> >> >>> wasn't like I could just wave a wand and make his arguments go >>> >>>> >> >>> away, >>> >>>> >> >>> no I should point out that the implementation hasn't - as far >>> >>>> >> >>> as >>> >>>> >> >>> I can >>> >> see - changed the discussion. ?The discussion was about the API. >>> >> Implementations are useful for agreed APIs because they can point out >>> >> where the API does not make sense or cannot be implemented. ?In this >>> >> case, the API Mark said he was going to implement - he did implement - >>> >> at least as far as I can see. ?Again, I'm happy to be corrected. >>> >> >>> >>>> In saying that we are insisting on our way, you are saying, >>> >>>> implicitly, >>> >>>> 'I >>> >>>> am not going to negotiate'. >>> >>> >>> >>> That is only your interpretation. The observation that Mark >>> >>> compromised >>> >>> quite a bit while you didn't seems largely correct to me. >>> >> >>> >> The problem here stems from our inability to work towards agreement, >>> >> rather than standing on set positions. ?I set out what changes I think >>> >> would make the current implementation OK. ?Can we please, please have >>> >> a discussion about those points instead of trying to argue about who >>> >> has given more ground. >>> >> >>> >>> That commitment would of course be good. However, even if that were >>> >>> possible >>> >>> before writing code and everyone agreed that the ideas of you and >>> >>> Nathaniel >>> >>> should be implemented in full, it's still not clear that either of you >>> >>> would >>> >>> be willing to write any code. Agreement without code still doesn't >>> >>> help >>> >>> us >>> >>> very much. >>> >> >>> >> I'm going to return to Nathaniel's point - it is a highly valuable >>> >> thing to set ourselves the target of resolving substantial discussions >>> >> by consensus. ? The route you are endorsing here is 'implementor >>> >> wins'. ? We don't need to do it that way. 
?We're a mature sensible >>> >> bunch of adults who can talk out the issues until we agree they are >>> >> ready for implementation, and then implement. ?That's all Nathaniel is >>> >> saying. ?I think he's obviously right, and I'm sad that it isn't as >>> >> clear to y'all as it is to me. >>> >> >>> >> Best, >>> >> >>> >> Matthew >>> >> >>> > >>> > Everyone, can we please not do this?! I had enough of adults doing >>> > finger >>> > pointing back over the summer during the whole debt ceiling debate. ?I >>> > think >>> > we can all agree that we are better than the US congress? >>> > >>> > Forget about rudeness or decision processes. >>> > >>> > I will start by saying that I am willing to separate ignore and absent, >>> > but >>> > only on the write side of things. ?On read, I want a single way to >>> > identify >>> > the missing values. ?I also want only a single way to perform >>> > calculations >>> > (either skip or propagate). >>> > >>> > An indicator of success would be that people stop using NaNs and magic >>> > numbers (-9999, anyone?) and we could even deprecate nansum(), or at >>> > least >>> > strongly suggest in its docs to use NA. >>> >>> Well, I haven't completely made up my mind yet, will have to do some >>> more prototyping and playing (and potentially have some of my users >>> eat the differently-flavored dogfood), but I'm really not very >>> satisfied with the API at the moment. I'm mainly worried about the >>> abstraction leaking through to pandas users (this is a pretty large >>> group of people judging by # of downloads). >>> >>> The basic position I'm in is that I'm trying to push Python into a new >>> space, namely mainstream data analysis and statistical computing, one >>> that is solidly occupied by R and other such well-known players. My >>> target users are not computer scientists. They are not going to invest >>> in understanding dtypes very deeply or the internals of ndarray. In >>> fact I've spent a great deal of effort making it so that pandas users >>> can be productive and successful while having very little >>> understanding of NumPy. Yes, I essentially "protect" my users from >>> NumPy because using it well requires a certain level of sophistication >>> that I think is unfair to demand of people. This might seem totally >>> bizarre to some of you but it is simply the state of affairs. So far I >>> have been successful because more people are using Python and pandas >>> to do things that they used to do in R. The NA concept in R is dead >>> simple and I don't see why we are incapable of also implementing >>> something that is just as dead simple. To we, the scipy elite let's >>> call us, it seems simple: "oh, just pass an extra flag to all my array >>> constructors!" But this along with the masked array concept is going >>> to have two likely outcomes: >>> >>> 1) Create a great deal more complication in my already very large codebase >>> >>> and/or >>> >>> 2) force pandas users to understand the new masked arrays after I've >>> carefully made it so they can be largely ignorant of NumPy >>> >>> The mostly-NaN-based solution I've cobbled together and tweaked over >>> the last 42 months actually *works really well*, amazingly, with >>> relatively little cost in code complexity. Having found a reasonably >>> stable equilibrium I'm extremely resistant to upset the balance. >>> >>> So I don't know. After watching these threads bounce back and forth >>> I'm frankly not all that hopeful about a solution arising that >>> actually addresses my needs. 
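A concrete rendering of the contrast Wes describes, sketched from the
maskna docs referenced earlier in the thread - the NaN idiom on top is
roughly what pandas relies on today, the skipna keyword below is from
Mark's implementation, and the details should be treated as illustrative:

import numpy as np

# The NaN convention: NaN marks missing, and propagate-vs-skip is chosen
# by picking a different function.
x = np.array([1.0, np.nan, 3.0])
print np.sum(x)      # nan - NaN propagates
print np.nansum(x)   # 4.0 - NaN skipped; the function Ben suggests deprecating

# With NA support in the core, the same choice becomes a keyword:
y = np.array([1.0, 2.0, 3.0], maskna=True)
y[1] = np.NA
print np.sum(y)                # NA - propagates by default
print np.sum(y, skipna=True)   # 4.0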
>> >> But Wes, what *are* your needs? You keep saying this, but we need examples >> of how you want to operate and how numpy fails. As to dtypes, internals, and >> all that, I don't see any of that in the current implementation, unless you >> mean the maskna and skipna keywords. I believe someone on the previous >> thread mentioned a way to deal with that. >> >> Chuck >> > > Here are my needs: > > 1) How NAs are implemented cannot be end user visible. Having to pass > maskna=True is a problem. I suppose a solution is to set the flag to > true on every array inside of pandas so the user never knows (you > mentioned someone else had some other solution, i could go back and > dig it up?) I guess this would be the same with bitpatterns, in that the user would have to specify a custom dtype. Is it possible to add a bitpattern NA (in the NaN values) to the current floating point types, at least in principle? So that np.float etc would have bitpattern NAs without a custom dtype? See you, Matthew From efiring at hawaii.edu Sat Oct 29 20:24:06 2011 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 29 Oct 2011 14:24:06 -1000 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) In-Reply-To: References: <4EAC8282.9070708@hawaii.edu> Message-ID: <4EAC9926.7020704@hawaii.edu> On 10/29/2011 12:57 PM, Charles R Harris wrote: > > > On Sat, Oct 29, 2011 at 4:47 PM, Eric Firing > wrote: > > On 10/29/2011 12:02 PM, Olivier Delalleau wrote: > > > > > I haven't been following the discussion closely, but wouldn't it > be instead: > > a.mask[0:2] = True? > > That would be consistent with numpy.ma and the > opposite of Mark's > implementation. > > I can live with either, but I much prefer the numpy.ma > version because > it fits with the use of bit-flags for editing data; set bit 1 if it > fails check A, set bit 2 if it fails check B, etc. So, if it evaluates > as True, there is a problem, and the value is masked *out*. > > Similarly, in Marks implementation, 7 bits are available for a payload > to describe what kind of masking is meant. This seems more consistent > with True as masked (or NA) than with False as masked. > > > I wouldn't rely on the 7 bits yet. Mark left them available to keep open > possible future use, but didn't implement anything using them yet. If > memory use turns out to exclude whole sectors of application we will > have to go to bit masks. Right; I was only commenting on a subjective sense of internal consistency. A minor point. The larger context of all this is how users end up being able to work with all the different types and specifications of "NA" (in the most general sense) data: 1) nans 2) numpy.ma 3) masks in the core (Mark's new code) 4) bit patterns Substantial code now in place--including matplotlib--relies on numpy.ma. It has some rough edges, it can be slow, it is a pain having it as a bolted-on module, it may be more complicated than it needs to be, but it fits a lot of use cases pretty well. There are many users. Everyone using matplotlib is using it, whether they know it or not. The ideal from my numpy.ma-user's standpoint would an NA-handling implementation in the core that would do two things: (1) allow a gradual transition away from numpy.ma, so that the latter would become redundant. (2) allow numpy.ma to be reasonably easily modified to use the in-core facilities for greater efficiency during the long transition. 
Implicit is the hope that someone (most likely not me, although I might be able to help a bit) would actually perform this modification. Mark's mission, paid for by Enthought, was not to please numpy.ma users, but to add NA-handling that would be comfortable for R-users. He chose to do so with the idea that two possible implementations (masks and bitpatterns) were desirable, each with strengths and weaknesses, and that so as to get *something* done in the very short time he had left, he would start with the mask implementation. We now have the result, incomplete, but not breaking anything. Additional development (coding as well as designing) will be needed. The main question raised by Matthew and Nathaniel is, I think, whether Mark's code should develop in a direction away from the R-compatibility model, with the idea that the latter would be handled via a bit-pattern implementation, some day, when someone codes it; or whether it should remain as the prototype and first implementation of an API to handle the R-compatible use case, minimizing any divergence from any eventual bit-pattern implementation. The answer to this depends on several questions, including: 1) Who is available to do how much implementation of any of the possibilities? My reading of Travis's blog and rare posts to this list suggest that he hopes and expects to be able to free up coding time. Perhaps he will clarify that soon. 2) What sorts of changes would actually be needed to make the present implementation good enough for the R use case? Evolutionary, or revolutionary? 3) What sorts of changes would help with the numpy.ma use case? Evolutionary, or revolutionary. 4) Given available resources, how can we maximize progress: making numpy more capable, easier to use, etc. Unless the answers to questions 2 *and* 3 are "revolutionary", I don't see the point in pulling Mark's changes out of master. At most, the documentation might be changed to mark the NA API as "experimental" for a release or two. Overall, I think that the differences between the R use case and the ma use case have been overstated and over-emphasized. Eric > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sat Oct 29 21:47:12 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 29 Oct 2011 18:47:12 -0700 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) 
In-Reply-To: References: Message-ID: 
Hi,

On Sat, Oct 29, 2011 at 4:11 PM, Matthew Brett wrote:
[snip]
> This is cheap, rude, and silly. All I can see from Nathaniel is a
> reasonable, fair attempt to discuss the code. He proposed backing off
> the code in good faith. You are emphatically, and, in my view
> childishly, ignoring the substantial points he is making, and asserting
> over and over that he deserves no hearing because he has not contributed
> code. This is a terribly destructive way to work. If I was a new
> developer reading this, I would conclude that I had better be damn
> careful which side I'm on, before I express my opinion, otherwise I'm
> going to be made to feel like I don't exist by the other people on the
> project. That is miserable, it is silly, and it's the wrong way to do
> business.

I conclude that it's bad to drink this much coffee in an afternoon, and
that the next time I visit my friend's house, I'll take some decaf.

Sorry Chuck - you're right - this was too personal. I do disagree with
you, but I was rude here and I am sorry. I owe you an expensive drink,
as per Ben's excellent suggestion.

See you,

Matthew
From charlesr.harris at gmail.com Sat Oct 29 22:48:15 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 20:48:15 -0600
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: References: Message-ID: 
On Sat, Oct 29, 2011 at 7:47 PM, Matthew Brett wrote:
[snip]
> I conclude that it's bad to drink this much coffee in an afternoon, and
> that the next time I visit my friend's house, I'll take some decaf.
>
> Sorry Chuck - you're right - this was too personal. I do disagree with
> you, but I was rude here and I am sorry. I owe you an expensive drink,
> as per Ben's excellent suggestion.

Apology accepted. Let me add an argument for not pulling out the current
implementation, which is the underlying reason for the release early,
release often open-software mantra: if the NA work is off in a branch,
no one will use it and we will lack useful feedback. Now, I don't have a
problem with adding a comment to the release notes stating that the API
is not completely settled and can change due to user feedback. But we do
need users, and they need to work with it for at least a few weeks. My
own initial reaction to new software often evolves as: "WTF", followed
by hours -- days -- weeks -- while I wander around muttering "morons,
idiots" to myself. That is not the best period of time for me to make a
balanced assessment; that needs to wait until I settle down. Then I
adapt, and usually things no longer look so bad; maybe they even look
good, maybe even great. So it goes.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From matthew.brett at gmail.com Sat Oct 29 22:49:39 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 19:49:39 -0700
Subject: [Numpy-discussion] Large numbers into float128
In-Reply-To: References: Message-ID: 
Hi,

On Sat, Oct 29, 2011 at 3:55 PM, Matthew Brett wrote:
> Hi,
>
> Can anyone think of a good way to set a float128 value to an
> arbitrarily large number?
>
> As in
>
> v = int_to_float128(some_value)
>
> ?
> I'm trying things like
>
> v = np.float128(2**64+2)
>
> but, because (in other threads) the float128 seems to be going through
> float64 on assignment, this loses precision, so although 2**64+2 is
> representable in float128, in fact I get:
>
> In [35]: np.float128(2**64+2)
> Out[35]: 18446744073709551616.0
>
> In [36]: 2**64+2
> Out[36]: 18446744073709551618L
>
> So - can anyone think of another way to assign values to float128 that
> will keep the precision?

To answer my own question - I found an unpleasant way of doing this.
Basically it is this:

import numpy as np

def int_to_float128(val):
    # Take the float64 approximation first, then add back the residual
    # that float64 dropped; both parts fit exactly in float128.
    f64 = np.float64(val)
    res = val - int(f64)
    return np.float128(f64) + np.float128(res)

Used in various places here:

https://github.com/matthew-brett/nibabel/blob/e18e94c5b0f54775c46b1c690491b8bd6f07eb49/nibabel/floating.py

Best,

Matthew
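A quick check of the workaround on the value from the original question -
illustrative only, and platform-dependent, since float128 is only as wide
as the platform's long double:

import numpy as np

def int_to_float128(val):
    # As defined in the post above.
    f64 = np.float64(val)
    res = val - int(f64)
    return np.float128(f64) + np.float128(res)

print int(np.float128(2**64 + 2))      # 18446744073709551616 - residual lost
print int(int_to_float128(2**64 + 2))  # 18446744073709551618 - residual kept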
From matthew.brett at gmail.com Sat Oct 29 22:52:39 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 19:52:39 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: References: Message-ID: 
Hi,

On Sat, Oct 29, 2011 at 7:48 PM, Charles R Harris wrote:
[snip]
> Apology accepted.

Thank you, that is gracious of you.

> Let me add an argument for not pulling out the current implementation,
> which is the underlying reason for the release early, release often
> open-software mantra: if the NA work is off in a branch, no one will
> use it and we will lack useful feedback.
[snip]

Yes, that's very reasonable. It may be that we don't have good hope of
resolving the current discussion in the near future, in which case it
would not make much sense to pull it out pending agreement.

Best (honestly),

Matthew

From jason-sage at creativetrax.com Sat Oct 29 22:58:13 2011
From: jason-sage at creativetrax.com (Jason Grout)
Date: Sat, 29 Oct 2011 21:58:13 -0500
Subject: [Numpy-discussion] consensus
In-Reply-To: References: Message-ID: <4EACBD45.10200@creativetrax.com>
On 10/29/11 5:02 PM, Olivier Delalleau wrote:
> I haven't been following the discussion closely, but wouldn't it be
> instead:
> a.mask[0:2] = True?
>
> It's something that I actually find a bit difficult to get right in the
> current numpy.ma implementation: I would find it more intuitive to have
> True for "valid" data, and False for invalid / missing / ... I realize
> how the implementation makes sense (and is appropriate given that the
> name is "mask"), but I just thought I'd point this out... even if it's
> just me ;)

Just a thought: what if this also worked:

a.mask[0:2] = np.NA

as a synonym for a.mask[0:2] = True?

Would that be less confusing, and/or would it be less powerful or
extensible in important ways?

Thanks,

Jason Grout
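Worth noting: today's numpy.ma already has a spelling close in spirit to
Jason's suggestion - assigning the ma.masked constant - and it uses the
True-means-masked-*out* convention that Olivier finds inverted. A sketch
with the released numpy.ma API:

import numpy as np
import numpy.ma as ma

a = ma.masked_array([1.0, 2.0, 3.0])
a[0:2] = ma.masked     # numpy.ma's spelling of "mark these as missing"
print a.mask           # [ True  True False] - True means masked *out*
print a.sum()          # 3.0 - masked values are skipped in reductions

Something like a.mask[0:2] = np.NA would presumably sit on top of this as
sugar; nothing like it exists today.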
From charlesr.harris at gmail.com Sat Oct 29 23:02:37 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Oct 2011 21:02:37 -0600
Subject: [Numpy-discussion] Large numbers into float128
In-Reply-To: References: Message-ID: 
On Sat, Oct 29, 2011 at 8:49 PM, Matthew Brett wrote:
[snip]
> To answer my own question - I found an unpleasant way of doing this.
> Basically it is this:
>
> def int_to_float128(val):
>     f64 = np.float64(val)
>     res = val - int(f64)
>     return np.float128(f64) + np.float128(res)
>
> Used in various places here:
>
> https://github.com/matthew-brett/nibabel/blob/e18e94c5b0f54775c46b1c690491b8bd6f07eb49/nibabel/floating.py
>
> Best,

It might be useful to look into mpmath. I didn't see any way to export
mp values into long double, but they do offer a number of resources for
working with arbitrary precision. We could maybe even borrow some of
their stuff for parsing values from strings.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From ben.root at ou.edu Sat Oct 29 23:16:31 2011
From: ben.root at ou.edu (Benjamin Root)
Date: Sat, 29 Oct 2011 22:16:31 -0500
Subject: [Numpy-discussion] consensus
In-Reply-To: <4EACBD45.10200@creativetrax.com> References: <4EACBD45.10200@creativetrax.com> Message-ID: 
On Saturday, October 29, 2011, Jason Grout wrote:
[snip]
> Just a thought: what if this also worked:
>
> a.mask[0:2] = np.NA
>
> as a synonym for a.mask[0:2] = True?
>
> Would that be less confusing, and/or would it be less powerful or
> extensible in important ways?

Don't know. It is a different way of looking at it. I am also still
wary of adding attributes to the array.

Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From oliphant at enthought.com Sun Oct 30 01:02:14 2011
From: oliphant at enthought.com (Travis Oliphant)
Date: Sun, 30 Oct 2011 00:02:14 -0500
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: References: Message-ID: <0D453B6A-ED52-4BF5-B482-8C53DC02221C@enthought.com>
>> Here are my needs:
>>
>> 1) How NAs are implemented cannot be end user visible. Having to pass
>> maskna=True is a problem. I suppose a solution is to set the flag to
>> true on every array inside of pandas so the user never knows (you
>> mentioned someone else had some other solution, i could go back and
>> dig it up?)
>
> I guess this would be the same with bitpatterns, in that the user
> would have to specify a custom dtype.
>
> Is it possible to add a bitpattern NA (in the NaN values) to the
> current floating point types, at least in principle? So that np.float
> etc would have bitpattern NAs without a custom dtype?

That is an interesting idea. It's essentially what people like Wes
McKinney are doing now. However, the issue is going to be whether or not
you do something special with the NA values in the low-level C function
the dtype dispatches to. This is the reason for the special bit-pattern
dtype. I've always thought that requiring NA checks for code that
doesn't want to worry about it would slow things down unnecessarily for
those use-cases. But not dealing with missing data well is a missing
NumPy feature.
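In principle, the NaN trick referred to here reserves one of the many
quiet-NaN bit patterns to mean NA and tests for it explicitly. A toy
sketch - purely illustrative, nothing like this exists in numpy, the
payload value is an arbitrary choice, and a real bit-pattern dtype would
do this check inside the dtype's C inner loops, which is exactly the cost
being weighed above:

import numpy as np

# An arbitrary quiet-NaN payload, reserved here to stand for NA.
NA_PATTERN = np.uint64(0x7FF80000000000A1)
NA64 = np.array([NA_PATTERN]).view(np.float64)[0]

def isna(arr):
    # NA only if the exact bit pattern matches; ordinary NaNs don't count.
    return arr.view(np.uint64) == NA_PATTERN

x = np.array([1.0, np.nan, 0.0])
x[2] = NA64
print isna(x)       # [False False  True]
print np.isnan(x)   # [False  True  True] - to the hardware, NA is just a NaN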
-Travis > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From oliphant at enthought.com Sun Oct 30 02:19:52 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Sun, 30 Oct 2011 01:19:52 -0500 Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?) In-Reply-To: <4EAC9926.7020704@hawaii.edu> References: <4EAC8282.9070708@hawaii.edu> <4EAC9926.7020704@hawaii.edu> Message-ID: On Oct 29, 2011, at 7:24 PM, Eric Firing wrote: > On 10/29/2011 12:57 PM, Charles R Harris wrote: >> >> >> On Sat, Oct 29, 2011 at 4:47 PM, Eric Firing > > wrote: >> >> On 10/29/2011 12:02 PM, Olivier Delalleau wrote: >> >>> >>> I haven't been following the discussion closely, but wouldn't it >> be instead: >>> a.mask[0:2] = True? >> >> That would be consistent with numpy.ma and the >> opposite of Mark's >> implementation. >> >> I can live with either, but I much prefer the numpy.ma >> version because >> it fits with the use of bit-flags for editing data; set bit 1 if it >> fails check A, set bit 2 if it fails check B, etc. So, if it evaluates >> as True, there is a problem, and the value is masked *out*. >> >> Similarly, in Marks implementation, 7 bits are available for a payload >> to describe what kind of masking is meant. This seems more consistent >> with True as masked (or NA) than with False as masked. >> >> >> I wouldn't rely on the 7 bits yet. Mark left them available to keep open >> possible future use, but didn't implement anything using them yet. If >> memory use turns out to exclude whole sectors of application we will >> have to go to bit masks. > > Right; I was only commenting on a subjective sense of internal > consistency. A minor point. > > The larger context of all this is how users end up being able to work > with all the different types and specifications of "NA" (in the most > general sense) data: > > 1) nans > 2) numpy.ma > 3) masks in the core (Mark's new code) > 4) bit patterns > > Substantial code now in place--including matplotlib--relies on numpy.ma. > It has some rough edges, it can be slow, it is a pain having it as a > bolted-on module, it may be more complicated than it needs to be, but it > fits a lot of use cases pretty well. There are many users. Everyone > using matplotlib is using it, whether they know it or not. > > The ideal from my numpy.ma-user's standpoint would an NA-handling > implementation in the core that would do two things: > (1) allow a gradual transition away from numpy.ma, so that the latter > would become redundant. > (2) allow numpy.ma to be reasonably easily modified to use the in-core > facilities for greater efficiency during the long transition. Implicit > is the hope that someone (most likely not me, although I might be able > to help a bit) would actually perform this modification. > > Mark's mission, paid for by Enthought, was not to please numpy.ma users, > but to add NA-handling that would be comfortable for R-users. He chose > to do so with the idea that two possible implementations (masks and > bitpatterns) were desirable, each with strengths and weaknesses, and > that so as to get *something* done in the very short time he had left, > he would start with the mask implementation. We now have the result, > incomplete, but not breaking anything. 
Additional development (coding as well as designing) will be needed.
>
> The main question raised by Matthew and Nathaniel is, I think, whether
> Mark's code should develop in a direction away from the R-compatibility
> model, with the idea that the latter would be handled via a bit-pattern
> implementation, some day, when someone codes it; or whether it should
> remain as the prototype and first implementation of an API to handle the
> R-compatible use case, minimizing any divergence from any eventual
> bit-pattern implementation.
>
> The answer to this depends on several questions, including:
>
> 1) Who is available to do how much implementation of any of the
> possibilities? My reading of Travis's blog and rare posts to this list
> suggests that he hopes and expects to be able to free up coding time.
> Perhaps he will clarify that soon.
>
> 2) What sorts of changes would actually be needed to make the present
> implementation good enough for the R use case? Evolutionary, or
> revolutionary?
>
> 3) What sorts of changes would help with the numpy.ma use case?
> Evolutionary, or revolutionary?
>
> 4) Given available resources, how can we maximize progress: making numpy
> more capable, easier to use, etc.?
>
> Unless the answers to questions 2 *and* 3 are "revolutionary", I don't
> see the point in pulling Mark's changes out of master. At most, the
> documentation might be changed to mark the NA API as "experimental" for
> a release or two.

I appreciate Nathaniel's idea to pull the changes and I can respect his
desire to do that. It seemed like there was a lot more heat than light in
the discussion this summer. The differences seemed to be inflamed by the
discussion instead of illuminated by it. Perhaps that is why Nathaniel
felt like merging Mark's pull request was too strong-armed and not a
proper resolution.

However, I did not interpret Matthew or Nathaniel's explanations of their
position as manipulative or inappropriate. Nonetheless, I don't think
removing Mark's changes is a productive direction to take at this point.
I agree, it would have been much better to reach a rough consensus before
the code was committed. At least, those who felt like their ideas were
not accounted for should have felt like there was some plan to either
accommodate them, or some explanation of why that was not a good idea.
The only thing I recall being said was that there was nobody to implement
their ideas. I wish that weren't the case. I think we can still continue
to discuss their concerns and look for ways to reasonably incorporate
their use-cases if possible.

I have probably contributed in the past to the idea that "he who writes
the code gets the final say". In early-stage efforts, this is
approximately right, but the success of anything relies on satisfied
users, and as projects mature the voice of users becomes more relevant
than the voice of contributors, in my mind. I've certainly had to learn
that in terms of ABI changes to NumPy.

Personally, I am very, very interested in users of NumPy and their ideas
about how things should be done. I have my own use cases from my
experience, but I've always found that the code is better if it
incorporates the use-cases of others. In the end, I'm much more
interested in users of NumPy and their use-cases and experience than even
contributors. Historically, contributors to NumPy have been scarce and
development slow. I am working to change that right now. I will say more
when I have more to say in that direction.
To be clear, in this particular case I know that there are multiple
users, and as best I can tell there is some disagreement between those
users about the appropriate APIs. But this disagreement is actually lost
in some of the discussion. In fact, it seems to me that the different
perspectives are not all that different, and there ought to be a way to
work it out. Perhaps this is hopeless naivete, but it's my current
perspective.

I really appreciate the efforts of people who have been active on NumPy
development and maintenance for the past 4 years. I also appreciate the
activity of all the users of NumPy: matplotlib, Pandas, scikits, SciPy,
statsmodels, and so on. The larger NumPy community is much broader than
the discussions that take place on this list (or even on the SciPy list).
I have seen NumPy in use in a lot of places over the past 4 years. I have
also seen NumPy *not* in use where it really could be (with some
adaptations).

I'm still hopeful that we will continue to make this forum a place where
even "just users" of NumPy always feel able to raise their voice and say,
"Hey, I wish things were done this way." It is rare when all voices can
be satisfied, of course, but a priori it is worth a college try. If
anything I hope for emerges, the user-base of NumPy will be growing
significantly over the coming months and years, and I really hope this
list continues to be a place where I can be comfortable sending them.

More to come,

-Travis

From matthew.brett at gmail.com Sun Oct 30 02:42:23 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 29 Oct 2011 23:42:23 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To: <0D453B6A-ED52-4BF5-B482-8C53DC02221C@enthought.com>
References: <0D453B6A-ED52-4BF5-B482-8C53DC02221C@enthought.com>
Message-ID:

Hi,

On Sat, Oct 29, 2011 at 10:02 PM, Travis Oliphant wrote:
>>>
>>> Here are my needs:
>>>
>>> 1) How NAs are implemented cannot be end user visible. Having to pass
>>> maskna=True is a problem. I suppose a solution is to set the flag to
>>> true on every array inside of pandas so the user never knows (you
>>> mentioned someone else had some other solution, i could go back and
>>> dig it up?)
>>
>> I guess this would be the same with bitpatterns, in that the user
>> would have to specify a custom dtype.
>>
>> Is it possible to add a bitpattern NA (in the NaN values) to the
>> current floating point types, at least in principle? So that np.float
>> etc would have bitpattern NAs without a custom dtype?
>
> That is an interesting idea. It's essentially what people like Wes
> McKinney are doing now. However, the issue is going to be whether or not
> you do something special with the NA values in the low-level C function
> the dtype dispatches to. This is the reason for the special bit-pattern
> dtype.
>
> I've always thought that requiring NA checks for code that doesn't want
> to worry about it would slow things down unnecessarily for those
> use-cases.
Right - now that the caffeine has run through my system adequately, and
I've had a few glasses of wine to disrupt my logic and/or social skills:

Is there any way you could imagine something like this?:

In [3]: a = np.arange(10, dtype=np.float)

In [4]: a.flags
Out[4]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
  MAYBE_NA : False

In [5]: a[0] = np.NA

In [6]: a.flags
Out[6]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
  MAYBE_NA : True

Obviously extension writers would have to keep the flag maintained...

Sorry if that doesn't make sense; I do not claim to be in full possession
of my faculties.

See you,

Matthew

From matthew.brett at gmail.com Sun Oct 30 03:00:43 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 30 Oct 2011 00:00:43 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References: <4EAC8282.9070708@hawaii.edu> <4EAC9926.7020704@hawaii.edu>
Message-ID:

Hi,

On Sat, Oct 29, 2011 at 11:19 PM, Travis Oliphant wrote:

Thanks again for your email; I'm sure I'm not the only one who breathes a
deep sigh of relief when I see your posts.

> I have probably contributed in the past to the idea that "he who writes
> the code gets the final say". In early-stage efforts, this is
> approximately right, but the success of anything relies on satisfied
> users, and as projects mature the voice of users becomes more relevant
> than the voice of contributors, in my mind. I've certainly had to learn
> that in terms of ABI changes to NumPy.

I think that's right though - that the person who wrote the code has the
final say. But that's the final say. The question I wanted to ask was the
one Nathaniel brought up at the beginning of the thread, which is: before
the final say, how hard do we try for consensus? Is that the numpy way?

Here Chuck was saying 'I listen to you in proportion to your code
contribution' (I hope I'm not misrepresenting him). I think that's a
different way of working than the consensus building that Karl Fogel
describes. But maybe that is just the numpy way. I would feel happier
knowing what that way is. Then, when we get into this kind of dispute,
Chuck can say 'Matthew, change the numpy constitution or accept the
situation, because that's how we've agreed to work'.
And I'll say - 'OK - I don't like it, but I agree those are the rules'.
And we'll get on with it. But at the moment it feels as if it isn't
clear, and, as Ben pointed out, that means we are having a discussion and
a discussion about the discussion at the same time.

See you,

Matthew

From nadavh at visionsense.com Sun Oct 30 04:32:21 2011
From: nadavh at visionsense.com (Nadav Horesh)
Date: Sun, 30 Oct 2011 01:32:21 -0700
Subject: [Numpy-discussion] Large numbers into float128
In-Reply-To:
References:
Message-ID: <26FC23E7C398A64083C980D16001012D261BA8F055@VA3DIAXVS361.RED001.local>

A quick and dirty cython code is attached. Use:

>> import Float128
>> a = Float128.Float128('1E500')
array([ 1e+500], dtype=float128)

or

>> b = np.float128(1.34) * np.float128(10)**2500
>> b
1.3400000000000000779e+2500

Maybe there is also a way to do it in pure Python code via ctypes?
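Something like the following might work (an untested sketch: it assumes
Linux/glibc, where strtold parses a string straight into a C long double,
and the subclass of c_longdouble is just a trick to stop ctypes from
degrading the return value to a 64-bit Python float -- the helper name is
made up):

import ctypes
import numpy as np

class c_ld(ctypes.c_longdouble):
    # Subclasses of fundamental ctypes types are returned as-is,
    # not auto-converted to a Python float (which would lose precision).
    pass

libc = ctypes.CDLL('libc.so.6')   # Linux/glibc assumed
libc.strtold.restype = c_ld
libc.strtold.argtypes = [ctypes.c_char_p, ctypes.c_void_p]

def int_to_float128(val):
    ld = libc.strtold(str(val).encode(), None)
    # Reinterpret the raw long double bytes as a numpy longdouble.
    return np.frombuffer(memoryview(ld), dtype=np.longdouble)[0]

print(int_to_float128(2**64 + 2))   # 18446744073709551618.0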
Nadav
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Float128.pyx
URL:

From berthold at xn--hllmanns-n4a.de Sun Oct 30 05:38:40 2011
From: berthold at xn--hllmanns-n4a.de (Berthold Höllmann)
Date: Sun, 30 Oct 2011 10:38:40 +0100
Subject: [Numpy-discussion] Large numbers into float128
In-Reply-To: (Matthew Brett's message of "Sat, 29 Oct 2011 15:55:21 -0700")
References:
Message-ID: <87r51u6cpr.fsf@pchoel.xn--hllmanns-n4a.de>

Matthew Brett writes:

> I'm trying things like
>
> v = np.float128(2**64+2)
>
> but, because (in other threads) the float128 seems to be going through
> float64 on assignment, this loses precision [...]
>
> So - can anyone think of another way to assign values to float128 that
> will keep the precision?

Just use float128 all the way through, and avoid casting to float in
between:

>>> "%20.1f" % float(2**64+2)
'18446744073709551616.0'
>>> np.float128(np.float128(2)**64+2)
18446744073709551618.0

Regards
Berthold
--
A: Because it degrades the readability of the text.
Q: Why is TOFU so bad?
A: TOFU
Q: What is the biggest annoyance on Usenet?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL:

From matthew.brett at gmail.com Sun Oct 30 05:49:56 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 30 Oct 2011 02:49:56 -0700
Subject: [Numpy-discussion] Large numbers into float128
In-Reply-To: <87r51u6cpr.fsf@pchoel.xn--hllmanns-n4a.de>
References: <87r51u6cpr.fsf@pchoel.xn--hllmanns-n4a.de>
Message-ID:

Hi,

On Sun, Oct 30, 2011 at 2:38 AM, Berthold Höllmann wrote:
> Just use float128 all the way through, and avoid casting to float in
> between:
>
> >>> "%20.1f" % float(2**64+2)
> '18446744073709551616.0'
> >>> np.float128(np.float128(2)**64+2)
> 18446744073709551618.0

Ah yes - sorry - that would work in this example, where I know the
component parts of the number, but I was thinking of the general case
where I have been given an arbitrary int. I think my code works for
that, by casting to float64 to break the number up into parts:

In [35]: def int_to_float128(val):
   ....:     f64 = np.float64(val)
   ....:     res = val - int(f64)
   ....:     return np.float128(f64) + np.float128(res)
   ....:

In [36]: int_to_float128(2**64)
Out[36]: 18446744073709551616.0

In [37]: int_to_float128(2**64+2)
Out[37]: 18446744073709551618.0

Thanks,

Matthew

From cournape at gmail.com Sun Oct 30 07:18:55 2011
From: cournape at gmail.com (David Cournapeau)
Date: Sun, 30 Oct 2011 11:18:55 +0000
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
In-Reply-To:
References:
Message-ID:

On Thu, Oct 27, 2011 at 5:19 PM, Ralf Gommers wrote:
> Hi David,
>
> On Thu, Oct 27, 2011 at 3:02 PM, David Cournapeau wrote:
>>
>> Hi,
>>
>> I was wondering if we could finally move to a more recent version of
>> compilers for the official win32 installers. This would of course
>> concern the next release cycle, not the ones where beta/rc are already
>> in progress.
>> Basically, the pros:
>>  - we will have to move at some point
>>  - gcc 4.* seems less buggy, especially for C++ and fortran
>>  - no need to maintain the msvcr90 voodoo
>> The cons:
>>  - it will most likely break the ABI
>>  - we need to recompile atlas (but I can take care of it)
>>  - the biggest: it is difficult to combine gfortran with Visual
>> Studio (more exactly, you cannot link the gfortran runtime to a Visual
>> Studio executable). The only solution I could think of would be to
>> recompile the gfortran runtime with Visual Studio, which for some
>> reason does not sound very appealing :)
>
> To get the datetime changes to work with MinGW, we already concluded
> that building with 4.x is more or less required (without recognizing
> some of the points you list above). Changes to mingw32ccompiler to fix
> compilation with 4.x went in in https://github.com/numpy/numpy/pull/156.
> It would be good if you could check those.

I will look into it more carefully, but overall, it seems that building
atlas 3.8.4, numpy and scipy with gcc 4.x works quite well. The main
issue is that gcc 4.* adds some dependencies on mingw dlls. There are two
options:
 - adding the dlls to the installers
 - statically linking them, which seems to be a bad idea (it generalizes
   the dll-boundaries problem to exceptions and other things we would
   rather not have to care about:
   http://cygwin.com/ml/cygwin/2007-06/msg00332.html).

> It probably makes sense to make this move for numpy 1.7. If this breaks
> the ABI then it would be easiest to make numpy 1.7 the minimum required
> version for scipy 0.11.

My thinking as well.

cheers,

David

From cournape at gmail.com Sun Oct 30 07:34:38 2011
From: cournape at gmail.com (David Cournapeau)
Date: Sun, 30 Oct 2011 11:34:38 +0000
Subject: [Numpy-discussion] Is distributing GPL + exception dll in the windows installer ok
Message-ID:

Hi,

While testing the mingw gcc 3.x -> 4.x migration, I realized that some
technical requirements in gcc 4.x have potential license implications.
In short, it is now more difficult than before to statically link the
gcc-related runtimes into numpy/scipy. I think using the DLLs is safer
and better, but it means the windows installers will contain GPL code.
My understanding is that this is OK because the code in question is GPL
+ exception, meaning the usual GPL requirements apply only to those
runtimes -- is that right?

cheers,

David

From matthieu.brucher at gmail.com Sun Oct 30 07:38:31 2011
From: matthieu.brucher at gmail.com (Matthieu Brucher)
Date: Sun, 30 Oct 2011 12:38:31 +0100
Subject: [Numpy-discussion] Is distributing GPL + exception dll in the windows installer ok
In-Reply-To:
References:
Message-ID:

Hi David,

Is every GPL part GCC-related? If yes, GCC's runtime licence allows you
to redistribute the runtime with any program (meaning the program's own
licence is not relevant).

Cheers,

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cournape at gmail.com Sun Oct 30 08:02:36 2011
From: cournape at gmail.com (David Cournapeau)
Date: Sun, 30 Oct 2011 12:02:36 +0000
Subject: [Numpy-discussion] Is distributing GPL + exception dll in the windows installer ok
In-Reply-To:
References:
Message-ID:

On Sun, Oct 30, 2011 at 11:38 AM, Matthieu Brucher wrote:
> Hi David,
>
> Is every GPL part GCC-related? If yes, GCC's runtime licence allows you
> to redistribute the runtime with any program (meaning the program's own
> licence is not relevant).

Good point; I should have specified the dlls in question:
 - libgcc
 - libstdc++
 - libgfortran

As far as I know, all of those fall under the licence you mention (GPL +
exception), but it was not entirely clear to me.

cheers,

David

From Chris.Barker at noaa.gov Sun Oct 30 14:29:00 2011
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Sun, 30 Oct 2011 11:29:00 -0700
Subject: [Numpy-discussion] consensus
In-Reply-To:
References:
Message-ID: <4EAD976C.1090602@noaa.gov>

On 10/29/11 2:48 PM, Ralf Gommers wrote:
> That's true, but I am hoping that the difference between - say:
>
> a[0:2] = np.NA
>
> and
>
> a.mask[0:2] = False
>
> would be easy enough to imagine.
>
> It is in this case. I agree the explicit ``a.mask`` is clearer.

Interesting -- I suspect I'm mirroring Pandas' users here:

a[0:2] = np.NA

is simpler and easier to me -- I'm avoiding the word "clearer" because
I'm not sure what it means -- if we think it's important for the user to
understand that the NA value is implemented with a mask, then setting
the mask explicitly is certainly clearer -- but I don't think that's
important. Indeed, I still like the idea that for "casual" use, NA could
be a special value, or could be a mask, and that the user does not need
to know the difference.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Sun Oct 30 14:37:31 2011
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Sun, 30 Oct 2011 11:37:31 -0700
Subject: [Numpy-discussion] consensus
In-Reply-To:
References:
Message-ID: <4EAD996B.60302@noaa.gov>

On 10/29/11 2:59 PM, Charles R Harris wrote:
> I'm much opposed to ripping the current code out. It isn't like it is
> (known to be) buggy, nor has anyone made the case that it isn't a basis
> on which to build other options. It also smacks of gratuitous violence
> committed by someone yet to make a positive contribution to the project.

1) contributing to the discussion IS a positive contribution to the
project.

2) If we use the term "ripping out" it does smack of "gratuitous
violence" -- if we use the term "roll back", maybe not so much -- it's
not like the code couldn't be put back in.

That being said, I like the idea of it being easy and accessible for
not-very-familiar-with-git folks to test -- so I'd like to see it left
there for now at least.

On 10/29/11 3:47 PM, Eric Firing wrote:
> Similarly, in Mark's implementation, 7 bits are available for a payload
> to describe what kind of masking is meant. This seems more consistent
> with True as masked (or NA) than with False as masked.

+1 -- we've got 8 bits; it's nice to be able to use them.

On 10/29/11 3:57 PM, Charles R Harris wrote:
> I wouldn't rely on the 7 bits yet. Mark left them available to keep open
> possible future use, but didn't implement anything using them yet. If
> memory use turns out to exclude whole sectors of application we will
> have to go to bit masks.

Would there have to be only one type of mask available?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From ralf.gommers at googlemail.com Sun Oct 30 15:24:32 2011
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sun, 30 Oct 2011 20:24:32 +0100
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

On Sat, Oct 29, 2011 at 11:55 PM, Matthew Brett wrote:
> Hi,
>
> On Sat, Oct 29, 2011 at 2:48 PM, Ralf Gommers wrote:
>> On Sat, Oct 29, 2011 at 11:36 PM, Matthew Brett wrote:
>>> On Sat, Oct 29, 2011 at 1:48 PM, Matthew Brett wrote:
>>>> On Sat, Oct 29, 2011 at 1:44 PM, Ralf Gommers wrote:
>>>>> On Sat, Oct 29, 2011 at 9:04 PM, Matthew Brett wrote:
>>>>>> On Sat, Oct 29, 2011 at 3:26 AM, Ralf Gommers wrote:
>>>>>>> On Sat, Oct 29, 2011 at 1:37 AM, Matthew Brett wrote:
>>>>>>>> On Fri, Oct 28, 2011 at 4:21 PM, Ralf Gommers wrote:
>>>>>>>>> On Sat, Oct 29, 2011 at 12:37 AM, Matthew Brett wrote:
>>>>>>>>>> No, that's not what Nathaniel and I are saying at all.
>>>>>>>>>> Nathaniel was pointing to links for projects that care that
>>>>>>>>>> everyone agrees before they go ahead.
>>>>>>>>>
>>>>>>>>> It looked to me like there was a serious intent to come to an
>>>>>>>>> agreement, or at least closer together. The discussion in the
>>>>>>>>> summer was going around in circles though, and was too abstract
>>>>>>>>> and complex to follow. Therefore Mark's choice of implementing
>>>>>>>>> something and then asking for feedback made sense to me.
>>>>>>>>
>>>>>>>> I should point out that the implementation hasn't - as far as I
>>>>>>>> can see - changed the discussion. The discussion was about the
>>>>>>>> API.
>>>>>>>>
>>>>>>>> Implementations are useful for agreed APIs because they can point
>>>>>>>> out where the API does not make sense or cannot be implemented.
>>>>>>>> In this case, the API Mark said he was going to implement - he
>>>>>>>> did implement - at least as far as I can see. Again, I'm happy to
>>>>>>>> be corrected.
>>>>>>>
>>>>>>> Implementations can also help the discussion along, by allowing
>>>>>>> people to try out some of the proposed changes. It also allows one
>>>>>>> to construct examples that show weaknesses, possibly to be solved
>>>>>>> by an alternative API. Maybe you can hold the complete history of
>>>>>>> this topic in your head and comprehend it, but for me it would be
>>>>>>> very helpful if someone said:
>>>>>>> - here's my dataset
>>>>>>> - this is what I want to do with it
>>>>>>> - this is the best I can do with the current implementation
>>>>>>> - here's how API X would allow me to solve this better or simpler
>>>>>>> This can be done much better with actual data and an actual
>>>>>>> implementation than with a design proposal. You seem to disagree
>>>>>>> with this statement. That's fine. I would hope though that you
>>>>>>> recognize that concrete examples help people like me, and
>>>>>>> construct one or two to help us out.
>>>>>>
>>>>>> That's what use-cases are for in designing APIs. There are examples
>>>>>> of use in the NEP:
>>>>>>
>>>>>> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>>>>>>
>>>>>> the alterNEP:
>>>>>>
>>>>>> https://gist.github.com/1056379
>>>>>>
>>>>>> and my longer email to Travis:
>>>>>>
>>>>>> http://article.gmane.org/gmane.comp.python.numeric.general/46544/match=ignored
>>>>>>
>>>>>> Mark has done a nice job of documentation:
>>>>>>
>>>>>> http://docs.scipy.org/doc/numpy/reference/arrays.maskna.html
>>>>>>
>>>>>> If you want to understand what the alterNEP case is, I'd suggest
>>>>>> the email, just because it's the most recent and I think the
>>>>>> terminology is slightly clearer.
>>>>>>
>>>>>> Doing the same examples on a larger array won't make the point
>>>>>> easier to understand. The discussion is about what the right
>>>>>> concepts are, and you can help by looking at the snippets of code
>>>>>> in those documents, and deciding for yourself whether you think the
>>>>>> current masking / NA implementation seems natural and easy to
>>>>>> explain, or rather forced and difficult to explain, and then email
>>>>>> back trying to explain your impression (which is not always easy).
>>>>>
>>>>> If you seriously believe that looking at a few snippets is as
>>>>> helpful and instructive as being able to play around with them in
>>>>> IPython and modify them, then I guess we won't make progress in this
>>>>> part of the discussion. You're just telling me to go back and
>>>>> re-read things I'd already read.
>>>>
>>>> The snippets are in ipython or doctest format - aren't they?
>>>
>>> Oops - 10 minute rule. Now I see that you mean that you can't
>>> experiment with the alternative implementation without working code.
>>
>> Indeed.
>>
>>> That's true, but I am hoping that the difference between - say:
>>>
>>> a[0:2] = np.NA
>>>
>>> and
>>>
>>> a.mask[0:2] = False
>>>
>>> would be easy enough to imagine.
>>
>> It is in this case. I agree the explicit ``a.mask`` is clearer. This
>> is a quite specific point that could be improved in the current
>> implementation.
>
> Thanks - this is helpful.

So was your example.

>> It doesn't require ripping everything out.
>
> Nathaniel wasn't proposing 'ripping everything out' - but backing off
> until consensus has been reached. That's different.

I'm worried that in practice it won't be different. If you put such a
large amount of code in a branch, with no one lined up to work on
changing/improving/re-integrating it, the most likely thing to happen is
that it will just sit there in a branch, bitrot, and eventually be lost.

> If you think we should not do that, and you are interested, please say
> why. Second - I was proposing that we do indeed keep the code in the
> codebase but discuss adaptations that could achieve consensus.

Glad to hear it. This is not what I understood from the email you linked
to earlier. Quoting: "Honestly, I think that NA should be a synonym for
ABSENT, and so should be removed until the dust has settled, and restored
as (np.NA == np.ABSENT)".

At this point I care much more about having a good implementation than
exactly which one; the similarities are much more important than the
differences. My main worry is that we end up with nothing.

As for the current situation and way forward, Eric Firing provided a much
better summary and list of important points than I managed to communicate
so far. I agree with everything he said.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From matthew.brett at gmail.com Sun Oct 30 15:27:41 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 30 Oct 2011 12:27:41 -0700
Subject: [Numpy-discussion] consensus
In-Reply-To: <4EAD996B.60302@noaa.gov>
References: <4EAD996B.60302@noaa.gov>
Message-ID:

Hi,

On Sun, Oct 30, 2011 at 11:37 AM, Chris Barker wrote:
> On 10/29/11 2:59 PM, Charles R Harris wrote:
>> I'm much opposed to ripping the current code out. It isn't like it is
>> (known to be) buggy, nor has anyone made the case that it isn't a basis
>> on which to build other options. It also smacks of gratuitous violence
>> committed by someone yet to make a positive contribution to the project.
> 1) contributing to the discussion IS a positive contribution to the
> project.

Yes, but personally I'd rather the discussion was not about who is saying
something, but about what they are saying. That is, if someone proposes
something, or offers a discussion, we don't first ask 'who are you?', but
try to engage with the substance of the argument.

Best,

Matthew

From matthew.brett at gmail.com Sun Oct 30 15:32:53 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 30 Oct 2011 12:32:53 -0700
Subject: [Numpy-discussion] consensus (was: NA masks in the next numpy release?)
In-Reply-To:
References:
Message-ID:

Hi,

On Sun, Oct 30, 2011 at 12:24 PM, Ralf Gommers wrote:

[snip]

> I'm worried that in practice it won't be different. If you put such a
> large amount of code in a branch, with no one lined up to work on
> changing/improving/re-integrating it, the most likely thing to happen is
> that it will just sit there in a branch, bitrot, and eventually be lost.
>
> Glad to hear it. This is not what I understood from the email you linked
> to earlier. Quoting: "Honestly, I think that NA should be a synonym for
> ABSENT, and so should be removed until the dust has settled, and
> restored as (np.NA == np.ABSENT)".

I was proposing that the name 'np.NA' should be removed, leaving
np.IGNORED (with the same meaning as the current np.NA), and np.ABSENT
currently not implemented. When np.ABSENT does get implemented, then, in
due course, np.NA would become a synonym for it. I'm sorry that wasn't
obvious.

> At this point I care much more about having a good implementation than
> exactly which one; the similarities are much more important than the
> differences. My main worry is we end up with nothing.

I don't think any proposed route ended up with nothing. Nathaniel was
only suggesting backing off until we had done the work of agreeing. It
doesn't look like that has much support; that's fine.

Best,

Matthew

From markflorisson88 at gmail.com Sun Oct 30 16:48:25 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 30 Oct 2011 20:48:25 +0000
Subject: [Numpy-discussion] NumPy nogil API
Message-ID:

Hello,

First, I'd like to report a bug. It seems ndarray does not implement
tp_traverse or tp_clear, so if you have a reference cycle in an ndarray
with dtype object, none of those objects will ever be collected.

Secondly, please bear with me, I'm not a NumPy expert, but would it be
possible to have NumPy export an API that can be called without the GIL?
In the upcoming Cython release 0.16 (soon to be released) we will have
what are called typed memoryviews [1], which allow you to obtain a typed
view on any PEP 3118 buffer, and which allow you to do a lot more things
without the GIL. E.g. they allow you to pass these views around,
transpose them, slice them (in the same way as in NumPy, but slightly
more restricted -- no masks and such), index them, etc., all without the
GIL.

However, there is a lot of good functionality in NumPy that is simply
not accessible without going through a Python and GIL layer, only to
have NumPy (possibly) release the GIL. So instead we could have an API
that takes a C-level ndarray view descriptor (shape/strides/ndim
(possibly suboffsets), base type description) that would do the actual
work (and perhaps wouldn't even need to know anything about Python), and
wouldn't need the GIL (this wouldn't work for dtype=object, of course);
see the toy sketch below. The Py_buffer struct comes to mind, but format
strings are hard to deal with. It would be the caller's responsibility
to do any necessary synchronization. This API would simply be wrapped by
a Python API that gets this "view" from the PyArrayObject, and which may
or may not decide to release the GIL before calling the nogil API. This
wrapping API could even be written easily in Cython. As for exceptions
and error handling, there are many ways we can think of doing this
without requiring the GIL.

One of the reasons I think this is important is that when you're using
cython.parallel [2] you don't want to hold the GIL, but you do want your
NumPy goodies.
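(To make the "view descriptor" idea above concrete: here is a toy Python
sketch of the information such a nogil entry point would consume. This is
purely illustrative -- the function name and exact fields are made up,
not an existing or proposed NumPy API. The point is that everything in it
is plain C-level data, with no Python objects involved:)

import numpy as np

def view_descriptor(arr):
    # Everything a GIL-free C routine would need in order to operate on
    # the array's data: raw pointer, layout, and a base-type code --
    # just integers and a character, no Python objects.
    return dict(
        data=arr.ctypes.data,        # raw data pointer, as an integer
        ndim=arr.ndim,
        shape=tuple(arr.shape),
        strides=tuple(arr.strides),  # in bytes
        itemsize=arr.itemsize,
        typecode=arr.dtype.char,     # base type description, not a dtype object
    )

a = np.arange(12.0).reshape(3, 4)
print(view_descriptor(a))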
Cython re-implements a very small subset of NumPy to get you the core
functionality, but to get back to NumPy world you have to acquire the
GIL, convert your memoryview slice to an ndarray (using the buffer
interface, through numpy.asarray()), and then have NumPy operate on that.
It's a pain to write and it's terrible for performance. Even if you
forget the GIL part, there's still the (expensive and explicit)
conversion. In general I think there might be many advantages to such
functionality beyond Cython. There shouldn't really be a reason to tie
NumPy only to the CPython platform.

Anyway, what do you guys think -- does this make any sense?

Mark

[1]: https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/userguide/memoryviews.html
[2]: http://docs.cython.org/src/userguide/parallelism.html

From pav at iki.fi Sun Oct 30 17:01:27 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 30 Oct 2011 22:01:27 +0100
Subject: [Numpy-discussion] NumPy nogil API
In-Reply-To:
References:
Message-ID:

30.10.2011 21:48, mark florisson kirjoitti:
> First, I'd like to report a bug. It seems ndarray does not implement
> tp_traverse or tp_clear, so if you have a reference cycle in an ndarray
> with dtype object, none of those objects will ever be collected.

Indeed, this is missing. http://projects.scipy.org/numpy/ticket/1003

If I recall correctly, there was something painful in implementing this
for Numpy arrays, though...
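For reference, a minimal script that demonstrates the leak (nothing here
is special -- just an object array participating in a cycle; the class
and names are only for illustration):

import gc
import weakref
import numpy as np

class Node(object):
    pass

n = Node()
alive = weakref.ref(n)
a = np.empty(1, dtype=object)
a[0] = n
n.back = a   # cycle: a -> n -> a, running through the object array

del a, n
gc.collect()
# Without tp_traverse/tp_clear the collector cannot see through the
# array, so the cycle is never freed and this prints True; on a NumPy
# where the ticket is fixed, it would print False.
print(alive() is not None)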
> Secondly, please bear with me, I'm not a NumPy expert, but would it be
> possible to have NumPy export an API that can be called without the
> GIL? In the upcoming Cython release 0.16 (soon to be released) we will
> have what are called typed memoryviews [1], which allow you to obtain a
> typed view on any PEP 3118 buffer, and which allow you to do a lot more
> things without the GIL. [...]

The closest thing to making this happen is the work made on porting
Numpy to IronPython. Basically, a major part of that involved ripping
the innards of Numpy out into a more reusable C library. It's been in a
merge-limbo for some time now, however.

--
Pauli Virtanen

From josef.pktd at gmail.com Sun Oct 30 20:22:11 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 30 Oct 2011 20:22:11 -0400
Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
In-Reply-To:
References:
Message-ID:

On Sun, Oct 30, 2011 at 7:18 AM, David Cournapeau wrote:

[snip]

> I will look into it more carefully, but overall, it seems that building
> atlas 3.8.4, numpy and scipy with gcc 4.x works quite well. The main
> issue is that gcc 4.* adds some dependencies on mingw dlls.
>
>> It probably makes sense to make this move for numpy 1.7. If this
>> breaks the ABI then it would be easiest to make numpy 1.7 the minimum
>> required version for scipy 0.11.
>
> My thinking as well.

It looks like it's really time to upgrade. pythonxy comes with gfortran
and a new MinGW, and I cannot build scipy anymore. I can't find an
installer for the old MinGW 3.5 anymore for my new computer.

(I haven't seen any problems mixing gcc 4.4 with C/C++ code like
scikits.learn against the official numpy/scipy installer versions.)

I can volunteer for testing, since it looks like I'm set up for gfortran.

Josef

From akshar.bhosale at gmail.com Sun Oct 30 23:45:19 2011
From: akshar.bhosale at gmail.com (akshar bhosale)
Date: Mon, 31 Oct 2011 09:15:19 +0530
Subject: [Numpy-discussion] problem in running test(nose) with numpy/scipy
Message-ID:

Hi,

I have installed numpy 1.6.0 and scipy 0.9; the nose version is 1.0. I
have the Intel Cluster Toolkit (version 11/069, with MKL 10.3) installed
on a machine with an Intel Xeon processor running RHEL 5.2 x86_64, and I
built numpy/scipy with the Intel compilers. When I execute numpy.test()
and scipy.test(), each hangs:
numpy.test(verbose=3)

Running unit tests for numpy
NumPy version 1.6.0
NumPy is installed in /home/akshar/Python-2.6/lib/python2.6/site-packages/numpy
Python version 2.6 (r26:66714, May 29 2011, 15:10:47) [GCC 4.1.2 20071124 (Red Hat 4.1.2-42)]
nose version 1.0.0
nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext']
nose.selector: INFO: /home/akshar/Python-2.6/lib/python2.6/site-packages/numpy/core/multiarray.so is executable; skipped
nose.selector: INFO: /home/akshar/Python-2.6/lib/python2.6/site-packages/numpy/core/scalarmath.so is executable; skipped
nose.selector: INFO: /home/akshar/Python-2.6/lib/python2.6/site-packages/numpy/core/umath.so is executable; skipped
nose.selector: INFO: /home/akshar/Python-2.6/lib/python2.6/site-packages/numpy/core/multiarray_tests.so is executable; skipped
nose.selector: INFO: /home/akshar/Python-2.6/lib/python2.6/site-packages/numpy/core/umath_tests.so is executable; skipped
nose.selector: INFO: /home/akshar/Python-2.6/lib/python2.6/site-packages/numpy/fft/fftpack_lite.so is executable; skipped
nose.selector: INFO: /home/akshar/Python-2.6/lib/python2.6/site-packages/numpy/linalg/lapack_lite.so is executable; skipped
nose.selector: INFO: /home/akshar/Python-2.6/lib/python2.6/site-packages/numpy/random/mtrand.so is executable; skipped
test_api.test_fastCopyAndTranspose ... ok
test_arrayprint.TestArrayRepr.test_nan_inf ... ok
test_str (test_arrayprint.TestComplexArray) ... ok
Ticket 844. ... ok
test_blasdot.test_blasdot_used ... ok
test_blasdot.test_dot_2args ... ok
test_blasdot.test_dot_3args ... ok
test_blasdot.test_dot_3args_errors ... ok
test_creation (test_datetime.TestDateTime) ... ok
test_creation_overflow (test_datetime.TestDateTime) ... ok
test_divisor_conversion_as (test_datetime.TestDateTime) ... ok
test_divisor_conversion_bday (test_datetime.TestDateTime) ... ok
test_divisor_conversion_day (test_datetime.TestDateTime) ... ok
test_divisor_conversion_fs (test_datetime.TestDateTime) ... ok
test_divisor_conversion_hour (test_datetime.TestDateTime) ... ok
test_divisor_conversion_minute (test_datetime.TestDateTime) ... ok
test_divisor_conversion_month (test_datetime.TestDateTime) ... ok
test_divisor_conversion_second (test_datetime.TestDateTime) ... ok
test_divisor_conversion_week (test_datetime.TestDateTime) ... ok
test_divisor_conversion_year (test_datetime.TestDateTime) ... ok
test_hours (test_datetime.TestDateTime) ... ok
test_from_object_array (test_defchararray.TestBasic) ... ok
test_from_object_array_unicode (test_defchararray.TestBasic) ... ok
test_from_string (test_defchararray.TestBasic) ... ok
test_from_string_array (test_defchararray.TestBasic) ... ok
test_from_unicode (test_defchararray.TestBasic) ... ok
test_from_unicode_array (test_defchararray.TestBasic) ... ok
test_unicode_upconvert (test_defchararray.TestBasic) ... ok
test_it (test_defchararray.TestChar) ... ok
test_equal (test_defchararray.TestComparisons) ... ok
test_greater (test_defchararray.TestComparisons) ... ok
test_greater_equal (test_defchararray.TestComparisons) ... ok
test_less (test_defchararray.TestComparisons) ... ok
test_less_equal (test_defchararray.TestComparisons) ... ok
test_not_equal (test_defchararray.TestComparisons) ... ok
test_equal (test_defchararray.TestComparisonsMixed1) ... ok
test_greater (test_defchararray.TestComparisonsMixed1) ... ok
test_greater_equal (test_defchararray.TestComparisonsMixed1) ... ok
test_less (test_defchararray.TestComparisonsMixed1) ... ok
test_less_equal (test_defchararray.TestComparisonsMixed1) ... ok
test_not_equal (test_defchararray.TestComparisonsMixed1) ... ok
test_equal (test_defchararray.TestComparisonsMixed2) ... ok
test_greater (test_defchararray.TestComparisonsMixed2) ... ok
test_greater_equal (test_defchararray.TestComparisonsMixed2) ... ok
test_less (test_defchararray.TestComparisonsMixed2) ... ok
test_less_equal (test_defchararray.TestComparisonsMixed2) ... ok
test_not_equal (test_defchararray.TestComparisonsMixed2) ... ok
test_count (test_defchararray.TestInformation) ... ok
test_endswith (test_defchararray.TestInformation) ... ok
test_find (test_defchararray.TestInformation) ... ok
test_index (test_defchararray.TestInformation) ... ok
test_isalnum (test_defchararray.TestInformation) ... ok
test_isalpha (test_defchararray.TestInformation) ... ok
test_isdigit (test_defchararray.TestInformation) ... ok
test_islower (test_defchararray.TestInformation) ... ok
test_isspace (test_defchararray.TestInformation) ... ok
test_istitle (test_defchararray.TestInformation) ... ok
test_isupper (test_defchararray.TestInformation) ... ok
test_len (test_defchararray.TestInformation) ... ok
test_rfind (test_defchararray.TestInformation) ... ok
test_rindex (test_defchararray.TestInformation) ... ok
test_startswith (test_defchararray.TestInformation) ... ok
test_capitalize (test_defchararray.TestMethods) ... ok
test_center (test_defchararray.TestMethods) ... ok
test_decode (test_defchararray.TestMethods) ... ok
test_encode (test_defchararray.TestMethods) ... ok
test_expandtabs (test_defchararray.TestMethods) ... ok
test_isdecimal (test_defchararray.TestMethods) ... ok
test_isnumeric (test_defchararray.TestMethods) ... ok
test_join (test_defchararray.TestMethods) ... ok
test_ljust (test_defchararray.TestMethods) ... ok
test_lower (test_defchararray.TestMethods) ... ok
test_lstrip (test_defchararray.TestMethods) ... ok
test_partition (test_defchararray.TestMethods) ... ok
test_replace (test_defchararray.TestMethods) ... ok
test_rjust (test_defchararray.TestMethods) ... ok
test_rpartition (test_defchararray.TestMethods) ... ok
test_rsplit (test_defchararray.TestMethods) ... ok
test_rstrip (test_defchararray.TestMethods) ... ok
test_split (test_defchararray.TestMethods) ... ok
test_splitlines (test_defchararray.TestMethods) ... ok
test_strip (test_defchararray.TestMethods) ... ok
test_swapcase (test_defchararray.TestMethods) ... ok
test_title (test_defchararray.TestMethods) ... ok
test_upper (test_defchararray.TestMethods) ... ok
test_add (test_defchararray.TestOperations) ... ok
Ticket #856 ... ok
test_mul (test_defchararray.TestOperations) ... ok
test_radd (test_defchararray.TestOperations) ... ok
test_rmod (test_defchararray.TestOperations) ... ok
test_rmul (test_defchararray.TestOperations) ... ok
test_broadcast_error (test_defchararray.TestVecString) ... ok
test_invalid_args_tuple (test_defchararray.TestVecString) ... ok
test_invalid_function_args (test_defchararray.TestVecString) ... ok
test_invalid_result_type (test_defchararray.TestVecString) ... ok
test_invalid_type_descr (test_defchararray.TestVecString) ... ok
test_non_existent_method (test_defchararray.TestVecString) ... ok
test_non_string_array (test_defchararray.TestVecString) ... ok
test1 (test_defchararray.TestWhitespace) ... ok
test_dtype (test_dtype.TestBuiltin) ... ok
Only test hash runs at all. ... ok
test_metadata_rejects_nondict (test_dtype.TestMetadata) ... ok
test_metadata_takes_dict (test_dtype.TestMetadata) ... ok
test_nested_metadata (test_dtype.TestMetadata) ... ok
test_no_metadata (test_dtype.TestMetadata) ... ok
test1 (test_dtype.TestMonsterType) ... ok
test_different_names (test_dtype.TestRecord) ... ok
test_different_titles (test_dtype.TestRecord) ... ok
Test whether equivalent record dtypes hash the same. ... ok
Test if an appropriate exception is raised when passing bad values to ... ok
Test whether equivalent subarray dtypes hash the same. ... ok
Test whether different subarray dtypes hash differently. ... ok
Test some data types that are equal ... ok
Test some more complicated cases that shouldn't be equal ... ok
Test some simple cases that shouldn't be equal ... ok
test_single_subarray (test_dtype.TestSubarray) ... ok
test_einsum_errors (test_einsum.TestEinSum) ... ok
test_einsum_sums_cfloat128 (test_einsum.TestEinSum) ...

It hangs here..

From markflorisson88 at gmail.com  Mon Oct 31 04:44:41 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 31 Oct 2011 08:44:41 +0000
Subject: [Numpy-discussion] NumPy nogil API

On 30 October 2011 21:01, Pauli Virtanen wrote:
> 30.10.2011 21:48, mark florisson wrote:
>> First, I'd like to report a bug. It seems ndarray does not implement
>> tp_traverse or tp_clear, so if you have a reference cycle in an
>> ndarray with dtype object, none of those objects will ever be
>> collected.
>
> Indeed, this is missing. http://projects.scipy.org/numpy/ticket/1003
>
> If I recall correctly, there was something painful in implementing this
> for Numpy arrays, though...
>
>> Secondly, please bear with me, I'm not a NumPy expert, but would it be
>> possible to have NumPy export an API that can be called without the
>> GIL? In the upcoming Cython release 0.16 (soon to be released) we will
>> have what are called typed memoryviews [1], which allow you to obtain a
>> typed view on any PEP 3118 buffer and to do a lot more things without
>> the GIL: e.g., you can pass these views around, transpose them, slice
>> them (in the same way as in NumPy, but slightly more restricted -- no
>> mask support and such), and index them, all without the GIL.
>
> The closest thing to making this happen is the work done on porting
> Numpy to IronPython. Basically, a major part of that involved ripping
> the innards of Numpy out into a more reusable C library. It's been in
> merge limbo for some time now, however.

Ah, that's too bad. Is it anywhere near ready, or was it abandoned for
ironclad? Could you point me to the code?
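
[A minimal sketch of the leak reported above, added for illustration -- not
code from the thread. It assumes CPython's gc semantics: because ndarray
does not implement tp_traverse, the cycle detector cannot see the references
held inside an object array, so a cycle passing through one is never
reclaimed:]

    import gc
    import numpy as np

    def make_cycle():
        a = np.empty(1, dtype=object)  # object array holds real references
        box = [a]                      # box -> a
        a[0] = box                     # a -> box: the cycle runs through a
        # both objects become unreachable when this function returns

    gc.collect()         # clear any pending garbage first
    make_cycle()
    print(gc.collect())  # expected 0: the array is invisible to the cycle
                         # detector, so the a/box pair is never collected
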
From pav at iki.fi  Mon Oct 31 05:50:28 2011
From: pav at iki.fi (Pauli Virtanen)
Date: Mon, 31 Oct 2011 10:50:28 +0100
Subject: [Numpy-discussion] NumPy nogil API

31.10.2011 09:44, mark florisson wrote:
[clip]
> Ah, that's too bad. Is it anywhere near ready, or was it abandoned for
> ironclad? Could you point me to the code?

It's quite ready and working, and as far as I understand, Enthought is
shipping it. I haven't used it, though.

The code is here: https://github.com/numpy/numpy-refactor

        Pauli

From d.s.seljebotn at astro.uio.no  Mon Oct 31 06:03:46 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 31 Oct 2011 11:03:46 +0100
Subject: [Numpy-discussion] NumPy nogil API

Mark: I'm just wondering what you wanted to do with NumPy from Cython -- a
stopgap solution for SIMD, iterator support, or something else?

SIMD using NumPy really isn't the best idea long-term, because of all the
temporaries needed in compound expressions, which is really bad on the
memory bus for anything but tiny arrays. For that I'd rather look at
finding a nogil core of numexpr or similar.

Of course, there are a number of convenient NumPy utility functions which
would be cool to have in nogil mode... But given that the GIL is a problem
in so many cases, I wonder how far it is really possible to go even given
the refactored numpy core.

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From d.s.seljebotn at astro.uio.no  Mon Oct 31 06:22:29 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 31 Oct 2011 11:22:29 +0100
Subject: [Numpy-discussion] Process-shared memory allocation per default?
Message-ID: <4EAE76E5.5020608 at astro.uio.no>

This comes out of a long discussion on the Cython list. Following Mark's
success with the shared-memory parallelism, the question is: where to take
Cython's capabilities for parallelism further? One thing that's come up now
and then is that we could basically use something like:

 - multiprocessing (to get rid of any GIL issues);
 - allocating all NumPy arrays in process-shared memory, so that passing
   NumPy arrays between processes happens by "pickling views".

This can be done with current NumPy by using a separate constructor, e.g.,

    a = sharedmem_zeros((3, 3))

However, construction of the array feels like the wrong place to make this
decision. It is really when the array is sent to another process that the
decision should be made. If all NumPy arrays were allocated in shared
memory per default, one could do

    shared_queue.put(a.shared())

and shared() would wrap a in something that pickled a shared-memory pointer
rather than the data (and unpickled directly to the NumPy array). I just
find this *a lot* more convenient than the tedious business of making sure
the memory is allocated in the right way everywhere. Any downsides to doing
this? (Additional overhead for small arrays, perhaps?)

On the Cython end, parallelism could then be supported both by low-level
message passing using ZeroMQ (possibly with syntax candy for sending typed
messages), and by another multiprocessing backend to the current prange,
which would require that any memoryviews worked on are allocated in shared
memory.

I'm just looking for feedback here. I don't have cycles in terms of
implementation; the point is that what NumPy users and devs are thinking
about this could direct the further discussion of parallelism within
Cython.

Dag Sverre
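
[One possible shape for the sharedmem_zeros constructor mentioned above,
sketched for illustration with only the standard library. The name and the
idea are from Dag's proposal, not an existing NumPy API, and sharing works
only for processes forked after allocation:]

    import multiprocessing.sharedctypes as sct
    import numpy as np

    def sharedmem_zeros(shape, dtype=np.float64):
        """Zero-filled ndarray backed by process-shared memory (sketch)."""
        dtype = np.dtype(dtype)
        nbytes = int(np.prod(shape)) * dtype.itemsize
        buf = sct.RawArray('b', nbytes)   # anonymous shared memory, zeroed
        return np.frombuffer(buf, dtype=dtype).reshape(shape)

    a = sharedmem_zeros((3, 3))  # children forked from here see this buffer
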
From markflorisson88 at gmail.com  Mon Oct 31 06:48:04 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 31 Oct 2011 10:48:04 +0000
Subject: [Numpy-discussion] NumPy nogil API

On 31 October 2011 10:03, Dag Sverre Seljebotn wrote:
> Mark: I'm just wondering what you wanted to do with NumPy from Cython -- a
> stopgap solution for SIMD, iterator support, or something else?
>
> SIMD using NumPy really isn't the best idea long-term, because of all the
> temporaries needed in compound expressions, which is really bad on the
> memory bus for anything but tiny arrays. For that I'd rather look at
> finding a nogil core of numexpr or similar.

Yes, I'm aware of numexpr and the general problem with array expressions in
NumPy. It's not just about SIMD or iterators; as you say below, there's
lots of stuff that wouldn't be available even if we get SIMD. And if NumPy
had such an API, Cython could figure out how many temporaries (if any) are
actually needed and call into the NumPy API with in-place operations.

The thing is, how much of NumPy (and numexpr or theano) does Cython want to
reimplement? Will you stop at SIMD with elemental functions? And will it
run on my GPU?

I suppose from a purity perspective I'd just like this functionality to be
available in a library, and have my language use the library efficiently
behind my back, instead of implementing everything itself.

> Of course, there are a number of convenient NumPy utility functions which
> would be cool to have in nogil mode... But given that the GIL is a problem
> in so many cases, I wonder how far it is really possible to go even given
> the refactored numpy core.
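
[The temporary elimination mark describes can be shown with plain NumPy --
an editorial sketch, not code from the thread:]

    import numpy as np

    a = np.random.rand(1000, 1000)
    b = np.random.rand(1000, 1000)
    c = np.random.rand(1000, 1000)

    # naive form: a*b allocates one temporary, +c allocates the result
    d = a * b + c

    # in-place form: a single preallocated buffer, no temporaries
    out = np.empty_like(a)
    np.multiply(a, b, out)   # out = a*b
    np.add(out, c, out)      # out += c
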
From markflorisson88 at gmail.com  Mon Oct 31 06:48:09 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 31 Oct 2011 10:48:09 +0000
Subject: [Numpy-discussion] NumPy nogil API

On 31 October 2011 09:50, Pauli Virtanen wrote:
> 31.10.2011 09:44, mark florisson wrote:
> [clip]
>> Ah, that's too bad. Is it anywhere near ready, or was it abandoned for
>> ironclad? Could you point me to the code?
>
> It's quite ready and working, and as far as I understand, Enthought is
> shipping it. I haven't used it, though.
>
> The code is here: https://github.com/numpy/numpy-refactor
>
>         Pauli

Cool, thanks.

From d.s.seljebotn at astro.uio.no  Mon Oct 31 07:01:33 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 31 Oct 2011 12:01:33 +0100
Subject: [Numpy-discussion] NumPy nogil API
Message-ID: <4EAE800D.3030006 at astro.uio.no>

On 10/31/2011 11:48 AM, mark florisson wrote:
[clip]
> The thing is, how much of NumPy (and numexpr or theano) does Cython want
> to reimplement? Will you stop at SIMD with elemental functions? And will
> it run on my GPU?
>
> I suppose from a purity perspective I'd just like this functionality to
> be available in a library, and have my language use the library
> efficiently behind my back, instead of implementing everything itself.

I do totally agree, but I'm also afraid that this is a never-ending quest
as long as the GIL is present in CPython. There will always be stuff I'd
like to call without the GIL. NumPy alone is not sufficient; I'd also like
to use all the scientific libraries which rely on and extend NumPy (all of
SciPy for starters), and so on.

I do feel that what we have + SIMD covers just enough situations that it
is useful for writing "numerical cores", without needing the rest of
NumPy. If one starts to pull in more conveniences, then I feel I might
equally likely need something in SciPy.

I'm not really against what you try to do; any progress at all on how much
one can do without the GIL is great. I'm just playing the devil's advocate
for a bit.

Dag Sverre
From d.s.seljebotn at astro.uio.no  Mon Oct 31 07:17:41 2011
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 31 Oct 2011 12:17:41 +0100
Subject: [Numpy-discussion] NumPy nogil API
In-Reply-To: <4EAE800D.3030006 at astro.uio.no>
References: <4EAE800D.3030006 at astro.uio.no>
Message-ID: <4EAE83D5.5020604 at astro.uio.no>

On 10/31/2011 12:01 PM, Dag Sverre Seljebotn wrote:
[clip]
> I do totally agree, but I'm also afraid that this is a never-ending quest
> as long as the GIL is present in CPython. There will always be stuff I'd
> like to call without the GIL. NumPy alone is not sufficient; I'd also
> like to use all the scientific libraries which rely on and extend NumPy
> (all of SciPy for starters), and so on.

As an example, it'd be nice to have scipy.ndimage available without the GIL:

http://docs.scipy.org/doc/scipy/reference/ndimage.html

Now, this *can* easily be done, as the core is written in C++. I'm just
pointing out that some people may wish more for calling scipy.ndimage
inside their prange than for some parts of NumPy.

Although if the proposal is some way of writing Cython code that makes
skipping the GIL possible to a larger degree than today, then this is
indeed highly relevant. People are already talking loudly about
reimplementing NumPy in Cython (e.g., Travis Oliphant on his blog), and
SciPy is already largely reimplemented in Cython for the .NET port.

Dag Sverre

From shish at keba.be  Mon Oct 31 09:31:56 2011
From: shish at keba.be (Olivier Delalleau)
Date: Mon, 31 Oct 2011 09:31:56 -0400
Subject: [Numpy-discussion] problem in running test(nose) with numpy/scipy

If you google around "einsum hang", it looks like this is a problem with
the Intel compiler's -O3 flag. See this thread in particular:

http://comments.gmane.org/gmane.comp.python.numeric.general/43168

It looks like there may be more issues too...

-=- Olivier

2011/10/30 akshar bhosale

> Hi,
>
> I have installed numpy (1.6.0) and scipy (0.9); the nose version is 1.0.
> I have the Intel Cluster Toolkit (11/069 version, with MKL 10.3) installed
> on an Intel Xeon machine running RHEL 5.2 (x86_64), and I built numpy and
> scipy with the Intel compilers. When I execute numpy.test and scipy.test,
> the run hangs:
>
> numpy.test(verbose=3)
>
> [verbose test log snipped -- identical to the log in the original message
> above, again stopping at test_einsum_sums_cfloat128]
>
> It hangs here..

From bsouthey at gmail.com  Mon Oct 31 09:39:54 2011
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 31 Oct 2011 08:39:54 -0500
Subject: [Numpy-discussion] problem in running test(nose) with numpy/scipy
Message-ID: <4EAEA52A.6040502 at gmail.com>

On 10/31/2011 08:31 AM, Olivier Delalleau wrote:
> If you google around "einsum hang", it looks like this is a problem with
> the Intel compiler's -O3 flag. See this thread in particular:
>
> http://comments.gmane.org/gmane.comp.python.numeric.general/43168
>
> It looks like there may be more issues too...
>
> -=- Olivier
>
> 2011/10/30 akshar bhosale
>
>> Hi,
>>
>> I have installed numpy (1.6.0) and scipy (0.9); the nose version is 1.0.
>> I have the Intel Cluster Toolkit (11/069 version, with MKL 10.3)
>> installed on an Intel Xeon machine running RHEL 5.2 (x86_64), and I
>> built numpy and scipy with the Intel compilers. When I execute
>> numpy.test and scipy.test, the run hangs:
>>
>> numpy.test(verbose=3)
>>
>> [verbose test log snipped again -- see the original message above]
>>
>> It hangs here..

This may be a problem with how float128 works on your system and/or with
the Intel compiler, because the test passes at lower precision. So it
really would be great for you to isolate the specific component or
components of the test that cause this, so that we can at least provide a
'fail' rather than a crash or hang.

The actual 'check_einsum_sums' test has many subtests, so you need to find
which operation is involved. My way is just to comment out the different
bits of 'check_einsum_sums' to find which parts work. Note that you should
be able to change into the actual test directory (numpy/core/tests/) and
run the test directly ($ python test_einsum.py), or a copy of it.

Bruce
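
[A throwaway probe along the lines Bruce suggests -- an editorial sketch;
the einsum calls are merely illustrative of the complex128 sums the hanging
test exercises. Run it in a separate process you can kill, since a
miscompiled build may hang rather than fail:]

    import numpy as np

    a = np.arange(6, dtype=np.complex128) + 1j

    print(np.einsum('i->', a))        # plain sum
    print(np.einsum('i,i->', a, a))   # inner product
    print(np.einsum('i,i->i', a, a))  # elementwise product
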
From grove.steyn at gmail.com  Mon Oct 31 09:46:06 2011
From: grove.steyn at gmail.com (Grové)
Date: Mon, 31 Oct 2011 13:46:06 +0000 (UTC)
Subject: [Numpy-discussion] np.in1d() capacity limit?

Pauli Virtanen <pav at iki.fi> writes:

> The problem here seems to be that argsort (or only the mergesort?) for
> datetime datatypes is not implemented.
>
> There's a faster code path that is triggered for small selection arrays,
> and that does not require argsort, and that's why the error occurs in
> only some of the cases.

Yes, that makes sense. Anyway, I have just implemented a workaround for
now.

Thanks.

Grové

From matthew.brett at gmail.com  Mon Oct 31 14:23:30 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 31 Oct 2011 11:23:30 -0700
Subject: [Numpy-discussion] float64 / int comparison different from float / int comparison

Hi,

I just ran into this confusing difference between np.float and np.float64:

In [8]: np.float(2**63) == 2**63
Out[8]: True

In [9]: np.float(2**63) > 2**63-1
Out[9]: True

In [10]: np.float64(2**63) == 2**63
Out[10]: True

In [11]: np.float64(2**63) > 2**63-1
Out[11]: False

In [16]: np.float64(2**63-1) == np.float(2**63-1)
Out[16]: True

I believe values above 2**52 are all represented as integers in float64:

http://matthew-brett.github.com/pydagogue/floating_point.html

Is this the int64 issue that came up earlier in the float128 comparison?
Why the difference between np.float and np.float64?

Thanks for any insight,

Matthew
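
[An editorial aside on the asymmetry, assuming IEEE-754 doubles: 2**63-1
has no exact float64 representation and rounds up to 2**63, so once both
operands are coerced to float64 the comparison sees equal values, while
Python compares a float and an int exactly:]

    import numpy as np

    x = 2**63 - 1

    print(float(x) == 2.0**63)    # True: x rounds up to 2**63 as a double
    print(np.float64(2**63) > x)  # False: x is coerced to float64 first,
                                  # and the rounded x equals 2**63
    print(float(2**63) > x)       # True: Python's int/float comparison
                                  # is exact, with no rounding
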
From zachary.pincus at yale.edu  Mon Oct 31 14:28:08 2011
From: zachary.pincus at yale.edu (Zachary Pincus)
Date: Mon, 31 Oct 2011 14:28:08 -0400
Subject: [Numpy-discussion] NumPy nogil API
In-Reply-To: <4EAE83D5.5020604 at astro.uio.no>
References: <4EAE800D.3030006 at astro.uio.no> <4EAE83D5.5020604 at astro.uio.no>

> As an example, it'd be nice to have scipy.ndimage available without the GIL:
>
> http://docs.scipy.org/doc/scipy/reference/ndimage.html
>
> Now, this *can* easily be done, as the core is written in C++. I'm just
> pointing out that some people may wish more for calling scipy.ndimage
> inside their prange than for some parts of NumPy.

Not exactly to your larger point wrt the GIL, but I *think* some skimage
(née scikits.image) folks are trying to rewrite most of ndimage's
functionality in Cython. I don't know what the status of this effort is,
though...

Zach

From morph at debian.org  Mon Oct 31 15:02:24 2011
From: morph at debian.org (Sandro Tosi)
Date: Mon, 31 Oct 2011 20:02:24 +0100
Subject: [Numpy-discussion] Reason behind C_ABI_VERSION so high

Hello,

in Debian we're trying to define a way to handle numpy transitions more
smoothly (you can read the proposal, if interested, at
http://bugs.debian.org/643873).

In order to do that, we'd like to use the C_API_VERSION and C_ABI_VERSION
values. While for C_API_VERSION we can see it's a quite small value, with a
clear history at ./numpy/core/code_generators/cversions.txt, it is not
clear to us why C_ABI_VERSION is such a high value and how it would be
incremented.

Could you please shed some light on it? Can we, for example, just take 6
for the API and 9 for the ABI and be sure we will see them incremented to 7
and 10 (respectively) when needed? Or is C_ABI_VERSION incremented in a
different way?

Thanks,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From stefan at sun.ac.za  Mon Oct 31 19:48:06 2011
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Mon, 31 Oct 2011 16:48:06 -0700
Subject: [Numpy-discussion] NumPy nogil API
References: <4EAE800D.3030006 at astro.uio.no> <4EAE83D5.5020604 at astro.uio.no>

On Mon, Oct 31, 2011 at 11:28 AM, Zachary Pincus wrote:
> Not exactly to your larger point wrt the GIL, but I *think* some skimage
> (née scikits.image) folks are trying to rewrite most of ndimage's
> functionality in Cython. I don't know what the status of this effort is,
> though...

We still rely on scipy.ndimage in some places, but since we don't have to
support N-dimensional arrays, we can often do things in a slightly simpler
and faster way. Almost all the performance code in the scikit is written in
Cython, which makes it trivial to release the GIL on internal loops.

I am actively soliciting feedback from current or prospective users, so
that we can drive the scikit in the right direction. It's already an
entirely different project from what it was a couple of months ago: we
stopped trying to duplicate the MATLAB toolbox functionality and have been
adding some exciting new algorithms. The number of pull requests has
tripled since the 0.3 release, and we're aiming to have 0.4 done this week.

Regards
Stéfan

From stefan at sun.ac.za  Mon Oct 31 20:59:26 2011
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Mon, 31 Oct 2011 17:59:26 -0700
Subject: [Numpy-discussion] float64 / int comparison different from float / int comparison

On Mon, Oct 31, 2011 at 11:23 AM, Matthew Brett wrote:
> In [8]: np.float(2**63) == 2**63
> Out[8]: True
>
> In [9]: np.float(2**63) > 2**63-1
> Out[9]: True
>
> In [10]: np.float64(2**63) == 2**63
> Out[10]: True
>
> In [11]: np.float64(2**63) > 2**63-1
> Out[11]: False
>
> In [16]: np.float64(2**63-1) == np.float(2**63-1)
> Out[16]: True

Interesting. It turns out that np.float(x) returns a Python float object.
If you change the experiment to only use numpy array scalars, things are
more consistent:

In [36]: np.array(2**63, dtype=np.float) > 2**63 - 1
Out[36]: False

In [37]: np.array(2**63, dtype=np.float32) > 2**63 - 1
Out[37]: False

In [38]: np.array(2**63, dtype=np.float64) > 2**63 - 1

Regards
Stéfan

From stefan at sun.ac.za  Mon Oct 31 21:15:57 2011
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Mon, 31 Oct 2011 18:15:57 -0700
Subject: [Numpy-discussion] Reason behind C_ABI_VERSION so high

Hi Sandro

On Mon, Oct 31, 2011 at 12:02 PM, Sandro Tosi wrote:
> In order to do that, we'd like to use the C_API_VERSION and C_ABI_VERSION
> values. While for C_API_VERSION we can see it's a quite small value, with
> a clear history at ./numpy/core/code_generators/cversions.txt, it is not
> clear to us why C_ABI_VERSION is such a high value and how it would be
> incremented.

Once, long long ago, the C_ABI_VERSION used to be called NPY_VERSION
(before we had the distinction between ABI and API versions). It then
roughly matched the release number, e.g.:

commit 83d1d47bb8e5df4c9578a42f9f2ce8db408ccd00
Author: Travis Oliphant
Date:   Fri Jul 21 08:00:51 2006 +0000

    Change c-api to 1.0

diff --git a/numpy/core/include/numpy/arrayobject.h b/numpy/core/include/numpy/arrayobject.h
index f27450b..0033e58 100644
--- a/numpy/core/include/numpy/arrayobject.h
+++ b/numpy/core/include/numpy/arrayobject.h
@@ -36,7 +36,7 @@ extern "C" CONFUSE_EMACS
 #define NPY_SUCCEED 1

 /* Helpful to distinguish what is installed */
-#define NPY_VERSION 0x0009090D
+#define NPY_VERSION 0x01000000

After that, I suspect the number could not be made smaller for backward
compatibility. But nowadays the ABI number is simply bumped by one after
every change, so for a good couple of million releases you should be safe
ignoring the "1" (but is the value really large enough to cause problems?).

Regards
Stéfan
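
[For the packaging use case, something like this may suffice to read both
numbers from an installed NumPy -- an editorial sketch that assumes the
macros live in the generated _numpyconfig.h header, as in the 1.x releases
of this era:]

    import os
    import re
    import numpy as np

    # locate the generated config header shipped with the installed numpy
    hdr = os.path.join(np.get_include(), 'numpy', '_numpyconfig.h')
    text = open(hdr).read()

    abi = re.search(r'#define\s+NPY_ABI_VERSION\s+(0x[0-9A-Fa-f]+)', text)
    api = re.search(r'#define\s+NPY_API_VERSION\s+(0x[0-9A-Fa-f]+)', text)
    print('C_ABI_VERSION: ' + abi.group(1))  # e.g. 0x01000009
    print('C_API_VERSION: ' + api.group(1))  # e.g. 0x00000006
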
From matthew.brett at gmail.com  Mon Oct 31 21:25:35 2011
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 31 Oct 2011 18:25:35 -0700
Subject: [Numpy-discussion] float64 / int comparison different from float / int comparison

Hi,

2011/10/31 Stéfan van der Walt:
[clip]
> Interesting. It turns out that np.float(x) returns a Python float object.
> If you change the experiment to only use numpy array scalars, things are
> more consistent:
>
> In [36]: np.array(2**63, dtype=np.float) > 2**63 - 1
> Out[36]: False
>
> In [37]: np.array(2**63, dtype=np.float32) > 2**63 - 1
> Out[37]: False
>
> In [38]: np.array(2**63, dtype=np.float64) > 2**63 - 1

Oh dear, I'm suffering now:

In [11]: res = np.array((2**31,), dtype=np.float32)

In [12]: res > 2**31-1
Out[12]: array([False], dtype=bool)

OK - that's what I was expecting from the above, but now:

In [13]: res[0] > 2**31-1
Out[13]: True

In [14]: res[0].dtype
Out[14]: dtype('float32')

Sorry, maybe I'm not thinking straight, but I'm confused...

See you,

Matthew

From stefan at sun.ac.za  Mon Oct 31 21:38:24 2011
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Mon, 31 Oct 2011 18:38:24 -0700
Subject: [Numpy-discussion] float64 / int comparison different from float / int comparison

On Mon, Oct 31, 2011 at 6:25 PM, Matthew Brett wrote:
> Oh dear, I'm suffering now:
>
> In [11]: res = np.array((2**31,), dtype=np.float32)
>
> In [12]: res > 2**31-1
> Out[12]: array([False], dtype=bool)
>
> OK - that's what I was expecting from the above, but now:
>
> In [13]: res[0] > 2**31-1
> Out[13]: True
>
> In [14]: res[0].dtype
> Out[14]: dtype('float32')

I'm seeing:

In [7]: res = np.array((2**31,), dtype=np.float32)

In [9]: res > (2**31-1)
Out[9]: array([ True], dtype=bool)

In [10]: res[0]
Out[10]: 2.1474836e+09

In [11]: res[0] > (2**31-1)
Out[11]: True

Your result seems very strange, because the numpy scalars should perform
exactly the same inside and outside an array.

Stéfan