From hannesschoenberger at gmail.com Tue Jan 1 09:50:13 2013 From: hannesschoenberger at gmail.com (Schönberger Johannes) Date: Tue, 1 Jan 2013 15:50:13 +0100 Subject: [Numpy-discussion] Conversion functions Message-ID: <85ADC5DF-F17D-41EE-84D6-DA1F22015D3C@gmail.com> Hello everyone, I recently opened a new pull request which adds the functionality to convert between degrees and degrees, minutes and seconds (https://github.com/numpy/numpy/pull/2869). The discussion is about whether such conversion functionality should be integrated into numpy at all or whether it belongs in scipy. I suggest moving the most common conversion functions (deg2rad, rad2deg, deg2dms, dms2deg, and some more could be added) to a separate `conversion.py` file in `numpy/lib`. I could implement this in a new pull request if the general consensus is in favor of it. What are your thoughts? Regards, Johannes From ralf.gommers at gmail.com Tue Jan 1 12:23:26 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 1 Jan 2013 18:23:26 +0100 Subject: [Numpy-discussion] Conversion functions In-Reply-To: <85ADC5DF-F17D-41EE-84D6-DA1F22015D3C@gmail.com> References: <85ADC5DF-F17D-41EE-84D6-DA1F22015D3C@gmail.com> Message-ID: On Tue, Jan 1, 2013 at 3:50 PM, Schönberger Johannes < hannesschoenberger at gmail.com> wrote: > Hello everyone, > > I recently opened a new pull request which adds the functionality to > convert between degrees and degrees, minutes and seconds > (https://github.com/numpy/numpy/pull/2869). > > The discussion is about whether such conversion functionality should > be integrated into numpy at all or whether it belongs in scipy. I > suggest moving the most common conversion functions (deg2rad, > rad2deg, deg2dms, dms2deg, and some more could be added) to a separate > `conversion.py` file in `numpy/lib`. > > I could implement this in a new pull request if the general consensus > is in favor of it. What are your thoughts? > After checking what's in scipy.constants now (degree/arcminute/arcsecond and, for example, temperature and frequency conversion functions), I think that's where it belongs. A separate new numpy submodule with a bunch of these types of conversion utilities would be my second choice. I'm -1 on adding such small and fairly domain-specific functions to the main numpy namespace. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Jan 1 22:41:19 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 1 Jan 2013 19:41:19 -0800 Subject: [Numpy-discussion] Conversion functions In-Reply-To: <85ADC5DF-F17D-41EE-84D6-DA1F22015D3C@gmail.com> References: <85ADC5DF-F17D-41EE-84D6-DA1F22015D3C@gmail.com> Message-ID: On Tue, Jan 1, 2013 at 6:50 AM, Schönberger Johannes wrote: > I recently opened a new pull request which adds the functionality to > convert between degrees and degrees, minutes and seconds > (https://github.com/numpy/numpy/pull/2869). > > The discussion is about whether such conversion functionality should > be integrated into numpy at all or whether it belongs in scipy. Handy functions, yes, but certainly not something to put in numpy -- maybe scipy, though I'm not sure of the best place. I see someone (Chuck?) on github suggested a "conversion.py" module -- that should be in scipy, not numpy, but I'm wary -- where would it stop? Rather, perhaps the quantities package should be adopted. 
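As a concrete illustration of Ralf's scipy.constants suggestion above: that module already ships the angle units degree, arcmin and arcsec as radian conversion factors, so a degrees/minutes/seconds value can be converted without adding any new function. A minimal sketch (dms_to_radians is just an illustrative helper, not an existing scipy or numpy function):

from scipy import constants

def dms_to_radians(degrees, minutes, seconds):
    # degree/arcmin/arcsec are the sizes of those units in radians,
    # so a d/m/s triple is just a weighted sum.
    return (degrees * constants.degree
            + minutes * constants.arcmin
            + seconds * constants.arcsec)

dms_to_radians(30, 30, 0)   # 30 deg 30' 0" -> about 0.5323 radians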
But another note: conversion to deg.min.sec with floating point is a bit less trivial than you'd think, you can end up with results like: xdegrees, 60 minutes.... if you're not careful -- it looks from a first glance that the pull request does not address this. Note: I suppose we could consider it technically OK, but it's certainly not aesthetically pleasing. Example: import numpy as np def deg2dms(x): out = [0,0,0] out[0] = np.floor(x) out[1] = np.floor((x - out[0]) * 60) out[2] = ((x - out[0]) * 60 - out[1]) * 60 return out print deg2dms(1.0) print deg2dms(1.1) print deg2dms(45.05) In [60]: run deg2dms.py [1.0, 0.0, 0.0] [1.0, 6.0, 3.1974423109204508e-13] [45.0, 2.0, 59.999999999989768] you'd really want that to be: 1degree 6 minutes, 0 seconds and 45 degrees 3 minutes, zero seconds Here's the code I used: @classmethod def ToDegMin(self, DecDegrees, ustring = False): """ Converts from decimal (binary float) degrees to: Degrees, Minutes If the optional parameter: "ustring" is True, a Unicode string is returned """ if signbit(DecDegrees): Sign = -1 DecDegrees = abs(DecDegrees) else: Sign = 1 Degrees = int(DecDegrees) DecMinutes = round((DecDegrees - Degrees + 1e-14) * 60, 10)# add a tiny bit then round to avoid binary rounding issues if ustring: if Sign == 1: return u"%i\xb0 %.3f'"%(Degrees, DecMinutes) else: return u"-%i\xb0 %.3f'"%(Degrees, DecMinutes) else: return (Sign*float(Degrees), DecMinutes) # float to preserve -0.0 perhaps ugly but it results in pretty output -- someone smart here could probably offer a cleaner solution. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From chris.barker at noaa.gov Tue Jan 1 22:53:24 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 1 Jan 2013 19:53:24 -0800 Subject: [Numpy-discussion] 3D array problem challenging in Python In-Reply-To: <1356921614.45639691@f150.mail.ru> References: <1356867686.200644432@f373.mail.ru> <1356921614.45639691@f150.mail.ru> Message-ID: On Sun, Dec 30, 2012 at 6:40 PM, Happyman wrote: > Again the same problem here I want to optimize my codes in order to avoid > "Loop" as well as to get quick response as much as possible. BUT, it seems > really confusing, would be great to get help from Python programmers !!! > ================================== > The codes here: > ================================================================= > > import numpy as np > import scipy.special as ss > > from scipy.special import sph_jnyn,sph_jn,jv,yv > from scipy import integrate > > import time > import os > > --------------------------- > 1) Problem: no problem in this F0() function > --------------------------- > Inputs: m=5+0.4j - complex number as an example! > x= one value - float! > --------------------------- > #This function returns an, bn coefficients I don't want it to be vectorized > because it is already done. it is working well! 
> > def F0(m, x): > > nmax = np.round(2.0+x+4.0*x**(1.0/3.0)) > mx = m * x > > j_x,jd_x,y_x,yd_x = ss.sph_jnyn(nmax, x) # sph_jnyn - from > scipy special functions > > j_x = j_x[1:] > jd_x = jd_x[1:] > y_x = y_x[1:] > yd_x = yd_x[1:] > > h1_x = j_x + 1.0j*y_x > h1d_x = jd_x + 1.0j*yd_x > > j_mx,jd_mx = ss.sph_jn(nmax, mx) # sph_jn - from scipy > special functions > j_mx = j_mx[1:] > jd_mx = jd_mx[1:] > > j_xp = j_x + x*jd_x > j_mxp = j_mx + mx*jd_mx > h1_xp = h1_x + x*h1d_x > > m2 = m * m > an = (m2 * j_mx * j_xp - j_x * j_mxp)/(m2 * j_mx * h1_xp - h1_x * j_mxp) > bn = (j_mx * j_xp - j_x * j_mxp)/(j_mx * h1_xp - h1_x * j_mxp) > > return an, bn > > -------------------------------------- > 2) Problem: 1) To avoid loop > 2) To return values from the function (below) no > matter whether 'a' array or scalar! > -------------------------------------- > m=5+0.4j - for example > L = 30 - for example > a - array(one dimensional) > -------------------------------------- > > def F1(m,L,a): > > xs = pi * a / L > if(m.imag < 0.0): > m = conj(m) in this case, you can do things like: m = np.where(m.imag < 0.0, np.conj(m), m) to vectorize. > # Want to make sure we can accept single arguments or arrays > try: > xs.size > xlist = xs > except: > xlist = array(xs) here I use: xs = np.asarray(xs, dtype-the_dtype_you_want) it is essentially a no-op if xs is already an array, and will convert it if it isn't. > q=[ ] > for i,s in enumerate(xlist.flat): > > if float(s)==0.0: # To avoid a singularity at x=0 > q.append(0.0) again, look to use np.where, or "fancy indexing": ind = xs == 0.0 q[xs==0.0] = 0.0 > q.append(((L*L)/(2*pi) * (c * (an.real + bn.real > )).sum())) even if you do need the loop -- pre-allocate the result array (with np.zeros() ), and then put stuf in it -- it will should be faster than using a list. > 3) Problem: 1) I used "try" to avoid whether 'D' is singular or not!!! IS > there better way beside this? The other option is an if test -- try is faster if it's a rare occurrence, slower if it's common. > def F2(a,s): > for i,d in enumerate(Dslist.flat): # IS there any wayy to avoid from the > loop here in this case??? see above. note that using the where() or fancy indexing does mean you need to go through the loop multiple times, but still probably much faster then looping. For full-on speed for this sort of thing, Cython is a nice option. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From njs at pobox.com Wed Jan 2 06:24:10 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 2 Jan 2013 11:24:10 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: This discussion seems to have petered out without reaching consensus one way or another. This seems like an important issue, so I've opened a bug: https://github.com/numpy/numpy/issues/2878 Hopefully this way we'll at least not forget about it; also I tried to summarize the main issues there and would welcome comments. -n On Mon, Nov 12, 2012 at 7:54 PM, Matthew Brett wrote: > Hi, > > I wanted to check that everyone knows about and is happy with the > scalar casting changes from 1.6.0. > > Specifically, the rules for (array, scalar) casting have changed such > that the resulting dtype depends on the _value_ of the scalar. 
> > Mark W has documented these changes here: > > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules > http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html > http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html > > Specifically, as of 1.6.0: > > In [19]: arr = np.array([1.], dtype=np.float32) > > In [20]: (arr + (2**16-1)).dtype > Out[20]: dtype('float32') > > In [21]: (arr + (2**16)).dtype > Out[21]: dtype('float64') > > In [25]: arr = np.array([1.], dtype=np.int8) > > In [26]: (arr + 127).dtype > Out[26]: dtype('int8') > > In [27]: (arr + 128).dtype > Out[27]: dtype('int16') > > There's discussion about the changes here: > > http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html > http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html > > It seems to me that this change is hard to explain, and does what you > want only some of the time, making it a false friend. > > Is it the right behavior for numpy 2.0? > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Jan 2 09:56:15 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 2 Jan 2013 14:56:15 +0000 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions In-Reply-To: <50D4B69A.7000409@virtualmaterials.com> References: <50D4B69A.7000409@virtualmaterials.com> Message-ID: On Fri, Dec 21, 2012 at 7:20 PM, Raul Cota wrote: > Hello, > > > On Dec/2/2012 I sent an email about some meaningful speed problems I was > facing when porting our core program from Numeric (Python 2.2) to Numpy > (Python 2.6). Some of our tests went from 30 seconds to 90 seconds for > example. Hi Raul, This is great work! Sorry you haven't gotten any feedback yet -- I guess it's a busy time of year for most people; and, the way you've described your changes makes it hard for us to use our usual workflow to discuss them. > These are the actual changes to the C code, > For bottleneck (a) > > In general, > - avoid calls to PyObject_GetAttrString when I know the type is > List, None, Tuple, Float, Int, String or Unicode > > - avoid calls to PyObject_GetBuffer when I know the type is > List, None or Tuple This definitely seems like a worthwhile change. There are possible quibbles about coding style -- the macros could have better names, and would probably be better as (inline) functions instead of macros -- but that can be dealt with. Can you make a pull request on github with these changes? I guess you haven't used git before, but I think you'll find it makes things *much* easier (in particular, you'll never have to type out long awkward english descriptions of the changes you made ever again!) We have docs here: http://docs.scipy.org/doc/numpy/dev/gitwash/git_development.html and your goal is to get to the point where you can file a "pull request": http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html#asking-for-your-changes-to-be-merged-with-the-main-repo Feel free to ask on the list if you get stuck of course. > For bottleneck (b) > > b.1) > I noticed that PyFloat * Float64 resulted in an unnecessary "on the fly" > conversion of the PyFloat into a Float64 to extract its underlying C > double value. 
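One rough way to see the conversion overhead described in the quoted passage is to time scalar multiplications for the different operand combinations; a sketch using timeit, where the absolute and relative numbers will vary with machine and NumPy version, so treat it as illustrative only:

import timeit

setup = "import numpy as np; py_f = 3.14; np_f = np.float64(3.14)"
for label, stmt in [("pyfloat * pyfloat", "py_f * py_f"),
                    ("pyfloat * float64", "py_f * np_f"),
                    ("float64 * float64", "np_f * np_f")]:
    print(label, timeit.timeit(stmt, setup=setup, number=1000000))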
This happened in the function > _double_convert_to_ctype which comes from the pattern, > _ at name@_convert_to_ctype This also sounds like an excellent change, and perhaps should be extended to ints and bools as well... again, can you file a pull request? > b.2) This is the change that may not be very popular among Numpy users. > I modified Float64 operations to return a Float instead of Float64. I > could not think or see any ill effects and I got a fairly decent speed > boost. Yes, unfortunately, there's no way we'll be able to make this change upstream -- there's too much chance of it breaking people's code. (And numpy float64's do act different than python floats in at least some cases, e.g., numpy gives more powerful control over floating point error handling, see np.seterr.) But, it's almost certainly possible to optimize numpy's float64 (and friends), so that they are themselves (almost) as fast as the native python objects. And that would help all the code that uses them, not just the ones where regular python floats could be substituted instead. Have you tried profiling, say, float64 * float64 to figure out where the bottlenecks are? -n From njs at pobox.com Wed Jan 2 09:58:32 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 2 Jan 2013 14:58:32 +0000 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions In-Reply-To: References: <50D4B69A.7000409@virtualmaterials.com> Message-ID: On Wed, Jan 2, 2013 at 2:56 PM, Nathaniel Smith wrote: > On Fri, Dec 21, 2012 at 7:20 PM, Raul Cota wrote: >> b.1) >> I noticed that PyFloat * Float64 resulted in an unnecessary "on the fly" >> conversion of the PyFloat into a Float64 to extract its underlying C >> double value. This happened in the function >> _double_convert_to_ctype which comes from the pattern, >> _ at name@_convert_to_ctype > > This also sounds like an excellent change, and perhaps should be > extended to ints and bools as well... again, can you file a pull > request? Immediately after I hit 'send' I realized this might be unclear... what I mean is, please file two separate pull requests, one for the (a) changes and one for the (b.1) changes. They're logically separate so it'll be easier to review and merge them separately. -n From bahtiyor_zohidov at mail.ru Wed Jan 2 20:27:26 2013 From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=) Date: Thu, 03 Jan 2013 05:27:26 +0400 Subject: [Numpy-discussion] =?utf-8?q?3D_array_problem_challenging_in_Pyth?= =?utf-8?q?on?= In-Reply-To: References: <1356867686.200644432@f373.mail.ru> <1356921614.45639691@f150.mail.ru> Message-ID: <1357176446.246671186@f220.mail.ru> Hi Chris, Thanks a lot. I did as you advised..but, unfortunately, I could not "negotiate" with "quad" function at all. what do you think if quad function can get arrays or not? ???????, 1 ?????? 2013, 19:53 -08:00 ?? Chris Barker - NOAA Federal : >On Sun, Dec 30, 2012 at 6:40 PM, Happyman < bahtiyor_zohidov at mail.ru > wrote: > >> Again the same problem here I want to optimize my codes in order to avoid >> "Loop" as well as to get quick response as much as possible. BUT, it seems >> really confusing, would be great to get help from Python programmers !!! 
>> ================================== >> The codes here: >> ================================================================= >> >> import numpy as np >> import scipy.special as ss >> >> from scipy.special import sph_jnyn,sph_jn,jv,yv >> from scipy import integrate >> >> import time >> import os >> >> --------------------------- >> 1) Problem: no problem in this F0() function >> --------------------------- >> Inputs: m=5+0.4j - complex number as an example! >> x= one value - float! >> --------------------------- >> #This function returns an, bn coefficients I don't want it to be vectorized >> because it is already done. it is working well! >> >> def F0(m, x): >> >> nmax = np.round(2.0+x+4.0*x**(1.0/3.0)) >> mx = m * x >> >> j_x,jd_x,y_x,yd_x = ss.sph_jnyn(nmax, x) # sph_jnyn - from >> scipy special functions >> >> j_x = j_x[1:] >> jd_x = jd_x[1:] >> y_x = y_x[1:] >> yd_x = yd_x[1:] >> >> h1_x = j_x + 1.0j*y_x >> h1d_x = jd_x + 1.0j*yd_x >> >> j_mx,jd_mx = ss.sph_jn(nmax, mx) # sph_jn - from scipy >> special functions >> j_mx = j_mx[1:] >> jd_mx = jd_mx[1:] >> >> j_xp = j_x + x*jd_x >> j_mxp = j_mx + mx*jd_mx >> h1_xp = h1_x + x*h1d_x >> >> m2 = m * m >> an = (m2 * j_mx * j_xp - j_x * j_mxp)/(m2 * j_mx * h1_xp - h1_x * j_mxp) >> bn = (j_mx * j_xp - j_x * j_mxp)/(j_mx * h1_xp - h1_x * j_mxp) >> >> return an, bn >> >> -------------------------------------- >> 2) Problem: 1) To avoid loop >> 2) To return values from the function (below) no >> matter whether 'a' array or scalar! >> -------------------------------------- >> m=5+0.4j - for example >> L = 30 - for example >> a - array(one dimensional) >> -------------------------------------- >> >> def F1(m,L,a): >> >> xs = pi * a / L >> if(m.imag < 0.0): >> m = conj(m) > >in this case, you can do things like: > >m = np.where(m.imag < 0.0, np.conj(m), m) > >to vectorize. > > > > >> # Want to make sure we can accept single arguments or arrays >> try: >> xs.size >> xlist = xs >> except: >> xlist = array(xs) > >here I use: > >xs = np.asarray(xs, dtype-the_dtype_you_want) > >it is essentially a no-op if xs is already an array, and will convert >it if it isn't. > >> q=[ ] >> for i,s in enumerate(xlist.flat): >> >> if float(s)==0.0: # To avoid a singularity at x=0 >> q.append(0.0) > >again, look to use np.where, or "fancy indexing": > >ind = xs == 0.0 >q[xs==0.0] = 0.0 > >> q.append(((L*L)/(2*pi) * (c * (an.real + bn.real >> )).sum())) > >even if you do need the loop -- pre-allocate the result array (with >np.zeros() ), and then put stuf in it -- it will should be faster than >using a list. > >> 3) Problem: 1) I used "try" to avoid whether 'D' is singular or not!!! IS >> there better way beside this? > >The other option is an if test -- try is faster if it's a rare >occurrence, slower if it's common. > >> def F2(a,s): >> for i,d in enumerate(Dslist.flat): # IS there any wayy to avoid from the >> loop here in this case??? > >see above. > >note that using the where() or fancy indexing does mean you need to go >through the loop multiple times, but still probably much faster then >looping. For full-on speed for this sort of thing, Cython is a nice >option. > >-Chris > > >-- > >Christopher Barker, Ph.D. 
>Oceanographer > >Emergency Response Division >NOAA/NOS/OR&R (206) 526-6959 voice >7600 Sand Point Way NE (206) 526-6329 fax >Seattle, WA 98115 (206) 526-6317 main reception > >Chris.Barker at noaa.gov >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Thu Jan 3 10:54:25 2013 From: robince at gmail.com (Robin) Date: Thu, 3 Jan 2013 15:54:25 +0000 Subject: [Numpy-discussion] test failures when embedded (in matlab) Message-ID: Hi All, When using Numpy from an embedded Python (Python embedded in a Matlab mex function) I get a lot of test failures (see attached log). I am using CentOS 6.3, distribution packaged Python (2.6) and Numpy (1.4.1). Running numpy tests from a normal Python interpreter results in no errors or failures. Most of the failures look to be to do with errors calling Fortran functions - is it possible there is some linking / ABI problem? Would there be any way to overcome it? I get similar errors using EPD 7.3. Any advice appreciated, Cheers Robin -------------- next part -------------- A non-text attachment was scrubbed... Name: embed_numpy_test.log Type: application/octet-stream Size: 78041 bytes Desc: not available URL: From njs at pobox.com Thu Jan 3 15:06:57 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 3 Jan 2013 20:06:57 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: On Wed, Jan 2, 2013 at 11:24 AM, Nathaniel Smith wrote: > This discussion seems to have petered out without reaching consensus > one way or another. This seems like an important issue, so I've opened > a bug: > https://github.com/numpy/numpy/issues/2878 > Hopefully this way we'll at least not forget about it; also I tried to > summarize the main issues there and would welcome comments. Consensus in that bug report seems to be that for array/scalar operations like: np.array([1], dtype=np.int8) + 1000 # can't be represented as an int8! we should raise an error, rather than either silently upcasting the result (as in 1.6 and 1.7) or silently downcasting the scalar (as in 1.5 and earlier). The next question is how to handle the warning period, or if there should be a warning period, given that we've already silently changed the semantics of this operation, so raising a warning now is perhaps like noticing that the horses are gone and putting up a notice warning that we plan to close the barn door shortly. But then again, people who have already adjusted their code for 1.6 may appreciate such a warning. Or maybe no-one actually writes dumb things like int8-plus-1000 so it doesn't matter, but anyway I thought the list should have a heads-up :-) -n From andrew.collette at gmail.com Thu Jan 3 18:39:37 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Thu, 3 Jan 2013 16:39:37 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: > Consensus in that bug report seems to be that for array/scalar operations like: > np.array([1], dtype=np.int8) + 1000 # can't be represented as an int8! > we should raise an error, rather than either silently upcasting the > result (as in 1.6 and 1.7) or silently downcasting the scalar (as in > 1.5 and earlier). 
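A short snippet shows which of these behaviours the installed NumPy actually implements; the comments summarise the cases discussed in this thread rather than guaranteed output:

import numpy as np

a = np.array([1], dtype=np.int8)
r = a + 1000   # 1000 cannot be represented as int8
# numpy <= 1.5: the scalar is silently wrapped and the result stays int8;
# numpy 1.6/1.7: the result is upcast (to int16 here);
# under the proposal above, this addition would raise an error instead.
print(r, r.dtype)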
I have run into this a few times as a NumPy user, and I just wanted to comment that (in my opinion), having this case generate an error is the worst of both worlds. The reason people can't decide between rollover and promotion is because neither is objectively better. One avoids memory inflation, and the other avoids losing precision. You just need to pick one and document it. Kicking the can down the road to the user, and making him/her explicitly test for this condition, is not a very good solution. What does this mean in practical terms for NumPy users? I personally don't relish the choice of always using numpy.add, or always wrapping my additions in checks for ValueError. Andrew From d.s.seljebotn at astro.uio.no Thu Jan 3 19:11:21 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 04 Jan 2013 01:11:21 +0100 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: <50E61E29.1020709@astro.uio.no> On 01/04/2013 12:39 AM, Andrew Collette wrote: >> Consensus in that bug report seems to be that for array/scalar operations like: >> np.array([1], dtype=np.int8) + 1000 # can't be represented as an int8! >> we should raise an error, rather than either silently upcasting the >> result (as in 1.6 and 1.7) or silently downcasting the scalar (as in >> 1.5 and earlier). > > I have run into this a few times as a NumPy user, and I just wanted to > comment that (in my opinion), having this case generate an error is > the worst of both worlds. The reason people can't decide between > rollover and promotion is because neither is objectively better. One If neither is objectively better, I think that is a very good reason to kick it down to the user. "Explicit is better than implicit". > avoids memory inflation, and the other avoids losing precision. You > just need to pick one and document it. Kicking the can down the road > to the user, and making him/her explicitly test for this condition, is > not a very good solution. It's a good solution to encourage bug-free code. It may not be a good solution to avoid typing. > What does this mean in practical terms for NumPy users? I personally > don't relish the choice of always using numpy.add, or always wrapping > my additions in checks for ValueError. I think you usually have a bug in your program when this happens, since either the dtype is wrong, or the value one is trying to store is wrong. I know that's true for myself, though I don't claim to know everybody elses usecases. Dag Sverre From njs at pobox.com Thu Jan 3 19:26:46 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jan 2013 00:26:46 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: On 3 Jan 2013 23:39, "Andrew Collette" wrote: > > > Consensus in that bug report seems to be that for array/scalar operations like: > > np.array([1], dtype=np.int8) + 1000 # can't be represented as an int8! > > we should raise an error, rather than either silently upcasting the > > result (as in 1.6 and 1.7) or silently downcasting the scalar (as in > > 1.5 and earlier). > > I have run into this a few times as a NumPy user, and I just wanted to > comment that (in my opinion), having this case generate an error is > the worst of both worlds. The reason people can't decide between > rollover and promotion is because neither is objectively better. One > avoids memory inflation, and the other avoids losing precision. 
You > just need to pick one and document it. Kicking the can down the road > to the user, and making him/her explicitly test for this condition, is > not a very good solution. > > What does this mean in practical terms for NumPy users? I personally > don't relish the choice of always using numpy.add, or always wrapping > my additions in checks for ValueError. To be clear: we're only talking here about the case where you have a mix of a narrow dtype in an array and a scalar value that cannot be represented in that narrow dtype. If both sides are arrays then we continue to upcast as normal. So my impression is that this means very little in practical terms, because this is a rare and historically poorly supported situation. But if this is something you're running into in practice then you may have a better idea than us about the practical effects. Do you have any examples where this has come up that you can share? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Thu Jan 3 19:39:50 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 4 Jan 2013 00:39:50 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: <50E61E29.1020709@astro.uio.no> References: <50E61E29.1020709@astro.uio.no> Message-ID: On Fri, Jan 4, 2013 at 12:11 AM, Dag Sverre Seljebotn wrote: > On 01/04/2013 12:39 AM, Andrew Collette wrote: > > Nathaniel Smith wrote: > >> Consensus in that bug report seems to be that for array/scalar operations like: > >> np.array([1], dtype=np.int8) + 1000 # can't be represented as an int8! > >> we should raise an error, rather than either silently upcasting the > >> result (as in 1.6 and 1.7) or silently downcasting the scalar (as in > >> 1.5 and earlier). > > > > I have run into this a few times as a NumPy user, and I just wanted to > > comment that (in my opinion), having this case generate an error is > > the worst of both worlds. The reason people can't decide between > > rollover and promotion is because neither is objectively better. One > > If neither is objectively better, I think that is a very good reason to > kick it down to the user. "Explicit is better than implicit". > > > avoids memory inflation, and the other avoids losing precision. You > > just need to pick one and document it. Kicking the can down the road > > to the user, and making him/her explicitly test for this condition, is > > not a very good solution. > > It's a good solution to encourage bug-free code. It may not be a good > solution to avoid typing. > > > What does this mean in practical terms for NumPy users? I personally > > don't relish the choice of always using numpy.add, or always wrapping > > my additions in checks for ValueError. > > I think you usually have a bug in your program when this happens, since > either the dtype is wrong, or the value one is trying to store is wrong. > I know that's true for myself, though I don't claim to know everybody > elses usecases. I agree with Dag rather than Andrew, "Explicit is better than implicit". i.e. What Nathaniel described earlier as the apparent consensus. Since I've actually used NumPy arrays with specific low memory types, I thought I should comment about my use case if case it is helpful: I've only used the low precision types like np.uint8 (unsigned) where I needed to limit my memory usage. In this case, the topology of a graph allowing multiple edges held as an integer adjacency matrix, A. 
I would calculate things like A^n for paths of length n, and also make changes to A directly (e.g. adding edges). So an overflow was always possible, and neither the old behaviour (type preserving but wrapping on overflow giving data corruption) nor the current behaviour (type promotion overriding my deliberate memory management) are nice. My preferences here would be for an exception, so I knew right away. The other use case which comes to mind is dealing with low level libraries and/or file formats, and here automagic type promotion would probably be unwelcome. Regards, Peter From njs at pobox.com Thu Jan 3 19:49:24 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jan 2013 00:49:24 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On 4 Jan 2013 00:39, "Peter Cock" wrote: > I agree with Dag rather than Andrew, "Explicit is better than implicit". > i.e. What Nathaniel described earlier as the apparent consensus. > > Since I've actually used NumPy arrays with specific low memory > types, I thought I should comment about my use case if case it > is helpful: > > I've only used the low precision types like np.uint8 (unsigned) where > I needed to limit my memory usage. In this case, the topology of a > graph allowing multiple edges held as an integer adjacency matrix, A. > I would calculate things like A^n for paths of length n, and also make > changes to A directly (e.g. adding edges). So an overflow was always > possible, and neither the old behaviour (type preserving but wrapping > on overflow giving data corruption) nor the current behaviour (type > promotion overriding my deliberate memory management) are nice. > My preferences here would be for an exception, so I knew right away. I don't think the changes we're talking about here will help your use case actually; this is only about the specific case where one of your operands, itself, cannot be cleanly cast to the types being used for the operation - it won't detect overflow in general. For that you want #593: https://github.com/numpy/numpy/issues/593 On another note, while you're here, perhaps I can tempt you into having a go at fixing #593? :-) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Thu Jan 3 20:04:16 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 4 Jan 2013 01:04:16 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Fri, Jan 4, 2013 at 12:39 AM, Peter Cock wrote: >> Since I've actually used NumPy arrays with specific low memory >> types, I thought I should comment about my use case if case it >> is helpful: >> >> I've only used the low precision types like np.uint8 (unsigned) where >> I needed to limit my memory usage. In this case, the topology of a >> graph allowing multiple edges held as an integer adjacency matrix, A. >> I would calculate things like A^n for paths of length n, and also make >> changes to A directly (e.g. adding edges). So an overflow was always >> possible, and neither the old behaviour (type preserving but wrapping >> on overflow giving data corruption) nor the current behaviour (type >> promotion overriding my deliberate memory management) are nice. >> My preferences here would be for an exception, so I knew right away. 
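Until NumPy can raise on integer overflow (issue #593 mentioned above), one workaround for this kind of adjacency-matrix calculation is to repeat a step in a wider dtype and compare against the narrow type's range; a sketch only, not a general or efficient solution:

import numpy as np

A = np.random.randint(0, 3, size=(100, 100)).astype(np.uint8)

paths = A.dot(A)   # number of length-2 paths; may silently wrap in uint8
wide = A.astype(np.uint64).dot(A.astype(np.uint64))
if (wide > np.iinfo(np.uint8).max).any():
    raise OverflowError("uint8 adjacency product overflowed")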
>> >> The other use case which comes to mind is dealing with low level >> libraries and/or file formats, and here automagic type promotion >> would probably be unwelcome. > > Regards, > > Peter Elsewhere on the thread, Nathaniel Smith wrote: > > To be clear: we're only talking here about the case where you have a mix of > a narrow dtype in an array and a scalar value that cannot be represented in > that narrow dtype. If both sides are arrays then we continue to upcast as > normal. So my impression is that this means very little in practical terms, > because this is a rare and historically poorly supported situation. > > But if this is something you're running into in practice then you may have a > better idea than us about the practical effects. Do you have any examples > where this has come up that you can share? > > -n Clarification appreciated - on closer inspection for my adjacency matrix example I would not fall over the issue in https://github.com/numpy/numpy/issues/2878 >>> import numpy as np >>> np.__version__ '1.6.1' >>> A = np.zeros((100,100), np.uint8) # Matrix could be very big >>> A[3,4] = 255 # Max value, setting up next step in example >>> A[3,4] 255 >>> A[3,4] += 1 # Silently overflows on NumPy 1.6 >>> A[3,4] 0 To trigger the contentious behaviour I'd have to do something like this: >>> A = np.zeros((100,100), np.uint8) >>> B = A + 256 >>> B array([[256, 256, 256, ..., 256, 256, 256], [256, 256, 256, ..., 256, 256, 256], [256, 256, 256, ..., 256, 256, 256], ..., [256, 256, 256, ..., 256, 256, 256], [256, 256, 256, ..., 256, 256, 256], [256, 256, 256, ..., 256, 256, 256]], dtype=uint16) I wasn't doing anything like that in my code though, just simple matrix multiplication and in situ element modification, for example A[i,j] += 1 to add an edge. I still agree that for https://github.com/numpy/numpy/issues/2878 an exception sounds sensible. Peter From andrew.collette at gmail.com Thu Jan 3 20:15:41 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Thu, 3 Jan 2013 18:15:41 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: <50E61E29.1020709@astro.uio.no> References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Dag, > If neither is objectively better, I think that is a very good reason to > kick it down to the user. "Explicit is better than implicit". I agree with you, up to a point. However, we are talking about an extremely common operation that I think most people (myself included) would not expect to raise an exception: namely, adding a number to an array. > It's a good solution to encourage bug-free code. It may not be a good > solution to avoid typing. Ha! But seriously, checking every time I make an addition? And in the current version of numpy it's not buggy code to add 128 to an int8 array; it's documented to give you an int16 with the result of the addition. Maybe it shouldn't, but that's what it does. > I think you usually have a bug in your program when this happens, since > either the dtype is wrong, or the value one is trying to store is wrong. > I know that's true for myself, though I don't claim to know everybody > elses usecases. I don't think it's unreasonable to add a number to an int16 array (or int32), and rely on specific, documented behavior if the number is outside the range. For example, IDL will clip the value. Up until 1.6, in NumPy it would roll over. Currently it upcasts. 
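Whichever default NumPy settles on, each of the three behaviours mentioned here (upcast, rollover, IDL-style clipping) can be requested explicitly; a sketch, with int8 and the value 130 as arbitrary examples:

import numpy as np

a = np.array([100, 120], dtype=np.int8)
lo, hi = np.iinfo(np.int8).min, np.iinfo(np.int8).max

up = a.astype(np.int16) + 130                    # explicit upcast: int16 result
wrap = a + np.array(130).astype(np.int8)         # explicit rollover: 130 wraps to -126
clip = np.clip(a.astype(np.int16) + 130, lo, hi).astype(np.int8)   # explicit clip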
I won't make the case for upcasting vs rollover again, as I think that's dealt with extensively in the threads linked in the bug. I am concerned about the tests I need to add wherever I might have a scalar, or the program blows up. It occurs to me that, if I have "a = b + c" in my code, and "c" is sometimes a scalar and sometimes an array, I will get different behavior. If I have this right, if "c" is an array of larger dtype, including a 1-element array, it will upcast, if it's the same dtype, it will roll over regardless, but if it's a scalar and the result won't fit, it will raise ValueError. By the way, how do I test for this? I can't test just the scalar because the proposed behavior (as I understand it) considers the result of the addition. Should I always compute amax (nanmax)? Do I need to try adding them and look for ValueError? And things like this suddenly become dangerous: try: some_function(myarray + something) except ValueError: print "Problem in some_function!" Nathaniel asked: > But if this is something you're running into in practice then you may have a better idea than us about the practical effects. Do you have any examples where this has come up that you can share? The only time I really ran into the 1.5/1.6 change was some old code ported from IDL which did odd things with the wrapping behavior. But what I'm really trying to get a handle on here is the proposed future behavior. I am coming to this from the perspective of both a user and a library developer (h5py) trying to work out what if anything I have to do when handling arrays and values I get from users. Andrew From p.j.a.cock at googlemail.com Thu Jan 3 20:17:42 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 4 Jan 2013 01:17:42 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Fri, Jan 4, 2013 at 12:49 AM, Nathaniel Smith wrote: > On 4 Jan 2013 00:39, "Peter Cock" wrote: >> I agree with Dag rather than Andrew, "Explicit is better than implicit". >> i.e. What Nathaniel described earlier as the apparent consensus. >> >> Since I've actually used NumPy arrays with specific low memory >> types, I thought I should comment about my use case if case it >> is helpful: >> >> I've only used the low precision types like np.uint8 (unsigned) where >> I needed to limit my memory usage. In this case, the topology of a >> graph allowing multiple edges held as an integer adjacency matrix, A. >> I would calculate things like A^n for paths of length n, and also make >> changes to A directly (e.g. adding edges). So an overflow was always >> possible, and neither the old behaviour (type preserving but wrapping >> on overflow giving data corruption) nor the current behaviour (type >> promotion overriding my deliberate memory management) are nice. >> My preferences here would be for an exception, so I knew right away. > > I don't think the changes we're talking about here will help your use case > actually; this is only about the specific case where one of your operands, > itself, cannot be cleanly cast to the types being used for the operation - Understood - I replied to your other message before I saw this one. > it won't detect overflow in general. For that you want #593: > https://github.com/numpy/numpy/issues/593 > > On another note, while you're here, perhaps I can tempt you into having a go > at fixing #593? :-) > > -n I agree, and have commented on that issue. 
Thanks for pointing me to that separate issue. Peter From shish at keba.be Thu Jan 3 21:39:05 2013 From: shish at keba.be (Olivier Delalleau) Date: Thu, 3 Jan 2013 21:39:05 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: 2013/1/3 Andrew Collette : > Hi Dag, > >> If neither is objectively better, I think that is a very good reason to >> kick it down to the user. "Explicit is better than implicit". > > I agree with you, up to a point. However, we are talking about an > extremely common operation that I think most people (myself included) > would not expect to raise an exception: namely, adding a number to an > array. > >> It's a good solution to encourage bug-free code. It may not be a good >> solution to avoid typing. > > Ha! But seriously, checking every time I make an addition? And in > the current version of numpy it's not buggy code to add 128 to an int8 > array; it's documented to give you an int16 with the result of the > addition. Maybe it shouldn't, but that's what it does. > >> I think you usually have a bug in your program when this happens, since >> either the dtype is wrong, or the value one is trying to store is wrong. >> I know that's true for myself, though I don't claim to know everybody >> elses usecases. > > I don't think it's unreasonable to add a number to an int16 array (or > int32), and rely on specific, documented behavior if the number is > outside the range. For example, IDL will clip the value. Up until > 1.6, in NumPy it would roll over. Currently it upcasts. > > I won't make the case for upcasting vs rollover again, as I think > that's dealt with extensively in the threads linked in the bug. I am > concerned about the tests I need to add wherever I might have a > scalar, or the program blows up. > > It occurs to me that, if I have "a = b + c" in my code, and "c" is > sometimes a scalar and sometimes an array, I will get different > behavior. If I have this right, if "c" is an array of larger dtype, > including a 1-element array, it will upcast, if it's the same dtype, > it will roll over regardless, but if it's a scalar and the result > won't fit, it will raise ValueError. > > By the way, how do I test for this? I can't test just the scalar > because the proposed behavior (as I understand it) considers the > result of the addition. Should I always compute amax (nanmax)? Do I > need to try adding them and look for ValueError? > > And things like this suddenly become dangerous: > > try: > some_function(myarray + something) > except ValueError: > print "Problem in some_function!" Actually, the proposed behavior considers only the value of the scalar, not the result of the addition. So the correct way to do things with this proposal would be to be sure you don't add to an array a scalar value that can't fit in the array's dtype. In 1.6.1, you should make this check anyway, since otherwise your computation can be doing something completely different without telling you (and I doubt it's what you'd want): In [50]: np.array([2], dtype='int8') + 127 Out[50]: array([-127], dtype=int8) In [51]: np.array([2], dtype='int8') + 128 Out[51]: array([130], dtype=int16) If the decision is to always roll-over, the first thing to decide is whether this means the scalar is downcasted, or the output of the computation. 
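The distinction drawn here (downcast the scalar before the operation, versus compute at higher precision and downcast the result) is easiest to see with the "maximum" example discussed next; a sketch of the two candidate semantics:

import numpy as np

a = np.ones(1, dtype=np.int8)

# Downcast the scalar first: 128 becomes -128 as int8, so the result is [1].
np.maximum(a, np.array(128).astype(np.int8))

# Compute wide, then downcast the output: max(1, 128) is 128,
# which wraps to -128 when forced back into int8, giving [-128].
np.maximum(a.astype(np.int16), 128).astype(np.int8)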
It doesn't matter for +, but for instance for the "maximum" ufunc, I don't think it makes sense to perform the computation at higher precision then downcast the output, as you would otherwise have: np.maximum(np.ones(1, dtype='int8'), 128)) == [-128] So out of consistency (across ufuncs) I think it should always downcast the scalar (it has the advantage of being more efficient too, since you don't need to do an upcast to perform the computation). But then you're up for some nasty surprise if your scalar overflows and you didn't expect it. For instance the "maximum" example above would return [1], which may be expected... or not (maybe you wanted to obtain [128] instead?). Another solution is to forget about trying to be smart and always upcast the operation. That would be my 2nd preferred solution, but it would make it very annoying to deal with Python scalars (typically int64 / float64) that would be upcasting lots of things, potentially breaking a significant amount of existing code. So, personally, I don't see a straightforward solution without warning/error, that would be safe enough for programmers. -=- Olivier > > Nathaniel asked: > >> But if this is something you're running into in practice then you may have a better idea than us about the practical effects. Do you have any examples where this has come up that you can share? > > The only time I really ran into the 1.5/1.6 change was some old code > ported from IDL which did odd things with the wrapping behavior. But > what I'm really trying to get a handle on here is the proposed future > behavior. I am coming to this from the perspective of both a user and > a library developer (h5py) trying to work out what if anything I have > to do when handling arrays and values I get from users. > > Andrew From andrew.collette at gmail.com Thu Jan 3 22:35:57 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Thu, 3 Jan 2013 20:35:57 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Olivier, > Another solution is to forget about trying to be smart and always > upcast the operation. That would be my 2nd preferred solution, but it > would make it very annoying to deal with Python scalars (typically > int64 / float64) that would be upcasting lots of things, potentially > breaking a significant amount of existing code. > > So, personally, I don't see a straightforward solution without > warning/error, that would be safe enough for programmers. I guess what's really confusing me here is that I had assumed that this: result = myarray + scalar was equivalent to this: result = myarray + numpy.array(scalar) where the dtype of the converted scalar was chosen to be "just big enough" for it to fit. Then you proceed using the normal rules for array addition. Yes, you can have upcasting or rollover depending on the values involved, but you have that anyway with array addition; it's just how arrays work in NumPy. Also, have I got this (proposed behavior) right? array([127], dtype=int8) + 128 -> ValueError array([127], dtype=int8) + 127 -> -2 It seems like all this does is raise an error when the current rules would require upcasting, but still allows rollover for smaller values. What error condition, specifically, is the ValueError designed to tell me about? You can still get "unexpected" data (if you're not expecting rollover) with no exception. 
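On the practical question of how to test a scalar before adding it: under the proposal the check depends only on the scalar's value and the array's dtype, not on the result of the addition, so a guard can be written without computing anything. A sketch for integer dtypes (scalar_fits is just an illustrative helper):

import numpy as np

def scalar_fits(value, dtype):
    # True if `value` is representable in the integer dtype.
    info = np.iinfo(dtype)
    return info.min <= value <= info.max

a = np.array([1, 2, 3], dtype=np.int16)
scalar_fits(32767, a.dtype)   # True: a + 32767 stays int16 (and may wrap)
scalar_fits(32768, a.dtype)   # False: upcasts today, would raise under the proposal

# With the value-based casting rules of numpy 1.6/1.7, np.can_cast
# expresses the same test directly:
np.can_cast(32768, a.dtype)   # False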
Andrew From ondrej.certik at gmail.com Fri Jan 4 00:16:33 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 3 Jan 2013 21:16:33 -0800 Subject: [Numpy-discussion] test failures when embedded (in matlab) In-Reply-To: References: Message-ID: On Thu, Jan 3, 2013 at 7:54 AM, Robin wrote: > Hi All, > > When using Numpy from an embedded Python (Python embedded in a Matlab > mex function) I get a lot of test failures (see attached log). > > I am using CentOS 6.3, distribution packaged Python (2.6) and Numpy > (1.4.1). Running numpy tests from a normal Python interpreter results > in no errors or failures. > > Most of the failures look to be to do with errors calling Fortran > functions - is it possible there is some linking / ABI problem? Would > there be any way to overcome it? > > I get similar errors using EPD 7.3. > > Any advice appreciated, In your log I can see failures of the type: ====================================================================== ERROR: test_cdouble (test_linalg.TestDet) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python2.6/site-packages/numpy/linalg/tests/test_linalg.py", line 44, in test_cdouble self.do(a, b) File "/usr/lib64/python2.6/site-packages/numpy/linalg/tests/test_linalg.py", line 129, in do d = linalg.det(a) File "/usr/lib64/python2.6/site-packages/numpy/linalg/linalg.py", line 1503, in det raise TypeError, "Illegal input to Fortran routine" TypeError: Illegal input to Fortran routine So I can only offer a general advice, that I learned while fixing release critical bugs in NumPy: I would look into the source file numpy/linalg/linalg.py, line 1503 and start debugging to figure out why the TypeError is raised. Which exact numpy do you use? In the latest master, the line numbers are different and the det() routine seems to be reworked. But in general, you can see there a code like this: results = lapack_routine(n, n, a, n, pivots, 0) info = results['info'] if (info < 0): raise TypeError("Illegal input to Fortran routine") so that typically means that some wrong argument are being passed to the Lapack routine. Try to print the "info" variable and then lookup the Lapack documentation, it should say more (e.g. which exact argument is wrong). Then you can go from there, e.g. I would put some debug print statements into the code which gets called in lapack_routine(), i.e. is it lapack_lite from NumPy, or some other Lapack implementation? And so on. Ondrej From mike.r.anderson.13 at gmail.com Fri Jan 4 01:29:39 2013 From: mike.r.anderson.13 at gmail.com (Mike Anderson) Date: Fri, 4 Jan 2013 14:29:39 +0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design Message-ID: Hello all, In the Clojure community there has been some discussion about creating a common matrix maths library / API. Currently there are a few different fledgeling matrix libraries in Clojure, so it seemed like a worthwhile effort to unify them and have a common base on which to build on. NumPy has been something of an inspiration for this, so I though I'd ask here to see what lessons have been learned. We're thinking of a matrix library with roughly the following design (subject to change!) - Support for multi-dimensional matrices (but with fast paths for 1D vectors and 2D matrices as the common cases) - Immutability by default, i.e. matrix operations are pure functions that create new matrices. 
There could be a "backdoor" option to mutate matrices, but that would be unidiomatic in Clojure - Support for 64-bit double precision floats only (this is the standard float type in Clojure) - Ability to support multiple different back-end matrix implementations (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.) - A full range of matrix operations. Operations would be delegated to back end implementations where they are supported, otherwise generic implementations could be used. Any thoughts on this topic based on the NumPy experience? In particular would be very interesting to know: - Features in NumPy which proved to be redundant / not worth the effort - Features that you wish had been designed in at the start - Design decisions that turned out to be a particularly big mistake / success Would love to hear your insights, any ideas+advice greatly appreciated! Mike. -------------- next part -------------- An HTML attachment was scrubbed... URL: From raul at virtualmaterials.com Fri Jan 4 01:50:37 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Thu, 03 Jan 2013 23:50:37 -0700 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions In-Reply-To: References: <50D4B69A.7000409@virtualmaterials.com> Message-ID: <50E67BBD.7090804@virtualmaterials.com> On 02/01/2013 7:56 AM, Nathaniel Smith wrote: > On Fri, Dec 21, 2012 at 7:20 PM, Raul Cota wrote: >> Hello, >> >> >> On Dec/2/2012 I sent an email about some meaningful speed problems I was >> facing when porting our core program from Numeric (Python 2.2) to Numpy >> (Python 2.6). Some of our tests went from 30 seconds to 90 seconds for >> example. > > Hi Raul, > > This is great work! Sorry you haven't gotten any feedback yet -- I > guess it's a busy time of year for most people; and, the way you've > described your changes makes it hard for us to use our usual workflow > to discuss them. > Sorry about that. >> These are the actual changes to the C code, >> For bottleneck (a) >> >> In general, >> - avoid calls to PyObject_GetAttrString when I know the type is >> List, None, Tuple, Float, Int, String or Unicode >> >> - avoid calls to PyObject_GetBuffer when I know the type is >> List, None or Tuple > > This definitely seems like a worthwhile change. There are possible > quibbles about coding style -- the macros could have better names, and > would probably be better as (inline) functions instead of macros -- > but that can be dealt with. > > Can you make a pull request on github with these changes? I guess you > haven't used git before, but I think you'll find it makes things > *much* easier (in particular, you'll never have to type out long > awkward english descriptions of the changes you made ever again!) We > have docs here: > http://docs.scipy.org/doc/numpy/dev/gitwash/git_development.html > and your goal is to get to the point where you can file a "pull request": > http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html#asking-for-your-changes-to-be-merged-with-the-main-repo > Feel free to ask on the list if you get stuck of course. > >> For bottleneck (b) >> >> b.1) >> I noticed that PyFloat * Float64 resulted in an unnecessary "on the fly" >> conversion of the PyFloat into a Float64 to extract its underlying C >> double value. This happened in the function >> _double_convert_to_ctype which comes from the pattern, >> _ at name@_convert_to_ctype > > This also sounds like an excellent change, and perhaps should be > extended to ints and bools as well... 
again, can you file a pull > request? > >> b.2) This is the change that may not be very popular among Numpy users. >> I modified Float64 operations to return a Float instead of Float64. I >> could not think or see any ill effects and I got a fairly decent speed >> boost. > > Yes, unfortunately, there's no way we'll be able to make this change > upstream -- there's too much chance of it breaking people's code. (And > numpy float64's do act different than python floats in at least some > cases, e.g., numpy gives more powerful control over floating point > error handling, see np.seterr.) I thought so. I may keep a fork of the changes for myself. > > But, it's almost certainly possible to optimize numpy's float64 (and > friends), so that they are themselves (almost) as fast as the native > python objects. And that would help all the code that uses them, not > just the ones where regular python floats could be substituted > instead. Have you tried profiling, say, float64 * float64 to figure > out where the bottlenecks are? > Seems to be split between - (primarily) the memory allocation/deallocation of the float64 that is created from the operation float64 * float64. This is the reason why float64 * Pyfloat got improved with one of my changes because PyFloat was being internally converted into a float64 before doing the multiplication. - the rest of the time is the actual multiplication path way. I attach an image of the profiler using the original numpy code with a loop on val = float64 * float64 * float64 * float64 Let me know if something is not clear. Raul > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: numpy_prof.png Type: image/png Size: 39190 bytes Desc: not available URL: From raul at virtualmaterials.com Fri Jan 4 01:56:03 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Thu, 03 Jan 2013 23:56:03 -0700 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions In-Reply-To: References: <50D4B69A.7000409@virtualmaterials.com> Message-ID: <50E67D03.6000403@virtualmaterials.com> On 02/01/2013 7:58 AM, Nathaniel Smith wrote: > On Wed, Jan 2, 2013 at 2:56 PM, Nathaniel Smith wrote: >> On Fri, Dec 21, 2012 at 7:20 PM, Raul Cota wrote: >>> b.1) >>> I noticed that PyFloat * Float64 resulted in an unnecessary "on the fly" >>> conversion of the PyFloat into a Float64 to extract its underlying C >>> double value. This happened in the function >>> _double_convert_to_ctype which comes from the pattern, >>> _ at name@_convert_to_ctype >> >> This also sounds like an excellent change, and perhaps should be >> extended to ints and bools as well... again, can you file a pull >> request? > > Immediately after I hit 'send' I realized this might be unclear... > what I mean is, please file two separate pull requests, one for the > (a) changes and one for the (b.1) changes. They're logically separate > so it'll be easier to review and merge them separately. > I understood it like that :) I will give it a try. 
Thanks for the feedback Raul > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From d.s.seljebotn at astro.uio.no Fri Jan 4 03:00:42 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 04 Jan 2013 09:00:42 +0100 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: Message-ID: <50E68C2A.9060400@astro.uio.no> On 01/04/2013 07:29 AM, Mike Anderson wrote: > Hello all, > > In the Clojure community there has been some discussion about creating a > common matrix maths library / API. Currently there are a few different > fledgeling matrix libraries in Clojure, so it seemed like a worthwhile > effort to unify them and have a common base on which to build on. > > NumPy has been something of an inspiration for this, so I though I'd ask > here to see what lessons have been learned. > > We're thinking of a matrix library with roughly the following design > (subject to change!) > - Support for multi-dimensional matrices (but with fast paths for 1D > vectors and 2D matrices as the common cases) Food for thought: Myself I have vectors that are naturally stored in 2D, "matrices" that can be naturally stored in 4D and so on (you can't view them that way when doing linear algebra, it's just that the indices can have multiple components) -- I like that NumPy calls everything "array"; I think vector and matrix are higher-level mathematical concepts. > - Immutability by default, i.e. matrix operations are pure functions > that create new matrices. There could be a "backdoor" option to mutate > matrices, but that would be unidiomatic in Clojure Sounds very promising (assuming you can reuse the buffer if the input matrix had no other references and is not used again?). It's very common for NumPy arrays to fill a large chunk of the available memory (think 20-100 GB), so for those users this would need to be coupled with buffer reuse and good diagnostics that help remove references to old generations of a matrix. > - Support for 64-bit double precision floats only (this is the standard > float type in Clojure) > - Ability to support multiple different back-end matrix implementations > (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.) > - A full range of matrix operations. Operations would be delegated to > back end implementations where they are supported, otherwise generic > implementations could be used. > > Any thoughts on this topic based on the NumPy experience? In particular > would be very interesting to know: > - Features in NumPy which proved to be redundant / not worth the effort > - Features that you wish had been designed in at the start > - Design decisions that turned out to be a particularly big mistake / > success > > Would love to hear your insights, any ideas+advice greatly appreciated! Travis Oliphant noted some of his thoughts on this in the recent thread "DARPA funding for Blaze and passing the NumPy torch" which is a must-read. 
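(For what it's worth, the closest NumPy itself gets to that kind of
explicit buffer reuse is the out= argument of the ufuncs -- a rough
sketch:

import numpy as np
a = np.ones((1000, 1000))
b = np.ones((1000, 1000))
c = a + b            # allocates a brand new result array
np.add(a, b, out=a)  # writes the result into a's existing buffer

-- by default every operation allocates a fresh result, which is
exactly what hurts at the 20-100 GB scale.)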
Dag Sverre From d.s.seljebotn at astro.uio.no Fri Jan 4 03:13:06 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 04 Jan 2013 09:13:06 +0100 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: <50E68C2A.9060400@astro.uio.no> References: <50E68C2A.9060400@astro.uio.no> Message-ID: <50E68F12.90804@astro.uio.no> On 01/04/2013 09:00 AM, Dag Sverre Seljebotn wrote: > On 01/04/2013 07:29 AM, Mike Anderson wrote: >> Hello all, >> >> In the Clojure community there has been some discussion about creating a >> common matrix maths library / API. Currently there are a few different >> fledgeling matrix libraries in Clojure, so it seemed like a worthwhile >> effort to unify them and have a common base on which to build on. >> >> NumPy has been something of an inspiration for this, so I though I'd ask >> here to see what lessons have been learned. >> >> We're thinking of a matrix library with roughly the following design >> (subject to change!) >> - Support for multi-dimensional matrices (but with fast paths for 1D >> vectors and 2D matrices as the common cases) > > Food for thought: Myself I have vectors that are naturally stored in 2D, > "matrices" that can be naturally stored in 4D and so on (you can't view > them that way when doing linear algebra, it's just that the indices can > have multiple components) -- I like that NumPy calls everything "array"; > I think vector and matrix are higher-level mathematical concepts. > >> - Immutability by default, i.e. matrix operations are pure functions >> that create new matrices. There could be a "backdoor" option to mutate >> matrices, but that would be unidiomatic in Clojure > > Sounds very promising (assuming you can reuse the buffer if the input > matrix had no other references and is not used again?). It's very common > for NumPy arrays to fill a large chunk of the available memory (think > 20-100 GB), so for those users this would need to be coupled with buffer > reuse and good diagnostics that help remove references to old > generations of a matrix. Oh: Depending on your amibitions, it's worth thinking hard about i) storage format, and ii) lazy evaluation. Storage format: The new trend is for more flexible formats than just column-major/row-major, e.g., storing cache-sized n-dimensional tiles. Lazy evaluation: The big problem with numpy is that "a + b + np.sqrt(c)" will first make a temporary result for "a + b", rather than doing the whole expression on the fly, which is *very* bad for performance. So if you want immutability, I urge you to consider every operation to build up an expression tree/"program", and then either find out the smart points where you interpret that program automatically, or make explicit eval() of an expression tree the default mode. Of course this depends all on how ambitious you are. It's probably best to have a look at all the projects designed in order to get around NumPy's short-comings: - Blaze (in development, continuum.io) - Theano - Numexpr Related: - HDF chunks - To some degree Cython Dag Sverre > >> - Support for 64-bit double precision floats only (this is the standard >> float type in Clojure) >> - Ability to support multiple different back-end matrix implementations >> (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.) >> - A full range of matrix operations. Operations would be delegated to >> back end implementations where they are supported, otherwise generic >> implementations could be used. >> >> Any thoughts on this topic based on the NumPy experience? 
In particular >> would be very interesting to know: >> - Features in NumPy which proved to be redundant / not worth the effort >> - Features that you wish had been designed in at the start >> - Design decisions that turned out to be a particularly big mistake / >> success >> >> Would love to hear your insights, any ideas+advice greatly appreciated! > > Travis Oliphant noted some of his thoughts on this in the recent thread > "DARPA funding for Blaze and passing the NumPy torch" which is a must-read. > > Dag Sverre From matthew.brett at gmail.com Fri Jan 4 06:09:23 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 4 Jan 2013 11:09:23 +0000 Subject: [Numpy-discussion] Scalar casting rules use-case reprise Message-ID: Hi, Reading the discussion on the scalar casting rule change I realized I was hazy on the use-cases that led to the rule that scalars cast differently from arrays. My impression was that the primary use-case was for lower-precision floats. That is, when you have a large float32 arr, you do not want to double your memory use with: >>> large_float32 + 1.0 # please no float64 here Probably also: >>> large_int8 + 1 # please no int32 / int64 here. That makes sense. On the other hand these are more ambiguous: >>> large_float32 + np.float64(1) # really - you don't want float64? >>> large_int8 + np.int32(1) # ditto I wonder whether the main use-case was to deal with the automatic types of Python floats and scalars? That is, I wonder whether it would be worth considering (in the distant long term), doing fancy guess-what-you-mean stuff with Python scalars, on the basis that they are of unspecified dtype, and make 0 dimensional scalars follow the array casting rules. As in: >>> large_float32 + 1.0 # no upcast - we don't know what float type you meant for the scalar >>> large_float32 + np.float64(1) # upcast - you clearly meant the scalar to be float64 In any case, can anyone remember the original use-cases well enough to record them for future decision making? Best, Matthew From njs at pobox.com Fri Jan 4 08:46:40 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jan 2013 13:46:40 +0000 Subject: [Numpy-discussion] Scalar casting rules use-case reprise In-Reply-To: References: Message-ID: On Fri, Jan 4, 2013 at 11:09 AM, Matthew Brett wrote: > Hi, > > Reading the discussion on the scalar casting rule change I realized I > was hazy on the use-cases that led to the rule that scalars cast > differently from arrays. > > My impression was that the primary use-case was for lower-precision > floats. That is, when you have a large float32 arr, you do not want to > double your memory use with: > >>>> large_float32 + 1.0 # please no float64 here > > Probably also: > >>>> large_int8 + 1 # please no int32 / int64 here. > > That makes sense. On the other hand these are more ambiguous: > >>>> large_float32 + np.float64(1) # really - you don't want float64? > >>>> large_int8 + np.int32(1) # ditto > > I wonder whether the main use-case was to deal with the automatic > types of Python floats and scalars? That is, I wonder whether it > would be worth considering (in the distant long term), doing fancy > guess-what-you-mean stuff with Python scalars, on the basis that they > are of unspecified dtype, and make 0 dimensional scalars follow the > array casting rules. 
As in: > >>>> large_float32 + 1.0 > # no upcast - we don't know what float type you meant for the scalar >>>> large_float32 + np.float64(1) > # upcast - you clearly meant the scalar to be float64 Hmm, but consider this, which is exactly the operation in your example: In [9]: a = np.arange(3, dtype=np.float32) In [10]: a / np.mean(a) # normalize Out[10]: array([ 0., 1., 2.], dtype=float32) In [11]: type(np.mean(a)) Out[11]: numpy.float64 Obviously the most common situation where it's useful to have the rule to ignore scalar width is for avoiding "width contamination" from Python float and int literals. But you can easily end up with numpy scalars from indexing, high-precision operations like np.mean, etc., where you don't "really mean" you want high-precision. And at least it's easy to understand the rule: same-kind scalars don't affect precision. ...Though arguably the bug here is that np.mean actually returns a value with higher precision. Interestingly, we seem to have some special cases so that if you want to normalize each row of a matrix, then again the dtype is preserved, but for a totally different reasons. In a = np.arange(4, dtype=np.float32).reshape((2, 2)) a / np.mean(a, axis=0, keepdims=True) the result has float32 type, even though this is an array/array operation, not an array/scalar operation. The reason is: In [32]: np.mean(a).dtype Out[32]: dtype('float64') But: In [33]: np.mean(a, axis=0).dtype Out[33]: dtype('float32') In this respect np.var and np.std behave like np.mean, but np.sum always preserves the input dtype. (Which is curious because np.sum is just like np.mean in terms of potential loss of precision, right? The problem in np.mean is the accumulating error over many addition operations, not the divide-by-n at the end.) It is very disturbing that even after this discussion none of us here seem to actually have a precise understanding of how the numpy type selection system actually works :-(. We really need a formal description... -n From d.s.seljebotn at astro.uio.no Fri Jan 4 09:01:15 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 04 Jan 2013 15:01:15 +0100 Subject: [Numpy-discussion] Scalar casting rules use-case reprise In-Reply-To: References: Message-ID: <50E6E0AB.5090904@astro.uio.no> On 01/04/2013 02:46 PM, Nathaniel Smith wrote: > On Fri, Jan 4, 2013 at 11:09 AM, Matthew Brett wrote: >> Hi, >> >> Reading the discussion on the scalar casting rule change I realized I >> was hazy on the use-cases that led to the rule that scalars cast >> differently from arrays. >> >> My impression was that the primary use-case was for lower-precision >> floats. That is, when you have a large float32 arr, you do not want to >> double your memory use with: >> >>>>> large_float32 + 1.0 # please no float64 here >> >> Probably also: >> >>>>> large_int8 + 1 # please no int32 / int64 here. >> >> That makes sense. On the other hand these are more ambiguous: >> >>>>> large_float32 + np.float64(1) # really - you don't want float64? >> >>>>> large_int8 + np.int32(1) # ditto >> >> I wonder whether the main use-case was to deal with the automatic >> types of Python floats and scalars? That is, I wonder whether it >> would be worth considering (in the distant long term), doing fancy >> guess-what-you-mean stuff with Python scalars, on the basis that they >> are of unspecified dtype, and make 0 dimensional scalars follow the >> array casting rules. 
As in: >> >>>>> large_float32 + 1.0 >> # no upcast - we don't know what float type you meant for the scalar >>>>> large_float32 + np.float64(1) >> # upcast - you clearly meant the scalar to be float64 > > Hmm, but consider this, which is exactly the operation in your example: > > In [9]: a = np.arange(3, dtype=np.float32) > > In [10]: a / np.mean(a) # normalize > Out[10]: array([ 0., 1., 2.], dtype=float32) > > In [11]: type(np.mean(a)) > Out[11]: numpy.float64 > > Obviously the most common situation where it's useful to have the rule > to ignore scalar width is for avoiding "width contamination" from > Python float and int literals. But you can easily end up with numpy > scalars from indexing, high-precision operations like np.mean, etc., > where you don't "really mean" you want high-precision. And at least > it's easy to understand the rule: same-kind scalars don't affect > precision. > > ...Though arguably the bug here is that np.mean actually returns a > value with higher precision. Interestingly, we seem to have some > special cases so that if you want to normalize each row of a matrix, > then again the dtype is preserved, but for a totally different > reasons. In > > a = np.arange(4, dtype=np.float32).reshape((2, 2)) > a / np.mean(a, axis=0, keepdims=True) > > the result has float32 type, even though this is an array/array > operation, not an array/scalar operation. The reason is: > > In [32]: np.mean(a).dtype > Out[32]: dtype('float64') > > But: > > In [33]: np.mean(a, axis=0).dtype > Out[33]: dtype('float32') > > In this respect np.var and np.std behave like np.mean, but np.sum > always preserves the input dtype. (Which is curious because np.sum is > just like np.mean in terms of potential loss of precision, right? The > problem in np.mean is the accumulating error over many addition > operations, not the divide-by-n at the end.) > > It is very disturbing that even after this discussion none of us here > seem to actually have a precise understanding of how the numpy type > selection system actually works :-(. We really need a formal > description... I think this is a usability wart -- if you don't understand, then newcomers certainly don't. Very naive question: If one is re-doing this anyway, how important are the primitive (non-record) NumPy scalars at all? How much would break if one simply always uses Python's int and double, declare that scalars never interacts with the dtype? a) any computation returning scalars can return float()/int() b) float() are silently truncated to float32 c) integral values that don't fit either wrap around/truncates/raises error d) the only things that determines dtype is the dtypes of arrays, never scalars Too naive? I guess the opposite idea is what Travis mentioned in his passing-the-torch post, about making scalars and 0-d-arrays the same. Dag Sverre From shish at keba.be Fri Jan 4 09:03:02 2013 From: shish at keba.be (Olivier Delalleau) Date: Fri, 4 Jan 2013 09:03:02 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: 2013/1/3 Andrew Collette : >> Another solution is to forget about trying to be smart and always >> upcast the operation. That would be my 2nd preferred solution, but it >> would make it very annoying to deal with Python scalars (typically >> int64 / float64) that would be upcasting lots of things, potentially >> breaking a significant amount of existing code. 
>> >> So, personally, I don't see a straightforward solution without >> warning/error, that would be safe enough for programmers. > > I guess what's really confusing me here is that I had assumed that this: > > result = myarray + scalar > > was equivalent to this: > > result = myarray + numpy.array(scalar) > > where the dtype of the converted scalar was chosen to be "just big > enough" for it to fit. Then you proceed using the normal rules for > array addition. Yes, you can have upcasting or rollover depending on > the values involved, but you have that anyway with array addition; > it's just how arrays work in NumPy. A key difference is that with arrays, the dtype is not chosen "just big enough" for your data to fit. Either you set the dtype yourself, or you're using the default inferred dtype (int/float). In both cases you should know what to expect, and it doesn't depend on the actual numeric values (except for the auto int/float distinction). > > Also, have I got this (proposed behavior) right? > > array([127], dtype=int8) + 128 -> ValueError > array([127], dtype=int8) + 127 -> -2 > > It seems like all this does is raise an error when the current rules > would require upcasting, but still allows rollover for smaller values. > What error condition, specifically, is the ValueError designed to > tell me about? You can still get "unexpected" data (if you're not > expecting rollover) with no exception. The ValueError is here to warn you that the operation may not be doing what you want. The rollover for smaller values would be the documented (and thus hopefully expected) behavior. Taking the addition as an example may be misleading, as it makes it look like we could just "always rollover" to obtain consistent behavior, and programmers are to some extent used to integer rollover on this kind of operation. However, I gave examples with "maximum" that I believe show it's not that easy (this behavior would just appear "wrong"). Another example is with the integer division, where casting the scalar silently would result in array([-128], dtype=int8) // 128 -> [1] which is unlikely to be something someone would like to obtain. To summarize the goals of the proposal (in my mind): 1. Low cognitive load (simple and consistent across ufuncs). 2. Low risk of doing something unexpected. 3. Efficient by default. 4. Most existing (non buggy) code should not be affected. If we always do the silent cast, it will significantly break existing code relying on the 1.6 behavior, and increases the risk of doing something unexpected (bad on #2 & #4) If we always upcast, we may break existing code and lose efficiency (bad on #3 and #4). If we keep current behavior, we stay with something that's difficult to understand and has high risk of doing weird things (bad on #1 and #2). -=- Olivier From shish at keba.be Fri Jan 4 09:34:29 2013 From: shish at keba.be (Olivier Delalleau) Date: Fri, 4 Jan 2013 09:34:29 -0500 Subject: [Numpy-discussion] Scalar casting rules use-case reprise In-Reply-To: References: Message-ID: 2013/1/4 Nathaniel Smith : > On Fri, Jan 4, 2013 at 11:09 AM, Matthew Brett wrote: >> Hi, >> >> Reading the discussion on the scalar casting rule change I realized I >> was hazy on the use-cases that led to the rule that scalars cast >> differently from arrays. >> >> My impression was that the primary use-case was for lower-precision >> floats. 
That is, when you have a large float32 arr, you do not want to >> double your memory use with: >> >>>>> large_float32 + 1.0 # please no float64 here >> >> Probably also: >> >>>>> large_int8 + 1 # please no int32 / int64 here. >> >> That makes sense. On the other hand these are more ambiguous: >> >>>>> large_float32 + np.float64(1) # really - you don't want float64? >> >>>>> large_int8 + np.int32(1) # ditto >> >> I wonder whether the main use-case was to deal with the automatic >> types of Python floats and scalars? That is, I wonder whether it >> would be worth considering (in the distant long term), doing fancy >> guess-what-you-mean stuff with Python scalars, on the basis that they >> are of unspecified dtype, and make 0 dimensional scalars follow the >> array casting rules. As in: >> >>>>> large_float32 + 1.0 >> # no upcast - we don't know what float type you meant for the scalar >>>>> large_float32 + np.float64(1) >> # upcast - you clearly meant the scalar to be float64 > > Hmm, but consider this, which is exactly the operation in your example: > > In [9]: a = np.arange(3, dtype=np.float32) > > In [10]: a / np.mean(a) # normalize > Out[10]: array([ 0., 1., 2.], dtype=float32) > > In [11]: type(np.mean(a)) > Out[11]: numpy.float64 > > Obviously the most common situation where it's useful to have the rule > to ignore scalar width is for avoiding "width contamination" from > Python float and int literals. But you can easily end up with numpy > scalars from indexing, high-precision operations like np.mean, etc., > where you don't "really mean" you want high-precision. And at least > it's easy to understand the rule: same-kind scalars don't affect > precision. > > ...Though arguably the bug here is that np.mean actually returns a > value with higher precision. Interestingly, we seem to have some > special cases so that if you want to normalize each row of a matrix, > then again the dtype is preserved, but for a totally different > reasons. In > > a = np.arange(4, dtype=np.float32).reshape((2, 2)) > a / np.mean(a, axis=0, keepdims=True) > > the result has float32 type, even though this is an array/array > operation, not an array/scalar operation. The reason is: > > In [32]: np.mean(a).dtype > Out[32]: dtype('float64') > > But: > > In [33]: np.mean(a, axis=0).dtype > Out[33]: dtype('float32') > > In this respect np.var and np.std behave like np.mean, but np.sum > always preserves the input dtype. (Which is curious because np.sum is > just like np.mean in terms of potential loss of precision, right? The > problem in np.mean is the accumulating error over many addition > operations, not the divide-by-n at the end.) IMO having a different dtype depending on whether or not you provide the "axis" argument to mean() should be considered as a bug. As to what the correct dtype should be... it's not such an easy question. Personally I would go with float64 by default to be consistent across all int / float dtypes. Then someone who wants to downcast it can use the "out" argument to mean(). To come back to Matthew's use-case question, I agree the most common use case is to prevent a float32 or small int array from being upcasted, and most of the time this would come from Python scalars. However I don't think it's a good idea to have a behavior that is different between Python and Numpy scalars, because it's a subtle difference that users could have trouble understanding & foreseeing. 
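Concretely, under such a rule something like

a = np.zeros(3, dtype=np.float32)
a + 1.0              # would stay float32 (Python float)
a + np.float64(1.0)  # would be upcast to float64 (NumPy scalar)

would give two different result dtypes, even though numpy.asarray(1.0)
is itself a float64 array.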
The expected behavior of numpy functions when providing them with non-numpy objects is they should behave the same as if we had called numpy.asarray() on these objects, and straying away from this behavior seems dangerous to me. As far as I'm concerned, in a world where numpy would be brand new with no existing codebase using it, I would probably prefer to use the same casting rules for array/array and array/scalar operations. It may cause some unwanted array upcasting, but it's a lot simpler to understand. However, given that there may be a lot of code relying on the current dtype-preserving behavior, doing it now doesn't sound like a good idea to me. -=- Olivier From williamj at tenbase2.com Fri Jan 4 10:26:09 2013 From: williamj at tenbase2.com (William Johnston) Date: Fri, 4 Jan 2013 10:26:09 -0500 Subject: [Numpy-discussion] still need DLR support Message-ID: <8BA33575ACC64F408670B645AD89C584@leviathan> Hello, I posted some time ago that I need Numpy for .NET for a C# DLR app. Has anyone made any progress on this? May I suggest this as a project? Thank you. Sincerely, William Johnston -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.collette at gmail.com Fri Jan 4 11:01:23 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Fri, 4 Jan 2013 09:01:23 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Olivier, > A key difference is that with arrays, the dtype is not chosen "just > big enough" for your data to fit. Either you set the dtype yourself, > or you're using the default inferred dtype (int/float). In both cases > you should know what to expect, and it doesn't depend on the actual > numeric values (except for the auto int/float distinction). Yes, certainly; for example, you would get an int32/int64 if you simply do "array(4)". What I mean is, when you do "a+b" and b is a scalar, I had assumed that the normal array rules for addition apply, if you treat the dtype of b as being the smallest precision possible which can hold that value. E.g. 1 (int8) + 42 would treat 42 as an int8, and 1 (int8) + 200 would treat 200 as an int16. If I'm not mistaken, this is what happens currently. As far as knowing what to expect, well, as a library author I don't control what my users supply. I have to write conditional code to deal with things like this, and that's my interest in this issue. One way or another I have to handle it, correctly, and I'm trying to get a handle on what that means. > The ValueError is here to warn you that the operation may not be doing > what you want. The rollover for smaller values would be the documented > (and thus hopefully expected) behavior. Right, but what confuses me is that the only thing this prevents is the current upcast behavior. Why is that so evil it should be replaced with an exception? > Taking the addition as an example may be misleading, as it makes it > look like we could just "always rollover" to obtain consistent > behavior, and programmers are to some extent used to integer rollover > on this kind of operation. However, I gave examples with "maximum" > that I believe show it's not that easy (this behavior would just > appear "wrong"). Another example is with the integer division, where > casting the scalar silently would result in > array([-128], dtype=int8) // 128 -> [1] > which is unlikely to be something someone would like to obtain. 
But with the rule I outlined, this would be treated as: array([-128], dtype=int8) // array([128], dtype=int16) -> -1 (int16) > To summarize the goals of the proposal (in my mind): > 1. Low cognitive load (simple and consistent across ufuncs). > 2. Low risk of doing something unexpected. > 3. Efficient by default. > 4. Most existing (non buggy) code should not be affected. > > If we always do the silent cast, it will significantly break existing > code relying on the 1.6 behavior, and increases the risk of doing > something unexpected (bad on #2 & #4) > If we always upcast, we may break existing code and lose efficiency > (bad on #3 and #4). > If we keep current behavior, we stay with something that's difficult > to understand and has high risk of doing weird things (bad on #1 and > #2). I suppose what really concerns me here is, with respect to #2, addition raising ValueError is really unexpected (at least to me). I don't have control over the values my users pass to me, which means that I am going to have to carefully check for the presence of scalars and use either numpy.add or explicitly cast to a single-element array before performing addition (or, as you point out, any similar operation). >From a more basic perspective, I think that adding a number to an array should never raise an exception. I've not used any other language in which this behavior takes place. In C, you have rollover behavior, in IDL you roll over or clip, and in NumPy you either roll or upcast, depending on the version. IDL, etc. manage to handle things like max() or total() in a sensible (or at least defensible) fashion, and without raising an error. Andrew From shish at keba.be Fri Jan 4 11:34:34 2013 From: shish at keba.be (Olivier Delalleau) Date: Fri, 4 Jan 2013 11:34:34 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: (sorry, no time for full reply, so for now just answering what I believe is the main point) 2013/1/4 Andrew Collette : >> The ValueError is here to warn you that the operation may not be doing >> what you want. The rollover for smaller values would be the documented >> (and thus hopefully expected) behavior. > > Right, but what confuses me is that the only thing this prevents is > the current upcast behavior. Why is that so evil it should be > replaced with an exception? The evilness lies in the silent switch between the rollover and upcast behavior, as in the example I gave previously: In [50]: np.array([2], dtype='int8') + 127 Out[50]: array([-127], dtype=int8) In [51]: np.array([2], dtype='int8') + 128 Out[51]: array([130], dtype=int16) If the scalar is the user-supplied value, it's likely you actually want a fixed behavior (either rollover or upcast) regardless of the numeric value being provided. Looking at what other numeric libraries are doing is definitely a good suggestion. -=- Olivier From matthew.brett at gmail.com Fri Jan 4 11:54:09 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 4 Jan 2013 16:54:09 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Fri, Jan 4, 2013 at 4:01 PM, Andrew Collette wrote: > >From a more basic perspective, I think that adding a number to an > array should never raise an exception. I've not used any other > language in which this behavior takes place. 
In C, you have rollover > behavior, in IDL you roll over or clip, and in NumPy you either roll > or upcast, depending on the version. IDL, etc. manage to handle > things like max() or total() in a sensible (or at least defensible) > fashion, and without raising an error. That's a reasonable point. Looks like we lost consensus. What about returning to the 1.5 behavior instead? Best, Matthew From njs at pobox.com Fri Jan 4 11:54:27 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jan 2013 16:54:27 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Fri, Jan 4, 2013 at 4:01 PM, Andrew Collette wrote: > Hi Olivier, > >> A key difference is that with arrays, the dtype is not chosen "just >> big enough" for your data to fit. Either you set the dtype yourself, >> or you're using the default inferred dtype (int/float). In both cases >> you should know what to expect, and it doesn't depend on the actual >> numeric values (except for the auto int/float distinction). > > Yes, certainly; for example, you would get an int32/int64 if you > simply do "array(4)". What I mean is, when you do "a+b" and b is a > scalar, I had assumed that the normal array rules for addition apply, > if you treat the dtype of b as being the smallest precision possible > which can hold that value. E.g. 1 (int8) + 42 would treat 42 as an > int8, and 1 (int8) + 200 would treat 200 as an int16. If I'm not > mistaken, this is what happens currently. Well, that's the thing... there is actually *no* version of numpy where the "normal rules" apply to scalars. If a = np.array([1, 2, 3], dtype=np.uint8) then in numpy 1.5 and earlier we had # Python scalars (a / 1).dtype == np.uint8 (a / 300).dtype == np.uint8 # Numpy scalars (a / np.int_(1)) == np.uint8 (a / np.int_(300)) == np.uint8 # Arrays (a / [1]).dtype == np.int_ (a / [300]).dtype == np.int_ In 1.6 we have: # Python scalars (a / 1).dtype == np.uint8 (a / 300).dtype == np.uint16 # Numpy scalars (a / np.int_(1)) == np.uint8 (a / np.int_(300)) == np.uint16 # Arrays (a / [1]).dtype == np.int_ (a / [1]).dtype == np.int_ In fact in 1.6 there is no assignment of a dtype to '1' which makes the way 1.6 handles it consistent with the array rules: # Ah-hah, it looks like '1' has a uint8 dtype: (np.ones(2, dtype=np.uint8) / np.ones(2, dtype=np.uint8)).dtype == np.uint8 (np.ones(2, dtype=np.uint8) / 1).dtype == np.uint8 # But wait! No it doesn't! (np.ones(2, dtype=np.int8) / np.ones(2, dtype=np.uint8)).dtype == np.int16 (np.ones(2, dtype=np.int8) / 1).dtype == np.int8 # Apparently in this case it has an int8 dtype instead. (np.ones(2, dtype=np.int8) / np.ones(2, dtype=np.int8)).dtype == np.int8 In 1.5, the special rule for (same-kind) scalars is that we always cast them to the array's type. In 1.6, the special rule for (same-kind) scalars is that we cast them to some type which is a function of the array's type, and the scalar's value, but not the scalar's type. This is especially confusing because normally in numpy the *only* way to get a dtype that is not in the set [np.bool, np.int_, np.float64, np.complex128, np.object_] (the dtypes produced by np.array(pyobj)) is to explicitly request it by name. So if you're memory-constrained, a useful mental model is to think that there are two types of arrays: your compact ones that use the specific limited-precision type you've picked (uint8, float32, whichever), and "regular" arrays, which use machine precision. 
And all you have to keep track of is the interaction between these. But in 1.6, as soon as you have a uint8 array, suddenly all the other precisions might spring magically into being at any moment. So options: If we require that new dtypes shouldn't be suddenly introduced then we have to pick from: 1) a / 300 silently rolls over the 300 before attempting the operation (1.5-style) 2) a / 300 upcasts to machine precision (use the same rules for arrays and scalars) 3) a / 300 gives an error (the proposal you don't like) If we instead treat a Python scalar like 1 as having the smallest precision dtype that can hold its value, then we have to accept either uint8 + 1 -> uint16 or int8 + 1 -> int16 Or there's the current code, whose behaviour no-one actually understands. (And I mean that both figuratively -- it's clearly confusing enough that people won't be able to remember it well in practice -- and literally -- even we developers don't know what it will do without running it to see.) -n From andrew.collette at gmail.com Fri Jan 4 11:57:05 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Fri, 4 Jan 2013 09:57:05 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, > (sorry, no time for full reply, so for now just answering what I > believe is the main point) Thanks for taking the time to discuss/explain this at all... I appreciate it. > The evilness lies in the silent switch between the rollover and upcast > behavior, as in the example I gave previously: > > In [50]: np.array([2], dtype='int8') + 127 > Out[50]: array([-127], dtype=int8) > In [51]: np.array([2], dtype='int8') + 128 > Out[51]: array([130], dtype=int16) Right, but for better or for worse this is how *array* addition works. If I have an int16 array in my program, and I add a user-supplied array to it, I get rollover if they supply an int16 array and upcasting if they provide an int32. The answer may simply be that we consider scalar addition a special case; I think that's really what tripping me up here. Granted, one is a type-dependent change while the other is a value-dependent change; but in my head they were connected by the rules for choosing a "effective" dtype for a scalar based on its value. > If the scalar is the user-supplied value, it's likely you actually > want a fixed behavior (either rollover or upcast) regardless of the > numeric value being provided. This is a good point; thanks. > Looking at what other numeric libraries are doing is definitely a good > suggestion. I just double-checked IDL, and for addition it seems to convert to the larger type: a = bytarr(10) help, a+fix(0) INT = Array[10] help, a+long(0) LONG = Array[10] Of course, IDL and Python scalars likely work differently. Andrew From andrew.collette at gmail.com Fri Jan 4 12:25:07 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Fri, 4 Jan 2013 10:25:07 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, > In fact in 1.6 there is no assignment of a dtype to '1' which makes > the way 1.6 handles it consistent with the array rules: I guess I'm a little out of my depth here... what are the array rules? > # Ah-hah, it looks like '1' has a uint8 dtype: > (np.ones(2, dtype=np.uint8) / np.ones(2, dtype=np.uint8)).dtype == np.uint8 > (np.ones(2, dtype=np.uint8) / 1).dtype == np.uint8 > # But wait! 
No it doesn't! > (np.ones(2, dtype=np.int8) / np.ones(2, dtype=np.uint8)).dtype == np.int16 > (np.ones(2, dtype=np.int8) / 1).dtype == np.int8 > # Apparently in this case it has an int8 dtype instead. > (np.ones(2, dtype=np.int8) / np.ones(2, dtype=np.int8)).dtype == np.int8 Yes, this is a good point... I hadn't thought about whether it should be unsigned or signed. In the case of something like "1", where it's ambiguous, couldn't we prefer the sign of the other participant in the addition? > interaction between these. But in 1.6, as soon as you have a uint8 > array, suddenly all the other precisions might spring magically into > being at any moment. I can see how this would be really annoying for someone close to the max memory on their machine. > So options: > If we require that new dtypes shouldn't be suddenly introduced then we > have to pick from: > 1) a / 300 silently rolls over the 300 before attempting the > operation (1.5-style) Were people really not happy with this behavior? My reading of this thread: http://thread.gmane.org/gmane.comp.python.numeric.general/47986 was that the change was, although not an accident, certainly unexpected for most people. I don't have a strong preference either way, but I'm interested in why we're so eager to keep the "corrected" behavior. > 2) a / 300 upcasts to machine precision (use the same rules for > arrays and scalars) > 3) a / 300 gives an error (the proposal you don't like) > > If we instead treat a Python scalar like 1 as having the smallest > precision dtype that can hold its value, then we have to accept either > uint8 + 1 -> uint16 > or > int8 + 1 -> int16 Is there any consistent way we could prefer the "signedness" of the other participant? That would lead to both uint8 +1 -> uint8 and int8 + 1 -> int8. > Or there's the current code, whose behaviour no-one actually > understands. (And I mean that both figuratively -- it's clearly > confusing enough that people won't be able to remember it well in > practice -- and literally -- even we developers don't know what it > will do without running it to see.) I agree the current behavior is confusing. Regardless of the details of what to do, I suppose my main objection is that, to me, it's really unexpected that adding a number to an array could result in an exception. Andrew From raul at virtualmaterials.com Fri Jan 4 14:10:02 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Fri, 04 Jan 2013 12:10:02 -0700 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions In-Reply-To: References: <50D4B69A.7000409@virtualmaterials.com> Message-ID: <50E7290A.2000705@virtualmaterials.com> In my previous email I sent an image but I just thought that maybe the mailing list does not accept attachments or need approval. I put a couple of images related to my profiling results (referenced to my previous email) here. Sorted by time per function with a graph of calls at the bottom http://raul-playground.appspot.com/static/images/numpy-profile-time.png Sorted by Time with Children http://raul-playground.appspot.com/static/images/numpy-profile-timewchildren.png The test is a loop of val = float64 * float64 * float64 * float64 Raul On 02/01/2013 7:56 AM, Nathaniel Smith wrote: > On Fri, Dec 21, 2012 at 7:20 PM, Raul Cota wrote: >> Hello, >> >> >> On Dec/2/2012 I sent an email about some meaningful speed problems I was >> facing when porting our core program from Numeric (Python 2.2) to Numpy >> (Python 2.6). 
Some of our tests went from 30 seconds to 90 seconds for >> example. > > Hi Raul, > > This is great work! Sorry you haven't gotten any feedback yet -- I > guess it's a busy time of year for most people; and, the way you've > described your changes makes it hard for us to use our usual workflow > to discuss them. > >> These are the actual changes to the C code, >> For bottleneck (a) >> >> In general, >> - avoid calls to PyObject_GetAttrString when I know the type is >> List, None, Tuple, Float, Int, String or Unicode >> >> - avoid calls to PyObject_GetBuffer when I know the type is >> List, None or Tuple > > This definitely seems like a worthwhile change. There are possible > quibbles about coding style -- the macros could have better names, and > would probably be better as (inline) functions instead of macros -- > but that can be dealt with. > > Can you make a pull request on github with these changes? I guess you > haven't used git before, but I think you'll find it makes things > *much* easier (in particular, you'll never have to type out long > awkward english descriptions of the changes you made ever again!) We > have docs here: > http://docs.scipy.org/doc/numpy/dev/gitwash/git_development.html > and your goal is to get to the point where you can file a "pull request": > http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html#asking-for-your-changes-to-be-merged-with-the-main-repo > Feel free to ask on the list if you get stuck of course. > >> For bottleneck (b) >> >> b.1) >> I noticed that PyFloat * Float64 resulted in an unnecessary "on the fly" >> conversion of the PyFloat into a Float64 to extract its underlying C >> double value. This happened in the function >> _double_convert_to_ctype which comes from the pattern, >> _ at name@_convert_to_ctype > > This also sounds like an excellent change, and perhaps should be > extended to ints and bools as well... again, can you file a pull > request? > >> b.2) This is the change that may not be very popular among Numpy users. >> I modified Float64 operations to return a Float instead of Float64. I >> could not think or see any ill effects and I got a fairly decent speed >> boost. > > Yes, unfortunately, there's no way we'll be able to make this change > upstream -- there's too much chance of it breaking people's code. (And > numpy float64's do act different than python floats in at least some > cases, e.g., numpy gives more powerful control over floating point > error handling, see np.seterr.) > > But, it's almost certainly possible to optimize numpy's float64 (and > friends), so that they are themselves (almost) as fast as the native > python objects. And that would help all the code that uses them, not > just the ones where regular python floats could be substituted > instead. Have you tried profiling, say, float64 * float64 to figure > out where the bottlenecks are? > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Fri Jan 4 14:59:38 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jan 2013 19:59:38 +0000 Subject: [Numpy-discussion] Scalar casting rules use-case reprise In-Reply-To: References: Message-ID: On Fri, Jan 4, 2013 at 11:09 AM, Matthew Brett wrote: > In any case, can anyone remember the original use-cases well enough to > record them for future decision making? Heh. Everything new is old again. 
Here's a discussion from 2002 which quotes the rationale: http://mail.scipy.org/pipermail/numpy-discussion/2002-September/014002.html Note that in context: - numpy means the old Numeric library - AFAICT neither numeric nor numarray had special "scalar" types at this point, and they didn't have 0d arrays either, so in fact indexing an array would just return the closest python type (int or float). In fact this is a thread about the problems this causes. (So the question Dag raised downthread was prescient! Or, well, postscient, I guess.) So it looks like the main reason was actually that back then, you *couldn't* preserve non-native widths in operations involving scalars, because there was no such thing as a non-native width scalar. As soon as you called 'sum' or indexed an array, you reverted to native width. -n From mw at eml.cc Fri Jan 4 15:42:52 2013 From: mw at eml.cc (mw at eml.cc) Date: Fri, 04 Jan 2013 21:42:52 +0100 Subject: [Numpy-discussion] Embedded NumPy LAPACK errors Message-ID: <50E73ECC.8050803@eml.cc> Hiall, I am trying to embed numerical code in a mexFunction, as called by MATLAB, written as a Cython function. NumPy core functions and BLAS work fine, but calls to LAPACK function such as SVD seem to be made against to MATLAB's linked MKL, and this generates MKL errors. When I try this with Octave, it works fine, presumably because it is compiled against the same LAPACK as the NumPy I am embedding. Assuming I haven't made big mistakes up to here, I have the following questions: Is there a way to request numpy.linalg to use a particular LAPACK library, e.g. /usr/lib/liblapack.so ? If not, is there a reasonable way to build numpy.linalg such that it interfaces with MKL correctly ? thanks in advance for any help, Marmaduke [1] The Cython code in question : https://gist.github.com/4433635 Please see the mexFunction at the bottom. From njs at pobox.com Fri Jan 4 16:33:25 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jan 2013 21:33:25 +0000 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions In-Reply-To: <50E67BBD.7090804@virtualmaterials.com> References: <50D4B69A.7000409@virtualmaterials.com> <50E67BBD.7090804@virtualmaterials.com> Message-ID: On Fri, Jan 4, 2013 at 6:50 AM, Raul Cota wrote: > > On 02/01/2013 7:56 AM, Nathaniel Smith wrote: >> But, it's almost certainly possible to optimize numpy's float64 (and >> friends), so that they are themselves (almost) as fast as the native >> python objects. And that would help all the code that uses them, not >> just the ones where regular python floats could be substituted >> instead. Have you tried profiling, say, float64 * float64 to figure >> out where the bottlenecks are? > > Seems to be split between > - (primarily) the memory allocation/deallocation of the float64 that is > created from the operation float64 * float64. This is the reason why float64 > * Pyfloat got improved with one of my changes because PyFloat was being > internally converted into a float64 before doing the multiplication. > > - the rest of the time is the actual multiplication path way. Running a quick profile on Linux x86-64 of x = np.float64(5.5) for i in xrange(n): x * x I find that ~50% of the total CPU time is inside feclearexcept(), the function which resets the floating point error checking registers -- and most of this is inside a single instruction, stmxcsr ("store sse control register"). It's possible that this is different on windows (esp. 
since apparently our fpe exception handling apparently doesn't work on windows[1]), but the total time you measure for both PyFloat*PyFloat and Float64*Float64 match mine almost exactly, so most likely we have similar CPUs that are doing a similar amount of work in both cases. The way we implement floating point error checking is basically: PyUFunc_clearfperr() if (PyUFunc_getfperror() & BAD_STUFF) { } Some points that you may find interesting though: - The way we define these functions, both PyUFunc_clearfperr() and PyUFunc_getfperror() clear the flags. However, for PyUFunc_getfperror, this is just pointless. We could simply remove this, and expect to see a ~25% speedup in Float64*Float64 without any downside. - Numpy's default behaviour is to always check for an warn on floating point errors. This seems like it's probably the correct default. However, if you aren't worried about this for your use code, you could disable these warnings with np.seterr(all="ignore"). (And you'll get similar error-checking to what PyFloat does.) At the moment, that won't speed anything up. But we could easily then fix it so that the PyUFunc_clearfperr/PyUFunc_getfperror code checks for whether errors are ignored, and disables itself. This together with the previous change should get you a ~50% speedup in Float64*Float64, without having to change any of numpy's semantics. - Bizarrely, Numpy still checks the floating point flags on integer operations, at least for integer scalars. So 50% of the time in Int64*Int64 is also spent in fiddling with floating point exception flags. That's also some low-hanging fruit right there... (to be fair, this isn't *quite* as trivial to fix as it could be, because the integer overflow checking code sets the floating point unit's "overflow" flag to signal a problem, and we'd need to pull this out to a thread-local variable or something before disabling the floating point checks entirely in integer code. But still, not a huge problem.) -n [1] https://github.com/numpy/numpy/issues/2350 From sebastian at sipsolutions.net Fri Jan 4 18:17:47 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 05 Jan 2013 00:17:47 +0100 Subject: [Numpy-discussion] Howto bisect old commits correctly Message-ID: <1357341467.12993.6.camel@sebastian-laptop> Hey, this is probably just because I do not have any experience with bisect and the like, but when I try running a bisect keep running into: ImportError: /home/sebastian/.../lib/python2.7/site-packages/numpy/core/multiarray.so: undefined symbol: PyDataMem_NEW or: RuntimeError: module compiled against API version 8 but this version of numpy is 7 I am sure I am missing something simple, but I have no idea where to look. Am I just forgetting to delete some things and my version is not clean!? Regards, Sebastian From raul at virtualmaterials.com Fri Jan 4 18:36:28 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Fri, 04 Jan 2013 16:36:28 -0700 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions In-Reply-To: References: <50D4B69A.7000409@virtualmaterials.com> <50E67BBD.7090804@virtualmaterials.com> Message-ID: <50E7677C.9090008@virtualmaterials.com> On 04/01/2013 2:33 PM, Nathaniel Smith wrote: > On Fri, Jan 4, 2013 at 6:50 AM, Raul Cota wrote: >> On 02/01/2013 7:56 AM, Nathaniel Smith wrote: >>> But, it's almost certainly possible to optimize numpy's float64 (and >>> friends), so that they are themselves (almost) as fast as the native >>> python objects. 
And that would help all the code that uses them, not >>> just the ones where regular python floats could be substituted >>> instead. Have you tried profiling, say, float64 * float64 to figure >>> out where the bottlenecks are? >> Seems to be split between >> - (primarily) the memory allocation/deallocation of the float64 that is >> created from the operation float64 * float64. This is the reason why float64 >> * Pyfloat got improved with one of my changes because PyFloat was being >> internally converted into a float64 before doing the multiplication. >> >> - the rest of the time is the actual multiplication path way. > Running a quick profile on Linux x86-64 of > x = np.float64(5.5) > for i in xrange(n): > x * x > I find that ~50% of the total CPU time is inside feclearexcept(), the > function which resets the floating point error checking registers -- > and most of this is inside a single instruction, stmxcsr ("store sse > control register"). I find strange you don't see bottleneck in allocation of a float64. is it easy for you to profile this ? x = np.float64(5.5) y = 5.5 for i in xrange(n): x * y numpy internally translates y into a float64 temporarily and then discards it and I seem to remember is a bit over two times slower than x * x I will try to do your suggestions on PyUFunc_clearfperr/PyUFunc_getfperror and see what I get. Haven't gotten around to get going with being able to do a pull request for the previous stuff. if changes are worth while would it be ok if I also create one for this ? Thanks again, Raul > It's possible that this is different on windows > (esp. since apparently our fpe exception handling apparently doesn't > work on windows[1]), but the total time you measure for both > PyFloat*PyFloat and Float64*Float64 match mine almost exactly, so most > likely we have similar CPUs that are doing a similar amount of work in > both cases. > > The way we implement floating point error checking is basically: > PyUFunc_clearfperr() > > if (PyUFunc_getfperror() & BAD_STUFF) { > > } > > Some points that you may find interesting though: > > - The way we define these functions, both PyUFunc_clearfperr() and > PyUFunc_getfperror() clear the flags. However, for PyUFunc_getfperror, > this is just pointless. We could simply remove this, and expect to see > a ~25% speedup in Float64*Float64 without any downside. > > - Numpy's default behaviour is to always check for an warn on floating > point errors. This seems like it's probably the correct default. > However, if you aren't worried about this for your use code, you could > disable these warnings with np.seterr(all="ignore"). (And you'll get > similar error-checking to what PyFloat does.) At the moment, that > won't speed anything up. But we could easily then fix it so that the > PyUFunc_clearfperr/PyUFunc_getfperror code checks for whether errors > are ignored, and disables itself. This together with the previous > change should get you a ~50% speedup in Float64*Float64, without > having to change any of numpy's semantics. > > - Bizarrely, Numpy still checks the floating point flags on integer > operations, at least for integer scalars. So 50% of the time in > Int64*Int64 is also spent in fiddling with floating point exception > flags. That's also some low-hanging fruit right there... 
(to be fair, > this isn't *quite* as trivial to fix as it could be, because the > integer overflow checking code sets the floating point unit's > "overflow" flag to signal a problem, and we'd need to pull this out to > a thread-local variable or something before disabling the floating > point checks entirely in integer code. But still, not a huge problem.) > > -n > > [1] https://github.com/numpy/numpy/issues/2350 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From njs at pobox.com Fri Jan 4 19:44:28 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jan 2013 00:44:28 +0000 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions In-Reply-To: <50E7677C.9090008@virtualmaterials.com> References: <50D4B69A.7000409@virtualmaterials.com> <50E67BBD.7090804@virtualmaterials.com> <50E7677C.9090008@virtualmaterials.com> Message-ID: On Fri, Jan 4, 2013 at 11:36 PM, Raul Cota wrote: > On 04/01/2013 2:33 PM, Nathaniel Smith wrote: >> On Fri, Jan 4, 2013 at 6:50 AM, Raul Cota wrote: >>> On 02/01/2013 7:56 AM, Nathaniel Smith wrote: >>>> But, it's almost certainly possible to optimize numpy's float64 (and >>>> friends), so that they are themselves (almost) as fast as the native >>>> python objects. And that would help all the code that uses them, not >>>> just the ones where regular python floats could be substituted >>>> instead. Have you tried profiling, say, float64 * float64 to figure >>>> out where the bottlenecks are? >>> Seems to be split between >>> - (primarily) the memory allocation/deallocation of the float64 that is >>> created from the operation float64 * float64. This is the reason why float64 >>> * Pyfloat got improved with one of my changes because PyFloat was being >>> internally converted into a float64 before doing the multiplication. >>> >>> - the rest of the time is the actual multiplication path way. >> Running a quick profile on Linux x86-64 of >> x = np.float64(5.5) >> for i in xrange(n): >> x * x >> I find that ~50% of the total CPU time is inside feclearexcept(), the >> function which resets the floating point error checking registers -- >> and most of this is inside a single instruction, stmxcsr ("store sse >> control register"). > > I find strange you don't see bottleneck in allocation of a float64. > > is it easy for you to profile this ? > > x = np.float64(5.5) > y = 5.5 > for i in xrange(n): > x * y > > numpy internally translates y into a float64 temporarily and then > discards it and I seem to remember is a bit over two times slower than x * x Yeah, seems to be dramatically slower. Using ipython's handy interface to the timeit[1] library: In [1]: x = np.float64(5.5) In [2]: y = 5.5 In [3]: timeit x * y 1000000 loops, best of 3: 725 ns per loop In [4]: timeit x * x 1000000 loops, best of 3: 283 ns per loop But we already figured out how to (mostly) fix this part, right? I was curious about the Float64*Float64 case, because that's the one that was still slow after those first two patches. (And, yes, like you say, when I run x*y in the profiler then there's a huge amount of overhead in PyArray_GetPriority and object allocation/deallocation). > I will try to do your suggestions on > > PyUFunc_clearfperr/PyUFunc_getfperror > > and see what I get. Haven't gotten around to get going with being able > to do a pull request for the previous stuff. 
if changes are worth while > would it be ok if I also create one for this ? First, to be clear, it's always OK to do a pull request -- the worst that can happen is that we all look it over carefully and decide that it's the wrong approach and don't merge. In my email before I just wanted to give you some clear suggestions on a good way to get started, we wouldn't have like kicked you out or something if you did it differently :-) And, yes, assuming my analysis so far is correct we would definitely be interested in major speedups that have no other user-visible effects... ;-) -n [1] http://docs.python.org/2/library/timeit.html From sebastian at sipsolutions.net Fri Jan 4 20:29:51 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 05 Jan 2013 02:29:51 +0100 Subject: [Numpy-discussion] Howto bisect old commits correctly In-Reply-To: <1357341467.12993.6.camel@sebastian-laptop> References: <1357341467.12993.6.camel@sebastian-laptop> Message-ID: <1357349391.12993.8.camel@sebastian-laptop> On Sat, 2013-01-05 at 00:17 +0100, Sebastian Berg wrote: > Hey, > > this is probably just because I do not have any experience with bisect > and the like, but when I try running a bisect keep running into: > Nevermind that. Probably I just stumbled on some bad versions... > ImportError: /home/sebastian/.../lib/python2.7/site-packages/numpy/core/multiarray.so: undefined symbol: PyDataMem_NEW > or: > RuntimeError: module compiled against API version 8 but this version of numpy is 7 > > I am sure I am missing something simple, but I have no idea where to > look. Am I just forgetting to delete some things and my version is not > clean!? > > Regards, > > Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From raul at virtualmaterials.com Sat Jan 5 01:09:17 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Fri, 04 Jan 2013 23:09:17 -0700 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions In-Reply-To: References: <50D4B69A.7000409@virtualmaterials.com> <50E67BBD.7090804@virtualmaterials.com> <50E7677C.9090008@virtualmaterials.com> Message-ID: <50E7C38D.2040306@virtualmaterials.com> On 04/01/2013 5:44 PM, Nathaniel Smith wrote: > On Fri, Jan 4, 2013 at 11:36 PM, Raul Cota wrote: >> On 04/01/2013 2:33 PM, Nathaniel Smith wrote: >>> On Fri, Jan 4, 2013 at 6:50 AM, Raul Cota wrote: >>>> On 02/01/2013 7:56 AM, Nathaniel Smith wrote: >>>>> But, it's almost certainly possible to optimize numpy's float64 (and >>>>> friends), so that they are themselves (almost) as fast as the native >>>>> python objects. And that would help all the code that uses them, not >>>>> just the ones where regular python floats could be substituted >>>>> instead. Have you tried profiling, say, float64 * float64 to figure >>>>> out where the bottlenecks are? >>>> Seems to be split between >>>> - (primarily) the memory allocation/deallocation of the float64 that is >>>> created from the operation float64 * float64. This is the reason why float64 >>>> * Pyfloat got improved with one of my changes because PyFloat was being >>>> internally converted into a float64 before doing the multiplication. >>>> >>>> - the rest of the time is the actual multiplication path way. 
>>> Running a quick profile on Linux x86-64 of >>> x = np.float64(5.5) >>> for i in xrange(n): >>> x * x >>> I find that ~50% of the total CPU time is inside feclearexcept(), the >>> function which resets the floating point error checking registers -- >>> and most of this is inside a single instruction, stmxcsr ("store sse >>> control register"). >> I find strange you don't see bottleneck in allocation of a float64. >> >> is it easy for you to profile this ? >> >> x = np.float64(5.5) >> y = 5.5 >> for i in xrange(n): >> x * y >> >> numpy internally translates y into a float64 temporarily and then >> discards it and I seem to remember is a bit over two times slower than x * x > Yeah, seems to be dramatically slower. Using ipython's handy interface > to the timeit[1] library: > > In [1]: x = np.float64(5.5) > > In [2]: y = 5.5 > > In [3]: timeit x * y > 1000000 loops, best of 3: 725 ns per loop > > In [4]: timeit x * x > 1000000 loops, best of 3: 283 ns per loop I haven't been using timeit because the bulk of what we are doing includes comparing against Python 2.2 and Numeric and timeit did not exist then. Can't wait to finally officially upgrade our main product. > But we already figured out how to (mostly) fix this part, right? Correct Cheers, Raul > I was > curious about the Float64*Float64 case, because that's the one that > was still slow after those first two patches. (And, yes, like you say, > when I run x*y in the profiler then there's a huge amount of overhead > in PyArray_GetPriority and object allocation/deallocation). > >> I will try to do your suggestions on >> >> PyUFunc_clearfperr/PyUFunc_getfperror >> >> and see what I get. Haven't gotten around to get going with being able >> to do a pull request for the previous stuff. if changes are worth while >> would it be ok if I also create one for this ? > First, to be clear, it's always OK to do a pull request -- the worst > that can happen is that we all look it over carefully and decide that > it's the wrong approach and don't merge. In my email before I just > wanted to give you some clear suggestions on a good way to get > started, we wouldn't have like kicked you out or something if you did > it differently :-) > > And, yes, assuming my analysis so far is correct we would definitely > be interested in major speedups that have no other user-visible > effects... ;-) > > -n > > [1] http://docs.python.org/2/library/timeit.html > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From paul.anton.letnes at gmail.com Sat Jan 5 05:42:13 2013 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sat, 5 Jan 2013 11:42:13 +0100 Subject: [Numpy-discussion] Embedded NumPy LAPACK errors In-Reply-To: <50E73ECC.8050803@eml.cc> References: <50E73ECC.8050803@eml.cc> Message-ID: <3FF2E38B-6A93-4AC6-B28B-CD1C50784AD5@gmail.com> On 4. jan. 2013, at 21:42, mw at eml.cc wrote: > Hiall, > > > I am trying to embed numerical code in a mexFunction, > as called by MATLAB, written as a Cython function. > > NumPy core functions and BLAS work fine, but calls to LAPACK > function such as SVD seem to be made against to MATLAB's linked > MKL, and this generates MKL errors. When I try this with > Octave, it works fine, presumably because it is compiled against > the same LAPACK as the NumPy I am embedding. 
> > > Assuming I haven't made big mistakes up to here, I have the > following questions: > > Is there a way to request numpy.linalg to use a particular > LAPACK library, e.g. /usr/lib/liblapack.so ? > > If not, is there a reasonable way to build numpy.linalg such that > it interfaces with MKL correctly ? It's possible, but it's much easier to install one of the pre-built python distributions. Enthought, WinPython and others include precompiled python/numpy/scipy/etc with MKL. If that works for you, I'd recommend that route, as it involves less work. Good luck, Paul From matthew.brett at gmail.com Sat Jan 5 07:15:55 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 5 Jan 2013 12:15:55 +0000 Subject: [Numpy-discussion] Rank-0 arrays - reprise Message-ID: Hi, Following on from Nathaniel's explorations of the scalar - array casting rules, some resources on rank-0 arrays. The discussion that Nathaniel tracked down on "rank-0 arrays"; it also makes reference to casting. The rank-0 arrays seem to have been one way of solving the problem of maintaining array dtypes other than bool / float / int: http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001612.html Quoting from an email from Travis in that thread, replying to an email from Tim Hochberg: http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001647.html > Frankly, I have no idea what the implimentation details would be, but > could we get rid of rank-0 arrays altogether? I have always simply found > them strange and confusing... What are they really neccesary for > (besides holding scalar values of different precision that standard > Pyton scalars)? With new coercion rules this becomes a possibility. Arguments against it are that special rank-0 arrays behave as more consistent numbers with the rest of Numeric than Python scalars. In other words they have a length and a shape and one can right N-dimensional code that works the same even when the result is a scalar. Another advantage of having a Numeric scalar is that we can control the behavior of floating point operations better. e.g. if only Python scalars were available and sum(a) returned 0, then 1 / sum(a) would behave as Python behaves (always raises error). while with our own scalars 1 / sum(a) could potentially behave however the user wanted. There seemed then to be some impetus to remove rank-0 arrays and replace them with Python scalar types with the various numpy precisions : http://mail.scipy.org/pipermail/numpy-discussion/2002-September/013983.html Travis' recent email hints at something that seems similar, but I don't understand what he means: http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064795.html Don't create array-scalars. Instead, make the data-type object a meta-type object whose instances are the items returned from NumPy arrays. There is no need for a separate array-scalar object and in fact it's confusing to the type-system. I understand that now. I did not understand that 5 years ago. Travis - can you expand? I remember rank-0 arrays being confusing in that I sometimes get a python scalar and sometimes a numpy scalar, and I may want a python scalar, and have to special-case the rank-0 array, but I don't remember precisely why I needed the python scalar. Any other comments / records of rank-0 arrays being confusing? 
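For concreteness, a minimal sketch of the three kinds of object in play here (plain numpy, nothing version-specific assumed):

import numpy as np

a = np.arange(3.0)

s = a[0]                  # "array scalar": an instance of np.float64
print type(s)             # <type 'numpy.float64'>

p = a[0].item()           # plain Python float; float(a[0]) works too
print type(p)             # <type 'float'>

z = np.array(1.5)         # rank-0 array: an ndarray with ndim 0, shape ()
print type(z), z.ndim, z.shape    # <type 'numpy.ndarray'> 0 ()

Getting back to a plain Python scalar is always explicit, via .item() (or
float() / int()).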
Best, Matthew From matthew.brett at gmail.com Sat Jan 5 07:27:02 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 5 Jan 2013 12:27:02 +0000 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: References: Message-ID: Hi, On Sat, Jan 5, 2013 at 12:15 PM, Matthew Brett wrote: > Hi, > > Following on from Nathaniel's explorations of the scalar - array > casting rules, some resources on rank-0 arrays. > > The discussion that Nathaniel tracked down on "rank-0 arrays"; it also > makes reference to casting. The rank-0 arrays seem to have been one > way of solving the problem of maintaining array dtypes other than bool > / float / int: > > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001612.html > > Quoting from an email from Travis in that thread, replying to an email > from Tim Hochberg: > > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001647.html > > >> Frankly, I have no idea what the implimentation details would be, but >> could we get rid of rank-0 arrays altogether? I have always simply found >> them strange and confusing... What are they really neccesary for >> (besides holding scalar values of different precision that standard >> Pyton scalars)? > > With new coercion rules this becomes a possibility. Arguments against it > are that special rank-0 arrays behave as more consistent numbers with the > rest of Numeric than Python scalars. In other words they have a length > and a shape and one can right N-dimensional code that works the same even > when the result is a scalar. > > Another advantage of having a Numeric scalar is that we can control the > behavior of floating point operations better. > > e.g. > > if only Python scalars were available and sum(a) returned 0, then > > 1 / sum(a) would behave as Python behaves (always raises error). > > while with our own scalars > > 1 / sum(a) could potentially behave however the user wanted. > > > There seemed then to be some impetus to remove rank-0 arrays and > replace them with Python scalar types with the various numpy > precisions : > > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/013983.html > > Travis' recent email hints at something that seems similar, but I > don't understand what he means: > > http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064795.html > > > Don't create array-scalars. Instead, make the data-type object a > meta-type object whose instances are the items returned from NumPy > arrays. There is no need for a separate array-scalar object and in > fact it's confusing to the type-system. I understand that now. I > did not understand that 5 years ago. > > > Travis - can you expand? > > I remember rank-0 arrays being confusing in that I sometimes get a > python scalar and sometimes a numpy scalar, and I may want a python > scalar, and have to special-case the rank-0 array, but I don't > remember precisely why I needed the python scalar. Any other comments > / records of rank-0 arrays being confusing? Adding: Comments by Konrad Hinsen on desirable methods for rank-0 arrays, all of which seem to have got into numpy: http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001635.html Best, Matthew From matthew.brett at gmail.com Sat Jan 5 07:32:09 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 5 Jan 2013 12:32:09 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? 
In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Fri, Jan 4, 2013 at 4:54 PM, Matthew Brett wrote: > Hi, > > On Fri, Jan 4, 2013 at 4:01 PM, Andrew Collette > wrote: >> >From a more basic perspective, I think that adding a number to an >> array should never raise an exception. I've not used any other >> language in which this behavior takes place. In C, you have rollover >> behavior, in IDL you roll over or clip, and in NumPy you either roll >> or upcast, depending on the version. IDL, etc. manage to handle >> things like max() or total() in a sensible (or at least defensible) >> fashion, and without raising an error. > > That's a reasonable point. > > Looks like we lost consensus. > > What about returning to the 1.5 behavior instead? If we do return to the 1.5 behavior, we would need to think about doing this in 1.7. If there are a large number of 1.5.x and previous users who would upgrade to 1.7, leaving the 1.6 behavior in 1.7 will mean that they will get double the confusion: 1) The behavior has changed to something they weren't expecting 2) The behavior is going to change back very soon Best, Matthew From robince at gmail.com Sat Jan 5 08:03:41 2013 From: robince at gmail.com (Robin) Date: Sat, 5 Jan 2013 13:03:41 +0000 Subject: [Numpy-discussion] Embedded NumPy LAPACK errors In-Reply-To: <3FF2E38B-6A93-4AC6-B28B-CD1C50784AD5@gmail.com> References: <50E73ECC.8050803@eml.cc> <3FF2E38B-6A93-4AC6-B28B-CD1C50784AD5@gmail.com> Message-ID: Coincidently I have been having the same problem this week. Unrelated to the problem, I would suggest looking at pymex which 'wraps' python inside Matlab very nicely, although it has the same problem with duplicate lapack symbols. https://github.com/kw/pymex I have the same problem with Enthough EPD which is built against MKL - but I think the problem is that Intel provide two different interfaces - ILP64 with 64 bit integer indices and LP64 with 32 bit integers. Matlab link against the ILP64 version, whereas Enthought use the LP64 version - so there are still incompatible. Cheers Robin On Sat, Jan 5, 2013 at 10:42 AM, Paul Anton Letnes wrote: > > On 4. jan. 2013, at 21:42, mw at eml.cc wrote: > >> Hiall, >> >> >> I am trying to embed numerical code in a mexFunction, >> as called by MATLAB, written as a Cython function. >> >> NumPy core functions and BLAS work fine, but calls to LAPACK >> function such as SVD seem to be made against to MATLAB's linked >> MKL, and this generates MKL errors. When I try this with >> Octave, it works fine, presumably because it is compiled against >> the same LAPACK as the NumPy I am embedding. >> >> >> Assuming I haven't made big mistakes up to here, I have the >> following questions: >> >> Is there a way to request numpy.linalg to use a particular >> LAPACK library, e.g. /usr/lib/liblapack.so ? >> >> If not, is there a reasonable way to build numpy.linalg such that >> it interfaces with MKL correctly ? > > It's possible, but it's much easier to install one of the pre-built python distributions. Enthought, WinPython and others include precompiled python/numpy/scipy/etc with MKL. If that works for you, I'd recommend that route, as it involves less work. 
> > Good luck, > Paul > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pierre.raybaut at gmail.com Sat Jan 5 08:38:06 2013 From: pierre.raybaut at gmail.com (Pierre Raybaut) Date: Sat, 5 Jan 2013 14:38:06 +0100 Subject: [Numpy-discussion] ANN: Spyder v2.1.13 Message-ID: Hi all, On the behalf of Spyder's development team (http://code.google.com/p/spyderlib/people/list), I'm pleased to announce that Spyder v2.1.13 has been released and is available for Windows XP/Vista/7, GNU/Linux and MacOS X: http://code.google.com/p/spyderlib/ This is a pure maintenance release -- a lot of bugs were fixed since v2.1.11 (v2.1.12 was released exclusively inside WinPython distribution): http://code.google.com/p/spyderlib/wiki/ChangeLog Spyder is a free, open-source (MIT license) interactive development environment for the Python language with advanced editing, interactive testing, debugging and introspection features. Originally designed to provide MATLAB-like features (integrated help, interactive console, variable explorer with GUI-based editors for dictionaries, NumPy arrays, ...), it is strongly oriented towards scientific computing and software development. Thanks to the `spyderlib` library, Spyder also provides powerful ready-to-use widgets: embedded Python console (example: http://packages.python.org/guiqwt/_images/sift3.png), NumPy array editor (example: http://packages.python.org/guiqwt/_images/sift2.png), dictionary editor, source code editor, etc. Description of key features with tasty screenshots can be found at: http://code.google.com/p/spyderlib/wiki/Features On Windows platforms, Spyder is also available as a stand-alone executable (don't forget to disable UAC on Vista/7). This all-in-one portable version is still experimental (for example, it does not embed sphinx -- meaning no rich text mode for the object inspector) but it should provide a working version of Spyder for Windows platforms without having to install anything else (except Python 2.x itself, of course). Don't forget to follow Spyder updates/news: * on the project website: http://code.google.com/p/spyderlib/ * and on our official blog: http://spyder-ide.blogspot.com/ Last, but not least, we welcome any contribution that helps making Spyder an efficient scientific development/computing environment. Join us to help creating your favourite environment! (http://code.google.com/p/spyderlib/wiki/NoteForContributors) Enjoy! -Pierre From eric.emsellem at eso.org Sat Jan 5 09:15:02 2013 From: eric.emsellem at eso.org (Eric Emsellem) Date: Sat, 05 Jan 2013 15:15:02 +0100 Subject: [Numpy-discussion] Invalid value encoutered : how to prevent numpy.where to do this? Message-ID: <50E83566.2080404@eso.org> Dear all, I have a code using lots of "numpy.where" to make some constrained calculations as in: data = arange(10) result = np.where(data == 0, 0., 1./data) # or data1 = arange(10) data2 = arange(10)+1.0 result = np.where(data1 > data2, np.sqrt(data1-data2), np.sqrt(data2-data2)) which then produces warnings like: /usr/bin/ipython:1: RuntimeWarning: invalid value encountered in sqrt or for the first example: /usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide How do I avoid these messages to appear? I know that I could in principle use numpy.seterr. However, I do NOT want to remove these warnings for other potential divide/multiply/sqrt etc errors. 
Only when I am using a "where", to in fact avoid such warnings! Note that the warnings only happen once, but since I am going to release that code, I would like to avoid the user to get such messages which are irrelevant here (because I am testing, with the where, when NOT to divide by zero or take a sqrt of a negative number). thanks! Eric From njs at pobox.com Sat Jan 5 09:27:42 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jan 2013 14:27:42 +0000 Subject: [Numpy-discussion] Invalid value encoutered : how to prevent numpy.where to do this? In-Reply-To: <50E83566.2080404@eso.org> References: <50E83566.2080404@eso.org> Message-ID: On Sat, Jan 5, 2013 at 2:15 PM, Eric Emsellem wrote: > Dear all, > > I have a code using lots of "numpy.where" to make some constrained > calculations as in: > > data = arange(10) > result = np.where(data == 0, 0., 1./data) > > # or > data1 = arange(10) > data2 = arange(10)+1.0 > result = np.where(data1 > data2, np.sqrt(data1-data2), np.sqrt(data2-data2)) > > which then produces warnings like: > /usr/bin/ipython:1: RuntimeWarning: invalid value encountered in sqrt > > or for the first example: > > /usr/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide > > How do I avoid these messages to appear? > > I know that I could in principle use numpy.seterr. However, I do NOT > want to remove these warnings for other potential divide/multiply/sqrt > etc errors. Only when I am using a "where", to in fact avoid such > warnings! Note that the warnings only happen once, but since I am going > to release that code, I would like to avoid the user to get such > messages which are irrelevant here (because I am testing, with the > where, when NOT to divide by zero or take a sqrt of a negative number). You can't avoid it while using np.where like this, because the warning is being issued before np.where is even called. It's basically doing: # Calculate all possible sqrts tmp1 = np.sqrt(data1-data2) tmp2 = np.sqrt(data2-data2) # let's pretend this isn't just all zeros... # Use np.where to pick out the useful ones and put them together into one array mashed_up = np.where(data1 > data2, tmp1, tmp2) So you need to somehow apply the indexing while doing the sqrt. In this case the easiest way would just be np.sqrt(np.where(data1 > data2, data1 - data2, data2 - data2)) Or, slightly faster (avoiding some temporaries): np.sqrt(np.where(data1 > data2, data1, data2) - data2) If your operation doesn't factor like this though then you can always use something more cumbersome like result = np.empty_like(data) mask = (data == 0) result[mask] = 0 result[~mask] = 1.0/data[~mask] Or in 1.7 this could be written result = np.zeros_like(data) np.divide(1.0, data, where=(data != 0), out=result) -n From njs at pobox.com Sat Jan 5 09:38:07 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jan 2013 14:38:07 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Sat, Jan 5, 2013 at 12:32 PM, Matthew Brett wrote: > Hi, > > On Fri, Jan 4, 2013 at 4:54 PM, Matthew Brett wrote: >> Hi, >> >> On Fri, Jan 4, 2013 at 4:01 PM, Andrew Collette >> wrote: >>> >From a more basic perspective, I think that adding a number to an >>> array should never raise an exception. I've not used any other >>> language in which this behavior takes place. 
In C, you have rollover >>> behavior, in IDL you roll over or clip, and in NumPy you either roll >>> or upcast, depending on the version. IDL, etc. manage to handle >>> things like max() or total() in a sensible (or at least defensible) >>> fashion, and without raising an error. >> >> That's a reasonable point. >> >> Looks like we lost consensus. >> >> What about returning to the 1.5 behavior instead? > > If we do return to the 1.5 behavior, we would need to think about > doing this in 1.7. > > If there are a large number of 1.5.x and previous users who would > upgrade to 1.7, leaving the 1.6 behavior in 1.7 will mean that they > will get double the confusion: > > 1) The behavior has changed to something they weren't expecting > 2) The behavior is going to change back very soon I disagree. 1.7 is basically done, the 1.6 changes are out there already, and we still have work to do just to get consensus on how we want to handle this, plus implement the changes. Basically, the way I think about it in general is, you have the first release that contains some bug, and then you have the first release that doesn't contain it. Minimizing the amount of *time* between those releases is important. Minimizing the *number of releases* in between does not -- according to that logic, we shouldn't have released 1.6.1 and 1.6.2 until we were confident that we'd fixed *all* the bugs, because otherwise they might have misled people into upgrading too soon. Holding 1.7 back for this isn't going to get this change done or to users any faster; it's just going to hold back all the other changes in 1.7. I do think we ought to aim to shorten our release cycle drastically. Like release 1.8 within 2-3 months after 1.7. But let's talk about that after 1.7 is out. -n From ralf.gommers at gmail.com Sat Jan 5 10:55:33 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 5 Jan 2013 16:55:33 +0100 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Sat, Jan 5, 2013 at 3:38 PM, Nathaniel Smith wrote: > On Sat, Jan 5, 2013 at 12:32 PM, Matthew Brett > wrote: > > Hi, > > > > On Fri, Jan 4, 2013 at 4:54 PM, Matthew Brett > wrote: > >> Hi, > >> > >> On Fri, Jan 4, 2013 at 4:01 PM, Andrew Collette > >> wrote: > >>> >From a more basic perspective, I think that adding a number to an > >>> array should never raise an exception. I've not used any other > >>> language in which this behavior takes place. In C, you have rollover > >>> behavior, in IDL you roll over or clip, and in NumPy you either roll > >>> or upcast, depending on the version. IDL, etc. manage to handle > >>> things like max() or total() in a sensible (or at least defensible) > >>> fashion, and without raising an error. > >> > >> That's a reasonable point. > >> > >> Looks like we lost consensus. > >> > >> What about returning to the 1.5 behavior instead? > > > > If we do return to the 1.5 behavior, we would need to think about > > doing this in 1.7. > > > > If there are a large number of 1.5.x and previous users who would > > upgrade to 1.7, leaving the 1.6 behavior in 1.7 will mean that they > > will get double the confusion: > > > > 1) The behavior has changed to something they weren't expecting > > 2) The behavior is going to change back very soon > > I disagree. 1.7 is basically done, the 1.6 changes are out there > already, and we still have work to do just to get consensus on how we > want to handle this, plus implement the changes. 
> I agree with Nathaniel. 1.7.0rc1 is out, so all that should go into 1.7.x from now on is bug fixes. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Jan 5 10:59:25 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 5 Jan 2013 15:59:25 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Sat, Jan 5, 2013 at 2:38 PM, Nathaniel Smith wrote: > On Sat, Jan 5, 2013 at 12:32 PM, Matthew Brett wrote: >> Hi, >> >> On Fri, Jan 4, 2013 at 4:54 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Fri, Jan 4, 2013 at 4:01 PM, Andrew Collette >>> wrote: >>>> >From a more basic perspective, I think that adding a number to an >>>> array should never raise an exception. I've not used any other >>>> language in which this behavior takes place. In C, you have rollover >>>> behavior, in IDL you roll over or clip, and in NumPy you either roll >>>> or upcast, depending on the version. IDL, etc. manage to handle >>>> things like max() or total() in a sensible (or at least defensible) >>>> fashion, and without raising an error. >>> >>> That's a reasonable point. >>> >>> Looks like we lost consensus. >>> >>> What about returning to the 1.5 behavior instead? >> >> If we do return to the 1.5 behavior, we would need to think about >> doing this in 1.7. >> >> If there are a large number of 1.5.x and previous users who would >> upgrade to 1.7, leaving the 1.6 behavior in 1.7 will mean that they >> will get double the confusion: >> >> 1) The behavior has changed to something they weren't expecting >> 2) The behavior is going to change back very soon > > I disagree. 1.7 is basically done, the 1.6 changes are out there > already, and we still have work to do just to get consensus on how we > want to handle this, plus implement the changes. > > Basically, the way I think about it in general is, you have the first > release that contains some bug, and then you have the first release > that doesn't contain it. Minimizing the amount of *time* between those > releases is important. Minimizing the *number of releases* in between > does not -- according to that logic, we shouldn't have released 1.6.1 > and 1.6.2 until we were confident that we'd fixed *all* the bugs, > because otherwise they might have misled people into upgrading too > soon. Holding 1.7 back for this isn't going to get this change done or > to users any faster; it's just going to hold back all the other > changes in 1.7. > > I do think we ought to aim to shorten our release cycle drastically. > Like release 1.8 within 2-3 months after 1.7. But let's talk about > that after 1.7 is out. Yes, I was imagining that resolving this question would be rather quick, and therefore any delay to 1.7 would be very small, but if it takes more than a few days to come to a solution, it's possible there would not be net benefit. To Ralf - I think a 'bugfix only' metric doesn't help all that much in this case, because if we revert to 1.5 behavior, this could very reasonably be described as a bugfix. Cheers, Matthew From njs at pobox.com Sat Jan 5 11:16:25 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jan 2013 16:16:25 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? 
In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On 5 Jan 2013 15:59, "Matthew Brett" wrote: > > Hi, > > On Sat, Jan 5, 2013 at 2:38 PM, Nathaniel Smith wrote: > > On Sat, Jan 5, 2013 at 12:32 PM, Matthew Brett wrote: > >> Hi, > >> > >> On Fri, Jan 4, 2013 at 4:54 PM, Matthew Brett wrote: > >>> Hi, > >>> > >>> On Fri, Jan 4, 2013 at 4:01 PM, Andrew Collette > >>> wrote: > >>>> >From a more basic perspective, I think that adding a number to an > >>>> array should never raise an exception. I've not used any other > >>>> language in which this behavior takes place. In C, you have rollover > >>>> behavior, in IDL you roll over or clip, and in NumPy you either roll > >>>> or upcast, depending on the version. IDL, etc. manage to handle > >>>> things like max() or total() in a sensible (or at least defensible) > >>>> fashion, and without raising an error. > >>> > >>> That's a reasonable point. > >>> > >>> Looks like we lost consensus. > >>> > >>> What about returning to the 1.5 behavior instead? > >> > >> If we do return to the 1.5 behavior, we would need to think about > >> doing this in 1.7. > >> > >> If there are a large number of 1.5.x and previous users who would > >> upgrade to 1.7, leaving the 1.6 behavior in 1.7 will mean that they > >> will get double the confusion: > >> > >> 1) The behavior has changed to something they weren't expecting > >> 2) The behavior is going to change back very soon > > > > I disagree. 1.7 is basically done, the 1.6 changes are out there > > already, and we still have work to do just to get consensus on how we > > want to handle this, plus implement the changes. > > > > Basically, the way I think about it in general is, you have the first > > release that contains some bug, and then you have the first release > > that doesn't contain it. Minimizing the amount of *time* between those > > releases is important. Minimizing the *number of releases* in between > > does not -- according to that logic, we shouldn't have released 1.6.1 > > and 1.6.2 until we were confident that we'd fixed *all* the bugs, > > because otherwise they might have misled people into upgrading too > > soon. Holding 1.7 back for this isn't going to get this change done or > > to users any faster; it's just going to hold back all the other > > changes in 1.7. > > > > I do think we ought to aim to shorten our release cycle drastically. > > Like release 1.8 within 2-3 months after 1.7. But let's talk about > > that after 1.7 is out. > > Yes, I was imagining that resolving this question would be rather > quick, and therefore any delay to 1.7 would be very small, but if it > takes more than a few days to come to a solution, it's possible there > would not be net benefit. > > To Ralf - I think a 'bugfix only' metric doesn't help all that much in > this case, because if we revert to 1.5 behavior, this could very > reasonably be described as a bugfix. It's not just the time to make the change, it's the time to make sure that we haven't created any new unexpected problems in the process. 1.7's already gone through many weeks of stabilization and testing. Really at this point the criterion isn't really even bug fixes only, but release critical bugs and doc fixes only (and the only RC bugs left should be ones discovered through the beta/rc cycle). -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Sat Jan 5 13:56:13 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 5 Jan 2013 18:56:13 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Sat, Jan 5, 2013 at 4:16 PM, Nathaniel Smith wrote: > On 5 Jan 2013 15:59, "Matthew Brett" wrote: >> >> Hi, >> >> On Sat, Jan 5, 2013 at 2:38 PM, Nathaniel Smith wrote: >> > On Sat, Jan 5, 2013 at 12:32 PM, Matthew Brett >> > wrote: >> >> Hi, >> >> >> >> On Fri, Jan 4, 2013 at 4:54 PM, Matthew Brett >> >> wrote: >> >>> Hi, >> >>> >> >>> On Fri, Jan 4, 2013 at 4:01 PM, Andrew Collette >> >>> wrote: >> >>>> >From a more basic perspective, I think that adding a number to an >> >>>> array should never raise an exception. I've not used any other >> >>>> language in which this behavior takes place. In C, you have rollover >> >>>> behavior, in IDL you roll over or clip, and in NumPy you either roll >> >>>> or upcast, depending on the version. IDL, etc. manage to handle >> >>>> things like max() or total() in a sensible (or at least defensible) >> >>>> fashion, and without raising an error. >> >>> >> >>> That's a reasonable point. >> >>> >> >>> Looks like we lost consensus. >> >>> >> >>> What about returning to the 1.5 behavior instead? >> >> >> >> If we do return to the 1.5 behavior, we would need to think about >> >> doing this in 1.7. >> >> >> >> If there are a large number of 1.5.x and previous users who would >> >> upgrade to 1.7, leaving the 1.6 behavior in 1.7 will mean that they >> >> will get double the confusion: >> >> >> >> 1) The behavior has changed to something they weren't expecting >> >> 2) The behavior is going to change back very soon >> > >> > I disagree. 1.7 is basically done, the 1.6 changes are out there >> > already, and we still have work to do just to get consensus on how we >> > want to handle this, plus implement the changes. >> > >> > Basically, the way I think about it in general is, you have the first >> > release that contains some bug, and then you have the first release >> > that doesn't contain it. Minimizing the amount of *time* between those >> > releases is important. Minimizing the *number of releases* in between >> > does not -- according to that logic, we shouldn't have released 1.6.1 >> > and 1.6.2 until we were confident that we'd fixed *all* the bugs, >> > because otherwise they might have misled people into upgrading too >> > soon. Holding 1.7 back for this isn't going to get this change done or >> > to users any faster; it's just going to hold back all the other >> > changes in 1.7. >> > >> > I do think we ought to aim to shorten our release cycle drastically. >> > Like release 1.8 within 2-3 months after 1.7. But let's talk about >> > that after 1.7 is out. >> >> Yes, I was imagining that resolving this question would be rather >> quick, and therefore any delay to 1.7 would be very small, but if it >> takes more than a few days to come to a solution, it's possible there >> would not be net benefit. >> >> To Ralf - I think a 'bugfix only' metric doesn't help all that much in >> this case, because if we revert to 1.5 behavior, this could very >> reasonably be described as a bugfix. > > It's not just the time to make the change, it's the time to make sure that > we haven't created any new unexpected problems in the process. 1.7's already > gone through many weeks of stabilization and testing. 
Really at this point > the criterion isn't really even bug fixes only, but release critical bugs > and doc fixes only (and the only RC bugs left should be ones discovered > through the beta/rc cycle). OK, I understand. This must influence the decision on what to do about the scalar casting. Further from 1.5.x makes reverting to 1.5.x less attractive. The longer the 1.6.x changes have been in the wild, the stronger the argument for leaving things as they are. Best, Matthew From nouiz at nouiz.org Sat Jan 5 15:36:01 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Sat, 5 Jan 2013 15:36:01 -0500 Subject: [Numpy-discussion] Howto bisect old commits correctly In-Reply-To: <1357349391.12993.8.camel@sebastian-laptop> References: <1357341467.12993.6.camel@sebastian-laptop> <1357349391.12993.8.camel@sebastian-laptop> Message-ID: Hi, I had many error when tring to the checkedout version and recompile. the problem I had is that I didn't erased the build directory each time. This cause some problem as not all is recompiled correctly in that case. Just deleting this directory manually fixed my problem. HTH Fred On Fri, Jan 4, 2013 at 8:29 PM, Sebastian Berg wrote: > On Sat, 2013-01-05 at 00:17 +0100, Sebastian Berg wrote: >> Hey, >> >> this is probably just because I do not have any experience with bisect >> and the like, but when I try running a bisect keep running into: >> > > Nevermind that. Probably I just stumbled on some bad versions... > >> ImportError: /home/sebastian/.../lib/python2.7/site-packages/numpy/core/multiarray.so: undefined symbol: PyDataMem_NEW >> or: >> RuntimeError: module compiled against API version 8 but this version of numpy is 7 >> >> I am sure I am missing something simple, but I have no idea where to >> look. Am I just forgetting to delete some things and my version is not >> clean!? >> >> Regards, >> >> Sebastian >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Sat Jan 5 16:31:12 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jan 2013 21:31:12 +0000 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: References: Message-ID: On 5 Jan 2013 12:16, "Matthew Brett" wrote: > > Hi, > > Following on from Nathaniel's explorations of the scalar - array > casting rules, some resources on rank-0 arrays. > > The discussion that Nathaniel tracked down on "rank-0 arrays"; it also > makes reference to casting. The rank-0 arrays seem to have been one > way of solving the problem of maintaining array dtypes other than bool > / float / int: > > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001612.html > > Quoting from an email from Travis in that thread, replying to an email > from Tim Hochberg: > > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001647.html > > > > Frankly, I have no idea what the implimentation details would be, but > > could we get rid of rank-0 arrays altogether? I have always simply found > > them strange and confusing... What are they really neccesary for > > (besides holding scalar values of different precision that standard > > Pyton scalars)? > > With new coercion rules this becomes a possibility. 
Arguments against it > are that special rank-0 arrays behave as more consistent numbers with the > rest of Numeric than Python scalars. In other words they have a length > and a shape and one can right N-dimensional code that works the same even > when the result is a scalar. > > Another advantage of having a Numeric scalar is that we can control the > behavior of floating point operations better. > > e.g. > > if only Python scalars were available and sum(a) returned 0, then > > 1 / sum(a) would behave as Python behaves (always raises error). > > while with our own scalars > > 1 / sum(a) could potentially behave however the user wanted. > > > There seemed then to be some impetus to remove rank-0 arrays and > replace them with Python scalar types with the various numpy > precisions : > > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/013983.html > > Travis' recent email hints at something that seems similar, but I > don't understand what he means: > > http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064795.html > > > Don't create array-scalars. Instead, make the data-type object a > meta-type object whose instances are the items returned from NumPy > arrays. There is no need for a separate array-scalar object and in > fact it's confusing to the type-system. I understand that now. I > did not understand that 5 years ago. > > > Travis - can you expand? Numpy has 3 partially overlapping concepts: A) scalars (what Travis calls "array scalars"): Things like "float64", "int32". These are ordinary Python classes; usually when you subscript an array, what you get back is an instance of one of these classes: In [1]: a = np.array([1, 2, 3]) In [2]: a[0] Out[2]: 1 In [3]: type(a[0]) Out[3]: numpy.int64 Note that even though they are called "array scalars", they have nothing to do with the actual ndarray type -- they are totally separate objects. B) dtypes: These are instances of class np.dtype. For every scalar type, there is a corresponding dtype object; plus you can create new dtype objects for things like record arrays (which correspond to scalars of type "np.void"; I don't really understand how void scalars work in detail): In [8]: int64_dtype = np.dtype(np.int64) In [9]: int64_dtype Out[9]: dtype('int64') In [10]: type(int64_dtype) Out[10]: numpy.dtype In [11]: int64_dtype.type Out[11]: numpy.int64 C) rank-0 arrays: Plain old ndarray objects that happen to have ndim == 0, shape == (). These are arrays which are scalars, but they are not array scalars. Arrays HAVE-A dtype. In [15]: int64_arr = np.array(1) In [16]: int64_arr Out[16]: array(1) In [17]: int64_arr.dtype Out[17]: dtype('int64') ------------ Okay given that background: What Travis was saying in that email was that he thought (A) and (B) should be combined. Instead of having np.float64-the-class and dtype(np.float64)-the-dtype-object, we should make dtype objects actually *be* the scalar classes. (They would still be dtype objects, which means they would be "metaclasses", which is just a fancy way to say, dtype would be a subclass of the Python class "type", and dtype objects would be class objects that had extra functionality.) Those old mailing list threads are debating about (A) versus (C). What we ended up with is what I described above -- we have "rank-0" (0-dimensional) arrays, and we have array scalar objects that are a different set of python types and objects entirely. 
The actual implementation is totally different -- to the point that we a 35,000 line auto-generated C file implementing arithmetic for scalars, *and* a 10,000 line auto-generated C file implementing arithmetic for arrays (including 0-dim arrays), and these have different functionality and bugs: https://github.com/numpy/numpy/issues/593 However, the actual goal of all this code is to make array scalars and 0-dim arrays entirely indistinguishable. Supposedly they have the same APIs and generally behave exactly the same, modulo bugs (but surely there can't be many of those...), and two things: 1) isinstance(scalar, np.int64) is a sorta-legitimate way to do a type check. But isinstance(zerodim_arr, np.int64) is always false. Instead you have to use issubdtype(zerodim_arr, np.int64). (I mean, obviously, right?) 2) Scalars are always read-only, like regular Python scalars. 0-dim arrays are in general writeable... unless you set them to read-only. I think the only behavioural difference between an array scalar and a read-only 0-dim array is that for read-only 0-dim arrays, in-place operations raise an exception: In [5]: scalar = np.int64(1) # same as 'scalar = scalar + 2', i.e., creates a new object In [6]: scalar += 2 In [7]: scalar Out[7]: 3 In [10]: zerodim = np.array(1) In [11]: zerodim.flags.writeable = False In [12]: zerodim += 2 ValueError: return array is not writeable Also, scalar indexing of ndarrays returns scalar objects. Except when it returns a 0-dim array -- I'm pretty sure this can happen when the moon is right, though I forget the details. ndarray subclasses? custom dtypes? Maybe someone will remember. Q: We could make += work on read-only arrays with, like, a 2 line fix. So wouldn't it be simpler to throw away the tens of thousands of lines of code used to implement scalars, and just use 0-dim arrays everywhere instead? So like, np.array([1, 2, 3])[1] would return a read-only 0-dim array, which acted just like the current scalar objects in basically every way? A: Excellent question! So ndarrays would be similar to Python strings -- indexing an ndarray would return another ndarray, just like indexing a string returns another string? Q: Yeah. I mean, I remember that seemed weird when I first learned Python, but when have you ever felt the Python was really missing a "character" type like C has? A: That's true, I don't think I ever have. Plus if you wanted a "real" float/int/whatever object you could just call float() or int() or use .item(), just like now. Can you think any problems this would cause, though? Q: Well, what about speed? 0-dim arrays are stupidly slow: In [2]: x = 1.5 In [3]: zerodim = np.array(x) In [4]: scalar = zerodim[()] In [5]: timeit x * x 10000000 loops, best of 3: 64.2 ns per loop In [6]: timeit scalar * scalar 1000000 loops, best of 3: 299 ns per loop In [7]: timeit zerodim * zerodim 1000000 loops, best of 3: 1.78 us per loop A: True! Q: So before we could throw away that code, we'd have to make arrays faster? A: Is that an objection? Q: Well, maybe they're already going as fast as they possibly can be? Part of the motivation for having array scalars in the first place was that they could be more optimized. A: It's true, reducing overhead might be hard! For example, with arrays, you have to look up which ufunc inner loop to use. That requires considering all kinds of different casts (like it has to consider, maybe we should cast both arrays to integers and then multiply those?), and this currently takes up about 700 ns all by itself! 
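A rough way to see the size of that lookup problem from Python is to
introspect the ufunc itself -- nothing below is new API, just the loop
table every ufunc already advertises:

import numpy as np

# Each ufunc carries a table of typed inner loops; the type-resolution
# step has to pick one of these on every call, possibly after
# considering casts between the built-in dtypes.
print np.multiply.ntypes              # number of built-in inner loops
print 'dd->d' in np.multiply.types    # True -- the double*double loop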
Q: It takes 700 ns to figure out that to multiply two arrays of doubles you should use the double-multiplication loop? A: Well, we support 24 different dtypes out-of-the-box. Caching all the different combinations so we could skip the ufunc lookup time would create memory overhead of nearly *600 bytes per ufunc!* So instead we re-do it from scratch each time. Q: Uh.... A: C'mon, that's not a question. Q: Right, okay, how about the isinstance() thing. There are probably people relying on isinstance(scalar, np.float64) working (even if this is unwise) -- but if we get rid of scalars, then how could we possibly make isinstance(zerodim_array, np.float64) work? All 0-dim arrays have the same type -- ndarray! A: Well, it turns out that starting in Python 2.6 -- which, coincidentally, is now our minimum required version! -- you can make isinstance() and issubclass() do whatever arbitrary checks you want. Check it out: class MetaEven(type): def __instancecheck__(self, obj): return obj % 2 == 0 class Even(object): __metaclass__ = MetaEven assert not isinstance(1, Even) assert isinstance(2, Even) So we could just decide that isinstance(foo, some_dtype) returns True whenever foo is an array with the given dtype, and define np.float64 to be correct dtype. (Thus also fulfilling Travis's idea of getting rid of the distinction between scalar types and dtypes.) Q: So basically all the dtypes, including the weird ones like 'np.integer' and 'np.number'[1], would use the standard Python abstract base class machinery, and we could throw out all the issubdtype/issubsctype/issctype nonsense, and just use isinstance/issubclass everywhere instead? [1] http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html A: Yeah. Q: Huh. That does sound nice. I don't know. What other problems can you think of with this scheme? -n From eric.emsellem at eso.org Sat Jan 5 17:04:34 2013 From: eric.emsellem at eso.org (Eric Emsellem) Date: Sat, 05 Jan 2013 23:04:34 +0100 Subject: [Numpy-discussion] Invalid value encoutered : how to, prevent numpy.where to do this? Message-ID: <50E8A372.5080808@eso.org> Thanks! This makes sense of course. And yes the operation I am trying to do is rather complicated so I need to rely on a prior selection. Now I would need to optimise this for large arrays and the code does go through these command line many many times. When I have to operate on the two different parts of the array, I guess just using the following is the fastest way (as you indicated) : result = np.empty_like(data) mask = (data == 0) result[mask] = 0.0 result[~mask] = 1.0/data[~mask] But if I only need to do this on one side of the selection, I guess I would just do: result = np.empty_like(data) mask = (data != 0) result[mask] += 1.0 / data[mask] I have tried using three version of "mask = " with the rest of the code being the same: 1- mask = where(data != 0) 2- mask = np.where(data != 0) 3- mask = (data != 0) and it looks like #3 is the fastest, then #2 (20% slower) then #1 (50% slower than #3). I am not sure why, but Is that making sense? Or is there even a faster way (for large data arrays, and complicated operations)? 
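One more option worth noting, since the original worry was about np.seterr
being global: np.errstate is a context manager that scopes the change to a
single block and restores the previous settings on exit -- a minimal sketch
(any reasonably recent numpy):

import numpy as np

data = np.arange(10.)

# mask-based version (no warnings at all)
result = np.zeros_like(data)
mask = (data != 0)
result[mask] = 1.0 / data[mask]

# np.where version, with the divide warning silenced only inside this block
with np.errstate(divide='ignore'):
    result2 = np.where(data == 0, 0.0, 1.0 / data)

On older versions, np.seterr returns the previous settings, so they can be
saved and restored by hand for the same effect.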
thanks Eric > If your operation doesn't factor like this though then you can always > use something more cumbersome like > result = np.empty_like(data) > mask = (data == 0) > result[mask] = 0 > result[~mask] = 1.0/data[~mask] > > Or in 1.7 this could be written > result = np.zeros_like(data) > np.divide(1.0, data, where=(data != 0), out=result) > > -n > From eric.emsellem at eso.org Sat Jan 5 17:04:55 2013 From: eric.emsellem at eso.org (Eric Emsellem) Date: Sat, 05 Jan 2013 23:04:55 +0100 Subject: [Numpy-discussion] Invalid value encoutered : how to, prevent numpy.where to do this? Message-ID: <50E8A387.7050706@eso.org> Thanks! This makes sense of course. And yes the operation I am trying to do is rather complicated so I need to rely on a prior selection. Now I would need to optimise this for large arrays and the code does go through these command line many many times. When I have to operate on the two different parts of the array, I guess just using the following is the fastest way (as you indicated) : result = np.empty_like(data) mask = (data == 0) result[mask] = 0.0 result[~mask] = 1.0/data[~mask] But if I only need to do this on one side of the selection, I guess I would just do: result = np.empty_like(data) mask = (data != 0) result[mask] += 1.0 / data[mask] I have tried using three version of "mask = " with the rest of the code being the same: 1- mask = where(data != 0) 2- mask = np.where(data != 0) 3- mask = (data != 0) and it looks like #3 is the fastest, then #2 (20% slower) then #1 (50% slower than #3). I am not sure why, but Is that making sense? Or is there even a faster way (for large data arrays, and complicated operations)? thanks Eric > If your operation doesn't factor like this though then you can always > use something more cumbersome like > result = np.empty_like(data) > mask = (data == 0) > result[mask] = 0 > result[~mask] = 1.0/data[~mask] > > Or in 1.7 this could be written > result = np.zeros_like(data) > np.divide(1.0, data, where=(data != 0), out=result) > > -n > From eric.emsellem at eso.org Sat Jan 5 17:07:24 2013 From: eric.emsellem at eso.org (Eric Emsellem) Date: Sat, 05 Jan 2013 23:07:24 +0100 Subject: [Numpy-discussion] Invalid value encoutered : how to, prevent numpy.where to do this? Message-ID: <50E8A41C.7090201@eso.org> Thanks! This makes sense of course. And yes the operation I am trying to do is rather complicated so I need to rely on a prior selection. Now I would need to optimise this for large arrays and the code does go through these command line many many times. When I have to operate on the two different parts of the array, I guess just using the following is the fastest way (as you indicated) : result = np.empty_like(data) mask = (data == 0) result[mask] = 0.0 result[~mask] = 1.0/data[~mask] But if I only need to do this on one side of the selection, I guess I would just do: result = np.empty_like(data) mask = (data != 0) result[mask] += 1.0 / data[mask] I have tried using three version of "mask = " with the rest of the code being the same: 1- mask = where(data != 0) 2- mask = np.where(data != 0) 3- mask = (data != 0) and it looks like #3 is the fastest, then #2 (20% slower) then #1 (50% slower than #3). I am not sure why, but Is that making sense? Or is there even a faster way (for large data arrays, and complicated operations)? 
thanks Eric > If your operation doesn't factor like this though then you can always > use something more cumbersome like > result = np.empty_like(data) > mask = (data == 0) > result[mask] = 0 > result[~mask] = 1.0/data[~mask] > > Or in 1.7 this could be written > result = np.zeros_like(data) > np.divide(1.0, data, where=(data != 0), out=result) > > -n > From cournape at gmail.com Sat Jan 5 17:10:04 2013 From: cournape at gmail.com (David Cournapeau) Date: Sat, 5 Jan 2013 16:10:04 -0600 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: References: Message-ID: On Sat, Jan 5, 2013 at 3:31 PM, Nathaniel Smith wrote: > On 5 Jan 2013 12:16, "Matthew Brett" wrote: >> >> Hi, >> >> Following on from Nathaniel's explorations of the scalar - array >> casting rules, some resources on rank-0 arrays. >> >> The discussion that Nathaniel tracked down on "rank-0 arrays"; it also >> makes reference to casting. The rank-0 arrays seem to have been one >> way of solving the problem of maintaining array dtypes other than bool >> / float / int: >> >> http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001612.html >> >> Quoting from an email from Travis in that thread, replying to an email >> from Tim Hochberg: >> >> http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001647.html >> >> >> > Frankly, I have no idea what the implimentation details would be, but >> > could we get rid of rank-0 arrays altogether? I have always simply found >> > them strange and confusing... What are they really neccesary for >> > (besides holding scalar values of different precision that standard >> > Pyton scalars)? >> >> With new coercion rules this becomes a possibility. Arguments against it >> are that special rank-0 arrays behave as more consistent numbers with the >> rest of Numeric than Python scalars. In other words they have a length >> and a shape and one can right N-dimensional code that works the same even >> when the result is a scalar. >> >> Another advantage of having a Numeric scalar is that we can control the >> behavior of floating point operations better. >> >> e.g. >> >> if only Python scalars were available and sum(a) returned 0, then >> >> 1 / sum(a) would behave as Python behaves (always raises error). >> >> while with our own scalars >> >> 1 / sum(a) could potentially behave however the user wanted. >> >> >> There seemed then to be some impetus to remove rank-0 arrays and >> replace them with Python scalar types with the various numpy >> precisions : >> >> http://mail.scipy.org/pipermail/numpy-discussion/2002-September/013983.html >> >> Travis' recent email hints at something that seems similar, but I >> don't understand what he means: >> >> http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064795.html >> >> >> Don't create array-scalars. Instead, make the data-type object a >> meta-type object whose instances are the items returned from NumPy >> arrays. There is no need for a separate array-scalar object and in >> fact it's confusing to the type-system. I understand that now. I >> did not understand that 5 years ago. >> >> >> Travis - can you expand? > > Numpy has 3 partially overlapping concepts: > > A) scalars (what Travis calls "array scalars"): Things like "float64", > "int32". 
These are ordinary Python classes; usually when you subscript > an array, what you get back is an instance of one of these classes: > > In [1]: a = np.array([1, 2, 3]) > > In [2]: a[0] > Out[2]: 1 > > In [3]: type(a[0]) > Out[3]: numpy.int64 > > Note that even though they are called "array scalars", they have > nothing to do with the actual ndarray type -- they are totally > separate objects. > > B) dtypes: These are instances of class np.dtype. For every scalar > type, there is a corresponding dtype object; plus you can create new > dtype objects for things like record arrays (which correspond to > scalars of type "np.void"; I don't really understand how void scalars > work in detail): > > In [8]: int64_dtype = np.dtype(np.int64) > > In [9]: int64_dtype > Out[9]: dtype('int64') > > In [10]: type(int64_dtype) > Out[10]: numpy.dtype > > In [11]: int64_dtype.type > Out[11]: numpy.int64 > > C) rank-0 arrays: Plain old ndarray objects that happen to have ndim > == 0, shape == (). These are arrays which are scalars, but they are > not array scalars. Arrays HAVE-A dtype. > > In [15]: int64_arr = np.array(1) > > In [16]: int64_arr > Out[16]: array(1) > > In [17]: int64_arr.dtype > Out[17]: dtype('int64') > > ------------ > > Okay given that background: > > What Travis was saying in that email was that he thought (A) and (B) > should be combined. Instead of having np.float64-the-class and > dtype(np.float64)-the-dtype-object, we should make dtype objects > actually *be* the scalar classes. (They would still be dtype objects, > which means they would be "metaclasses", which is just a fancy way to > say, dtype would be a subclass of the Python class "type", and dtype > objects would be class objects that had extra functionality.) > > Those old mailing list threads are debating about (A) versus (C). What > we ended up with is what I described above -- we have "rank-0" > (0-dimensional) arrays, and we have array scalar objects that are a > different set of python types and objects entirely. The actual > implementation is totally different -- to the point that we a 35,000 > line auto-generated C file implementing arithmetic for scalars, *and* > a 10,000 line auto-generated C file implementing arithmetic for arrays > (including 0-dim arrays), and these have different functionality and > bugs: > https://github.com/numpy/numpy/issues/593 > > However, the actual goal of all this code is to make array scalars and > 0-dim arrays entirely indistinguishable. Supposedly they have the same > APIs and generally behave exactly the same, modulo bugs (but surely > there can't be many of those...), and two things: > > 1) isinstance(scalar, np.int64) is a sorta-legitimate way to do a type > check. But isinstance(zerodim_arr, np.int64) is always false. Instead > you have to use issubdtype(zerodim_arr, np.int64). (I mean, obviously, > right?) > > 2) Scalars are always read-only, like regular Python scalars. 0-dim > arrays are in general writeable... unless you set them to read-only. I > think the only behavioural difference between an array scalar and a > read-only 0-dim array is that for read-only 0-dim arrays, in-place > operations raise an exception: > > In [5]: scalar = np.int64(1) > > # same as 'scalar = scalar + 2', i.e., creates a new object > In [6]: scalar += 2 > > In [7]: scalar > Out[7]: 3 > > In [10]: zerodim = np.array(1) > > In [11]: zerodim.flags.writeable = False > > In [12]: zerodim += 2 > ValueError: return array is not writeable > > Also, scalar indexing of ndarrays returns scalar objects. 
Except when > it returns a 0-dim array -- I'm pretty sure this can happen when the > moon is right, though I forget the details. ndarray subclasses? custom > dtypes? Maybe someone will remember. > > Q: We could make += work on read-only arrays with, like, a 2 line fix. > So wouldn't it be simpler to throw away the tens of thousands of lines > of code used to implement scalars, and just use 0-dim arrays > everywhere instead? So like, np.array([1, 2, 3])[1] would return a > read-only 0-dim array, which acted just like the current scalar > objects in basically every way? > > A: Excellent question! So ndarrays would be similar to Python strings > -- indexing an ndarray would return another ndarray, just like > indexing a string returns another string? > > Q: Yeah. I mean, I remember that seemed weird when I first learned > Python, but when have you ever felt the Python was really missing a > "character" type like C has? > > A: That's true, I don't think I ever have. Plus if you wanted a "real" > float/int/whatever object you could just call float() or int() or use > .item(), just like now. Can you think any problems this would cause, > though? > > Q: Well, what about speed? 0-dim arrays are stupidly slow: > > In [2]: x = 1.5 > > In [3]: zerodim = np.array(x) > > In [4]: scalar = zerodim[()] > > In [5]: timeit x * x > 10000000 loops, best of 3: 64.2 ns per loop > > In [6]: timeit scalar * scalar > 1000000 loops, best of 3: 299 ns per loop > > In [7]: timeit zerodim * zerodim > 1000000 loops, best of 3: 1.78 us per loop > > A: True! > > Q: So before we could throw away that code, we'd have to make arrays faster? > > A: Is that an objection? > > Q: Well, maybe they're already going as fast as they possibly can be? > Part of the motivation for having array scalars in the first place was > that they could be more optimized. > > A: It's true, reducing overhead might be hard! For example, with > arrays, you have to look up which ufunc inner loop to use. That > requires considering all kinds of different casts (like it has to > consider, maybe we should cast both arrays to integers and then > multiply those?), and this currently takes up about 700 ns all by > itself! > > Q: It takes 700 ns to figure out that to multiply two arrays of > doubles you should use the double-multiplication loop? > > A: Well, we support 24 different dtypes out-of-the-box. Caching all > the different combinations so we could skip the ufunc lookup time > would create memory overhead of nearly *600 bytes per ufunc!* So > instead we re-do it from scratch each time. > > Q: Uh.... > > A: C'mon, that's not a question. > > Q: Right, okay, how about the isinstance() thing. There are probably > people relying on isinstance(scalar, np.float64) working (even if this > is unwise) -- but if we get rid of scalars, then how could we possibly > make isinstance(zerodim_array, np.float64) work? All 0-dim arrays have > the same type -- ndarray! > > A: Well, it turns out that starting in Python 2.6 -- which, > coincidentally, is now our minimum required version! -- you can make > isinstance() and issubclass() do whatever arbitrary checks you want. > Check it out: > > class MetaEven(type): > def __instancecheck__(self, obj): > return obj % 2 == 0 > > class Even(object): > __metaclass__ = MetaEven > > assert not isinstance(1, Even) > assert isinstance(2, Even) > > So we could just decide that isinstance(foo, some_dtype) returns True > whenever foo is an array with the given dtype, and define np.float64 > to be correct dtype. 
(Thus also fulfilling Travis's idea of getting > rid of the distinction between scalar types and dtypes.) > > Q: So basically all the dtypes, including the weird ones like > 'np.integer' and 'np.number'[1], would use the standard Python > abstract base class machinery, and we could throw out all the > issubdtype/issubsctype/issctype nonsense, and just use > isinstance/issubclass everywhere instead? > [1] http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html > > A: Yeah. > > Q: Huh. That does sound nice. I don't know. What other problems can > you think of with this scheme? Thanks for the entertaining explanation. I don't think 0-dim array being slow is such a big drawback. I would be really surprised if there was no way to make them faster, and having unspecified, nearly duplicated type handling code in multiple places is likely one reason why nobody took time to really make them faster. Regarding ufunc combination caching, couldn't we do the caching on demand ? I am not sure how you arrived at a 600 bytes per ufunc, but in many real world use cases, I would suspect only a few combinations would be used. Scalar arrays are ones of the most esoteric feature of numpy, and a fairly complex one in terms of implementation. Getting rid of it would be a net plus on that side. Of course, there is the issue of backward compatibility, whose extend is hard to assess. cheers, David From njs at pobox.com Sat Jan 5 17:14:47 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jan 2013 22:14:47 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Fri, Jan 4, 2013 at 5:25 PM, Andrew Collette wrote: > I agree the current behavior is confusing. Regardless of the details > of what to do, I suppose my main objection is that, to me, it's really > unexpected that adding a number to an array could result in an > exception. I think the main objection to the 1.5 behaviour was that it violated "Errors should never pass silently." (from 'import this'). Granted there are tons of places where numpy violates this but this is the one we're thinking about right now... Okay, here's another idea I'll throw out, maybe it's a good compromise: 1) We go back to the 1.5 behaviour. 2) If this produces a rollover/overflow/etc., we signal that using the standard mechanisms (whatever is configured via np.seterr). So by default things like np.maximum(np.array([1, 2, 3], dtype=uint8), 256) would succeed (and produce [1, 2, 3] with dtype uint8), but also issue a warning that 256 had rolled over to become 0. Alternatively those who want to be paranoid could call np.seterr(overflow="raise") and then it would be an error. -n From njs at pobox.com Sat Jan 5 17:20:11 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jan 2013 22:20:11 +0000 Subject: [Numpy-discussion] Invalid value encoutered : how to, prevent numpy.where to do this? In-Reply-To: <50E8A41C.7090201@eso.org> References: <50E8A41C.7090201@eso.org> Message-ID: On Sat, Jan 5, 2013 at 10:07 PM, Eric Emsellem wrote: > Thanks! > > This makes sense of course. And yes the operation I am trying to do is > rather complicated so I need to rely on a prior selection. > > Now I would need to optimise this for large arrays and the code does go > through these command line many many times. 
> > When I have to operate on the two different parts of the array, I guess > just using the following is the fastest way (as you indicated) : > > result = np.empty_like(data) > mask = (data == 0) > result[mask] = 0.0 > result[~mask] = 1.0/data[~mask] > > But if I only need to do this on one side of the selection, I guess I > would just do: > > result = np.empty_like(data) > mask = (data != 0) > result[mask] += 1.0 / data[mask] Note that np.empty_like will return an array full of random memory contents, and this will leave those random values anywhere that mask == False. This may or may not be a problem for you. > I have tried using three version of "mask = " with the rest of the code > being the same: > > 1- mask = where(data != 0) > 2- mask = np.where(data != 0) > 3- mask = (data != 0) > > and it looks like #3 is the fastest, then #2 (20% slower) then #1 (50% > slower than #3). > > I am not sure why, but Is that making sense? Or is there even a faster > way (for large data arrays, and complicated operations)? Yes, these should all do the same thing. And calling a function is slower than not calling a function, and normal Python 'where' is slower (for numpy arrays) than the numpy 'where'. Once you can count on 1.7, using the new where= argument should be the fastest way to do this (since it totally avoids making temporary arrays). -n From njs at pobox.com Sat Jan 5 17:58:04 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jan 2013 22:58:04 +0000 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: References: Message-ID: On Sat, Jan 5, 2013 at 10:10 PM, David Cournapeau wrote: > Thanks for the entertaining explanation. Procrastination is a hell of a drug. > I don't think 0-dim array being slow is such a big drawback. I would > be really surprised if there was no way to make them faster, and > having unspecified, nearly duplicated type handling code in multiple > places is likely one reason why nobody took time to really make them > faster. I agree! > Regarding ufunc combination caching, couldn't we do the caching on > demand ? I am not sure how you arrived at a 600 bytes per ufunc, but > in many real world use cases, I would suspect only a few combinations > would be used. 600 bytes is for an implementation that just kept a table like chosen_ufunc_offset = np.empty((24, 24), dtype=uint8) # 576 bytes and looked up the proper ufunc loop by doing ufunc_loops[chosen_ufunc_offset[left_arg_typenum, right_arg_typenum]] I suspect we could pre-fill such tables extremely quickly (basically just fill in all the exact matches, and then do a flood-fill along the can_cast graph), or we could fill them on-demand. (Also even 24 * 24 is currently an over-estimate since some of those 24 types are parametrized, and currently ufuncs can't handle parametrized types, but hopefully that will get fixed at some point.) The numpy main namespace and scipy.special together contain only 35 + 47 = 82 ufuncs that take 2 arguments[1], so loading those two modules using this scheme would add a total of *50 kilobytes* to numpy's memory overhead... (Interestingly, scipy.special does include 48 three-argument ufuncs, 15 four-argument ufuncs, and 6 five-argument ufuncs, which obviously cannot use a table lookup scheme. Maybe we can add a check for symmetry -- if all the loops are defined on matching types, like "dd->d" and "ff->f", then really it's a one-dimensional lookup problem -- find a common type for the inputs ("d" or "f") and then find the best loop for that one type.) 
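For reference, a small sketch of the kind of count behind those figures (the totals depend on the installed NumPy and SciPy versions, so the numbers quoted above are a 2013 snapshot):

import collections
import numpy as np
import scipy.special

def ufunc_arity_counts(namespace):
    counts = collections.Counter()
    for obj in vars(namespace).values():
        if isinstance(obj, np.ufunc):
            counts[obj.nin] += 1   # nin = number of input arguments
    return dict(sorted(counts.items()))

print("numpy:         ", ufunc_arity_counts(np))
print("scipy.special: ", ufunc_arity_counts(scipy.special))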
Another option, like you suggest (?), would be to keep a little fixed-size LRU cache for each ufunc and do lookups in it by linear search. It's hard to know how many different types get used in real-world programs, though, and I'd worry about performance falling off a cliff as soon as someone tweaked their inner loop so it used 6 different (type1, type2) combinations instead of 5 or whatever (maybe in one place they do int * float and in another float * int, etc.). Anyway the point is yes, this particular thing is eminently fixable. > Scalar arrays are ones of the most esoteric feature of numpy, and a > fairly complex one in terms of implementation. Getting rid of it would > be a net plus on that side. Of course, there is the issue of backward > compatibility, whose extend is hard to assess. You mean array scalars, not scalar arrays, right?[2] -n [1] len([v for v in np.__dict__.values() if isinstance(v, np.ufunc) and v.nin == 2]) [2] The fact that this sentence means something is certainly evidence for... something. From ondrej.certik at gmail.com Sat Jan 5 21:21:04 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sat, 5 Jan 2013 18:21:04 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries Message-ID: Hi, Currently the NumPy binaries are built using the pavement.py script, which uses the following Pythons: MPKG_PYTHON = { "2.5": ["/Library/Frameworks/Python.framework/Versions/2.5/bin/python"], "2.6": ["/Library/Frameworks/Python.framework/Versions/2.6/bin/python"], "2.7": ["/Library/Frameworks/Python.framework/Versions/2.7/bin/python"], "3.1": ["/Library/Frameworks/Python.framework/Versions/3.1/bin/python3"], "3.2": ["/Library/Frameworks/Python.framework/Versions/3.2/bin/python3"], "3.3": ["/Library/Frameworks/Python.framework/Versions/3.3/bin/python3"], } So for example I can easily create the 2.6 binary if that Python is pre-installed on the Mac box that I am using. On one of the Mac boxes that I am using, the 2.7 is missing, so are 3.1, 3.2 and 3.3. So I was thinking of updating my Fabric fab file to automatically install all Pythons from source and build against that, just like I do for Wine. Which exact Python do we need to use on Mac? Do we need to use the binary installer from python.org? Or can I install it from source? Finally, for which Python versions should we provide binary installers for Mac? For reference, the 1.6.2 had installers for 2.5, 2.6 and 2.7 only for OS X 10.3. There is only 2.7 version for OS X 10.6. 
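As a throwaway helper (hypothetical, not part of pavement.py), something along these lines could report which of the python.org framework interpreters listed in MPKG_PYTHON are actually present on a build box before starting:

import os

MPKG_PYTHON = {
    "2.6": "/Library/Frameworks/Python.framework/Versions/2.6/bin/python",
    "2.7": "/Library/Frameworks/Python.framework/Versions/2.7/bin/python",
    "3.2": "/Library/Frameworks/Python.framework/Versions/3.2/bin/python3",
    "3.3": "/Library/Frameworks/Python.framework/Versions/3.3/bin/python3",
}

for version, path in sorted(MPKG_PYTHON.items()):
    status = "found" if os.path.exists(path) else "MISSING"
    print(version, status, path)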
Also, what is the meaning of the following piece of code in pavement.py: def _build_mpkg(pyver): # account for differences between Python 2.7.1 versions from python.org if os.environ.get('MACOSX_DEPLOYMENT_TARGET', None) == "10.6": ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch x86_64 -Wl,-search_paths_first" else: ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch ppc -Wl,-search_paths_first" ldflags += " -L%s" % os.path.join(os.path.dirname(__file__), "build") if pyver == "2.5": sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " ".join(MPKG_PYTHON[pyver]))) else: sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " ".join(MPKG_PYTHON[pyver]))) In particular, the last line gets executed and it then fails with: paver dmg -p 2.6 ---> pavement.dmg ---> pavement.clean LDFLAGS='-undefined dynamic_lookup -bundle -arch i386 -arch ppc -Wl,-search_paths_first -Lbuild' /Library/Frameworks/Python.framework/Versions/2.6/bin/python setupegg.py bdist_mpkg Traceback (most recent call last): File "setupegg.py", line 17, in from setuptools import setup ImportError: No module named setuptools The reason is (I think) that if the Python binary is called explicitly with /Library/Frameworks/Python.framework/Versions/2.6/bin/python, then the paths are not setup properly in virtualenv, and thus setuptools (which is only installed in virtualenv, but not in system Python) fails to import. The solution is to simply apply this patch: diff --git a/pavement.py b/pavement.py index e693016..0c637f8 100644 --- a/pavement.py +++ b/pavement.py @@ -449,7 +449,7 @@ def _build_mpkg(pyver): if pyver == "2.5": sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " ".join(MPKG_PYTHON[pyver]))) else: - sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " ".join(MPKG_PYTHON[pyver]))) + sh("python setupegg.py bdist_mpkg") @task def simple_dmg(): and then things work. So an obvious question is --- why do we need to fiddle with LDFLAGS and paths to the exact Python version? Here is a proposed simpler version of the build_mpkg() function: def _build_mpkg(pyver): sh("python setupegg.py bdist_mpkg") Thanks for any tips. Ondrej From charlesr.harris at gmail.com Sat Jan 5 21:38:40 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 Jan 2013 19:38:40 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 Message-ID: Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sun Jan 6 02:58:58 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 06 Jan 2013 08:58:58 +0100 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: References: Message-ID: <50E92EC2.10404@astro.uio.no> On 01/05/2013 10:31 PM, Nathaniel Smith wrote: > On 5 Jan 2013 12:16, "Matthew Brett" wrote: >> >> Hi, >> >> Following on from Nathaniel's explorations of the scalar - array >> casting rules, some resources on rank-0 arrays. >> >> The discussion that Nathaniel tracked down on "rank-0 arrays"; it also >> makes reference to casting. 
The rank-0 arrays seem to have been one >> way of solving the problem of maintaining array dtypes other than bool >> / float / int: >> >> http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001612.html >> >> Quoting from an email from Travis in that thread, replying to an email >> from Tim Hochberg: >> >> http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001647.html >> >> >>> Frankly, I have no idea what the implimentation details would be, but >>> could we get rid of rank-0 arrays altogether? I have always simply found >>> them strange and confusing... What are they really neccesary for >>> (besides holding scalar values of different precision that standard >>> Pyton scalars)? >> >> With new coercion rules this becomes a possibility. Arguments against it >> are that special rank-0 arrays behave as more consistent numbers with the >> rest of Numeric than Python scalars. In other words they have a length >> and a shape and one can right N-dimensional code that works the same even >> when the result is a scalar. >> >> Another advantage of having a Numeric scalar is that we can control the >> behavior of floating point operations better. >> >> e.g. >> >> if only Python scalars were available and sum(a) returned 0, then >> >> 1 / sum(a) would behave as Python behaves (always raises error). >> >> while with our own scalars >> >> 1 / sum(a) could potentially behave however the user wanted. >> >> >> There seemed then to be some impetus to remove rank-0 arrays and >> replace them with Python scalar types with the various numpy >> precisions : >> >> http://mail.scipy.org/pipermail/numpy-discussion/2002-September/013983.html >> >> Travis' recent email hints at something that seems similar, but I >> don't understand what he means: >> >> http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064795.html >> >> >> Don't create array-scalars. Instead, make the data-type object a >> meta-type object whose instances are the items returned from NumPy >> arrays. There is no need for a separate array-scalar object and in >> fact it's confusing to the type-system. I understand that now. I >> did not understand that 5 years ago. >> >> >> Travis - can you expand? > > Numpy has 3 partially overlapping concepts: > > A) scalars (what Travis calls "array scalars"): Things like "float64", > "int32". These are ordinary Python classes; usually when you subscript > an array, what you get back is an instance of one of these classes: > > In [1]: a = np.array([1, 2, 3]) > > In [2]: a[0] > Out[2]: 1 > > In [3]: type(a[0]) > Out[3]: numpy.int64 > > Note that even though they are called "array scalars", they have > nothing to do with the actual ndarray type -- they are totally > separate objects. > > B) dtypes: These are instances of class np.dtype. For every scalar > type, there is a corresponding dtype object; plus you can create new > dtype objects for things like record arrays (which correspond to > scalars of type "np.void"; I don't really understand how void scalars > work in detail): > > In [8]: int64_dtype = np.dtype(np.int64) > > In [9]: int64_dtype > Out[9]: dtype('int64') > > In [10]: type(int64_dtype) > Out[10]: numpy.dtype > > In [11]: int64_dtype.type > Out[11]: numpy.int64 > > C) rank-0 arrays: Plain old ndarray objects that happen to have ndim > == 0, shape == (). These are arrays which are scalars, but they are > not array scalars. Arrays HAVE-A dtype. 
> > In [15]: int64_arr = np.array(1) > > In [16]: int64_arr > Out[16]: array(1) > > In [17]: int64_arr.dtype > Out[17]: dtype('int64') > > ------------ > > Okay given that background: > > What Travis was saying in that email was that he thought (A) and (B) > should be combined. Instead of having np.float64-the-class and > dtype(np.float64)-the-dtype-object, we should make dtype objects > actually *be* the scalar classes. (They would still be dtype objects, > which means they would be "metaclasses", which is just a fancy way to > say, dtype would be a subclass of the Python class "type", and dtype > objects would be class objects that had extra functionality.) > > Those old mailing list threads are debating about (A) versus (C). What > we ended up with is what I described above -- we have "rank-0" > (0-dimensional) arrays, and we have array scalar objects that are a > different set of python types and objects entirely. The actual > implementation is totally different -- to the point that we a 35,000 > line auto-generated C file implementing arithmetic for scalars, *and* > a 10,000 line auto-generated C file implementing arithmetic for arrays > (including 0-dim arrays), and these have different functionality and > bugs: > https://github.com/numpy/numpy/issues/593 > > However, the actual goal of all this code is to make array scalars and > 0-dim arrays entirely indistinguishable. Supposedly they have the same > APIs and generally behave exactly the same, modulo bugs (but surely > there can't be many of those...), and two things: > > 1) isinstance(scalar, np.int64) is a sorta-legitimate way to do a type > check. But isinstance(zerodim_arr, np.int64) is always false. Instead > you have to use issubdtype(zerodim_arr, np.int64). (I mean, obviously, > right?) > > 2) Scalars are always read-only, like regular Python scalars. 0-dim > arrays are in general writeable... unless you set them to read-only. I > think the only behavioural difference between an array scalar and a > read-only 0-dim array is that for read-only 0-dim arrays, in-place > operations raise an exception: > > In [5]: scalar = np.int64(1) > > # same as 'scalar = scalar + 2', i.e., creates a new object > In [6]: scalar += 2 > > In [7]: scalar > Out[7]: 3 > > In [10]: zerodim = np.array(1) > > In [11]: zerodim.flags.writeable = False > > In [12]: zerodim += 2 > ValueError: return array is not writeable > > Also, scalar indexing of ndarrays returns scalar objects. Except when > it returns a 0-dim array -- I'm pretty sure this can happen when the > moon is right, though I forget the details. ndarray subclasses? custom > dtypes? Maybe someone will remember. > > Q: We could make += work on read-only arrays with, like, a 2 line fix. > So wouldn't it be simpler to throw away the tens of thousands of lines > of code used to implement scalars, and just use 0-dim arrays > everywhere instead? So like, np.array([1, 2, 3])[1] would return a > read-only 0-dim array, which acted just like the current scalar > objects in basically every way? > > A: Excellent question! So ndarrays would be similar to Python strings > -- indexing an ndarray would return another ndarray, just like > indexing a string returns another string? > > Q: Yeah. I mean, I remember that seemed weird when I first learned > Python, but when have you ever felt the Python was really missing a > "character" type like C has? str is immutable which makes this a lot easier to deal with without getting confused. 
So basically you have: a[0:1] # read-write view a[[0]] # read-write copy a[0] # read-only view AND, += are allowed on all read-only arrays, they just transparently create a copy instead of doing the operation in-place. Try to enumerate all the fundamentally different things (if you count memory use/running time) that can happen for ndarrays a, b, and arbitrary x here: a += b[x] That's already quite a lot, your proposal adds even more options. It's certainly a lot more complicated than str. To me it all sounds like a lot of rules introduced just to have the result of a[0] be "kind of a scalar" without actually choosing that option. BUT I should read up on that thread you posted on why that won't work, didn't have time yet... Dag Sverre From sebastian at sipsolutions.net Sun Jan 6 04:41:16 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 06 Jan 2013 10:41:16 +0100 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: <50E92EC2.10404@astro.uio.no> References: <50E92EC2.10404@astro.uio.no> Message-ID: <1357465276.12993.20.camel@sebastian-laptop> On Sun, 2013-01-06 at 08:58 +0100, Dag Sverre Seljebotn wrote: > On 01/05/2013 10:31 PM, Nathaniel Smith wrote: > > On 5 Jan 2013 12:16, "Matthew Brett" wrote: > >> > >> Hi, > >> > >> Following on from Nathaniel's explorations of the scalar - array > >> casting rules, some resources on rank-0 arrays. > >> > > Q: Yeah. I mean, I remember that seemed weird when I first learned > > Python, but when have you ever felt the Python was really missing a > > "character" type like C has? > > str is immutable which makes this a lot easier to deal with without > getting confused. So basically you have: > > a[0:1] # read-write view > a[[0]] # read-write copy > a[0] # read-only view > > AND, += are allowed on all read-only arrays, they just transparently > create a copy instead of doing the operation in-place. > > Try to enumerate all the fundamentally different things (if you count > memory use/running time) that can happen for ndarrays a, b, and > arbitrary x here: > > a += b[x] > > That's already quite a lot, your proposal adds even more options. It's > certainly a lot more complicated than str. > > To me it all sounds like a lot of rules introduced just to have the > result of a[0] be "kind of a scalar" without actually choosing that option. > Yes, but I don't think there is an option to making the elements of an array being immutable. Firstly if you switch normal python code to numpy code you suddenly get numpy data types spilled into your code, and mutable objects are simply very different (also true for code updating to this new version). Do you expect: array = np.zeros(10, dtype=np.intp) b = arr[5] while condition: # might change the array?! b += 1 # This would not be possible and break: dictionary[b] = b**2 Because mutable objects are not hashable which important considering that dictionaries are a very central data type, making an element return mutable would be a bad idea. One could argue about structured datatypes, but maybe then it should be a datatype property whether its mutable or not, and even then the element should probably be a copy (though I did not check what happens here right now). > BUT I should read up on that thread you posted on why that won't work, > didn't have time yet... 
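A quick check of that hashability point (illustrative names; this is how current NumPy behaves):

import numpy as np

scalar = np.int64(5)    # array scalar, what indexing an array returns today
zerodim = np.array(5)   # 0-d ndarray

d = {scalar: scalar ** 2}    # fine: array scalars are hashable
print(d)

try:
    d[zerodim] = zerodim ** 2
except TypeError as exc:     # ndarrays, 0-d included, are not hashable
    print("unhashable:", exc)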
> > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ralf.gommers at gmail.com Sun Jan 6 05:04:20 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 6 Jan 2013 11:04:20 +0100 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 3:21 AM, Ond?ej ?ert?k wrote: > Hi, > > Currently the NumPy binaries are built using the pavement.py script, > which uses the following Pythons: > > MPKG_PYTHON = { > "2.5": > ["/Library/Frameworks/Python.framework/Versions/2.5/bin/python"], > "2.6": > ["/Library/Frameworks/Python.framework/Versions/2.6/bin/python"], > "2.7": > ["/Library/Frameworks/Python.framework/Versions/2.7/bin/python"], > "3.1": > ["/Library/Frameworks/Python.framework/Versions/3.1/bin/python3"], > "3.2": > ["/Library/Frameworks/Python.framework/Versions/3.2/bin/python3"], > "3.3": > ["/Library/Frameworks/Python.framework/Versions/3.3/bin/python3"], > } > > So for example I can easily create the 2.6 binary if that Python is > pre-installed on the Mac box that I am using. > On one of the Mac boxes that I am using, the 2.7 is missing, so are > 3.1, 3.2 and 3.3. So I was thinking > of updating my Fabric fab file to automatically install all Pythons > from source and build against that, just like I do for Wine. > > Which exact Python do we need to use on Mac? Do we need to use the > binary installer from python.org? > Yes, the one from python.org. > Or can I install it from source? Finally, for which Python versions > should we provide binary installers for Mac? > For reference, the 1.6.2 had installers for 2.5, 2.6 and 2.7 only for > OS X 10.3. There is only 2.7 version for OS X 10.6. > The provided installers and naming scheme should match what's done for Python itself on python.org. The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X 10.5. This is kind of hard to come by these days, but Vincent Davis maintains a build machine for numpy and scipy. That's already set up correctly, so all you have to do is connect to it via ssh, check out v.17.0 in ~/Code/numpy, check in release.sh that the section for OS X 10.6 is disabled and for 10.5 enabled and run it. OS X 10.6 broke support for previous versions in some subtle ways, so even when using the 10.4 SDK numpy compiled on 10.6 won't run on 10.5. As long as we're supporting 10.5 you therefore need to compile on it. The 10.7 --> 10.6 support hasn't been checked, but I wouldn't trust it. I have a 10.6 machine, so I can compile those binaries if needed. > Also, what is the meaning of the following piece of code in pavement.py: > > def _build_mpkg(pyver): > # account for differences between Python 2.7.1 versions from > python.org > if os.environ.get('MACOSX_DEPLOYMENT_TARGET', None) == "10.6": > ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch > x86_64 -Wl,-search_paths_first" > else: > ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch > ppc -Wl,-search_paths_first" > ldflags += " -L%s" % os.path.join(os.path.dirname(__file__), "build") The 10.6 binaries support only Intel Macs, both 32-bit and 64-bit. The 10.3 binaries support PPC Macs and 32-bit Intel. That's what the above does. Note that we simply follow the choice made by the Python release managers here. 
> if pyver == "2.5": > sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % > (ldflags, " ".join(MPKG_PYTHON[pyver]))) > else: > sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " > ".join(MPKG_PYTHON[pyver]))) > This is necessary because in Python 2.5, distutils asks for "gcc" instead of "gcc-4.0", so you may get the wrong one without CC=gcc-4.0. From Python 2.6 on this was fixed. > In particular, the last line gets executed and it then fails with: > > paver dmg -p 2.6 > ---> pavement.dmg > ---> pavement.clean > LDFLAGS='-undefined dynamic_lookup -bundle -arch i386 -arch ppc > -Wl,-search_paths_first -Lbuild' > /Library/Frameworks/Python.framework/Versions/2.6/bin/python > setupegg.py bdist_mpkg > Traceback (most recent call last): > File "setupegg.py", line 17, in > from setuptools import setup > ImportError: No module named setuptools > > > The reason is (I think) that if the Python binary is called explicitly > with /Library/Frameworks/Python.framework/Versions/2.6/bin/python, > then the paths are not setup properly in virtualenv, and thus > setuptools (which is only installed in virtualenv, but not in system > Python) fails to import. The solution is to simply apply this patch: > Avoid using system Python for anything. The first thing to do on any new OS X system is install Python some other way, preferably from python.org. > diff --git a/pavement.py b/pavement.py > index e693016..0c637f8 100644 > --- a/pavement.py > +++ b/pavement.py > @@ -449,7 +449,7 @@ def _build_mpkg(pyver): > if pyver == "2.5": > sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % > (ldflags, " ".join(MPKG_PYTHON[pyver]))) > else: > - sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " > ".join(MPKG_PYTHON[pyver]))) > + sh("python setupegg.py bdist_mpkg") > This doesn't work unless using virtualenvs, you're just throwing away the version selection here. If you can support virtualenvs in addition to python.org pythons, that would be useful. But being able to build binaries when needed simply by "paver dmg -p 2.x" is quite useful. > > @task > def simple_dmg(): > > > and then things work. So an obvious question is --- why do we need to > fiddle with LDFLAGS and paths to the exact Python version? Here is a > proposed simpler version of the build_mpkg() function: > > def _build_mpkg(pyver): > sh("python setupegg.py bdist_mpkg") > > Thanks for any tips. > Did you see the release.sh script? Some of the answers to your questions were already documented there, and it should do the job out of the box. Last note: bdist_mpkg is unmaintained and doesn't support Python 3.x. Most recent version is at: https://github.com/matthew-brett/bdist_mpkg, for previous versions numpy releases I've used that at commit e81a58a471 If we want 3.x binaries, then we should fix that or (preferably) build binaries with Bento. Bento has grown support for mpkg's; I'm not sure how robust that is. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sun Jan 6 05:16:13 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 6 Jan 2013 10:16:13 +0000 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: <50E92EC2.10404@astro.uio.no> References: <50E92EC2.10404@astro.uio.no> Message-ID: On 6 Jan 2013 07:59, "Dag Sverre Seljebotn" wrote: > Try to enumerate all the fundamentally different things (if you count > memory use/running time) that can happen for ndarrays a, b, and > arbitrary x here: > > a += b[x] > > That's already quite a lot, your proposal adds even more options. It's > certainly a lot more complicated than str. I agree it's complicated, but all the complications and options already exist - they're just split across two similar-but-not-quite-identical sets of data types. > To me it all sounds like a lot of rules introduced just to have the > result of a[0] be "kind of a scalar" without actually choosing that option. Not sure what you mean here. We know that whatever object a[0] returns is going to have scalar behaviour. Right now we have two totally different implementations of scalars. I'm not suggesting changing any (or hardly any) existing behaviour, just that we switch which implementation of that behavior we use. I actually wrote that email as kind of amusing exercise in "what if...?", but even after sleeping on it I'm still not thinking of any terrible downsides... -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sun Jan 6 05:35:21 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 06 Jan 2013 11:35:21 +0100 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: <1357465276.12993.20.camel@sebastian-laptop> References: <50E92EC2.10404@astro.uio.no> <1357465276.12993.20.camel@sebastian-laptop> Message-ID: <50E95369.70602@astro.uio.no> On 01/06/2013 10:41 AM, Sebastian Berg wrote: > On Sun, 2013-01-06 at 08:58 +0100, Dag Sverre Seljebotn wrote: >> On 01/05/2013 10:31 PM, Nathaniel Smith wrote: >>> On 5 Jan 2013 12:16, "Matthew Brett" wrote: >>>> >>>> Hi, >>>> >>>> Following on from Nathaniel's explorations of the scalar - array >>>> casting rules, some resources on rank-0 arrays. >>>> > > > >>> Q: Yeah. I mean, I remember that seemed weird when I first learned >>> Python, but when have you ever felt the Python was really missing a >>> "character" type like C has? >> >> str is immutable which makes this a lot easier to deal with without >> getting confused. So basically you have: >> >> a[0:1] # read-write view >> a[[0]] # read-write copy >> a[0] # read-only view >> >> AND, += are allowed on all read-only arrays, they just transparently >> create a copy instead of doing the operation in-place. >> >> Try to enumerate all the fundamentally different things (if you count >> memory use/running time) that can happen for ndarrays a, b, and >> arbitrary x here: >> >> a += b[x] >> >> That's already quite a lot, your proposal adds even more options. It's >> certainly a lot more complicated than str. >> >> To me it all sounds like a lot of rules introduced just to have the >> result of a[0] be "kind of a scalar" without actually choosing that option. >> > > Yes, but I don't think there is an option to making the elements of an > array being immutable. Firstly if you switch normal python code to numpy > code you suddenly get numpy data types spilled into your code, and > mutable objects are simply very different (also true for code updating > to this new version). 
Do you expect: > > array = np.zeros(10, dtype=np.intp) > b = arr[5] > while condition: > # might change the array?! > b += 1 > # This would not be possible and break: > dictionary[b] = b**2 > > Because mutable objects are not hashable which important considering > that dictionaries are a very central data type, making an element return > mutable would be a bad idea. Indeed, this would be completely crazy. I should have been more precise: I like the proposal, but also believe the additional complexity introduced have significant costs that must be considered. a) Making += behave differently for readonly arrays should be carefully considered. If I have a 10 GB read-only array, I prefer an error to a copy for +=. (One could use an ISSCALAR flag instead that only affected +=...) b) Things seems simpler since "indexing away the last index" is no longer a special case, it is always true for a.ndim > 0 that "a[i]" is a new array such that a[i].ndim == a.ndim - 1 But in exchange, a new special-case is introduced since READONLY is only set when ndim becomes 0, so it doesn't really help with the learning curve IMO. In some ways I believe the "scalar-indexing" special case is simpler for newcomers to understand, and is what people already assume, and that a "readonly-indexing" special case is more complicated. It's dangerous to have a library which people only use correctly by accident, so to speak, it's much better if what people think they see is how things are. (With respect to arr[5] returning a good old Python scalar for floats and ints -- Travis' example from 2002 is division, and at least that example is much less serious now with the introduction of the // operator in Python.) > One could argue about structured datatypes, but maybe then it should be > a datatype property whether its mutable or not, and even then the > element should probably be a copy (though I did not check what happens > here right now). Elements from arrays with structured dtypes are already mutable (*and*, at least until recently, could still be used as dict keys...). This was discussed on the list a couple of months back I think. Dag Sverre > >> BUT I should read up on that thread you posted on why that won't work, >> didn't have time yet... >> >> Dag Sverre >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From d.s.seljebotn at astro.uio.no Sun Jan 6 05:40:00 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 06 Jan 2013 11:40:00 +0100 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: References: <50E92EC2.10404@astro.uio.no> Message-ID: <50E95480.7050103@astro.uio.no> On 01/06/2013 11:16 AM, Nathaniel Smith wrote: > On 6 Jan 2013 07:59, "Dag Sverre Seljebotn" > wrote: > > Try to enumerate all the fundamentally different things (if you count > > memory use/running time) that can happen for ndarrays a, b, and > > arbitrary x here: > > > > a += b[x] > > > > That's already quite a lot, your proposal adds even more options. It's > > certainly a lot more complicated than str. > > I agree it's complicated, but all the complications and options already > exist - they're just split across two similar-but-not-quite-identical > sets of data types. 
> > > To me it all sounds like a lot of rules introduced just to have the > > result of a[0] be "kind of a scalar" without actually choosing that > option. > > Not sure what you mean here. We know that whatever object a[0] returns > is going to have scalar behaviour. Right now we have two totally > different implementations of scalars. I'm not suggesting changing any > (or hardly any) existing behaviour, just that we switch which > implementation of that behavior we use. In that case, how about not changing += for READONLY but instead have a new ISSCALAR flag for that? I.e. semantics stay mostly as today, it's just about removing those 10,000 lines of C code. > I actually wrote that email as kind of amusing exercise in "what > if...?", but even after sleeping on it I'm still not thinking of any > terrible downsides... I should say that I am really happy with the direction it is taking though. (I wish I understood why using Python floats and ints is so horrible though, but I've probably not written enough library NumPy code that needs to consider all ndims and dtypes, just final-end-user-code where the array vs. scalar distinction is more clear.) Dag Sverre From njs at pobox.com Sun Jan 6 09:42:24 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 6 Jan 2013 14:42:24 +0000 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 2:38 AM, Charles R Harris wrote: > Thoughts? To be clear, what you're talking about is basically deleting these two packages: numpy.oldnumeric numpy.numarray plus the compatibility C API in numpy/numarray/include ? So this would only affect Python code which explicitly imported one of those two packages (neither is imported by default), or C code which did #include "numpy/numarray/..."? (I'm not even sure how you would build such a C module, these headers are distributed in a weird directory not accessible via np.get_include(). So unless your build system does some special work to access it, you can't even see these headers.) -n From charlesr.harris at gmail.com Sun Jan 6 10:09:28 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 Jan 2013 08:09:28 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 7:42 AM, Nathaniel Smith wrote: > On Sun, Jan 6, 2013 at 2:38 AM, Charles R Harris > wrote: > > Thoughts? > > To be clear, what you're talking about is basically deleting these two > packages: > numpy.oldnumeric > numpy.numarray > plus the compatibility C API in > numpy/numarray/include > ? > > Yep. > So this would only affect Python code which explicitly imported one of > those two packages (neither is imported by default), or C code which > did #include "numpy/numarray/..."? > > Those packages were intended to be an easy path for folks to port their numeric and numarray code to numpy. During the 2.4 discussion there was a fellow who said his group was just now moving their code from numeric to numpy, but I had the feeling they were rewriting it in the process. > (I'm not even sure how you would build such a C module, these headers > are distributed in a weird directory not accessible via > np.get_include(). So unless your build system does some special work > to access it, you can't even see these headers.) > > Never tried it myself. There is some C code in those packages and it easy to overlook its maintenance, so I'd like to solve the problem by nuking it. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Jan 6 11:52:13 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 6 Jan 2013 16:52:13 +0000 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: <50E95369.70602@astro.uio.no> References: <50E92EC2.10404@astro.uio.no> <1357465276.12993.20.camel@sebastian-laptop> <50E95369.70602@astro.uio.no> Message-ID: On Sun, Jan 6, 2013 at 10:35 AM, Dag Sverre Seljebotn wrote: > I should have been more precise: I like the proposal, but also believe > the additional complexity introduced have significant costs that must be > considered. > > a) Making += behave differently for readonly arrays should be > carefully considered. If I have a 10 GB read-only array, I prefer an > error to a copy for +=. (One could use an ISSCALAR flag instead that > only affected +=...) Yes, definitely we would need to nail down the exact semantics here. My feeling is that we should see start by seeing if we can come up with a set of coherent rules for read-only arrays that does what we want before we add an ACT_LIKE_OLD_SCALARS flag, but either way is viable. (Or we could start with a PRETEND_TO_BE_SCALAR flag and then gradually migrate away from it.) > b) Things seems simpler since "indexing away the last index" is no > longer a special case, it is always true for a.ndim > 0 that "a[i]" is a > new array such that > > a[i].ndim == a.ndim - 1 > > But in exchange, a new special-case is introduced since READONLY is only > set when ndim becomes 0, so it doesn't really help with the learning > curve IMO. Yes, indexing with a scalar (as opposed to slicing or fancy-indexing) remains a special case just like now. And not just because the result is read-only -- it also returns a copy, not a view. I don't think the comparison to the a[i] special-case is very useful, really. Scalar indexing and the wacky one-dimensional indexing thing where a[i] -> a[i, ..] (unless a is one-dimensional) would still be different in general, even aside from the READONLY part, because the one-dimensional indexing thing only applies to one-dimensional indexes. For a 3-d array, a[i, j] gives an error; it's not the same as a[i, j, ...]. And while I understand why numpy does what it does for len() and __getitem__(int) on multi-dimensional arrays (it's to make multi-dimensional arrays act more like list-of-lists), this is IMO a confusing special case that we might be better off without, and in any case shouldn't be used as a guide for how to make the rest of the indexing system work. > In some ways I believe the "scalar-indexing" special case is simpler for > newcomers to understand, and is what people already assume, and that a > "readonly-indexing" special case is more complicated. It's dangerous to > have a library which people only use correctly by accident, so to speak, > it's much better if what people think they see is how things are. This is all true, but current scalars *are* readonly arrays, just weird ones with some limitations and that people don't realize are there. Heck, you can even reshape scalars: In [10]: a = np.float64(0) In [11]: a.reshape((1, 1)) Out[11]: array([[ 0.]]) And resizing is allowed... but silently does nothing: In [12]: a.resize((1, 1)) In [13]: a Out[13]: 0.0 > (With respect to arr[5] returning a good old Python scalar for floats > and ints -- Travis' example from 2002 is division, and at least that > example is much less serious now with the introduction of the // > operator in Python.) 
I thought Travis's example was (in current numpy terms): In [1]: a = np.array([-1.0, 1.0]) # Pretend that np.sum() returns a float, which uses Python's arithmetic: In [2]: 1 / float(np.sum(a)) ZeroDivisionError: float division by zero # It actually returns a numpy scalar, which uses numpy's arithmetic: In [3]: 1 / np.sum(a) /home/njs/.user-python2.7-64bit/bin/ipython:1: RuntimeWarning: divide by zero encountered in double_scalars #!/home/njs/.user-python2.7-64bit/bin/python Out[3]: inf Anyway, you still need to return some sort of special object for anything that's not part of python's type system (structured arrays, custom dtypes like enumerated values, etc.). So returning good-old Python scalars (GOPS?) for floats/ints/bools actually introduces a new special case. >> One could argue about structured datatypes, but maybe then it should be >> a datatype property whether its mutable or not, and even then the >> element should probably be a copy (though I did not check what happens >> here right now). > > Elements from arrays with structured dtypes are already mutable (*and*, > at least until recently, could still be used as dict keys...). This was > discussed on the list a couple of months back I think. Yeah, this is another weird wart we could fix up in the process... -n From charlesr.harris at gmail.com Sun Jan 6 12:53:47 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 Jan 2013 10:53:47 -0700 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: References: Message-ID: On Sat, Jan 5, 2013 at 2:31 PM, Nathaniel Smith wrote: > On 5 Jan 2013 12:16, "Matthew Brett" wrote: > > > > Hi, > > > > Following on from Nathaniel's explorations of the scalar - array > > casting rules, some resources on rank-0 arrays. > > > > The discussion that Nathaniel tracked down on "rank-0 arrays"; it also > > makes reference to casting. The rank-0 arrays seem to have been one > > way of solving the problem of maintaining array dtypes other than bool > > / float / int: > > > > > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001612.html > > > > Quoting from an email from Travis in that thread, replying to an email > > from Tim Hochberg: > > > > > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001647.html > > > > > > > Frankly, I have no idea what the implimentation details would be, but > > > could we get rid of rank-0 arrays altogether? I have always simply > found > > > them strange and confusing... What are they really neccesary for > > > (besides holding scalar values of different precision that standard > > > Pyton scalars)? > > > > With new coercion rules this becomes a possibility. Arguments against it > > are that special rank-0 arrays behave as more consistent numbers with > the > > rest of Numeric than Python scalars. In other words they have a length > > and a shape and one can right N-dimensional code that works the same even > > when the result is a scalar. > > > > Another advantage of having a Numeric scalar is that we can control the > > behavior of floating point operations better. > > > > e.g. > > > > if only Python scalars were available and sum(a) returned 0, then > > > > 1 / sum(a) would behave as Python behaves (always raises error). > > > > while with our own scalars > > > > 1 / sum(a) could potentially behave however the user wanted. 
> > > > > > There seemed then to be some impetus to remove rank-0 arrays and > > replace them with Python scalar types with the various numpy > > precisions : > > > > > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/013983.html > > > > Travis' recent email hints at something that seems similar, but I > > don't understand what he means: > > > > > http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064795.html > > > > > > Don't create array-scalars. Instead, make the data-type object a > > meta-type object whose instances are the items returned from NumPy > > arrays. There is no need for a separate array-scalar object and in > > fact it's confusing to the type-system. I understand that now. I > > did not understand that 5 years ago. > > > > > > Travis - can you expand? > > Numpy has 3 partially overlapping concepts: > > A) scalars (what Travis calls "array scalars"): Things like "float64", > "int32". These are ordinary Python classes; usually when you subscript > an array, what you get back is an instance of one of these classes: > > In [1]: a = np.array([1, 2, 3]) > > In [2]: a[0] > Out[2]: 1 > > In [3]: type(a[0]) > Out[3]: numpy.int64 > > Note that even though they are called "array scalars", they have > nothing to do with the actual ndarray type -- they are totally > separate objects. > > B) dtypes: These are instances of class np.dtype. For every scalar > type, there is a corresponding dtype object; plus you can create new > dtype objects for things like record arrays (which correspond to > scalars of type "np.void"; I don't really understand how void scalars > work in detail): > While thinking about dtypes I started a post proposing that *all* arrays be considered as special cases of void arrays. A void array is basically a memory indexing construct combined with a view. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sun Jan 6 12:57:21 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 06 Jan 2013 18:57:21 +0100 Subject: [Numpy-discussion] high dimensional array -> python scalar/index Message-ID: <1357495041.3537.7.camel@sebastian-laptop> Question for everyone, is this really reasonable: >>> import numpy as np >>> from operator import index >>> index(np.array([[5]])) 5 >>> int(np.array([[5]])) 5 >>> [0,1,2,3][np.array([[2]])] 2 To me, this does not make sense, why should we allow to use a high dimensional object like a normal scalar (its ok for 0-d arrays I guess)? Personally I would be for deprecating these usages, even if that (probably) means you cannot reshape your array with a matrix (as it is 2D) ;-): >>> np.arange(10).reshape(np.matrix([5,-1]).T) array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]) From matthew.brett at gmail.com Sun Jan 6 13:16:32 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 6 Jan 2013 18:16:32 +0000 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: References: Message-ID: Hi, On Sun, Jan 6, 2013 at 5:53 PM, Charles R Harris wrote: > > > On Sat, Jan 5, 2013 at 2:31 PM, Nathaniel Smith wrote: >> >> On 5 Jan 2013 12:16, "Matthew Brett" wrote: >> > >> > Hi, >> > >> > Following on from Nathaniel's explorations of the scalar - array >> > casting rules, some resources on rank-0 arrays. >> > >> > The discussion that Nathaniel tracked down on "rank-0 arrays"; it also >> > makes reference to casting. 
The rank-0 arrays seem to have been one >> > way of solving the problem of maintaining array dtypes other than bool >> > / float / int: >> > >> > >> > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001612.html >> > >> > Quoting from an email from Travis in that thread, replying to an email >> > from Tim Hochberg: >> > >> > >> > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001647.html >> > >> > >> > > Frankly, I have no idea what the implimentation details would be, but >> > > could we get rid of rank-0 arrays altogether? I have always simply >> > > found >> > > them strange and confusing... What are they really neccesary for >> > > (besides holding scalar values of different precision that standard >> > > Pyton scalars)? >> > >> > With new coercion rules this becomes a possibility. Arguments against >> > it >> > are that special rank-0 arrays behave as more consistent numbers with >> > the >> > rest of Numeric than Python scalars. In other words they have a length >> > and a shape and one can right N-dimensional code that works the same >> > even >> > when the result is a scalar. >> > >> > Another advantage of having a Numeric scalar is that we can control the >> > behavior of floating point operations better. >> > >> > e.g. >> > >> > if only Python scalars were available and sum(a) returned 0, then >> > >> > 1 / sum(a) would behave as Python behaves (always raises error). >> > >> > while with our own scalars >> > >> > 1 / sum(a) could potentially behave however the user wanted. >> > >> > >> > There seemed then to be some impetus to remove rank-0 arrays and >> > replace them with Python scalar types with the various numpy >> > precisions : >> > >> > >> > http://mail.scipy.org/pipermail/numpy-discussion/2002-September/013983.html >> > >> > Travis' recent email hints at something that seems similar, but I >> > don't understand what he means: >> > >> > >> > http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064795.html >> > >> > >> > Don't create array-scalars. Instead, make the data-type object a >> > meta-type object whose instances are the items returned from NumPy >> > arrays. There is no need for a separate array-scalar object and in >> > fact it's confusing to the type-system. I understand that now. I >> > did not understand that 5 years ago. >> > >> > >> > Travis - can you expand? >> >> Numpy has 3 partially overlapping concepts: >> >> A) scalars (what Travis calls "array scalars"): Things like "float64", >> "int32". These are ordinary Python classes; usually when you subscript >> an array, what you get back is an instance of one of these classes: >> >> In [1]: a = np.array([1, 2, 3]) >> >> In [2]: a[0] >> Out[2]: 1 >> >> In [3]: type(a[0]) >> Out[3]: numpy.int64 >> >> Note that even though they are called "array scalars", they have >> nothing to do with the actual ndarray type -- they are totally >> separate objects. >> >> B) dtypes: These are instances of class np.dtype. For every scalar >> type, there is a corresponding dtype object; plus you can create new >> dtype objects for things like record arrays (which correspond to >> scalars of type "np.void"; I don't really understand how void scalars >> work in detail): > > > While thinking about dtypes I started a post proposing that *all* arrays be > considered as special cases of void arrays. A void array is basically a > memory indexing construct combined with a view. 
> > I'd be really interested to read that, I'm sure others would too, Cheers, Matthew From josef.pktd at gmail.com Sun Jan 6 13:28:40 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 6 Jan 2013 13:28:40 -0500 Subject: [Numpy-discussion] high dimensional array -> python scalar/index In-Reply-To: <1357495041.3537.7.camel@sebastian-laptop> References: <1357495041.3537.7.camel@sebastian-laptop> Message-ID: On Sun, Jan 6, 2013 at 12:57 PM, Sebastian Berg wrote: > Question for everyone, is this really reasonable: > >>>> import numpy as np >>>> from operator import index >>>> index(np.array([[5]])) > 5 >>>> int(np.array([[5]])) > 5 >>>> [0,1,2,3][np.array([[2]])] > 2 Not sure I understand the point looks reasonable to my int has an implied squeeze, if it succeeds not so python lists >>> int([[1]]) Traceback (most recent call last): File "", line 1, in TypeError: int() argument must be a string or a number, not 'list' >>> [0,1,2,3][np.array([[2, 2], [0, 1]])] Traceback (most recent call last): File "", line 1, in TypeError: only integer arrays with one element can be converted to an index but we can to more fun things with numpy >>> np.array([0,1,2,3])[np.array([[2, 2], [0, 1]])] array([[2, 2], [0, 1]]) Josef > > To me, this does not make sense, why should we allow to use a high > dimensional object like a normal scalar (its ok for 0-d arrays I guess)? > Personally I would be for deprecating these usages, even if that > (probably) means you cannot reshape your array with a matrix (as it is > 2D) ;-): >>>> np.arange(10).reshape(np.matrix([5,-1]).T) > array([[0, 1], > [2, 3], > [4, 5], > [6, 7], > [8, 9]]) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Sun Jan 6 13:36:04 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 06 Jan 2013 19:36:04 +0100 Subject: [Numpy-discussion] Rank-0 arrays - reprise In-Reply-To: References: <50E92EC2.10404@astro.uio.no> <1357465276.12993.20.camel@sebastian-laptop> <50E95369.70602@astro.uio.no> Message-ID: <50E9C414.8010406@astro.uio.no> On 01/06/2013 05:52 PM, Nathaniel Smith wrote: > On Sun, Jan 6, 2013 at 10:35 AM, Dag Sverre Seljebotn > wrote: >> I should have been more precise: I like the proposal, but also believe >> the additional complexity introduced have significant costs that must be >> considered. >> >> a) Making += behave differently for readonly arrays should be >> carefully considered. If I have a 10 GB read-only array, I prefer an >> error to a copy for +=. (One could use an ISSCALAR flag instead that >> only affected +=...) > > Yes, definitely we would need to nail down the exact semantics here. > My feeling is that we should see start by seeing if we can come up > with a set of coherent rules for read-only arrays that does what we > want before we add an ACT_LIKE_OLD_SCALARS flag, but either way is > viable. (Or we could start with a PRETEND_TO_BE_SCALAR flag and then > gradually migrate away from it.) Sounds like a good plan. > >> b) Things seems simpler since "indexing away the last index" is no >> longer a special case, it is always true for a.ndim > 0 that "a[i]" is a >> new array such that >> >> a[i].ndim == a.ndim - 1 >> >> But in exchange, a new special-case is introduced since READONLY is only >> set when ndim becomes 0, so it doesn't really help with the learning >> curve IMO. 
> > Yes, indexing with a scalar (as opposed to slicing or fancy-indexing) > remains a special case just like now. And not just because the result > is read-only -- it also returns a copy, not a view. > > I don't think the comparison to the a[i] special-case is very useful, > really. Scalar indexing and the wacky one-dimensional indexing thing > where a[i] -> a[i, ..] (unless a is one-dimensional) would still be > different in general, even aside from the READONLY part, because the > one-dimensional indexing thing only applies to one-dimensional > indexes. For a 3-d array, > a[i, j] > gives an error; it's not the same as a[i, j, ...]. And while I > understand why numpy does what it does for len() and __getitem__(int) > on multi-dimensional arrays (it's to make multi-dimensional arrays act > more like list-of-lists), this is IMO a confusing special case that we > might be better off without, and in any case shouldn't be used as a > guide for how to make the rest of the indexing system work. Removing the single-index special case would be great. I see people doing stuff like a[i][j][k] all the time, just because that's what they tried first when they came to NumPy and then the habit sticks for years. OTOH, that means that it might have to stay for backwards compatability reasons. Dag Sverre From sebastian at sipsolutions.net Sun Jan 6 13:56:22 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 06 Jan 2013 19:56:22 +0100 Subject: [Numpy-discussion] high dimensional array -> python scalar/index In-Reply-To: References: <1357495041.3537.7.camel@sebastian-laptop> Message-ID: <1357498582.3537.18.camel@sebastian-laptop> On Sun, 2013-01-06 at 13:28 -0500, josef.pktd at gmail.com wrote: > On Sun, Jan 6, 2013 at 12:57 PM, Sebastian Berg > wrote: > > Question for everyone, is this really reasonable: > > > >>>> import numpy as np > >>>> from operator import index > >>>> index(np.array([[5]])) > > 5 > >>>> int(np.array([[5]])) > > 5 > >>>> [0,1,2,3][np.array([[2]])] > > 2 > > Not sure I understand the point > > looks reasonable to my > > int has an implied squeeze, if it succeeds > Exactly *why* should it have an implied squeeze? Note I agree, the int(np.array([3])) is OK, since also int('10') works, however for index I think it is not OK, you simply cannot do list['10']. > not so python lists > > >>> int([[1]]) > Traceback (most recent call last): > File "", line 1, in > TypeError: int() argument must be a string or a number, not 'list' Exactly, so why should numpy be much more forgiving? > > >>> [0,1,2,3][np.array([[2, 2], [0, 1]])] > Traceback (most recent call last): > File "", line 1, in > TypeError: only integer arrays with one element can be converted to an index > > > but we can to more fun things with numpy > > >>> np.array([0,1,2,3])[np.array([[2, 2], [0, 1]])] > array([[2, 2], > [0, 1]]) > Of course... But if you compare to lists, thats actually a point why index should fail: >>> np.array([0,1,2,3])[np.array([[3]])] is very different from: >>> [0,1,2,3][np.array([[3]])] and in my opinion there is no reason why the latter should not simply fail. > Josef > > > > > To me, this does not make sense, why should we allow to use a high > > dimensional object like a normal scalar (its ok for 0-d arrays I guess)? 
> > Personally I would be for deprecating these usages, even if that > > (probably) means you cannot reshape your array with a matrix (as it is > > 2D) ;-): > >>>> np.arange(10).reshape(np.matrix([5,-1]).T) > > array([[0, 1], > > [2, 3], > > [4, 5], > > [6, 7], > > [8, 9]]) > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sun Jan 6 17:36:43 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 Jan 2013 15:36:43 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 8:09 AM, Charles R Harris wrote: > > > On Sun, Jan 6, 2013 at 7:42 AM, Nathaniel Smith wrote: > >> On Sun, Jan 6, 2013 at 2:38 AM, Charles R Harris >> wrote: >> > Thoughts? >> >> To be clear, what you're talking about is basically deleting these two >> packages: >> numpy.oldnumeric >> numpy.numarray >> plus the compatibility C API in >> numpy/numarray/include >> ? >> >> > Yep. > > >> So this would only affect Python code which explicitly imported one of >> those two packages (neither is imported by default), or C code which >> did #include "numpy/numarray/..."? >> >> > Those packages were intended to be an easy path for folks to port their > numeric and numarray code to numpy. During the 2.4 discussion there was a > fellow who said his group was just now moving their code from numeric to > numpy, but I had the feeling they were rewriting it in the process. > > >> (I'm not even sure how you would build such a C module, these headers >> are distributed in a weird directory not accessible via >> np.get_include(). So unless your build system does some special work >> to access it, you can't even see these headers.) >> >> > Never tried it myself. There is some C code in those packages and it easy > to overlook its maintenance, so I'd like to solve the problem by nuking it. > > Oops. The proposal is to only remove numarray support. The functions in oldnumeric have been taken over into numpy and we need to keep them. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Sun Jan 6 17:40:28 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Sun, 6 Jan 2013 14:40:28 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers wrote: >> Which exact Python do we need to use on Mac? Do we need to use the >> binary installer from python.org? > > Yes, the one from python.org. > >> Or can I install it from source? you could install from source using the same method that the python.org binaries are built -- there is a script with the source to do that, though I'm not sure what the point of that would be. > The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X 10.5. It would be great to continue support for that, though I wonder how many people still need it -- I don't think Apple supports 10.5 anymore, for instance. > The 10.7 --> 10.6 support hasn't been checked, but I wouldn't trust it. I > have a 10.6 machine, so I can compile those binaries if needed. 
That would be better, but it would also be nice to check how building on 10.7 works. > Avoid using system Python for anything. The first thing to do on any new OS > X system is install Python some other way, preferably from python.org. +1 > Last note: bdist_mpkg is unmaintained and doesn't support Python 3.x. Most > recent version is at: https://github.com/matthew-brett/bdist_mpkg, for > previous versions numpy releases I've used that at commit e81a58a471 There has been recent discussion on the pythonmac list about this -- some waffling about how important it is -- though I think it would be good to keep it up to date. > If we want 3.x binaries, then we should fix that or (preferably) build > binaries with Bento. Bento has grown support for mpkg's; I'm not sure how > robust that is. So maybe bento is a better route than bdist_mpkg -- this is worth discussion on teh pythonmac list. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From njs at pobox.com Sun Jan 6 19:07:21 2013 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 7 Jan 2013 00:07:21 +0000 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 10:36 PM, Charles R Harris wrote: > > > On Sun, Jan 6, 2013 at 8:09 AM, Charles R Harris > wrote: >> >> >> >> On Sun, Jan 6, 2013 at 7:42 AM, Nathaniel Smith wrote: >>> >>> On Sun, Jan 6, 2013 at 2:38 AM, Charles R Harris >>> wrote: >>> > Thoughts? >>> >>> To be clear, what you're talking about is basically deleting these two >>> packages: >>> numpy.oldnumeric >>> numpy.numarray >>> plus the compatibility C API in >>> numpy/numarray/include >>> ? >>> >> >> Yep. >> >>> >>> So this would only affect Python code which explicitly imported one of >>> those two packages (neither is imported by default), or C code which >>> did #include "numpy/numarray/..."? >>> >> >> Those packages were intended to be an easy path for folks to port their >> numeric and numarray code to numpy. During the 2.4 discussion there was a >> fellow who said his group was just now moving their code from numeric to >> numpy, but I had the feeling they were rewriting it in the process. >> >>> >>> (I'm not even sure how you would build such a C module, these headers >>> are distributed in a weird directory not accessible via >>> np.get_include(). So unless your build system does some special work >>> to access it, you can't even see these headers.) >>> >> >> Never tried it myself. There is some C code in those packages and it easy >> to overlook its maintenance, so I'd like to solve the problem by nuking it. >> > > Oops. The proposal is to only remove numarray support. The functions in > oldnumeric have been taken over into numpy and we need to keep them. ...huh? The package name is mentioned nowhere in the numpy sources... 
~/src/numpy/numpy$ find -type f | grep -Ev '^\./(numarray|oldnumeric)/' | xargs grep oldnumeric ./setupscons.py: config.add_subpackage('oldnumeric') ./bento.info: oldnumeric, ./core/setup.py: join('include', 'numpy', 'oldnumeric.h'), ./setup.py: config.add_subpackage('oldnumeric') ...and it's not even available unless a user explicitly does 'import numpy.oldnumeric' or 'import numpy.numarray', so no-one's using this stuff without knowing it: In [2]: import numpy In [3]: [m for m in sys.modules if m.startswith("numpy.oldnumeric")] Out[3]: [] -n From charlesr.harris at gmail.com Sun Jan 6 19:42:30 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 Jan 2013 17:42:30 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 5:07 PM, Nathaniel Smith wrote: > On Sun, Jan 6, 2013 at 10:36 PM, Charles R Harris > wrote: > > > > > > On Sun, Jan 6, 2013 at 8:09 AM, Charles R Harris < > charlesr.harris at gmail.com> > > wrote: > >> > >> > >> > >> On Sun, Jan 6, 2013 at 7:42 AM, Nathaniel Smith wrote: > >>> > >>> On Sun, Jan 6, 2013 at 2:38 AM, Charles R Harris > >>> wrote: > >>> > Thoughts? > >>> > >>> To be clear, what you're talking about is basically deleting these two > >>> packages: > >>> numpy.oldnumeric > >>> numpy.numarray > >>> plus the compatibility C API in > >>> numpy/numarray/include > >>> ? > >>> > >> > >> Yep. > >> > >>> > >>> So this would only affect Python code which explicitly imported one of > >>> those two packages (neither is imported by default), or C code which > >>> did #include "numpy/numarray/..."? > >>> > >> > >> Those packages were intended to be an easy path for folks to port their > >> numeric and numarray code to numpy. During the 2.4 discussion there was > a > >> fellow who said his group was just now moving their code from numeric to > >> numpy, but I had the feeling they were rewriting it in the process. > >> > >>> > >>> (I'm not even sure how you would build such a C module, these headers > >>> are distributed in a weird directory not accessible via > >>> np.get_include(). So unless your build system does some special work > >>> to access it, you can't even see these headers.) > >>> > >> > >> Never tried it myself. There is some C code in those packages and it > easy > >> to overlook its maintenance, so I'd like to solve the problem by nuking > it. > >> > > > > Oops. The proposal is to only remove numarray support. The functions in > > oldnumeric have been taken over into numpy and we need to keep them. > > ...huh? The package name is mentioned nowhere in the numpy sources... > > ~/src/numpy/numpy$ find -type f | grep -Ev > '^\./(numarray|oldnumeric)/' | xargs grep oldnumeric > ./setupscons.py: config.add_subpackage('oldnumeric') > ./bento.info: oldnumeric, > ./core/setup.py: join('include', 'numpy', 'oldnumeric.h'), > ./setup.py: config.add_subpackage('oldnumeric') > > ...and it's not even available unless a user explicitly does 'import > numpy.oldnumeric' or 'import numpy.numarray', so no-one's using this > stuff without knowing it: > > In [2]: import numpy > > In [3]: [m for m in sys.modules if m.startswith("numpy.oldnumeric")] > Out[3]: [] > > Right. I mistakenly looked at numpy/core/fromoldnumeric.py. So yes, the oldnumeric directory and include also. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shish at keba.be Sun Jan 6 20:43:42 2013 From: shish at keba.be (Olivier Delalleau) Date: Sun, 6 Jan 2013 20:43:42 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: 2013/1/5 Nathaniel Smith : > On Fri, Jan 4, 2013 at 5:25 PM, Andrew Collette > wrote: >> I agree the current behavior is confusing. Regardless of the details >> of what to do, I suppose my main objection is that, to me, it's really >> unexpected that adding a number to an array could result in an >> exception. > > I think the main objection to the 1.5 behaviour was that it violated > "Errors should never pass silently." (from 'import this'). Granted > there are tons of places where numpy violates this but this is the one > we're thinking about right now... > > Okay, here's another idea I'll throw out, maybe it's a good compromise: > > 1) We go back to the 1.5 behaviour. > > 2) If this produces a rollover/overflow/etc., we signal that using the > standard mechanisms (whatever is configured via np.seterr). So by > default things like > np.maximum(np.array([1, 2, 3], dtype=uint8), 256) > would succeed (and produce [1, 2, 3] with dtype uint8), but also issue > a warning that 256 had rolled over to become 0. Alternatively those > who want to be paranoid could call np.seterr(overflow="raise") and > then it would be an error. That'd work for me as well. Although I'm not sure about the name "overflow", it sounds generic enough that it may be associated to many different situations. If I want to have an error but only for this very specific scenario (an "unsafe" cast in a mixed scalar/array operation), would that be possible? Also, do we all agree that "float32 array + float64 scalar" should cast the scalar to float32 (thus resulting in a float32 array as output) without warning, even if the scalar can't be represented exactly in float32? -=- Olivier From njs at pobox.com Sun Jan 6 21:01:07 2013 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 7 Jan 2013 02:01:07 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Mon, Jan 7, 2013 at 1:43 AM, Olivier Delalleau wrote: > 2013/1/5 Nathaniel Smith : >> On Fri, Jan 4, 2013 at 5:25 PM, Andrew Collette >> wrote: >>> I agree the current behavior is confusing. Regardless of the details >>> of what to do, I suppose my main objection is that, to me, it's really >>> unexpected that adding a number to an array could result in an >>> exception. >> >> I think the main objection to the 1.5 behaviour was that it violated >> "Errors should never pass silently." (from 'import this'). Granted >> there are tons of places where numpy violates this but this is the one >> we're thinking about right now... >> >> Okay, here's another idea I'll throw out, maybe it's a good compromise: >> >> 1) We go back to the 1.5 behaviour. >> >> 2) If this produces a rollover/overflow/etc., we signal that using the >> standard mechanisms (whatever is configured via np.seterr). So by >> default things like >> np.maximum(np.array([1, 2, 3], dtype=uint8), 256) >> would succeed (and produce [1, 2, 3] with dtype uint8), but also issue >> a warning that 256 had rolled over to become 0. Alternatively those >> who want to be paranoid could call np.seterr(overflow="raise") and >> then it would be an error. > > That'd work for me as well. 
Although I'm not sure about the name > "overflow", it sounds generic enough that it may be associated to many > different situations. If I want to have an error but only for this > very specific scenario (an "unsafe" cast in a mixed scalar/array > operation), would that be possible? I suggested "overflow" because that's how we signal rollover in general right now: In [5]: np.int8(100) * np.int8(2) /home/njs/.user-python2.7-64bit/bin/ipython:1: RuntimeWarning: overflow encountered in byte_scalars #!/home/njs/.user-python2.7-64bit/bin/python Out[5]: -56 Two caveats on this: One, right now this is only implemented for scalars, not arrays -- which is bug #593 -- and two, I actually agree (?) that integer rollover and float overflow are different things we should probably add a new category to np.seterr() for integer rollover specifically. But the proposal here is that we not add a specific category for "unsafe cast" (which we would then have to define!), but instead just signal it using the standard mechanisms for the particular kind of corruption that happened. (Which right now is overflow, and might become something else later.) > Also, do we all agree that "float32 array + float64 scalar" should > cast the scalar to float32 (thus resulting in a float32 array as > output) without warning, even if the scalar can't be represented > exactly in float32? I guess for consistency, if this proposal is adopted then a float64 which ends up getting cast to 'inf' or 0.0 should trigger an overflow or underflow warning respectively... e.g.: In [12]: np.float64(1e300) Out[12]: 1.0000000000000001e+300 In [13]: np.float32(_12) Out[13]: inf ...but otherwise I think yes we agree. -n From shish at keba.be Sun Jan 6 21:17:14 2013 From: shish at keba.be (Olivier Delalleau) Date: Sun, 6 Jan 2013 21:17:14 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: 2013/1/6 Nathaniel Smith : > On Mon, Jan 7, 2013 at 1:43 AM, Olivier Delalleau wrote: >> 2013/1/5 Nathaniel Smith : >>> On Fri, Jan 4, 2013 at 5:25 PM, Andrew Collette >>> wrote: >>>> I agree the current behavior is confusing. Regardless of the details >>>> of what to do, I suppose my main objection is that, to me, it's really >>>> unexpected that adding a number to an array could result in an >>>> exception. >>> >>> I think the main objection to the 1.5 behaviour was that it violated >>> "Errors should never pass silently." (from 'import this'). Granted >>> there are tons of places where numpy violates this but this is the one >>> we're thinking about right now... >>> >>> Okay, here's another idea I'll throw out, maybe it's a good compromise: >>> >>> 1) We go back to the 1.5 behaviour. >>> >>> 2) If this produces a rollover/overflow/etc., we signal that using the >>> standard mechanisms (whatever is configured via np.seterr). So by >>> default things like >>> np.maximum(np.array([1, 2, 3], dtype=uint8), 256) >>> would succeed (and produce [1, 2, 3] with dtype uint8), but also issue >>> a warning that 256 had rolled over to become 0. Alternatively those >>> who want to be paranoid could call np.seterr(overflow="raise") and >>> then it would be an error. >> >> That'd work for me as well. Although I'm not sure about the name >> "overflow", it sounds generic enough that it may be associated to many >> different situations. 
If I want to have an error but only for this >> very specific scenario (an "unsafe" cast in a mixed scalar/array >> operation), would that be possible? > > I suggested "overflow" because that's how we signal rollover in > general right now: > > In [5]: np.int8(100) * np.int8(2) > /home/njs/.user-python2.7-64bit/bin/ipython:1: RuntimeWarning: > overflow encountered in byte_scalars > #!/home/njs/.user-python2.7-64bit/bin/python > Out[5]: -56 > > Two caveats on this: One, right now this is only implemented for > scalars, not arrays -- which is bug #593 -- and two, I actually agree > (?) that integer rollover and float overflow are different things we > should probably add a new category to np.seterr() for integer rollover > specifically. > > But the proposal here is that we not add a specific category for > "unsafe cast" (which we would then have to define!), but instead just > signal it using the standard mechanisms for the particular kind of > corruption that happened. (Which right now is overflow, and might > become something else later.) Hehe, I didn't even know there was supposed to be a warning for arrays... Ok. But I'm not convinced that re-using the "overflow" category is a good idea, because to me the overflow is typically associated to the result of an operation (when it goes beyond the dtype's supported range), while here the problem is with the unsafe cast an input (even if it makes no difference for addition, it does for some other ufuncs). I may also want to have different error settings for operation overflow vs. input overflow. It may just be me though... let's see what others think about it. > >> Also, do we all agree that "float32 array + float64 scalar" should >> cast the scalar to float32 (thus resulting in a float32 array as >> output) without warning, even if the scalar can't be represented >> exactly in float32? > > I guess for consistency, if this proposal is adopted then a float64 > which ends up getting cast to 'inf' or 0.0 should trigger an overflow > or underflow warning respectively... e.g.: > > In [12]: np.float64(1e300) > Out[12]: 1.0000000000000001e+300 > > In [13]: np.float32(_12) > Out[13]: inf > > ...but otherwise I think yes we agree. Sounds good to me. -=- Olivier From raul at virtualmaterials.com Sun Jan 6 23:15:24 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Sun, 06 Jan 2013 21:15:24 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: <50EA4BDC.2060108@virtualmaterials.com> An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jan 7 06:37:09 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 7 Jan 2013 11:37:09 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Fri, Jan 4, 2013 at 5:25 PM, Andrew Collette wrote: > I agree the current behavior is confusing. Regardless of the details > of what to do, I suppose my main objection is that, to me, it's really > unexpected that adding a number to an array could result in an > exception. I realized when I thought about it, that I did not have a clear idea of your exact use case. How does the user specify the thing to add, and why do you need to avoid an error in the case that adding would overflow the type? Would you mind giving an idiot-level explanation? 
Best, Matthew From matthew.brett at gmail.com Mon Jan 7 08:31:59 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 7 Jan 2013 13:31:59 +0000 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: Hi, On Sun, Jan 6, 2013 at 10:40 PM, Chris Barker - NOAA Federal wrote: > On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers wrote: >>> Which exact Python do we need to use on Mac? Do we need to use the >>> binary installer from python.org? >> >> Yes, the one from python.org. >> >>> Or can I install it from source? > > you could install from source using the same method that the > python.org binaries are built -- there is a script with the source to > do that, though I'm not sure what the point of that would be. > >> The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X 10.5. > > It would be great to continue support for that, though I wonder how > many people still need it -- I don't think Apple supports 10.5 > anymore, for instance. > >> The 10.7 --> 10.6 support hasn't been checked, but I wouldn't trust it. I >> have a 10.6 machine, so I can compile those binaries if needed. > > That would be better, but it would also be nice to check how building > on 10.7 works. > >> Avoid using system Python for anything. The first thing to do on any new OS >> X system is install Python some other way, preferably from python.org. > > +1 > >> Last note: bdist_mpkg is unmaintained and doesn't support Python 3.x. Most >> recent version is at: https://github.com/matthew-brett/bdist_mpkg, for >> previous versions numpy releases I've used that at commit e81a58a471 > > There has been recent discussion on the pythonmac list about this -- > some waffling about how important it is -- though I think it would be > good to keep it up to date. I updated my fork of bdist_mpkg with Python 3k support. It doesn't have any tests that I could see, but I've run it on python 2.6 and 3.2 and 3.3 on one of my packages as a first pass. >> If we want 3.x binaries, then we should fix that or (preferably) build >> binaries with Bento. Bento has grown support for mpkg's; I'm not sure how >> robust that is. > > So maybe bento is a better route than bdist_mpkg -- this is worth > discussion on teh pythonmac list. David - can you give a status update on that? Cheers, Matthew From charlesr.harris at gmail.com Mon Jan 7 11:22:51 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 Jan 2013 09:22:51 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: <50EA4BDC.2060108@virtualmaterials.com> References: <50EA4BDC.2060108@virtualmaterials.com> Message-ID: On Sun, Jan 6, 2013 at 9:15 PM, Raul Cota wrote: > I realize we may be a minority but it would be very nice if support for > numeric could be kept for a few more versions. We don't have any particular > needs for numarray. > > We just under went through an extremely long and painful process to > upgrade our software from Numeric to numpy and everything hinges on the > "oldnumeric" stuff. This was the classical 80-20 scenario where we got most > of the stuff to work in a just a few days and then we had to revisit > several areas of our software to iron out all the bugs and subtle but > meaningful differences between numpy and Numeric. The last round of work > was related to speed. We still have not released the upgrade to our > costumers therefore we expect still a few more bugs to surface. 
> > Bottom line, we are still about one or two years away from changing all > our imports to numpy. Yes, I know, we move fairly slowly but that is our > reality. > > Good to know. Have you tested oldnumeric in the upcoming 1.7 release? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.collette at gmail.com Mon Jan 7 11:33:48 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 7 Jan 2013 09:33:48 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Matthew, > I realized when I thought about it, that I did not have a clear idea > of your exact use case. How does the user specify the thing to add, > and why do you need to avoid an error in the case that adding would > overflow the type? Would you mind giving an idiot-level explanation? There isn't a specific use case I had in mind... from a developer's perspective, what bothers me about the proposed behavior is that every use of "+" on user-generated input becomes a time bomb. Since h5py deals with user-generated files, I have to deal with all kinds of dtypes, including low-precision ones like int8/uint8. They come from user-supplied function and methods arguments, sure, but also from datasets in files; attributes; virtually everywhere. I suppose what I'm really asking is that numpy provides (continues to provide) a default rule in this situation, as does every other scientific language I've used. One reason to avoid a ValueError in favor of default behavior (in addition to the large amount of work required to check every use of "+") is so there's an established behavior users know to expect. For example, one feature we're thinking of implementing involves adding an offset to a dataset when it's read. Should we roll over? Upcast? It seems to me there's great value in being able to say "We do what numpy does." If numpy doesn't answer the question, everybody makes up their own rules. There are certainly cases where the answer is obvious to the application: you have a huge number of int8's and don't want to upcast. Or you don't want to lose precision. But if numpy provides a default rule, nobody is prevented from making careful choices based on their application's requirements, and there's the additional value of having an common, documented default behavior. Andrew From andrew.collette at gmail.com Mon Jan 7 11:38:44 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 7 Jan 2013 09:38:44 -0700 Subject: [Numpy-discussion] ANN: HDF5 for Python (h5py) 2.1.1 Message-ID: Announcing HDF5 for Python (h5py) 2.1.1 ======================================= HDF5 for Python 2.1.1 is now available! This bugfix release also marks a number of changes for the h5py project intended to make the development process more responsive, including a move to GitHub and a switch to a rapid release model. Development has moved over to GitHub at http://github.com/h5py/h5py. We welcome bug reports and pull requests from anyone interested in contributing. Releases will now be made every 4-6 weeks, in order to get bugfixes and new features out to users quickly while still leaving time for testing. * New main website: http://www.h5py.org * Mailing list: http://groups.google.com/group/h5py What is h5py? ============= The h5py package is a Pythonic interface to the HDF5 binary data format. It lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. 
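To give a flavour of the API, a minimal session looks roughly like this (the file and dataset names here are just examples):

>>> import numpy as np
>>> import h5py
>>> f = h5py.File('example.hdf5', 'w')
>>> dset = f.create_dataset('mydata', data=np.arange(100))
>>> dset[10:20]
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> dset.shape
(100,)
>>> f.close()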
For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized and tagged however you want. H5py uses straightforward NumPy and Python metaphors, like dictionary and NumPy array syntax. For example, you can iterate over datasets in a file, or check out the .shape or .dtype attributes of datasets. You don't need to know anything special about HDF5 to get started. In addition to the easy-to-use high level interface, h5py rests on a object-oriented Cython wrapping of the HDF5 C API. Almost anything you can do from C in HDF5, you can do from h5py. Best of all, the files you create are in a widely-used standard binary format, which you can exchange with other people, including those who use programs like IDL and MATLAB. What's new in 2.1.1? ==================== This is a bugfix release. The most substantial changes were: * Fixed a memory leak related to variable-length strings (Thanks to Luke Campbell for extensive testing and bug reports) * Fixed a threading deadlock related to the use of H5Aiterate * Fixed a double INCREF memory leak affecting Unicode variable-length strings * Fixed an exception when taking the repr() of objects with non-ASCII names From raul at virtualmaterials.com Mon Jan 7 11:47:26 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Mon, 07 Jan 2013 09:47:26 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: <50EA4BDC.2060108@virtualmaterials.com> Message-ID: <50EAFC1E.1010608@virtualmaterials.com> An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Jan 7 12:48:24 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 7 Jan 2013 09:48:24 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Mon, Jan 7, 2013 at 5:31 AM, Matthew Brett wrote: > I updated my fork of bdist_mpkg with Python 3k support. It doesn't > have any tests that I could see, but I've run it on python 2.6 and 3.2 > and 3.3 on one of my packages as a first pass. Have you been in communication with Ronald Oussoren about this? I'm sure he'd be interested in bringing into the "official"repository. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From chris.barker at noaa.gov Mon Jan 7 13:08:03 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 7 Jan 2013 10:08:03 -0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: Message-ID: On Thu, Jan 3, 2013 at 10:29 PM, Mike Anderson wrote: > In the Clojure community there has been some discussion about creating a > common matrix maths library / API. Currently there are a few different > fledgeling matrix libraries in Clojure, so it seemed like a worthwhile > effort to unify them and have a common base on which to build on. > > NumPy has been something of an inspiration for this, so I though I'd ask > here to see what lessons have been learned. A few thoughts: > We're thinking of a matrix library First -- is this a "matrix" library, or a general use nd-array library? That will drive your design a great deal. For my part, I came from MATLAB, which started our very focused on matrixes, then extended to be more generally useful. 
Personally, I found the matrix-focus to get in the way more than help -- in any "real" code, the actual matrix operations are likely to be a tiny fraction of the code. One reason I like numpy is that it is array-first, with secondary support for matrix stuff. That being said, there is the numpy matrix type, and there are those that find it very useful, particularly in teaching situations, though it feels a bit "tacked-on", and that does get in the way, so if you want a "real" matrix object, but also a general purpose array lib, thinking about both up front will be helpful. > - Support for multi-dimensional matrices (but with fast paths for 1D vectors > and 2D matrices as the common cases) what is a multi-dimensional matrix? -- is a 3-d something, a stack of matrixes? or something else? (note, numpy lacks this kind of object, but it is sometimes asked for -- i.e. a way to do fast matrix multiplication with a lot of small matrixes) I think fast paths for 1-D and 2-D are secondary, though you may want "easy paths" for those. In particular, if you want good support for linear algebra (matrixes), then having a clean and natural "row vector" and "column vector" would be nice. See the archives of this list for a bunch of discussion about that -- and what the weaknesses are of the numpy matrix object. > - Immutability by default, i.e. matrix operations are pure functions that > create new matrices. I'd be careful about this -- the purity and predictability is nice, but these days a lot of time is spent allocating and moving memory around -- numpy array's mutability is a key feature -- indeed, the key issues with performance with numpy surround the fact that many copies may be made unnecessarily (note, Dag's suggestion of lazy evaluation may mitigate this to some extent). > - Support for 64-bit double precision floats only (this is the standard > float type in Clojure) not a bad start, but another major strength of numpy is the multiple data types - you may want to design that concept in from the start. > - Ability to support multiple different back-end matrix implementations > (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.) This ties in to another major strength of numpy -- ndarrays are both powerful python objects, and wrappers around standard C arrays -- that makes it pretty darn easy to interface with external libraries for core computation. HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From rhl at astro.princeton.edu Mon Jan 7 13:19:00 2013 From: rhl at astro.princeton.edu (Robert Lupton the Good) Date: Mon, 7 Jan 2013 13:19:00 -0500 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: <35A75EE5-AE01-4144-B7CB-54D492077E21@astro.princeton.edu> I am sympathetic with this attitude ("Avoid using system Python for anything"), but I don't think it's the right one. For example, the project I'm working on (HSC/LSST for astrofolk) is using python/C++ for astronomical imaging, and we expect to have the code running on a significant number of end-user laptops. If the instructions start out with "0. Install a new version of python", it's a significant barrier. What if they've already installed other packages into the system python? Python is a central part of modern operating systems, and people should not have to manage two versions of python to use numpy.
It's tempting to say, "First install g++ 4.7 so we can use C++11 features" it's simply not viable, and I think that saying, "first install a new python" is comparable. Yes, I know that you can have more than one python/compiler suite installed simultaneously, but that's not something for casual users to have to get involved in. R > On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers wrote: >>> Which exact Python do we need to use on Mac? Do we need to use the >>> binary installer from python.org? ... > >> Avoid using system Python for anything. The first thing to do on any new OS >> X system is install Python some other way, preferably from python.org. > > +1 From chris.barker at noaa.gov Mon Jan 7 13:35:28 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 7 Jan 2013 10:35:28 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: <35A75EE5-AE01-4144-B7CB-54D492077E21@astro.princeton.edu> References: <35A75EE5-AE01-4144-B7CB-54D492077E21@astro.princeton.edu> Message-ID: On Mon, Jan 7, 2013 at 10:19 AM, Robert Lupton the Good wrote: > I am sympathetic with this attitude ("Avoid using system Python for anything"), but I don't think it's the right one. For example, the project I'm working on (HSC/LSST for astrofolk) is using python/C++ for astronomical imaging, and we expect to have the code running on a significant number of end-user laptops. If the instructions start out with: > 0. Install a new version of python > it's a significant barrier. What if they've already involved other packages into the system python? What if they've already installed other packages into the python.org python? or fink? macports? or homebrew? or build-your own? Unfortunately, python on the Mac is a bit of a mess--there are WAY too many ways to get Python working. It's great that Apple provides python as part of the system, but unfortunately: * Apple has NEVER upgraded python within a OS-X version. * Apple includes proprietary code with their build, so you are not allowed to re-distribute it (i.e. py2app). As a result, the MacPython community can not declare that Apple's build is the primary one we want to support. This has been hashed out a bunch on the PythonMac list, and there was more or less consensus that the python.org builds would be the ones that we as a community try to support with binaries, etc. All that being said, for the most part, the Apple builds and python.org builds are compatible. Robin Dunn has worked out a way to build installers for wxPython that work with both the python.org and Apple builds -- putting everything in /usr/local, and *.pth files in both of the python builds -- it's a hack, but it works, that may be an approach worth taking. It also wouldn't be that hard to build a duplicate set of binaries, but it does get to be a lot for users to figure out what they need. > Yes, I know that you can have more than one python/compiler suite installed simultaneously, but that's not something for casual users to have to get involved in. Installing a binary from python.org is not much of a challenge for anyone that is installing anything, actually. It's true that it's a pain for users to get a system all set up, then find out that to use the numpy binaries (or anything else...) they need to start over with a new python -- that's why we in the MacPython community encourage everyone to build binaries for the python.org builds -- standards are good, but standardizing on the Apple builds isn't viable. 
(NOTE: the "decision" was made a few years back -- it may be worth re-visiting, but I'm pretty sure that Apple's build is still not suitable as the default choice) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From matthew.brett at gmail.com Mon Jan 7 15:01:06 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 7 Jan 2013 20:01:06 +0000 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: Hi, On Mon, Jan 7, 2013 at 5:48 PM, Chris Barker - NOAA Federal wrote: > On Mon, Jan 7, 2013 at 5:31 AM, Matthew Brett wrote: > >> I updated my fork of bdist_mpkg with Python 3k support. It doesn't >> have any tests that I could see, but I've run it on python 2.6 and 3.2 >> and 3.3 on one of my packages as a first pass. > > Have you been in communication with Ronald Oussoren about this? I'm > sure he'd be interested in bringing into the "official"repository. I just emailed him, thanks for the suggestion. Best, Matthew From matthew.brett at gmail.com Mon Jan 7 15:12:51 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 7 Jan 2013 20:12:51 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Mon, Jan 7, 2013 at 4:33 PM, Andrew Collette wrote: > Hi Matthew, > >> I realized when I thought about it, that I did not have a clear idea >> of your exact use case. How does the user specify the thing to add, >> and why do you need to avoid an error in the case that adding would >> overflow the type? Would you mind giving an idiot-level explanation? > > There isn't a specific use case I had in mind... from a developer's > perspective, what bothers me about the proposed behavior is that every > use of "+" on user-generated input becomes a time bomb. Since h5py > deals with user-generated files, I have to deal with all kinds of > dtypes, including low-precision ones like int8/uint8. They come from > user-supplied function and methods arguments, sure, but also from > datasets in files; attributes; virtually everywhere. > > I suppose what I'm really asking is that numpy provides (continues to > provide) a default rule in this situation, as does every other > scientific language I've used. One reason to avoid a ValueError in > favor of default behavior (in addition to the large amount of work > required to check every use of "+") is so there's an established > behavior users know to expect. > > For example, one feature we're thinking of implementing involves > adding an offset to a dataset when it's read. Should we roll over? > Upcast? It seems to me there's great value in being able to say "We > do what numpy does." If numpy doesn't answer the question, everybody > makes up their own rules. There are certainly cases where the answer > is obvious to the application: you have a huge number of int8's and > don't want to upcast. Or you don't want to lose precision. But if > numpy provides a default rule, nobody is prevented from making careful > choices based on their application's requirements, and there's the > additional value of having an common, documented default behavior. Just to be clear, you mean you might have something like this? 
def my_func('array_name', some_offset): arr = load_somehow('array_name') # dtype hitherto unknown return arr + some_offset ? And the problem is that it fails late? Is it really better that something bad happens for the addition than that it raises an error? You'll also often get an error when trying to add structured dtypes, but maybe you cant return these from a 'load'? Best, Matthew From njs at pobox.com Mon Jan 7 15:16:03 2013 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 7 Jan 2013 20:16:03 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Mon, Jan 7, 2013 at 2:17 AM, Olivier Delalleau wrote: > Hehe, I didn't even know there was supposed to be a warning for arrays... Ok. > > But I'm not convinced that re-using the "overflow" category is a good > idea, because to me the overflow is typically associated to the result > of an operation (when it goes beyond the dtype's supported range), > while here the problem is with the unsafe cast an input (even if it > makes no difference for addition, it does for some other ufuncs). Right, there are two operations: casting the inputs to a common type, and then performing the addition. It's the first operation that rolls over and would trigger a warning/error/whatever, not the second. -n From andrew.collette at gmail.com Mon Jan 7 15:50:12 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 7 Jan 2013 13:50:12 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Matthew, > Just to be clear, you mean you might have something like this? > > def my_func('array_name', some_offset): > arr = load_somehow('array_name') # dtype hitherto unknown > return arr + some_offset > > ? And the problem is that it fails late? Is it really better that > something bad happens for the addition than that it raises an error? > > You'll also often get an error when trying to add structured dtypes, > but maybe you cant return these from a 'load'? In this specific case I would like to just use "+" and say "We add your offset using the NumPy rules," which is a problem if there are no NumPy rules for addition in the specific case where some_offset happens to be a scalar and not an array, and also slightly larger than arr.dtype can hold. I personally prefer upcasting to some reasonable type big enough to hold some_offset, as I described earlier, although that's not crucial. But I think we're getting a little caught up in the details of this example. My basic point is: yes, people should be careful to check dtypes, etc. where it's important to their application; but people who want to rely on some reasonable NumPy-supplied default behavior should be excused from doing so. Andrew From matthew.brett at gmail.com Mon Jan 7 15:55:25 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 7 Jan 2013 20:55:25 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Mon, Jan 7, 2013 at 8:50 PM, Andrew Collette wrote: > Hi Matthew, > >> Just to be clear, you mean you might have something like this? >> >> def my_func('array_name', some_offset): >> arr = load_somehow('array_name') # dtype hitherto unknown >> return arr + some_offset >> >> ? And the problem is that it fails late? 
Is it really better that >> something bad happens for the addition than that it raises an error? >> >> You'll also often get an error when trying to add structured dtypes, >> but maybe you cant return these from a 'load'? > > In this specific case I would like to just use "+" and say "We add > your offset using the NumPy rules," which is a problem if there are no > NumPy rules for addition in the specific case where some_offset > happens to be a scalar and not an array, and also slightly larger than > arr.dtype can hold. I personally prefer upcasting to some reasonable > type big enough to hold some_offset, as I described earlier, although > that's not crucial. > > But I think we're getting a little caught up in the details of this > example. My basic point is: yes, people should be careful to check > dtypes, etc. where it's important to their application; but people who > want to rely on some reasonable NumPy-supplied default behavior should > be excused from doing so. For myself, I find detailed examples helpful, because I find it difficult to think about more general rules without applying them to practical cases. In this case I think you'd probably agree it would be reasonable to raise an error - all other things being equal? Can you think of another practical case where it would be reasonably clear that it was the wrong thing to do? Cheers, Matthew From d.s.seljebotn at astro.uio.no Mon Jan 7 16:17:45 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 07 Jan 2013 22:17:45 +0100 Subject: [Numpy-discussion] =?utf-8?q?Do_we_want_scalar_casting_to_behave_?= =?utf-8?q?as_it_does_at_the_moment=3F?= In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On 2013-01-07 21:50, Andrew Collette wrote: > Hi Matthew, > >> Just to be clear, you mean you might have something like this? >> >> def my_func('array_name', some_offset): >> arr = load_somehow('array_name') # dtype hitherto unknown >> return arr + some_offset >> >> ? And the problem is that it fails late? Is it really better that >> something bad happens for the addition than that it raises an error? >> >> You'll also often get an error when trying to add structured dtypes, >> but maybe you cant return these from a 'load'? > > In this specific case I would like to just use "+" and say "We add > your offset using the NumPy rules," which is a problem if there are > no > NumPy rules for addition in the specific case where some_offset > happens to be a scalar and not an array, and also slightly larger > than > arr.dtype can hold. I personally prefer upcasting to some reasonable > type big enough to hold some_offset, as I described earlier, although > that's not crucial. > > But I think we're getting a little caught up in the details of this > example. My basic point is: yes, people should be careful to check > dtypes, etc. where it's important to their application; but people > who > want to rely on some reasonable NumPy-supplied default behavior > should > be excused from doing so. But the default float dtype is double, and default integer dtype is at least int32. So if you rely on NumPy-supplied default behaviour you are fine! If you specify a smaller dtype for your arrays, you have some reason to do that. If you had enough memory to not worry about automatic conversion from int8 to int16, you would have specified it as int16 in the first place when you created the array. 
Dag Sverre From andrew.collette at gmail.com Mon Jan 7 16:18:52 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 7 Jan 2013 14:18:52 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Matthew, > In this case I think you'd probably agree it would be reasonable to > raise an error - all other things being equal? No, I don't agree. I want there to be some default semantics I can rely on. Preferably, I want it to do the same thing it would do if some_offset were an array with element-by-element offsets, which is the current behavior of numpy 1.6 if you assume a reasonable dtype for some_offset. > Can you think of another practical case where it would be reasonably > clear that it was the wrong thing to do? I consider "myarray + constant -> Error" clearly wrong no matter what the context. I've never seen it in any other analysis language I've used. But it's also possible that I'm alone in this... I haven't seen many other people here arguing against the change. Andrew From andrew.collette at gmail.com Mon Jan 7 16:24:15 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 7 Jan 2013 14:24:15 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Dag, > But the default float dtype is double, and default integer dtype is at > least > int32. > > So if you rely on NumPy-supplied default behaviour you are fine! As I mentioned, this caught my interest because people routinely save data in HDF5 as int8 or int16 to save disk space. It's not at all unusual to end up with these precisions when you read from a file. > If you specify a smaller dtype for your arrays, you have some reason to do that. In this case, the reason is that the person who gave me the file chose to store the data as e.g. int16. Good default semantics for things like addition make it easy to write generic code. Andrew From matthew.brett at gmail.com Mon Jan 7 16:31:43 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 7 Jan 2013 21:31:43 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Mon, Jan 7, 2013 at 9:18 PM, Andrew Collette wrote: > Hi Matthew, > >> In this case I think you'd probably agree it would be reasonable to >> raise an error - all other things being equal? > > No, I don't agree. I want there to be some default semantics I can > rely on. Preferably, I want it to do the same thing it would do if > some_offset were an array with element-by-element offsets, which is > the current behavior of numpy 1.6 if you assume a reasonable dtype for > some_offset. Ah - well - I only meant that raising an error in the example would be no more surprising than raising an error at the python prompt. Do you agree with that? I mean, if the user knew that: >>> np.array([1], dtype=np.int8) + 128 would raise an error, they'd probably expect your offset routine to do the same. >> Can you think of another practical case where it would be reasonably >> clear that it was the wrong thing to do? > > I consider "myarray + constant -> Error" clearly wrong no matter what > the context. I've never seen it in any other analysis language I've > used. But it's also possible that I'm alone in this... 
I haven't seen > many other people here arguing against the change. I agree it kind of feels funny, but that's why I wanted to ask you for some silly but specific example where the funniness would be more apparent. Cheers, Matthew From andrew.collette at gmail.com Mon Jan 7 17:03:49 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 7 Jan 2013 15:03:49 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Matthew, > Ah - well - I only meant that raising an error in the example would be > no more surprising than raising an error at the python prompt. Do you > agree with that? I mean, if the user knew that: > >>>> np.array([1], dtype=np.int8) + 128 > > would raise an error, they'd probably expect your offset routine to do the same. I think they would be surprised in both cases, considering this works fine: np.array([1], dtype=np.int8) + np.array([128]) > I agree it kind of feels funny, but that's why I wanted to ask you for > some silly but specific example where the funniness would be more > apparent. Here are a couple of examples I slapped together, specifically highlighting the value of the present (or similar) upcasting behavior. Granted, they are contrived and can all be fixed by conditional code, but this is my best effort at illustrating the "real-world" problems people may run into. Note that there is no easy way for the user to force upcasting to avoid the error, unless e.g. an "upcast" keyword were added to these functions, or code added to inspect the data dtype and use numpy.add to simulate the current behavior. def map_heights(self, dataset_name, heightmap): """ Correct altitudes by adding a custom heightmap dataset_name: Name of HDF5 dataset containing altitude data heightmap: Corrections in meters. Must match shape of the dataset (or be a scalar). """ # TODO: scattered reports of errors when a constant heightmap value is used return self.f[dataset_name][...] + heightmap def perform_analysis(self, dataset_name, kernel_offset=128): """ Apply Frobnication analysis, using optional linear offset dataset_name: Name of dataset in file kernel_offset: Optional sequencing parameter. Must be a power of 2 and at least 16 (default 128) """ # TODO: people report certain files frobnicate fine in IDL but not in Python... import frob data = self.f[dataset_name][...] try: return frob.frobnicate(data + kernel_offset) except ValueError: raise AnalysisFailed("Invalid input data") From matthew.brett at gmail.com Mon Jan 7 17:26:08 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 7 Jan 2013 22:26:08 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Mon, Jan 7, 2013 at 10:03 PM, Andrew Collette wrote: > Hi Matthew, > >> Ah - well - I only meant that raising an error in the example would be >> no more surprising than raising an error at the python prompt. Do you >> agree with that? I mean, if the user knew that: >> >>>>> np.array([1], dtype=np.int8) + 128 >> >> would raise an error, they'd probably expect your offset routine to do the same. > > I think they would be surprised in both cases, considering this works fine: > > np.array([1], dtype=np.int8) + np.array([128]) > >> I agree it kind of feels funny, but that's why I wanted to ask you for >> some silly but specific example where the funniness would be more >> apparent. 
> > Here are a couple of examples I slapped together, specifically > highlighting the value of the present (or similar) upcasting behavior. > Granted, they are contrived and can all be fixed by conditional code, > but this is my best effort at illustrating the "real-world" problems > people may run into. > > Note that there is no easy way for the user to force upcasting to > avoid the error, unless e.g. an "upcast" keyword were added to these > functions, or code added to inspect the data dtype and use numpy.add > to simulate the current behavior. > > def map_heights(self, dataset_name, heightmap): > """ Correct altitudes by adding a custom heightmap > > dataset_name: Name of HDF5 dataset containing altitude data > heightmap: Corrections in meters. Must match shape of the > dataset (or be a scalar). > """ > # TODO: scattered reports of errors when a constant heightmap value is used > > return self.f[dataset_name][...] + heightmap > > def perform_analysis(self, dataset_name, kernel_offset=128): > """ Apply Frobnication analysis, using optional linear offset > > dataset_name: Name of dataset in file > kernel_offset: Optional sequencing parameter. Must be a power of > 2 and at least 16 (default 128) > """ > # TODO: people report certain files frobnicate fine in IDL but not > in Python... > > import frob > data = self.f[dataset_name][...] > try: > return frob.frobnicate(data + kernel_offset) > except ValueError: > raise AnalysisFailed("Invalid input data") Thanks - I know it seems silly - but it is helpful. There are two separate issues though: 1) Is the upcasting behavior of 1.6 better than the overflow behavior of 1.5? 2) If the upcasting of 1.6 is bad, is it better to raise an error or silently overflow, as in 1.5? Taking 2) first, in this example: > return self.f[dataset_name][...] + heightmap assuming it is not going to upcast, would you rather it overflow than raise an error? Why? The second seems more explicit and sensible to me. For 1) - of course the upcasting in 1.6 is only going to work some of the time. For example: In [2]: np.array([127], dtype=np.int8) * 1000 Out[2]: array([-4072], dtype=int16) So - you'll get something, but there's a reasonable chance you won't get what you were expecting. Of course that is true for 1.5 as well, but at least the rule there is simpler and so easier - in my opinion - to think about. Best, Matthew From andrew.collette at gmail.com Mon Jan 7 17:58:12 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 7 Jan 2013 15:58:12 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, > Taking 2) first, in this example: > >> return self.f[dataset_name][...] + heightmap > > assuming it is not going to upcast, would you rather it overflow than > raise an error? Why? The second seems more explicit and sensible to > me. Yes, I think this (the 1.5 overflow behavior) was a bit odd, if easy to understand. > For 1) - of course the upcasting in 1.6 is only going to work some of > the time. For example: > > In [2]: np.array([127], dtype=np.int8) * 1000 > Out[2]: array([-4072], dtype=int16) > > So - you'll get something, but there's a reasonable chance you won't > get what you were expecting. Of course that is true for 1.5 as well, > but at least the rule there is simpler and so easier - in my opinion - > to think about. 
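(A back-of-the-envelope check on where that -4072 comes from, just to make the
failure mode concrete:

    127 * 1000 = 127000
    127000 - 2 * 65536 = -4072    # int16 wraps modulo 2**16

so the upcast to int16 only postpones the overflow, it doesn't remove it.)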
Part of what my first example was trying to demonstrate was that the function author assumed arrays and scalars obeyed the same rules for addition. For example, if data were int8 and heightmap were an int16 array with a max value of 32767, and the data had a max value in the same spot with e.g. 10, then the addition would overflow at that position, even with the int16 result. That's how array addition works in numpy, and as I understand it that's not slated to change. But when we have a scalar of value 32767 (which fits in int16 but not int8), we are proposing instead to do nothing under the assumption that it's an error. In summary: yes, there are some odd results, but they're consistent with the rules for addition elsewhere in numpy, and I would prefer that to treating this case as an error. Out of curiosity, I checked what IDL did, and it overflows using something like the numpy 1.6 rules: IDL> print, byte(1) + fix(32767) -32768 and in other places with 1.5-like behavior: IDL> print, byte(1) ^ fix(1000) 1 Of course, I don't hold up IDL as a shining example of good analysis software. :) Andrew From raul at virtualmaterials.com Mon Jan 7 18:08:58 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Mon, 07 Jan 2013 16:08:58 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: <50EA4BDC.2060108@virtualmaterials.com> Message-ID: <50EB558A.1030306@virtualmaterials.com> An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 7 19:14:57 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 Jan 2013 17:14:57 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: <50EB558A.1030306@virtualmaterials.com> References: <50EA4BDC.2060108@virtualmaterials.com> <50EB558A.1030306@virtualmaterials.com> Message-ID: On Mon, Jan 7, 2013 at 4:08 PM, Raul Cota wrote: > Ran a fair bit of our test suite using numpy 1.7 compiling against the > corresponding 'numpy/oldnumeric.h' and everything worked well . > > All I saw was the warning below which is obviously expected: > """ > Warning 23 warning Msg: Using deprecated NumPy API, disable it by > #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > c:\python27\lib\site-packages\numpy\core\include\numpy\npy_deprecated_api.h > 8 > """ > > Great! Thanks, not many are in a position to check that part of numpy. Looks like we will need to keep the deprecated api around for a while also. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From raul at virtualmaterials.com Mon Jan 7 20:38:54 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Mon, 07 Jan 2013 18:38:54 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: <50EA4BDC.2060108@virtualmaterials.com> <50EB558A.1030306@virtualmaterials.com> Message-ID: <50EB78AE.3070001@virtualmaterials.com> An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Mon Jan 7 21:09:50 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Mon, 7 Jan 2013 18:09:50 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 2:40 PM, Chris Barker - NOAA Federal wrote: > On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers wrote: >>> Which exact Python do we need to use on Mac? Do we need to use the >>> binary installer from python.org? >> >> Yes, the one from python.org. 
>> >>> Or can I install it from source? > > you could install from source using the same method that the > python.org binaries are built -- there is a script with the source to > do that, though I'm not sure what the point of that would be. Is it possible to install the dmg images without root access from the command line? I know how to access the contents: $ hdiutil attach python-2.7.3-macosx10.6.dmg $ ls /Volumes/Python\ 2.7.3/ Build.txt License.txt Python.mpkg ReadMe.txt But I am not currently sure what to do with it. The Python.mpkg directory seems to contain the sources. I have access to Vincent's computer, as suggested by Ralf and it is already setup, so I am using it. But I am not able (so far) to replicate the setup there so that I can create the binaries on any other Mac computer, which makes me feel really uneasy. By replicating the setup, at least once (preferably automated) would make me understand things much better. If possible, I would prefer to just use a command line (ssh) to do all that. (So that's maybe building from source is the only option.) Ondrej From ondrej.certik at gmail.com Mon Jan 7 21:12:58 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Mon, 7 Jan 2013 18:12:58 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers wrote: > > > > On Sun, Jan 6, 2013 at 3:21 AM, Ond?ej ?ert?k > wrote: >> >> Hi, >> >> Currently the NumPy binaries are built using the pavement.py script, >> which uses the following Pythons: >> >> MPKG_PYTHON = { >> "2.5": >> ["/Library/Frameworks/Python.framework/Versions/2.5/bin/python"], >> "2.6": >> ["/Library/Frameworks/Python.framework/Versions/2.6/bin/python"], >> "2.7": >> ["/Library/Frameworks/Python.framework/Versions/2.7/bin/python"], >> "3.1": >> ["/Library/Frameworks/Python.framework/Versions/3.1/bin/python3"], >> "3.2": >> ["/Library/Frameworks/Python.framework/Versions/3.2/bin/python3"], >> "3.3": >> ["/Library/Frameworks/Python.framework/Versions/3.3/bin/python3"], >> } >> >> So for example I can easily create the 2.6 binary if that Python is >> pre-installed on the Mac box that I am using. >> On one of the Mac boxes that I am using, the 2.7 is missing, so are >> 3.1, 3.2 and 3.3. So I was thinking >> of updating my Fabric fab file to automatically install all Pythons >> from source and build against that, just like I do for Wine. >> >> Which exact Python do we need to use on Mac? Do we need to use the >> binary installer from python.org? > > > Yes, the one from python.org. > >> >> Or can I install it from source? Finally, for which Python versions >> should we provide binary installers for Mac? >> For reference, the 1.6.2 had installers for 2.5, 2.6 and 2.7 only for >> OS X 10.3. There is only 2.7 version for OS X 10.6. > > > The provided installers and naming scheme should match what's done for > Python itself on python.org. > > The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X 10.5. > This is kind of hard to come by these days, but Vincent Davis maintains a > build machine for numpy and scipy. That's already set up correctly, so all > you have to do is connect to it via ssh, check out v.17.0 in ~/Code/numpy, > check in release.sh that the section for OS X 10.6 is disabled and for 10.5 > enabled and run it. > > OS X 10.6 broke support for previous versions in some subtle ways, so even > when using the 10.4 SDK numpy compiled on 10.6 won't run on 10.5. 
As long as > we're supporting 10.5 you therefore need to compile on it. > > The 10.7 --> 10.6 support hasn't been checked, but I wouldn't trust it. I > have a 10.6 machine, so I can compile those binaries if needed. > >> >> Also, what is the meaning of the following piece of code in pavement.py: >> >> def _build_mpkg(pyver): >> # account for differences between Python 2.7.1 versions from >> python.org >> if os.environ.get('MACOSX_DEPLOYMENT_TARGET', None) == "10.6": >> ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch >> x86_64 -Wl,-search_paths_first" >> else: >> ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch >> ppc -Wl,-search_paths_first" >> ldflags += " -L%s" % os.path.join(os.path.dirname(__file__), "build") > > > The 10.6 binaries support only Intel Macs, both 32-bit and 64-bit. The 10.3 > binaries support PPC Macs and 32-bit Intel. That's what the above does. Note > that we simply follow the choice made by the Python release managers here. > >> >> if pyver == "2.5": >> sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % >> (ldflags, " ".join(MPKG_PYTHON[pyver]))) >> else: >> sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " >> ".join(MPKG_PYTHON[pyver]))) > > > This is necessary because in Python 2.5, distutils asks for "gcc" instead of > "gcc-4.0", so you may get the wrong one without CC=gcc-4.0. From Python 2.6 > on this was fixed. > >> >> In particular, the last line gets executed and it then fails with: >> >> paver dmg -p 2.6 >> ---> pavement.dmg >> ---> pavement.clean >> LDFLAGS='-undefined dynamic_lookup -bundle -arch i386 -arch ppc >> -Wl,-search_paths_first -Lbuild' >> /Library/Frameworks/Python.framework/Versions/2.6/bin/python >> setupegg.py bdist_mpkg >> Traceback (most recent call last): >> File "setupegg.py", line 17, in >> from setuptools import setup >> ImportError: No module named setuptools >> >> >> The reason is (I think) that if the Python binary is called explicitly >> with /Library/Frameworks/Python.framework/Versions/2.6/bin/python, >> then the paths are not setup properly in virtualenv, and thus >> setuptools (which is only installed in virtualenv, but not in system >> Python) fails to import. The solution is to simply apply this patch: > > > Avoid using system Python for anything. The first thing to do on any new OS > X system is install Python some other way, preferably from python.org. > >> >> diff --git a/pavement.py b/pavement.py >> index e693016..0c637f8 100644 >> --- a/pavement.py >> +++ b/pavement.py >> @@ -449,7 +449,7 @@ def _build_mpkg(pyver): >> if pyver == "2.5": >> sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % >> (ldflags, " ".join(MPKG_PYTHON[pyver]))) >> else: >> - sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " >> ".join(MPKG_PYTHON[pyver]))) >> + sh("python setupegg.py bdist_mpkg") > > > This doesn't work unless using virtualenvs, you're just throwing away the > version selection here. If you can support virtualenvs in addition to > python.org pythons, that would be useful. But being able to build binaries > when needed simply by "paver dmg -p 2.x" is quite useful. Absolutely. I was following the release.sh in the numpy git repository, which contains: paver bootstrap source bootstrap/bin/activate python setupsconsegg.py install paver pdf paver dmg -p 2.7 So it is using the virtualenv and it works on Vincent's computer, but it doesn't work on my other computer. I wanted to make the steps somehow reproducible. 
I started adding the commands needed to setup the Mac (any Mac) into my Fabfile here: https://github.com/certik/numpy-vendor/blob/master/fabfile.py#L98 but I run into the issues above. Of course, I'll try to just use Vincent's computer, but I would feel much better if the numpy release process for Mac didn't depend on one particular computer, but rather could be quite easily reproduced on any Mac OS X of the right version. Ondrej From chris.barker at noaa.gov Mon Jan 7 23:41:14 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 7 Jan 2013 20:41:14 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Mon, Jan 7, 2013 at 6:09 PM, Ond?ej ?ert?k wrote: > Is it possible to install the dmg images without root access from the > command line? I've never tried, but it looks like you can: http://www.commandlinefu.com/commands/view/2031/install-an-mpkg-from-the-command-line-on-osx > But I am not currently sure what to do with it. The Python.mpkg > directory seems to contain the sources. yup -- that's where everything is. the "installer" command should be able to unpack it. > By replicating the setup, at least once (preferably automated) would > make me understand things much better. > If possible, I would prefer to just use a command line (ssh) to do all > that. (So that's maybe building from source > is the only option.) If you ndo need to build from source, see this message for a bit more info: http://mail.python.org/pipermail/pythonmac-sig/2012-October/023742.html there are a few prerequisites you need to install first... Either way, you should be able to build a start-to-finish build script. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ondrej.certik at gmail.com Tue Jan 8 01:23:08 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Mon, 7 Jan 2013 22:23:08 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Mon, Jan 7, 2013 at 8:41 PM, Chris Barker - NOAA Federal wrote: > On Mon, Jan 7, 2013 at 6:09 PM, Ond?ej ?ert?k wrote: >> Is it possible to install the dmg images without root access from the >> command line? > > I've never tried, but it looks like you can: > > http://www.commandlinefu.com/commands/view/2031/install-an-mpkg-from-the-command-line-on-osx This requires root access. Without sudo, I get: $ installer -pkg /Volumes/Python\ 2.7.3/Python.mpkg/ -target ondrej installer: This package requires authentication to install. and since I don't have root access, it doesn't work. So one way around it would be to install python from source, that shouldn't require root access. > > >> But I am not currently sure what to do with it. The Python.mpkg >> directory seems to contain the sources. > > yup -- that's where everything is. the "installer" command should be > able to unpack it. Ok. > >> By replicating the setup, at least once (preferably automated) would >> make me understand things much better. >> If possible, I would prefer to just use a command line (ssh) to do all >> that. (So that's maybe building from source >> is the only option.) 
> > If you ndo need to build from source, see this message for a bit more info: > > http://mail.python.org/pipermail/pythonmac-sig/2012-October/023742.html > > there are a few prerequisites you need to install first... > > Either way, you should be able to build a start-to-finish build script. Yes, that would be my goal eventually. Without root access. But right now, I am not even sure it's possible. So for now I'll simply use already pre-configured box. Ondrej From ralf.gommers at gmail.com Tue Jan 8 02:36:16 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 8 Jan 2013 08:36:16 +0100 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 3:12 AM, Ond?ej ?ert?k wrote: > On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers > wrote: > > > > > > > > On Sun, Jan 6, 2013 at 3:21 AM, Ond?ej ?ert?k > > wrote: > >> > >> Hi, > >> > >> Currently the NumPy binaries are built using the pavement.py script, > >> which uses the following Pythons: > >> > >> MPKG_PYTHON = { > >> "2.5": > >> ["/Library/Frameworks/Python.framework/Versions/2.5/bin/python"], > >> "2.6": > >> ["/Library/Frameworks/Python.framework/Versions/2.6/bin/python"], > >> "2.7": > >> ["/Library/Frameworks/Python.framework/Versions/2.7/bin/python"], > >> "3.1": > >> ["/Library/Frameworks/Python.framework/Versions/3.1/bin/python3"], > >> "3.2": > >> ["/Library/Frameworks/Python.framework/Versions/3.2/bin/python3"], > >> "3.3": > >> ["/Library/Frameworks/Python.framework/Versions/3.3/bin/python3"], > >> } > >> > >> So for example I can easily create the 2.6 binary if that Python is > >> pre-installed on the Mac box that I am using. > >> On one of the Mac boxes that I am using, the 2.7 is missing, so are > >> 3.1, 3.2 and 3.3. So I was thinking > >> of updating my Fabric fab file to automatically install all Pythons > >> from source and build against that, just like I do for Wine. > >> > >> Which exact Python do we need to use on Mac? Do we need to use the > >> binary installer from python.org? > > > > > > Yes, the one from python.org. > > > >> > >> Or can I install it from source? Finally, for which Python versions > >> should we provide binary installers for Mac? > >> For reference, the 1.6.2 had installers for 2.5, 2.6 and 2.7 only for > >> OS X 10.3. There is only 2.7 version for OS X 10.6. > > > > > > The provided installers and naming scheme should match what's done for > > Python itself on python.org. > > > > The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X 10.5. > > This is kind of hard to come by these days, but Vincent Davis maintains a > > build machine for numpy and scipy. That's already set up correctly, so > all > > you have to do is connect to it via ssh, check out v.17.0 in > ~/Code/numpy, > > check in release.sh that the section for OS X 10.6 is disabled and for > 10.5 > > enabled and run it. > > > > OS X 10.6 broke support for previous versions in some subtle ways, so > even > > when using the 10.4 SDK numpy compiled on 10.6 won't run on 10.5. As > long as > > we're supporting 10.5 you therefore need to compile on it. > > > > The 10.7 --> 10.6 support hasn't been checked, but I wouldn't trust it. I > > have a 10.6 machine, so I can compile those binaries if needed. 
> > > >> > >> Also, what is the meaning of the following piece of code in pavement.py: > >> > >> def _build_mpkg(pyver): > >> # account for differences between Python 2.7.1 versions from > >> python.org > >> if os.environ.get('MACOSX_DEPLOYMENT_TARGET', None) == "10.6": > >> ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch > >> x86_64 -Wl,-search_paths_first" > >> else: > >> ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch > >> ppc -Wl,-search_paths_first" > >> ldflags += " -L%s" % os.path.join(os.path.dirname(__file__), > "build") > > > > > > The 10.6 binaries support only Intel Macs, both 32-bit and 64-bit. The > 10.3 > > binaries support PPC Macs and 32-bit Intel. That's what the above does. > Note > > that we simply follow the choice made by the Python release managers > here. > > > >> > >> if pyver == "2.5": > >> sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % > >> (ldflags, " ".join(MPKG_PYTHON[pyver]))) > >> else: > >> sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " > >> ".join(MPKG_PYTHON[pyver]))) > > > > > > This is necessary because in Python 2.5, distutils asks for "gcc" > instead of > > "gcc-4.0", so you may get the wrong one without CC=gcc-4.0. From Python > 2.6 > > on this was fixed. > > > >> > >> In particular, the last line gets executed and it then fails with: > >> > >> paver dmg -p 2.6 > >> ---> pavement.dmg > >> ---> pavement.clean > >> LDFLAGS='-undefined dynamic_lookup -bundle -arch i386 -arch ppc > >> -Wl,-search_paths_first -Lbuild' > >> /Library/Frameworks/Python.framework/Versions/2.6/bin/python > >> setupegg.py bdist_mpkg > >> Traceback (most recent call last): > >> File "setupegg.py", line 17, in > >> from setuptools import setup > >> ImportError: No module named setuptools > >> > >> > >> The reason is (I think) that if the Python binary is called explicitly > >> with /Library/Frameworks/Python.framework/Versions/2.6/bin/python, > >> then the paths are not setup properly in virtualenv, and thus > >> setuptools (which is only installed in virtualenv, but not in system > >> Python) fails to import. The solution is to simply apply this patch: > > > > > > Avoid using system Python for anything. The first thing to do on any new > OS > > X system is install Python some other way, preferably from python.org. > > > >> > >> diff --git a/pavement.py b/pavement.py > >> index e693016..0c637f8 100644 > >> --- a/pavement.py > >> +++ b/pavement.py > >> @@ -449,7 +449,7 @@ def _build_mpkg(pyver): > >> if pyver == "2.5": > >> sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % > >> (ldflags, " ".join(MPKG_PYTHON[pyver]))) > >> else: > >> - sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " > >> ".join(MPKG_PYTHON[pyver]))) > >> + sh("python setupegg.py bdist_mpkg") > > > > > > This doesn't work unless using virtualenvs, you're just throwing away the > > version selection here. If you can support virtualenvs in addition to > > python.org pythons, that would be useful. But being able to build > binaries > > when needed simply by "paver dmg -p 2.x" is quite useful. > > > Absolutely. I was following the release.sh in the numpy git > repository, which contains: > > paver bootstrap > source bootstrap/bin/activate > python setupsconsegg.py install > paver pdf > paver dmg -p 2.7 > > So it is using the virtualenv and it works on Vincent's computer, but > it doesn't work on my > other computer. > Note that it's only using a virtualenv for this one step (building the docs). 
This is because building the docs requires installing numpy first to be able to extract the docstrings. > I wanted to make the steps somehow reproducible. I started adding the > commands needed to setup the Mac (any Mac) > into my Fabfile here: > > https://github.com/certik/numpy-vendor/blob/master/fabfile.py#L98 > > but I run into the issues above. > > Of course, I'll try to just use Vincent's computer, but I would feel > much better if the numpy release process for Mac didn't depend on one > particular computer, but rather could be quite easily reproduced on > any Mac OS X of the right version. > It doesn't depend on that one computer of course, it takes only a few minutes to set up a new Mac. But yes, currently it does require admin rights to install a framework Python. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Tue Jan 8 09:06:17 2013 From: robince at gmail.com (Robin) Date: Tue, 8 Jan 2013 14:06:17 +0000 Subject: [Numpy-discussion] Embedded NumPy LAPACK errors In-Reply-To: References: <50E73ECC.8050803@eml.cc> <3FF2E38B-6A93-4AC6-B28B-CD1C50784AD5@gmail.com> Message-ID: On Sat, Jan 5, 2013 at 1:03 PM, Robin wrote: >>> If not, is there a reasonable way to build numpy.linalg such that >>> it interfaces with MKL correctly ? I managed to get this to work in the end. Since Matlab uses MKL with ILP64 interface it is not possible to get Numpy to use that without modifications to all the lapack calls. However, I was able to keep the two different versions of lapack seperate. The first step is to build numpy to link statically against MKL. I wasn't sure how to get distutils to do this so I copied all the mkl static .a libaries to a temporary directory and pointed numpy to that to force the issue (so dynamic linking wasn't an option). Even with that it still uses the Lapack from the Matlab dynamic global symbols. The trick was adding the linker flag "-Bsymbolic" which means lapack_lite calls to lapack use the statically linked local copies. With these changes everything appears to work. There are two test failures (below) which do not appear when running the same Numpy build outside of Matlab but they don't seem so severe. So: [robini at robini2-pc numpy]$ cat site.cfg [mkl] search_static_first = true library_dirs = /tmp/intel64 include_dirs = /opt/intel/mkl/include #mkl_libs = mkl_sequential, mkl_intel_lp64, mkl_core, mkl_lapack95_lp64, mkl_blas95_lp64 mkl_libs = mkl_lapack95, mkl_blas95, mkl_intel_lp64, mkl_sequential, mkl_core, svml, imf, irc lapack_libs = [robini at robini2-pc numpy]$ ls /tmp/intel64/ libimf.a libmkl_gnu_thread.a libirc.a libmkl_intel_ilp64.a libmkl_blacs_ilp64.a libmkl_intel_lp64.a libmkl_blacs_intelmpi_ilp64.a libmkl_intel_sp2dp.a libmkl_blacs_intelmpi_lp64.a libmkl_intel_thread.a libmkl_blacs_lp64.a libmkl_lapack95_ilp64.a libmkl_blacs_openmpi_ilp64.a libmkl_lapack95_lp64.a libmkl_blacs_openmpi_lp64.a libmkl_pgi_thread.a libmkl_blacs_sgimpt_ilp64.a libmkl_scalapack_ilp64.a libmkl_blacs_sgimpt_lp64.a libmkl_scalapack_lp64.a libmkl_blas95_ilp64.a libmkl_sequential.a libmkl_blas95_lp64.a libmkl_solver_ilp64.a libmkl_cdft_core.a libmkl_solver_ilp64_sequential.a libmkl_core.a libmkl_solver_lp64.a libmkl_gf_ilp64.a libmkl_solver_lp64_sequential.a libmkl_gf_lp64.a libsvml.a in numpy/distutils/intelccompiler.py: class IntelEM64TCCompiler(UnixCCompiler): """ A modified Intel x86_64 compiler compatible with a 64bit gcc built Python. 
""" compiler_type = 'intelem' cc_exe = 'icc -m64 -fPIC' cc_args = "-fPIC" def __init__ (self, verbose=0, dry_run=0, force=0): UnixCCompiler.__init__ (self, verbose,dry_run, force) self.cc_exe = 'icc -m64 -fPIC -O3 -fomit-frame-pointer' compiler = self.cc_exe self.set_executables(compiler=compiler, compiler_so=compiler, compiler_cxx=compiler, linker_exe=compiler, linker_so=compiler + ' -shared -static-intel -Bsymbolic') Test failures (test_special_values also fails outside Matlab, but the other 2 only occur when embedded): ====================================================================== FAIL: test_umath.test_nextafterl ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/epd-7.3/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/home/robini/slash/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer return f(*args, **kwargs) File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1123, in test_nextafterl return _test_nextafter(np.longdouble) File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1108, in _test_nextafter assert np.nextafter(one, two) - one == eps AssertionError ====================================================================== FAIL: test_umath.test_spacingl ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/epd-7.3/lib/python2.7/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/home/robini/slash/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer return f(*args, **kwargs) File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1149, in test_spacingl return _test_spacing(np.longdouble) File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1132, in _test_spacing assert np.spacing(one) == eps AssertionError ====================================================================== FAIL: test_special_values (test_umath_complex.TestClog) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/robini/slash/lib/python2.7/site-packages/numpy/testing/decorators.py", line 146, in skipper_func return f(*args, **kwargs) File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath_complex.py", line 299, in test_special_values assert_almost_equal(np.log(np.conj(xa[i])), np.conj(np.log(xa[i]))) File "/home/robini/slash/lib/python2.7/site-packages/numpy/testing/utils.py", line 448, in assert_almost_equal raise AssertionError(msg) AssertionError: Arrays are not almost equal to 7 decimals ACTUAL: array([-inf+3.14159265j]) DESIRED: array([-inf-3.14159265j]) ---------------------------------------------------------------------- Ran 3571 tests in 10.897s FAILED (KNOWNFAIL=5, SKIP=1, failures=3) Cheers Robin From cournape at gmail.com Tue Jan 8 10:20:51 2013 From: cournape at gmail.com (David Cournapeau) Date: Tue, 8 Jan 2013 09:20:51 -0600 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Mon, Jan 7, 2013 at 7:31 AM, Matthew Brett wrote: > Hi, > > On Sun, Jan 6, 2013 at 10:40 PM, Chris Barker - NOAA Federal > wrote: >> On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers wrote: >>>> Which exact Python do we need to use on Mac? 
Do we need to use the >>>> binary installer from python.org? >>> >>> Yes, the one from python.org. >>> >>>> Or can I install it from source? >> >> you could install from source using the same method that the >> python.org binaries are built -- there is a script with the source to >> do that, though I'm not sure what the point of that would be. >> >>> The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X 10.5. >> >> It would be great to continue support for that, though I wonder how >> many people still need it -- I don't think Apple supports 10.5 >> anymore, for instance. >> >>> The 10.7 --> 10.6 support hasn't been checked, but I wouldn't trust it. I >>> have a 10.6 machine, so I can compile those binaries if needed. >> >> That would be better, but it would also be nice to check how building >> on 10.7 works. >> >>> Avoid using system Python for anything. The first thing to do on any new OS >>> X system is install Python some other way, preferably from python.org. >> >> +1 >> >>> Last note: bdist_mpkg is unmaintained and doesn't support Python 3.x. Most >>> recent version is at: https://github.com/matthew-brett/bdist_mpkg, for >>> previous versions numpy releases I've used that at commit e81a58a471 >> >> There has been recent discussion on the pythonmac list about this -- >> some waffling about how important it is -- though I think it would be >> good to keep it up to date. > > I updated my fork of bdist_mpkg with Python 3k support. It doesn't > have any tests that I could see, but I've run it on python 2.6 and 3.2 > and 3.3 on one of my packages as a first pass. > >>> If we want 3.x binaries, then we should fix that or (preferably) build >>> binaries with Bento. Bento has grown support for mpkg's; I'm not sure how >>> robust that is. >> >> So maybe bento is a better route than bdist_mpkg -- this is worth >> discussion on teh pythonmac list. > > David - can you give a status update on that? It is more a starting point than anything else, and barely tested. I would advise against using it ATM. thanks, David From ondrej.certik at gmail.com Tue Jan 8 11:24:10 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 8 Jan 2013 08:24:10 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Mon, Jan 7, 2013 at 11:36 PM, Ralf Gommers wrote: > > > > On Tue, Jan 8, 2013 at 3:12 AM, Ond?ej ?ert?k > wrote: >> >> On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers >> wrote: >> > >> > >> > >> > On Sun, Jan 6, 2013 at 3:21 AM, Ond?ej ?ert?k >> > wrote: >> >> >> >> Hi, >> >> >> >> Currently the NumPy binaries are built using the pavement.py script, >> >> which uses the following Pythons: >> >> >> >> MPKG_PYTHON = { >> >> "2.5": >> >> ["/Library/Frameworks/Python.framework/Versions/2.5/bin/python"], >> >> "2.6": >> >> ["/Library/Frameworks/Python.framework/Versions/2.6/bin/python"], >> >> "2.7": >> >> ["/Library/Frameworks/Python.framework/Versions/2.7/bin/python"], >> >> "3.1": >> >> ["/Library/Frameworks/Python.framework/Versions/3.1/bin/python3"], >> >> "3.2": >> >> ["/Library/Frameworks/Python.framework/Versions/3.2/bin/python3"], >> >> "3.3": >> >> ["/Library/Frameworks/Python.framework/Versions/3.3/bin/python3"], >> >> } >> >> >> >> So for example I can easily create the 2.6 binary if that Python is >> >> pre-installed on the Mac box that I am using. >> >> On one of the Mac boxes that I am using, the 2.7 is missing, so are >> >> 3.1, 3.2 and 3.3. 
So I was thinking >> >> of updating my Fabric fab file to automatically install all Pythons >> >> from source and build against that, just like I do for Wine. >> >> >> >> Which exact Python do we need to use on Mac? Do we need to use the >> >> binary installer from python.org? >> > >> > >> > Yes, the one from python.org. >> > >> >> >> >> Or can I install it from source? Finally, for which Python versions >> >> should we provide binary installers for Mac? >> >> For reference, the 1.6.2 had installers for 2.5, 2.6 and 2.7 only for >> >> OS X 10.3. There is only 2.7 version for OS X 10.6. >> > >> > >> > The provided installers and naming scheme should match what's done for >> > Python itself on python.org. >> > >> > The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X >> > 10.5. >> > This is kind of hard to come by these days, but Vincent Davis maintains >> > a >> > build machine for numpy and scipy. That's already set up correctly, so >> > all >> > you have to do is connect to it via ssh, check out v.17.0 in >> > ~/Code/numpy, >> > check in release.sh that the section for OS X 10.6 is disabled and for >> > 10.5 >> > enabled and run it. >> > >> > OS X 10.6 broke support for previous versions in some subtle ways, so >> > even >> > when using the 10.4 SDK numpy compiled on 10.6 won't run on 10.5. As >> > long as >> > we're supporting 10.5 you therefore need to compile on it. >> > >> > The 10.7 --> 10.6 support hasn't been checked, but I wouldn't trust it. >> > I >> > have a 10.6 machine, so I can compile those binaries if needed. >> > >> >> >> >> Also, what is the meaning of the following piece of code in >> >> pavement.py: >> >> >> >> def _build_mpkg(pyver): >> >> # account for differences between Python 2.7.1 versions from >> >> python.org >> >> if os.environ.get('MACOSX_DEPLOYMENT_TARGET', None) == "10.6": >> >> ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch >> >> x86_64 -Wl,-search_paths_first" >> >> else: >> >> ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch >> >> ppc -Wl,-search_paths_first" >> >> ldflags += " -L%s" % os.path.join(os.path.dirname(__file__), >> >> "build") >> > >> > >> > The 10.6 binaries support only Intel Macs, both 32-bit and 64-bit. The >> > 10.3 >> > binaries support PPC Macs and 32-bit Intel. That's what the above does. >> > Note >> > that we simply follow the choice made by the Python release managers >> > here. >> > >> >> >> >> if pyver == "2.5": >> >> sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % >> >> (ldflags, " ".join(MPKG_PYTHON[pyver]))) >> >> else: >> >> sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " >> >> ".join(MPKG_PYTHON[pyver]))) >> > >> > >> > This is necessary because in Python 2.5, distutils asks for "gcc" >> > instead of >> > "gcc-4.0", so you may get the wrong one without CC=gcc-4.0. From Python >> > 2.6 >> > on this was fixed. 
>> > >> >> >> >> In particular, the last line gets executed and it then fails with: >> >> >> >> paver dmg -p 2.6 >> >> ---> pavement.dmg >> >> ---> pavement.clean >> >> LDFLAGS='-undefined dynamic_lookup -bundle -arch i386 -arch ppc >> >> -Wl,-search_paths_first -Lbuild' >> >> /Library/Frameworks/Python.framework/Versions/2.6/bin/python >> >> setupegg.py bdist_mpkg >> >> Traceback (most recent call last): >> >> File "setupegg.py", line 17, in >> >> from setuptools import setup >> >> ImportError: No module named setuptools >> >> >> >> >> >> The reason is (I think) that if the Python binary is called explicitly >> >> with /Library/Frameworks/Python.framework/Versions/2.6/bin/python, >> >> then the paths are not setup properly in virtualenv, and thus >> >> setuptools (which is only installed in virtualenv, but not in system >> >> Python) fails to import. The solution is to simply apply this patch: >> > >> > >> > Avoid using system Python for anything. The first thing to do on any new >> > OS >> > X system is install Python some other way, preferably from python.org. >> > >> >> >> >> diff --git a/pavement.py b/pavement.py >> >> index e693016..0c637f8 100644 >> >> --- a/pavement.py >> >> +++ b/pavement.py >> >> @@ -449,7 +449,7 @@ def _build_mpkg(pyver): >> >> if pyver == "2.5": >> >> sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % >> >> (ldflags, " ".join(MPKG_PYTHON[pyver]))) >> >> else: >> >> - sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " >> >> ".join(MPKG_PYTHON[pyver]))) >> >> + sh("python setupegg.py bdist_mpkg") >> > >> > >> > This doesn't work unless using virtualenvs, you're just throwing away >> > the >> > version selection here. If you can support virtualenvs in addition to >> > python.org pythons, that would be useful. But being able to build >> > binaries >> > when needed simply by "paver dmg -p 2.x" is quite useful. >> >> >> Absolutely. I was following the release.sh in the numpy git >> repository, which contains: >> >> paver bootstrap >> source bootstrap/bin/activate >> python setupsconsegg.py install >> paver pdf >> paver dmg -p 2.7 >> >> So it is using the virtualenv and it works on Vincent's computer, but >> it doesn't work on my >> other computer. > > > Note that it's only using a virtualenv for this one step (building the > docs). This is because building the docs requires installing numpy first to > be able to extract the docstrings. Ah, I missed this important part. Since I generate the pdf files in linux, I can just copy them on Mac and thus don't need any of the virtualenv part. > >> >> I wanted to make the steps somehow reproducible. I started adding the >> commands needed to setup the Mac (any Mac) >> into my Fabfile here: >> >> https://github.com/certik/numpy-vendor/blob/master/fabfile.py#L98 >> >> but I run into the issues above. >> >> Of course, I'll try to just use Vincent's computer, but I would feel >> much better if the numpy release process for Mac didn't depend on one >> particular computer, but rather could be quite easily reproduced on >> any Mac OS X of the right version. > > > It doesn't depend on that one computer of course, it takes only a few > minutes to set up a new Mac. But yes, currently it does require admin rights > to install a framework Python. Ok, that's what I wanted to know. 
Thanks, Ondrej From chris.barker at noaa.gov Tue Jan 8 11:45:25 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 8 Jan 2013 08:45:25 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Mon, Jan 7, 2013 at 10:23 PM, Ond?ej ?ert?k wrote: >> http://www.commandlinefu.com/commands/view/2031/install-an-mpkg-from-the-command-line-on-osx > > This requires root access. Without sudo, I get: > > $ installer -pkg /Volumes/Python\ 2.7.3/Python.mpkg/ -target ondrej > installer: This package requires authentication to install. > > and since I don't have root access, it doesn't work. > > So one way around it would be to install python from source, that > shouldn't require root access. hmm -- this all may be a trick -- both the *.mpkg and the standard build put everything in /Library/Frameworks/Python -- which is where it belongs. Bu tif you need root access to write there, then there is a problem. I'm sure a non-root build could put everything in the users' home directory, then packages built against that would have their paths messed up. What's odd is that I'm pretty sure I've been able to point+click install those without sudo...(I could recall incorrectly). This would be a good question for the pythonmac list -- low traffic, but there are some very smart and helpful folks there: http://mail.python.org/mailman/listinfo/pythonmac-sig >>> But I am not currently sure what to do with it. The Python.mpkg >>> directory seems to contain the sources. It should be possible to unpack a mpkg by hand, but it contains both the contents, and various instal scripts, so that seems like a really ugly solution. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From matthew.brett at gmail.com Tue Jan 8 11:59:17 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 8 Jan 2013 16:59:17 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, On Mon, Jan 7, 2013 at 10:58 PM, Andrew Collette wrote: > Hi, > >> Taking 2) first, in this example: >> >>> return self.f[dataset_name][...] + heightmap >> >> assuming it is not going to upcast, would you rather it overflow than >> raise an error? Why? The second seems more explicit and sensible to >> me. > > Yes, I think this (the 1.5 overflow behavior) was a bit odd, if easy > to understand. > >> For 1) - of course the upcasting in 1.6 is only going to work some of >> the time. For example: >> >> In [2]: np.array([127], dtype=np.int8) * 1000 >> Out[2]: array([-4072], dtype=int16) >> >> So - you'll get something, but there's a reasonable chance you won't >> get what you were expecting. Of course that is true for 1.5 as well, >> but at least the rule there is simpler and so easier - in my opinion - >> to think about. > > Part of what my first example was trying to demonstrate was that the > function author assumed arrays and scalars obeyed the same rules for > addition. > > For example, if data were int8 and heightmap were an int16 array with > a max value of 32767, and the data had a max value in the same spot > with e.g. 10, then the addition would overflow at that position, even > with the int16 result. That's how array addition works in numpy, and > as I understand it that's not slated to change. 
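To keep the alternatives straight, here is a minimal sketch - the outcomes in
the comments are as I understand them from this thread, so treat them as
illustrative rather than definitive:

import numpy as np

a = np.array([1], dtype=np.int8)
b = a + 128

# 1.5-style rule: 128 is silently treated as an int8, so b stays int8 and
# wraps around to array([-127], dtype=int8)
# 1.6-style rule: 128 does not fit in int8, so the result is upcast just far
# enough, giving array([129], dtype=int16)
# proposed rule: the implicit down-cast of 128 to int8 is unsafe, so the
# addition raises an error instead of returning anything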
> > But when we have a scalar of value 32767 (which fits in int16 but not > int8), we are proposing instead to do nothing under the assumption > that it's an error. > > In summary: yes, there are some odd results, but they're consistent > with the rules for addition elsewhere in numpy, and I would prefer > that to treating this case as an error. I think you are voting strongly for the current casting rules, because they make it less obvious to the user that scalars are different from arrays. Returning to the question of 1.5 behavior vs the error - I think you are saying you prefer the 1.5 silent-but-deadly approach to the error, but I think I still haven't grasped why. Maybe someone else can explain it? The holiday has not been good to my brain. Best, Matthew From andrew.collette at gmail.com Tue Jan 8 12:20:53 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Tue, 8 Jan 2013 10:20:53 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi, > I think you are voting strongly for the current casting rules, because > they make it less obvious to the user that scalars are different from > arrays. Maybe this is the source of my confusion... why should scalars be different from arrays? They should follow the same rules, as closely as possible. If a scalar value would fit in an int16, why not add it using the rules for an int16 array? > Returning to the question of 1.5 behavior vs the error - I think you > are saying you prefer the 1.5 silent-but-deadly approach to the error, > but I think I still haven't grasped why. Maybe someone else can > explain it? The holiday has not been good to my brain. In a strict choice between 1.5-behavior and errors, I'm not sure which one I would pick. I don't think either is particularly useful. Of course, other members of the community would likely have a different view, especially those who got used to the 1.5 behavior. Andrew From mail.till at gmx.de Tue Jan 8 13:17:39 2013 From: mail.till at gmx.de (Till Stensitz) Date: Tue, 8 Jan 2013 18:17:39 +0000 (UTC) Subject: [Numpy-discussion] Linear least squares Message-ID: Hi, i did some profiling and testing of my data-fitting code. One of its core parts is doing some linear least squares, until now i used np.linalg.lstsq. Most of time the size a is (250, 7) and of b is (250, 800). Today i compared it to using pinv manually, to my surprise, it is much faster. I taught, both are svd based? Too check another computer i also run my test on wakari: https://www.wakari.io/nb/tillsten/linear_least_squares Also using scipy.linalg instead of np.linalg is slower for both cases. My numpy and scipy are both from C. Gohlkes website. If my result is valid in general, maybe the lstsq function should be changed or a hint should be added to the documentation. greetings Till From shish at keba.be Tue Jan 8 13:48:42 2013 From: shish at keba.be (Olivier Delalleau) Date: Tue, 8 Jan 2013 13:48:42 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: 2013/1/8 Andrew Collette : > Hi, > >> I think you are voting strongly for the current casting rules, because >> they make it less obvious to the user that scalars are different from >> arrays. > > Maybe this is the source of my confusion... why should scalars be > different from arrays? They should follow the same rules, as closely > as possible. 
If a scalar value would fit in an int16, why not add it > using the rules for an int16 array? As I mentioned in another post, I also agree that it would make things simpler and safer to just yield the same result as if we were using a one-element array. My understanding of the motivation for the rule "scalars do not upcast arrays unless they are of a fundamentally different type" is that it avoids accidentally upcasting arrays in operations like "x + 1" (for instance if x is a float32 array, the upcast would yield a float64 result, and if x is an int16, it would yield int64), which may waste memory. I find it a useful feature, however I'm not sure it's worth the headaches it can lead to. However, my first reaction at the idea of dropping this rule altogether is that it would lead to a long and painful deprecation process. I may be wrong though, I really haven't thought about it much. -=- Olivier From alan.isaac at gmail.com Tue Jan 8 14:28:51 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 08 Jan 2013 14:28:51 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: <50EC7373.7010407@gmail.com> On 1/8/2013 1:48 PM, Olivier Delalleau wrote: > As I mentioned in another post, I also agree that it would make things > simpler and safer to just yield the same result as if we were using a > one-element array. Yes! Anything else is going to drive people insane, especially new users. Alan Isaac From njs at pobox.com Tue Jan 8 14:59:16 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 8 Jan 2013 19:59:16 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On 8 Jan 2013 17:24, "Andrew Collette" wrote: > > Hi, > > > I think you are voting strongly for the current casting rules, because > > they make it less obvious to the user that scalars are different from > > arrays. > > Maybe this is the source of my confusion... why should scalars be > different from arrays? They should follow the same rules, as closely > as possible. If a scalar value would fit in an int16, why not add it > using the rules for an int16 array? The problem is that rule for arrays - and for every other party of numpy in general - are that we *don't* pick types based on values. Numpy always uses input types to determine output types, not input values. # This value fits in an int8 In [5]: a = np.array([1]) # And yet... In [6]: a.dtype Out[6]: dtype('int64') In [7]: small = np.array([1], dtype=np.int8) # Computing 1 + 1 doesn't need a large integer... but we use one In [8]: (small + a).dtype Out[8]: dtype('int64') Python scalars have an unambiguous types: a Python 'int' is a C 'long', and a Python 'float' is a C 'double'. And these are the types that np.array() converts them to. So it's pretty unambiguous that "using the same rules for arrays and scalars" would mean, ignore the value of the scalar, and in expressions like np.array([1], dtype=np.int8) + 1 we should always upcast to int32/int64. The problem is that this makes working with narrow types very awkward for no real benefit, so everyone pretty much seems to want *some* kind of special case. These are both absolutely special cases: numarray through 1.5: in a binary operation, if one operand has ndim==0 and the other has ndim>0, ignore the width of the ndim==0 operand. 
1.6, your proposal: in a binary operation, if one operand has ndim==0 and the other has ndim>0, downcast the ndim==0 item to the smallest width that is consistent with its value and the other operand's type. -n From njs at pobox.com Tue Jan 8 15:04:09 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 8 Jan 2013 20:04:09 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: <50EC7373.7010407@gmail.com> References: <50EC7373.7010407@gmail.com> Message-ID: On Tue, Jan 8, 2013 at 7:28 PM, Alan G Isaac wrote: > On 1/8/2013 1:48 PM, Olivier Delalleau wrote: >> As I mentioned in another post, I also agree that it would make things >> simpler and safer to just yield the same result as if we were using a >> one-element array. > > Yes! > Anything else is going to drive people insane, > especially new users. New users don't use narrow-width dtypes... it's important to remember in this discussion that in numpy, non-standard dtypes only arise when users explicitly request them, so there's some expressed intention there that we want to try and respect. (As opposed to the type associated with Python manifest constants like the "2" in "2 * a", which probably no programmer looked at and thought "hmm, what I want here is 2-as-an-int64".) -n From alan.isaac at gmail.com Tue Jan 8 15:43:52 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 08 Jan 2013 15:43:52 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EC7373.7010407@gmail.com> Message-ID: <50EC8508.6090307@gmail.com> On 1/8/2013 3:04 PM, Nathaniel Smith wrote: > New users don't use narrow-width dtypes... it's important to remember > in this discussion that in numpy, non-standard dtypes only arise when > users explicitly request them, so there's some expressed intention > there that we want to try and respect. 1. I think the first statement is wrong. Control over dtypes is a good reason for a new use to consider NumPy. 2. You cannot treat the intention as separate from the rules. Users want to play by the rules. Because NumPy supports broadcasting, it is natural for array-array operations and scalar-array operations to be consistent. I believe anything else will be too confusing. I do not recall an example yet that clearly demonstrates a case where a single user would want two different behaviors for a scalar operation and an analogous broadcasting operation. Was there one? Alan Isaac From andrew.collette at gmail.com Tue Jan 8 16:14:12 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Tue, 8 Jan 2013 14:14:12 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Nathaniel, (Responding to both your emails) > The problem is that rule for arrays - and for every other party of > numpy in general - are that we *don't* pick types based on values. > Numpy always uses input types to determine output types, not input > values. Yes, of course... array operations are governed exclusively by their dtypes. It seems to me that, using the language of the bug report (2878), if we have this: result = arr + scalar I would argue that our job is, rather than to pick result.dtype, to pick scalar.dtype, and apply the normal rules for array operations. 
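To make that two-step reading concrete, here is a minimal sketch in terms of
existing helpers (np.min_scalar_type and np.result_type). It glosses over
details such as min_scalar_type preferring unsigned types for non-negative
values, so take it as an illustration of the idea rather than a description
of what the C implementation actually does:

import numpy as np

def proposed_dtype(arr, py_scalar):
    # Step 1: pick a dtype for the bare Python scalar from its *value*
    # (e.g. 1000 needs at least a 16-bit integer).
    scalar_dtype = np.min_scalar_type(py_scalar)
    # Step 2: apply the ordinary array/array promotion rules.
    return np.result_type(arr.dtype, scalar_dtype)

a = np.zeros(3, dtype=np.int8)
print(proposed_dtype(a, 1), proposed_dtype(a, 1000))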
> So it's pretty unambiguous that > "using the same rules for arrays and scalars" would mean, ignore the > value of the scalar, and in expressions like > np.array([1], dtype=np.int8) + 1 > we should always upcast to int32/int64. Ah, but that's my point: we already, in 1.6, ignore the intrinsic width of the scalar and effectively substitute one based on it's value: >>> a = np.array([1], dtype=int8) >>> (a + 1).dtype dtype('int8') >>> (a + 1000).dtype dtype('int16') >>> (a + 90000).dtype dtype('int32') >>> (a + 2**40).dtype dtype('int64') > 1.6, your proposal: in a binary operation, if one operand has ndim==0 > and the other has ndim>0, downcast the ndim==0 item to the smallest > width that is consistent with its value and the other operand's type. Yes, exactly. I'm not trying to propose a completely new behavior: as I mentioned (although very far upthread), this is the mental model I had of how things worked in 1.6 already. > New users don't use narrow-width dtypes... it's important to remember > in this discussion that in numpy, non-standard dtypes only arise when > users explicitly request them, so there's some expressed intention > there that we want to try and respect. I would respectfully disagree. One example I cited was that when dealing with HDF5, it's very common to get int16's (and even int8's) when reading from a file because they are used to save disk space. All a new user has to do to get int8's from a file they got from someone else is: >>> data = some_hdf5_file['MyDataset'][...] This is a general issue applying to data which is read from real-world external sources. For example, digitizers routinely represent their samples as int8's or int16's, and you apply a scale and offset to get a reading in volts. As you say, the proposed change will prevent accidental upcasting by people who selected int8/int16 on purpose to save memory, by notifying them with a ValueError. But another assumption we could make is that people who choose to use narrow types for performance reasons should be expected to use caution when performing operations that might upcast, and that the default behavior should be to follow the normal array rules as closely as possible, as is done in 1.6. Andrew From sebastian at sipsolutions.net Tue Jan 8 16:15:38 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 08 Jan 2013 22:15:38 +0100 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: <1357679738.2754.3.camel@sebastian-laptop> On Tue, 2013-01-08 at 19:59 +0000, Nathaniel Smith wrote: > On 8 Jan 2013 17:24, "Andrew Collette" wrote: > > > > Hi, > > > > > I think you are voting strongly for the current casting rules, because > > > they make it less obvious to the user that scalars are different from > > > arrays. > > > > Maybe this is the source of my confusion... why should scalars be > > different from arrays? They should follow the same rules, as closely > > as possible. If a scalar value would fit in an int16, why not add it > > using the rules for an int16 array? > > The problem is that rule for arrays - and for every other party of > numpy in general - are that we *don't* pick types based on values. > Numpy always uses input types to determine output types, not input > values. > > # This value fits in an int8 > In [5]: a = np.array([1]) > > # And yet... 
> In [6]: a.dtype > Out[6]: dtype('int64') > > In [7]: small = np.array([1], dtype=np.int8) > > # Computing 1 + 1 doesn't need a large integer... but we use one > In [8]: (small + a).dtype > Out[8]: dtype('int64') > > Python scalars have an unambiguous types: a Python 'int' is a C > 'long', and a Python 'float' is a C 'double'. And these are the types > that np.array() converts them to. So it's pretty unambiguous that > "using the same rules for arrays and scalars" would mean, ignore the > value of the scalar, and in expressions like > np.array([1], dtype=np.int8) + 1 > we should always upcast to int32/int64. The problem is that this makes > working with narrow types very awkward for no real benefit, so > everyone pretty much seems to want *some* kind of special case. These > are both absolutely special cases: > > numarray through 1.5: in a binary operation, if one operand has > ndim==0 and the other has ndim>0, ignore the width of the ndim==0 > operand. > > 1.6, your proposal: in a binary operation, if one operand has ndim==0 > and the other has ndim>0, downcast the ndim==0 item to the smallest > width that is consistent with its value and the other operand's type. > Well, that leaves the maybe not quite implausible proposal of saying that numpy scalars behave like arrays with ndim>0, but python scalars behave like they do in 1.6. to allow for easier working with narrow types. Sebastian > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Tue Jan 8 16:17:58 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 8 Jan 2013 13:17:58 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: <50EC8508.6090307@gmail.com> References: <50EC7373.7010407@gmail.com> <50EC8508.6090307@gmail.com> Message-ID: On Tue, Jan 8, 2013 at 12:43 PM, Alan G Isaac wrote: >> New users don't use narrow-width dtypes... it's important to remember > 1. I think the first statement is wrong. > Control over dtypes is a good reason for > a new use to consider NumPy. Absolutely. > Because NumPy supports broadcasting, > it is natural for array-array operations and > scalar-array operations to be consistent. > I believe anything else will be too confusing. Theoretically true -- but in practice, the problem arrises because it is easy to write literals with the standard python scalars, so one is very likely to want to do: arr = np.zeros((m,n), dtype=np.uint8) arr += 3 and not want an upcast. I don't think we want to require that to be spelled: arr += np.array(3, dtype=np.uint8) so that defines desired behaviour for array<->scalar. but what should this do? arr1 = np.zeros((m,n), dtype=np.uint8) arr2 = np.zeros((m,n), dtype=np.uint16) arr1 + arr2 or arr2 + arr1 upcast in both cases? use the type of the left operand? raise an exception? matching the array<-> scalar approach would mean always keeping the smallest type, which is unlikely to be what is wanted. Having it be dependent on order would be really ripe fro confusion. raising an exception might have been the best idea from the beginning. (though I wouldn't want that in the array<-> scalar case). So perhaps having a scalar array distinction, while quite impure, is the best compromise. 
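For what it's worth, the array<->array case above is at least symmetric as
far as I can tell -- both orders upcast to the wider type -- which is quick
to check:

import numpy as np

arr1 = np.zeros((3, 4), dtype=np.uint8)
arr2 = np.zeros((3, 4), dtype=np.uint16)

# both operand orders give the same upcast result
print((arr1 + arr2).dtype, (arr2 + arr1).dtype)

# the promotion can also be queried without doing any arithmetic
print(np.promote_types(np.uint8, np.uint16))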
NOTE: no matter how you slice it, at some point reducing operations produce something different (that can no longer be reduced), so I do think it would be nice for rank-zero arrays and scalars to be the same thing (in this regard and others) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From shish at keba.be Tue Jan 8 16:24:52 2013 From: shish at keba.be (Olivier Delalleau) Date: Tue, 8 Jan 2013 16:24:52 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: <1357679738.2754.3.camel@sebastian-laptop> References: <50E61E29.1020709@astro.uio.no> <1357679738.2754.3.camel@sebastian-laptop> Message-ID: 2013/1/8 Sebastian Berg : > On Tue, 2013-01-08 at 19:59 +0000, Nathaniel Smith wrote: >> On 8 Jan 2013 17:24, "Andrew Collette" wrote: >> > >> > Hi, >> > >> > > I think you are voting strongly for the current casting rules, because >> > > they make it less obvious to the user that scalars are different from >> > > arrays. >> > >> > Maybe this is the source of my confusion... why should scalars be >> > different from arrays? They should follow the same rules, as closely >> > as possible. If a scalar value would fit in an int16, why not add it >> > using the rules for an int16 array? >> >> The problem is that rule for arrays - and for every other party of >> numpy in general - are that we *don't* pick types based on values. >> Numpy always uses input types to determine output types, not input >> values. >> >> # This value fits in an int8 >> In [5]: a = np.array([1]) >> >> # And yet... >> In [6]: a.dtype >> Out[6]: dtype('int64') >> >> In [7]: small = np.array([1], dtype=np.int8) >> >> # Computing 1 + 1 doesn't need a large integer... but we use one >> In [8]: (small + a).dtype >> Out[8]: dtype('int64') >> >> Python scalars have an unambiguous types: a Python 'int' is a C >> 'long', and a Python 'float' is a C 'double'. And these are the types >> that np.array() converts them to. So it's pretty unambiguous that >> "using the same rules for arrays and scalars" would mean, ignore the >> value of the scalar, and in expressions like >> np.array([1], dtype=np.int8) + 1 >> we should always upcast to int32/int64. The problem is that this makes >> working with narrow types very awkward for no real benefit, so >> everyone pretty much seems to want *some* kind of special case. These >> are both absolutely special cases: >> >> numarray through 1.5: in a binary operation, if one operand has >> ndim==0 and the other has ndim>0, ignore the width of the ndim==0 >> operand. >> >> 1.6, your proposal: in a binary operation, if one operand has ndim==0 >> and the other has ndim>0, downcast the ndim==0 item to the smallest >> width that is consistent with its value and the other operand's type. >> > > Well, that leaves the maybe not quite implausible proposal of saying > that numpy scalars behave like arrays with ndim>0, but python scalars > behave like they do in 1.6. to allow for easier working with narrow > types. I know I already said it, but I really think it'd be a bad idea to have a different behavior between Python scalars and Numpy scalars, because I think most people would expect them to behave the same (when knowing what dtype is a Python float / int). It could lead to very tricky bugs to handle them differently. 
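A quick way to probe, on any given numpy version, whether the two kinds of
scalars are promoted the same way (just a check, not a claim about what the
rules should be):

import numpy as np

a = np.zeros(3, dtype=np.int8)

# plain Python int vs. numpy scalars with an explicit width
print((a + 1000).dtype)
print((a + np.int16(1000)).dtype)
print((a + np.int64(1000)).dtype)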
-=- Olivier From shish at keba.be Tue Jan 8 16:29:45 2013 From: shish at keba.be (Olivier Delalleau) Date: Tue, 8 Jan 2013 16:29:45 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EC7373.7010407@gmail.com> <50EC8508.6090307@gmail.com> Message-ID: 2013/1/8 Chris Barker - NOAA Federal : > On Tue, Jan 8, 2013 at 12:43 PM, Alan G Isaac wrote: >>> New users don't use narrow-width dtypes... it's important to remember > >> 1. I think the first statement is wrong. >> Control over dtypes is a good reason for >> a new use to consider NumPy. > > Absolutely. > >> Because NumPy supports broadcasting, >> it is natural for array-array operations and >> scalar-array operations to be consistent. >> I believe anything else will be too confusing. > > Theoretically true -- but in practice, the problem arrises because it > is easy to write literals with the standard python scalars, so one is > very likely to want to do: > > arr = np.zeros((m,n), dtype=np.uint8) > arr += 3 > > and not want an upcast. Note that the behavior with in-place operations is also an interesting topic, but slightly different, since there is no ambiguity on the dtype of the output (which is required to match that of the input). I was actually thinking about this earlier today but decided not to mention it yet to avoid making the discussion even more complex ;) The key question is whether the operand should be cast before the operation, or whether to perform the operation in an upcasted array, then downcast it back into the original version. I actually thnk the latter makes more sense (and that's actually what's being done I think in 1.6.1 from a few tests I tried), and to me this is an argument in favor of the upcast behavior for non-inplace operations. -=- Olivier From d.s.seljebotn at astro.uio.no Tue Jan 8 16:32:42 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 08 Jan 2013 22:32:42 +0100 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: <50EC907A.8040003@astro.uio.no> On 01/08/2013 06:20 PM, Andrew Collette wrote: > Hi, > >> I think you are voting strongly for the current casting rules, because >> they make it less obvious to the user that scalars are different from >> arrays. > > Maybe this is the source of my confusion... why should scalars be > different from arrays? They should follow the same rules, as closely Scalars (as in, Python float/int) are inherently different because the user didn't specify a dtype. For an array, there was always *some* point where the user chose, explicitly or implicitly, a dtype. > as possible. If a scalar value would fit in an int16, why not add it > using the rules for an int16 array? So you are saying that, for an array x, you want x + random.randint(100000) to produce an array with a random dtype? So that after carefully testing that your code works, suddenly a different draw (or user input, or whatever) causes a different set of dtypes to ripple through your entire program? To me this is something that must be avoided at all costs. It's hard enough to reason about the code one writes without throwing in complete randomness (by which I mean, types determined by values). Dag Sverre > >> Returning to the question of 1.5 behavior vs the error - I think you >> are saying you prefer the 1.5 silent-but-deadly approach to the error, >> but I think I still haven't grasped why. Maybe someone else can >> explain it? 
The holiday has not been good to my brain. > > In a strict choice between 1.5-behavior and errors, I'm not sure which > one I would pick. I don't think either is particularly useful. Of > course, other members of the community would likely have a different > view, especially those who got used to the 1.5 behavior. > > Andrew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From d.s.seljebotn at astro.uio.no Tue Jan 8 16:37:51 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 08 Jan 2013 22:37:51 +0100 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: <50EC907A.8040003@astro.uio.no> References: <50EC907A.8040003@astro.uio.no> Message-ID: <50EC91AF.8030908@astro.uio.no> On 01/08/2013 10:32 PM, Dag Sverre Seljebotn wrote: > On 01/08/2013 06:20 PM, Andrew Collette wrote: >> Hi, >> >>> I think you are voting strongly for the current casting rules, because >>> they make it less obvious to the user that scalars are different from >>> arrays. >> >> Maybe this is the source of my confusion... why should scalars be >> different from arrays? They should follow the same rules, as closely > > Scalars (as in, Python float/int) are inherently different because the > user didn't specify a dtype. > > For an array, there was always *some* point where the user chose, > explicitly or implicitly, a dtype. > >> as possible. If a scalar value would fit in an int16, why not add it >> using the rules for an int16 array? > > So you are saying that, for an array x, you want > > x + random.randint(100000) > > to produce an array with a random dtype? > > So that after carefully testing that your code works, suddenly a > different draw (or user input, or whatever) causes a different set of > dtypes to ripple through your entire program? > > To me this is something that must be avoided at all costs. It's hard > enough to reason about the code one writes without throwing in complete > randomness (by which I mean, types determined by values). Oh, sorry, given that this is indeed the present behaviour, this just sounds silly. I should have said it's something I dislike about the present behaviour then. Dag Sverre > > Dag Sverre > > > > >> >>> Returning to the question of 1.5 behavior vs the error - I think you >>> are saying you prefer the 1.5 silent-but-deadly approach to the error, >>> but I think I still haven't grasped why. Maybe someone else can >>> explain it? The holiday has not been good to my brain. >> >> In a strict choice between 1.5-behavior and errors, I'm not sure which >> one I would pick. I don't think either is particularly useful. Of >> course, other members of the community would likely have a different >> view, especially those who got used to the 1.5 behavior. >> >> Andrew >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From andrew.collette at gmail.com Tue Jan 8 17:30:33 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Tue, 8 Jan 2013 15:30:33 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? 
In-Reply-To: <50EC907A.8040003@astro.uio.no> References: <50EC907A.8040003@astro.uio.no> Message-ID: Hi Dag, > So you are saying that, for an array x, you want > > x + random.randint(100000) > > to produce an array with a random dtype? Under the proposed behavior, depending on the dtype of x and the value from random, this would sometimes add-with-rollover and sometimes raise ValueError. Under the 1.5 behavior, it would always add-with-rollover and preserve the type of x. Under the 1.6 behavior, it produces a range of dtypes, each of which is at least large enough to hold the random int. Personally, I prefer the third option, but I strongly prefer either the second or the third to the first. Andrew From josef.pktd at gmail.com Tue Jan 8 17:35:12 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 8 Jan 2013 17:35:12 -0500 Subject: [Numpy-discussion] Linear least squares In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 1:17 PM, Till Stensitz wrote: > Hi, > i did some profiling and testing of my data-fitting code. > One of its core parts is doing some linear least squares, > until now i used np.linalg.lstsq. Most of time the size > a is (250, 7) and of b is (250, 800). My guess is that this depends a lot on the shape try a is (10000, 7) and b is (10000, 1) Josef > > Today i compared it to using pinv manually, > to my surprise, it is much faster. I taught, > both are svd based? Too check another computer > i also run my test on wakari: > > https://www.wakari.io/nb/tillsten/linear_least_squares > > Also using scipy.linalg instead of np.linalg is > slower for both cases. My numpy and scipy > are both from C. Gohlkes website. If my result > is valid in general, maybe the lstsq function > should be changed or a hint should be added > to the documentation. > > greetings > Till > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From shish at keba.be Tue Jan 8 17:41:34 2013 From: shish at keba.be (Olivier Delalleau) Date: Tue, 8 Jan 2013 17:41:34 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EC907A.8040003@astro.uio.no> Message-ID: Le mardi 8 janvier 2013, Andrew Collette a ?crit : > Hi Dag, > > > So you are saying that, for an array x, you want > > > > x + random.randint(100000) > > > > to produce an array with a random dtype? > > Under the proposed behavior, depending on the dtype of x and the value > from random, this would sometimes add-with-rollover and sometimes > raise ValueError. > > Under the 1.5 behavior, it would always add-with-rollover and preserve > the type of x. > > Under the 1.6 behavior, it produces a range of dtypes, each of which > is at least large enough to hold the random int. > > Personally, I prefer the third option, but I strongly prefer either > the second or the third to the first. > > Andrew > Keep in mind that in the third option (current 1.6 behavior) the dtype is large enough to hold the random number, but not necessarily to hold the result. So for instance if x is an int16 array with only positive values, the result of this addition may contain negative values (or not, depending on the number being drawn). That's the part I feel is flawed with this behavior, it is quite unpredictable. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From andrew.collette at gmail.com Tue Jan 8 17:51:28 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Tue, 8 Jan 2013 15:51:28 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EC907A.8040003@astro.uio.no> Message-ID: Hi, > Keep in mind that in the third option (current 1.6 behavior) the dtype is > large enough to hold the random number, but not necessarily to hold the > result. So for instance if x is an int16 array with only positive values, > the result of this addition may contain negative values (or not, depending > on the number being drawn). That's the part I feel is flawed with this > behavior, it is quite unpredictable. Yes, certainly. But in either the proposed or 1.5 behavior, if the values in x are close to the limits of the type, this can happen also. Andrew From shish at keba.be Tue Jan 8 18:36:03 2013 From: shish at keba.be (Olivier Delalleau) Date: Tue, 8 Jan 2013 18:36:03 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EC907A.8040003@astro.uio.no> Message-ID: Le mardi 8 janvier 2013, Andrew Collette a ?crit : > Hi, > > > Keep in mind that in the third option (current 1.6 behavior) the dtype is > > large enough to hold the random number, but not necessarily to hold the > > result. So for instance if x is an int16 array with only positive values, > > the result of this addition may contain negative values (or not, > depending > > on the number being drawn). That's the part I feel is flawed with this > > behavior, it is quite unpredictable. > > Yes, certainly. But in either the proposed or 1.5 behavior, if the > values in x are close to the limits of the type, this can happen also. > My previous email may not have been clear enough, so to be sure: in my above example, if the random number is 30000, then the result may contain negative values (int16). If the random number is 50000, then the result will only contain positive values (upcast to int32). Do you believe it is a good behavior? -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jan 8 18:47:06 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 8 Jan 2013 23:47:06 +0000 Subject: [Numpy-discussion] Linear least squares In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 6:17 PM, Till Stensitz wrote: > Hi, > i did some profiling and testing of my data-fitting code. > One of its core parts is doing some linear least squares, > until now i used np.linalg.lstsq. Most of time the size > a is (250, 7) and of b is (250, 800). > > Today i compared it to using pinv manually, > to my surprise, it is much faster. I taught, > both are svd based? np.linalg.lstsq is written in Python (calling LAPACK for the SVD), so you could run the line_profiler over it and see where the slowdown is. An obvious thing is that it always computes residuals, which could be costly; if your pinv code isn't doing that then it's not really comparable. (Though might still be well-suited for your actual problem.) Depending on how well-conditioned your problems are, and how much speed you need, there are faster ways than pinv as well. (Going via qr might or might not, going via cholesky almost certainly will be.) 
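For reference, a rough sketch of the three routes on a problem of that shape
-- meant only as a benchmarking aid, and the normal-equations/Cholesky route
assumes the design matrix is well conditioned, since forming a.T*a squares
the condition number:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

a = np.random.rand(250, 7)
b = np.random.rand(250, 800)

# 1) current approach (also returns summed residuals, rank, singular values)
x1 = np.linalg.lstsq(a, b)[0]

# 2) pseudo-inverse: one SVD of a, no residual computation
x2 = np.dot(np.linalg.pinv(a), b)

# 3) normal equations + Cholesky
c_and_lower = cho_factor(np.dot(a.T, a))
x3 = cho_solve(c_and_lower, np.dot(a.T, b))

# the per-point residuals needed for optimize.leastsq are the same cheap
# extra step whichever route is used
resid = b - np.dot(a, x3)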
-n From andrew.collette at gmail.com Tue Jan 8 19:35:42 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Tue, 8 Jan 2013 17:35:42 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EC907A.8040003@astro.uio.no> Message-ID: Hi Olivier, >> Yes, certainly. But in either the proposed or 1.5 behavior, if the >> values in x are close to the limits of the type, this can happen also. > > > My previous email may not have been clear enough, so to be sure: in my above > example, if the random number is 30000, then the result may contain negative > values (int16). If the random number is 50000, then the result will only > contain positive values (upcast to int32). Do you believe it is a good > behavior? Under the proposed behavior, if the random number is 30000, then you *still* may have negative values, and if it's 50000, you get ValueError. No, I don't think the behavior you outlined is particularly nice, but (1) it's consistent with array addition elsewhere, at least in my mind, and (2) I don't think that sometimes getting a ValueError is a big improvement. Although I still prefer automatic upcasting, this discussion is really making me see the value of a nice, simple rule like in 1.5. :) Andrew From charlesr.harris at gmail.com Wed Jan 9 00:11:30 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 8 Jan 2013 22:11:30 -0700 Subject: [Numpy-discussion] Linear least squares In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 11:17 AM, Till Stensitz wrote: > Hi, > i did some profiling and testing of my data-fitting code. > One of its core parts is doing some linear least squares, > until now i used np.linalg.lstsq. Most of time the size > a is (250, 7) and of b is (250, 800). > > Today i compared it to using pinv manually, > to my surprise, it is much faster. I taught, > both are svd based? Too check another computer > i also run my test on wakari: > > https://www.wakari.io/nb/tillsten/linear_least_squares > > Also using scipy.linalg instead of np.linalg is > slower for both cases. My numpy and scipy > are both from C. Gohlkes website. If my result > is valid in general, maybe the lstsq function > should be changed or a hint should be added > to the documentation. > > Do you know if both are using Atlas (MKL)? Numpy will compile a default unoptimized version if there is no Atlas (or MKL). Also, lstsq is a direct call to an LAPACK least squares function, so the underlying functions themselves are probably different for lstsq and pinv. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From brenbarn at brenbarn.net Wed Jan 9 02:23:05 2013 From: brenbarn at brenbarn.net (OKB (not okblacke)) Date: Wed, 9 Jan 2013 07:23:05 +0000 (UTC) Subject: [Numpy-discussion] Bug with ufuncs made with frompyfunc Message-ID: A bug causing errors with using methods of ufuncs created with frompyfunc was mentioned on the list over a year ago: http://mail.scipy.org/pipermail/numpy-discussion/2011- September/058501.html Is there any word on the status of this bug? I wasn't able to find a ticket in the bug tracker. 
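For anyone who wants to check a particular numpy version, this is the kind of
usage in question -- a two-argument ufunc built with frompyfunc, followed by
calls to its methods, which is the part reported to fail (whether it errors
will depend on the version):

import numpy as np

def py_add(x, y):
    return x + y

# wrap the Python function as a ufunc taking 2 inputs, returning 1 output
uadd = np.frompyfunc(py_add, 2, 1)

# calling the ufunc itself is fine; the reported problem is with its methods
print(uadd(np.arange(4), 10))
print(uadd.outer(np.arange(3), np.arange(3)))
print(uadd.reduce(np.arange(5)))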
From mail.till at gmx.de Wed Jan 9 03:29:20 2013 From: mail.till at gmx.de (Till Stensitz) Date: Wed, 9 Jan 2013 08:29:20 +0000 (UTC) Subject: [Numpy-discussion] Linear least squares References: Message-ID: Nathaniel Smith pobox.com> writes: > > An obvious thing is that it always computes residuals, which could be > costly; if your pinv code isn't doing that then it's not really > comparable. (Though might still be well-suited for your actual > problem.) > > Depending on how well-conditioned your problems are, and how much > speed you need, there are faster ways than pinv as well. (Going via qr > might or might not, going via cholesky almost certainly will be.) > > -n > You are right. With calculating the residuals, the speedup goes down to a factor of 2. I had to calculate the residuals anyways because lstsq only returns the squared sum of the residuals, while i need every residual (as an input to optimize.leastsq). Josef is also right, it is shape depended. For his example, lstsq is faster. Maybe it is possible to make lstsq to choose its method automatically? Or some keyword to set the method and making other decompositions available. From mike.r.anderson.13 at gmail.com Wed Jan 9 05:35:29 2013 From: mike.r.anderson.13 at gmail.com (Mike Anderson) Date: Wed, 9 Jan 2013 18:35:29 +0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: Message-ID: On 8 January 2013 02:08, Chris Barker - NOAA Federal wrote: > On Thu, Jan 3, 2013 at 10:29 PM, Mike Anderson > wrote: > > In the Clojure community there has been some discussion about creating a > > common matrix maths library / API. Currently there are a few different > > fledgeling matrix libraries in Clojure, so it seemed like a worthwhile > > effort to unify them and have a common base on which to build on. > > > > NumPy has been something of an inspiration for this, so I though I'd ask > > here to see what lessons have been learned. > > A few thoughts: > > > We're thinking of a matrix library > > First -- is this a "matrix" library, or a general use nd-array > library? That will drive your design a great deal. For my part, I came > from MATLAB, which started our very focused on matrixes, then extended > to be more generally useful. Personally, I found the matrix-focus to > get in the way more than help -- in any "real" code, you're the actual > matrix operations are likely to be a tiny fraction of the code. > > One reason I like numpy is that it is array-first, with secondary > support for matrix stuff. > > That being said, there is the numpy matrix type, and there are those > that find it very useful. particularly in teaching situations, though > it feels a bit "tacked-on", and that does get in the way, so if you > want a "real" matrix object, but also a general purpose array lib, > thinking about both up front will be helpful. > This is very useful context - thanks! I've had opinions in favour of both an nd-array style library and a matrix library. I guess it depends on your use case which one you are more inclined to think in. I'm hoping that it should be possible for the same API to support both, i.e. you should be able to use a 2D array of numbers as a matrix, and vice-versa. > > > - Support for multi-dimensional matrices (but with fast paths for 1D > vectors > > and 2D matrices as the common cases) > > what is a multi-dimensional matrix? -- is a 3-d something, a stack of > matrixes? or something else? 
(note, numpy lacks this kind of object, > but it is sometimes asked for -- i.e a way to do fast matrix > multiplication with a lot of small matrixes) > > I think fast paths for 1-D and 2-D is secondary, though you may want > "easy paths" for those. IN particular, if you want good support for > linear algebra (matrixes), then having a clean and natural "row vector > and "column vector" would be nice. See the archives of this list for > a bunch of discussion about that -- and what the weaknesses are of the > numpy matrix object. > > > - Immutability by default, i.e. matrix operations are pure functions that > > create new matrices. > > I'd be careful about this -- the purity and predictability is nice, > but these days a lot of time is spend allocating and moving memory > around -- numpy array's mutability is a major key feature -- indeed, > the key issues with performance with numpy surrond the fact that many > copies may be made unnecessarily (note, Dag's suggesting of lazy > evaluation may mitigate this to some extent). > Interesting and very useful to know. Sounds like we should definitely allow for mutable arrays / zero-copy operations in that case if that is proving to be a big bottleneck. > > > - Support for 64-bit double precision floats only (this is the standard > > float type in Clojure) > > not a bad start, but another major strength of numpy is the multiple > data types - you may wantt to design that concept in from the start. > Sounds like good advice and that should be possible to accomodate in the design. But I'm curious: what is the main use case for the alternative data types in NumPy? Is it for columns of data of heterogeneous types? or something else? > > > - Ability to support multiple different back-end matrix implementations > > (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.) > > This ties in to another major strength of numpy -- ndarrays are both > powerful python objects, and wrappers around standard C arrays -- that > makes it pretty darn easy to interface with external libraries for > core computation. Great - good to know we are on the right track with this one. Thanks Chris for all your comments / suggestions! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike.r.anderson.13 at gmail.com Wed Jan 9 05:49:06 2013 From: mike.r.anderson.13 at gmail.com (Mike Anderson) Date: Wed, 9 Jan 2013 18:49:06 +0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: <50E68C2A.9060400@astro.uio.no> References: <50E68C2A.9060400@astro.uio.no> Message-ID: On 4 January 2013 16:00, Dag Sverre Seljebotn wrote: > On 01/04/2013 07:29 AM, Mike Anderson wrote: > > Hello all, > > > > In the Clojure community there has been some discussion about creating a > > common matrix maths library / API. Currently there are a few different > > fledgeling matrix libraries in Clojure, so it seemed like a worthwhile > > effort to unify them and have a common base on which to build on. > > > > NumPy has been something of an inspiration for this, so I though I'd ask > > here to see what lessons have been learned. > > > > We're thinking of a matrix library with roughly the following design > > (subject to change!) 
> > - Support for multi-dimensional matrices (but with fast paths for 1D > > vectors and 2D matrices as the common cases) > > Food for thought: Myself I have vectors that are naturally stored in 2D, > "matrices" that can be naturally stored in 4D and so on (you can't view > them that way when doing linear algebra, it's just that the indices can > have multiple components) -- I like that NumPy calls everything "array"; > I think vector and matrix are higher-level mathematical concepts. > Very interesting. Can I ask what the application is? And is it equivalent from a mathematical perspective to flattening the 2D vectors into very long 1D vectors? > > > - Immutability by default, i.e. matrix operations are pure functions > > that create new matrices. There could be a "backdoor" option to mutate > > matrices, but that would be unidiomatic in Clojure > > Sounds very promising (assuming you can reuse the buffer if the input > matrix had no other references and is not used again?). It's very common > for NumPy arrays to fill a large chunk of the available memory (think > 20-100 GB), so for those users this would need to be coupled with buffer > reuse and good diagnostics that help remove references to old > generations of a matrix. > Yes it should be possible to re-use buffers, though to some extent that would depend on the underlying matrix library implementation. The JVM makes things a bit interesting here - the GC is extremely good but it doesn't play particularly nicely with non-Java native code. 20-100GB is pretty ambitious and I guess reflects the maturity of NumPy - I'd be happy with good handling of 100MB matrices right now..... > > > - Support for 64-bit double precision floats only (this is the standard > > float type in Clojure) > > - Ability to support multiple different back-end matrix implementations > > (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.) > > - A full range of matrix operations. Operations would be delegated to > > back end implementations where they are supported, otherwise generic > > implementations could be used. > > > > Any thoughts on this topic based on the NumPy experience? In particular > > would be very interesting to know: > > - Features in NumPy which proved to be redundant / not worth the effort > > - Features that you wish had been designed in at the start > > - Design decisions that turned out to be a particularly big mistake / > > success > > > > Would love to hear your insights, any ideas+advice greatly appreciated! > > Travis Oliphant noted some of his thoughts on this in the recent thread > "DARPA funding for Blaze and passing the NumPy torch" which is a must-read. > Great link. Thanks for this and all your other comments! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike.r.anderson.13 at gmail.com Wed Jan 9 05:57:27 2013 From: mike.r.anderson.13 at gmail.com (Mike Anderson) Date: Wed, 9 Jan 2013 18:57:27 +0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: <50E68F12.90804@astro.uio.no> References: <50E68C2A.9060400@astro.uio.no> <50E68F12.90804@astro.uio.no> Message-ID: On 4 January 2013 16:13, Dag Sverre Seljebotn wrote: > On 01/04/2013 09:00 AM, Dag Sverre Seljebotn wrote: > > On 01/04/2013 07:29 AM, Mike Anderson wrote: > > > Oh: Depending on your amibitions, it's worth thinking hard about i) > storage format, and ii) lazy evaluation. 
> > Storage format: The new trend is for more flexible formats than just > column-major/row-major, e.g., storing cache-sized n-dimensional tiles. > I'm hoping the API will be independent of storage format - i.e. the underlying implementations can store the data any way they like. So the API will be written in terms of abstractions, and the user will have the choice of whatever concrete implementation best fits the specific needs. Sparse matrices, tiled matrices etc. should all be possible options. Has this kind of approach been used much with NumPy? > > Lazy evaluation: The big problem with numpy is that "a + b + np.sqrt(c)" > will first make a temporary result for "a + b", rather than doing the > whole expression on the fly, which is *very* bad for performance. > > So if you want immutability, I urge you to consider every operation to > build up an expression tree/"program", and then either find out the > smart points where you interpret that program automatically, or make > explicit eval() of an expression tree the default mode. > Very interesting. Seems like this could be layered on top though? i.e. have a separate DSL for building up the expression tree, then compile this down to the optimal set of underlying operations? > > Of course this depends all on how ambitious you are. > A little ambitious, though mostly I'll be glad to get something working that people find useful :-) Thanks again for your comments Dag! -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Wed Jan 9 07:30:24 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 09 Jan 2013 13:30:24 +0100 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: <50E68C2A.9060400@astro.uio.no> Message-ID: <50ED62E0.1010502@astro.uio.no> On 01/09/2013 11:49 AM, Mike Anderson wrote: > On 4 January 2013 16:00, Dag Sverre Seljebotn > > wrote: > > On 01/04/2013 07:29 AM, Mike Anderson wrote: > > Hello all, > > > > In the Clojure community there has been some discussion about > creating a > > common matrix maths library / API. Currently there are a few > different > > fledgeling matrix libraries in Clojure, so it seemed like a > worthwhile > > effort to unify them and have a common base on which to build on. > > > > NumPy has been something of an inspiration for this, so I though > I'd ask > > here to see what lessons have been learned. > > > > We're thinking of a matrix library with roughly the following design > > (subject to change!) > > - Support for multi-dimensional matrices (but with fast paths for 1D > > vectors and 2D matrices as the common cases) > > Food for thought: Myself I have vectors that are naturally stored in 2D, > "matrices" that can be naturally stored in 4D and so on (you can't view > them that way when doing linear algebra, it's just that the indices can > have multiple components) -- I like that NumPy calls everything "array"; > I think vector and matrix are higher-level mathematical concepts. > > > Very interesting. Can I ask what the application is? And is it > equivalent from a mathematical perspective to flattening the 2D vectors > into very long 1D vectors? For instance, if you are solving an equation for one value per grid point on a 2D or 3D grid. 
In PDE problems this occurs all the time, though normally the flattening is treated explicitly before one gets to solving the equation, and when not a reshape operation like you say is usually OK (but the very concept for flattening/reshaping is something that's inherent to arrays, not matrices). Chris also mentioned the case where you have lots of small matrices (say, A[i,j,k] is element (i,j) in matrix k), and you want to multiply all matrices by the same vector, or all matrices by different vectors, and so on. > > - Immutability by default, i.e. matrix operations are pure functions > > that create new matrices. There could be a "backdoor" option to > mutate > > matrices, but that would be unidiomatic in Clojure > > Sounds very promising (assuming you can reuse the buffer if the input > matrix had no other references and is not used again?). It's very common > for NumPy arrays to fill a large chunk of the available memory (think > 20-100 GB), so for those users this would need to be coupled with buffer > reuse and good diagnostics that help remove references to old > generations of a matrix. > > > Yes it should be possible to re-use buffers, though to some extent that > would depend on the underlying matrix library implementation. The JVM > makes things a bit interesting here - the GC is extremely good but it > doesn't play particularly nicely with non-Java native code. My hunch is that you rely on the GC I think you'll get nowhere (though if you're happy to treat 100 MB matrices then that may not matter so much). > 20-100GB is pretty ambitious and I guess reflects the maturity of NumPy > - I'd be happy with good handling of 100MB matrices right now..... Still, if you copy 100 MB every time you assign to a single element, performance won't be stellar to say the least. I don't know Clojure but I'm thinking that an immutable design would be something like b = a but with 1.0 in position (0, 3) c = b + (3.2 in position (3, 4) however you want to express that syntax-wise. Pasting in your other post: On 01/09/2013 11:57 AM, Mike Anderson wrote:> On 4 January 2013 16:13, > I'm hoping the API will be independent of storage format - i.e. the > underlying implementations can store the data any way they like. So the > API will be written in terms of abstractions, and the user will have the > choice of whatever concrete implementation best fits the specific needs. > Sparse matrices, tiled matrices etc. should all be possible options. > > Has this kind of approach been used much with NumPy? No, NumPy only supports strided arrays. SciPy has sparse matrices using a different API (which is a pain point). > Lazy evaluation: The big problem with numpy is that "a + b + np.sqrt(c)" > will first make a temporary result for "a + b", rather than doing the > whole expression on the fly, which is *very* bad for performance. > > So if you want immutability, I urge you to consider every operation to > build up an expression tree/"program", and then either find out the > smart points where you interpret that program automatically, or make > explicit eval() of an expression tree the default mode. > > > Very interesting. Seems like this could be layered on top though? i.e. > have a separate DSL for building up the expression tree, then compile > this down to the optimal set of underlying operations? That's what Theano/Numexpr does on NumPy. But it does mean that users have to deal with 2-3 different APIs and ways of doing things rather than one. 
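As a concrete picture of that "second API", this is roughly what the numexpr
route looks like -- just a sketch, assuming numexpr is installed:

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = np.random.rand(1000000)

# plain numpy: builds a full-size temporary for a + b, then another one
# when np.sqrt(c) is added
r1 = a + b + np.sqrt(c)

# numexpr: the whole expression is handed over as a string and evaluated
# in one blocked pass over the inputs, without full-size temporaries
r2 = ne.evaluate("a + b + sqrt(c)")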
If you start out with only the lazy API then users only have to use 1 API everywhere, instead of 2-3. If you want immutability, I don't think you can get around laziness, because it will allow you to have a "journal" rather than copying the 100 MB array all the time. I.e., my example would become b = a but with 1.0 in position (0, 3) c = b + (3.2 in position (3, 4) print c[0,0] # looks up memory in a print c[3,4] # hits a "journal" of dirty values that's not yet committed # to linear memory d = eval(c) # copy the 100MB Combined with a tiled storage scheme so that the last step only needs to copy a few dirty blocks, immutable arrays may be within reach. > Of course this depends all on how ambitious you are. > > > A little ambitious, though mostly I'll be glad to get something working > that people find useful :-) I'd advise you to either go for something really simple and clean (which would almost certainly involve directly mutable arrays), or something very powerful (probably with only the abstraction of immutable arrays, with multiple back-end strategies for how to deal with that, but certainly buffering up single-element-updates). The latter is more of a full research project. Having both immutable and mutable matrices in the same API doesn't sound ideal to me at least. Dag Sverre From davidmenhur at gmail.com Wed Jan 9 07:59:57 2013 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Wed, 9 Jan 2013 13:59:57 +0100 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: Message-ID: On Jan 9, 2013 11:35 AM, "Mike Anderson" wrote: > But I'm curious: what is the main use case for the alternative data types in NumPy? Is it for columns of data of heterogeneous types? or something else? In my case, I have used 32 bit (or lower) arrays due to memory limitations and some significant speedups in certain situations. This was particularly useful when I was preprocessing numerous arrays to especially Boolean data, saved a lot of hd space and I/O. I have used 128 bits when precision was critical, as I was dealing with very small differences. It is also nice to be able to repeat your computation with different precision in order to spot possible numerical instabilities, even if the performance is not great.l David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jan 9 09:29:25 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Jan 2013 14:29:25 +0000 Subject: [Numpy-discussion] Bug with ufuncs made with frompyfunc In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 7:23 AM, OKB (not okblacke) wrote: > A bug causing errors with using methods of ufuncs created with > frompyfunc was mentioned on the list over a year ago: > http://mail.scipy.org/pipermail/numpy-discussion/2011- > September/058501.html > > Is there any word on the status of this bug? I wasn't able to find > a ticket in the bug tracker. That thread says that it had already been fixed in the development version of numpy, so it should be fixed in the upcoming 1.7. If you want to be sure then you try it on the 1.7 release candidate. -n From heng at cantab.net Wed Jan 9 09:30:22 2013 From: heng at cantab.net (Henry Gomersall) Date: Wed, 09 Jan 2013 14:30:22 +0000 Subject: [Numpy-discussion] natural alignment Message-ID: <1357741822.3475.23.camel@farnsworth> Further to my previous emails about getting SIMD aligned arrays, I've noticed that numpy arrays aren't always naturally aligned either. 
For example, numpy.float96 arrays are not always aligned on 12-byte boundaries under 32-bit linux/gcc. Indeed, .alignment on the array always seems to return 4 (with 64-bit, .alignment returns 4, 8, and 16 for float32, float64 and longdouble respectively). Can I assume _anything_ in general about the alignment of a numpy array? (I mean, based on what all implementations of the underlying malloc etc will return). Should I rely on what is returned from .alignment? cheers, Henry From alan.isaac at gmail.com Wed Jan 9 09:53:14 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 09 Jan 2013 09:53:14 -0500 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: <50ED62E0.1010502@astro.uio.no> References: <50E68C2A.9060400@astro.uio.no> <50ED62E0.1010502@astro.uio.no> Message-ID: <50ED845A.7080402@gmail.com> I'm just a Python+NumPy user and not a CS type. May I ask a naive question on this thread? Given the work that has (as I understand it) gone into making NumPy usable as a C library, why is the discussion not going in a direction like the following: What changes to the NumPy code base would be required for it to provide useful ndarray functionality in a C extension to Clojure? Is this simply incompatible with the goal that Clojure compile to JVM byte code? Thanks, Alan Isaac From njs at pobox.com Wed Jan 9 09:58:24 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Jan 2013 14:58:24 +0000 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: <50ED845A.7080402@gmail.com> References: <50E68C2A.9060400@astro.uio.no> <50ED62E0.1010502@astro.uio.no> <50ED845A.7080402@gmail.com> Message-ID: On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac wrote: > I'm just a Python+NumPy user and not a CS type. > May I ask a naive question on this thread? > > Given the work that has (as I understand it) gone into > making NumPy usable as a C library, why is the discussion not > going in a direction like the following: > What changes to the NumPy code base would be required for it > to provide useful ndarray functionality in a C extension > to Clojure? Is this simply incompatible with the goal that > Clojure compile to JVM byte code? IIUC that work was done on a fork of numpy which has since been abandoned by its authors, so... yeah, numpy itself doesn't have much to offer in this area right now. It could in principle with a bunch of refactoring (ideally not on a fork, since we saw how well that went), but I don't think most happy current numpy users are wishing they could switch to writing Lisp on the JVM or vice-versa, so I don't think it's surprising that no-one's jumped up to do this work. -n From njs at pobox.com Wed Jan 9 10:09:21 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Jan 2013 15:09:21 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Tue, Jan 8, 2013 at 9:14 PM, Andrew Collette wrote: > Hi Nathaniel, > > (Responding to both your emails) > >> The problem is that rule for arrays - and for every other party of >> numpy in general - are that we *don't* pick types based on values. >> Numpy always uses input types to determine output types, not input >> values. > > Yes, of course... array operations are governed exclusively by their > dtypes. 
It seems to me that, using the language of the bug report > (2878), if we have this: > > result = arr + scalar > > I would argue that our job is, rather than to pick result.dtype, to > pick scalar.dtype, and apply the normal rules for array operations. Okay, but we already have unambiguous rules for picking scalar.dtype: you use whatever width the underlying type has, so it'd always be np.int_ or np.float64. Those are the normal rules for picking dtypes. I'm just trying to make clear that what you're arguing for is also a very special case, which also violates the rules numpy uses everywhere else. That doesn't mean we should rule it out ("Special cases aren't special enough to break the rules. / Although practicality beats purity."), but claiming that it is just "the normal rules" while everything else is a "special case" is rhetorically unhelpful. >> So it's pretty unambiguous that >> "using the same rules for arrays and scalars" would mean, ignore the >> value of the scalar, and in expressions like >> np.array([1], dtype=np.int8) + 1 >> we should always upcast to int32/int64. > > Ah, but that's my point: we already, in 1.6, ignore the intrinsic > width of the scalar and effectively substitute one based on it's > value: > >>>> a = np.array([1], dtype=int8) >>>> (a + 1).dtype > dtype('int8') >>>> (a + 1000).dtype > dtype('int16') >>>> (a + 90000).dtype > dtype('int32') >>>> (a + 2**40).dtype > dtype('int64') Sure. But the only reason this is in 1.6 is that the person who made the change never mentioned it to anyone else, so it wasn't noticed until after 1.6 came out. If it had gone through proper review/mailing list discussion (like we're doing now) then it's very unlikely it would have gone in in its present form. >> 1.6, your proposal: in a binary operation, if one operand has ndim==0 >> and the other has ndim>0, downcast the ndim==0 item to the smallest >> width that is consistent with its value and the other operand's type. > > Yes, exactly. I'm not trying to propose a completely new behavior: as > I mentioned (although very far upthread), this is the mental model I > had of how things worked in 1.6 already. > >> New users don't use narrow-width dtypes... it's important to remember >> in this discussion that in numpy, non-standard dtypes only arise when >> users explicitly request them, so there's some expressed intention >> there that we want to try and respect. > > I would respectfully disagree. One example I cited was that when > dealing with HDF5, it's very common to get int16's (and even int8's) > when reading from a file because they are used to save disk space. > All a new user has to do to get int8's from a file they got from > someone else is: > >>>> data = some_hdf5_file['MyDataset'][...] > > This is a general issue applying to data which is read from real-world > external sources. For example, digitizers routinely represent their > samples as int8's or int16's, and you apply a scale and offset to get > a reading in volts. This particular case is actually handled fine by 1.5, because int array + float scalar *does* upcast to float. It's width that's ignored (int8 versus int32), not the basic "kind" of data (int versus float). But overall this does sound like a problem -- but it's not a problem with the scalar/array rules, it's a problem with working with narrow width data in general. 
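A tiny made-up illustration of that general problem -- digitizer-style counts
stored as int16, scaled by a small integer:

import numpy as np

# counts stored as int16 to save space; the scale easily fits in an int8
counts = np.array([30000, 31000, 32000], dtype=np.int16)
scale = 100

# the *result* does not fit in int16, so arithmetic done at the storage
# width wraps around no matter how the scalar operand is typed...
print(counts * scale)

# ...whereas widening the data first gives the intended values
print(counts.astype(np.int64) * scale)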
There's a good argument to be made that data files should be stored in compressed form, but read in in full-width form, exactly to avoid the problems that arise when trying to manipulate narrow-width representations. Suppose your scale and offset *were* integers, so that the "kind" casting rules didn't get invoked. Even if this were the case, then the rules you're arguing for would not actually solve your problem at all. It'd be very easy to have, say, scale=100, offset=100, both of which fit fine in an int8... but actually performing the scaling/offseting in an int8 would still be a terrible idea! The problem you're talking about is picking the correct width for an *operation*, and futzing about with picking the dtypes of *one input* to that operation is not going to help; it's like trying to ensure your house won't fall down by making sure the doors are really sturdy. -n From alan.isaac at gmail.com Wed Jan 9 10:09:46 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 09 Jan 2013 10:09:46 -0500 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: <50E68C2A.9060400@astro.uio.no> <50ED62E0.1010502@astro.uio.no> <50ED845A.7080402@gmail.com> Message-ID: <50ED883A.6060605@gmail.com> On 1/9/2013 9:58 AM, Nathaniel Smith wrote: > I don't think most happy current numpy users are wishing they > could switch to writing Lisp on the JVM or vice-versa, so I don't > think it's surprising that no-one's jumped up to do this work. Sure. I'm trying to look at this more from the Clojure end. Is it really better to start from scratch than to attempt a contribution to NumPy that would make it useful to Clojure. Given the amount of work that has gone into making NumPy what it is, it seems a huge project for the Clojure people to hope to produce anything comparable starting from scratch. Thanks, Alan From ben.root at ou.edu Wed Jan 9 10:41:34 2013 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 9 Jan 2013 10:41:34 -0500 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: <50E68C2A.9060400@astro.uio.no> <50ED62E0.1010502@astro.uio.no> <50ED845A.7080402@gmail.com> Message-ID: On Wed, Jan 9, 2013 at 9:58 AM, Nathaniel Smith wrote: > On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac wrote: > > I'm just a Python+NumPy user and not a CS type. > > May I ask a naive question on this thread? > > > > Given the work that has (as I understand it) gone into > > making NumPy usable as a C library, why is the discussion not > > going in a direction like the following: > > What changes to the NumPy code base would be required for it > > to provide useful ndarray functionality in a C extension > > to Clojure? Is this simply incompatible with the goal that > > Clojure compile to JVM byte code? > > IIUC that work was done on a fork of numpy which has since been > abandoned by its authors, so... yeah, numpy itself doesn't have much > to offer in this area right now. It could in principle with a bunch of > refactoring (ideally not on a fork, since we saw how well that went), > but I don't think most happy current numpy users are wishing they > could switch to writing Lisp on the JVM or vice-versa, so I don't > think it's surprising that no-one's jumped up to do this work. > > If I could just point out that the attempt to fork numpy for the .NET work was done back in the subversion days, and there was little-to-no effort to incrementally merge back changes to master, and vice-versa. 
With git as our repository now, such work may be more feasible. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jan 9 11:42:30 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 Jan 2013 09:42:30 -0700 Subject: [Numpy-discussion] Linear least squares In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 1:29 AM, Till Stensitz wrote: > Nathaniel Smith pobox.com> writes: > > > > > > An obvious thing is that it always computes residuals, which could be > > costly; if your pinv code isn't doing that then it's not really > > comparable. (Though might still be well-suited for your actual > > problem.) > > > > Depending on how well-conditioned your problems are, and how much > > speed you need, there are faster ways than pinv as well. (Going via qr > > might or might not, going via cholesky almost certainly will be.) > > > > -n > > > > > You are right. With calculating the residuals, the speedup goes > down to a factor of 2. I had to calculate the residuals anyways because > lstsq only returns the squared sum of the residuals, while i need every > residual (as an input to optimize.leastsq). > > Same here. Unfortunately the residuals computed by the LAPACK function are in a different basis so aren't directly usable. I'd support adding a keyword to disable the usual computation of the sum of squares. Josef is also right, it is shape depended. For his example, lstsq is faster. > > Maybe it is possible to make lstsq to choose its method automatically? > Or some keyword to set the method and making other decompositions > available. > QR without column pivoting is a nice option for "safe" problems, but it doesn't provide a reliable indication of rank reduction. I also don't find pinv useful once the rank goes down, since it relies on Euclidean distance having relevance in parameter space and that is seldom a sound assumption, usually it is better to reformulate the problem or remove a column from the design matrix. So maybe an 'unsafe', or less suggestively, 'fast' keyword could also be an option. IIRC, this was discussed on the scipy mailing list a year or two ago. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Wed Jan 9 11:47:40 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 9 Jan 2013 11:47:40 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.7.0rc1 release In-Reply-To: References: Message-ID: Hi, Congratulation for the release and a big thanks for the hard work. I tested it with our software and all work fine. thanks! Fr?d?ric On Sun, Dec 30, 2012 at 7:17 PM, Sandro Tosi wrote: > Hi Ondrej & al, > > On Sat, Dec 29, 2012 at 1:02 AM, Ond?ej ?ert?k wrote: >> I'm pleased to announce the availability of the first release candidate of >> NumPy 1.7.0rc1. > > Congrats on this RC release! > > I've uploaded this version to Debian and updated some of the issues > related to it. There are also a couple of minor PR you might want to > consider for 1.7: 2872 and 2873. 
> > Cheers, > -- > Sandro Tosi (aka morph, morpheus, matrixhasu) > My website: http://matrixhasu.altervista.org/ > Me at Debian: http://wiki.debian.org/SandroTosi > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Wed Jan 9 12:22:19 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 9 Jan 2013 09:22:19 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: On Wed, Jan 9, 2013 at 7:09 AM, Nathaniel Smith wrote: >> This is a general issue applying to data which is read from real-world >> external sources. For example, digitizers routinely represent their >> samples as int8's or int16's, and you apply a scale and offset to get >> a reading in volts. > > This particular case is actually handled fine by 1.5, because int > array + float scalar *does* upcast to float. It's width that's ignored > (int8 versus int32), not the basic "kind" of data (int versus float). > > But overall this does sound like a problem -- but it's not a problem > with the scalar/array rules, it's a problem with working with narrow > width data in general. Exactly -- this is key. details asside, we essentially have a choice between an approach that makes it easy to preserver your values -- upcasting liberally, or making it easy to preserve your dtype -- requiring users to specifically upcast where needed. IIRC, our experience with earlier versions of numpy (and Numeric before that) is that all too often folks would choose a small dtype quite deliberately, then have it accidentally upcast for them -- this was determined to be not-so-good behavior. I think the HDF (and also netcdf...) case is a special case -- the small dtype+scaling has been chosen deliberately by whoever created the data file (to save space), but we would want it generally opaque to the consumer of the file -- to me, that means the issue should be adressed by the file reading tools, not numpy. If your HDF5 reader chooses the the resulting dtype explicitly, it doesn't matter what numpy's defaults are. If the user wants to work with the raw, unscaled arrays, then they should know what they are doing. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From jaakko.luttinen at aalto.fi Wed Jan 9 12:32:06 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 9 Jan 2013 19:32:06 +0200 Subject: [Numpy-discussion] numpydoc for python 3? Message-ID: <50EDA996.6090806@aalto.fi> Hi! I'm trying to use numpydoc (Sphinx extension) for my project written in Python 3.2. However, installing numpydoc gives errors shown at http://pastebin.com/MPED6v9G and although it says "Successfully installed numpydoc", trying to import numpydoc raises errors.. Could this be fixed or am I doing something wrong? Thanks! Jaakko From chris.barker at noaa.gov Wed Jan 9 12:38:46 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 9 Jan 2013 09:38:46 -0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 2:35 AM, Mike Anderson >> First -- is this a "matrix" library, or a general use nd-array >> library? 
That will drive your design a great deal. > This is very useful context - thanks! I've had opinions in favour of both an > nd-array style library and a matrix library. I guess it depends on your use > case which one you are more inclined to think in. > > I'm hoping that it should be possible for the same API to support both, i.e. > you should be able to use a 2D array of numbers as a matrix, and vice-versa. sure, but the API can/should be differnent -- in some sense, the numpy matrix object is really just syntactic sugar -- you can use a 2-d array as a matrix, but then you have to explicilty call linear algebra functions to get things like matrix multiplication, etc. and do some hand work to make sure you're got things the right shape -- i.e a column or row vector where called for. tacking on the matrix object helped this, but in practice, it gets tricky to prevent operations from accidentally returning a plan array from operations on a matrix. Also numpy's matrix concept does not include the concept of a row or column vector, just 1XN or NX1 matrixes -- which works OK, but then when you iterate through a vector, you get 1X1 matrixes, rather than scalars -- a bit odd. Anyway, it takes some though to have two clean APIs sharing one core object. >> not a bad start, but another major strength of numpy is the multiple >> data types - you may wantt to design that concept in from the start. > But I'm curious: what is the main use case for the alternative data types in > NumPy? Is it for columns of data of heterogeneous types? or something else? heterogeneous data types were added relatively recently in numpy, and are great mostly for interacting with other libraries (and some syntactic sugar uses...) that may store data in arrays of structures. But multiple homogenous data types are critical for saving memory, speeding operations, doing integer math when that's really called for, manipulating images, etc, etc..... > 20-100GB is pretty ambitious and I guess reflects the maturity of > NumPy - I'd be happy with good handling of 100MB matrices right > now..... 100MB is prety darn small these days -- if you're only interested in smallish problems, then you can probably forget about performance issues, and focus on a really nice API. But I"m not sure I'd bother with that -- once people start using it, they'll want to use it for big problems! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From andrew.collette at gmail.com Wed Jan 9 12:58:39 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Wed, 9 Jan 2013 10:58:39 -0700 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50E61E29.1020709@astro.uio.no> Message-ID: Hi Nathaniel, > Sure. But the only reason this is in 1.6 is that the person who made > the change never mentioned it to anyone else, so it wasn't noticed > until after 1.6 came out. If it had gone through proper review/mailing > list discussion (like we're doing now) then it's very unlikely it > would have gone in in its present form. This is also a good point; I didn't realize that was how it was handled. Ultimately, the people who have to make this decision are the people who actually do the work -- and that means the core numpy maintainers. We've had a great discussion and I certainly feel like my input has been respected. 
Although I still disagree with the change, I certainly see that it's not as simple as I first thought. At this point the discussion has gone on for about 70 emails so far and I think I've said all I can. Thanks again for being willing to engage with users like this... numpy is an unusual project in that regard. I imagine that once the change is released (scheduled for 2.0?) the broader community will also be happy to provide input. Andrew From d.s.seljebotn at astro.uio.no Wed Jan 9 13:04:23 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 09 Jan 2013 19:04:23 +0100 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: <50E68C2A.9060400@astro.uio.no> <50ED62E0.1010502@astro.uio.no> <50ED845A.7080402@gmail.com> Message-ID: <50EDB127.30602@astro.uio.no> On 01/09/2013 04:41 PM, Benjamin Root wrote: > > > On Wed, Jan 9, 2013 at 9:58 AM, Nathaniel Smith > wrote: > > On Wed, Jan 9, 2013 at 2:53 PM, Alan G Isaac > wrote: > > I'm just a Python+NumPy user and not a CS type. > > May I ask a naive question on this thread? > > > > Given the work that has (as I understand it) gone into > > making NumPy usable as a C library, why is the discussion not > > going in a direction like the following: > > What changes to the NumPy code base would be required for it > > to provide useful ndarray functionality in a C extension > > to Clojure? Is this simply incompatible with the goal that > > Clojure compile to JVM byte code? > > IIUC that work was done on a fork of numpy which has since been > abandoned by its authors, so... yeah, numpy itself doesn't have much > to offer in this area right now. It could in principle with a bunch of > refactoring (ideally not on a fork, since we saw how well that went), > but I don't think most happy current numpy users are wishing they > could switch to writing Lisp on the JVM or vice-versa, so I don't > think it's surprising that no-one's jumped up to do this work. > > > If I could just point out that the attempt to fork numpy for the .NET > work was done back in the subversion days, and there was little-to-no > effort to incrementally merge back changes to master, and vice-versa. > With git as our repository now, such work may be more feasible. This is a matter of personal software design taste I guess, so the following is very subjective. I don't think there's anything at all to gain from this. In 2013 (and presumably, the future), a static C or C++ library is IMO fundamentally incompatible with achieving optimal performance. Going through a major refactor simply to end up with something that's no faster and no more flexible than what NumPy is today seems sort of pointless to me. What one wants is to generate ufuncs etc. on the fly using LLVM that are tuned to the specific tiling pattern of a specific operation, not a static C or C++ library (even with C++ meta-programming, the combinatorial explosion kills you if you do it all at compile-time). Granted, one could probably write a C++ library that was more of a compiler, using LLVM to emit code. But that's starting all over so not really relevant to the question of a NumPy refactor. This is how I understand Continuum thinks too, with Numba as a back-end for Blaze. (And Travis also spoke about this in his "farewell address".) 
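For concreteness, this is the sort of thing meant by generating a ufunc on the fly rather than shipping it in a static library -- a minimal sketch using Numba's vectorize decorator (assuming a reasonably recent Numba install; the exact decorator API has moved around between releases):

import numpy as np
from numba import vectorize

@vectorize(['float64(float64, float64)'])
def scaled_sum(a, b):
    # compiled to native code through LLVM for the float64 signature above,
    # instead of being picked from a fixed menu of precompiled loops
    return 0.5 * a + b

x = np.linspace(0.0, 1.0, 1000000)
y = np.ones_like(x)
z = scaled_sum(x, y)   # behaves like an ordinary numpy ufunc, broadcasting included

A static C/C++ library has to ship (or template-instantiate) every loop it might ever need; the JIT route only pays for the combinations actually used and can, in principle, tune them for the machine and tiling pattern at hand.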
Finally, Mark Florisson sort of started this with the 'minivect' library last summer which could as a "ufunc" backend both for Cython and Numba (which for this purpose are different languages), however as I understand it focus is now more on developing Numba directly rather than minivect (which is understandable as that's quicker). Dag Sverre From d.s.seljebotn at astro.uio.no Wed Jan 9 13:07:16 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 09 Jan 2013 19:07:16 +0100 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: <50EDB1D4.5090909@astro.uio.no> On 01/09/2013 06:22 PM, Chris Barker - NOAA Federal wrote: > On Wed, Jan 9, 2013 at 7:09 AM, Nathaniel Smith wrote: >>> This is a general issue applying to data which is read from real-world >>> external sources. For example, digitizers routinely represent their >>> samples as int8's or int16's, and you apply a scale and offset to get >>> a reading in volts. >> >> This particular case is actually handled fine by 1.5, because int >> array + float scalar *does* upcast to float. It's width that's ignored >> (int8 versus int32), not the basic "kind" of data (int versus float). >> >> But overall this does sound like a problem -- but it's not a problem >> with the scalar/array rules, it's a problem with working with narrow >> width data in general. > > Exactly -- this is key. details asside, we essentially have a choice > between an approach that makes it easy to preserver your values -- > upcasting liberally, or making it easy to preserve your dtype -- > requiring users to specifically upcast where needed. > > IIRC, our experience with earlier versions of numpy (and Numeric > before that) is that all too often folks would choose a small dtype > quite deliberately, then have it accidentally upcast for them -- this > was determined to be not-so-good behavior. > > I think the HDF (and also netcdf...) case is a special case -- the > small dtype+scaling has been chosen deliberately by whoever created > the data file (to save space), but we would want it generally opaque > to the consumer of the file -- to me, that means the issue should be > adressed by the file reading tools, not numpy. If your HDF5 reader > chooses the the resulting dtype explicitly, it doesn't matter what > numpy's defaults are. If the user wants to work with the raw, unscaled > arrays, then they should know what they are doing. +1. I think h5py should consider: File("my.h5")['int8_dset'].dtype == int64 File("my.h5", preserve_dtype=True)['int8_dset'].dtype == int8 Dag Sverre From pierre.raybaut at gmail.com Wed Jan 9 16:05:23 2013 From: pierre.raybaut at gmail.com (Pierre Raybaut) Date: Wed, 9 Jan 2013 22:05:23 +0100 Subject: [Numpy-discussion] ANN: first previews of WinPython for Python 3 32/64bit Message-ID: Hi all, I'm pleased to announce that the first previews of WinPython for Python 3 32bit and 64bit are available (WinPython v3.3.0.0alpha1): http://code.google.com/p/winpython/ This first release based on Python 3 required to migrate the following libraries which were only available for Python 2: * formlayout 1.0.12 * guidata 1.6.0dev1 * guiqwt 2.3.0dev1 * Spyder 2.1.14dev Please note that these libraries are still development release. [Special thanks to Christoph Gohlke for patching and building a version of PyQwt compatible with Python 3.3] WinPython is a free open-source portable distribution of Python for Windows, designed for scientists. 
It is a full-featured (see http://code.google.com/p/winpython/wiki/PackageIndex) Python-based scientific environment: * Designed for scientists (thanks to the integrated libraries NumPy, SciPy, Matplotlib, guiqwt, etc.: * Regular *scientific users*: interactive data processing and visualization using Python with Spyder * *Advanced scientific users and software developers*: Python applications development with Spyder, version control with Mercurial and other development tools (like gettext) * *Portable*: preconfigured, it should run out of the box on any machine under Windows (without any installation requirements) and the folder containing WinPython can be moved to any location (local, network or removable drive) * *Flexible*: one can install (or should I write "use" as it's portable) as many WinPython versions as necessary (like isolated and self-consistent environments), even if those versions are running different versions of Python (2.7, 3.x in the near future) or different architectures (32bit or 64bit) on the same machine * *Customizable*: using the integrated package manager (wppm, as WinPython Package Manager), it's possible to install, uninstall or upgrade Python packages (see http://code.google.com/p/winpython/wiki/WPPM for more details on supported package formats). *WinPython is not an attempt to replace Python(x,y)*, this is just something different (see http://code.google.com/p/winpython/wiki/Roadmap): more flexible, easier to maintain, movable and less invasive for the OS, but certainly less user-friendly, with less packages/contents and without any integration to Windows explorer [*]. [*] Actually there is an optional integration into Windows explorer, providing the same features as the official Python installer regarding file associations and context menu entry (this option may be activated through the WinPython Control Panel), and adding shortcuts to Windows Start menu. Enjoy! -Pierre From chris.barker at noaa.gov Wed Jan 9 16:19:49 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 9 Jan 2013 13:19:49 -0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: <50E68C2A.9060400@astro.uio.no> <50E68F12.90804@astro.uio.no> Message-ID: On Wed, Jan 9, 2013 at 2:57 AM, Mike Anderson > I'm hoping the API will be independent of storage format - i.e. the > underlying implementations can store the data any way they like. So the API > will be written in terms of abstractions, and the user will have the choice > of whatever concrete implementation best fits the specific needs. Sparse > matrices, tiled matrices etc. should all be possible options. A note about that -- as I think if it, numpy arrays are two things: 1) a python object for working with numbers, in a wide variety of ways 2) a wrapper around a C-array (or data block) that can be used to provide an easyway for Python to interact with C (and Fortran, and...) libraries, etc. As it turns out a LOT of people use numpy for (2) -- what this means is that while you could change the underlying data representation, etc, and keep the same Python API -- such changes would break a lot of non-pure-python code that relies on that data representation. This is a big issue with the numpy-for-PyPy project -- they could write a numpy clone, but it would only be useful for the pure-python stuff. Even then, a number of folks do tricks with numpy arrays in python that rely on the underlying structure. 
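As a small sketch of what "wrapper around a C data block" means in practice (pure illustration; the names are made up):

import ctypes
import numpy as np

a = np.arange(5, dtype=np.float64)

# the ndarray is a thin Python wrapper around one contiguous C buffer;
# the raw address and a typed pointer to it are an attribute away
print(a.ctypes.data)                                     # integer address of element 0
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
print(ptr[3])                                            # 3.0, read straight from the buffer

ptr[0] = 99.0                                            # a C extension could do the same in place
print(a[0])                                              # 99.0 -- the Python-level view sees it

Which is why a reimplementation with a different storage layout can look API-compatible from Python and still break this kind of code.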
Not sure how all this would play out for Clojure, but it's something to keep in mind. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From jrocher at enthought.com Wed Jan 9 17:32:28 2013 From: jrocher at enthought.com (Jonathan Rocher) Date: Wed, 9 Jan 2013 16:32:28 -0600 Subject: [Numpy-discussion] [SCIPY2013] Feedback on mini-symposia themes Message-ID: Dear community members, We are working hard to organize the SciPy2013 conference (Scientific Computing with Python) , this June 24th-29th in Austin, TX. We would like to probe the community about the themes you would be interested in contributing to or participating in for the mini-symposia at SciPy2013. These mini-symposia are held to discuss scientific computing applied to a specific *scientific domain/industry* during a half afternoon after the general conference. Their goal is to promote industry specific libraries and tools, and gather people with similar interests for discussions. For example, the SciPy2012 edition successfully hosted 4 mini-symposia on Astronomy/Astrophysics, Bio-informatics, Meteorology, and Geophysics. Please join us and voice your opinion to shape the next SciPy conference at: http://www.surveygizmo.com/s3/1114631/SciPy-2013-Themes Thanks, The Scipy2013 organizers -- Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From chanley at gmail.com Wed Jan 9 18:21:44 2013 From: chanley at gmail.com (Christopher Hanley) Date: Wed, 9 Jan 2013 18:21:44 -0500 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: After poking around our code base and talking to a few folks I predict that we at STScI can remove our dependence on the numpy-numarray compatibility layer by the end of this calendar year. I'm unsure of what the timeline for numpy 1.8 is so I don't know if this schedule supports removal of the compatibility layer from 1.8 or not. Thanks, Chris On Sat, Jan 5, 2013 at 9:38 PM, Charles R Harris wrote: > Thoughts? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jan 9 18:38:56 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Jan 2013 23:38:56 +0000 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 11:21 PM, Christopher Hanley wrote: > After poking around our code base and talking to a few folks I predict that > we at STScI can remove our dependence on the numpy-numarray compatibility > layer by the end of this calendar year. I'm unsure of what the timeline for > numpy 1.8 is so I don't know if this schedule supports removal of the > compatibility layer from 1.8 or not. It'd be nice if 1.8 were out before that, but that doesn't really matter -- let us know when you get it sorted? Also, would it help if we added a big scary warning at import time to annoy your more recalcitrant developers with? 
:-) The basic issue is that none of us actually use this stuff, it has no tests, the rest of numpy is changing around it, and we have no idea if it works, so at some point it makes more sense for us to just stop shipping the compat layer and let anyone who still needs it maintain their own copy of the code. -n From charlesr.harris at gmail.com Wed Jan 9 18:41:52 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 9 Jan 2013 16:41:52 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 4:21 PM, Christopher Hanley wrote: > After poking around our code base and talking to a few folks I predict > that we at STScI can remove our dependence on the numpy-numarray > compatibility layer by the end of this calendar year. I'm unsure of what > the timeline for numpy 1.8 is so I don't know if this schedule supports > removal of the compatibility layer from 1.8 or not. > > Together with the previous post that puts the kibosh on removing either numeric or numarray support from 1.8, at least if we get 1.8 before the end of summer. It's good to know where folks stand with regard to those packages, we'll give it another shot next year. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Wed Jan 9 21:55:39 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Wed, 9 Jan 2013 18:55:39 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 8:45 AM, Chris Barker - NOAA Federal wrote: > On Mon, Jan 7, 2013 at 10:23 PM, Ond?ej ?ert?k wrote: >>> http://www.commandlinefu.com/commands/view/2031/install-an-mpkg-from-the-command-line-on-osx >> >> This requires root access. Without sudo, I get: >> >> $ installer -pkg /Volumes/Python\ 2.7.3/Python.mpkg/ -target ondrej >> installer: This package requires authentication to install. >> >> and since I don't have root access, it doesn't work. >> >> So one way around it would be to install python from source, that >> shouldn't require root access. > > hmm -- this all may be a trick -- both the *.mpkg and the standard > build put everything in /Library/Frameworks/Python -- which is where > it belongs. Bu tif you need root access to write there, then there is > a problem. I'm sure a non-root build could put everything in the > users' home directory, then packages built against that would have > their paths messed up. Right. > > What's odd is that I'm pretty sure I've been able to point+click > install those without sudo...(I could recall incorrectly). > > This would be a good question for the pythonmac list -- low traffic, > but there are some very smart and helpful folks there: > > http://mail.python.org/mailman/listinfo/pythonmac-sig > > >>>> But I am not currently sure what to do with it. The Python.mpkg >>>> directory seems to contain the sources. > > It should be possible to unpack a mpkg by hand, but it contains both > the contents, and various instal scripts, so that seems like a really > ugly solution. Yep. In the meantime, the hard drive on Vincent's box failed, so he reinstalled the box completely. Also he explained to me a lot of Mac things over the phone, so I think I now understand what is going on with the dmg. 
As such, I have updated my instructions in my release helper repo: https://github.com/certik/numpy-vendor by the following paragraph: """ First prepare the Mac build box as follows: * Install Python 2.5, 2.6, 2.7 from python.org using the dmg disk image * Install setuptools and bdist_mpkg into all these Pythons * Install Paver into the default Python Tip: Add the /Library/Frameworks/Python.framework directory into git and commit after each installation of any package or Python. That way you can easily remove temporary installations. """ And you need sudo access to do those. If your user is an admin, then it can do it, otherwise it can't. So one can only use a Mac, which has the above setup installed. With that, my Fabfile can then do the rest. So I just built the following binaries: numpy-1.7.0rc1-py2.5-python.org-macosx10.3.dmg numpy-1.7.0rc1-py2.6-python.org-macosx10.3.dmg numpy-1.7.0rc1-py2.7-python.org-macosx10.3.dmg and uploaded to: https://sourceforge.net/projects/numpy/files/NumPy/1.7.0rc1/ So I think we are all set here. Ralf, would you be willing to build the final binary on 10.6? I don't think you have to do it for this rc1, but I am going to release rc2 now and for that it would be nice to have it. Ondrej From mike.r.anderson.13 at gmail.com Wed Jan 9 23:06:37 2013 From: mike.r.anderson.13 at gmail.com (Mike Anderson) Date: Thu, 10 Jan 2013 12:06:37 +0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: <50ED883A.6060605@gmail.com> References: <50E68C2A.9060400@astro.uio.no> <50ED62E0.1010502@astro.uio.no> <50ED845A.7080402@gmail.com> <50ED883A.6060605@gmail.com> Message-ID: On 9 January 2013 23:09, Alan G Isaac wrote: > On 1/9/2013 9:58 AM, Nathaniel Smith wrote: > > I don't think most happy current numpy users are wishing they > > could switch to writing Lisp on the JVM or vice-versa, so I don't > > think it's surprising that no-one's jumped up to do this work. > > > Sure. I'm trying to look at this more from the Clojure end. > Is it really better to start from scratch than to attempt > a contribution to NumPy that would make it useful to Clojure. > Given the amount of work that has gone into making NumPy > what it is, it seems a huge project for the Clojure people > to hope to produce anything comparable starting from scratch. > > Thanks, > Alan Currently I expect that the Clojure community will produce an abstraction / API for matrices / ndarrays that supports multiple implementations. It's fairly idiomatic in Clojure to work in abstractions, and the language offers good tools for making different concrete abstractions work with a common API, so it's less hard to make this work than it might sound. An interface to NumPy could certainly be one of the implementations of this API - I'm sure people would find this very useful given the maturity on NumPy and the need for integration in environments with heterogeneous systems. At the same time, there will be people in the Clojure world who will want to stay 100% on the JVM for certain projects. For them I don't see how NumPy could be used, unless it can be made to run well on Jython perhaps? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Thu Jan 10 02:21:01 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 10 Jan 2013 08:21:01 +0100 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Thu, Jan 10, 2013 at 3:55 AM, Ond?ej ?ert?k wrote: > On Tue, Jan 8, 2013 at 8:45 AM, Chris Barker - NOAA Federal > wrote: > > On Mon, Jan 7, 2013 at 10:23 PM, Ond?ej ?ert?k > wrote: > >>> > http://www.commandlinefu.com/commands/view/2031/install-an-mpkg-from-the-command-line-on-osx > >> > >> This requires root access. Without sudo, I get: > >> > >> $ installer -pkg /Volumes/Python\ 2.7.3/Python.mpkg/ -target ondrej > >> installer: This package requires authentication to install. > >> > >> and since I don't have root access, it doesn't work. > >> > >> So one way around it would be to install python from source, that > >> shouldn't require root access. > > > > hmm -- this all may be a trick -- both the *.mpkg and the standard > > build put everything in /Library/Frameworks/Python -- which is where > > it belongs. Bu tif you need root access to write there, then there is > > a problem. I'm sure a non-root build could put everything in the > > users' home directory, then packages built against that would have > > their paths messed up. > > Right. > > > > > What's odd is that I'm pretty sure I've been able to point+click > > install those without sudo...(I could recall incorrectly). > > > > This would be a good question for the pythonmac list -- low traffic, > > but there are some very smart and helpful folks there: > > > > http://mail.python.org/mailman/listinfo/pythonmac-sig > > > > > >>>> But I am not currently sure what to do with it. The Python.mpkg > >>>> directory seems to contain the sources. > > > > It should be possible to unpack a mpkg by hand, but it contains both > > the contents, and various instal scripts, so that seems like a really > > ugly solution. > > Yep. > > In the meantime, the hard drive on Vincent's box failed, so he > reinstalled the box completely. > Also he explained to me a lot of Mac things over the phone, so I think > I now understand what is going on with the dmg. > > As such, I have updated my instructions in my release helper repo: > > https://github.com/certik/numpy-vendor > > by the following paragraph: > > """ > First prepare the Mac build box as follows: > > * Install Python 2.5, 2.6, 2.7 from python.org using the dmg disk image > * Install setuptools and bdist_mpkg into all these Pythons > * Install Paver into the default Python > > Tip: Add the /Library/Frameworks/Python.framework directory into git > and commit after each installation of any package or Python. That way > you can easily remove temporary installations. > """ > > And you need sudo access to do those. If your user is an admin, then > it can do it, otherwise it can't. > So one can only use a Mac, which has the above setup installed. With > that, my Fabfile can then do the rest. > > So I just built the following binaries: > > numpy-1.7.0rc1-py2.5-python.org-macosx10.3.dmg > numpy-1.7.0rc1-py2.6-python.org-macosx10.3.dmg > numpy-1.7.0rc1-py2.7-python.org-macosx10.3.dmg > > and uploaded to: > > https://sourceforge.net/projects/numpy/files/NumPy/1.7.0rc1/ > > > So I think we are all set here. Ralf, would you be willing to build > the final binary on 10.6? I don't think you have to do it for this > rc1, but I am going to release rc2 now and for that it would be nice > to have it. > Sure, no problem. For the part that needs to be built on 10.6 that is. 
Vincent's box still has 10.5, right? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Thu Jan 10 04:40:52 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 10 Jan 2013 10:40:52 +0100 Subject: [Numpy-discussion] return index of maximum value in an array easily? Message-ID: Dear all, Are we going to consider returning the index of the maximum value in an array easily, without calling np.argmax and np.unravel_index consecutively? I saw a few posts in the mailing archive and on Stack Overflow about this when I tried to return the index of the maximum value of a 2d array. It seems that I am not the first to be confused by this.
> > > http://stackoverflow.com/questions/11377028/getting-index-of-numpy-ndarray > [1] > > http://old.nabble.com/maximum-value-and-corresponding-index-td24834930.html > [2] > > http://stackoverflow.com/questions/5469286/how-to-get-the-index-of-a-maximum-element-in-a-numpy-array > [3] > > > http://stackoverflow.com/questions/4150542/determine-index-of-highest-value-in-pythons-numpy > [4] > > cheers, > > Chao > -- > > > *********************************************************************************** > > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > > ************************************************************************************ > > Links: > ------ > [1] > http://stackoverflow.com/questions/11377028/getting-index-of-numpy-ndarray > [2] > http://old.nabble.com/maximum-value-and-corresponding-index-td24834930.html > [3] > > http://stackoverflow.com/questions/5469286/how-to-get-the-index-of-a-maximum-element-in-a-numpy-array > [4] > > http://stackoverflow.com/questions/4150542/determine-index-of-highest-value-in-pythons-numpy > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Hanno Klemm klemm at phys.ethz.ch From madsipsen at gmail.com Thu Jan 10 05:32:45 2013 From: madsipsen at gmail.com (Mads Ipsen) Date: Thu, 10 Jan 2013 11:32:45 +0100 Subject: [Numpy-discussion] int and long issues Message-ID: <50EE98CD.8090700@gmail.com> Hi, I find this to be a little strange: x = numpy.arange(10) isinstance(x[0],int) gives True y = numpy.where(x < 5)[0] isinstance(y[0],int) gives False isinstance(y[0],long) gives True Specs: Python 2.7.2, numpy-1.6.1, Win7, 64 bit Best regards, Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail.till at gmx.de Thu Jan 10 06:01:01 2013 From: mail.till at gmx.de (Till Stensitzki) Date: Thu, 10 Jan 2013 11:01:01 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?ANN=3A_first_previews_of_WinPython_f?= =?utf-8?q?or_Python_3=0932/64bit?= References: Message-ID: Pierre Raybaut gmail.com> writes: > > Hi all, > > I'm pleased to announce that the first previews of WinPython for > Python 3 32bit and 64bit are available (WinPython v3.3.0.0alpha1): > http://code.google.com/p/winpython/ > This first release based on Python 3 required to migrate the following > libraries which were only available for Python 2: > * formlayout 1.0.12 > * guidata 1.6.0dev1 > * guiqwt 2.3.0dev1 > * Spyder 2.1.14dev > Please note that these libraries are still development release. > [Special thanks to Christoph Gohlke for patching and building a > version of PyQwt compatible with Python 3.3] > Hey Pierre, i just want to say thanks for your work. I use spyder, winpython (no more hassle with administration) and guiqwt (fastest plotting library under pyqt) daily and love them. 
greetings Till From sebastian at sipsolutions.net Thu Jan 10 06:06:39 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 10 Jan 2013 12:06:39 +0100 Subject: [Numpy-discussion] int and long issues In-Reply-To: <50EE98CD.8090700@gmail.com> References: <50EE98CD.8090700@gmail.com> Message-ID: <1357815999.2516.6.camel@sebastian-laptop> On Thu, 2013-01-10 at 11:32 +0100, Mads Ipsen wrote: > Hi, > > I find this to be a little strange: > > x = numpy.arange(10) > isinstance(x[0],int) > > gives True > > y = numpy.where(x < 5)[0] > isinstance(y[0],int) > > gives False > > isinstance(y[0],long) > Check what type(x[0])/type(y[0]) prints, I expect these are very different, because the default integer type and the integer type used for indexing (addressing memory in general) are not necessarily the same. And because of that, `y[0]` probably simply isn't compatible to the datatype of a python integer for your hardware and OS (for example for me, your code works). So on python 2 (python 3 abolishes int and makes long the only integer, so this should work as expected there) you have to just check both even in the python context, because you can never really know (there may be some nice trick for that, but not sure). And if you want to allow for rare 0d arrays as well (well they are very rare admittingly)... it gets even a bit hairier. > gives True > > Specs: Python 2.7.2, numpy-1.6.1, Win7, 64 bit > > Best regards, > > Mads > -- > +-----------------------------------------------------+ > | Mads Ipsen | > +----------------------+------------------------------+ > | G?seb?ksvej 7, 4. tv | | > | DK-2500 Valby | phone: +45-29716388 | > | Denmark | email: mads.ipsen at gmail.com | > +----------------------+------------------------------+ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Thu Jan 10 06:06:54 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 10 Jan 2013 12:06:54 +0100 Subject: [Numpy-discussion] int and long issues In-Reply-To: <50EE98CD.8090700@gmail.com> References: <50EE98CD.8090700@gmail.com> Message-ID: <1357816014.2516.7.camel@sebastian-laptop> On Thu, 2013-01-10 at 11:32 +0100, Mads Ipsen wrote: > Hi, > > I find this to be a little strange: > > x = numpy.arange(10) > isinstance(x[0],int) > > gives True > > y = numpy.where(x < 5)[0] > isinstance(y[0],int) > > gives False > > isinstance(y[0],long) > Check what type(x[0])/type(y[0]) prints, I expect these are very different, because the default integer type and the integer type used for indexing (addressing memory in general) are not necessarily the same. And because of that, `y[0]` probably simply isn't compatible to the datatype of a python integer for your hardware and OS (for example for me, your code works). So on python 2 (python 3 abolishes int and makes long the only integer, so this should work as expected there) you have to just check both even in the python context, because you can never really know (there may be some nice trick for that, but not sure). And if you want to allow for rare 0d arrays as well (well they are very rare admittingly)... it gets even a bit hairier. Regards, Sebastian > gives True > > Specs: Python 2.7.2, numpy-1.6.1, Win7, 64 bit > > Best regards, > > Mads > -- > +-----------------------------------------------------+ > | Mads Ipsen | > +----------------------+------------------------------+ > | G?seb?ksvej 7, 4. 
tv | | > | DK-2500 Valby | phone: +45-29716388 | > | Denmark | email: mads.ipsen at gmail.com | > +----------------------+------------------------------+ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mail.till at gmx.de Thu Jan 10 06:05:52 2013 From: mail.till at gmx.de (Till Stensitzki) Date: Thu, 10 Jan 2013 11:05:52 +0000 (UTC) Subject: [Numpy-discussion] Linear least squares References: Message-ID: > > QR without column pivoting is a nice option for >"safe" problems, but it doesn't >provide a reliable indication of rank >reduction. I also don't find pinv useful >once the rank goes down, since it relies on > Euclidean distance having relevance in >parameter space and that is seldom a sound >assumption, usually it is better to >reformulate the problem or remove a column > from the design matrix. Oh, i always taught that lstsq is more or less using the same procedure as pinv. Maybe you can give me a hint which algorithm is the most stable, as system (sum of expontials) is not very stable? My numerical lectures were some year ago. greetings Till From madsipsen at gmail.com Thu Jan 10 06:29:41 2013 From: madsipsen at gmail.com (Mads Ipsen) Date: Thu, 10 Jan 2013 12:29:41 +0100 Subject: [Numpy-discussion] int and long issues In-Reply-To: <1357815999.2516.6.camel@sebastian-laptop> References: <50EE98CD.8090700@gmail.com> <1357815999.2516.6.camel@sebastian-laptop> Message-ID: <50EEA625.1050305@gmail.com> Sebastian - thanks - very helpful. Best regards, Mads On 10/01/2013 12:06, Sebastian Berg wrote: > On Thu, 2013-01-10 at 11:32 +0100, Mads Ipsen wrote: >> Hi, >> >> I find this to be a little strange: >> >> x = numpy.arange(10) >> isinstance(x[0],int) >> >> gives True >> >> y = numpy.where(x < 5)[0] >> isinstance(y[0],int) >> >> gives False >> >> isinstance(y[0],long) >> > Check what type(x[0])/type(y[0]) prints, I expect these are very > different, because the default integer type and the integer type used > for indexing (addressing memory in general) are not necessarily the > same. And because of that, `y[0]` probably simply isn't compatible to > the datatype of a python integer for your hardware and OS (for example > for me, your code works). So on python 2 (python 3 abolishes int and > makes long the only integer, so this should work as expected there) you > have to just check both even in the python context, because you can > never really know (there may be some nice trick for that, but not sure). > And if you want to allow for rare 0d arrays as well (well they are very > rare admittingly)... it gets even a bit hairier. > > >> gives True >> >> Specs: Python 2.7.2, numpy-1.6.1, Win7, 64 bit >> >> Best regards, >> >> Mads >> -- >> +-----------------------------------------------------+ >> | Mads Ipsen | >> +----------------------+------------------------------+ >> | G?seb?ksvej 7, 4. 
tv | | >> | DK-2500 Valby | phone: +45-29716388 | >> | Denmark | email: mads.ipsen at gmail.com | >> +----------------------+------------------------------+ >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ From pav at iki.fi Thu Jan 10 07:04:07 2013 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 10 Jan 2013 12:04:07 +0000 (UTC) Subject: [Numpy-discussion] numpydoc for python 3? References: <50EDA996.6090806@aalto.fi> Message-ID: Hi, Jaakko Luttinen aalto.fi> writes: > I'm trying to use numpydoc (Sphinx extension) for my project written in > Python 3.2. However, installing numpydoc gives errors shown at > http://pastebin.com/MPED6v9G and although it says "Successfully > installed numpydoc", trying to import numpydoc raises errors.. > > Could this be fixed or am I doing something wrong? Numpydoc hasn't been ported to Python 3 so far. This probably wouldn't a very large amount of work --- patches are accepted! -- Pauli Virtanen From nouiz at nouiz.org Thu Jan 10 09:28:45 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Thu, 10 Jan 2013 09:28:45 -0500 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: Hi, Just to note, as they plan to remove there dependency on it this year, is it bad that they can't use 1.8 for a few mounts until they finish the conversion? They already have a working version. They can continue to use it for as long they want. The only advantage for them if the compat layers are kept is the abbility for them to use the new numpy 1.8 a few monts earlier. I don't know enough about this issue, but from Nathaniel description, the consequence of dropping it in 1.8 seam light compared to the potential problem in my view. But the question is, how many other group are in there situation? Can we make a big warning printed when we compile again those compatibility layer to make it clear they will get removed? (probably it is already like that) my 2 cents Fred On Wed, Jan 9, 2013 at 6:41 PM, Charles R Harris wrote: > > > On Wed, Jan 9, 2013 at 4:21 PM, Christopher Hanley > wrote: >> >> After poking around our code base and talking to a few folks I predict >> that we at STScI can remove our dependence on the numpy-numarray >> compatibility layer by the end of this calendar year. I'm unsure of what >> the timeline for numpy 1.8 is so I don't know if this schedule supports >> removal of the compatibility layer from 1.8 or not. >> > > Together with the previous post that puts the kibosh on removing either > numeric or numarray support from 1.8, at least if we get 1.8 before the end > of summer. It's good to know where folks stand with regard to those > packages, we'll give it another shot next year. 
> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jaakko.luttinen at aalto.fi Thu Jan 10 09:54:35 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 10 Jan 2013 16:54:35 +0200 Subject: [Numpy-discussion] numpydoc for python 3? In-Reply-To: References: <50EDA996.6090806@aalto.fi> Message-ID: <50EED62B.9010105@aalto.fi> The files in numpy/doc/sphinxext/ and numpydoc/ (from PyPI) are a bit different. Which ones should be modified? -Jaakko On 01/10/2013 02:04 PM, Pauli Virtanen wrote: > Hi, > > Jaakko Luttinen aalto.fi> writes: >> I'm trying to use numpydoc (Sphinx extension) for my project written in >> Python 3.2. However, installing numpydoc gives errors shown at >> http://pastebin.com/MPED6v9G and although it says "Successfully >> installed numpydoc", trying to import numpydoc raises errors.. >> >> Could this be fixed or am I doing something wrong? > > Numpydoc hasn't been ported to Python 3 so far. This probably > wouldn't a very large amount of work --- patches are accepted! > From pav at iki.fi Thu Jan 10 10:04:40 2013 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 10 Jan 2013 15:04:40 +0000 (UTC) Subject: [Numpy-discussion] numpydoc for python 3? References: <50EDA996.6090806@aalto.fi> <50EED62B.9010105@aalto.fi> Message-ID: Jaakko Luttinen aalto.fi> writes: > The files in numpy/doc/sphinxext/ and numpydoc/ (from PyPI) are a bit > different. Which ones should be modified? The stuff in sphinxext/ is the development version of the package on PyPi, so the changes should be made in sphinxext/ -- Pauli Virtanen From jaakko.luttinen at aalto.fi Thu Jan 10 10:16:32 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 10 Jan 2013 17:16:32 +0200 Subject: [Numpy-discussion] numpydoc for python 3? In-Reply-To: References: <50EDA996.6090806@aalto.fi> <50EED62B.9010105@aalto.fi> Message-ID: <50EEDB50.5000902@aalto.fi> On 01/10/2013 05:04 PM, Pauli Virtanen wrote: > Jaakko Luttinen aalto.fi> writes: >> The files in numpy/doc/sphinxext/ and numpydoc/ (from PyPI) are a bit >> different. Which ones should be modified? > > The stuff in sphinxext/ is the development version of the package on > PyPi, so the changes should be made in sphinxext/ > Thanks! I'm trying to run the tests with Python 2 using nosetests, but I get some errors http://pastebin.com/Mp9i8T2f . Am I doing something wrong? How should I run the tests? If I run nosetests on the numpydoc folder from PyPI, all the tests are successful. -Jaakko From klonuo at gmail.com Thu Jan 10 10:56:38 2013 From: klonuo at gmail.com (klo) Date: Thu, 10 Jan 2013 16:56:38 +0100 Subject: [Numpy-discussion] Building Numpy 1.6.2 for Python 3.3 on Windows Message-ID: <147978185.20130110165638@gmail.com> Hi, I run `python3 setup.py config` and then python3 setup.py build --compiler=mingw32 but it picks that I have MSVC 10 and complains about manifests. Why, or even better, how to compile with available MinGW compilers? Here is log: ======================================== C:\src\numpy-1.6.2>python3 setup.py --compiler=mingw32 Converting to Python3 via 2to3... Running from numpy source directory.usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...] or: setup.py --help [cmd1 cmd2 ...] or: setup.py --help-commands or: setup.py cmd --help error: option --compiler not recognized C:\src\numpy-1.6.2>python3 setup.py build --compiler=mingw32 Converting to Python3 via 2to3... 
F2PY Version 2 blas_opt_info: blas_mkl_info: FOUND: libraries = ['mkl_rt'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] define_macros = [('SCIPY_MKL_H', None)] FOUND: libraries = ['mkl_rt'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] define_macros = [('SCIPY_MKL_H', None)] non-existing path in 'numpy\\lib': 'benchmarks' lapack_opt_info: lapack_mkl_info: mkl_info: FOUND: libraries = ['mkl_rt'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] define_macros = [('SCIPY_MKL_H', None)] FOUND: libraries = ['mkl_rt'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] define_macros = [('SCIPY_MKL_H', None)] FOUND: libraries = ['mkl_rt'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] define_macros = [('SCIPY_MKL_H', None)] running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src build_src building py_modules sources building library "npymath" sources customize GnuFCompiler Could not locate executable g77 Could not locate executable f77 customize IntelVisualFCompiler Could not locate executable ifort Could not locate executable ifl customize AbsoftFCompiler Could not locate executable f90 customize CompaqVisualFCompiler Could not locate executable DF customize IntelItaniumVisualFCompiler Could not locate executable efl customize Gnu95FCompiler Found executable C:\MinGW\bin\gfortran.exe Found executable C:\MinGW\bin\gfortran.exe Running from numpy source directory.customize Gnu95FCompiler customize Gnu95FCompiler using config Traceback (most recent call last): File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\mingw32ccompiler.py", line 399, in msvc_manifest_xml fullver = _MSVCRVER_TO_FULLVER[str(maj * 10 + min)] KeyError: '100' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "c:\python33\lib\distutils\core.py", line 148, in setup dist.run_commands() File "c:\python33\lib\distutils\dist.py", line 917, in run_commands self.run_command(cmd) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "c:\python33\lib\distutils\command\build.py", line 126, in run self.run_command(cmd_name) File "c:\python33\lib\distutils\cmd.py", line 313, in run_command self.distribution.run_command(command) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = 
self.generate_sources(sources, (lib_name, build_info)) File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 694, in get_mathlib_info st = config_cmd.try_link('int main(void) { return 0;}') File "c:\python33\lib\distutils\command\config.py", line 246, in try_link libraries, library_dirs, lang) File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\command\config.py", line 146, in _link generate_manifest(self) File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\mingw32ccompiler.py", line 484, in generate_manifest manxml = msvc_manifest_xml(ma, mi) File "C:\src\numpy-1.6.2\build\py3k\numpy\distutils\mingw32ccompiler.py", line 402, in msvc_manifest_xml % (maj, min)) ValueError: Version 10,0 of MSVCRT not supported yet ======================================== From chanley at gmail.com Thu Jan 10 11:02:45 2013 From: chanley at gmail.com (Christopher Hanley) Date: Thu, 10 Jan 2013 11:02:45 -0500 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: I'm all for a big scary warning on import. Fair warning is good for everyone, not just our developers. As for testing, our software that uses the API is tested nightly. So if our software stops working, and the compatibility layer is the cause, we would definitely be looking into what happened. :-) In any case, fair warning of dropped support in 1.8 and removal in 1.9 is fine with us. Thanks, Chris On Wed, Jan 9, 2013 at 6:38 PM, Nathaniel Smith wrote: > On Wed, Jan 9, 2013 at 11:21 PM, Christopher Hanley > wrote: > > After poking around our code base and talking to a few folks I predict > that > > we at STScI can remove our dependence on the numpy-numarray compatibility > > layer by the end of this calendar year. I'm unsure of what the timeline > for > > numpy 1.8 is so I don't know if this schedule supports removal of the > > compatibility layer from 1.8 or not. > > It'd be nice if 1.8 were out before that, but that doesn't really > matter -- let us know when you get it sorted? > > Also, would it help if we added a big scary warning at import time to > annoy your more recalcitrant developers with? :-) > > The basic issue is that none of us actually use this stuff, it has no > tests, the rest of numpy is changing around it, and we have no idea if > it works, so at some point it makes more sense for us to just stop > shipping the compat layer and let anyone who still needs it maintain > their own copy of the code. > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Thu Jan 10 11:09:53 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 10 Jan 2013 16:09:53 +0000 Subject: [Numpy-discussion] Building Numpy 1.6.2 for Python 3.3 on Windows In-Reply-To: <147978185.20130110165638@gmail.com> References: <147978185.20130110165638@gmail.com> Message-ID: On Thu, Jan 10, 2013 at 3:56 PM, klo wrote: > Hi, > > I run `python3 setup.py config` and then > > python3 setup.py build --compiler=mingw32 > > but it picks that I have MSVC 10 and complains about manifests. > Why, or even better, how to compile with available MinGW compilers? 
I reported this issue/bug to the mailing list recently as part of a discussion with Ralf which lead to various fixes being made to get NumPy to compile with either mingw32 or MSCV 10. http://mail.scipy.org/pipermail/numpy-discussion/2012-November/064454.html My workaround is to change the default compiler for Python 3, by creating C:\Python33\Lib\distutils\distutils.cfg containing: [build] compiler=mingw32 Peter From raul at virtualmaterials.com Thu Jan 10 11:15:55 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Thu, 10 Jan 2013 09:15:55 -0700 Subject: [Numpy-discussion] Remove support for numeric and numarray in 1.8 In-Reply-To: References: Message-ID: <50EEE93B.4040206@virtualmaterials.com> An HTML attachment was scrubbed... URL: From klonuo at gmail.com Thu Jan 10 11:35:31 2013 From: klonuo at gmail.com (klo) Date: Thu, 10 Jan 2013 17:35:31 +0100 Subject: [Numpy-discussion] Building Numpy 1.6.2 for Python 3.3 on Windows In-Reply-To: References: <147978185.20130110165638@gmail.com> Message-ID: <1074049520.20130110173531@gmail.com> > I reported this issue/bug to the mailing list recently as part of > a discussion with Ralf which lead to various fixes being made > to get NumPy to compile with either mingw32 or MSCV 10. > http://mail.scipy.org/pipermail/numpy-discussion/2012-November/064454.html > My workaround is to change the default compiler for Python 3, > by creating C:\Python33\Lib\distutils\distutils.cfg containing: > [build] > compiler=mingw32 Thanks, but I have already set C:\Python33\Lib\distutils\distutils.cfg: ======================================== [build] compiler=mingw32 [build_ext] compiler=mingw32 ======================================== From cgohlke at uci.edu Thu Jan 10 12:08:50 2013 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 10 Jan 2013 09:08:50 -0800 Subject: [Numpy-discussion] Building Numpy 1.6.2 for Python 3.3 on Windows In-Reply-To: <1074049520.20130110173531@gmail.com> References: <147978185.20130110165638@gmail.com> <1074049520.20130110173531@gmail.com> Message-ID: <50EEF5A2.9090600@uci.edu> On 1/10/2013 8:35 AM, klo wrote: >> I reported this issue/bug to the mailing list recently as part of >> a discussion with Ralf which lead to various fixes being made >> to get NumPy to compile with either mingw32 or MSCV 10. > >> http://mail.scipy.org/pipermail/numpy-discussion/2012-November/064454.html > >> My workaround is to change the default compiler for Python 3, >> by creating C:\Python33\Lib\distutils\distutils.cfg containing: > >> [build] >> compiler=mingw32 > > Thanks, but I have already set C:\Python33\Lib\distutils\distutils.cfg: > > ======================================== > [build] > compiler=mingw32 > > [build_ext] > compiler=mingw32 > ======================================== > Numpy <= 1.6 is not compatible with Python 3.3. Use numpy >= 1.7.0rc1. Christoph From chris.barker at noaa.gov Thu Jan 10 12:26:16 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 10 Jan 2013 09:26:16 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: Ond?ej, Vincent, and Ralf (and others..) Thank you so much for doing all this -- it's a great service to the MacPython community. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From klonuo at gmail.com Thu Jan 10 14:14:35 2013 From: klonuo at gmail.com (klo) Date: Thu, 10 Jan 2013 20:14:35 +0100 Subject: [Numpy-discussion] Building Numpy 1.6.2 for Python 3.3 on Windows In-Reply-To: <50EEF5A2.9090600@uci.edu> References: <147978185.20130110165638@gmail.com> <1074049520.20130110173531@gmail.com> <50EEF5A2.9090600@uci.edu> Message-ID: <443695837.20130110201435@gmail.com> > Numpy <= 1.6 is not compatible with Python 3.3. Use numpy >= 1.7.0rc1. Thanks for the tip 1.7.0rc builds without issue From klonuo at gmail.com Thu Jan 10 14:57:50 2013 From: klonuo at gmail.com (klo) Date: Thu, 10 Jan 2013 20:57:50 +0100 Subject: [Numpy-discussion] Building Numpy 1.6.2 for Python 3.3 on Windows In-Reply-To: <443695837.20130110201435@gmail.com> References: <147978185.20130110165638@gmail.com> <1074049520.20130110173531@gmail.com> <50EEF5A2.9090600@uci.edu> <443695837.20130110201435@gmail.com> Message-ID: <485515987.20130110205750@gmail.com> >> Numpy <= 1.6 is not compatible with Python 3.3. Use numpy >= 1.7.0rc1. > Thanks for the tip > 1.7.0rc builds without issue Actually, this isn't over. It builds fine, but when I try to import numpy I get error: ======================================== ... from numpy.linalg import lapack_lite ImportError: DLL load failed: The specified module could not be found. ======================================== Google reveals that PATH has to be updated with "C:\Python33\Scripts" path, but then when I run `python3 setup.py build` I get another error: ======================================== C:\src\numpy-1.7.0rc1>python3 setup.py build Converting to Python3 via 2to3... Running from numpy source directory. 
F2PY Version 2 blas_opt_info: blas_mkl_info: FOUND: define_macros = [('SCIPY_MKL_H', None)] libraries = ['mkl_rt'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] FOUND: define_macros = [('SCIPY_MKL_H', None)] libraries = ['mkl_rt'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] non-existing path in 'numpy\\lib': 'benchmarks' lapack_opt_info: lapack_mkl_info: mkl_info: FOUND: define_macros = [('SCIPY_MKL_H', None)] libraries = ['mkl_rt'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] FOUND: define_macros = [('SCIPY_MKL_H', None)] libraries = ['mkl_rt'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] FOUND: define_macros = [('SCIPY_MKL_H', None)] libraries = ['mkl_rt'] include_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/include'] library_dirs = ['C:/Progra~1/Intel/Compos~1/mkl/lib/ia32'] running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src build_src building py_modules sources creating build creating build\src.win32-3.3 creating build\src.win32-3.3\numpy creating build\src.win32-3.3\numpy\distutils building library "npymath" sources Building import library (ARCH=x86): "c:\python33\libs\libpython33.a" Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "c:\python33\lib\distutils\core.py", line 148, in setup dist.run_commands() File "c:\python33\lib\distutils\dist.py", line 917, in run_commands self.run_command(cmd) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "c:\python33\lib\distutils\command\build.py", line 126, in run self.run_command(cmd_name) File "c:\python33\lib\distutils\cmd.py", line 313, in run_command self.distribution.run_command(command) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 646, in get_mathlib_info st = config_cmd.try_link('int main(void) { return 0;}') File "c:\python33\lib\distutils\command\config.py", line 243, in try_link self._check_compiler() File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\command\config.py", line 45, in _check_compiler old_config._check_compiler(self) File "c:\python33\lib\distutils\command\config.py", line 98, in _check_compiler dry_run=self.dry_run, force=1) File 
"C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\ccompiler.py", line 560, in new_compiler compiler = klass(None, dry_run, force) File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\mingw32ccompiler.py", line 91, in __init__ build_import_library() File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\mingw32ccompiler.py", line 383, in build_import_library return _build_import_library_x86() File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\mingw32ccompiler.py", line 428, in _build_import_library_x86 dlist, flist = lib2def.parse_nm(nm_output) File "C:\src\numpy-1.7.0rc1\build\py3k\numpy\distutils\lib2def.py", line 77, in parse_nm data = DATA_RE.findall(nm_output) TypeError: can't use a string pattern on a bytes-like object ======================================== Any ideas? From klonuo at gmail.com Thu Jan 10 15:20:17 2013 From: klonuo at gmail.com (klo) Date: Thu, 10 Jan 2013 21:20:17 +0100 Subject: [Numpy-discussion] Building Numpy 1.6.2 for Python 3.3 on Windows In-Reply-To: <485515987.20130110205750@gmail.com> References: <147978185.20130110165638@gmail.com> <1074049520.20130110173531@gmail.com> <50EEF5A2.9090600@uci.edu> <443695837.20130110201435@gmail.com> <485515987.20130110205750@gmail.com> Message-ID: <15313110.20130110212017@gmail.com> > Actually, this isn't over. It builds fine, but when I try to import > numpy I get error: > ======================================== > ... Sorry for the noise, after re-reading tracelog, I realized that I accidentally removed "c:\python33\libs\libpython33.a" while removing previous non-working numpy build Cheers From ondrej.certik at gmail.com Thu Jan 10 21:14:19 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 10 Jan 2013 18:14:19 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 11:21 PM, Ralf Gommers wrote: > Sure, no problem. For the part that needs to be built on 10.6 that is. > Vincent's box still has 10.5, right? Yes. Ondrej From ondrej.certik at gmail.com Thu Jan 10 21:14:50 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 10 Jan 2013 18:14:50 -0800 Subject: [Numpy-discussion] Which Python to use for Mac binaries In-Reply-To: References: Message-ID: On Thu, Jan 10, 2013 at 9:26 AM, Chris Barker - NOAA Federal wrote: > Ond?ej, Vincent, and Ralf (and others..) > > Thank you so much for doing all this -- it's a great service to the > MacPython community. Chris, thank you for your help as well! Ondrej From ondrej.certik at gmail.com Thu Jan 10 21:21:15 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 10 Jan 2013 18:21:15 -0800 Subject: [Numpy-discussion] Building Numpy 1.6.2 for Python 3.3 on Windows In-Reply-To: <15313110.20130110212017@gmail.com> References: <147978185.20130110165638@gmail.com> <1074049520.20130110173531@gmail.com> <50EEF5A2.9090600@uci.edu> <443695837.20130110201435@gmail.com> <485515987.20130110205750@gmail.com> <15313110.20130110212017@gmail.com> Message-ID: On Thu, Jan 10, 2013 at 12:20 PM, klo wrote: >> Actually, this isn't over. It builds fine, but when I try to import >> numpy I get error: > >> ======================================== >> ... > > Sorry for the noise, after re-reading tracelog, I realized that I accidentally > removed "c:\python33\libs\libpython33.a" while removing previous non-working > numpy build Cool, I am glad to hear that 1.7.0rc1 works great. Thanks for letting us know. 
Ondrej From ndbecker2 at gmail.com Fri Jan 11 10:40:39 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 11 Jan 2013 10:40:39 -0500 Subject: [Numpy-discussion] phase unwrapping (1d) Message-ID: np.unwrap was too slow, so I rolled by own (in c++). I wanted to be able to handle the case of unwrap (arg (x1) + arg (x2)) Here, phase can change by more than 2pi. I came up with the following algorithm, any thoughts? In the following, y is normally set to pi. o points to output i points to input nint1 finds nearest integer value_t prev_o = init; for (; i != e; ++i, ++o) { *o = cnt * 2 * y + *i; value_t delta = *o - prev_o; if (delta / y > 1 or delta / y < -1) { int i = nint1 (delta / (2*y)); *o -= 2*y*i; cnt -= i; } prev_o = *o; } From njs at pobox.com Fri Jan 11 17:00:55 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 11 Jan 2013 22:00:55 +0000 Subject: [Numpy-discussion] return index of maximum value in an array easily? In-Reply-To: References: Message-ID: On Thu, Jan 10, 2013 at 9:40 AM, Chao YUE wrote: > Dear all, > > Are we going to consider returning the index of maximum value in an array > easily > without calling np.argmax and np.unravel_index consecutively? This does seem like a good thing to support somehow. What would a good interface look like? Something like np.nonzero(a == np.max(a))? Should we support vectorized operation (e.g. argmax of each 2-d subarray of a 3-d array along some axis)? -n From chaoyuejoy at gmail.com Fri Jan 11 18:26:13 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 12 Jan 2013 00:26:13 +0100 Subject: [Numpy-discussion] return index of maximum value in an array easily? In-Reply-To: References: Message-ID: Hi, I don't know how others think about this. Like you point out, one can use np.nonzero(a==np.max(a)) as a workaround. For the second point, in case I have an array: a = np.arange(24.).reshape(2,3,4) suppose I want to find the index for maximum value of each 2X3 array along the 3rd dimension, what I can think of will be: index_list = [] for i in range(a.shape[-1]): data = a[...,i] index_list.append(np.nonzero(data==np.max(data))) In [87]: index_list Out[87]: [(array([1]), array([2])), (array([1]), array([2])), (array([1]), array([2])), (array([1]), array([2]))] If we want to make the np.argmax function doing the job of this part of code, could we add another some kind of boolean keyword argument, for example, "exclude" to the function? [this is only my thinking, and I am only a beginner, maybe it's stupid!!!] np.argmax(a,axis=2,exclude=True) (default value for exclude is False) it will give the index of maximum value along all other axis except the axis=2 (which is acutally the 3rd axis) The output will be: np.array(index_list).squeeze() array([[1, 2], [1, 2], [1, 2], [1, 2]]) and one can use a[1,2,i] (i=1,2,3,4) to extract the maximum value. I doubt this is really useful...... too complicated...... Chao On Fri, Jan 11, 2013 at 11:00 PM, Nathaniel Smith wrote: > On Thu, Jan 10, 2013 at 9:40 AM, Chao YUE wrote: > > Dear all, > > > > Are we going to consider returning the index of maximum value in an array > > easily > > without calling np.argmax and np.unravel_index consecutively? > > This does seem like a good thing to support somehow. What would a good > interface look like? Something like np.nonzero(a == np.max(a))? Should > we support vectorized operation (e.g. argmax of each 2-d subarray of a > 3-d array along some axis)? 
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Jan 11 20:33:07 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 12 Jan 2013 02:33:07 +0100 Subject: [Numpy-discussion] return index of maximum value in an array easily? In-Reply-To: References: Message-ID: <1357954387.8396.6.camel@sebastian-laptop> On Sat, 2013-01-12 at 00:26 +0100, Chao YUE wrote: > Hi, > > I don't know how others think about this. Like you point out, one can > use > np.nonzero(a==np.max(a)) as a workaround. > > For the second point, in case I have an array: > a = np.arange(24.).reshape(2,3,4) > > suppose I want to find the index for maximum value of each 2X3 array > along > the 3rd dimension, what I can think of will be: > > index_list = [] > for i in range(a.shape[-1]): > data = a[...,i] > index_list.append(np.nonzero(data==np.max(data))) > To keep being close to min/max (and other ufunc based reduce operations), it would seem consistent to allow something like np.argmax(array, axis=(1, 2)), which would give a tuple of arrays as result such that array[np.argmax(array, axis=(1,2))] == np.max(array, axis=(1,2)) But apart from consistency, I am not sure anyone would get the idea of giving multiple axes into the function... > > In [87]: > > > index_list > Out[87]: > [(array([1]), array([2])), > (array([1]), array([2])), > (array([1]), array([2])), > (array([1]), array([2]))] > > > If we want to make the np.argmax function doing the job of this part > of code, > could we add another some kind of boolean keyword argument, for > example, > "exclude" to the function? > [this is only my thinking, and I am only a beginner, maybe it's > stupid!!!] > > np.argmax(a,axis=2,exclude=True) (default value for exclude is False) > > it will give the index of maximum value along all other axis except > the axis=2 > (which is acutally the 3rd axis) > > The output will be: > > np.array(index_list).squeeze() > > array([[1, 2], > [1, 2], > [1, 2], > [1, 2]]) > > and one can use a[1,2,i] (i=1,2,3,4) to extract the maximum value. > > I doubt this is really useful...... too complicated...... > > Chao > > On Fri, Jan 11, 2013 at 11:00 PM, Nathaniel Smith > wrote: > On Thu, Jan 10, 2013 at 9:40 AM, Chao YUE > wrote: > > Dear all, > > > > Are we going to consider returning the index of maximum > value in an array > > easily > > without calling np.argmax and np.unravel_index > consecutively? > > > This does seem like a good thing to support somehow. What > would a good > interface look like? Something like np.nonzero(a == > np.max(a))? Should > we support vectorized operation (e.g. argmax of each 2-d > subarray of a > 3-d array along some axis)? 
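For concreteness, a minimal sketch of how both cases can already be handled with the existing API (np.argmax plus np.unravel_index for the global maximum, and a plain loop for the per-slice case); the 2x3x4 example array is the same one used above:

import numpy as np

a = np.arange(24.).reshape(2, 3, 4)

# Global maximum: take argmax of the flattened array, then unravel
# the flat index back into an N-d index tuple.
flat_idx = np.argmax(a)
nd_idx = np.unravel_index(flat_idx, a.shape)   # -> (1, 2, 3)
assert a[nd_idx] == a.max()

# Per-slice maxima along the last axis: one (row, col) pair for each
# 2x3 sub-array a[..., i], as in the loop above.
indices = [np.unravel_index(np.argmax(a[..., i]), a[..., i].shape)
           for i in range(a.shape[-1])]
# indices == [(1, 2), (1, 2), (1, 2), (1, 2)]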
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sun Jan 13 00:34:32 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 12 Jan 2013 22:34:32 -0700 Subject: [Numpy-discussion] How many build systems do we need? Message-ID: Hi All, In the continuing proposal for cleanups, note that we currently support three (3!) build systems, distutils, scons, and bento. That's a bit much to maintain when contemplating changes, and scons and bento both have external dependencies. Can we dispense with any of these? Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike.r.anderson.13 at gmail.com Sun Jan 13 07:08:18 2013 From: mike.r.anderson.13 at gmail.com (Mike Anderson) Date: Sun, 13 Jan 2013 20:08:18 +0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: <50E68C2A.9060400@astro.uio.no> <50E68F12.90804@astro.uio.no> Message-ID: On 10 January 2013 05:19, Chris Barker - NOAA Federal wrote: > On Wed, Jan 9, 2013 at 2:57 AM, Mike Anderson > > > I'm hoping the API will be independent of storage format - i.e. the > > underlying implementations can store the data any way they like. So the > API > > will be written in terms of abstractions, and the user will have the > choice > > of whatever concrete implementation best fits the specific needs. Sparse > > matrices, tiled matrices etc. should all be possible options. > > A note about that -- as I think if it, numpy arrays are two things: > > 1) a python object for working with numbers, in a wide variety of ways > > 2) a wrapper around a C-array (or data block) that can be used to > provide an easyway for Python to interact with C (and Fortran, and...) > libraries, etc. > > As it turns out a LOT of people use numpy for (2) -- what this means > is that while you could change the underlying data representation, > etc, and keep the same Python API -- such changes would break a lot of > non-pure-python code that relies on that data representation. > > This is a big issue with the numpy-for-PyPy project -- they could > write a numpy clone, but it would only be useful for the pure-python > stuff. > > Even then, a number of folks do tricks with numpy arrays in python > that rely on the underlying structure. > > Not sure how all this would play out for Clojure, but it's something > to keep in mind. Thanks Chris - this is a really helpful insight. Trying to translate that into the Clojure world, I think that's roughly equivalent to the separation between the API (roughly equivalent to the methods in the ndarray referred to in 1) from the specific implementations (which will probably include a data block ndarray-style wrapper like 2, but would also leave open other implementation options). 
That way the majority of users can code purely against the API, and they won't be affected if (when?) the underlying implementation changes. In this way, they should be able to get the benefits of 2) without building a direct dependency on it. Of course, I still expect some users to circumvent the API and build a dependency on the underlying implementation. Nothing we can do to stop that, and they may even have good reasons like hardcore performance optimization. We have to assume at that point they know what they are doing and are prepared to live with the consequences :-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jan 13 08:29:20 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 13 Jan 2013 14:29:20 +0100 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 6:34 AM, Charles R Harris wrote: > Hi All, > > In the continuing proposal for cleanups, note that we currently support > three (3!) build systems, distutils, scons, and bento. That's a bit much to > maintain when contemplating changes, and scons and bento both have external > dependencies. Can we dispense with any of these? Thoughts? > Numscons is the only one that can be dropped. I'm still using it regularly, but the few things it does better than bento can be easily improved in bento. So if removing numscons support from master saves some developer hours, +1 from me. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Jan 13 08:44:31 2013 From: cournape at gmail.com (David Cournapeau) Date: Sun, 13 Jan 2013 07:44:31 -0600 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 7:29 AM, Ralf Gommers wrote: > > > > On Sun, Jan 13, 2013 at 6:34 AM, Charles R Harris > wrote: >> >> Hi All, >> >> In the continuing proposal for cleanups, note that we currently support >> three (3!) build systems, distutils, scons, and bento. That's a bit much to >> maintain when contemplating changes, and scons and bento both have external >> dependencies. Can we dispense with any of these? Thoughts? > > > Numscons is the only one that can be dropped. I'm still using it regularly, > but the few things it does better than bento can be easily improved in > bento. So if removing numscons support from master saves some developer > hours, +1 from me. I think numscons was already scheduled to be dropped in 1.7 (and next version of scipy as well) ? I am certainly in favor of dropping it as well. David From ralf.gommers at gmail.com Sun Jan 13 09:03:51 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 13 Jan 2013 15:03:51 +0100 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 2:44 PM, David Cournapeau wrote: > On Sun, Jan 13, 2013 at 7:29 AM, Ralf Gommers > wrote: > > > > > > > > On Sun, Jan 13, 2013 at 6:34 AM, Charles R Harris > > wrote: > >> > >> Hi All, > >> > >> In the continuing proposal for cleanups, note that we currently support > >> three (3!) build systems, distutils, scons, and bento. That's a bit > much to > >> maintain when contemplating changes, and scons and bento both have > external > >> dependencies. Can we dispense with any of these? Thoughts? > > > > > > Numscons is the only one that can be dropped. 
I'm still using it > regularly, > > but the few things it does better than bento can be easily improved in > > bento. So if removing numscons support from master saves some developer > > hours, +1 from me. > > I think numscons was already scheduled to be dropped in 1.7 (and next > version of scipy as well) ? I am certainly in favor of dropping it as > well. > It was deprecated when we added Bento support, but we never decided on a timeline. Scipy is a different story, since Bento doesn't build it correctly the last few times I checked (there are some tickets by Pauli and me on the Bento issue list). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Jan 13 09:30:24 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 13 Jan 2013 14:30:24 +0000 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 5:34 AM, Charles R Harris wrote: > Hi All, > > In the continuing proposal for cleanups, note that we currently support > three (3!) build systems, distutils, scons, and bento. That's a bit much to > maintain when contemplating changes, and scons and bento both have external > dependencies. Can we dispense with any of these? Thoughts? I think it's actually 6 build systems, because each build system supports two modes: compiling each source file separately before linking, and concatenating everything into one big file and compiling that. It's been proposed that we phase out the one-file build (which is currently the default): http://mail.scipy.org/pipermail/numpy-discussion/2012-June/063015.html https://github.com/numpy/numpy/issues/315 The separate compilation approach is superior in every way, so long as it works. There is a theory that on some system somewhere there might be a broken compiler/linker which make it not work[1], but we don't actually know of any such system. Shall we switch the default to separate compilation for 1.8 and see if anyone notices? -n [1] The problem is that we need to make sure that symbols defined in numpy .c files are visible to other numpy .c files, but not to non-numpy code linked into the same process; this is a problem that the C standard didn't consider, so it requires system-specific fiddling. However that fiddling is pretty standard these days. From charlesr.harris at gmail.com Sun Jan 13 10:47:24 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 13 Jan 2013 08:47:24 -0700 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 7:30 AM, Nathaniel Smith wrote: > On Sun, Jan 13, 2013 at 5:34 AM, Charles R Harris > wrote: > > Hi All, > > > > In the continuing proposal for cleanups, note that we currently support > > three (3!) build systems, distutils, scons, and bento. That's a bit much > to > > maintain when contemplating changes, and scons and bento both have > external > > dependencies. Can we dispense with any of these? Thoughts? > > I think it's actually 6 build systems, because each build system > supports two modes: compiling each source file separately before > linking, and concatenating everything into one big file and compiling > that. > > It's been proposed that we phase out the one-file build (which is > currently the default): > http://mail.scipy.org/pipermail/numpy-discussion/2012-June/063015.html > https://github.com/numpy/numpy/issues/315 > The separate compilation approach is superior in every way, so long as > it works. 
There is a theory that on some system somewhere there might > be a broken compiler/linker which make it not work[1], but we don't > actually know of any such system. > > Shall we switch the default to separate compilation for 1.8 and see if > anyone notices? > > +1 > -n > > [1] The problem is that we need to make sure that symbols defined in > numpy .c files are visible to other numpy .c files, but not to > non-numpy code linked into the same process; this is a problem that > the C standard didn't consider, so it requires system-specific > fiddling. However that fiddling is pretty standard these days. > And do we really care? I've compiled and statically linked libraries using setup.py because it is more easily portable than make, and on windows few symbols are exposed by default while on linux most are, but who looks? Exposing unneeded symbols is a bit of a wart but I don't think it matters that much for most things. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jan 13 10:50:15 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 13 Jan 2013 08:50:15 -0700 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 6:44 AM, David Cournapeau wrote: > On Sun, Jan 13, 2013 at 7:29 AM, Ralf Gommers > wrote: > > > > > > > > On Sun, Jan 13, 2013 at 6:34 AM, Charles R Harris > > wrote: > >> > >> Hi All, > >> > >> In the continuing proposal for cleanups, note that we currently support > >> three (3!) build systems, distutils, scons, and bento. That's a bit > much to > >> maintain when contemplating changes, and scons and bento both have > external > >> dependencies. Can we dispense with any of these? Thoughts? > > > > > > Numscons is the only one that can be dropped. I'm still using it > regularly, > > but the few things it does better than bento can be easily improved in > > bento. So if removing numscons support from master saves some developer > > hours, +1 from me. > > I think numscons was already scheduled to be dropped in 1.7 (and next > version of scipy as well) ? I am certainly in favor of dropping it as > well. > Is bento documented anywhere or can you commit to keeping it working for numpy? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Jan 13 11:25:00 2013 From: cournape at gmail.com (David Cournapeau) Date: Sun, 13 Jan 2013 10:25:00 -0600 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 9:50 AM, Charles R Harris wrote: > > > On Sun, Jan 13, 2013 at 6:44 AM, David Cournapeau > wrote: >> >> On Sun, Jan 13, 2013 at 7:29 AM, Ralf Gommers >> wrote: >> > >> > >> > >> > On Sun, Jan 13, 2013 at 6:34 AM, Charles R Harris >> > wrote: >> >> >> >> Hi All, >> >> >> >> In the continuing proposal for cleanups, note that we currently support >> >> three (3!) build systems, distutils, scons, and bento. That's a bit >> >> much to >> >> maintain when contemplating changes, and scons and bento both have >> >> external >> >> dependencies. Can we dispense with any of these? Thoughts? >> > >> > >> > Numscons is the only one that can be dropped. I'm still using it >> > regularly, >> > but the few things it does better than bento can be easily improved in >> > bento. So if removing numscons support from master saves some developer >> > hours, +1 from me. 
>> >> I think numscons was already scheduled to be dropped in 1.7 (and next >> version of scipy as well) ? I am certainly in favor of dropping it as >> well. > > > Is bento documented anywhere or can you commit to keeping it working for > numpy? Both. Doc: http://bento.readthedocs.org/en/latest/ and tests (https://travis-ci.org/cournape/Bento) are continuously run/updated. David From ralf.gommers at gmail.com Sun Jan 13 12:11:18 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 13 Jan 2013 18:11:18 +0100 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 5:25 PM, David Cournapeau wrote: > On Sun, Jan 13, 2013 at 9:50 AM, Charles R Harris > wrote: > > > > > > On Sun, Jan 13, 2013 at 6:44 AM, David Cournapeau > > wrote: > >> > >> On Sun, Jan 13, 2013 at 7:29 AM, Ralf Gommers > >> wrote: > >> > > >> > > >> > > >> > On Sun, Jan 13, 2013 at 6:34 AM, Charles R Harris > >> > wrote: > >> >> > >> >> Hi All, > >> >> > >> >> In the continuing proposal for cleanups, note that we currently > support > >> >> three (3!) build systems, distutils, scons, and bento. That's a bit > >> >> much to > >> >> maintain when contemplating changes, and scons and bento both have > >> >> external > >> >> dependencies. Can we dispense with any of these? Thoughts? > >> > > >> > > >> > Numscons is the only one that can be dropped. I'm still using it > >> > regularly, > >> > but the few things it does better than bento can be easily improved in > >> > bento. So if removing numscons support from master saves some > developer > >> > hours, +1 from me. > >> > >> I think numscons was already scheduled to be dropped in 1.7 (and next > >> version of scipy as well) ? I am certainly in favor of dropping it as > >> well. > > > > > > Is bento documented anywhere or can you commit to keeping it working for > > numpy? > > Both. Doc: http://bento.readthedocs.org/en/latest/ and tests > (https://travis-ci.org/cournape/Bento) are continuously run/updated. > That's bento's own test suite only - we should add a numpy build with Bento for at least Python 2.7 to the numpy Travis config. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Jan 13 12:11:47 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 13 Jan 2013 17:11:47 +0000 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 3:47 PM, Charles R Harris wrote: > > > On Sun, Jan 13, 2013 at 7:30 AM, Nathaniel Smith wrote: >> >> On Sun, Jan 13, 2013 at 5:34 AM, Charles R Harris >> wrote: >> > Hi All, >> > >> > In the continuing proposal for cleanups, note that we currently support >> > three (3!) build systems, distutils, scons, and bento. That's a bit much >> > to >> > maintain when contemplating changes, and scons and bento both have >> > external >> > dependencies. Can we dispense with any of these? Thoughts? >> >> I think it's actually 6 build systems, because each build system >> supports two modes: compiling each source file separately before >> linking, and concatenating everything into one big file and compiling >> that. >> >> It's been proposed that we phase out the one-file build (which is >> currently the default): >> http://mail.scipy.org/pipermail/numpy-discussion/2012-June/063015.html >> https://github.com/numpy/numpy/issues/315 >> The separate compilation approach is superior in every way, so long as >> it works. 
There is a theory that on some system somewhere there might >> be a broken compiler/linker which make it not work[1], but we don't >> actually know of any such system. >> >> Shall we switch the default to separate compilation for 1.8 and see if >> anyone notices? >> > > +1 https://github.com/numpy/numpy/issues/2913 >> [1] The problem is that we need to make sure that symbols defined in >> numpy .c files are visible to other numpy .c files, but not to >> non-numpy code linked into the same process; this is a problem that >> the C standard didn't consider, so it requires system-specific >> fiddling. However that fiddling is pretty standard these days. > > And do we really care? I've compiled and statically linked libraries using > setup.py because it is more easily portable than make, and on windows few > symbols are exposed by default while on linux most are, but who looks? > Exposing unneeded symbols is a bit of a wart but I don't think it matters > that much for most things. I guess it's like many things in programming -- it doesn't matter until it does. (And then you realize that you should have done things properly from the start because you have a horrible hairball of "eh, does it really matter?" to sort out.) OTOH it's easy to build a static object without unneeded symbols, at least if you have access to gnu ld. I assume that most of these systems that don't have dynamic linkers still use some variant of binutils: # Let's take the .o files from separate compilation and make a single .o file suitable for static linking $ NPY_SEPARATE_COMPILATION=1 python setup.py build $ cd build/temp.*/numpy/core/src/multiarray # Combine all .o files into a single .o file, resolving all internal symbols: $ ld -r *.o -o multiarray-all.o # This file still exports a ton of junk... $ nm multiarray-all.o | wc -l 2541 # But now, we can strip out all the stuff we don't want to be public $ strip --strip-all --keep-symbol initmultiarray multiarray-all.o # Ta-da, a single-file static Python module that exports only the module setup symbol: $ nm multiarray-all.o 0000000000047a40 T initmultiarray -n From cournape at gmail.com Sun Jan 13 12:26:33 2013 From: cournape at gmail.com (David Cournapeau) Date: Sun, 13 Jan 2013 11:26:33 -0600 Subject: [Numpy-discussion] How many build systems do we need? In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 11:11 AM, Ralf Gommers wrote: > > > > On Sun, Jan 13, 2013 at 5:25 PM, David Cournapeau > wrote: >> >> On Sun, Jan 13, 2013 at 9:50 AM, Charles R Harris >> wrote: >> > >> > >> > On Sun, Jan 13, 2013 at 6:44 AM, David Cournapeau >> > wrote: >> >> >> >> On Sun, Jan 13, 2013 at 7:29 AM, Ralf Gommers >> >> wrote: >> >> > >> >> > >> >> > >> >> > On Sun, Jan 13, 2013 at 6:34 AM, Charles R Harris >> >> > wrote: >> >> >> >> >> >> Hi All, >> >> >> >> >> >> In the continuing proposal for cleanups, note that we currently >> >> >> support >> >> >> three (3!) build systems, distutils, scons, and bento. That's a bit >> >> >> much to >> >> >> maintain when contemplating changes, and scons and bento both have >> >> >> external >> >> >> dependencies. Can we dispense with any of these? Thoughts? >> >> > >> >> > >> >> > Numscons is the only one that can be dropped. I'm still using it >> >> > regularly, >> >> > but the few things it does better than bento can be easily improved >> >> > in >> >> > bento. So if removing numscons support from master saves some >> >> > developer >> >> > hours, +1 from me. 
>> >> >> >> I think numscons was already scheduled to be dropped in 1.7 (and next >> >> version of scipy as well) ? I am certainly in favor of dropping it as >> >> well. >> > >> > >> > Is bento documented anywhere or can you commit to keeping it working for >> > numpy? >> >> Both. Doc: http://bento.readthedocs.org/en/latest/ and tests >> (https://travis-ci.org/cournape/Bento) are continuously run/updated. > > > That's bento's own test suite only - we should add a numpy build with Bento > for at least Python 2.7 to the numpy Travis config. Definitely. I was merely answering Chuck's worries that bento may just be a one man, undocumented, bitrotted thing :) David From njs at pobox.com Sun Jan 13 12:27:54 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 13 Jan 2013 17:27:54 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like Message-ID: Hi all, PR 2875 adds two new functions, that generalize zeros(), ones(), zeros_like(), ones_like(), by simply taking an arbitrary fill value: https://github.com/numpy/numpy/pull/2875 So np.ones((10, 10)) is the same as np.filled((10, 10), 1) The implementations are trivial, but the API seems useful because it provides an idiomatic way of efficiently creating an array full of inf, or nan, or None, whatever funny value you need. All the alternatives are either inefficient (np.ones(...) * np.inf) or cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But there's a question of taste here; one could argue instead that these just add more clutter to the numpy namespace. So, before we merge, anyone want to chime in? (Bonus, extra bike-sheddy survey: do people prefer np.filled((10, 10), np.nan) np.filled_like(my_arr, np.nan) or np.filled(np.nan, (10, 10)) np.filled_like(np.nan, my_arr) ?) -n From josef.pktd at gmail.com Sun Jan 13 12:44:11 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 13 Jan 2013 12:44:11 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 12:27 PM, Nathaniel Smith wrote: > Hi all, > > PR 2875 adds two new functions, that generalize zeros(), ones(), > zeros_like(), ones_like(), by simply taking an arbitrary fill value: > https://github.com/numpy/numpy/pull/2875 > So > np.ones((10, 10)) > is the same as > np.filled((10, 10), 1) > > The implementations are trivial, but the API seems useful because it > provides an idiomatic way of efficiently creating an array full of > inf, or nan, or None, whatever funny value you need. All the > alternatives are either inefficient (np.ones(...) * np.inf) or > cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But > there's a question of taste here; one could argue instead that these > just add more clutter to the numpy namespace. So, before we merge, > anyone want to chime in? +1 I find it useful. I do the indirect way very often, or write matlab style helper functions. def nanes: .... problem dtype: inf and nan only makes sense for float I don't think I used many besides those two. > > (Bonus, extra bike-sheddy survey: do people prefer > np.filled((10, 10), np.nan) > np.filled_like(my_arr, np.nan) + 0.5 > or > np.filled(np.nan, (10, 10)) > np.filled_like(np.nan, my_arr) > ?) 
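For concreteness, the proposed helpers amount to the empty-plus-fill two-liner wrapped up; a rough pure-Python sketch (filled/filled_like here follow the names in the PR and are not an existing numpy API):

import numpy as np

def filled(shape, fill_value, dtype=float):
    # Sketch of the proposed np.filled(): allocate uninitialized
    # memory, then overwrite every element with fill_value.
    a = np.empty(shape, dtype=dtype)
    a.fill(fill_value)
    return a

def filled_like(arr, fill_value, dtype=None):
    # Sketch of the proposed np.filled_like(); dtype=None keeps
    # the dtype of the prototype array.
    a = np.empty_like(arr, dtype=dtype)
    a.fill(fill_value)
    return a

nans = filled((10, 10), np.nan)      # a 10x10 array full of nan
infs = filled_like(nans, np.inf)     # same shape and dtype, full of inf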
Josef > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From e.antero.tammi at gmail.com Sun Jan 13 13:27:42 2013 From: e.antero.tammi at gmail.com (eat) Date: Sun, 13 Jan 2013 20:27:42 +0200 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: Hi, On Sun, Jan 13, 2013 at 7:27 PM, Nathaniel Smith wrote: > Hi all, > > PR 2875 adds two new functions, that generalize zeros(), ones(), > zeros_like(), ones_like(), by simply taking an arbitrary fill value: > https://github.com/numpy/numpy/pull/2875 > So > np.ones((10, 10)) > is the same as > np.filled((10, 10), 1) > > The implementations are trivial, but the API seems useful because it > provides an idiomatic way of efficiently creating an array full of > inf, or nan, or None, whatever funny value you need. All the > alternatives are either inefficient (np.ones(...) * np.inf) or > cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But > there's a question of taste here; one could argue instead that these > just add more clutter to the numpy namespace. So, before we merge, > anyone want to chime in? > > (Bonus, extra bike-sheddy survey: do people prefer > np.filled((10, 10), np.nan) > np.filled_like(my_arr, np.nan) > +0 OTOH, it might also be handy to let val to be an array as well, which is then repeated to fill the array. My 2 cents. -eat > or > np.filled(np.nan, (10, 10)) > np.filled_like(np.nan, my_arr) > ?) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Jan 13 13:30:46 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 13 Jan 2013 18:30:46 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: Hi, On Sun, Jan 13, 2013 at 5:27 PM, Nathaniel Smith wrote: > Hi all, > > PR 2875 adds two new functions, that generalize zeros(), ones(), > zeros_like(), ones_like(), by simply taking an arbitrary fill value: > https://github.com/numpy/numpy/pull/2875 > So > np.ones((10, 10)) > is the same as > np.filled((10, 10), 1) > > The implementations are trivial, but the API seems useful because it > provides an idiomatic way of efficiently creating an array full of > inf, or nan, or None, whatever funny value you need. All the > alternatives are either inefficient (np.ones(...) * np.inf) or > cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But > there's a question of taste here; one could argue instead that these > just add more clutter to the numpy namespace. So, before we merge, > anyone want to chime in? > > (Bonus, extra bike-sheddy survey: do people prefer > np.filled((10, 10), np.nan) > np.filled_like(my_arr, np.nan) > or > np.filled(np.nan, (10, 10)) > np.filled_like(np.nan, my_arr) > ?) I remember there has been a reluctance in the past to add functions that were two-liners. I guess the problem might be that the namespace fills up with many similar things. Is this a worry? Best, Matthew From charlesr.harris at gmail.com Sun Jan 13 13:33:51 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 13 Jan 2013 11:33:51 -0700 Subject: [Numpy-discussion] How many build systems do we need? 
In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 10:26 AM, David Cournapeau wrote: > On Sun, Jan 13, 2013 at 11:11 AM, Ralf Gommers > wrote: > > > > > > > > On Sun, Jan 13, 2013 at 5:25 PM, David Cournapeau > > wrote: > >> > >> On Sun, Jan 13, 2013 at 9:50 AM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Sun, Jan 13, 2013 at 6:44 AM, David Cournapeau > > >> > wrote: > >> >> > >> >> On Sun, Jan 13, 2013 at 7:29 AM, Ralf Gommers < > ralf.gommers at gmail.com> > >> >> wrote: > >> >> > > >> >> > > >> >> > > >> >> > On Sun, Jan 13, 2013 at 6:34 AM, Charles R Harris > >> >> > wrote: > >> >> >> > >> >> >> Hi All, > >> >> >> > >> >> >> In the continuing proposal for cleanups, note that we currently > >> >> >> support > >> >> >> three (3!) build systems, distutils, scons, and bento. That's a > bit > >> >> >> much to > >> >> >> maintain when contemplating changes, and scons and bento both have > >> >> >> external > >> >> >> dependencies. Can we dispense with any of these? Thoughts? > >> >> > > >> >> > > >> >> > Numscons is the only one that can be dropped. I'm still using it > >> >> > regularly, > >> >> > but the few things it does better than bento can be easily improved > >> >> > in > >> >> > bento. So if removing numscons support from master saves some > >> >> > developer > >> >> > hours, +1 from me. > >> >> > >> >> I think numscons was already scheduled to be dropped in 1.7 (and next > >> >> version of scipy as well) ? I am certainly in favor of dropping it as > >> >> well. > >> > > >> > > >> > Is bento documented anywhere or can you commit to keeping it working > for > >> > numpy? > >> > >> Both. Doc: http://bento.readthedocs.org/en/latest/ and tests > >> (https://travis-ci.org/cournape/Bento) are continuously run/updated. > > > > > > That's bento's own test suite only - we should add a numpy build with > Bento > > for at least Python 2.7 to the numpy Travis config. > > Definitely. I was merely answering Chuck's worries that bento may just > be a one man, undocumented, bitrotted thing :) > > Tsk, tsk, I would never use such extreme language ;) I put up a PR expunging SCons support from numpy, could you take a look at it? There was also a file related to mingw builds on windows (in cpu_id I think) that has no bento equivalent. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jan 13 14:03:22 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 13 Jan 2013 12:03:22 -0700 Subject: [Numpy-discussion] 1.8 release Message-ID: Now that 1.7 is nearing release, it's time to look forward to the 1.8 release. I'd like us to get back to the twice yearly schedule that we tried to maintain through the 1.3 - 1.6 releases, so I propose a June release as a goal. Call it the Spring Cleaning release. As to content, I'd like to see the following. Removal of Python 2.4-2.5 support. Removal of SCons support. The index work consolidated. Initial stab at removing the need for 2to3. See Pauli's PR for scipy. Miscellaneous enhancements and fixes. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaakko.luttinen at aalto.fi Sun Jan 13 17:46:49 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Mon, 14 Jan 2013 00:46:49 +0200 Subject: [Numpy-discussion] numpydoc for python 3? 
In-Reply-To: <50EEDB50.5000902@aalto.fi> References: <50EDA996.6090806@aalto.fi> <50EED62B.9010105@aalto.fi> <50EEDB50.5000902@aalto.fi> Message-ID: <50F33959.7030203@aalto.fi> On 2013-01-10 17:16, Jaakko Luttinen wrote: > On 01/10/2013 05:04 PM, Pauli Virtanen wrote: >> Jaakko Luttinen aalto.fi> writes: >>> The files in numpy/doc/sphinxext/ and numpydoc/ (from PyPI) are a bit >>> different. Which ones should be modified? >> >> The stuff in sphinxext/ is the development version of the package on >> PyPi, so the changes should be made in sphinxext/ >> > > Thanks! > > I'm trying to run the tests with Python 2 using nosetests, but I get > some errors http://pastebin.com/Mp9i8T2f . Am I doing something wrong? > How should I run the tests? > If I run nosetests on the numpydoc folder from PyPI, all the tests are > successful. I'm a bit stuck trying to make numpydoc Python 3 compatible. I made setup.py try to use distutils.command.build_py.build_py_2to3 in order to transform installed code automatically to Python 3. However, the tests (in tests folder) are not part of the package but rather package_data, so they won't get transformed. How can I automatically transform the tests too? Probably there is some easy and "right" solution to this, but I haven't been able to figure out a nice and simple solution.. Any ideas? Thanks. -Jaakko From matthew.brett at gmail.com Sun Jan 13 17:53:12 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 13 Jan 2013 22:53:12 +0000 Subject: [Numpy-discussion] numpydoc for python 3? In-Reply-To: <50F33959.7030203@aalto.fi> References: <50EDA996.6090806@aalto.fi> <50EED62B.9010105@aalto.fi> <50EEDB50.5000902@aalto.fi> <50F33959.7030203@aalto.fi> Message-ID: Hi, On Sun, Jan 13, 2013 at 10:46 PM, Jaakko Luttinen wrote: > On 2013-01-10 17:16, Jaakko Luttinen wrote: >> On 01/10/2013 05:04 PM, Pauli Virtanen wrote: >>> Jaakko Luttinen aalto.fi> writes: >>>> The files in numpy/doc/sphinxext/ and numpydoc/ (from PyPI) are a bit >>>> different. Which ones should be modified? >>> >>> The stuff in sphinxext/ is the development version of the package on >>> PyPi, so the changes should be made in sphinxext/ >>> >> >> Thanks! >> >> I'm trying to run the tests with Python 2 using nosetests, but I get >> some errors http://pastebin.com/Mp9i8T2f . Am I doing something wrong? >> How should I run the tests? >> If I run nosetests on the numpydoc folder from PyPI, all the tests are >> successful. > > I'm a bit stuck trying to make numpydoc Python 3 compatible. I made > setup.py try to use distutils.command.build_py.build_py_2to3 in order to > transform installed code automatically to Python 3. However, the tests > (in tests folder) are not part of the package but rather package_data, > so they won't get transformed. How can I automatically transform the > tests too? Probably there is some easy and "right" solution to this, but > I haven't been able to figure out a nice and simple solution.. Any > ideas? Thanks. Can you add tests as a package 'numpydoc.tests' and add an __init__.py file to the 'tests' directory? You might be able to get away without 2to3, using the kind of stuff that Pauli has used for scipy recently: https://github.com/scipy/scipy/pull/397 I'm happy to help over email or chat, just let me know. 
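A minimal setup.py sketch of that arrangement, with the package layout and names only illustrative: once the tests live in a numpydoc.tests sub-package listed under packages, build_py_2to3 converts them together with the rest of the code instead of copying them verbatim as package_data.

# setup.py sketch -- package layout and names are illustrative
import sys
from distutils.core import setup

if sys.version_info[0] >= 3:
    # build_py_2to3 runs 2to3 over every module listed in `packages`,
    # so the tests are converted along with the main code.
    from distutils.command.build_py import build_py_2to3 as build_py
else:
    from distutils.command.build_py import build_py

setup(
    name='numpydoc',
    version='0.0',                               # placeholder
    packages=['numpydoc', 'numpydoc.tests'],     # tests/ gains an __init__.py
    cmdclass={'build_py': build_py},
)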
Best, Matthew From efiring at hawaii.edu Sun Jan 13 18:02:01 2013 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 13 Jan 2013 13:02:01 -1000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: <50F33CE9.6020707@hawaii.edu> On 2013/01/13 7:27 AM, Nathaniel Smith wrote: > Hi all, > > PR 2875 adds two new functions, that generalize zeros(), ones(), > zeros_like(), ones_like(), by simply taking an arbitrary fill value: > https://github.com/numpy/numpy/pull/2875 > So > np.ones((10, 10)) > is the same as > np.filled((10, 10), 1) > > The implementations are trivial, but the API seems useful because it > provides an idiomatic way of efficiently creating an array full of > inf, or nan, or None, whatever funny value you need. All the > alternatives are either inefficient (np.ones(...) * np.inf) or > cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But > there's a question of taste here; one could argue instead that these > just add more clutter to the numpy namespace. So, before we merge, > anyone want to chime in? I'm neutral to negative as to whether it is worth adding these to the namespace; I don't mind using the "cumbersome" alternative. Note also that there is already a numpy.ma.filled() function for quite a different purpose, so putting a filled() in numpy breaks the pattern that ma has masked versions of most numpy functions. This consideration actually tips me quite a bit toward the negative side. I don't think I am unique in relying heavily on masked arrays. > > (Bonus, extra bike-sheddy survey: do people prefer > np.filled((10, 10), np.nan) > np.filled_like(my_arr, np.nan) +1 for this form if you decide to do it despite the problem mentioned above. > or > np.filled(np.nan, (10, 10)) > np.filled_like(np.nan, my_arr) This one is particularly bad for filled_like, therefore bad for both. Eric > ?) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Sun Jan 13 18:24:48 2013 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 14 Jan 2013 00:24:48 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith wrote: > Hi all, > > PR 2875 adds two new functions, that generalize zeros(), ones(), > zeros_like(), ones_like(), by simply taking an arbitrary fill value: > https://github.com/numpy/numpy/pull/2875 > So > np.ones((10, 10)) > is the same as > np.filled((10, 10), 1) > > The implementations are trivial, but the API seems useful because it > provides an idiomatic way of efficiently creating an array full of > inf, or nan, or None, whatever funny value you need. All the > alternatives are either inefficient (np.ones(...) * np.inf) or > cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But > there's a question of taste here; one could argue instead that these > just add more clutter to the numpy namespace. So, before we merge, > anyone want to chime in? 
One alternative that does not expand the API with two-liners is to let the ndarray.fill() method return self: a = np.empty(...).fill(20.0) -- Robert Kern From njs at pobox.com Sun Jan 13 18:26:43 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 13 Jan 2013 23:26:43 +0000 Subject: [Numpy-discussion] 1.8 release In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 7:03 PM, Charles R Harris wrote: > Now that 1.7 is nearing release, it's time to look forward to the 1.8 > release. I'd like us to get back to the twice yearly schedule that we tried > to maintain through the 1.3 - 1.6 releases, so I propose a June release as a > goal. Call it the Spring Cleaning release. As to content, I'd like to see > the following. > > Removal of Python 2.4-2.5 support. > Removal of SCons support. > The index work consolidated. > Initial stab at removing the need for 2to3. See Pauli's PR for scipy. > Miscellaneous enhancements and fixes. I'd actually like to propose a faster release cycle than this, even. Perhaps 3 months between releases; 2 months from release n to the first beta of n+1? The consequences would be: * Changes get out to users faster. * Each release is smaller, so it's easier for downstream projects to adjust to each release -- instead of having this giant pile of changes to work through all at once every 6-12 months * End-users are less scared of updating, because the changes aren't so overwhelming, so they end up actually testing (and getting to take advantage of) the new stuff more. * We get feedback more quickly, so we can fix up whatever we break while we still know what we did. * And for larger changes, if we release them incrementally, we can get feedback before we've gone miles down the wrong path. * Releases come out on time more often -- sort of paradoxical, but with small, frequent releases, beta cycles go smoother, and it's easier to say "don't worry, I'll get it ready for next time", or "right, that patch was less done than we thought, let's take it out for now" (also this is much easier if we don't have another years worth of changes committed on top of the patch!). * If your schedule does slip, then you still end up with a <6 month release cycle. 1.6.x was branched from master in March 2011 and released in May 2011. 1.7.x was branched from master in July 2012 and still isn't out. But at least we've finally found and fixed the second to last bug! Wouldn't it be nice to have a 2-4 week beta cycle that only found trivial and expected problems? We *already* have 6 months worth of feature work in master that won't be in the *next* release. Note 1: if we do do this, then we'll also want to rethink the deprecation cycle a bit -- right now we've sort of vaguely been saying "well, we'll deprecate it in release n and take it out in n+1. Whenever that is". 3 months definitely isn't long enough for a deprecation period, so if we do do this then we'll want to deprecate things for multiple releases before actually removing them. Details to be determined. Note 2: in this kind of release schedule, you definitely don't want to say "here are the features that will be in the next release!", because then you end up slipping and sliding all over the place. Instead you say "here are some things that I want to work on next, and we'll see which release they end up in". Since we're already following the rule that nothing goes into master until it's done and tested and ready for release anyway, this doesn't really change much. Thoughts? 
-n From matthew.brett at gmail.com Sun Jan 13 18:28:05 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 13 Jan 2013 23:28:05 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern wrote: > On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith wrote: >> Hi all, >> >> PR 2875 adds two new functions, that generalize zeros(), ones(), >> zeros_like(), ones_like(), by simply taking an arbitrary fill value: >> https://github.com/numpy/numpy/pull/2875 >> So >> np.ones((10, 10)) >> is the same as >> np.filled((10, 10), 1) >> >> The implementations are trivial, but the API seems useful because it >> provides an idiomatic way of efficiently creating an array full of >> inf, or nan, or None, whatever funny value you need. All the >> alternatives are either inefficient (np.ones(...) * np.inf) or >> cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But >> there's a question of taste here; one could argue instead that these >> just add more clutter to the numpy namespace. So, before we merge, >> anyone want to chime in? > > One alternative that does not expand the API with two-liners is to let > the ndarray.fill() method return self: > > a = np.empty(...).fill(20.0) Nice. Matthew From njs at pobox.com Sun Jan 13 18:39:09 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 13 Jan 2013 23:39:09 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern wrote: > On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith wrote: >> Hi all, >> >> PR 2875 adds two new functions, that generalize zeros(), ones(), >> zeros_like(), ones_like(), by simply taking an arbitrary fill value: >> https://github.com/numpy/numpy/pull/2875 >> So >> np.ones((10, 10)) >> is the same as >> np.filled((10, 10), 1) >> >> The implementations are trivial, but the API seems useful because it >> provides an idiomatic way of efficiently creating an array full of >> inf, or nan, or None, whatever funny value you need. All the >> alternatives are either inefficient (np.ones(...) * np.inf) or >> cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But >> there's a question of taste here; one could argue instead that these >> just add more clutter to the numpy namespace. So, before we merge, >> anyone want to chime in? > > One alternative that does not expand the API with two-liners is to let > the ndarray.fill() method return self: > > a = np.empty(...).fill(20.0) This violates the convention that in-place operations never return self, to avoid confusion with out-of-place operations. E.g. ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus np.sort(), and in the broader Python world, list.sort() versus sorted(), list.reverse() versus reversed(). (This was an explicit reason given for list.sort to not return self, even.) Maybe enabling this idiom is a good enough reason to break the convention ("Special cases aren't special enough to break the rules. / Although practicality beats purity"), but it at least makes me -0 on this... (The nice thing about np.filled() is that it makes np.zeros() and np.ones() feel like clutter, rather than the reverse... not that I'm suggesting ever getting rid of them, but it makes the API conceptually feel smaller, not larger.) 
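For reference, a minimal sketch of what the proposed pair boils down to (the actual PR may differ in signature details such as dtype and order handling):

import numpy as np

def filled(shape, fill_value, dtype=None):
    # allocate without initializing, then fill in a single pass
    a = np.empty(shape, dtype=dtype)
    a.fill(fill_value)
    return a

def filled_like(arr, fill_value, dtype=None):
    a = np.empty_like(arr, dtype=dtype)
    a.fill(fill_value)
    return a

# zeros() and ones() then read as special cases:
assert np.array_equal(filled((10, 10), 1), np.ones((10, 10)))
assert np.isnan(filled((3, 3), np.nan)).all()

The helpers are deliberately thin wrappers around empty() plus ndarray.fill(), which is why the thread is mostly about naming and namespace rather than implementation.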
-n From jsseabold at gmail.com Sun Jan 13 18:48:10 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Sun, 13 Jan 2013 18:48:10 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 6:39 PM, Nathaniel Smith wrote: > On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern > wrote: > > On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith wrote: > >> Hi all, > >> > >> PR 2875 adds two new functions, that generalize zeros(), ones(), > >> zeros_like(), ones_like(), by simply taking an arbitrary fill value: > >> https://github.com/numpy/numpy/pull/2875 > >> So > >> np.ones((10, 10)) > >> is the same as > >> np.filled((10, 10), 1) > >> > >> The implementations are trivial, but the API seems useful because it > >> provides an idiomatic way of efficiently creating an array full of > >> inf, or nan, or None, whatever funny value you need. All the > >> alternatives are either inefficient (np.ones(...) * np.inf) or > >> cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But > >> there's a question of taste here; one could argue instead that these > >> just add more clutter to the numpy namespace. So, before we merge, > >> anyone want to chime in? > > > > One alternative that does not expand the API with two-liners is to let > > the ndarray.fill() method return self: > > > > a = np.empty(...).fill(20.0) > > This violates the convention that in-place operations never return > self, to avoid confusion with out-of-place operations. E.g. > ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus > np.sort(), and in the broader Python world, list.sort() versus > sorted(), list.reverse() versus reversed(). (This was an explicit > reason given for list.sort to not return self, even.) > > Maybe enabling this idiom is a good enough reason to break the > convention ("Special cases aren't special enough to break the rules. / > Although practicality beats purity"), but it at least makes me -0 on > this... > > I tend to agree with the notion that inplace operations shouldn't return self, but I don't know if it's just because I've been conditioned this way. Not returning self breaks the fluid interface pattern [1], as noted in a similar discussion on pandas [2], FWIW, though there's likely some way to have both worlds. Skipper [1] https://en.wikipedia.org/wiki/Fluent_interface [2] https://github.com/pydata/pandas/issues/1893 -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Sun Jan 13 18:54:50 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sun, 13 Jan 2013 18:54:50 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: <50F3494A.7000602@gmail.com> > On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern wrote: >> One alternative that does not expand the API with two-liners is to let >> the ndarray.fill() method return self: >> >> a = np.empty(...).fill(20.0) > On 1/13/2013 6:39 PM, Nathaniel Smith wrote: > This violates the convention that in-place operations never return > self, to avoid confusion with out-of-place operations. Strongly agree. It is not worth a violation to save two keystrokes: "\na". (Three or four for a longer name, given name completion.) 
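For context on the alternative being debated, a small sketch of current behaviour: ndarray.fill() operates in place and returns None, so the one-liner only becomes possible if that return value is changed.

import numpy as np

a = np.empty((2, 2)).fill(20.0)
print(a)         # prints None: fill() works in place and returns nothing

b = np.empty((2, 2))
b.fill(20.0)     # the two-step spelling that works today
print(b)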
Alan Isaac From njs at pobox.com Sun Jan 13 19:04:59 2013 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 14 Jan 2013 00:04:59 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 11:48 PM, Skipper Seabold wrote: > On Sun, Jan 13, 2013 at 6:39 PM, Nathaniel Smith wrote: >> >> On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern >> wrote: >> > On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith wrote: >> >> Hi all, >> >> >> >> PR 2875 adds two new functions, that generalize zeros(), ones(), >> >> zeros_like(), ones_like(), by simply taking an arbitrary fill value: >> >> https://github.com/numpy/numpy/pull/2875 >> >> So >> >> np.ones((10, 10)) >> >> is the same as >> >> np.filled((10, 10), 1) >> >> >> >> The implementations are trivial, but the API seems useful because it >> >> provides an idiomatic way of efficiently creating an array full of >> >> inf, or nan, or None, whatever funny value you need. All the >> >> alternatives are either inefficient (np.ones(...) * np.inf) or >> >> cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But >> >> there's a question of taste here; one could argue instead that these >> >> just add more clutter to the numpy namespace. So, before we merge, >> >> anyone want to chime in? >> > >> > One alternative that does not expand the API with two-liners is to let >> > the ndarray.fill() method return self: >> > >> > a = np.empty(...).fill(20.0) >> >> This violates the convention that in-place operations never return >> self, to avoid confusion with out-of-place operations. E.g. >> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus >> np.sort(), and in the broader Python world, list.sort() versus >> sorted(), list.reverse() versus reversed(). (This was an explicit >> reason given for list.sort to not return self, even.) >> >> Maybe enabling this idiom is a good enough reason to break the >> convention ("Special cases aren't special enough to break the rules. / >> Although practicality beats purity"), but it at least makes me -0 on >> this... >> > > I tend to agree with the notion that inplace operations shouldn't return > self, but I don't know if it's just because I've been conditioned this way. > Not returning self breaks the fluid interface pattern [1], as noted in a > similar discussion on pandas [2], FWIW, though there's likely some way to > have both worlds. Ah-hah, here's the email where Guide officially proclaims that there shall be no "fluent interface" nonsense applied to in-place operators in Python, because it hurts readability (at least for Dutch people ;-)): http://mail.python.org/pipermail/python-dev/2003-October/038855.html -n From charlesr.harris at gmail.com Sun Jan 13 19:14:27 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 13 Jan 2013 17:14:27 -0700 Subject: [Numpy-discussion] 1.8 release In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 4:26 PM, Nathaniel Smith wrote: > On Sun, Jan 13, 2013 at 7:03 PM, Charles R Harris > wrote: > > Now that 1.7 is nearing release, it's time to look forward to the 1.8 > > release. I'd like us to get back to the twice yearly schedule that we > tried > > to maintain through the 1.3 - 1.6 releases, so I propose a June release > as a > > goal. Call it the Spring Cleaning release. As to content, I'd like to see > > the following. > > > > Removal of Python 2.4-2.5 support. > > Removal of SCons support. > > The index work consolidated. 
> > Initial stab at removing the need for 2to3. See Pauli's PR for scipy. > > Miscellaneous enhancements and fixes. > > I'd actually like to propose a faster release cycle than this, even. > Perhaps 3 months between releases; 2 months from release n to the > first beta of n+1? > > The consequences would be: > * Changes get out to users faster. > * Each release is smaller, so it's easier for downstream projects to > adjust to each release -- instead of having this giant pile of changes > to work through all at once every 6-12 months > * End-users are less scared of updating, because the changes aren't so > overwhelming, so they end up actually testing (and getting to take > advantage of) the new stuff more. > * We get feedback more quickly, so we can fix up whatever we break > while we still know what we did. > * And for larger changes, if we release them incrementally, we can get > feedback before we've gone miles down the wrong path. > * Releases come out on time more often -- sort of paradoxical, but > with small, frequent releases, beta cycles go smoother, and it's > easier to say "don't worry, I'll get it ready for next time", or > "right, that patch was less done than we thought, let's take it out > for now" (also this is much easier if we don't have another years > worth of changes committed on top of the patch!). > * If your schedule does slip, then you still end up with a <6 month > release cycle. > > 1.6.x was branched from master in March 2011 and released in May 2011. > 1.7.x was branched from master in July 2012 and still isn't out. But > at least we've finally found and fixed the second to last bug! > > Actually, the first branch was late Dec 2011, IIRC, maybe Feb 2012. We've had about a year delay and I'm not convinced it was worth it. > Wouldn't it be nice to have a 2-4 week beta cycle that only found > trivial and expected problems? We *already* have 6 months worth of > feature work in master that won't be in the *next* release. > > Note 1: if we do do this, then we'll also want to rethink the > deprecation cycle a bit -- right now we've sort of vaguely been saying > "well, we'll deprecate it in release n and take it out in n+1. > Whenever that is". 3 months definitely isn't long enough for a > deprecation period, so if we do do this then we'll want to deprecate > things for multiple releases before actually removing them. Details to > be determined. > > Deprecations should probably be time based. > Note 2: in this kind of release schedule, you definitely don't want to > say "here are the features that will be in the next release!", because > then you end up slipping and sliding all over the place. Instead you > say "here are some things that I want to work on next, and we'll see > which release they end up in". Since we're already following the rule > that nothing goes into master until it's done and tested and ready for > release anyway, this doesn't really change much. > > Thoughts? > > I think three months is a bit short. Much will depend on the release manager and I not sure what Andrej's plans are. I'd happily nominate you for that role ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Sun Jan 13 19:19:59 2013 From: cournape at gmail.com (David Cournapeau) Date: Sun, 13 Jan 2013 18:19:59 -0600 Subject: [Numpy-discussion] 1.8 release In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 5:26 PM, Nathaniel Smith wrote: > On Sun, Jan 13, 2013 at 7:03 PM, Charles R Harris > wrote: >> Now that 1.7 is nearing release, it's time to look forward to the 1.8 >> release. I'd like us to get back to the twice yearly schedule that we tried >> to maintain through the 1.3 - 1.6 releases, so I propose a June release as a >> goal. Call it the Spring Cleaning release. As to content, I'd like to see >> the following. >> >> Removal of Python 2.4-2.5 support. >> Removal of SCons support. >> The index work consolidated. >> Initial stab at removing the need for 2to3. See Pauli's PR for scipy. >> Miscellaneous enhancements and fixes. > > I'd actually like to propose a faster release cycle than this, even. > Perhaps 3 months between releases; 2 months from release n to the > first beta of n+1? > > The consequences would be: > * Changes get out to users faster. > * Each release is smaller, so it's easier for downstream projects to > adjust to each release -- instead of having this giant pile of changes > to work through all at once every 6-12 months > * End-users are less scared of updating, because the changes aren't so > overwhelming, so they end up actually testing (and getting to take > advantage of) the new stuff more. > * We get feedback more quickly, so we can fix up whatever we break > while we still know what we did. > * And for larger changes, if we release them incrementally, we can get > feedback before we've gone miles down the wrong path. > * Releases come out on time more often -- sort of paradoxical, but > with small, frequent releases, beta cycles go smoother, and it's > easier to say "don't worry, I'll get it ready for next time", or > "right, that patch was less done than we thought, let's take it out > for now" (also this is much easier if we don't have another years > worth of changes committed on top of the patch!). > * If your schedule does slip, then you still end up with a <6 month > release cycle. > > 1.6.x was branched from master in March 2011 and released in May 2011. > 1.7.x was branched from master in July 2012 and still isn't out. But > at least we've finally found and fixed the second to last bug! > > Wouldn't it be nice to have a 2-4 week beta cycle that only found > trivial and expected problems? We *already* have 6 months worth of > feature work in master that won't be in the *next* release. > > Note 1: if we do do this, then we'll also want to rethink the > deprecation cycle a bit -- right now we've sort of vaguely been saying > "well, we'll deprecate it in release n and take it out in n+1. > Whenever that is". 3 months definitely isn't long enough for a > deprecation period, so if we do do this then we'll want to deprecate > things for multiple releases before actually removing them. Details to > be determined. > > Note 2: in this kind of release schedule, you definitely don't want to > say "here are the features that will be in the next release!", because > then you end up slipping and sliding all over the place. Instead you > say "here are some things that I want to work on next, and we'll see > which release they end up in". Since we're already following the rule > that nothing goes into master until it's done and tested and ready for > release anyway, this doesn't really change much. > > Thoughts? 
Hey, my time to have a time-machine: http://mail.scipy.org/pipermail/numpy-discussion/2008-May/033754.html I still think it is a good idea :) cheers, David From nadavh at visionsense.com Mon Jan 14 01:21:36 2013 From: nadavh at visionsense.com (Nadav Horesh) Date: Mon, 14 Jan 2013 06:21:36 +0000 Subject: [Numpy-discussion] phase unwrapping (1d) In-Reply-To: References: Message-ID: There is an unwrap function in numpy. Doesn't it work for you? Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] on behalf of Neal Becker [ndbecker2 at gmail.com] Sent: 11 January 2013 17:40 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] phase unwrapping (1d) np.unwrap was too slow, so I rolled by own (in c++). I wanted to be able to handle the case of unwrap (arg (x1) + arg (x2)) Here, phase can change by more than 2pi. I came up with the following algorithm, any thoughts? In the following, y is normally set to pi. o points to output i points to input nint1 finds nearest integer value_t prev_o = init; for (; i != e; ++i, ++o) { *o = cnt * 2 * y + *i; value_t delta = *o - prev_o; if (delta / y > 1 or delta / y < -1) { int i = nint1 (delta / (2*y)); *o -= 2*y*i; cnt -= i; } prev_o = *o; } _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Mon Jan 14 01:59:24 2013 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 14 Jan 2013 07:59:24 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Mon, Jan 14, 2013 at 1:04 AM, Nathaniel Smith wrote: > On Sun, Jan 13, 2013 at 11:48 PM, Skipper Seabold wrote: >> On Sun, Jan 13, 2013 at 6:39 PM, Nathaniel Smith wrote: >>> >>> On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern >>> wrote: >>> > On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith wrote: >>> >> Hi all, >>> >> >>> >> PR 2875 adds two new functions, that generalize zeros(), ones(), >>> >> zeros_like(), ones_like(), by simply taking an arbitrary fill value: >>> >> https://github.com/numpy/numpy/pull/2875 >>> >> So >>> >> np.ones((10, 10)) >>> >> is the same as >>> >> np.filled((10, 10), 1) >>> >> >>> >> The implementations are trivial, but the API seems useful because it >>> >> provides an idiomatic way of efficiently creating an array full of >>> >> inf, or nan, or None, whatever funny value you need. All the >>> >> alternatives are either inefficient (np.ones(...) * np.inf) or >>> >> cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But >>> >> there's a question of taste here; one could argue instead that these >>> >> just add more clutter to the numpy namespace. So, before we merge, >>> >> anyone want to chime in? >>> > >>> > One alternative that does not expand the API with two-liners is to let >>> > the ndarray.fill() method return self: >>> > >>> > a = np.empty(...).fill(20.0) >>> >>> This violates the convention that in-place operations never return >>> self, to avoid confusion with out-of-place operations. E.g. >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus >>> np.sort(), and in the broader Python world, list.sort() versus >>> sorted(), list.reverse() versus reversed(). (This was an explicit >>> reason given for list.sort to not return self, even.) 
>>> >>> Maybe enabling this idiom is a good enough reason to break the >>> convention ("Special cases aren't special enough to break the rules. / >>> Although practicality beats purity"), but it at least makes me -0 on >>> this... >>> >> >> I tend to agree with the notion that inplace operations shouldn't return >> self, but I don't know if it's just because I've been conditioned this way. >> Not returning self breaks the fluid interface pattern [1], as noted in a >> similar discussion on pandas [2], FWIW, though there's likely some way to >> have both worlds. > > Ah-hah, here's the email where Guide officially proclaims that there > shall be no "fluent interface" nonsense applied to in-place operators > in Python, because it hurts readability (at least for Dutch people > ;-)): > http://mail.python.org/pipermail/python-dev/2003-October/038855.html That's a statement about the policy for the stdlib, and just one person's opinion. You, and numpy, are permitted to have a different opinion. In any case, I'm not strongly advocating for it. It's violation of principle ("no fluent interfaces") is roughly in the same ballpark as np.filled() ("not every two-liner needs its own function"), so I thought I would toss it out there for consideration. -- Robert Kern From dave.hirschfeld at gmail.com Mon Jan 14 04:02:36 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Mon, 14 Jan 2013 09:02:36 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?New_numpy_functions=3A_filled=2C_fil?= =?utf-8?q?led=5Flike?= References: Message-ID: Robert Kern gmail.com> writes: > > >>> > > >>> > One alternative that does not expand the API with two-liners is to let > >>> > the ndarray.fill() method return self: > >>> > > >>> > a = np.empty(...).fill(20.0) > >>> > >>> This violates the convention that in-place operations never return > >>> self, to avoid confusion with out-of-place operations. E.g. > >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus > >>> np.sort(), and in the broader Python world, list.sort() versus > >>> sorted(), list.reverse() versus reversed(). (This was an explicit > >>> reason given for list.sort to not return self, even.) > >>> > >>> Maybe enabling this idiom is a good enough reason to break the > >>> convention ("Special cases aren't special enough to break the rules. / > >>> Although practicality beats purity"), but it at least makes me -0 on > >>> this... > >>> > >> > >> I tend to agree with the notion that inplace operations shouldn't return > >> self, but I don't know if it's just because I've been conditioned this way. > >> Not returning self breaks the fluid interface pattern [1], as noted in a > >> similar discussion on pandas [2], FWIW, though there's likely some way to > >> have both worlds. > > > > Ah-hah, here's the email where Guide officially proclaims that there > > shall be no "fluent interface" nonsense applied to in-place operators > > in Python, because it hurts readability (at least for Dutch people > > ): > > http://mail.python.org/pipermail/python-dev/2003-October/038855.html > > That's a statement about the policy for the stdlib, and just one > person's opinion. You, and numpy, are permitted to have a different > opinion. > > In any case, I'm not strongly advocating for it. It's violation of > principle ("no fluent interfaces") is roughly in the same ballpark as > np.filled() ("not every two-liner needs its own function"), so I > thought I would toss it out there for consideration. > > -- > Robert Kern > FWIW I'm +1 on the idea. 
Perhaps because I just don't see many practical downsides to breaking the convention but I regularly see a big issue with there being no way to instantiate an array with a particular value. The one obvious way to do it is use ones and multiply by the value you want. I work with a lot of inexperienced programmers and I see this idiom all the time. It takes a fair amount of numpy knowledge to know that you should do it in two lines by using empty and setting a slice. In [1]: %timeit NaN*ones(10000) 1000 loops, best of 3: 1.74 ms per loop In [2]: %%timeit ...: x = empty(10000, dtype=float) ...: x[:] = NaN ...: 10000 loops, best of 3: 28 us per loop In [3]: 1.74e-3/28e-6 Out[3]: 62.142857142857146 Even when not in the mythical "tight loop" setting an array to one and then multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude slower than what we know they *should* be doing. I'm agnostic as to whether fill should be modified or new functions provided but I think numpy is currently missing this functionality and that providing it would save a lot of new users from shooting themselves in the foot performance- wise. -Dave From jaakko.luttinen at aalto.fi Mon Jan 14 05:35:43 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Mon, 14 Jan 2013 12:35:43 +0200 Subject: [Numpy-discussion] numpydoc for python 3? In-Reply-To: References: <50EDA996.6090806@aalto.fi> <50EED62B.9010105@aalto.fi> <50EEDB50.5000902@aalto.fi> <50F33959.7030203@aalto.fi> Message-ID: <50F3DF7F.3060600@aalto.fi> On 01/14/2013 12:53 AM, Matthew Brett wrote: > On Sun, Jan 13, 2013 at 10:46 PM, Jaakko Luttinen > wrote: >> I'm a bit stuck trying to make numpydoc Python 3 compatible. I made >> setup.py try to use distutils.command.build_py.build_py_2to3 in order to >> transform installed code automatically to Python 3. However, the tests >> (in tests folder) are not part of the package but rather package_data, >> so they won't get transformed. How can I automatically transform the >> tests too? Probably there is some easy and "right" solution to this, but >> I haven't been able to figure out a nice and simple solution.. Any >> ideas? Thanks. > > Can you add tests as a package 'numpydoc.tests' and add an __init__.py > file to the 'tests' directory? I thought there is some reason why the 'tests' directory is not added as a package 'numpydoc.tests', so I didn't want to take that route. > You might be able to get away without 2to3, using the kind of stuff > that Pauli has used for scipy recently: > > https://github.com/scipy/scipy/pull/397 Ok, thanks, maybe I'll try to make the tests valid in all Python versions. It seems there's only one line which I'm not able to transform. In doc/sphinxext/tests/test_docscrape.py, on line 559: assert doc['Summary'][0] == u'?????????????'.encode('utf-8') This is invalid in Python 3.0-3.2. How could I write this in such a way that it is valid in all Python versions? I'm a bit lost with these unicode encodings in Python (and in general).. And I didn't want to add dependency on 'six' package. Regards, Jaakko From pierre.haessig at crans.org Mon Jan 14 06:04:57 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 14 Jan 2013 12:04:57 +0100 Subject: [Numpy-discussion] numpydoc for python 3? 
In-Reply-To: <50F3DF7F.3060600@aalto.fi> References: <50EDA996.6090806@aalto.fi> <50EED62B.9010105@aalto.fi> <50EEDB50.5000902@aalto.fi> <50F33959.7030203@aalto.fi> <50F3DF7F.3060600@aalto.fi> Message-ID: <50F3E659.9070402@crans.org> Hi, Le 14/01/2013 11:35, Jaakko Luttinen a ?crit : > Ok, thanks, maybe I'll try to make the tests valid in all Python > versions. It seems there's only one line which I'm not able to transform. > > In doc/sphinxext/tests/test_docscrape.py, on line 559: > assert doc['Summary'][0] == u'?????????????'.encode('utf-8') > > This is invalid in Python 3.0-3.2. How could I write this in such a way > that it is valid in all Python versions? I'm a bit lost with these > unicode encodings in Python (and in general).. And I didn't want to add > dependency on 'six' package. Just as a side note about Python and encodings, I found great help in watching (by chance) the PyCon 2012 presentation "Pragmatic Unicode or How do I stop the Pain ?" by Ned Batchelder : http://nedbatchelder.com/text/unipain.html Now, if I understand the problem correctly, the u'xxx' syntax was reintroduced in Python 3.3 specifically to enhance the 2to3 compatibility (http://docs.python.org/3/whatsnew/3.3.html#pep-414-explicit-unicode-literals). Maybe the question is then whether it's worth supporting Python 3.0-3.2 or not ? Also, one possible rewrite of the test could be to replace the unicode string with the corresponding utf8-encoded bytes : assert doc['Summary'][0] == b'\xc3\xb6\xc3\xa4\xc3\xb6\xc3\xa4\xc3\xb6\xc3\xa4\xc3\xb6\xc3\xa4\xc3\xb6\xc3\xa5\xc3\xa5\xc3\xa5\xc3\xa5' # output of '?????????????'.encode('utf-8') (One restriction : I think the b'' prefix was introduced in Python 2.6) I'm not sure for the readability though... Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From pelson.pub at gmail.com Mon Jan 14 06:59:28 2013 From: pelson.pub at gmail.com (Phil Elson) Date: Mon, 14 Jan 2013 11:59:28 +0000 Subject: [Numpy-discussion] 1.8 release In-Reply-To: References: Message-ID: I tried to suggest this for our matplotlib development cycle, but it didn't get the roaring response I was hoping for (even though I was being conservative by suggesting a 8-9 month release time): http://matplotlib.1069221.n5.nabble.com/strategy-for-1-2-x-master-PEP8-changes-tp39453p39465.html In essence, I think there is a lot of benefit in getting releases out quicker. The biggest downside, IMHO, is that those who package the binary releases have to work more frequently on what is not a particularly glamorous task. For those who are worried about the quality of releases being diminished by releasing more frequently, an LTS approach could also work. Good luck on getting these frequent releases going, IMHO there is a lot to be said for having users on the latest and greatest, rather than have users on old versions & still finding bug which were introduced 24 months ago and fixed 12 months ago on master... Cheers, Phil On 14 January 2013 00:19, David Cournapeau wrote: > On Sun, Jan 13, 2013 at 5:26 PM, Nathaniel Smith wrote: > > On Sun, Jan 13, 2013 at 7:03 PM, Charles R Harris > > wrote: > >> Now that 1.7 is nearing release, it's time to look forward to the 1.8 > >> release. I'd like us to get back to the twice yearly schedule that we > tried > >> to maintain through the 1.3 - 1.6 releases, so I propose a June release > as a > >> goal. Call it the Spring Cleaning release. 
As to content, I'd like to > see > >> the following. > >> > >> Removal of Python 2.4-2.5 support. > >> Removal of SCons support. > >> The index work consolidated. > >> Initial stab at removing the need for 2to3. See Pauli's PR for scipy. > >> Miscellaneous enhancements and fixes. > > > > I'd actually like to propose a faster release cycle than this, even. > > Perhaps 3 months between releases; 2 months from release n to the > > first beta of n+1? > > > > The consequences would be: > > * Changes get out to users faster. > > * Each release is smaller, so it's easier for downstream projects to > > adjust to each release -- instead of having this giant pile of changes > > to work through all at once every 6-12 months > > * End-users are less scared of updating, because the changes aren't so > > overwhelming, so they end up actually testing (and getting to take > > advantage of) the new stuff more. > > * We get feedback more quickly, so we can fix up whatever we break > > while we still know what we did. > > * And for larger changes, if we release them incrementally, we can get > > feedback before we've gone miles down the wrong path. > > * Releases come out on time more often -- sort of paradoxical, but > > with small, frequent releases, beta cycles go smoother, and it's > > easier to say "don't worry, I'll get it ready for next time", or > > "right, that patch was less done than we thought, let's take it out > > for now" (also this is much easier if we don't have another years > > worth of changes committed on top of the patch!). > > * If your schedule does slip, then you still end up with a <6 month > > release cycle. > > > > 1.6.x was branched from master in March 2011 and released in May 2011. > > 1.7.x was branched from master in July 2012 and still isn't out. But > > at least we've finally found and fixed the second to last bug! > > > > Wouldn't it be nice to have a 2-4 week beta cycle that only found > > trivial and expected problems? We *already* have 6 months worth of > > feature work in master that won't be in the *next* release. > > > > Note 1: if we do do this, then we'll also want to rethink the > > deprecation cycle a bit -- right now we've sort of vaguely been saying > > "well, we'll deprecate it in release n and take it out in n+1. > > Whenever that is". 3 months definitely isn't long enough for a > > deprecation period, so if we do do this then we'll want to deprecate > > things for multiple releases before actually removing them. Details to > > be determined. > > > > Note 2: in this kind of release schedule, you definitely don't want to > > say "here are the features that will be in the next release!", because > > then you end up slipping and sliding all over the place. Instead you > > say "here are some things that I want to work on next, and we'll see > > which release they end up in". Since we're already following the rule > > that nothing goes into master until it's done and tested and ready for > > release anyway, this doesn't really change much. > > > > Thoughts? > > Hey, my time to have a time-machine: > http://mail.scipy.org/pipermail/numpy-discussion/2008-May/033754.html > > I still think it is a good idea :) > > cheers, > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Mon Jan 14 07:18:49 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 14 Jan 2013 12:18:49 +0000 Subject: [Numpy-discussion] 1.8 release In-Reply-To: References: Message-ID: Hi, On Mon, Jan 14, 2013 at 12:19 AM, David Cournapeau wrote: > On Sun, Jan 13, 2013 at 5:26 PM, Nathaniel Smith wrote: >> On Sun, Jan 13, 2013 at 7:03 PM, Charles R Harris >> wrote: >>> Now that 1.7 is nearing release, it's time to look forward to the 1.8 >>> release. I'd like us to get back to the twice yearly schedule that we tried >>> to maintain through the 1.3 - 1.6 releases, so I propose a June release as a >>> goal. Call it the Spring Cleaning release. As to content, I'd like to see >>> the following. >>> >>> Removal of Python 2.4-2.5 support. >>> Removal of SCons support. >>> The index work consolidated. >>> Initial stab at removing the need for 2to3. See Pauli's PR for scipy. >>> Miscellaneous enhancements and fixes. >> >> I'd actually like to propose a faster release cycle than this, even. >> Perhaps 3 months between releases; 2 months from release n to the >> first beta of n+1? >> >> The consequences would be: >> * Changes get out to users faster. >> * Each release is smaller, so it's easier for downstream projects to >> adjust to each release -- instead of having this giant pile of changes >> to work through all at once every 6-12 months >> * End-users are less scared of updating, because the changes aren't so >> overwhelming, so they end up actually testing (and getting to take >> advantage of) the new stuff more. >> * We get feedback more quickly, so we can fix up whatever we break >> while we still know what we did. >> * And for larger changes, if we release them incrementally, we can get >> feedback before we've gone miles down the wrong path. >> * Releases come out on time more often -- sort of paradoxical, but >> with small, frequent releases, beta cycles go smoother, and it's >> easier to say "don't worry, I'll get it ready for next time", or >> "right, that patch was less done than we thought, let's take it out >> for now" (also this is much easier if we don't have another years >> worth of changes committed on top of the patch!). >> * If your schedule does slip, then you still end up with a <6 month >> release cycle. >> >> 1.6.x was branched from master in March 2011 and released in May 2011. >> 1.7.x was branched from master in July 2012 and still isn't out. But >> at least we've finally found and fixed the second to last bug! >> >> Wouldn't it be nice to have a 2-4 week beta cycle that only found >> trivial and expected problems? We *already* have 6 months worth of >> feature work in master that won't be in the *next* release. >> >> Note 1: if we do do this, then we'll also want to rethink the >> deprecation cycle a bit -- right now we've sort of vaguely been saying >> "well, we'll deprecate it in release n and take it out in n+1. >> Whenever that is". 3 months definitely isn't long enough for a >> deprecation period, so if we do do this then we'll want to deprecate >> things for multiple releases before actually removing them. Details to >> be determined. >> >> Note 2: in this kind of release schedule, you definitely don't want to >> say "here are the features that will be in the next release!", because >> then you end up slipping and sliding all over the place. Instead you >> say "here are some things that I want to work on next, and we'll see >> which release they end up in". 
Since we're already following the rule >> that nothing goes into master until it's done and tested and ready for >> release anyway, this doesn't really change much. >> >> Thoughts? > > Hey, my time to have a time-machine: > http://mail.scipy.org/pipermail/numpy-discussion/2008-May/033754.html > > I still think it is a good idea :) I guess it is the release manager who has by far the largest say in this. Who will that be for the next year or so? Best, Matthew From pierre.haessig at crans.org Mon Jan 14 07:38:46 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 14 Jan 2013 13:38:46 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: <50F3FC56.8000100@crans.org> Hi, Le 14/01/2013 00:39, Nathaniel Smith a ?crit : > (The nice thing about np.filled() is that it makes np.zeros() and > np.ones() feel like clutter, rather than the reverse... not that I'm > suggesting ever getting rid of them, but it makes the API conceptually > feel smaller, not larger.) Coming from the Matlab syntax, I feel that np.zeros and np.ones are in numpy for Matlab (and maybe others ?) compatibilty and are useful for that. Now that I've been "enlightened" by Python, I think that those functions (especially np.ones) are indeed clutter. Therefore I favor the introduction of these two new functions. However, I think Eric's remark about masked array API compatibility is important. I don't know what other names are possible ? np.const ? Or maybe np.tile is also useful for that same purpose ? In that case adding a dtype argument to np.tile would be useful. best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From matthew.brett at gmail.com Mon Jan 14 07:44:40 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 14 Jan 2013 12:44:40 +0000 Subject: [Numpy-discussion] numpydoc for python 3? In-Reply-To: <50F3DF7F.3060600@aalto.fi> References: <50EDA996.6090806@aalto.fi> <50EED62B.9010105@aalto.fi> <50EEDB50.5000902@aalto.fi> <50F33959.7030203@aalto.fi> <50F3DF7F.3060600@aalto.fi> Message-ID: Hi, On Mon, Jan 14, 2013 at 10:35 AM, Jaakko Luttinen wrote: > On 01/14/2013 12:53 AM, Matthew Brett wrote: >> On Sun, Jan 13, 2013 at 10:46 PM, Jaakko Luttinen >> wrote: >>> I'm a bit stuck trying to make numpydoc Python 3 compatible. I made >>> setup.py try to use distutils.command.build_py.build_py_2to3 in order to >>> transform installed code automatically to Python 3. However, the tests >>> (in tests folder) are not part of the package but rather package_data, >>> so they won't get transformed. How can I automatically transform the >>> tests too? Probably there is some easy and "right" solution to this, but >>> I haven't been able to figure out a nice and simple solution.. Any >>> ideas? Thanks. >> >> Can you add tests as a package 'numpydoc.tests' and add an __init__.py >> file to the 'tests' directory? > > I thought there is some reason why the 'tests' directory is not added as > a package 'numpydoc.tests', so I didn't want to take that route. I think the only reason is so that people can't import 'numpydoc.tests' in case they get confused. We (nipy.org/nipy etc) used to use packagedata for tests, but then we lost interest in preventing people doing the import, and started to enjoy being able to port things across as packages, do relative imports, run 2to3 and so on. So, I'd just go for it. 
>> You might be able to get away without 2to3, using the kind of stuff >> that Pauli has used for scipy recently: >> >> https://github.com/scipy/scipy/pull/397 > > Ok, thanks, maybe I'll try to make the tests valid in all Python > versions. It seems there's only one line which I'm not able to transform. > > In doc/sphinxext/tests/test_docscrape.py, on line 559: > assert doc['Summary'][0] == u'?????????????'.encode('utf-8') > > This is invalid in Python 3.0-3.2. How could I write this in such a way > that it is valid in all Python versions? I'm a bit lost with these > unicode encodings in Python (and in general).. And I didn't want to add > dependency on 'six' package. Pierre's suggestion is good; you can also do something like this: # -*- coding: utf8 -*- import sys if sys.version_info[0] >= 3: a = '?????????????' else: a = unicode('?????????????', 'utf8') The 'coding' line has to be the first or second line in the file. Best, Matthew From ndbecker2 at gmail.com Mon Jan 14 07:56:59 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 14 Jan 2013 07:56:59 -0500 Subject: [Numpy-discussion] phase unwrapping (1d) References: Message-ID: Nadav Horesh wrote: > There is an unwrap function in numpy. Doesn't it work for you? > Like I had said, np.unwrap was too slow. Profiling showed it eating up an absurd proportion of time. My c++ code was much better (although still surprisingly slow). From pierre.haessig at crans.org Mon Jan 14 08:08:34 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 14 Jan 2013 14:08:34 +0100 Subject: [Numpy-discussion] phase unwrapping (1d) In-Reply-To: References: Message-ID: <50F40352.9090603@crans.org> Hi Neal, Le 11/01/2013 16:40, Neal Becker a ?crit : > I wanted to be able to handle the case of > > unwrap (arg (x1) + arg (x2)) > > Here, phase can change by more than 2pi. It's not clear to me what you mean by "change more than 2pi" ? Do you mean that the consecutive points of in input can increase by more than 2pi ? If that's the case, I feel like there is no a priori information in the data to detect such a "giant leap". Also, I copy-paste here for reference the numpy.wrap code from [1] : def unwrap(p, discont=pi, axis=-1): p = asarray(p) nd = len(p.shape) dd = diff(p, axis=axis) slice1 = [slice(None, None)]*nd # full slices slice1[axis] = slice(1, None) ddmod = mod(dd+pi, 2*pi)-pi _nx.copyto(ddmod, pi, where=(ddmod==-pi) & (dd > 0)) ph_correct = ddmod - dd; _nx.copyto(ph_correct, 0, where=abs(dd) From mike.r.anderson.13 at gmail.com Mon Jan 14 08:56:35 2013 From: mike.r.anderson.13 at gmail.com (Mike Anderson) Date: Mon, 14 Jan 2013 21:56:35 +0800 Subject: [Numpy-discussion] Insights / lessons learned from NumPy design In-Reply-To: References: Message-ID: Just wanted to say a big thanks to everyone in the NumPy community who has commented on this topic - it's given us a lot to think about and a lot of good ideas to work into the design! Best regards, Mike. On 4 January 2013 14:29, Mike Anderson wrote: > Hello all, > > In the Clojure community there has been some discussion about creating a > common matrix maths library / API. Currently there are a few different > fledgeling matrix libraries in Clojure, so it seemed like a worthwhile > effort to unify them and have a common base on which to build on. > > NumPy has been something of an inspiration for this, so I though I'd ask > here to see what lessons have been learned. > > We're thinking of a matrix library with roughly the following design > (subject to change!) 
> - Support for multi-dimensional matrices (but with fast paths for 1D > vectors and 2D matrices as the common cases) > - Immutability by default, i.e. matrix operations are pure functions that > create new matrices. There could be a "backdoor" option to mutate matrices, > but that would be unidiomatic in Clojure > - Support for 64-bit double precision floats only (this is the standard > float type in Clojure) > - Ability to support multiple different back-end matrix implementations > (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.) > - A full range of matrix operations. Operations would be delegated to back > end implementations where they are supported, otherwise generic > implementations could be used. > > Any thoughts on this topic based on the NumPy experience? In particular > would be very interesting to know: > - Features in NumPy which proved to be redundant / not worth the effort > - Features that you wish had been designed in at the start > - Design decisions that turned out to be a particularly big mistake / > success > > Would love to hear your insights, any ideas+advice greatly appreciated! > > Mike. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Mon Jan 14 09:39:34 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 14 Jan 2013 09:39:34 -0500 Subject: [Numpy-discussion] phase unwrapping (1d) References: <50F40352.9090603@crans.org> Message-ID: This code should explain all: -------------------------------- import numpy as np arg = np.angle def nint (x): return int (x + 0.5) if x >= 0 else int (x - 0.5) def unwrap (inp, y=np.pi, init=0, cnt=0): o = np.empty_like (inp) prev_o = init for i in range (len (inp)): o[i] = cnt * 2 * y + inp[i] delta = o[i] - prev_o if delta / y > 1 or delta / y < -1: n = nint (delta / (2*y)) o[i] -= 2*y*n cnt -= n prev_o = o[i] return o u = np.linspace (0, 400, 100) * np.pi/100 v = np.cos (u) + 1j * np.sin (u) plot (arg(v)) plot (arg(v) + arg (v)) plot (unwrap (arg (v))) plot (unwrap (arg (v) + arg (v))) ------------------------------- Pierre Haessig wrote: > Hi Neal, > > Le 11/01/2013 16:40, Neal Becker a ?crit : >> I wanted to be able to handle the case of >> >> unwrap (arg (x1) + arg (x2)) >> >> Here, phase can change by more than 2pi. > It's not clear to me what you mean by "change more than 2pi" ? Do you > mean that the consecutive points of in input can increase by more than > 2pi ? If that's the case, I feel like there is no a priori information > in the data to detect such a "giant leap". > > Also, I copy-paste here for reference the numpy.wrap code from [1] : > > def unwrap(p, discont=pi, axis=-1): > p = asarray(p) > nd = len(p.shape) > dd = diff(p, axis=axis) > slice1 = [slice(None, None)]*nd # full slices > slice1[axis] = slice(1, None) > ddmod = mod(dd+pi, 2*pi)-pi > _nx.copyto(ddmod, pi, where=(ddmod==-pi) & (dd > 0)) > ph_correct = ddmod - dd; > _nx.copyto(ph_correct, 0, where=abs(dd) up = array(p, copy=True, dtype='d') > up[slice1] = p[slice1] + ph_correct.cumsum(axis) > return up > > I don't know why it's too slow though. It looks well vectorized. > > Coming back to your C algorithm, I'm not C guru so that I don't have a > clear picture of what it's doing. Do you have a Python prototype ? 
> > Best, > Pierre > > [1] > https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L1117 From ben.root at ou.edu Mon Jan 14 09:57:17 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 14 Jan 2013 09:57:17 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: <50F3FC56.8000100@crans.org> References: <50F3FC56.8000100@crans.org> Message-ID: On Mon, Jan 14, 2013 at 7:38 AM, Pierre Haessig wrote: > Hi, > > Le 14/01/2013 00:39, Nathaniel Smith a ?crit : > > (The nice thing about np.filled() is that it makes np.zeros() and > > np.ones() feel like clutter, rather than the reverse... not that I'm > > suggesting ever getting rid of them, but it makes the API conceptually > > feel smaller, not larger.) > Coming from the Matlab syntax, I feel that np.zeros and np.ones are in > numpy for Matlab (and maybe others ?) compatibilty and are useful for > that. Now that I've been "enlightened" by Python, I think that those > functions (especially np.ones) are indeed clutter. Therefore I favor the > introduction of these two new functions. > > However, I think Eric's remark about masked array API compatibility is > important. I don't know what other names are possible ? np.const ? > > Or maybe np.tile is also useful for that same purpose ? In that case > adding a dtype argument to np.tile would be useful. > > best, > Pierre > > I am also +1 on the idea of having a filled() and filled_like() function (I learned a long time ago to just do a = np.empty() and a.fill() rather than the multiplication trick I learned from Matlab). However, the collision with the masked array API is a non-starter for me. np.const() and np.const_like() probably make the most sense, but I would prefer a verb over a noun. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Mon Jan 14 10:12:47 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Mon, 14 Jan 2013 10:12:47 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F3FC56.8000100@crans.org> Message-ID: Why not optimize NumPy to detect a mul of an ndarray by a scalar to call fill? That way, "np.empty * 2" will be as fast as "x=np.empty; x.fill(2)"? Fred On Mon, Jan 14, 2013 at 9:57 AM, Benjamin Root wrote: > > > On Mon, Jan 14, 2013 at 7:38 AM, Pierre Haessig > wrote: >> >> Hi, >> >> Le 14/01/2013 00:39, Nathaniel Smith a ?crit : >> > (The nice thing about np.filled() is that it makes np.zeros() and >> > np.ones() feel like clutter, rather than the reverse... not that I'm >> > suggesting ever getting rid of them, but it makes the API conceptually >> > feel smaller, not larger.) >> Coming from the Matlab syntax, I feel that np.zeros and np.ones are in >> numpy for Matlab (and maybe others ?) compatibilty and are useful for >> that. Now that I've been "enlightened" by Python, I think that those >> functions (especially np.ones) are indeed clutter. Therefore I favor the >> introduction of these two new functions. >> >> However, I think Eric's remark about masked array API compatibility is >> important. I don't know what other names are possible ? np.const ? >> >> Or maybe np.tile is also useful for that same purpose ? In that case >> adding a dtype argument to np.tile would be useful. 
>> >> best, >> Pierre >> > > I am also +1 on the idea of having a filled() and filled_like() function (I > learned a long time ago to just do a = np.empty() and a.fill() rather than > the multiplication trick I learned from Matlab). However, the collision > with the masked array API is a non-starter for me. np.const() and > np.const_like() probably make the most sense, but I would prefer a verb over > a noun. > > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robince at gmail.com Mon Jan 14 10:21:57 2013 From: robince at gmail.com (Robin) Date: Mon, 14 Jan 2013 15:21:57 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F3FC56.8000100@crans.org> Message-ID: On Mon, Jan 14, 2013 at 2:57 PM, Benjamin Root wrote: > I am also +1 on the idea of having a filled() and filled_like() function (I > learned a long time ago to just do a = np.empty() and a.fill() rather than > the multiplication trick I learned from Matlab). However, the collision > with the masked array API is a non-starter for me. np.const() and > np.const_like() probably make the most sense, but I would prefer a verb over > a noun. To get an array of 1's, you call np.ones(shape), to get an array of 0's you call np.zeros(shape) so to get an array of val's why not call np.vals(shape, val)? Cheers Robins From robert.kern at gmail.com Mon Jan 14 10:32:27 2013 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 14 Jan 2013 16:32:27 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F3FC56.8000100@crans.org> Message-ID: On Mon, Jan 14, 2013 at 4:12 PM, Fr?d?ric Bastien wrote: > Why not optimize NumPy to detect a mul of an ndarray by a scalar to > call fill? That way, "np.empty * 2" will be as fast as "x=np.empty; > x.fill(2)"? In general, each element of an array will be different, so the result of the multiplication will be different, so fill can not be used. -- Robert Kern From matthew.brett at gmail.com Mon Jan 14 10:35:49 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 14 Jan 2013 15:35:49 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: Hi, On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld wrote: > Robert Kern gmail.com> writes: > >> >> >>> > >> >>> > One alternative that does not expand the API with two-liners is to let >> >>> > the ndarray.fill() method return self: >> >>> > >> >>> > a = np.empty(...).fill(20.0) >> >>> >> >>> This violates the convention that in-place operations never return >> >>> self, to avoid confusion with out-of-place operations. E.g. >> >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus >> >>> np.sort(), and in the broader Python world, list.sort() versus >> >>> sorted(), list.reverse() versus reversed(). (This was an explicit >> >>> reason given for list.sort to not return self, even.) >> >>> >> >>> Maybe enabling this idiom is a good enough reason to break the >> >>> convention ("Special cases aren't special enough to break the rules. / >> >>> Although practicality beats purity"), but it at least makes me -0 on >> >>> this... >> >>> >> >> >> >> I tend to agree with the notion that inplace operations shouldn't return >> >> self, but I don't know if it's just because I've been conditioned this way. 
>> >> Not returning self breaks the fluid interface pattern [1], as noted in a >> >> similar discussion on pandas [2], FWIW, though there's likely some way to >> >> have both worlds. >> > >> > Ah-hah, here's the email where Guide officially proclaims that there >> > shall be no "fluent interface" nonsense applied to in-place operators >> > in Python, because it hurts readability (at least for Dutch people >> > ): >> > http://mail.python.org/pipermail/python-dev/2003-October/038855.html >> >> That's a statement about the policy for the stdlib, and just one >> person's opinion. You, and numpy, are permitted to have a different >> opinion. >> >> In any case, I'm not strongly advocating for it. It's violation of >> principle ("no fluent interfaces") is roughly in the same ballpark as >> np.filled() ("not every two-liner needs its own function"), so I >> thought I would toss it out there for consideration. >> >> -- >> Robert Kern >> > > FWIW I'm +1 on the idea. Perhaps because I just don't see many practical > downsides to breaking the convention but I regularly see a big issue with there > being no way to instantiate an array with a particular value. > > The one obvious way to do it is use ones and multiply by the value you want. I > work with a lot of inexperienced programmers and I see this idiom all the time. > It takes a fair amount of numpy knowledge to know that you should do it in two > lines by using empty and setting a slice. > > In [1]: %timeit NaN*ones(10000) > 1000 loops, best of 3: 1.74 ms per loop > > In [2]: %%timeit > ...: x = empty(10000, dtype=float) > ...: x[:] = NaN > ...: > 10000 loops, best of 3: 28 us per loop > > In [3]: 1.74e-3/28e-6 > Out[3]: 62.142857142857146 > > > Even when not in the mythical "tight loop" setting an array to one and then > multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude slower > than what we know they *should* be doing. > > I'm agnostic as to whether fill should be modified or new functions provided but > I think numpy is currently missing this functionality and that providing it > would save a lot of new users from shooting themselves in the foot performance- > wise. Is this a fair summary? => fill(shape, val), fill_like(arr, val) - new functions, as proposed For: readable, seems to fit a pattern often used, presence in namespace may clue people into using the 'fill' rather than * val or + val Con: a very simple alias for a = ones(shape) ; a.fill(val), maybe cluttering already full namespace. => empty(shape).fill(val) - by allowing return value from arr.fill(val) For: readable Con: breaks guideline not to return anything from in-place operations, no presence in namespace means users may not find this pattern. => no new API For : easy maintenance Con : harder for users to discover fill pattern, filling a new array requires two lines instead of one. So maybe the decision rests on: How important is it that users see these function names in the namespace in order to discover the pattern "a = ones(shape) ; a.fill(val)"? How important is it to obey guidelines for no-return-from-in-place? How important is it to avoid expanding the namespace? How common is this pattern? On the last, I'd say that the only common use I have for this pattern is to fill an array with NaN. 
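To make that NaN case concrete, here is a minimal sketch of the three spellings discussed
in this thread -- the last one is only the API proposed in PR 2875 and does not exist in
numpy today:

import numpy as np

shape = (1000, 1000)

# the obvious one-liner: allocates an array of ones, then does a full multiply pass
a = np.nan * np.ones(shape)

# the fast spelling available today: allocate uninitialized, then fill in place
b = np.empty(shape)
b.fill(np.nan)                # or equivalently: b[:] = np.nan

# the proposed one-liner (commented out, since np.filled() is not in numpy yet)
# c = np.filled(shape, np.nan)

All three produce the same array; the difference is only how discoverable and how fast
the idiom is.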
Cheers, Matthew From shish at keba.be Mon Jan 14 11:15:30 2013 From: shish at keba.be (Olivier Delalleau) Date: Mon, 14 Jan 2013 11:15:30 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: 2013/1/14 Matthew Brett : > Hi, > > On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld > wrote: >> Robert Kern gmail.com> writes: >> >>> >>> >>> > >>> >>> > One alternative that does not expand the API with two-liners is to let >>> >>> > the ndarray.fill() method return self: >>> >>> > >>> >>> > a = np.empty(...).fill(20.0) >>> >>> >>> >>> This violates the convention that in-place operations never return >>> >>> self, to avoid confusion with out-of-place operations. E.g. >>> >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus >>> >>> np.sort(), and in the broader Python world, list.sort() versus >>> >>> sorted(), list.reverse() versus reversed(). (This was an explicit >>> >>> reason given for list.sort to not return self, even.) >>> >>> >>> >>> Maybe enabling this idiom is a good enough reason to break the >>> >>> convention ("Special cases aren't special enough to break the rules. / >>> >>> Although practicality beats purity"), but it at least makes me -0 on >>> >>> this... >>> >>> >>> >> >>> >> I tend to agree with the notion that inplace operations shouldn't return >>> >> self, but I don't know if it's just because I've been conditioned this way. >>> >> Not returning self breaks the fluid interface pattern [1], as noted in a >>> >> similar discussion on pandas [2], FWIW, though there's likely some way to >>> >> have both worlds. >>> > >>> > Ah-hah, here's the email where Guide officially proclaims that there >>> > shall be no "fluent interface" nonsense applied to in-place operators >>> > in Python, because it hurts readability (at least for Dutch people >>> > ): >>> > http://mail.python.org/pipermail/python-dev/2003-October/038855.html >>> >>> That's a statement about the policy for the stdlib, and just one >>> person's opinion. You, and numpy, are permitted to have a different >>> opinion. >>> >>> In any case, I'm not strongly advocating for it. It's violation of >>> principle ("no fluent interfaces") is roughly in the same ballpark as >>> np.filled() ("not every two-liner needs its own function"), so I >>> thought I would toss it out there for consideration. >>> >>> -- >>> Robert Kern >>> >> >> FWIW I'm +1 on the idea. Perhaps because I just don't see many practical >> downsides to breaking the convention but I regularly see a big issue with there >> being no way to instantiate an array with a particular value. >> >> The one obvious way to do it is use ones and multiply by the value you want. I >> work with a lot of inexperienced programmers and I see this idiom all the time. >> It takes a fair amount of numpy knowledge to know that you should do it in two >> lines by using empty and setting a slice. >> >> In [1]: %timeit NaN*ones(10000) >> 1000 loops, best of 3: 1.74 ms per loop >> >> In [2]: %%timeit >> ...: x = empty(10000, dtype=float) >> ...: x[:] = NaN >> ...: >> 10000 loops, best of 3: 28 us per loop >> >> In [3]: 1.74e-3/28e-6 >> Out[3]: 62.142857142857146 >> >> >> Even when not in the mythical "tight loop" setting an array to one and then >> multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude slower >> than what we know they *should* be doing. 
>> >> I'm agnostic as to whether fill should be modified or new functions provided but >> I think numpy is currently missing this functionality and that providing it >> would save a lot of new users from shooting themselves in the foot performance- >> wise. > > Is this a fair summary? > > => fill(shape, val), fill_like(arr, val) - new functions, as proposed > For: readable, seems to fit a pattern often used, presence in > namespace may clue people into using the 'fill' rather than * val or + > val > Con: a very simple alias for a = ones(shape) ; a.fill(val), maybe > cluttering already full namespace. > > => empty(shape).fill(val) - by allowing return value from arr.fill(val) > For: readable > Con: breaks guideline not to return anything from in-place operations, > no presence in namespace means users may not find this pattern. > > => no new API > For : easy maintenance > Con : harder for users to discover fill pattern, filling a new array > requires two lines instead of one. > > So maybe the decision rests on: > > How important is it that users see these function names in the > namespace in order to discover the pattern "a = ones(shape) ; > a.fill(val)"? > > How important is it to obey guidelines for no-return-from-in-place? > > How important is it to avoid expanding the namespace? > > How common is this pattern? > > On the last, I'd say that the only common use I have for this pattern > is to fill an array with NaN. My 2 cts from a user perspective: - +1 to have such a function. I usually use numpy.ones * scalar because honestly, spending two lines of code for such a basic operations seems like a waste. Even if it's slower and potentially dangerous due to casting rules. - I think having a noun rather than a verb makes more sense since we have numpy.ones and numpy.zeros (and I always read "numpy.empty" as "give me an empty array", not "empty an array"). - I agree the name collision with np.ma.filled is a problem. I have no better suggestion though at this point. -=- Olivier From josef.pktd at gmail.com Mon Jan 14 11:22:40 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 14 Jan 2013 11:22:40 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Mon, Jan 14, 2013 at 11:15 AM, Olivier Delalleau wrote: > 2013/1/14 Matthew Brett : >> Hi, >> >> On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld >> wrote: >>> Robert Kern gmail.com> writes: >>> >>>> >>>> >>> > >>>> >>> > One alternative that does not expand the API with two-liners is to let >>>> >>> > the ndarray.fill() method return self: >>>> >>> > >>>> >>> > a = np.empty(...).fill(20.0) >>>> >>> >>>> >>> This violates the convention that in-place operations never return >>>> >>> self, to avoid confusion with out-of-place operations. E.g. >>>> >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus >>>> >>> np.sort(), and in the broader Python world, list.sort() versus >>>> >>> sorted(), list.reverse() versus reversed(). (This was an explicit >>>> >>> reason given for list.sort to not return self, even.) >>>> >>> >>>> >>> Maybe enabling this idiom is a good enough reason to break the >>>> >>> convention ("Special cases aren't special enough to break the rules. / >>>> >>> Although practicality beats purity"), but it at least makes me -0 on >>>> >>> this... >>>> >>> >>>> >> >>>> >> I tend to agree with the notion that inplace operations shouldn't return >>>> >> self, but I don't know if it's just because I've been conditioned this way. 
>>>> >> Not returning self breaks the fluid interface pattern [1], as noted in a >>>> >> similar discussion on pandas [2], FWIW, though there's likely some way to >>>> >> have both worlds. >>>> > >>>> > Ah-hah, here's the email where Guide officially proclaims that there >>>> > shall be no "fluent interface" nonsense applied to in-place operators >>>> > in Python, because it hurts readability (at least for Dutch people >>>> > ): >>>> > http://mail.python.org/pipermail/python-dev/2003-October/038855.html >>>> >>>> That's a statement about the policy for the stdlib, and just one >>>> person's opinion. You, and numpy, are permitted to have a different >>>> opinion. >>>> >>>> In any case, I'm not strongly advocating for it. It's violation of >>>> principle ("no fluent interfaces") is roughly in the same ballpark as >>>> np.filled() ("not every two-liner needs its own function"), so I >>>> thought I would toss it out there for consideration. >>>> >>>> -- >>>> Robert Kern >>>> >>> >>> FWIW I'm +1 on the idea. Perhaps because I just don't see many practical >>> downsides to breaking the convention but I regularly see a big issue with there >>> being no way to instantiate an array with a particular value. >>> >>> The one obvious way to do it is use ones and multiply by the value you want. I >>> work with a lot of inexperienced programmers and I see this idiom all the time. >>> It takes a fair amount of numpy knowledge to know that you should do it in two >>> lines by using empty and setting a slice. >>> >>> In [1]: %timeit NaN*ones(10000) >>> 1000 loops, best of 3: 1.74 ms per loop >>> >>> In [2]: %%timeit >>> ...: x = empty(10000, dtype=float) >>> ...: x[:] = NaN >>> ...: >>> 10000 loops, best of 3: 28 us per loop >>> >>> In [3]: 1.74e-3/28e-6 >>> Out[3]: 62.142857142857146 >>> >>> >>> Even when not in the mythical "tight loop" setting an array to one and then >>> multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude slower >>> than what we know they *should* be doing. >>> >>> I'm agnostic as to whether fill should be modified or new functions provided but >>> I think numpy is currently missing this functionality and that providing it >>> would save a lot of new users from shooting themselves in the foot performance- >>> wise. >> >> Is this a fair summary? >> >> => fill(shape, val), fill_like(arr, val) - new functions, as proposed >> For: readable, seems to fit a pattern often used, presence in >> namespace may clue people into using the 'fill' rather than * val or + >> val >> Con: a very simple alias for a = ones(shape) ; a.fill(val), maybe >> cluttering already full namespace. >> >> => empty(shape).fill(val) - by allowing return value from arr.fill(val) >> For: readable >> Con: breaks guideline not to return anything from in-place operations, >> no presence in namespace means users may not find this pattern. >> >> => no new API >> For : easy maintenance >> Con : harder for users to discover fill pattern, filling a new array >> requires two lines instead of one. >> >> So maybe the decision rests on: >> >> How important is it that users see these function names in the >> namespace in order to discover the pattern "a = ones(shape) ; >> a.fill(val)"? >> >> How important is it to obey guidelines for no-return-from-in-place? >> >> How important is it to avoid expanding the namespace? >> >> How common is this pattern? >> >> On the last, I'd say that the only common use I have for this pattern >> is to fill an array with NaN. 
> > My 2 cts from a user perspective: > > - +1 to have such a function. I usually use numpy.ones * scalar > because honestly, spending two lines of code for such a basic > operations seems like a waste. Even if it's slower and potentially > dangerous due to casting rules. > - I think having a noun rather than a verb makes more sense since we > have numpy.ones and numpy.zeros (and I always read "numpy.empty" as > "give me an empty array", not "empty an array"). > - I agree the name collision with np.ma.filled is a problem. I have no > better suggestion though at this point. np.array_filled(shape, value, dtype) ? maybe more verbose, but unambiguous AFAICS BTW GAUSS http://en.wikipedia.org/wiki/GAUSS_(software) also has zeros and ones. 1st release 1984 np.array_filled((100, 2), -999, int) ? Josef > > -=- Olivier > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Mon Jan 14 11:45:29 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Mon, 14 Jan 2013 11:45:29 -0500 Subject: [Numpy-discussion] 1.8 release In-Reply-To: References: Message-ID: Hi, I don't volontear for the next release manager, but +1 for shorter releases. I heard just good comments from that. Also, I'm not sure it would ask more from the release manager. Do someone have an idea? The most work I do as a release manager for theano is the preparation/tests/release notes and this depend on the amont of new stuff. And this seam exponential on the number of new changes in the release, not linear (no data, just an impression...). Making smaller release make this easier. But yes, this mean more announces. But this isn't what take the most times. Also, doing the release notes more frequently mean it is more recent in memory when you check the PR merged, so it make it easier to do. But what prevent us from making shorter release? Oother priorities that can't wait, like work for papers to submit, or for collaboration with partners. just my 2cents. Fred On Mon, Jan 14, 2013 at 7:18 AM, Matthew Brett wrote: > Hi, > > On Mon, Jan 14, 2013 at 12:19 AM, David Cournapeau wrote: >> On Sun, Jan 13, 2013 at 5:26 PM, Nathaniel Smith wrote: >>> On Sun, Jan 13, 2013 at 7:03 PM, Charles R Harris >>> wrote: >>>> Now that 1.7 is nearing release, it's time to look forward to the 1.8 >>>> release. I'd like us to get back to the twice yearly schedule that we tried >>>> to maintain through the 1.3 - 1.6 releases, so I propose a June release as a >>>> goal. Call it the Spring Cleaning release. As to content, I'd like to see >>>> the following. >>>> >>>> Removal of Python 2.4-2.5 support. >>>> Removal of SCons support. >>>> The index work consolidated. >>>> Initial stab at removing the need for 2to3. See Pauli's PR for scipy. >>>> Miscellaneous enhancements and fixes. >>> >>> I'd actually like to propose a faster release cycle than this, even. >>> Perhaps 3 months between releases; 2 months from release n to the >>> first beta of n+1? >>> >>> The consequences would be: >>> * Changes get out to users faster. >>> * Each release is smaller, so it's easier for downstream projects to >>> adjust to each release -- instead of having this giant pile of changes >>> to work through all at once every 6-12 months >>> * End-users are less scared of updating, because the changes aren't so >>> overwhelming, so they end up actually testing (and getting to take >>> advantage of) the new stuff more. 
>>> * We get feedback more quickly, so we can fix up whatever we break >>> while we still know what we did. >>> * And for larger changes, if we release them incrementally, we can get >>> feedback before we've gone miles down the wrong path. >>> * Releases come out on time more often -- sort of paradoxical, but >>> with small, frequent releases, beta cycles go smoother, and it's >>> easier to say "don't worry, I'll get it ready for next time", or >>> "right, that patch was less done than we thought, let's take it out >>> for now" (also this is much easier if we don't have another years >>> worth of changes committed on top of the patch!). >>> * If your schedule does slip, then you still end up with a <6 month >>> release cycle. >>> >>> 1.6.x was branched from master in March 2011 and released in May 2011. >>> 1.7.x was branched from master in July 2012 and still isn't out. But >>> at least we've finally found and fixed the second to last bug! >>> >>> Wouldn't it be nice to have a 2-4 week beta cycle that only found >>> trivial and expected problems? We *already* have 6 months worth of >>> feature work in master that won't be in the *next* release. >>> >>> Note 1: if we do do this, then we'll also want to rethink the >>> deprecation cycle a bit -- right now we've sort of vaguely been saying >>> "well, we'll deprecate it in release n and take it out in n+1. >>> Whenever that is". 3 months definitely isn't long enough for a >>> deprecation period, so if we do do this then we'll want to deprecate >>> things for multiple releases before actually removing them. Details to >>> be determined. >>> >>> Note 2: in this kind of release schedule, you definitely don't want to >>> say "here are the features that will be in the next release!", because >>> then you end up slipping and sliding all over the place. Instead you >>> say "here are some things that I want to work on next, and we'll see >>> which release they end up in". Since we're already following the rule >>> that nothing goes into master until it's done and tested and ready for >>> release anyway, this doesn't really change much. >>> >>> Thoughts? >> >> Hey, my time to have a time-machine: >> http://mail.scipy.org/pipermail/numpy-discussion/2008-May/033754.html >> >> I still think it is a good idea :) > > I guess it is the release manager who has by far the largest say in > this. Who will that be for the next year or so? > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Mon Jan 14 11:55:39 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 14 Jan 2013 11:55:39 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Mon, Jan 14, 2013 at 11:22 AM, wrote: > On Mon, Jan 14, 2013 at 11:15 AM, Olivier Delalleau wrote: >> 2013/1/14 Matthew Brett : >>> Hi, >>> >>> On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld >>> wrote: >>>> Robert Kern gmail.com> writes: >>>> >>>>> >>>>> >>> > >>>>> >>> > One alternative that does not expand the API with two-liners is to let >>>>> >>> > the ndarray.fill() method return self: >>>>> >>> > >>>>> >>> > a = np.empty(...).fill(20.0) >>>>> >>> >>>>> >>> This violates the convention that in-place operations never return >>>>> >>> self, to avoid confusion with out-of-place operations. E.g. 
>>>>> >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus >>>>> >>> np.sort(), and in the broader Python world, list.sort() versus >>>>> >>> sorted(), list.reverse() versus reversed(). (This was an explicit >>>>> >>> reason given for list.sort to not return self, even.) >>>>> >>> >>>>> >>> Maybe enabling this idiom is a good enough reason to break the >>>>> >>> convention ("Special cases aren't special enough to break the rules. / >>>>> >>> Although practicality beats purity"), but it at least makes me -0 on >>>>> >>> this... >>>>> >>> >>>>> >> >>>>> >> I tend to agree with the notion that inplace operations shouldn't return >>>>> >> self, but I don't know if it's just because I've been conditioned this way. >>>>> >> Not returning self breaks the fluid interface pattern [1], as noted in a >>>>> >> similar discussion on pandas [2], FWIW, though there's likely some way to >>>>> >> have both worlds. >>>>> > >>>>> > Ah-hah, here's the email where Guide officially proclaims that there >>>>> > shall be no "fluent interface" nonsense applied to in-place operators >>>>> > in Python, because it hurts readability (at least for Dutch people >>>>> > ): >>>>> > http://mail.python.org/pipermail/python-dev/2003-October/038855.html >>>>> >>>>> That's a statement about the policy for the stdlib, and just one >>>>> person's opinion. You, and numpy, are permitted to have a different >>>>> opinion. >>>>> >>>>> In any case, I'm not strongly advocating for it. It's violation of >>>>> principle ("no fluent interfaces") is roughly in the same ballpark as >>>>> np.filled() ("not every two-liner needs its own function"), so I >>>>> thought I would toss it out there for consideration. >>>>> >>>>> -- >>>>> Robert Kern >>>>> >>>> >>>> FWIW I'm +1 on the idea. Perhaps because I just don't see many practical >>>> downsides to breaking the convention but I regularly see a big issue with there >>>> being no way to instantiate an array with a particular value. >>>> >>>> The one obvious way to do it is use ones and multiply by the value you want. I >>>> work with a lot of inexperienced programmers and I see this idiom all the time. >>>> It takes a fair amount of numpy knowledge to know that you should do it in two >>>> lines by using empty and setting a slice. >>>> >>>> In [1]: %timeit NaN*ones(10000) >>>> 1000 loops, best of 3: 1.74 ms per loop >>>> >>>> In [2]: %%timeit >>>> ...: x = empty(10000, dtype=float) >>>> ...: x[:] = NaN >>>> ...: >>>> 10000 loops, best of 3: 28 us per loop >>>> >>>> In [3]: 1.74e-3/28e-6 >>>> Out[3]: 62.142857142857146 >>>> >>>> >>>> Even when not in the mythical "tight loop" setting an array to one and then >>>> multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude slower >>>> than what we know they *should* be doing. >>>> >>>> I'm agnostic as to whether fill should be modified or new functions provided but >>>> I think numpy is currently missing this functionality and that providing it >>>> would save a lot of new users from shooting themselves in the foot performance- >>>> wise. >>> >>> Is this a fair summary? >>> >>> => fill(shape, val), fill_like(arr, val) - new functions, as proposed >>> For: readable, seems to fit a pattern often used, presence in >>> namespace may clue people into using the 'fill' rather than * val or + >>> val >>> Con: a very simple alias for a = ones(shape) ; a.fill(val), maybe >>> cluttering already full namespace. 
>>> >>> => empty(shape).fill(val) - by allowing return value from arr.fill(val) >>> For: readable >>> Con: breaks guideline not to return anything from in-place operations, >>> no presence in namespace means users may not find this pattern. >>> >>> => no new API >>> For : easy maintenance >>> Con : harder for users to discover fill pattern, filling a new array >>> requires two lines instead of one. >>> >>> So maybe the decision rests on: >>> >>> How important is it that users see these function names in the >>> namespace in order to discover the pattern "a = ones(shape) ; >>> a.fill(val)"? >>> >>> How important is it to obey guidelines for no-return-from-in-place? >>> >>> How important is it to avoid expanding the namespace? >>> >>> How common is this pattern? >>> >>> On the last, I'd say that the only common use I have for this pattern >>> is to fill an array with NaN. >> >> My 2 cts from a user perspective: >> >> - +1 to have such a function. I usually use numpy.ones * scalar >> because honestly, spending two lines of code for such a basic >> operations seems like a waste. Even if it's slower and potentially >> dangerous due to casting rules. >> - I think having a noun rather than a verb makes more sense since we >> have numpy.ones and numpy.zeros (and I always read "numpy.empty" as >> "give me an empty array", not "empty an array"). >> - I agree the name collision with np.ma.filled is a problem. I have no >> better suggestion though at this point. > > np.array_filled(shape, value, dtype) ? > maybe more verbose, but unambiguous AFAICS > > BTW > GAUSS http://en.wikipedia.org/wiki/GAUSS_(software) > also has zeros and ones. 1st release 1984 > > np.array_filled((100, 2), -999, int) ? A quick check of the statsmodels source 20 occassions of np.nan * np.ones(...) 50 occassions of np.emtpy a few filled with other values than nan many filled in a loop (optimistically, more often used by new contributers) It's just a two-liner, but if it's a function it hopefully produces better code. David's argument looks plausible to me. Josef > > Josef > > >> >> -=- Olivier >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan.isaac at gmail.com Mon Jan 14 12:15:12 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 14 Jan 2013 12:15:12 -0500 Subject: [Numpy-discussion] New numpy functions: vals and vals_like or filled, filled_like? In-Reply-To: References: Message-ID: <50F43D20.5070907@gmail.com> Just changing the subject line so a good suggestion does not get lost ... Alan From efiring at hawaii.edu Mon Jan 14 12:27:43 2013 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 14 Jan 2013 07:27:43 -1000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: <50F4400F.4040709@hawaii.edu> On 2013/01/14 6:15 AM, Olivier Delalleau wrote: > - I agree the name collision with np.ma.filled is a problem. I have no > better suggestion though at this point. How about "initialized()"? 
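Whatever name wins, the helper itself would presumably stay a very thin wrapper around
empty() plus fill(). A minimal sketch, with purely illustrative names (none of this
exists in numpy yet):

import numpy as np

def initialized(shape, fill_value, dtype=float):
    # allocate without initializing, then fill in place
    a = np.empty(shape, dtype=dtype)
    a.fill(fill_value)
    return a

def initialized_like(arr, fill_value, dtype=None):
    # same, but take the shape (and by default the dtype) from an existing array
    a = np.empty_like(arr, dtype=dtype)
    a.fill(fill_value)
    return a

# example use: an integer array of -999 placeholders, as suggested earlier in the thread
missing = initialized((100, 2), -999, dtype=int)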
From ben.root at ou.edu Mon Jan 14 12:33:52 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 14 Jan 2013 12:33:52 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: <50F4400F.4040709@hawaii.edu> References: <50F4400F.4040709@hawaii.edu> Message-ID: On Mon, Jan 14, 2013 at 12:27 PM, Eric Firing wrote: > On 2013/01/14 6:15 AM, Olivier Delalleau wrote: > > - I agree the name collision with np.ma.filled is a problem. I have no > > better suggestion though at this point. > > How about "initialized()"? > A verb! +1 from me! For those wondering, I have a personal rule that because functions *do* something, they really should have verbs for their names. I have to learn to read functions like "ones" and "empty" like "give me ones" or "give me an empty array". Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 14 12:56:35 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 14 Jan 2013 10:56:35 -0700 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 4:24 PM, Robert Kern wrote: > On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith wrote: > > Hi all, > > > > PR 2875 adds two new functions, that generalize zeros(), ones(), > > zeros_like(), ones_like(), by simply taking an arbitrary fill value: > > https://github.com/numpy/numpy/pull/2875 > > So > > np.ones((10, 10)) > > is the same as > > np.filled((10, 10), 1) > > > > The implementations are trivial, but the API seems useful because it > > provides an idiomatic way of efficiently creating an array full of > > inf, or nan, or None, whatever funny value you need. All the > > alternatives are either inefficient (np.ones(...) * np.inf) or > > cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But > > there's a question of taste here; one could argue instead that these > > just add more clutter to the numpy namespace. So, before we merge, > > anyone want to chime in? > > One alternative that does not expand the API with two-liners is to let > the ndarray.fill() method return self: > > a = np.empty(...).fill(20.0) > > My thought also. Shades of the Python `.sort` method... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.haessig at crans.org Mon Jan 14 13:12:37 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 14 Jan 2013 19:12:37 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> Message-ID: <50F44A95.2030202@crans.org> Le 14/01/2013 18:33, Benjamin Root a ?crit : > > > How about "initialized()"? > > > A verb! +1 from me! Shouldn't it be "initialize()" then ? I'm not so fond of it though, because initialize is pretty broad in the field of programming. What about "refurbishing" the already existing "tile()" function ? As of now it almost does the job : In [8]: tile(nan, (3,3)) # (it's a verb ! ) Out[8]: array([[ nan, nan, nan], [ nan, nan, nan], [ nan, nan, nan]]) though with two restrictions: * tile doesn't have a dtype keyword. Could this be added ? * tile performance on my computer seems to be twice as bad as "ones() * val" Best, Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From d.warde.farley at gmail.com Mon Jan 14 13:49:19 2013 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Mon, 14 Jan 2013 13:49:19 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F3FC56.8000100@crans.org> Message-ID: On Mon, Jan 14, 2013 at 9:57 AM, Benjamin Root wrote: > > > On Mon, Jan 14, 2013 at 7:38 AM, Pierre Haessig > wrote: >> >> Hi, >> >> Le 14/01/2013 00:39, Nathaniel Smith a ?crit : >> > (The nice thing about np.filled() is that it makes np.zeros() and >> > np.ones() feel like clutter, rather than the reverse... not that I'm >> > suggesting ever getting rid of them, but it makes the API conceptually >> > feel smaller, not larger.) >> Coming from the Matlab syntax, I feel that np.zeros and np.ones are in >> numpy for Matlab (and maybe others ?) compatibilty and are useful for >> that. Now that I've been "enlightened" by Python, I think that those >> functions (especially np.ones) are indeed clutter. Therefore I favor the >> introduction of these two new functions. >> >> However, I think Eric's remark about masked array API compatibility is >> important. I don't know what other names are possible ? np.const ? >> >> Or maybe np.tile is also useful for that same purpose ? In that case >> adding a dtype argument to np.tile would be useful. >> >> best, >> Pierre >> > > I am also +1 on the idea of having a filled() and filled_like() function (I > learned a long time ago to just do a = np.empty() and a.fill() rather than > the multiplication trick I learned from Matlab). However, the collision > with the masked array API is a non-starter for me. np.const() and > np.const_like() probably make the most sense, but I would prefer a verb over > a noun. Definitely -1 on const. Falsely implies immutability, to my mind. David From d.warde.farley at gmail.com Mon Jan 14 13:56:54 2013 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Mon, 14 Jan 2013 13:56:54 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: <50F44A95.2030202@crans.org> References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> Message-ID: On Mon, Jan 14, 2013 at 1:12 PM, Pierre Haessig wrote: > In [8]: tile(nan, (3,3)) # (it's a verb ! ) tile, in my opinion, is useful in some cases (for people who think in terms of repmat()) but not very NumPy-ish. What I'd like is a function that takes - an initial array_like "a" - a shape "s" - optionally, a dtype (otherwise inherit from a) and broadcasts "a" to the shape "s". In the case of scalars this is just a fill. In the case of, say, a (5,) vector and a (10, 5) shape, this broadcasts across rows, etc. I don't think it's worth special-casing scalar fills (except perhaps as an implementation detail) when you have rich broadcasting semantics that are already a fundamental part of NumPy, allowing for a much handier primitive. David From ben.root at ou.edu Mon Jan 14 14:05:21 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 14 Jan 2013 14:05:21 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> Message-ID: On Mon, Jan 14, 2013 at 1:56 PM, David Warde-Farley < d.warde.farley at gmail.com> wrote: > On Mon, Jan 14, 2013 at 1:12 PM, Pierre Haessig > wrote: > > In [8]: tile(nan, (3,3)) # (it's a verb ! 
) > > tile, in my opinion, is useful in some cases (for people who think in > terms of repmat()) but not very NumPy-ish. What I'd like is a function > that takes > > - an initial array_like "a" > - a shape "s" > - optionally, a dtype (otherwise inherit from a) > > and broadcasts "a" to the shape "s". In the case of scalars this is > just a fill. In the case of, say, a (5,) vector and a (10, 5) shape, > this broadcasts across rows, etc. > > I don't think it's worth special-casing scalar fills (except perhaps > as an implementation detail) when you have rich broadcasting semantics > that are already a fundamental part of NumPy, allowing for a much > handier primitive. > I have similar problems with "tile". I learned it for a particular use in numpy, and it would be hard for me to see it for another (contextually) different use. I do like the way you are thinking in terms of the broadcasting semantics, but I wonder if that is a bit awkward. What I mean is, if one were to use broadcasting semantics for creating an array, wouldn't one have just simply used broadcasting anyway? The point of broadcasting is to _avoid_ the creation of unneeded arrays. But maybe I can be convinced with some examples. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Mon Jan 14 14:17:51 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 14 Jan 2013 14:17:51 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> Message-ID: <50F459DF.2070900@gmail.com> Thanks Pierre for noting that np.tile already provides a chunk of this functionality: >>> a = np.tile(5,(1,2,3)) >>> a array([[[5, 5, 5], [5, 5, 5]]]) >>> np.tile(1,a.shape) array([[[1, 1, 1], [1, 1, 1]]]) I had not realized a scalar first argument was possible. Alan Isaac From ralf.gommers at gmail.com Mon Jan 14 16:26:31 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 14 Jan 2013 22:26:31 +0100 Subject: [Numpy-discussion] 1.8 release In-Reply-To: References: Message-ID: On Mon, Jan 14, 2013 at 1:19 AM, David Cournapeau wrote: > On Sun, Jan 13, 2013 at 5:26 PM, Nathaniel Smith wrote: > > On Sun, Jan 13, 2013 at 7:03 PM, Charles R Harris > > wrote: > >> Now that 1.7 is nearing release, it's time to look forward to the 1.8 > >> release. I'd like us to get back to the twice yearly schedule that we > tried > >> to maintain through the 1.3 - 1.6 releases, so I propose a June release > as a > >> goal. Call it the Spring Cleaning release. As to content, I'd like to > see > >> the following. > >> > >> Removal of Python 2.4-2.5 support. > >> Removal of SCons support. > >> The index work consolidated. > >> Initial stab at removing the need for 2to3. See Pauli's PR for scipy. > >> Miscellaneous enhancements and fixes. > > > > I'd actually like to propose a faster release cycle than this, even. > > Perhaps 3 months between releases; 2 months from release n to the > > first beta of n+1? > > > > The consequences would be: > > * Changes get out to users faster. > > * Each release is smaller, so it's easier for downstream projects to > > adjust to each release -- instead of having this giant pile of changes > > to work through all at once every 6-12 months > > * End-users are less scared of updating, because the changes aren't so > > overwhelming, so they end up actually testing (and getting to take > > advantage of) the new stuff more. 
> > * We get feedback more quickly, so we can fix up whatever we break > > while we still know what we did. > > * And for larger changes, if we release them incrementally, we can get > > feedback before we've gone miles down the wrong path. > > * Releases come out on time more often -- sort of paradoxical, but > > with small, frequent releases, beta cycles go smoother, and it's > > easier to say "don't worry, I'll get it ready for next time", or > > "right, that patch was less done than we thought, let's take it out > > for now" (also this is much easier if we don't have another years > > worth of changes committed on top of the patch!). > > * If your schedule does slip, then you still end up with a <6 month > > release cycle. > > > > 1.6.x was branched from master in March 2011 and released in May 2011. > > 1.7.x was branched from master in July 2012 and still isn't out. But > > at least we've finally found and fixed the second to last bug! > > > > Wouldn't it be nice to have a 2-4 week beta cycle that only found > > trivial and expected problems? We *already* have 6 months worth of > > feature work in master that won't be in the *next* release. > > > > Note 1: if we do do this, then we'll also want to rethink the > > deprecation cycle a bit -- right now we've sort of vaguely been saying > > "well, we'll deprecate it in release n and take it out in n+1. > > Whenever that is". 3 months definitely isn't long enough for a > > deprecation period, so if we do do this then we'll want to deprecate > > things for multiple releases before actually removing them. Details to > > be determined. > > > > Note 2: in this kind of release schedule, you definitely don't want to > > say "here are the features that will be in the next release!", because > > then you end up slipping and sliding all over the place. Instead you > > say "here are some things that I want to work on next, and we'll see > > which release they end up in". Since we're already following the rule > > that nothing goes into master until it's done and tested and ready for > > release anyway, this doesn't really change much. > > > > Thoughts? > > Hey, my time to have a time-machine: > http://mail.scipy.org/pipermail/numpy-discussion/2008-May/033754.html > > I still think it is a good idea :) > +1 for faster and time-based releases. 3 months does sound a little too short to me (5 or 6 would be better), since a release cycle typically doesn't fit in one month. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jan 14 17:08:50 2013 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 14 Jan 2013 22:08:50 +0000 Subject: [Numpy-discussion] 1.8 release In-Reply-To: References: Message-ID: On Mon, Jan 14, 2013 at 4:45 PM, Fr?d?ric Bastien wrote: > I don't volontear for the next release manager, but +1 for shorter > releases. I heard just good comments from that. Also, I'm not sure it > would ask more from the release manager. Do someone have an idea? The > most work I do as a release manager for theano is the > preparation/tests/release notes and this depend on the amont of new > stuff. And this seam exponential on the number of new changes in the > release, not linear (no data, just an impression...). Making smaller > release make this easier. > > But yes, this mean more announces. But this isn't what take the most > times. Also, doing the release notes more frequently mean it is more > recent in memory when you check the PR merged, so it make it easier to > do. 
Right, this is my experience too -- that it's actually easier to put out more releases, because each one is manageable and you get a routine going. ("Oops, it's March, better find an hour this week to check the release notes and run the 'release beta1' script.") It becomes almost boring, which is awesome. Putting out 5 small releases is much, MUCH easier than putting out one giant 5x bigger release. On Mon, Jan 14, 2013 at 9:26 PM, Ralf Gommers wrote: > +1 for faster and time-based releases. > > 3 months does sound a little too short to me (5 or 6 would be better), since > a release cycle typically doesn't fit in one month. The release cycle for 6-12+ months of changes doesn't typically fit in one month, but we've never tried for a smaller release, so who knows. I suppose that theoretically, as scientists, what we ought to do is to attempt 1-2 releases at as aggressive a pace as we can imagine to see how it goes, and then we'll have the data to interpolate the correct speed instead of extrapolating... ;-) On Mon, Jan 14, 2013 at 12:14 AM, Charles R Harris wrote: > I think three months is a bit short. Much will depend on the release manager > and I not sure what Andrej's plans are. I'd happily nominate you for that > role ;) Careful, or I'll nominate you back! ;-) Seriously, though, Ondrej is doing a great job, I doubt I'd do as well... Ondrej: I know you're still doing heroic work getting 1.7 pulled together, but if you have a moment-- Are you planning to stick around as release manager after 1.7? And if so, what are your thoughts on attempting such a short cycle? -n From madsipsen at gmail.com Tue Jan 15 06:50:20 2013 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 15 Jan 2013 12:50:20 +0100 Subject: [Numpy-discussion] argsort Message-ID: <50F5427C.8060006@gmail.com> Hi, I simply can't understand this. I'm trying to use argsort to produce indices that can be used to sort an array: from numpy import * indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) args = argsort(indices, axis=0) print indices[args] gives: [[[ 1 12] [ 4 3]] [[ 4 3] [11 6]] [[ 8 9] [23 7]] [[11 6] [ 8 9]] [[23 7] [ 1 12]]] I thought this should produce a sorted version of the indices array. Any help is appreciated. Best regards, Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jan 15 09:44:19 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 15 Jan 2013 07:44:19 -0700 Subject: [Numpy-discussion] argsort In-Reply-To: <50F5427C.8060006@gmail.com> References: <50F5427C.8060006@gmail.com> Message-ID: On Tue, Jan 15, 2013 at 4:50 AM, Mads Ipsen wrote: > Hi, > > I simply can't understand this. I'm trying to use argsort to produce > indices that can be used to sort an array: > > from numpy import * > > indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) > args = argsort(indices, axis=0) > print indices[args] > > gives: > > [[[ 1 12] > [ 4 3]] > > [[ 4 3] > [11 6]] > > [[ 8 9] > [23 7]] > > [[11 6] > [ 8 9]] > > [[23 7] > [ 1 12]]] > > I thought this should produce a sorted version of the indices array. > > Any help is appreciated. > > Fancy indexing is a funny creature and not easy to understand in more than one dimension. 
What is happening is that each index is replaced by the corresponding row of a and the result is of shape (5,2,2). To do what you want to do: In [20]: a[i, [[0,1]]*5] Out[20]: array([[ 1, 3], [ 4, 6], [ 8, 7], [11, 9], [23, 12]]) I agree that there should be an easier way to do this. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jan 15 09:56:10 2013 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 15 Jan 2013 15:56:10 +0100 Subject: [Numpy-discussion] argsort In-Reply-To: References: <50F5427C.8060006@gmail.com> Message-ID: On Tue, Jan 15, 2013 at 3:44 PM, Charles R Harris wrote: > Fancy indexing is a funny creature and not easy to understand in more than > one dimension. What is happening is that each index is replaced by the > corresponding row of a and the result is of shape (5,2,2). To do what you > want to do: > > In [20]: a[i, [[0,1]]*5] > Out[20]: > array([[ 1, 3], > [ 4, 6], > [ 8, 7], > [11, 9], > [23, 12]]) > > I agree that there should be an easier way to do this. Slightly easier, though no more transparent: a[i, [0,1]] http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays -- Robert Kern From Nicolas.Rougier at inria.fr Tue Jan 15 12:37:52 2013 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Tue, 15 Jan 2013 18:37:52 +0100 Subject: [Numpy-discussion] dtype "reduction" [SOLVED] In-Reply-To: References: <88940E94-C5A2-4CBB-8D44-554193B9CF05@inria.fr> Message-ID: <23DC4442-3DCC-411A-AB9D-69C3FDD5CCD5@inria.fr> I ended coding the dtype reduction, it's not foolproof but it might be useful for others as well. Nicolas import numpy as np def dtype_reduce(dtype, level=0, depth=0): """ Try to reduce dtype up to a given level when it is possible dtype = [ ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')])] level 0: ['color,vertex,normal,', 10, 'float32'] level 1: [['color', 4, 'float32'] ['normal', 3, 'float32'] ['vertex', 3, 'float32']] """ dtype = np.dtype(dtype) fields = dtype.fields # No fields if fields is None: if dtype.shape: count = reduce(mul, dtype.shape) else: count = 1 size = dtype.itemsize/count if dtype.subdtype: name = str( dtype.subdtype[0] ) else: name = str( dtype ) return ['', count, name] else: items = [] name = '' # Get reduced fields for key,value in fields.items(): l = dtype_reduce(value[0], level, depth+1) if type(l[0]) is str: items.append( [key, l[1], l[2]] ) else: items.append( l ) name += key+',' # Check if we can reduce item list ctype = None count = 0 for i,item in enumerate(items): # One item is a list, we cannot reduce if type(item[0]) is not str: return items else: if i==0: ctype = item[2] count += item[1] else: if item[2] != ctype: return items count += item[1] if depth >= level: return [name, count, ctype] else: return items if __name__ == '__main__': # Fully reductible dtype = [ ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')])] print 'level 0:' print dtype_reduce(dtype,level=0) print 'level 1:' print dtype_reduce(dtype,level=1) print # Not fully reductible dtype = [ ('vertex', [('x', 'i4'), ('y', 'i4'), ('z', 'i4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')])] print 'level 0:' print dtype_reduce(dtype,level=0) print # 
Not reductible at all dtype = [ ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'i4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'i4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'i4'), ('a', 'f4')])] print 'level 0:' print dtype_reduce(dtype,level=0) On Dec 27, 2012, at 9:11 , Nicolas Rougier wrote: > > Yep, I'm trying to construct dtype2 programmaticaly and was hoping for some function giving me a "canonical" expression of the dtype. I've started playing with fields but it's just a bit harder than I though (lot of different cases and recursion). > > Thanks for the answer. > > > Nicolas > > On Dec 27, 2012, at 1:32 , Nathaniel Smith wrote: > >> On Wed, Dec 26, 2012 at 8:09 PM, Nicolas Rougier >> wrote: >>> >>> >>> Hi all, >>> >>> >>> I'm looking for a way to "reduce" dtype1 into dtype2 (when it is possible of course). >>> Is there some easy way to do that by any chance ? >>> >>> >>> dtype1 = np.dtype( [ ('vertex', [('x', 'f4'), >>> ('y', 'f4'), >>> ('z', 'f4')]), >>> ('normal', [('x', 'f4'), >>> ('y', 'f4'), >>> ('z', 'f4')]), >>> ('color', [('r', 'f4'), >>> ('g', 'f4'), >>> ('b', 'f4'), >>> ('a', 'f4')]) ] ) >>> >>> dtype2 = np.dtype( [ ('vertex', 'f4', 3), >>> ('normal', 'f4', 3), >>> ('color', 'f4', 4)] ) >>> >> >> If you have an array whose dtype is dtype1, and you want to convert it >> into an array with dtype2, then you just do >> my_dtype2_array = my_dtype1_array.view(dtype2) >> >> If you have dtype1 and you want to programmaticaly construct dtype2, >> then that's a little more fiddly and depends on what exactly you're >> trying to do, but start by poking around with dtype1.names and >> dtype1.fields, which contain information on how dtype1 is put together >> in the form of regular python structures. >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jerome_caron_astro at ymail.com Tue Jan 15 14:31:25 2013 From: jerome_caron_astro at ymail.com (Jerome Caron) Date: Tue, 15 Jan 2013 19:31:25 +0000 (GMT) Subject: [Numpy-discussion] algorithm for faster median calculation ? Message-ID: <1358278285.14438.YahooMailNeo@web171404.mail.ir2.yahoo.com> Dear all, I am new to the Numpy-discussion list. I would like to follow up some possibly useful information about calculating median. The message below was posted today on the AstroPy mailing list. Kind regards Jerome Caron #---------------------------------------- I think the calculation of median values in Numpy is not optimal. I don't know if there are other libraries that do better? On my machine I get these results: >>> data = numpy.random.rand(5000,5000) >>> t0=time.time();print numpy.ma.median(data);print time.time()-t0 0.499845739822 15.1949999332 >>> t0=time.time();print numpy.median(data);print time.time()-t0 0.499845739822 4.32100009918 >>> t0=time.time();print aspylib.astro.get_median(data);print time.time()-t0 [ 0.49984574] 0.90499997139 >>> The median calculation in Aspylib is using C code from Nicolas Devillard (can be found here: http://ndevilla.free.fr/median/index.html) interfaced with ctypes. It could be easily re-used for other, more official packages. I think the code also finds quantiles efficiently. See: http://www.aspylib.com/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From deil.christoph at googlemail.com Tue Jan 15 14:49:47 2013 From: deil.christoph at googlemail.com (Christoph Deil) Date: Tue, 15 Jan 2013 20:49:47 +0100 Subject: [Numpy-discussion] algorithm for faster median calculation ? In-Reply-To: <1358278285.14438.YahooMailNeo@web171404.mail.ir2.yahoo.com> References: <1358278285.14438.YahooMailNeo@web171404.mail.ir2.yahoo.com> Message-ID: On Jan 15, 2013, at 8:31 PM, Jerome Caron wrote: > Dear all, > I am new to the Numpy-discussion list. > I would like to follow up some possibly useful information about calculating median. > The message below was posted today on the AstroPy mailing list. > Kind regards > Jerome Caron > > #---------------------------------------- > I think the calculation of median values in Numpy is not optimal. I don't know if there are other libraries that do better? > On my machine I get these results: > >>> data = numpy.random.rand(5000,5000) > >>> t0=time.time();print numpy.ma.median(data);print time.time()-t0 > 0.499845739822 > 15.1949999332 > >>> t0=time.time();print numpy.median(data);print time.time()-t0 > 0.499845739822 > 4.32100009918 > >>> t0=time.time();print aspylib.astro.get_median(data);print time.time()-t0 > [ 0.49984574] > 0.90499997139 > >>> > The median calculation in Aspylib is using C code from Nicolas Devillard (can be found here: http://ndevilla.free.fr/median/index.html) interfaced with ctypes. > It could be easily re-used for other, more official packages. I think the code also finds quantiles efficiently. > See: http://www.aspylib.com/ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Hi Jerome, some of the numpy devs are already discussing how to best implement the fast median for numpy here: https://github.com/numpy/numpy/issues/1811 "median in average O(n) time" If you want to get an email when someone posts a comment on that github ticket, sign up for a free github account, then click on "watch tread" at the bottom of that issue. Note that numpy is BSD-licensed, so they can't take GPL-licensed code. But I think looking at the method you have in aspylib is OK, so thanks for sharing! Christoph -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Tue Jan 15 14:50:21 2013 From: sturla at molden.no (Sturla Molden) Date: Tue, 15 Jan 2013 20:50:21 +0100 Subject: [Numpy-discussion] algorithm for faster median calculation ? In-Reply-To: <1358278285.14438.YahooMailNeo@web171404.mail.ir2.yahoo.com> References: <1358278285.14438.YahooMailNeo@web171404.mail.ir2.yahoo.com> Message-ID: <50F5B2FD.6070504@molden.no> You might want to look at this first: https://github.com/numpy/numpy/issues/1811 Yes it is possible to compute the median faster by doing quickselect instead of quicksort. Best case O(n) for quickselect, O(n log n) for quicksort. But adding selection and partial sorting to NumPy is a bigger issue than just computing medians and percentiles faster. If we are to do this I think we should add partial sorting and selection to npysort, not patch in some C or Cython quickselect just for the median. When npysort has quickselect, changing the Python code to use it for medians and percentiles is a nobrainer. https://github.com/numpy/numpy/tree/master/numpy/core/src/npysort Sturla On 15.01.2013 20:31, Jerome Caron wrote: > Dear all, > I am new to the Numpy-discussion list. 
> I would like to follow up some possibly useful information about > calculating median. > The message below was posted today on the AstroPy mailing list. > Kind regards > Jerome Caron > #---------------------------------------- > I think the calculation of median values in Numpy is not optimal. I > don't know if there are other libraries that do better? > On my machine I get these results: > >>> data = numpy.random.rand(5000,5000) > >>> t0=time.time();print numpy.ma.median(data);print time.time()-t0 > 0.499845739822 > 15.1949999332 > >>> t0=time.time();print numpy.median(data);print time.time()-t0 > 0.499845739822 > 4.32100009918 > >>> t0=time.time();print aspylib.astro.get_median(data);print > time.time()-t0 > [ 0.49984574] > 0.90499997139 > >>> > The median calculation in Aspylib is using C code from Nicolas Devillard > (can be found here: http://ndevilla.free.fr/median/index.html > ) interfaced with ctypes. > It could be easily re-used for other, more official packages. I think > the code also finds quantiles efficiently. > See: http://www.aspylib.com/ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From e.antero.tammi at gmail.com Tue Jan 15 15:53:39 2013 From: e.antero.tammi at gmail.com (eat) Date: Tue, 15 Jan 2013 22:53:39 +0200 Subject: [Numpy-discussion] argsort In-Reply-To: <50F5427C.8060006@gmail.com> References: <50F5427C.8060006@gmail.com> Message-ID: Hi, On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen wrote: > Hi, > > I simply can't understand this. I'm trying to use argsort to produce > indices that can be used to sort an array: > > from numpy import * > > indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) > args = argsort(indices, axis=0) > print indices[args] > > gives: > > [[[ 1 12] > [ 4 3]] > > [[ 4 3] > [11 6]] > > [[ 8 9] > [23 7]] > > [[11 6] > [ 8 9]] > > [[23 7] > [ 1 12]]] > > I thought this should produce a sorted version of the indices array. > > Any help is appreciated. > Perhaps these three different point of views will help you a little bit more to move on: In []: x Out[]: array([[ 4, 3], [ 1, 12], [23, 7], [11, 6], [ 8, 9]]) In []: ind= x.argsort(axis= 0) In []: ind Out[]: array([[1, 0], [0, 3], [4, 2], [3, 4], [2, 1]]) In []: x[ind[:, 0]] Out[]: array([[ 1, 12], [ 4, 3], [ 8, 9], [11, 6], [23, 7]]) In []: x[ind[:, 1]] Out[]: array([[ 4, 3], [11, 6], [23, 7], [ 8, 9], [ 1, 12]]) In []: x[ind, [0, 1]] Out[]: array([[ 1, 3], [ 4, 6], [ 8, 7], [11, 9], [23, 12]]) -eat > > Best regards, > > Mads > > -- > +-----------------------------------------------------+ > | Mads Ipsen | > +----------------------+------------------------------+ > | G?seb?ksvej 7, 4. tv | | > | DK-2500 Valby | phone: +45-29716388 | > | Denmark | email: mads.ipsen at gmail.com | > +----------------------+------------------------------+ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Tue Jan 15 16:18:19 2013 From: sturla at molden.no (Sturla Molden) Date: Tue, 15 Jan 2013 22:18:19 +0100 Subject: [Numpy-discussion] algorithm for faster median calculation ? 
In-Reply-To: <50F5B2FD.6070504@molden.no> References: <1358278285.14438.YahooMailNeo@web171404.mail.ir2.yahoo.com> <50F5B2FD.6070504@molden.no> Message-ID: <50F5C79B.80309@molden.no> On 15.01.2013 20:50, Sturla Molden wrote: > You might want to look at this first: > > https://github.com/numpy/numpy/issues/1811 > > Yes it is possible to compute the median faster by doing quickselect > instead of quicksort. Best case O(n) for quickselect, O(n log n) for > quicksort. But adding selection and partial sorting to NumPy is a bigger > issue than just computing medians and percentiles faster. Anyway, here is the code, a bit updated. I prefer quickselect with a better pivot though. Sturla -------------- next part -------------- A non-text attachment was scrubbed... Name: median.py Type: text/x-python Size: 5604 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: quickselect.pyx Type: / Size: 3346 bytes Desc: not available URL: From madsipsen at gmail.com Wed Jan 16 03:30:55 2013 From: madsipsen at gmail.com (Mads Ipsen) Date: Wed, 16 Jan 2013 09:30:55 +0100 Subject: [Numpy-discussion] argsort In-Reply-To: References: <50F5427C.8060006@gmail.com> Message-ID: <50F6653F.8060409@gmail.com> Hi, Thanks everybody for all the answers that make perfect sense when axis=0. Now suppose I want to sort the array in such a way that each row is sorted individually. Then I suppose I should do this: from numpy import * v = array([[4,3], [1,12], [23,7], [11,6], [8,9]]) idx = argsort(v, axis=1) idx is then [[1 0] [0 1] [1 0] [1 0] [0 1]] which makes sense, since these are the indices in an order that would sort each row. But when I try a[idx, variuos_additional_arguments] I just get strange results. Anybody that can point me towards the correct solution. Best regards, Mads On 01/15/2013 09:53 PM, eat wrote: > Hi, > > On Tue, Jan 15, 2013 at 1:50 PM, Mads Ipsen > wrote: > > Hi, > > I simply can't understand this. I'm trying to use argsort to > produce indices that can be used to sort an array: > > from numpy import * > > indices = array([[4,3],[1,12],[23,7],[11,6],[8,9]]) > args = argsort(indices, axis=0) > print indices[args] > > gives: > > [[[ 1 12] > [ 4 3]] > > [[ 4 3] > [11 6]] > > [[ 8 9] > [23 7]] > > [[11 6] > [ 8 9]] > > [[23 7] > [ 1 12]]] > > I thought this should produce a sorted version of the indices array. > > Any help is appreciated. > > Perhaps these three different point of views will help you a little > bit more to move on: > In []: x > Out[]: > array([[ 4, 3], > [ 1, 12], > [23, 7], > [11, 6], > [ 8, 9]]) > In []: ind= x.argsort(axis= 0) > In []: ind > Out[]: > array([[1, 0], > [0, 3], > [4, 2], > [3, 4], > [2, 1]]) > > In []: x[ind[:, 0]] > Out[]: > array([[ 1, 12], > [ 4, 3], > [ 8, 9], > [11, 6], > [23, 7]]) > > In []: x[ind[:, 1]] > Out[]: > array([[ 4, 3], > [11, 6], > [23, 7], > [ 8, 9], > [ 1, 12]]) > > In []: x[ind, [0, 1]] > Out[]: > array([[ 1, 3], > [ 4, 6], > [ 8, 7], > [11, 9], > [23, 12]]) > -eat > > > Best regards, > > Mads > > -- > +-----------------------------------------------------+ > | Mads Ipsen | > +----------------------+------------------------------+ > | G?seb?ksvej 7, 4. 
tv | | > | DK-2500 Valby | phone:+45-29716388 | > | Denmark | email:mads.ipsen at gmail.com | > +----------------------+------------------------------+ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jan 16 03:39:10 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 16 Jan 2013 09:39:10 +0100 Subject: [Numpy-discussion] argsort In-Reply-To: <50F6653F.8060409@gmail.com> References: <50F5427C.8060006@gmail.com> <50F6653F.8060409@gmail.com> Message-ID: On Wed, Jan 16, 2013 at 9:30 AM, Mads Ipsen wrote: > Hi, > > Thanks everybody for all the answers that make perfect sense when axis=0. > > Now suppose I want to sort the array in such a way that each row is sorted > individually. Then I suppose I should do this: > > from numpy import * > > > v = array([[4,3], > [1,12], > [23,7], > [11,6], > [8,9]]) > idx = argsort(v, axis=1) > > idx is then > > [[1 0] > [0 1] > [1 0] > [1 0] > [0 1]] > > which makes sense, since these are the indices in an order that would sort > each row. But when I try > > a[idx, variuos_additional_arguments] > > I just get strange results. Anybody that can point me towards the correct > solution. Please have a look at the documentation again. If idx has indices for the second axis, you need to put it into the second place. http://docs.scipy.org/doc/numpy/user/basics.indexing.html#indexing-multi-dimensional-arrays http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing [~] |4> idx0 = np.arange(v.shape[0])[:,np.newaxis] [~] |5> idx0 array([[0], [1], [2], [3], [4]]) [~] |7> v[idx0, idx] array([[ 3, 4], [ 1, 12], [ 7, 23], [ 6, 11], [ 8, 9]]) -- Robert Kern From jaakko.luttinen at aalto.fi Wed Jan 16 06:32:33 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 16 Jan 2013 13:32:33 +0200 Subject: [Numpy-discussion] numpydoc for python 3? In-Reply-To: References: <50EDA996.6090806@aalto.fi> <50EED62B.9010105@aalto.fi> <50EEDB50.5000902@aalto.fi> <50F33959.7030203@aalto.fi> <50F3DF7F.3060600@aalto.fi> Message-ID: <50F68FD1.70302@aalto.fi> On 01/14/2013 02:44 PM, Matthew Brett wrote: > On Mon, Jan 14, 2013 at 10:35 AM, Jaakko Luttinen > wrote: >> On 01/14/2013 12:53 AM, Matthew Brett wrote: >>> You might be able to get away without 2to3, using the kind of stuff >>> that Pauli has used for scipy recently: >>> >>> https://github.com/scipy/scipy/pull/397 >> >> Ok, thanks, maybe I'll try to make the tests valid in all Python >> versions. It seems there's only one line which I'm not able to transform. >> >> In doc/sphinxext/tests/test_docscrape.py, on line 559: >> assert doc['Summary'][0] == u'?????????????'.encode('utf-8') >> >> This is invalid in Python 3.0-3.2. How could I write this in such a way >> that it is valid in all Python versions? I'm a bit lost with these >> unicode encodings in Python (and in general).. 
And I didn't want to add >> dependency on 'six' package. > > Pierre's suggestion is good; you can also do something like this: > > # -*- coding: utf8 -*- > import sys > > if sys.version_info[0] >= 3: > a = '?????????????' > else: > a = unicode('?????????????', 'utf8') > > The 'coding' line has to be the first or second line in the file. Thanks for all the comments! I reported an issue and made a pull request: https://github.com/numpy/numpy/pull/2919 However, I haven't been able to make nosetests work. I get error: "ValueError: Attempted relative import in non-package" Don't know how to fix it properly.. -Jaakko From ondrej.certik at gmail.com Wed Jan 16 12:51:42 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Wed, 16 Jan 2013 09:51:42 -0800 Subject: [Numpy-discussion] Travis failures with no errors In-Reply-To: References: Message-ID: On Thu, Dec 20, 2012 at 6:32 PM, Charles R Harris wrote: > > > On Thu, Dec 20, 2012 at 6:25 PM, Ond?ej ?ert?k > wrote: >> >> On Thu, Dec 13, 2012 at 4:39 PM, Ond?ej ?ert?k >> wrote: >> > Hi, >> > >> > I found these recent weird "failures" in Travis, but I can't find any >> > problem with the log and all tests pass. Any ideas what is going on? >> > >> > https://travis-ci.org/numpy/numpy/jobs/3570123 >> > https://travis-ci.org/numpy/numpy/jobs/3539549 >> > https://travis-ci.org/numpy/numpy/jobs/3369629 >> >> And here is another one: >> >> https://travis-ci.org/numpy/numpy/jobs/3768782 > > > Hmm, that is strange indeed. The first three are old, >= 12 days, but the > last is new, although the run time was getting up there. Might try running > the last one again. I don't know if the is an easy way to do that. And another one from 3 days ago: https://travis-ci.org/numpy/numpy/jobs/4118113 Ondrej From nouiz at nouiz.org Wed Jan 16 12:55:08 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 16 Jan 2013 12:55:08 -0500 Subject: [Numpy-discussion] Travis failures with no errors In-Reply-To: References: Message-ID: Hi, go to the site tracis-ci(the the next.travis-ci.org part): https://next.travis-ci.org/numpy/numpy/jobs/4118113 When you go that way, in a drop-down menu in the screen, when you are autorized, you can ask travis-ci to rerun the tests. You can do it in the particular test or in the commit page too to rerun all test for that commit. I find this usefull to rerun failed tests caused by VM errors... HTH Fred On Wed, Jan 16, 2013 at 12:51 PM, Ond?ej ?ert?k wrote: > On Thu, Dec 20, 2012 at 6:32 PM, Charles R Harris > wrote: >> >> >> On Thu, Dec 20, 2012 at 6:25 PM, Ond?ej ?ert?k >> wrote: >>> >>> On Thu, Dec 13, 2012 at 4:39 PM, Ond?ej ?ert?k >>> wrote: >>> > Hi, >>> > >>> > I found these recent weird "failures" in Travis, but I can't find any >>> > problem with the log and all tests pass. Any ideas what is going on? >>> > >>> > https://travis-ci.org/numpy/numpy/jobs/3570123 >>> > https://travis-ci.org/numpy/numpy/jobs/3539549 >>> > https://travis-ci.org/numpy/numpy/jobs/3369629 >>> >>> And here is another one: >>> >>> https://travis-ci.org/numpy/numpy/jobs/3768782 >> >> >> Hmm, that is strange indeed. The first three are old, >= 12 days, but the >> last is new, although the run time was getting up there. Might try running >> the last one again. I don't know if the is an easy way to do that. 
> > And another one from 3 days ago: > > https://travis-ci.org/numpy/numpy/jobs/4118113 > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ndbecker2 at gmail.com Wed Jan 16 14:04:41 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 16 Jan 2013 14:04:41 -0500 Subject: [Numpy-discussion] find points unique within some epsilon Message-ID: Any suggestion how to take a 2d complex array and find the set of points that are unique within some tolerance? (My preferred metric here would be Euclidean distance) From pav at iki.fi Wed Jan 16 15:36:13 2013 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 16 Jan 2013 22:36:13 +0200 Subject: [Numpy-discussion] numpydoc for python 3? In-Reply-To: References: <50EDA996.6090806@aalto.fi> <50EED62B.9010105@aalto.fi> <50EEDB50.5000902@aalto.fi> <50F33959.7030203@aalto.fi> <50F3DF7F.3060600@aalto.fi> Message-ID: 14.01.2013 14:44, Matthew Brett kirjoitti: [clip] > Pierre's suggestion is good; you can also do something like this: > > # -*- coding: utf8 -*- > import sys > > if sys.version_info[0] >= 3: > a = '?????????????' > else: > a = unicode('?????????????', 'utf8') > > The 'coding' line has to be the first or second line in the file. Another useful option would be from __future__ import unicode_literals This makes the literal 'spam' be unicode also on Python 2, so that b'spam' is bytes. This might make unicode unification easier. OTOH, it might open some cans of worms. -- Pauli Virtanen From e.antero.tammi at gmail.com Wed Jan 16 19:11:33 2013 From: e.antero.tammi at gmail.com (eat) Date: Thu, 17 Jan 2013 02:11:33 +0200 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? Message-ID: Hi, In a recent thread http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was proposed that .fill(.) should return self as an alternative for a trivial two-liner. I'm raising now the question: what if all in-place operations indeed could return self? How bad this would be? A 'strong' counter argument may be found at http://mail.python.org/pipermail/python-dev/2003-October/038855.html. But anyway, at least for me. it would be much more straightforward to implement simple mini dsl's ( http://en.wikipedia.org/wiki/Domain-specific_language) a much more straightforward manner. What do you think? -eat P.S. FWIW, if this idea really gains momentum obviously I'm volunteering to create a PR of it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmarshwx at gmail.com Wed Jan 16 19:16:44 2013 From: patrickmarshwx at gmail.com (Patrick Marsh) Date: Wed, 16 Jan 2013 18:16:44 -0600 Subject: [Numpy-discussion] Casting Bug or a "Feature"? Message-ID: Greetings, I spent a couple hours today tracking down a bug in one of my programs. I was getting different answers depending on whether I passed in a numpy array or a single number. Ultimately, I tracked it down to something I would consider a bug, but I'm not sure if others do. The case comes from taking a numpy integer array and adding a float to it. When doing var = np.array(ints) + float, var is cast to an array of floats, which is what I would expect. However, if I do np.array(ints) += float, the result is an array of integers. 
I can understand why this happens -- you are shoving the sum back into an integer array -- but without thinking through that I would expect the behavior of the two additions to be equal...or at least be consistent with what occurs with numbers, instead of arrays. Here's a trivial example demonstrating this import numpy as np a = np.arange(10) print a.dtype b = a + 0.5 print b.dtype a += 0.5 print a.dtype >> int64 >> float64 >> int64 >> >> >> An implication of this arrises from a simple function that "does math". The function returns different values depending on whether a number or array was passed in. def add_n_multiply(var): var += 0.5 var *= 10 return var aaa = np.arange(5) print aaa print add_n_multiply(aaa.copy()) print [add_n_multiply(x) for x in aaa.copy()] >> [0 1 2 3 4] >> [ 0 10 20 30 40] >> [5.0, 15.0, 25.0, 35.0, 45.0] Am I alone in thinking this is a bug? Or is this the behavior that others would have expected? Cheers, Patrick --- Patrick Marsh Ph.D. Candidate / Liaison to the HWT School of Meteorology / University of Oklahoma Cooperative Institute for Mesoscale Meteorological Studies National Severe Storms Laboratory http://www.patricktmarsh.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Wed Jan 16 19:39:56 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Wed, 16 Jan 2013 16:39:56 -0800 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: References: Message-ID: Hi Patrick: I think it is the behavior I have come to expect. The only "gotcha" here might be the difference between "var = var + 0.5" and "var += 0.5" For example: >>> import numpy as np >>> x = np.arange(5); x += 0.5; x array([0, 1, 2, 3, 4]) >>> x = np.arange(5); x = x + 0.5; x array([ 0.5, 1.5, 2.5, 3.5, 4.5]) The first line is definitely what I expect. The second, the automatic casting from int64 -> double, is documented and generally desirable. It's hard to avoid these casting issues without making code unnecessarily complex or allowing only one data type (e.g., as MATLAB does). If you worry about standardizing behavior you can always use `var = np.array(var, dtype=np.double, copy=True)` or similar at the start of your function. -Brad On Wed, Jan 16, 2013 at 4:16 PM, Patrick Marsh wrote: > Greetings, > > I spent a couple hours today tracking down a bug in one of my programs. I > was getting different answers depending on whether I passed in a numpy > array or a single number. Ultimately, I tracked it down to something I > would consider a bug, but I'm not sure if others do. The case comes from > taking a numpy integer array and adding a float to it. When doing var = > np.array(ints) + float, var is cast to an array of floats, which is what I > would expect. However, if I do np.array(ints) += float, the result is an > array of integers. I can understand why this happens -- you are shoving the > sum back into an integer array -- but without thinking through that I would > expect the behavior of the two additions to be equal...or at least be > consistent with what occurs with numbers, instead of arrays. Here's a > trivial example demonstrating this > > > import numpy as np > a = np.arange(10) > print a.dtype > b = a + 0.5 > print b.dtype > a += 0.5 > print a.dtype > > >> int64 > >> float64 > >> int64 > >> > >> > >> > > > An implication of this arrises from a simple function that "does math". > The function returns different values depending on whether a number or > array was passed in. 
> > > def add_n_multiply(var): > var += 0.5 > var *= 10 > return var > > aaa = np.arange(5) > print aaa > print add_n_multiply(aaa.copy()) > print [add_n_multiply(x) for x in aaa.copy()] > > > >> [0 1 2 3 4] > >> [ 0 10 20 30 40] > >> [5.0, 15.0, 25.0, 35.0, 45.0] > > > > > Am I alone in thinking this is a bug? Or is this the behavior that others > would have expected? > > > > Cheers, > Patrick > --- > Patrick Marsh > Ph.D. Candidate / Liaison to the HWT > School of Meteorology / University of Oklahoma > Cooperative Institute for Mesoscale Meteorological Studies > National Severe Storms Laboratory > http://www.patricktmarsh.com > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jan 16 19:42:09 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 16 Jan 2013 16:42:09 -0800 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: References: Message-ID: Patrick, Not a bug but is it a mis-feature? See the recent thread: "Do we want scalar casting to behave as it does at the moment" In short, this is an complex issue with no easy answer... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From njs at pobox.com Wed Jan 16 20:24:19 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Jan 2013 01:24:19 +0000 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: References: Message-ID: This is separate from the scalar casting thing. This is a disguised version of the discussion about what we should do with implicit casts caused by assignment: into_array[i] = 0.5 Traditionally numpy just happily casts this stuff, possibly mangling data in the process, and this has caused many actual bugs in user code. In 1.6 some of these assignments cause errors, but we reverted this in 1.7 because this was also breaking things. Supposedly we also deprecated these at the same time, with an eye towards making them errors eventually, but I'm not sure we did this properly, and our carrying rules need revisiting in any case. (Sorry for lack of links to earlier discussion; traveling and on my phone.) -n On 16 Jan 2013 16:42, "Chris Barker - NOAA Federal" wrote: > Patrick, > > Not a bug but is it a mis-feature? > > See the recent thread: "Do we want scalar casting to behave as it does > at the moment" > > In short, this is an complex issue with no easy answer... > > -Chris > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jan 16 20:53:58 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 16 Jan 2013 20:53:58 -0500 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? 
In-Reply-To: 
References: 
Message-ID: 

On Wed, Jan 16, 2013 at 7:11 PM, eat wrote:
> Hi,
>
> In a recent thread
> http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was
> proposed that .fill(.) should return self as an alternative for a trivial
> two-liner.
>
> I'm raising now the question: what if all in-place operations indeed could
> return self? How bad this would be? A 'strong' counter argument may be found
> at http://mail.python.org/pipermail/python-dev/2003-October/038855.html.
>
> But anyway, at least for me. it would be much more straightforward to
> implement simple mini dsl's
> (http://en.wikipedia.org/wiki/Domain-specific_language) a much more
> straightforward manner.
>
> What do you think?

I'm against it. I think it requires too much thinking by users and developers.
The functions in numpy are conceptually much closer to basic python, not some
heavy object-oriented framework where we need lots of chaining.
(I thought I remembered some discussion and justification for returning self
in sqlalchemy for this, but couldn't find it.)

I'm chasing quite a few bugs with inplace operations:

>>> a = np.arange(10)
>>> a *= np.pi
>>> a
???

>>> a = np.random.random_integers(0, 5, size=5)
>>> b = a.sort()
>>> b
>>> a
array([0, 1, 2, 5, 5])

>>> b = np.random.shuffle(a)
>>> b
>>> b = np.random.permutation(a)
>>> b
array([0, 5, 5, 2, 1])

How do I remember if shuffle shuffles or permutes?

Do we have a list of functions that are inplace?

Josef

>
>
> -eat
>
> P.S. FWIW, if this idea really gains momentum obviously I'm volunteering to
> create a PR of it.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From patrickmarshwx at gmail.com Wed Jan 16 22:43:22 2013
From: patrickmarshwx at gmail.com (Patrick Marsh)
Date: Wed, 16 Jan 2013 21:43:22 -0600
Subject: [Numpy-discussion] Casting Bug or a "Feature"?
In-Reply-To: 
References: 
Message-ID: 

Thanks, everyone, for chiming in. Now that I know this behavior exists, I can
explicitly prevent it in my code. However, it would be nice if a warning or
something was generated to alert users about the inconsistency between
var += ... and var = var + ...

Patrick

---
Patrick Marsh
Ph.D. Candidate / Liaison to the HWT
School of Meteorology / University of Oklahoma
Cooperative Institute for Mesoscale Meteorological Studies
National Severe Storms Laboratory
http://www.patricktmarsh.com

On Wed, Jan 16, 2013 at 7:24 PM, Nathaniel Smith wrote:
> This is separate from the scalar casting thing. This is a disguised
> version of the discussion about what we should do with implicit casts
> caused by assignment:
> into_array[i] = 0.5
>
> Traditionally numpy just happily casts this stuff, possibly mangling data
> in the process, and this has caused many actual bugs in user code. In 1.6
> some of these assignments cause errors, but we reverted this in 1.7 because
> this was also breaking things. Supposedly we also deprecated these at the
> same time, with an eye towards making them errors eventually, but I'm not
> sure we did this properly, and our carrying rules need revisiting in any
> case.
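A minimal way to reproduce the assignment cast described above (a sketch; the
output assumes a default 64-bit integer array and a numpy of the 1.6/1.7 era
discussed here -- newer releases may warn about or refuse the in-place cast):

>>> import numpy as np
>>> into_array = np.arange(3)   # integer dtype
>>> into_array[1] = 0.5         # the float is silently truncated to 0
>>> into_array
array([0, 0, 2])
>>> into_array += 0.5           # the in-place add is silently downcast the same way
>>> into_array
array([0, 0, 2])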
>> >> See the recent thread: "Do we want scalar casting to behave as it does >> at the moment" >> >> In short, this is an complex issue with no easy answer... >> >> -Chris >> >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jan 16 22:54:40 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 16 Jan 2013 22:54:40 -0500 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: References: Message-ID: On Wed, Jan 16, 2013 at 10:43 PM, Patrick Marsh wrote: > Thanks, everyone for chiming in. Now that I know this behavior exists, I > can explicitly prevent it in my code. However, it would be nice if a warning > or something was generated to alert users about the inconsistency between > var += ... and var = var + ... Since I also got bitten by this recently in my code, I fully agree. I could live with an exception for lossy down casting in this case. Josef > > > > Patrick > > > --- > Patrick Marsh > Ph.D. Candidate / Liaison to the HWT > School of Meteorology / University of Oklahoma > Cooperative Institute for Mesoscale Meteorological Studies > National Severe Storms Laboratory > http://www.patricktmarsh.com > > > On Wed, Jan 16, 2013 at 7:24 PM, Nathaniel Smith wrote: >> >> This is separate from the scalar casting thing. This is a disguised >> version of the discussion about what we should do with implicit casts caused >> by assignment: >> into_array[i] = 0.5 >> >> Traditionally numpy just happily casts this stuff, possibly mangling data >> in the process, and this has caused many actual bugs in user code. In 1.6 >> some of these assignments cause errors, but we reverted this in 1.7 because >> this was also breaking things. Supposedly we also deprecated these at the >> same time, with an eye towards making them errors eventually, but I'm not >> sure we did this properly, and our carrying rules need revisiting in any >> case. >> >> (Sorry for lack of links to earlier discussion; traveling and on my >> phone.) >> >> -n >> >> On 16 Jan 2013 16:42, "Chris Barker - NOAA Federal" >> wrote: >>> >>> Patrick, >>> >>> Not a bug but is it a mis-feature? >>> >>> See the recent thread: "Do we want scalar casting to behave as it does >>> at the moment" >>> >>> In short, this is an complex issue with no easy answer... >>> >>> -Chris >>> >>> >>> -- >>> >>> Christopher Barker, Ph.D. 
>>> Oceanographer >>> >>> Emergency Response Division >>> NOAA/NOS/OR&R (206) 526-6959 voice >>> 7600 Sand Point Way NE (206) 526-6329 fax >>> Seattle, WA 98115 (206) 526-6317 main reception >>> >>> Chris.Barker at noaa.gov >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Thu Jan 17 01:41:29 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Jan 2013 06:41:29 +0000 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: References: Message-ID: On 16 Jan 2013 17:54, wrote: > >>> a = np.random.random_integers(0, 5, size=5) > >>> b = a.sort() > >>> b > >>> a > array([0, 1, 2, 5, 5]) > > >>> b = np.random.shuffle(a) > >>> b > >>> b = np.random.permutation(a) > >>> b > array([0, 5, 5, 2, 1]) > > How do I remember if shuffle shuffles or permutes ? > > Do we have a list of functions that are inplace? I rather like the convention used elsewhere in Python of naming in-place operations with present tense imperative verbs, and out-of-place operations with past participles. So you have sort/sorted, reverse/reversed, etc. Here this would suggest we name these two operations as either shuffle() and shuffled(), or permute() and permuted(). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.anton.letnes at gmail.com Thu Jan 17 02:14:47 2013 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Thu, 17 Jan 2013 08:14:47 +0100 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: References: Message-ID: <50F7A4E7.7050608@gmail.com> On 17.01.2013 04:43, Patrick Marsh wrote: > Thanks, everyone for chiming in. Now that I know this behavior > exists, I can explicitly prevent it in my code. However, it would be > nice if a warning or something was generated to alert users about the > inconsistency between var += ... and var = var + ... > > > Patrick > I agree wholeheartedly. I actually, for a long time, used to believe that python would translate a += b to a = a + b and was bitten several times by this bug. A warning (which can be silenced if you desperately want to) would be really nice, imho. Keep up the good work, Paul From matthieu.brucher at gmail.com Thu Jan 17 02:34:27 2013 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 17 Jan 2013 08:34:27 +0100 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: <50F7A4E7.7050608@gmail.com> References: <50F7A4E7.7050608@gmail.com> Message-ID: Hi, Actually, this behavior is already present in other languages, so I'm -1 on additional verbosity. Of course a += b is not the same as a = a + b. The first one modifies the object a, the second one creates a new object and puts it inside a. The behavior IS consistent. Cheers, Matthieu 2013/1/17 Paul Anton Letnes > On 17.01.2013 04:43, Patrick Marsh wrote: > > Thanks, everyone for chiming in. Now that I know this behavior > > exists, I can explicitly prevent it in my code. 
However, it would be > > nice if a warning or something was generated to alert users about the > > inconsistency between var += ... and var = var + ... > > > > > > Patrick > > > > I agree wholeheartedly. I actually, for a long time, used to believe > that python would translate > a += b > to > a = a + b > and was bitten several times by this bug. A warning (which can be > silenced if you desperately want to) would be really nice, imho. > > Keep up the good work, > Paul > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From burger.ga at gmail.com Thu Jan 17 05:01:11 2013 From: burger.ga at gmail.com (Gerhard Burger) Date: Thu, 17 Jan 2013 11:01:11 +0100 Subject: [Numpy-discussion] Fwd: numpy test fails with "Illegal instruction' In-Reply-To: References: Message-ID: Dear numpy users, I am trying to get numpy to work on my computer, but so far no luck. When I run `numpy.test(verbose=10)` it crashes with test_polyfit (test_polynomial.TestDocs) ... Illegal instruction In the FAQ it states that I should provide the following information (running Ubuntu 12.04 64bit): os.name = 'posix' uname -r = 3.2.0-35-generic sys.platform = 'linux2' sys.version = '2.7.3 (default, Aug 1 2012, 05:14:39) \n[GCC 4.6.3]' Atlas is not installed (not required for numpy, only for scipy right?) It fails both when I install numpy 1.6.2 with `pip install numpy` and if I install the latest dev version from git. Can someone give me some pointers on how to solve this? I will be grateful for any help you can provide. Kind regards, Gerhard Burger -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jan 17 07:27:44 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 17 Jan 2013 07:27:44 -0500 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: References: <50F7A4E7.7050608@gmail.com> Message-ID: On Thu, Jan 17, 2013 at 2:34 AM, Matthieu Brucher wrote: > Hi, > > Actually, this behavior is already present in other languages, so I'm -1 on > additional verbosity. > Of course a += b is not the same as a = a + b. The first one modifies the > object a, the second one creates a new object and puts it inside a. The > behavior IS consistent. The inplace operation is standard, but my guess is that the silent downcasting is not. in python >>> a = 1 >>> a += 5.3 >>> a 6.2999999999999998 >>> a = 1 >>> a *= 1j >>> a 1j I have no idea about other languages. Josef > > Cheers, > > Matthieu > > > 2013/1/17 Paul Anton Letnes >> >> On 17.01.2013 04:43, Patrick Marsh wrote: >> > Thanks, everyone for chiming in. Now that I know this behavior >> > exists, I can explicitly prevent it in my code. However, it would be >> > nice if a warning or something was generated to alert users about the >> > inconsistency between var += ... and var = var + ... >> > >> > >> > Patrick >> > >> >> I agree wholeheartedly. I actually, for a long time, used to believe >> that python would translate >> a += b >> to >> a = a + b >> and was bitten several times by this bug. A warning (which can be >> silenced if you desperately want to) would be really nice, imho. 
>> >> Keep up the good work, >> Paul >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > Information System Engineer, Ph.D. > Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > Music band: http://liliejay.com/ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From d.s.seljebotn at astro.uio.no Thu Jan 17 07:49:09 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 17 Jan 2013 13:49:09 +0100 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: References: <50F7A4E7.7050608@gmail.com> Message-ID: <50F7F345.4040504@astro.uio.no> On 01/17/2013 01:27 PM, josef.pktd at gmail.com wrote: > On Thu, Jan 17, 2013 at 2:34 AM, Matthieu Brucher > wrote: >> Hi, >> >> Actually, this behavior is already present in other languages, so I'm -1 on >> additional verbosity. >> Of course a += b is not the same as a = a + b. The first one modifies the >> object a, the second one creates a new object and puts it inside a. The >> behavior IS consistent. > > The inplace operation is standard, but my guess is that the silent > downcasting is not. > > in python > >>>> a = 1 >>>> a += 5.3 >>>> a > 6.2999999999999998 >>>> a = 1 >>>> a *= 1j >>>> a > 1j > > I have no idea about other languages. I don't think the comparison with Python scalars is relevant since they are immutable: In [9]: a = 1 In [10]: b = a In [11]: a *= 1j In [12]: b Out[12]: 1 In-place operators exists for lists, but I don't know what the equivalent of a down-cast would be... In [3]: a = [0, 1] In [4]: b = a In [5]: a *= 2 In [6]: b Out[6]: [0, 1, 0, 1] Dag Sverre From jim.vickroy at noaa.gov Thu Jan 17 08:54:03 2013 From: jim.vickroy at noaa.gov (Jim Vickroy) Date: Thu, 17 Jan 2013 06:54:03 -0700 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: References: Message-ID: <50F8027B.5040301@noaa.gov> On 1/16/2013 11:41 PM, Nathaniel Smith wrote: > > On 16 Jan 2013 17:54, > wrote: > > >>> a = np.random.random_integers(0, 5, size=5) > > >>> b = a.sort() > > >>> b > > >>> a > > array([0, 1, 2, 5, 5]) > > > > >>> b = np.random.shuffle(a) > > >>> b > > >>> b = np.random.permutation(a) > > >>> b > > array([0, 5, 5, 2, 1]) > > > > How do I remember if shuffle shuffles or permutes ? > > > > Do we have a list of functions that are inplace? > > I rather like the convention used elsewhere in Python of naming > in-place operations with present tense imperative verbs, and > out-of-place operations with past participles. So you have > sort/sorted, reverse/reversed, etc. > > Here this would suggest we name these two operations as either > shuffle() and shuffled(), or permute() and permuted(). > I like this (tense) suggestion. It seems easy to remember. --jv > -n > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre.haessig at crans.org Thu Jan 17 09:02:51 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 17 Jan 2013 15:02:51 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: <50F459DF.2070900@gmail.com> References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F459DF.2070900@gmail.com> Message-ID: <50F8048B.6070506@crans.org> Hi, Le 14/01/2013 20:17, Alan G Isaac a ?crit : > >>> a = np.tile(5,(1,2,3)) > >>> a > array([[[5, 5, 5], > [5, 5, 5]]]) > >>> np.tile(1,a.shape) > array([[[1, 1, 1], > [1, 1, 1]]]) > > I had not realized a scalar first argument was possible. I didn't know either ! I discovered this use in the thread of this discussion. Just like Ben, I've almost never used "np.tile" neither its cousin "np.repeat"... Now, in the process of rediscovering those two functions, I was just wondering whether it would make sense to repackage them in order to allow the simple functionality of initializing a non-empty array. In term of choosing the name (or actually the verb), I prefer "repeat" because it's a more familiar concept than "tile". However, repeat may need more changes to make it work than tile. Indeed we currently have : >>> tile(nan, (3,3)) # works fine, but is pretty slow for that purpose, And doesn't accept a dtype arg array([[ nan, nan, nan], [ nan, nan, nan], [ nan, nan, nan]]) Doesn't work for that purpose: >>>repeat(nan, (3,3)) [...] ValueError: a.shape[axis] != len(repeats) So what people think of this "green" approach of recycling existing API into a slightly different function (without breaking current behavior of course) Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From scott.sinclair.za at gmail.com Thu Jan 17 09:12:53 2013 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 17 Jan 2013 16:12:53 +0200 Subject: [Numpy-discussion] Fwd: numpy test fails with "Illegal instruction' In-Reply-To: References: Message-ID: On 17 January 2013 12:01, Gerhard Burger wrote: > When I run `numpy.test(verbose=10)` it crashes with > > test_polyfit (test_polynomial.TestDocs) ... Illegal instruction > > In the FAQ it states that I should provide the following information > (running Ubuntu 12.04 64bit): > > os.name = 'posix' > uname -r = 3.2.0-35-generic > sys.platform = 'linux2' > sys.version = '2.7.3 (default, Aug 1 2012, 05:14:39) \n[GCC 4.6.3]' > > Atlas is not installed (not required for numpy, only for scipy right?) > > It fails both when I install numpy 1.6.2 with `pip install numpy` and if I > install the latest dev version from git. Very strange. I tried to reproduce this on 64-bit Ubuntu 12.04 (by removing my ATLAS, BLAS, LAPACK etc..) but couldn't: $ python -c "import numpy; numpy.test()" Running unit tests for numpy NumPy version 1.6.2 NumPy is installed in /home/scott/.virtualenvs/numpy-tmp/local/lib/python2.7/site-packages/numpy Python version 2.7.3 (default, Aug 1 2012, 05:14:39) [GCC 4.6.3] nose version 1.2.1 ......... 
---------------------------------------------------------------------- Ran 3568 tests in 14.170s OK (KNOWNFAIL=5, SKIP=5) $ python -c "import numpy; numpy.show_config()" blas_info: NOT AVAILABLE lapack_info: NOT AVAILABLE atlas_threads_info: NOT AVAILABLE blas_src_info: NOT AVAILABLE lapack_src_info: NOT AVAILABLE atlas_blas_threads_info: NOT AVAILABLE lapack_opt_info: NOT AVAILABLE blas_opt_info: NOT AVAILABLE atlas_info: NOT AVAILABLE lapack_mkl_info: NOT AVAILABLE blas_mkl_info: NOT AVAILABLE atlas_blas_info: NOT AVAILABLE mkl_info: NOT AVAILABLE Cheers, Scott From pierre.haessig at crans.org Thu Jan 17 09:13:37 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 17 Jan 2013 15:13:37 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> Message-ID: <50F80711.9010204@crans.org> Hi, Le 14/01/2013 20:05, Benjamin Root a ?crit : > I do like the way you are thinking in terms of the broadcasting > semantics, but I wonder if that is a bit awkward. What I mean is, if > one were to use broadcasting semantics for creating an array, wouldn't > one have just simply used broadcasting anyway? The point of > broadcasting is to _avoid_ the creation of unneeded arrays. But maybe > I can be convinced with some examples. I feel that one of the point of the discussion is : although a new (or not so new...) function to create a filled array would be more elegant than the existing pair of functions "np.zeros" and "np.ones", there are maybe not so many usecases for filled arrays *other than zeros values*. I can remember having initialized a non-zero array *some months ago*. For the anecdote it was a vector of discretized vehicule speed values which I wanted to be initialized with a predefined mean speed value prior to some optimization. In that usecase, I really didn't care about the performance of this initialization step. So my overall feeling after this thread is - *yes* a single dedicated fill/init/someverb function would give a slightly better API, - but *no* it's not important because np.empty and np.zeros covers 95 % usecases ! best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From burger.ga at gmail.com Thu Jan 17 09:18:04 2013 From: burger.ga at gmail.com (Gerhard Burger) Date: Thu, 17 Jan 2013 15:18:04 +0100 Subject: [Numpy-discussion] Fwd: numpy test fails with "Illegal instruction' In-Reply-To: References: Message-ID: I read somewhere that it could have to do with the sse instructions that your processor is capable of, but my processor is not that old, so I would think that is not the problem... On Thu, Jan 17, 2013 at 3:12 PM, Scott Sinclair wrote: > On 17 January 2013 12:01, Gerhard Burger wrote: > > When I run `numpy.test(verbose=10)` it crashes with > > > > test_polyfit (test_polynomial.TestDocs) ... Illegal instruction > > > > In the FAQ it states that I should provide the following information > > (running Ubuntu 12.04 64bit): > > > > os.name = 'posix' > > uname -r = 3.2.0-35-generic > > sys.platform = 'linux2' > > sys.version = '2.7.3 (default, Aug 1 2012, 05:14:39) \n[GCC 4.6.3]' > > > > Atlas is not installed (not required for numpy, only for scipy right?) > > > > It fails both when I install numpy 1.6.2 with `pip install numpy` and if > I > > install the latest dev version from git. > > Very strange. 
I tried to reproduce this on 64-bit Ubuntu 12.04 (by > removing my ATLAS, BLAS, LAPACK etc..) but couldn't: > > $ python -c "import numpy; numpy.test()" > Running unit tests for numpy > NumPy version 1.6.2 > NumPy is installed in > /home/scott/.virtualenvs/numpy-tmp/local/lib/python2.7/site-packages/numpy > Python version 2.7.3 (default, Aug 1 2012, 05:14:39) [GCC 4.6.3] > nose version 1.2.1 > ......... > ---------------------------------------------------------------------- > Ran 3568 tests in 14.170s > > OK (KNOWNFAIL=5, SKIP=5) > > $ python -c "import numpy; numpy.show_config()" > blas_info: > NOT AVAILABLE > lapack_info: > NOT AVAILABLE > atlas_threads_info: > NOT AVAILABLE > blas_src_info: > NOT AVAILABLE > lapack_src_info: > NOT AVAILABLE > atlas_blas_threads_info: > NOT AVAILABLE > lapack_opt_info: > NOT AVAILABLE > blas_opt_info: > NOT AVAILABLE > atlas_info: > NOT AVAILABLE > lapack_mkl_info: > NOT AVAILABLE > blas_mkl_info: > NOT AVAILABLE > atlas_blas_info: > NOT AVAILABLE > mkl_info: > NOT AVAILABLE > > Cheers, > Scott > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jan 17 09:26:16 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 17 Jan 2013 14:26:16 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: <50EDB1D4.5090909@astro.uio.no> References: <50EDB1D4.5090909@astro.uio.no> Message-ID: Hi, On Wed, Jan 9, 2013 at 6:07 PM, Dag Sverre Seljebotn wrote: > On 01/09/2013 06:22 PM, Chris Barker - NOAA Federal wrote: >> On Wed, Jan 9, 2013 at 7:09 AM, Nathaniel Smith wrote: >>>> This is a general issue applying to data which is read from real-world >>>> external sources. For example, digitizers routinely represent their >>>> samples as int8's or int16's, and you apply a scale and offset to get >>>> a reading in volts. >>> >>> This particular case is actually handled fine by 1.5, because int >>> array + float scalar *does* upcast to float. It's width that's ignored >>> (int8 versus int32), not the basic "kind" of data (int versus float). >>> >>> But overall this does sound like a problem -- but it's not a problem >>> with the scalar/array rules, it's a problem with working with narrow >>> width data in general. >> >> Exactly -- this is key. details asside, we essentially have a choice >> between an approach that makes it easy to preserver your values -- >> upcasting liberally, or making it easy to preserve your dtype -- >> requiring users to specifically upcast where needed. >> >> IIRC, our experience with earlier versions of numpy (and Numeric >> before that) is that all too often folks would choose a small dtype >> quite deliberately, then have it accidentally upcast for them -- this >> was determined to be not-so-good behavior. >> >> I think the HDF (and also netcdf...) case is a special case -- the >> small dtype+scaling has been chosen deliberately by whoever created >> the data file (to save space), but we would want it generally opaque >> to the consumer of the file -- to me, that means the issue should be >> adressed by the file reading tools, not numpy. If your HDF5 reader >> chooses the the resulting dtype explicitly, it doesn't matter what >> numpy's defaults are. If the user wants to work with the raw, unscaled >> arrays, then they should know what they are doing. 
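A sketch of the reader-side upcast being described (scale, offset and the
values here are made-up calibration metadata, not an actual h5py or netCDF
attribute API):

import numpy as np

raw = np.array([1, 2, 3], dtype=np.int8)          # narrow samples as stored on disk
scale, offset = 0.01, -1.2                        # calibration chosen by the file writer
volts = raw.astype(np.float64) * scale + offset   # the reader picks the wide dtype explicitly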
> > +1. I think h5py should consider: > > File("my.h5")['int8_dset'].dtype == int64 > File("my.h5", preserve_dtype=True)['int8_dset'].dtype == int8 Returning to this thread - did we have a decision? With further reflection, it seems to me we will have a tough time going back to the 1.5 behavior now - we might be shutting the stable door after the cat is out of the bag, if you see what I mean. Maybe we should change the question to the desirable behavior in the long term. I am starting to wonder if we should aim for making * scalar and array casting rules the same; * Python int / float scalars become int32 / 64 or float64; This has the benefit of being very easy to understand and explain. It makes dtypes predictable in the sense they don't depend on value. Those wanting to maintain - say - float32 will need to cast scalars to float32. Maybe the use-cases motivating the scalar casting rules - maintaining float32 precision in particular - can be dealt with by careful casting of scalars, throwing the burden onto the memory-conscious to maintain their dtypes. Or is there a way of using flags to ufuncs to emulate the 1.5 casting rules? Do y'all agree this is desirable in the long term? If so, how should we get there? It seems to me we're about 25 percent of the way there with the current scalar casting rule. Cheers, Matthew From alan.isaac at gmail.com Thu Jan 17 09:32:34 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 17 Jan 2013 09:32:34 -0500 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: <50F8027B.5040301@noaa.gov> References: <50F8027B.5040301@noaa.gov> Message-ID: <50F80B82.9080606@gmail.com> Is it really better to have `permute` and `permuted` than to add a keyword? (Note that these are actually still ambiguous, except by convention.) Btw, two separate issues seem to be running side by side. i. should in-place operations return their result? ii. how can we signal that an operation is inplace? I expect NumPy to do inplace operations when feasible, so maybe they could take an `out` keyword with a None default. Possibly recognize `out=True` as asking for the original array object to be returned (mutated); `out='copy'` as asking for a copy to be created, operated upon, and returned; and `out=a` to ask for array `a` to be used for the output (without changing the original object, and with a return value of None). Alan Isaac From ben.root at ou.edu Thu Jan 17 09:49:51 2013 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 17 Jan 2013 09:49:51 -0500 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: <50F8027B.5040301@noaa.gov> References: <50F8027B.5040301@noaa.gov> Message-ID: On Thu, Jan 17, 2013 at 8:54 AM, Jim Vickroy wrote: > On 1/16/2013 11:41 PM, Nathaniel Smith wrote: > > On 16 Jan 2013 17:54, wrote: > > >>> a = np.random.random_integers(0, 5, size=5) > > >>> b = a.sort() > > >>> b > > >>> a > > array([0, 1, 2, 5, 5]) > > > > >>> b = np.random.shuffle(a) > > >>> b > > >>> b = np.random.permutation(a) > > >>> b > > array([0, 5, 5, 2, 1]) > > > > How do I remember if shuffle shuffles or permutes ? > > > > Do we have a list of functions that are inplace? > > I rather like the convention used elsewhere in Python of naming in-place > operations with present tense imperative verbs, and out-of-place operations > with past participles. So you have sort/sorted, reverse/reversed, etc. 
> > Here this would suggest we name these two operations as either shuffle() > and shuffled(), or permute() and permuted(). > > > I like this (tense) suggestion. It seems easy to remember. --jv > > > And another score for functions as verbs! :-P Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From burger.ga at gmail.com Thu Jan 17 09:59:36 2013 From: burger.ga at gmail.com (Gerhard Burger) Date: Thu, 17 Jan 2013 15:59:36 +0100 Subject: [Numpy-discussion] Fwd: numpy test fails with "Illegal instruction' In-Reply-To: References: Message-ID: Solved it, did a backtrace with gdb and the error came somewhere from an old lapack version that was installed on my machine (I thought I wouldn't have these issues in a virtualenv). but anyway after I removed it, and installed numpy again, it ran without problems! On Thu, Jan 17, 2013 at 3:18 PM, Gerhard Burger wrote: > I read somewhere that it could have to do with the sse instructions that > your processor is capable of, but my processor is not that old, so I would > think that is not the problem... > > > > On Thu, Jan 17, 2013 at 3:12 PM, Scott Sinclair < > scott.sinclair.za at gmail.com> wrote: > >> On 17 January 2013 12:01, Gerhard Burger wrote: >> > When I run `numpy.test(verbose=10)` it crashes with >> > >> > test_polyfit (test_polynomial.TestDocs) ... Illegal instruction >> > >> > In the FAQ it states that I should provide the following information >> > (running Ubuntu 12.04 64bit): >> > >> > os.name = 'posix' >> > uname -r = 3.2.0-35-generic >> > sys.platform = 'linux2' >> > sys.version = '2.7.3 (default, Aug 1 2012, 05:14:39) \n[GCC 4.6.3]' >> > >> > Atlas is not installed (not required for numpy, only for scipy right?) >> > >> > It fails both when I install numpy 1.6.2 with `pip install numpy` and >> if I >> > install the latest dev version from git. >> >> Very strange. I tried to reproduce this on 64-bit Ubuntu 12.04 (by >> removing my ATLAS, BLAS, LAPACK etc..) but couldn't: >> >> $ python -c "import numpy; numpy.test()" >> Running unit tests for numpy >> NumPy version 1.6.2 >> NumPy is installed in >> /home/scott/.virtualenvs/numpy-tmp/local/lib/python2.7/site-packages/numpy >> Python version 2.7.3 (default, Aug 1 2012, 05:14:39) [GCC 4.6.3] >> nose version 1.2.1 >> ......... >> ---------------------------------------------------------------------- >> Ran 3568 tests in 14.170s >> >> OK (KNOWNFAIL=5, SKIP=5) >> >> $ python -c "import numpy; numpy.show_config()" >> blas_info: >> NOT AVAILABLE >> lapack_info: >> NOT AVAILABLE >> atlas_threads_info: >> NOT AVAILABLE >> blas_src_info: >> NOT AVAILABLE >> lapack_src_info: >> NOT AVAILABLE >> atlas_blas_threads_info: >> NOT AVAILABLE >> lapack_opt_info: >> NOT AVAILABLE >> blas_opt_info: >> NOT AVAILABLE >> atlas_info: >> NOT AVAILABLE >> lapack_mkl_info: >> NOT AVAILABLE >> blas_mkl_info: >> NOT AVAILABLE >> atlas_blas_info: >> NOT AVAILABLE >> mkl_info: >> NOT AVAILABLE >> >> Cheers, >> Scott >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jan 17 10:24:29 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 17 Jan 2013 10:24:29 -0500 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? 
In-Reply-To: References: <50F8027B.5040301@noaa.gov> Message-ID: On Thu, Jan 17, 2013 at 9:49 AM, Benjamin Root wrote: > > > On Thu, Jan 17, 2013 at 8:54 AM, Jim Vickroy wrote: >> >> On 1/16/2013 11:41 PM, Nathaniel Smith wrote: >> >> On 16 Jan 2013 17:54, wrote: >> > >>> a = np.random.random_integers(0, 5, size=5) >> > >>> b = a.sort() >> > >>> b >> > >>> a >> > array([0, 1, 2, 5, 5]) >> > >> > >>> b = np.random.shuffle(a) >> > >>> b >> > >>> b = np.random.permutation(a) >> > >>> b >> > array([0, 5, 5, 2, 1]) >> > >> > How do I remember if shuffle shuffles or permutes ? >> > >> > Do we have a list of functions that are inplace? >> >> I rather like the convention used elsewhere in Python of naming in-place >> operations with present tense imperative verbs, and out-of-place operations >> with past participles. So you have sort/sorted, reverse/reversed, etc. >> >> Here this would suggest we name these two operations as either shuffle() >> and shuffled(), or permute() and permuted(). >> >> >> I like this (tense) suggestion. It seems easy to remember. --jv >> >> > > And another score for functions as verbs! I don't thing the filled we discuss here is an action. The current ``fill`` is an inplace operation, operating on an existing array. ``filled`` would be the analog that returns a copy. However ``filled`` here is creating an object I still think ``array_filled`` is the most precise '''Create an array and initialize it with the ``value``, returning the array ''' my 2.5c Josef > > :-P > > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Thu Jan 17 10:27:25 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 17 Jan 2013 08:27:25 -0700 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: References: Message-ID: On Wed, Jan 16, 2013 at 5:11 PM, eat wrote: > Hi, > > In a recent thread > http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was > proposed that .fill(.) should return self as an alternative for a trivial > two-liner. > > I'm raising now the question: what if all in-place operations indeed could > return self? How bad this would be? A 'strong' counter argument may be > found at > http://mail.python.org/pipermail/python-dev/2003-October/038855.html. > > But anyway, at least for me. it would be much more straightforward to > implement simple mini dsl's ( > http://en.wikipedia.org/wiki/Domain-specific_language) a much more > straightforward manner. > > What do you think? > > I've read Guido about why he didn't like inplace operations returning self and found him convincing for a while. And then I listened to other folks express a preference for the freight train style and found them convincing also. I think it comes down to a preference for one style over another and I go back and forth myself. If I had to vote, I'd go for returning self, but I'm not sure it's worth breaking python conventions to do so. Chuck > > -eat > > P.S. FWIW, if this idea really gains momentum obviously I'm volunteering > to create a PR of it. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Thu Jan 17 10:28:20 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 17 Jan 2013 10:28:20 -0500 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: References: <50F8027B.5040301@noaa.gov> Message-ID: On Thu, Jan 17, 2013 at 10:24 AM, wrote: > On Thu, Jan 17, 2013 at 9:49 AM, Benjamin Root wrote: >> >> >> On Thu, Jan 17, 2013 at 8:54 AM, Jim Vickroy wrote: >>> >>> On 1/16/2013 11:41 PM, Nathaniel Smith wrote: >>> >>> On 16 Jan 2013 17:54, wrote: >>> > >>> a = np.random.random_integers(0, 5, size=5) >>> > >>> b = a.sort() >>> > >>> b >>> > >>> a >>> > array([0, 1, 2, 5, 5]) >>> > >>> > >>> b = np.random.shuffle(a) >>> > >>> b >>> > >>> b = np.random.permutation(a) >>> > >>> b >>> > array([0, 5, 5, 2, 1]) >>> > >>> > How do I remember if shuffle shuffles or permutes ? >>> > >>> > Do we have a list of functions that are inplace? >>> >>> I rather like the convention used elsewhere in Python of naming in-place >>> operations with present tense imperative verbs, and out-of-place operations >>> with past participles. So you have sort/sorted, reverse/reversed, etc. >>> >>> Here this would suggest we name these two operations as either shuffle() >>> and shuffled(), or permute() and permuted(). >>> >>> >>> I like this (tense) suggestion. It seems easy to remember. --jv >>> >>> >> >> And another score for functions as verbs! > > I don't thing the filled we discuss here is an action. > > The current ``fill`` is an inplace operation, operating on an existing array. > ``filled`` would be the analog that returns a copy. > > However ``filled`` here is creating an object > > I still think ``array_filled`` is the most precise > > '''Create an array and initialize it with the ``value``, returning the array ''' > > > my 2.5c > > Josef Sorry, completely out of context. I shouldn't write emails, when I'm running in and out the office. Josef > >> >> :-P >> >> Ben Root >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From pierre.haessig at crans.org Thu Jan 17 10:48:36 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 17 Jan 2013 16:48:36 +0100 Subject: [Numpy-discussion] phase unwrapping (1d) In-Reply-To: References: <50F40352.9090603@crans.org> Message-ID: <50F81D54.7020105@crans.org> Hi Neal, Le 14/01/2013 15:39, Neal Becker a ?crit : > This code should explain all: > -------------------------------- > import numpy as np > arg = np.angle > > def nint (x): > return int (x + 0.5) if x >= 0 else int (x - 0.5) > > def unwrap (inp, y=np.pi, init=0, cnt=0): > o = np.empty_like (inp) > prev_o = init > for i in range (len (inp)): > o[i] = cnt * 2 * y + inp[i] > delta = o[i] - prev_o > > if delta / y > 1 or delta / y < -1: > n = nint (delta / (2*y)) > o[i] -= 2*y*n > cnt -= n > > prev_o = o[i] > > return o > > > u = np.linspace (0, 400, 100) * np.pi/100 > v = np.cos (u) + 1j * np.sin (u) > plot (arg(v)) > plot (arg(v) + arg (v)) > plot (unwrap (arg (v))) > plot (unwrap (arg (v) + arg (v))) I think your code does the job. 
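For reference, a quick illustrative check of the built-in np.unwrap on the same 0-to-4*pi test ramp used above (a minimal sketch, not part of the code under discussion):

import numpy as np

# the same test signal: a phase ramp from 0 to 4*pi, wrapped into (-pi, pi]
u = np.linspace(0, 400, 100) * np.pi / 100
wrapped = np.angle(np.cos(u) + 1j * np.sin(u))

# np.unwrap removes the 2*pi jumps and recovers the original ramp
recovered = np.unwrap(wrapped)
print(np.allclose(recovered, u))   # True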
I tried the following simplification, without the use of nint (which by the way could be replaced by int(floor(x)), I think):

def unwrap (inp, y=np.pi, init=0, cnt=0):
    o = np.empty_like (inp)
    prev_o = init
    for i in range (len (inp)):
        o[i] = cnt * 2 * y + inp[i]
        delta = o[i] - prev_o

        if delta / y > 1:
            o[i] -= 2*y
            cnt -= 1
        elif delta / y < -1:
            o[i] += 2*y
            cnt += 1

        prev_o = o[i]

    return o

And now I understand the issue you described of "phase changes of more than 2pi", because the above indeed fails to unwrap (arg (v) + arg (v)). On the other hand, np.unwrap handles it correctly. (I still don't know about the speed issue.)

Best,
Pierre
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 900 bytes
Desc: OpenPGP digital signature
URL:

From chris.barker at noaa.gov  Thu Jan 17 11:21:26 2013
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Thu, 17 Jan 2013 08:21:26 -0800
Subject: [Numpy-discussion] Casting Bug or a "Feature"?
In-Reply-To:
References: <50F7A4E7.7050608@gmail.com>
Message-ID:

On Wed, Jan 16, 2013 at 11:34 PM, Matthieu Brucher wrote:

> Of course a += b is not the same as a = a + b. The first one modifies the
> object a, the second one creates a new object and puts it inside a. The
> behavior IS consistent.

Exactly -- if you ask me, the bug is that Python allows "in_place" operators for immutable objects -- they should be more than syntactic sugar.

Of course, the temptation for += on regular numbers was just too much to resist.

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From njs at pobox.com  Thu Jan 17 11:33:47 2013
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 17 Jan 2013 16:33:47 +0000
Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self?
In-Reply-To: <50F80B82.9080606@gmail.com>
References: <50F8027B.5040301@noaa.gov> <50F80B82.9080606@gmail.com>
Message-ID:

On Thu, Jan 17, 2013 at 2:32 PM, Alan G Isaac wrote:
> Is it really better to have `permute` and `permuted`
> than to add a keyword? (Note that these are actually
> still ambiguous, except by convention.)

The convention in question, though, is that of English grammar. In practice everyone who uses numpy is a more-or-less skilled English speaker in any case, so re-using the conventions is helpful!

"Shake the martini!" <- an imperative command

This is a complete statement all by itself. You can't say "Hand me the shake the martini". In procedural languages like Python, there's a strong distinction between statements (whole lines, a = 1), which only matter because of their side-effects, and expressions (a + b) which have a value and can be embedded into a larger statement or expression ((a + b) + c). "Shake the martini" is clearly a statement, not an expression, and therefore clearly has a side-effect.

"shaken martini" <- a noun phrase

Grammatically, this is like plain "martini": you can use it anywhere you can use a noun. "Hand me the martini", "Hand me the shaken martini". In programming terms, it's an expression, not a statement. And side-effecting expressions are poor style, because when you read procedural code, you know each statement contains at least 1 side-effect, and it's much easier to figure out what's going on if each statement contains *exactly* one side-effect, and it's the top-most operation.
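To make the sort/sorted convention concrete, a minimal sketch (the shuffled() spelling is only the proposal under discussion; np.random.permutation is the closest existing out-of-place counterpart):

import numpy as np

lst = [3, 1, 2]
lst.sort()                     # imperative verb: mutates lst, returns None
new = sorted([3, 1, 2])        # past participle: returns a new list

a = np.arange(5)
np.random.shuffle(a)           # in place, returns None
b = np.random.permutation(a)   # out of place: returns a new array, a is untouched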
This underlying readability guideline is actually baked much more deeply into Python than the sort/sorted distinction -- this is why in Python, 'a = 1' is *not* an expression, but a statement. C allows you to say things like "b = (a = 1)", but in Python you have to say "a = 1; b = a". > Btw, two separate issues seem to be running side by side. > > i. should in-place operations return their result? > ii. how can we signal that an operation is inplace? > > I expect NumPy to do inplace operations when feasible, > so maybe they could take an `out` keyword with a None default. > Possibly recognize `out=True` as asking for the original array > object to be returned (mutated); `out='copy'` as asking for a copy to > be created, operated upon, and returned; and `out=a` to ask > for array `a` to be used for the output (without changing > the original object, and with a return value of None). Good point that numpy also has a nice convention with out= arguments for ufuncs. I guess that convention is, by default return a new array, but also allow one to modify the same (or another!) array in-place, by passing out=. So this would suggest that we'd have b = shuffled(a) shuffled(a, out=a) shuffled(a, out=b) shuffle(a) # same as shuffled(a, out=a) and if people are bothered by having both 'shuffled' and 'shuffle', then we drop 'shuffle'. (And the decision about whether to include the imperative form can be made on a case-by-case basis; having both shuffled and shuffle seems fine to me, but probably there are other cases where this is less clear.) There is also an argument that if out= is given, then we should always return None, in general. I'm having a lot of trouble thinking of any situation where it would be acceptable style (or even useful) to write something like: c = np.add(a, b, out=a) + 1 But, 'out=' is very large and visible (which makes the readability less terrible than it could be). And np.add always returns the out array when working out-of-place (so there's at least a weak countervailing convention). So I feel much more strongly that shuffle() should return None, than I do that np.add(out=...) should return None. A compromise position would be to make all new functions that take out= return None when out= is given, while leaving existing ufuncs and such as they are for now. -n From d.s.seljebotn at astro.uio.no Thu Jan 17 13:08:36 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 17 Jan 2013 19:08:36 +0100 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: References: <50F8027B.5040301@noaa.gov> <50F80B82.9080606@gmail.com> Message-ID: <50F83E24.9030808@astro.uio.no> On 01/17/2013 05:33 PM, Nathaniel Smith wrote: > On Thu, Jan 17, 2013 at 2:32 PM, Alan G Isaac wrote: >> Is it really better to have `permute` and `permuted` >> than to add a keyword? (Note that these are actually >> still ambiguous, except by convention.) > > The convention in question, though, is that of English grammar. In > practice everyone who uses numpy is a more-or-less skilled English > speaker in any case, so re-using the conventions is helpful! > > "Shake the martini!" <- an imperative command > > This is a complete statement all by itself. You can't say "Hand me the > shake the martini". 
In procedural languages like Python, there's a > strong distinction between statements (whole lines, a = 1), which only > matter because of their side-effects, and expressions (a + b) which > have a value and can be embedded into a larger statement or expression > ((a + b) + c). "Shake the martini" is clearly a statement, not an > expression, and therefore clearly has a side-effect. > > "shaken martini" <- a noun phrase > > Grammatically, this is like plain "martini", you can use it anywhere > you can use a noun. "Hand me the martini", "Hand me the shaken > martini". In programming terms, it's an expression, not a statement. > And side-effecting expressions are poor style, because when you read > procedural code, you know each statement contains at least 1 > side-effect, and it's much easier to figure out what's going on if > each statement contains *exactly* one side-effect, and it's the > top-most operation. > > This underlying readability guideline is actually baked much more > deeply into Python than the sort/sorted distinction -- this is why in > Python, 'a = 1' is *not* an expression, but a statement. C allows you > to say things like "b = (a = 1)", but in Python you have to say "a = > 1; b = a". > >> Btw, two separate issues seem to be running side by side. >> >> i. should in-place operations return their result? >> ii. how can we signal that an operation is inplace? >> >> I expect NumPy to do inplace operations when feasible, >> so maybe they could take an `out` keyword with a None default. >> Possibly recognize `out=True` as asking for the original array >> object to be returned (mutated); `out='copy'` as asking for a copy to >> be created, operated upon, and returned; and `out=a` to ask >> for array `a` to be used for the output (without changing >> the original object, and with a return value of None). > > Good point that numpy also has a nice convention with out= arguments > for ufuncs. I guess that convention is, by default return a new array, > but also allow one to modify the same (or another!) array in-place, by > passing out=. So this would suggest that we'd have > b = shuffled(a) > shuffled(a, out=a) > shuffled(a, out=b) > shuffle(a) # same as shuffled(a, out=a) > and if people are bothered by having both 'shuffled' and 'shuffle', > then we drop 'shuffle'. (And the decision about whether to include the > imperative form can be made on a case-by-case basis; having both > shuffled and shuffle seems fine to me, but probably there are other > cases where this is less clear.) In addition to the verb tense, I think it's important that mutators are methods whereas functions do not mutate their arguments: lst.sort() sorted(lst) So -1 on shuffle(a) and a.shuffled(). Dag Sverre > > There is also an argument that if out= is given, then we should always > return None, in general. I'm having a lot of trouble thinking of any > situation where it would be acceptable style (or even useful) to write > something like: > c = np.add(a, b, out=a) + 1 > But, 'out=' is very large and visible (which makes the readability > less terrible than it could be). And np.add always returns the out > array when working out-of-place (so there's at least a weak > countervailing convention). So I feel much more strongly that > shuffle() should return None, than I do that np.add(out=...) should > return None. > > A compromise position would be to make all new functions that take > out= return None when out= is given, while leaving existing ufuncs and > such as they are for now. 
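A small sketch of the existing ufunc out= behaviour referred to in the quoted text (np.add hands back the array it was asked to write into):

import numpy as np

a = np.arange(3)
b = np.ones(3, dtype=a.dtype)

c = np.add(a, b)            # out of place: allocates and returns a fresh array
d = np.add(a, b, out=a)     # in place: the result is written into a ...
print(d is a)               # ... and that same array is returned (True)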
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Thu Jan 17 13:29:08 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Jan 2013 18:29:08 +0000 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: <50F83E24.9030808@astro.uio.no> References: <50F8027B.5040301@noaa.gov> <50F80B82.9080606@gmail.com> <50F83E24.9030808@astro.uio.no> Message-ID: On Thu, Jan 17, 2013 at 6:08 PM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > In addition to the verb tense, I think it's important that mutators are > methods whereas functions do not mutate their arguments: > > lst.sort() > sorted(lst) Unfortunately this isn't really viable in a language like Python where you can't add methods to a class. (list.sort() versus sorted() has as much or more to do with the fact that sort's implementation only works on lists, while sorted takes an arbitrary iterable.) Even core python provides a function for in-place list randomization, not a method. Following the proposed rule would just mean that we couldn't provide in-place shuffles at all, which is clearly not going to be acceptable. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Thu Jan 17 14:04:24 2013 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 17 Jan 2013 11:04:24 -0800 Subject: [Numpy-discussion] memory leak in 1.7 Message-ID: I've tracked down and fixed a memory leak in 1.7 and master. The pull request to check and backport is here: https://github.com/numpy/numpy/pull/2928 Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Thu Jan 17 17:04:43 2013 From: efiring at hawaii.edu (Eric Firing) Date: Thu, 17 Jan 2013 12:04:43 -1000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: <50F80711.9010204@crans.org> References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> Message-ID: <50F8757B.7060008@hawaii.edu> On 2013/01/17 4:13 AM, Pierre Haessig wrote: > Hi, > > Le 14/01/2013 20:05, Benjamin Root a ?crit : >> I do like the way you are thinking in terms of the broadcasting >> semantics, but I wonder if that is a bit awkward. What I mean is, if >> one were to use broadcasting semantics for creating an array, wouldn't >> one have just simply used broadcasting anyway? The point of >> broadcasting is to _avoid_ the creation of unneeded arrays. But maybe >> I can be convinced with some examples. > > I feel that one of the point of the discussion is : although a new (or > not so new...) function to create a filled array would be more elegant > than the existing pair of functions "np.zeros" and "np.ones", there are > maybe not so many usecases for filled arrays *other than zeros values*. > > I can remember having initialized a non-zero array *some months ago*. > For the anecdote it was a vector of discretized vehicule speed values > which I wanted to be initialized with a predefined mean speed value > prior to some optimization. In that usecase, I really didn't care about > the performance of this initialization step. > > So my overall feeling after this thread is > - *yes* a single dedicated fill/init/someverb function would give a > slightly better API, > - but *no* it's not important because np.empty and np.zeros covers 95 > % usecases ! 
I agree with your summary and conclusion. Eric > > best, > Pierre > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ben.root at ou.edu Thu Jan 17 17:10:14 2013 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 17 Jan 2013 17:10:14 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: <50F8757B.7060008@hawaii.edu> References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> Message-ID: On Thu, Jan 17, 2013 at 5:04 PM, Eric Firing wrote: > On 2013/01/17 4:13 AM, Pierre Haessig wrote: > > Hi, > > > > Le 14/01/2013 20:05, Benjamin Root a ?crit : > >> I do like the way you are thinking in terms of the broadcasting > >> semantics, but I wonder if that is a bit awkward. What I mean is, if > >> one were to use broadcasting semantics for creating an array, wouldn't > >> one have just simply used broadcasting anyway? The point of > >> broadcasting is to _avoid_ the creation of unneeded arrays. But maybe > >> I can be convinced with some examples. > > > > I feel that one of the point of the discussion is : although a new (or > > not so new...) function to create a filled array would be more elegant > > than the existing pair of functions "np.zeros" and "np.ones", there are > > maybe not so many usecases for filled arrays *other than zeros values*. > > > > I can remember having initialized a non-zero array *some months ago*. > > For the anecdote it was a vector of discretized vehicule speed values > > which I wanted to be initialized with a predefined mean speed value > > prior to some optimization. In that usecase, I really didn't care about > > the performance of this initialization step. > > > > So my overall feeling after this thread is > > - *yes* a single dedicated fill/init/someverb function would give a > > slightly better API, > > - but *no* it's not important because np.empty and np.zeros covers 95 > > % usecases ! > > I agree with your summary and conclusion. > > Eric > > Can we at least have a np.nans() and np.infs() functions? This should cover an additional 4% of use-cases. Ben Root P.S. - I know they aren't verbs... -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis at gmail.com Thu Jan 17 17:13:44 2013 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Thu, 17 Jan 2013 17:13:44 -0500 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: References: Message-ID: On Thu, Jan 17, 2013 at 10:27 AM, Charles R Harris wrote: > > > On Wed, Jan 16, 2013 at 5:11 PM, eat wrote: >> >> Hi, >> >> In a recent thread >> http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was >> proposed that .fill(.) should return self as an alternative for a trivial >> two-liner. >> >> I'm raising now the question: what if all in-place operations indeed could >> return self? How bad this would be? A 'strong' counter argument may be found >> at http://mail.python.org/pipermail/python-dev/2003-October/038855.html. >> >> But anyway, at least for me. it would be much more straightforward to >> implement simple mini dsl's >> (http://en.wikipedia.org/wiki/Domain-specific_language) a much more >> straightforward manner. >> >> What do you think? >> > > I've read Guido about why he didn't like inplace operations returning self > and found him convincing for a while. 
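The two styles at issue, in miniature (the chained form is hypothetical; today ndarray.fill() returns None):

import numpy as np

# statement style (current behaviour): in-place methods return None
a = np.empty(5)
a.fill(0.5)

# "freight train" / chained style (hypothetical, only if fill() returned self):
# b = np.empty(5).fill(0.5)
# as things stand this binds b to None, which is the pitfall behind the convention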
And then I listened to other folks > express a preference for the freight train style and found them convincing > also. I think it comes down to a preference for one style over another and I > go back and forth myself. If I had to vote, I'd go for returning self, but > I'm not sure it's worth breaking python conventions to do so. > > Chuck I'm -1 on breaking with Python convention without very good reasons. Ray From matthew.brett at gmail.com Thu Jan 17 17:23:50 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 17 Jan 2013 22:23:50 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> Message-ID: Hi, On Thu, Jan 17, 2013 at 10:10 PM, Benjamin Root wrote: > > > On Thu, Jan 17, 2013 at 5:04 PM, Eric Firing wrote: >> >> On 2013/01/17 4:13 AM, Pierre Haessig wrote: >> > Hi, >> > >> > Le 14/01/2013 20:05, Benjamin Root a ?crit : >> >> I do like the way you are thinking in terms of the broadcasting >> >> semantics, but I wonder if that is a bit awkward. What I mean is, if >> >> one were to use broadcasting semantics for creating an array, wouldn't >> >> one have just simply used broadcasting anyway? The point of >> >> broadcasting is to _avoid_ the creation of unneeded arrays. But maybe >> >> I can be convinced with some examples. >> > >> > I feel that one of the point of the discussion is : although a new (or >> > not so new...) function to create a filled array would be more elegant >> > than the existing pair of functions "np.zeros" and "np.ones", there are >> > maybe not so many usecases for filled arrays *other than zeros values*. >> > >> > I can remember having initialized a non-zero array *some months ago*. >> > For the anecdote it was a vector of discretized vehicule speed values >> > which I wanted to be initialized with a predefined mean speed value >> > prior to some optimization. In that usecase, I really didn't care about >> > the performance of this initialization step. >> > >> > So my overall feeling after this thread is >> > - *yes* a single dedicated fill/init/someverb function would give a >> > slightly better API, >> > - but *no* it's not important because np.empty and np.zeros covers 95 >> > % usecases ! >> >> I agree with your summary and conclusion. >> >> Eric >> > > Can we at least have a np.nans() and np.infs() functions? This should cover > an additional 4% of use-cases. I'm a -0.5 on the new functions, just because they only save one line of code, and the use-case is fairly rare in my experience.. Cheers, Matthew From mwwiebe at gmail.com Thu Jan 17 17:27:13 2013 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 17 Jan 2013 14:27:13 -0800 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> Message-ID: On Thu, Jan 17, 2013 at 2:10 PM, Benjamin Root wrote: > > > On Thu, Jan 17, 2013 at 5:04 PM, Eric Firing wrote: > >> On 2013/01/17 4:13 AM, Pierre Haessig wrote: >> > Hi, >> > >> > Le 14/01/2013 20:05, Benjamin Root a ?crit : >> >> I do like the way you are thinking in terms of the broadcasting >> >> semantics, but I wonder if that is a bit awkward. What I mean is, if >> >> one were to use broadcasting semantics for creating an array, wouldn't >> >> one have just simply used broadcasting anyway? 
The point of >> >> broadcasting is to _avoid_ the creation of unneeded arrays. But maybe >> >> I can be convinced with some examples. >> > >> > I feel that one of the point of the discussion is : although a new (or >> > not so new...) function to create a filled array would be more elegant >> > than the existing pair of functions "np.zeros" and "np.ones", there are >> > maybe not so many usecases for filled arrays *other than zeros values*. >> > >> > I can remember having initialized a non-zero array *some months ago*. >> > For the anecdote it was a vector of discretized vehicule speed values >> > which I wanted to be initialized with a predefined mean speed value >> > prior to some optimization. In that usecase, I really didn't care about >> > the performance of this initialization step. >> > >> > So my overall feeling after this thread is >> > - *yes* a single dedicated fill/init/someverb function would give a >> > slightly better API, >> > - but *no* it's not important because np.empty and np.zeros covers 95 >> > % usecases ! >> >> I agree with your summary and conclusion. >> >> Eric >> >> > Can we at least have a np.nans() and np.infs() functions? This should > cover an additional 4% of use-cases. > > Ben Root > > P.S. - I know they aren't verbs... > Would it be too weird or clumsy to extend the empty and empty_like functions to do the filling? np.empty((10, 10), fill=np.nan) np.empty_like(my_arr, fill=np.nan) -Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jan 17 17:31:04 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 17 Jan 2013 22:31:04 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> Message-ID: Hi, On Thu, Jan 17, 2013 at 10:27 PM, Mark Wiebe wrote: > > On Thu, Jan 17, 2013 at 2:10 PM, Benjamin Root wrote: >> >> >> >> On Thu, Jan 17, 2013 at 5:04 PM, Eric Firing wrote: >>> >>> On 2013/01/17 4:13 AM, Pierre Haessig wrote: >>> > Hi, >>> > >>> > Le 14/01/2013 20:05, Benjamin Root a ?crit : >>> >> I do like the way you are thinking in terms of the broadcasting >>> >> semantics, but I wonder if that is a bit awkward. What I mean is, if >>> >> one were to use broadcasting semantics for creating an array, wouldn't >>> >> one have just simply used broadcasting anyway? The point of >>> >> broadcasting is to _avoid_ the creation of unneeded arrays. But maybe >>> >> I can be convinced with some examples. >>> > >>> > I feel that one of the point of the discussion is : although a new (or >>> > not so new...) function to create a filled array would be more elegant >>> > than the existing pair of functions "np.zeros" and "np.ones", there are >>> > maybe not so many usecases for filled arrays *other than zeros values*. >>> > >>> > I can remember having initialized a non-zero array *some months ago*. >>> > For the anecdote it was a vector of discretized vehicule speed values >>> > which I wanted to be initialized with a predefined mean speed value >>> > prior to some optimization. In that usecase, I really didn't care about >>> > the performance of this initialization step. 
>>> > >>> > So my overall feeling after this thread is >>> > - *yes* a single dedicated fill/init/someverb function would give a >>> > slightly better API, >>> > - but *no* it's not important because np.empty and np.zeros covers >>> > 95 >>> > % usecases ! >>> >>> I agree with your summary and conclusion. >>> >>> Eric >>> >> >> Can we at least have a np.nans() and np.infs() functions? This should >> cover an additional 4% of use-cases. >> >> Ben Root >> >> P.S. - I know they aren't verbs... > > > Would it be too weird or clumsy to extend the empty and empty_like functions > to do the filling? > > np.empty((10, 10), fill=np.nan) > np.empty_like(my_arr, fill=np.nan) That sounds like a good idea to me. Someone wanting a fast way to fill an array will probably check out the 'empty' docstring first. See you, Matthew From shish at keba.be Thu Jan 17 20:01:26 2013 From: shish at keba.be (Olivier Delalleau) Date: Thu, 17 Jan 2013 20:01:26 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> Message-ID: 2013/1/17 Matthew Brett : > Hi, > > On Thu, Jan 17, 2013 at 10:27 PM, Mark Wiebe wrote: >> >> On Thu, Jan 17, 2013 at 2:10 PM, Benjamin Root wrote: >>> >>> >>> >>> On Thu, Jan 17, 2013 at 5:04 PM, Eric Firing wrote: >>>> >>>> On 2013/01/17 4:13 AM, Pierre Haessig wrote: >>>> > Hi, >>>> > >>>> > Le 14/01/2013 20:05, Benjamin Root a ?crit : >>>> >> I do like the way you are thinking in terms of the broadcasting >>>> >> semantics, but I wonder if that is a bit awkward. What I mean is, if >>>> >> one were to use broadcasting semantics for creating an array, wouldn't >>>> >> one have just simply used broadcasting anyway? The point of >>>> >> broadcasting is to _avoid_ the creation of unneeded arrays. But maybe >>>> >> I can be convinced with some examples. >>>> > >>>> > I feel that one of the point of the discussion is : although a new (or >>>> > not so new...) function to create a filled array would be more elegant >>>> > than the existing pair of functions "np.zeros" and "np.ones", there are >>>> > maybe not so many usecases for filled arrays *other than zeros values*. >>>> > >>>> > I can remember having initialized a non-zero array *some months ago*. >>>> > For the anecdote it was a vector of discretized vehicule speed values >>>> > which I wanted to be initialized with a predefined mean speed value >>>> > prior to some optimization. In that usecase, I really didn't care about >>>> > the performance of this initialization step. >>>> > >>>> > So my overall feeling after this thread is >>>> > - *yes* a single dedicated fill/init/someverb function would give a >>>> > slightly better API, >>>> > - but *no* it's not important because np.empty and np.zeros covers >>>> > 95 >>>> > % usecases ! >>>> >>>> I agree with your summary and conclusion. >>>> >>>> Eric >>>> >>> >>> Can we at least have a np.nans() and np.infs() functions? This should >>> cover an additional 4% of use-cases. >>> >>> Ben Root >>> >>> P.S. - I know they aren't verbs... >> >> >> Would it be too weird or clumsy to extend the empty and empty_like functions >> to do the filling? >> >> np.empty((10, 10), fill=np.nan) >> np.empty_like(my_arr, fill=np.nan) > > That sounds like a good idea to me. Someone wanting a fast way to > fill an array will probably check out the 'empty' docstring first. > > See you, > > Matthew +1 from me. 
Even though it *is* weird to have both "empty" and "fill" ;) -=- Olivier From chris.barker at noaa.gov Thu Jan 17 20:04:15 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 17 Jan 2013 17:04:15 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: On Thu, Jan 17, 2013 at 6:26 AM, Matthew Brett wrote: > I am starting to wonder if we should aim for making > > * scalar and array casting rules the same; > * Python int / float scalars become int32 / 64 or float64; aren't they already? I'm not sure what you are proposing. > This has the benefit of being very easy to understand and explain. It > makes dtypes predictable in the sense they don't depend on value. That is key -- I don't think casting should ever depend on value. > Those wanting to maintain - say - float32 will need to cast scalars to float32. > > Maybe the use-cases motivating the scalar casting rules - maintaining > float32 precision in particular - can be dealt with by careful casting > of scalars, throwing the burden onto the memory-conscious to maintain > their dtypes. IIRC this is how it worked "back in the day" (the Numeric day? -- and I'm pretty sure that in the long run it worked out badly. the core problem is that there are only python literals for a couple types, and it was oh so easy to do things like: my_arr = np,zeros(shape, dtype-float32) another_array = my_array * 4.0 and you'd suddenly get a float64 array. (of course, we already know all that..) I suppose this has the up side of being safe, and having scalar and array casting rules be the same is of course appealing, but you use a particular size dtype for a reason,and it's a real pain to maintain it. Casual users will use the defaults that match the Python types anyway. So in the in the spirit of "practicality beats purity" -- I"d like accidental upcasting to be hard to do. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From chris.barker at noaa.gov Thu Jan 17 20:05:54 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 17 Jan 2013 17:05:54 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: On Thu, Jan 17, 2013 at 5:04 PM, Chris Barker - NOAA Federal wrote: > So in the in the spirit of "practicality beats purity" -- I"d like > accidental upcasting to be hard to do. and then: arr = arr + scalar would yield the same type as: arr += scalar so we buy some consistency! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From matthew.brett at gmail.com Thu Jan 17 20:18:14 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 18 Jan 2013 01:18:14 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? 
In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: Hi, On Fri, Jan 18, 2013 at 1:04 AM, Chris Barker - NOAA Federal wrote: > On Thu, Jan 17, 2013 at 6:26 AM, Matthew Brett wrote: > >> I am starting to wonder if we should aim for making >> >> * scalar and array casting rules the same; >> * Python int / float scalars become int32 / 64 or float64; > > aren't they already? I'm not sure what you are proposing. Sorry - yes that is what they are already, this sentence refers back to an earlier suggestion of mine on the thread, which I am discarding. >> This has the benefit of being very easy to understand and explain. It >> makes dtypes predictable in the sense they don't depend on value. > > That is key -- I don't think casting should ever depend on value. > >> Those wanting to maintain - say - float32 will need to cast scalars to float32. >> >> Maybe the use-cases motivating the scalar casting rules - maintaining >> float32 precision in particular - can be dealt with by careful casting >> of scalars, throwing the burden onto the memory-conscious to maintain >> their dtypes. > > IIRC this is how it worked "back in the day" (the Numeric day? -- and > I'm pretty sure that in the long run it worked out badly. the core > problem is that there are only python literals for a couple types, and > it was oh so easy to do things like: > > my_arr = np,zeros(shape, dtype-float32) > > another_array = my_array * 4.0 > > and you'd suddenly get a float64 array. (of course, we already know > all that..) I suppose this has the up side of being safe, and having > scalar and array casting rules be the same is of course appealing, but > you use a particular size dtype for a reason,and it's a real pain to > maintain it. Yes, I do understand that. The difference - as I understand it - is that back in the day, numeric did not have the the float32 etc scalars, so you could not do: another_array = my_array * np.float32(4.0) (please someone correct me if I'm wrong). > Casual users will use the defaults that match the Python types anyway. I think what we are reading in this thread is that even experienced numpy users can find the scalar casting rules surprising, and that's a real problem, it seems to me. The person with a massive float32 array certainly should have the ability to control upcasting, but I think the default should be the least surprising thing, and that, it seems to me, is for the casting rules to be the same for arrays and scalars. In the very long term. Cheers, Matthew From shish at keba.be Thu Jan 17 20:19:50 2013 From: shish at keba.be (Olivier Delalleau) Date: Thu, 17 Jan 2013 20:19:50 -0500 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: References: Message-ID: 2013/1/16 : > On Wed, Jan 16, 2013 at 10:43 PM, Patrick Marsh > wrote: >> Thanks, everyone for chiming in. Now that I know this behavior exists, I >> can explicitly prevent it in my code. However, it would be nice if a warning >> or something was generated to alert users about the inconsistency between >> var += ... and var = var + ... > > Since I also got bitten by this recently in my code, I fully agree. > I could live with an exception for lossy down casting in this case. About exceptions: someone mentioned in another thread about casting how having exceptions can make it difficult to write code. I've thought a bit more about this issue and I tend to agree, especially on code that used to "work" (in the sense of doing something -- not necessarily what you'd want -- without complaining). 
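For concreteness, the inconsistency behind this sub-thread in two lines (the exact behaviour of the in-place form depends on the NumPy version, which is precisely the warning-versus-exception question):

import numpy as np

a = np.arange(3)     # integer array

b = a + 0.5          # out of place: promotes, b.dtype is float64
# a += 0.5           # in place: the float result must be squeezed back into a's
#                    # integer dtype -- silently truncated in older releases,
#                    # a casting error in later ones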
Don't get me wrong, when I write code I love when a library crashes and forces me to be more explicit about what I want, thus saving me the trouble of hunting down a tricky overflow / casting bug. However, in a production environment for instance, such an unexpected crash could have much worse consequences than an incorrect output. And although you may blame the programmer for not being careful enough about types, he couldn't expect it might crash the application back when this code was written.... Long story short, +1 for warning, -1 for exception, and +1 for a config flag that allows one to change to exceptions by default, if desired. -=- Olivier From stsci.perry at gmail.com Thu Jan 17 20:24:18 2013 From: stsci.perry at gmail.com (Perry Greenfield) Date: Thu, 17 Jan 2013 20:24:18 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: I'd like to echo what Chris is saying. It was a big annoyance with Numeric to make it so hard to preserve the array type in ordinary expressions. Perry On Jan 17, 2013, at 8:04 PM, Chris Barker - NOAA Federal wrote: > On Thu, Jan 17, 2013 at 6:26 AM, Matthew Brett wrote: > >> I am starting to wonder if we should aim for making >> >> * scalar and array casting rules the same; >> * Python int / float scalars become int32 / 64 or float64; > > aren't they already? I'm not sure what you are proposing. > >> This has the benefit of being very easy to understand and explain. It >> makes dtypes predictable in the sense they don't depend on value. > > That is key -- I don't think casting should ever depend on value. > >> Those wanting to maintain - say - float32 will need to cast scalars to float32. >> >> Maybe the use-cases motivating the scalar casting rules - maintaining >> float32 precision in particular - can be dealt with by careful casting >> of scalars, throwing the burden onto the memory-conscious to maintain >> their dtypes. > > IIRC this is how it worked "back in the day" (the Numeric day? -- and > I'm pretty sure that in the long run it worked out badly. the core > problem is that there are only python literals for a couple types, and > it was oh so easy to do things like: > > my_arr = np,zeros(shape, dtype-float32) > > another_array = my_array * 4.0 > > and you'd suddenly get a float64 array. (of course, we already know > all that..) I suppose this has the up side of being safe, and having > scalar and array casting rules be the same is of course appealing, but > you use a particular size dtype for a reason,and it's a real pain to > maintain it. > > Casual users will use the defaults that match the Python types anyway. > > So in the in the spirit of "practicality beats purity" -- I"d like > accidental upcasting to be hard to do. > > -Chris > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From shish at keba.be Thu Jan 17 20:34:13 2013 From: shish at keba.be (Olivier Delalleau) Date: Thu, 17 Jan 2013 20:34:13 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? 
In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: 2013/1/17 Matthew Brett : > Hi, > > On Fri, Jan 18, 2013 at 1:04 AM, Chris Barker - NOAA Federal > wrote: >> On Thu, Jan 17, 2013 at 6:26 AM, Matthew Brett wrote: >> >>> I am starting to wonder if we should aim for making >>> >>> * scalar and array casting rules the same; >>> * Python int / float scalars become int32 / 64 or float64; >> >> aren't they already? I'm not sure what you are proposing. > > Sorry - yes that is what they are already, this sentence refers back > to an earlier suggestion of mine on the thread, which I am discarding. > >>> This has the benefit of being very easy to understand and explain. It >>> makes dtypes predictable in the sense they don't depend on value. >> >> That is key -- I don't think casting should ever depend on value. >> >>> Those wanting to maintain - say - float32 will need to cast scalars to float32. >>> >>> Maybe the use-cases motivating the scalar casting rules - maintaining >>> float32 precision in particular - can be dealt with by careful casting >>> of scalars, throwing the burden onto the memory-conscious to maintain >>> their dtypes. >> >> IIRC this is how it worked "back in the day" (the Numeric day? -- and >> I'm pretty sure that in the long run it worked out badly. the core >> problem is that there are only python literals for a couple types, and >> it was oh so easy to do things like: >> >> my_arr = np,zeros(shape, dtype-float32) >> >> another_array = my_array * 4.0 >> >> and you'd suddenly get a float64 array. (of course, we already know >> all that..) I suppose this has the up side of being safe, and having >> scalar and array casting rules be the same is of course appealing, but >> you use a particular size dtype for a reason,and it's a real pain to >> maintain it. > > Yes, I do understand that. The difference - as I understand it - is > that back in the day, numeric did not have the the float32 etc > scalars, so you could not do: > > another_array = my_array * np.float32(4.0) > > (please someone correct me if I'm wrong). > >> Casual users will use the defaults that match the Python types anyway. > > I think what we are reading in this thread is that even experienced > numpy users can find the scalar casting rules surprising, and that's a > real problem, it seems to me. > > The person with a massive float32 array certainly should have the > ability to control upcasting, but I think the default should be the > least surprising thing, and that, it seems to me, is for the casting > rules to be the same for arrays and scalars. In the very long term. That would also be my preference, after banging my head against this problem for a while now, because it's simple and consistent. Since most of the related issues seem to come from integer arrays, a middle-ground may be the following: - Integer-type arrays get upcasted by scalars as in usual array / array operations. - Float/Complex-type arrays don't get upcasted by scalars except when the scalar is complex and the array is float. It makes the rule a bit more complex, but has the advantage of better preserving float types while getting rid of most issues related to integer overflows. 
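A tiny sketch of the float32-preservation point (dtype outcomes as discussed in this thread; the explicit scalar cast is the defensive idiom being recommended):

import numpy as np

a = np.zeros(3, dtype=np.float32)

print((a * 4.0).dtype)               # float32: a Python float scalar does not upcast
print((a * np.float32(4.0)).dtype)   # float32: explicit scalar cast, same result, clearer intent
print((a + np.zeros(3)).dtype)       # float64: a float64 *array* does upcast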
-=- Olivier From thouis at gmail.com Thu Jan 17 23:05:14 2013 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Thu, 17 Jan 2013 23:05:14 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> Message-ID: On Jan 17, 2013 8:01 PM, "Olivier Delalleau" wrote: > > 2013/1/17 Matthew Brett : > > Hi, > > > > On Thu, Jan 17, 2013 at 10:27 PM, Mark Wiebe wrote: > >> > >> On Thu, Jan 17, 2013 at 2:10 PM, Benjamin Root wrote: > >>> > >>> > >>> > >>> On Thu, Jan 17, 2013 at 5:04 PM, Eric Firing wrote: > >>>> > >>>> On 2013/01/17 4:13 AM, Pierre Haessig wrote: > >>>> > Hi, > >>>> > > >>>> > Le 14/01/2013 20:05, Benjamin Root a ?crit : > >>>> >> I do like the way you are thinking in terms of the broadcasting > >>>> >> semantics, but I wonder if that is a bit awkward. What I mean is, if > >>>> >> one were to use broadcasting semantics for creating an array, wouldn't > >>>> >> one have just simply used broadcasting anyway? The point of > >>>> >> broadcasting is to _avoid_ the creation of unneeded arrays. But maybe > >>>> >> I can be convinced with some examples. > >>>> > > >>>> > I feel that one of the point of the discussion is : although a new (or > >>>> > not so new...) function to create a filled array would be more elegant > >>>> > than the existing pair of functions "np.zeros" and "np.ones", there are > >>>> > maybe not so many usecases for filled arrays *other than zeros values*. > >>>> > > >>>> > I can remember having initialized a non-zero array *some months ago*. > >>>> > For the anecdote it was a vector of discretized vehicule speed values > >>>> > which I wanted to be initialized with a predefined mean speed value > >>>> > prior to some optimization. In that usecase, I really didn't care about > >>>> > the performance of this initialization step. > >>>> > > >>>> > So my overall feeling after this thread is > >>>> > - *yes* a single dedicated fill/init/someverb function would give a > >>>> > slightly better API, > >>>> > - but *no* it's not important because np.empty and np.zeros covers > >>>> > 95 > >>>> > % usecases ! > >>>> > >>>> I agree with your summary and conclusion. > >>>> > >>>> Eric > >>>> > >>> > >>> Can we at least have a np.nans() and np.infs() functions? This should > >>> cover an additional 4% of use-cases. > >>> > >>> Ben Root > >>> > >>> P.S. - I know they aren't verbs... > >> > >> > >> Would it be too weird or clumsy to extend the empty and empty_like functions > >> to do the filling? > >> > >> np.empty((10, 10), fill=np.nan) > >> np.empty_like(my_arr, fill=np.nan) > > > > That sounds like a good idea to me. Someone wanting a fast way to > > fill an array will probably check out the 'empty' docstring first. > > > > See you, > > > > Matthew > > +1 from me. Even though it *is* weird to have both "empty" and "fill" ;) I'd almost prefer such a keyword be added to np.ones() to avoid that weirdness. (something like "an array of ones where one equals X") Ray -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jan 18 01:23:10 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 17 Jan 2013 22:23:10 -0800 Subject: [Numpy-discussion] Casting Bug or a "Feature"? 
In-Reply-To: References: Message-ID: On Thu, Jan 17, 2013 at 5:19 PM, Olivier Delalleau wrote: > 2013/1/16 : >> On Wed, Jan 16, 2013 at 10:43 PM, Patrick Marsh >> wrote: >> I could live with an exception for lossy down casting in this case. I'm not sure what the idea here is -- would you only get an exception if the value was such that the downcast would be lossy? If so, a major -1 The other option would be to always raise an exception if types would cause a downcast, i.e: arr = np.zeros(shape, dtype-uint8) arr2 = arr + 30 # this would raise an exception arr2 = arr + np.uint8(30) # you'd have to do this That sure would be clear and result if few errors of this type, but sure seems verbose and "static language like" to me. > Long story short, +1 for warning, -1 for exception, and +1 for a > config flag that allows one to change to exceptions by default, if > desired. is this for value-dependent or any casting of this sort? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From scott.sinclair.za at gmail.com Fri Jan 18 01:32:06 2013 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 18 Jan 2013 08:32:06 +0200 Subject: [Numpy-discussion] Fwd: numpy test fails with "Illegal instruction' In-Reply-To: References: Message-ID: On 17 January 2013 16:59, Gerhard Burger wrote: > Solved it, did a backtrace with gdb and the error came somewhere from an old > lapack version that was installed on my machine (I thought I wouldn't have > these issues in a virtualenv). but anyway after I removed it, and installed > numpy again, it ran without problems! Virtualenv only creates an isolated Python install, it doesn't trick the Numpy build process into ignoring system libraries like LAPACK, ATLAS etc. Glad it's fixed. Cheers, Scott From chris.barker at noaa.gov Fri Jan 18 01:32:55 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 17 Jan 2013 22:32:55 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: On Thu, Jan 17, 2013 at 5:34 PM, Olivier Delalleau wrote: >> Yes, I do understand that. The difference - as I understand it - is >> that back in the day, numeric did not have the the float32 etc >> scalars, so you could not do: >> >> another_array = my_array * np.float32(4.0) >> >> (please someone correct me if I'm wrong). correct, it didn't have any scalars, but you could (and had to) still do something like: another_array = my_array * np.array(4.0, dtype=np.float32) a bit more verbose, but the verbosity wasn't the key issue -- it was doing anything special at all. >>> Casual users will use the defaults that match the Python types anyway. >> >> I think what we are reading in this thread is that even experienced >> numpy users can find the scalar casting rules surprising, and that's a >> real problem, it seems to me. for sure -- but it's still relevant -- if you want non-default types, you need to understand the rules an be more careful. >> The person with a massive float32 array certainly should have the >> ability to control upcasting, but I think the default should be the >> least surprising thing, and that, it seems to me, is for the casting >> rules to be the same for arrays and scalars. In the very long term. "A foolish consistency is the hobgoblin of little minds" -- just kidding. 
But in all seriousness -- accidental upcasting really was a big old pain back in the day -- we are not making this up. We re using the term "least surprising", but I now I was often surprised that I had lost my nice compact array. The user will need to think about it no matter how you slice it. > Since most of the related issues seem to come from integer arrays, a > middle-ground may be the following: > - Integer-type arrays get upcasted by scalars as in usual array / > array operations. > - Float/Complex-type arrays don't get upcasted by scalars except when > the scalar is complex and the array is float. I'm not sure that integer arrays are any more of an an issue, and having integer types and float typed behave differently is really asking for trouble! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From g.brandl at gmx.net Fri Jan 18 03:31:23 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 18 Jan 2013 09:31:23 +0100 Subject: [Numpy-discussion] Casting Bug or a "Feature"? In-Reply-To: References: <50F7A4E7.7050608@gmail.com> Message-ID: <50F9085B.3010908@gmx.net> Am 17.01.2013 17:21, schrieb Chris Barker - NOAA Federal: > On Wed, Jan 16, 2013 at 11:34 PM, Matthieu Brucher > >> Of course a += b is not the same as a = a + b. The first one modifies the >> object a, the second one creates a new object and puts it inside a. The >> behavior IS consistent. > > Exactly -- if you ask me, the bug is that Python allows "in_place" > operators for immutable objects -- they should be more than syntactic > sugar. They are not -- the "+=" translation is well defined: the equivalents are a += b a = a.__iadd__(b) Now __iadd__ can choose to return self (for mutable objects) or a new object (for immutable objects). The confusion about immutables is simply the "usual" confusion about "=" assigning names, not variable space. > Of course, the temptation for += on regular numbers was just too much to resist. And probably 95% of the use of +=/-= *is* with regular numbers. Georg -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: OpenPGP digital signature URL: From daniele at grinta.net Fri Jan 18 03:44:03 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Fri, 18 Jan 2013 09:44:03 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> Message-ID: <50F90B53.4030000@grinta.net> On 17/01/2013 23:27, Mark Wiebe wrote: > Would it be too weird or clumsy to extend the empty and empty_like > functions to do the filling? > > np.empty((10, 10), fill=np.nan) > np.empty_like(my_arr, fill=np.nan) Wouldn't it be more natural to extend the ndarray constructor? np.ndarray((10, 10), fill=np.nan) It looks more natural to me. In this way it is not possible to have the _like extension, but I don't see it as a major drawback. Cheers, Daniele From shish at keba.be Fri Jan 18 07:26:57 2013 From: shish at keba.be (Olivier Delalleau) Date: Fri, 18 Jan 2013 07:26:57 -0500 Subject: [Numpy-discussion] Casting Bug or a "Feature"? 
In-Reply-To: References: Message-ID: Le vendredi 18 janvier 2013, Chris Barker - NOAA Federal a ?crit : > On Thu, Jan 17, 2013 at 5:19 PM, Olivier Delalleau > > wrote: > > 2013/1/16 >: > >> On Wed, Jan 16, 2013 at 10:43 PM, Patrick Marsh > >> > wrote: > > >> I could live with an exception for lossy down casting in this case. > > I'm not sure what the idea here is -- would you only get an exception > if the value was such that the downcast would be lossy? If so, a major > -1 > > The other option would be to always raise an exception if types would > cause a downcast, i.e: > > arr = np.zeros(shape, dtype-uint8) > > arr2 = arr + 30 # this would raise an exception > > arr2 = arr + np.uint8(30) # you'd have to do this > > That sure would be clear and result if few errors of this type, but > sure seems verbose and "static language like" to me. > > > Long story short, +1 for warning, -1 for exception, and +1 for a > > config flag that allows one to change to exceptions by default, if > > desired. > > is this for value-dependent or any casting of this sort? What I had in mind here is the situation where the scalar's dtype is fundamentally different from the array's dtype (i.e. float vs int, complex vs float) and can't be cast exactly into the array's dtype (so, value-dependent), which is the situation that originated this thread. I don't mind removing the second part ("and can't be cast exactly...") to have it value-independent. Other tricky situations with integer arrays are to some extent related to how regular (not in-place) additions are handled, something that should probably be settled first. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Fri Jan 18 07:39:01 2013 From: shish at keba.be (Olivier Delalleau) Date: Fri, 18 Jan 2013 07:39:01 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: Le vendredi 18 janvier 2013, Chris Barker - NOAA Federal a ?crit : > On Thu, Jan 17, 2013 at 5:34 PM, Olivier Delalleau > > wrote: > >> Yes, I do understand that. The difference - as I understand it - is > >> that back in the day, numeric did not have the the float32 etc > >> scalars, so you could not do: > >> > >> another_array = my_array * np.float32(4.0) > >> > >> (please someone correct me if I'm wrong). > > correct, it didn't have any scalars, but you could (and had to) still > do something like: > > another_array = my_array * np.array(4.0, dtype=np.float32) > > a bit more verbose, but the verbosity wasn't the key issue -- it was > doing anything special at all. > > >>> Casual users will use the defaults that match the Python types anyway. > >> > >> I think what we are reading in this thread is that even experienced > >> numpy users can find the scalar casting rules surprising, and that's a > >> real problem, it seems to me. > > for sure -- but it's still relevant -- if you want non-default types, > you need to understand the rules an be more careful. > > >> The person with a massive float32 array certainly should have the > >> ability to control upcasting, but I think the default should be the > >> least surprising thing, and that, it seems to me, is for the casting > >> rules to be the same for arrays and scalars. In the very long term. > > "A foolish consistency is the hobgoblin of little minds" > > -- just kidding. 
> > But in all seriousness -- accidental upcasting really was a big old > pain back in the day -- we are not making this up. We re using the > term "least surprising", but I now I was often surprised that I had > lost my nice compact array. > > The user will need to think about it no matter how you slice it. > > > Since most of the related issues seem to come from integer arrays, a > > middle-ground may be the following: > > - Integer-type arrays get upcasted by scalars as in usual array / > > array operations. > > - Float/Complex-type arrays don't get upcasted by scalars except when > > the scalar is complex and the array is float. > > I'm not sure that integer arrays are any more of an an issue, and > having integer types and float typed behave differently is really > asking for trouble! "A foolish consistency is the hobgoblin of little minds" :P If you check again the examples in this thread exhibiting surprising / unexpected behavior, you'll notice most of them are with integers. The tricky thing about integers is that downcasting can dramatically change your result. With floats, not so much: you get approximation errors (usually what you want) and the occasional nan / inf creeping in (usally noticeable). I too would prefer similar rules between ints & floats, but after all these discussions I'm starting to think it may be worth acknowledging they are different beasts. Anyway, in my mind we were discussing what might be the desired behavior in the long term, and my suggestion isn't practical in the short term since it may break a significant amount of code. So I'm still in favor of Nathaniel's proposal, except with exceptions replaced by warnings by default (and no warning for lossy downcasting of e.g. float64 -> float32 except for zero / inf, as discussed at some point in the thread). -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.haessig at crans.org Fri Jan 18 08:48:36 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Fri, 18 Jan 2013 14:48:36 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> Message-ID: <50F952B4.4070208@crans.org> Hi, Le 17/01/2013 23:31, Matthew Brett a ?crit : >> Would it be too weird or clumsy to extend the empty and empty_like functions >> >to do the filling? >> > >> >np.empty((10, 10), fill=np.nan) >> >np.empty_like(my_arr, fill=np.nan) > That sounds like a good idea to me. Someone wanting a fast way to > fill an array will probably check out the 'empty' docstring first. Oh, that sounds very good to me. There is indeed a bit of contradictions between "empty" and "fill" but maybe not that strong if we think of "empty" as a "void of actual information". (Especially true when the fill value is nan or inf, which, as Ben just mentionned are probably the most commonly used fill value after zero.) Maybe a keyword named "value" instead of "fill" may help soften the semantic opposition with "empty" ? 
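For concreteness, the fill= keyword discussed here is only a proposal and does not
exist in any released numpy; a minimal sketch of the two-liner it would be shorthand
for (the helper name `filled` below is purely illustrative):

import numpy as np

def filled(shape, fill_value, dtype=None):
    # what the proposed np.empty(shape, fill=fill_value) would amount to:
    # allocate without initialising, then fill in place
    a = np.empty(shape, dtype=dtype)
    a.fill(fill_value)
    return a

a = filled((10, 10), np.nan)   # stand-in for the proposed np.empty((10, 10), fill=np.nan)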
best, Pierre From ben.root at ou.edu Fri Jan 18 09:19:35 2013 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 18 Jan 2013 09:19:35 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: <50F90B53.4030000@grinta.net> References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> <50F90B53.4030000@grinta.net> Message-ID: On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi wrote: > On 17/01/2013 23:27, Mark Wiebe wrote: > > Would it be too weird or clumsy to extend the empty and empty_like > > functions to do the filling? > > > > np.empty((10, 10), fill=np.nan) > > np.empty_like(my_arr, fill=np.nan) > > Wouldn't it be more natural to extend the ndarray constructor? > > np.ndarray((10, 10), fill=np.nan) > > It looks more natural to me. In this way it is not possible to have the > _like extension, but I don't see it as a major drawback. > > > Cheers, > Daniele > > This isn't a bad idea. Although, I would wager that most people, like myself, use np.array() and np.array_like() instead of np.ndarray(). We should also double-check and see how well that would fit in with the other contructors like masked arrays and matrix objects. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Fri Jan 18 11:36:12 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Fri, 18 Jan 2013 17:36:12 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> <50F90B53.4030000@grinta.net> Message-ID: <50F979FC.6020700@grinta.net> On 18/01/2013 15:19, Benjamin Root wrote: > > > On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi > wrote: > > On 17/01/2013 23:27, Mark Wiebe wrote: > > Would it be too weird or clumsy to extend the empty and empty_like > > functions to do the filling? > > > > np.empty((10, 10), fill=np.nan) > > np.empty_like(my_arr, fill=np.nan) > > Wouldn't it be more natural to extend the ndarray constructor? > > np.ndarray((10, 10), fill=np.nan) > > It looks more natural to me. In this way it is not possible to have the > _like extension, but I don't see it as a major drawback. > > > Cheers, > Daniele > > > This isn't a bad idea. Although, I would wager that most people, like > myself, use np.array() and np.array_like() instead of np.ndarray(). We > should also double-check and see how well that would fit in with the > other contructors like masked arrays and matrix objects. Hello Ben, I don't really get what you mean with this. np.array() construct a numpy array from an array-like object, np.ndarray() accepts a dimensions tuple as first parameter, I don't see any np.array_like in the current numpy release. 
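A quick illustrative aside on the distinction being drawn here, using only
constructors that exist in released numpy (a rough sketch, not part of the
original mail):

import numpy as np

np.array([[1., 2.], [3., 4.]])     # builds an array from data (an array-like)
np.ndarray((10, 10))               # low-level constructor: takes a shape, contents left uninitialised
np.empty((10, 10))                 # the usual spelling for an uninitialised array of a given shape
np.empty_like(np.zeros((10, 10)))  # shape and dtype taken from an existing array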
Cheers, Daniele From ben.root at ou.edu Fri Jan 18 11:46:31 2013 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 18 Jan 2013 11:46:31 -0500 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: <50F979FC.6020700@grinta.net> References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> <50F90B53.4030000@grinta.net> <50F979FC.6020700@grinta.net> Message-ID: On Fri, Jan 18, 2013 at 11:36 AM, Daniele Nicolodi wrote: > On 18/01/2013 15:19, Benjamin Root wrote: > > > > > > On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi > > wrote: > > > > On 17/01/2013 23:27, Mark Wiebe wrote: > > > Would it be too weird or clumsy to extend the empty and empty_like > > > functions to do the filling? > > > > > > np.empty((10, 10), fill=np.nan) > > > np.empty_like(my_arr, fill=np.nan) > > > > Wouldn't it be more natural to extend the ndarray constructor? > > > > np.ndarray((10, 10), fill=np.nan) > > > > It looks more natural to me. In this way it is not possible to have > the > > _like extension, but I don't see it as a major drawback. > > > > > > Cheers, > > Daniele > > > > > > This isn't a bad idea. Although, I would wager that most people, like > > myself, use np.array() and np.array_like() instead of np.ndarray(). We > > should also double-check and see how well that would fit in with the > > other contructors like masked arrays and matrix objects. > > Hello Ben, > > I don't really get what you mean with this. np.array() construct a numpy > array from an array-like object, np.ndarray() accepts a dimensions tuple > as first parameter, I don't see any np.array_like in the current numpy > release. > > Cheers, > Daniele > > My bad, I had a brain-fart and got mixed up. I was thinking of np.empty(). In fact, I never use np.ndarray(), I use np.empty(). Besides np.ndarray() being the actual constructor, what is the difference between them? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Fri Jan 18 11:57:36 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Fri, 18 Jan 2013 17:57:36 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> <50F90B53.4030000@grinta.net> <50F979FC.6020700@grinta.net> Message-ID: <50F97F00.8080802@grinta.net> On 18/01/2013 17:46, Benjamin Root wrote: > > > On Fri, Jan 18, 2013 at 11:36 AM, Daniele Nicolodi > wrote: > > On 18/01/2013 15:19, Benjamin Root wrote: > > > > > > On Fri, Jan 18, 2013 at 3:44 AM, Daniele Nicolodi > > > >> wrote: > > > > On 17/01/2013 23:27, Mark Wiebe wrote: > > > Would it be too weird or clumsy to extend the empty and > empty_like > > > functions to do the filling? > > > > > > np.empty((10, 10), fill=np.nan) > > > np.empty_like(my_arr, fill=np.nan) > > > > Wouldn't it be more natural to extend the ndarray constructor? > > > > np.ndarray((10, 10), fill=np.nan) > > > > It looks more natural to me. In this way it is not possible to > have the > > _like extension, but I don't see it as a major drawback. > > > > > > Cheers, > > Daniele > > > > > > This isn't a bad idea. Although, I would wager that most people, like > > myself, use np.array() and np.array_like() instead of > np.ndarray(). We > > should also double-check and see how well that would fit in with the > > other contructors like masked arrays and matrix objects. 
> > Hello Ben, > > I don't really get what you mean with this. np.array() construct a numpy > array from an array-like object, np.ndarray() accepts a dimensions tuple > as first parameter, I don't see any np.array_like in the current numpy > release. > > Cheers, > Daniele > > > My bad, I had a brain-fart and got mixed up. I was thinking of > np.empty(). In fact, I never use np.ndarray(), I use np.empty(). > Besides np.ndarray() being the actual constructor, what is the > difference between them? I was also wondering what's the difference between np.ndarray() and np.empty(). I thought the second was a wrapper around the first, but it looks like both of them are actually implemented in C... Cheers, Daniele From chris.barker at noaa.gov Fri Jan 18 14:58:50 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 18 Jan 2013 11:58:50 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: On Fri, Jan 18, 2013 at 4:39 AM, Olivier Delalleau wrote: > Le vendredi 18 janvier 2013, Chris Barker - NOAA Federal a ?crit : > If you check again the examples in this thread exhibiting surprising / > unexpected behavior, you'll notice most of them are with integers. > The tricky thing about integers is that downcasting can dramatically change > your result. With floats, not so much: you get approximation errors (usually > what you want) and the occasional nan / inf creeping in (usally noticeable). fair enough. However my core argument is that people use non-standard (usually smaller) dtypes for a reason, and it should be hard to accidentally up-cast. This is in contrast with the argument that accidental down-casting can produce incorrect results, and thus it should be hard to accidentally down-cast -- same argument whether the incorrect results are drastic or not.... It's really a question of which of these we think should be prioritized. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From matthew.brett at gmail.com Fri Jan 18 17:22:36 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 18 Jan 2013 22:22:36 +0000 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: <50F952B4.4070208@crans.org> References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> <50F952B4.4070208@crans.org> Message-ID: Hi, On Fri, Jan 18, 2013 at 1:48 PM, Pierre Haessig wrote: > Hi, > Le 17/01/2013 23:31, Matthew Brett a ?crit : >>> Would it be too weird or clumsy to extend the empty and empty_like functions >>> >to do the filling? >>> > >>> >np.empty((10, 10), fill=np.nan) >>> >np.empty_like(my_arr, fill=np.nan) >> That sounds like a good idea to me. Someone wanting a fast way to >> fill an array will probably check out the 'empty' docstring first. > Oh, that sounds very good to me. There is indeed a bit of contradictions > between "empty" and "fill" but maybe not that strong if we think of > "empty" as a "void of actual information". (Especially true when the > fill value is nan or inf, which, as Ben just mentionned are probably the > most commonly used fill value after zero.) > > Maybe a keyword named "value" instead of "fill" may help soften the > semantic opposition with "empty" ? I personally find 'fill' OK. 
I'd read: a = np.empty((10, 10), fill=np.nan) as "make an empty array shape (10, 10) and fill with nans" Which would indeed be what the code was doing :) So I doubt that the semantic clash would cause any long term problems, Best, Matthew From chris.barker at noaa.gov Fri Jan 18 17:31:14 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 18 Jan 2013 14:31:14 -0800 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> <50F952B4.4070208@crans.org> Message-ID: On Fri, Jan 18, 2013 at 2:22 PM, Matthew Brett wrote: > I personally find 'fill' OK. I'd read: > > a = np.empty((10, 10), fill=np.nan) > > as > > "make an empty array shape (10, 10) and fill with nans" +1 simple, does the job, and doesn't bloat the API. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ralf.gommers at gmail.com Fri Jan 18 17:35:04 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 18 Jan 2013 23:35:04 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> <50F952B4.4070208@crans.org> Message-ID: On Fri, Jan 18, 2013 at 11:31 PM, Chris Barker - NOAA Federal < chris.barker at noaa.gov> wrote: > On Fri, Jan 18, 2013 at 2:22 PM, Matthew Brett > wrote: > > > I personally find 'fill' OK. I'd read: > > > > a = np.empty((10, 10), fill=np.nan) > > > > as > > > > "make an empty array shape (10, 10) and fill with nans" > > +1 > > simple, does the job, and doesn't bloat the API. > +1 from me too. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jan 18 17:35:59 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 18 Jan 2013 22:35:59 +0000 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: Hi, On Fri, Jan 18, 2013 at 7:58 PM, Chris Barker - NOAA Federal wrote: > On Fri, Jan 18, 2013 at 4:39 AM, Olivier Delalleau wrote: >> Le vendredi 18 janvier 2013, Chris Barker - NOAA Federal a ?crit : > >> If you check again the examples in this thread exhibiting surprising / >> unexpected behavior, you'll notice most of them are with integers. >> The tricky thing about integers is that downcasting can dramatically change >> your result. With floats, not so much: you get approximation errors (usually >> what you want) and the occasional nan / inf creeping in (usally noticeable). > > fair enough. > > However my core argument is that people use non-standard (usually > smaller) dtypes for a reason, and it should be hard to accidentally > up-cast. > > This is in contrast with the argument that accidental down-casting can > produce incorrect results, and thus it should be hard to accidentally > down-cast -- same argument whether the incorrect results are drastic > or not.... > > It's really a question of which of these we think should be prioritized. After thinking about it for a while, it seems to me Olivier's suggestion is a good one. 
The rule becomes the following: array + scalar casting is the same as array + array casting except array + scalar casting does not upcast floating point precision of the array. Am I right (Chris, Perry?) that this deals with almost all your cases? Meaning that it is upcasting of floats that is the main problem, not upcasting of (u)ints? This rule seems to me not very far from the current 1.6 behavior; it upcasts more - but the dtype is now predictable. It's easy to explain. It avoids the obvious errors that the 1.6 rules were trying to avoid. It doesn't seem too far to stretch to make a distinction between rules about range (ints) and rules about precision (float, complex). What do you'all think? Best, Matthew From ralf.gommers at gmail.com Fri Jan 18 19:08:22 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 19 Jan 2013 01:08:22 +0100 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: References: Message-ID: On Thu, Jan 17, 2013 at 11:13 PM, Thouis (Ray) Jones wrote: > On Thu, Jan 17, 2013 at 10:27 AM, Charles R Harris > wrote: > > > > > > On Wed, Jan 16, 2013 at 5:11 PM, eat wrote: > >> > >> Hi, > >> > >> In a recent thread > >> http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was > >> proposed that .fill(.) should return self as an alternative for a > trivial > >> two-liner. > >> > >> I'm raising now the question: what if all in-place operations indeed > could > >> return self? How bad this would be? A 'strong' counter argument may be > found > >> at http://mail.python.org/pipermail/python-dev/2003-October/038855.html > . > >> > >> But anyway, at least for me. it would be much more straightforward to > >> implement simple mini dsl's > >> (http://en.wikipedia.org/wiki/Domain-specific_language) a much more > >> straightforward manner. > >> > >> What do you think? > >> > > > > I've read Guido about why he didn't like inplace operations returning > self > > and found him convincing for a while. And then I listened to other folks > > express a preference for the freight train style and found them > convincing > > also. I think it comes down to a preference for one style over another > and I > > go back and forth myself. If I had to vote, I'd go for returning self, > but > > I'm not sure it's worth breaking python conventions to do so. > > > > Chuck > > I'm -1 on breaking with Python convention without very good reasons. Three times -1: on breaking Python conventions, on changing any existing numpy functions/methods for something like this, and on having similarly named functions like shuffle/shuffled that basically do the same thing. +1 on using out= more, and on some general guideline on function-naming-grammar. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sat Jan 19 02:28:52 2013 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 18 Jan 2013 23:28:52 -0800 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> <50F952B4.4070208@crans.org> Message-ID: On Fri, Jan 18, 2013 at 2:22 PM, Matthew Brett wrote: > I personally find 'fill' OK. 
I'd read: > > a = np.empty((10, 10), fill=np.nan) > > as > > "make an empty array shape (10, 10) and fill with nans" > > Which would indeed be what the code was doing :) So I doubt that the > semantic clash would cause any long term problems, +1, practicality beats purity... From e.antero.tammi at gmail.com Sat Jan 19 06:35:27 2013 From: e.antero.tammi at gmail.com (eat) Date: Sat, 19 Jan 2013 13:35:27 +0200 Subject: [Numpy-discussion] Shouldn't all in-place operations simply return self? In-Reply-To: References: Message-ID: Hi, On Fri, Jan 18, 2013 at 12:13 AM, Thouis (Ray) Jones wrote: > On Thu, Jan 17, 2013 at 10:27 AM, Charles R Harris > wrote: > > > > > > On Wed, Jan 16, 2013 at 5:11 PM, eat wrote: > >> > >> Hi, > >> > >> In a recent thread > >> http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was > >> proposed that .fill(.) should return self as an alternative for a > trivial > >> two-liner. > >> > >> I'm raising now the question: what if all in-place operations indeed > could > >> return self? How bad this would be? A 'strong' counter argument may be > found > >> at http://mail.python.org/pipermail/python-dev/2003-October/038855.html > . > >> > >> But anyway, at least for me. it would be much more straightforward to > >> implement simple mini dsl's > >> (http://en.wikipedia.org/wiki/Domain-specific_language) a much more > >> straightforward manner. > >> > >> What do you think? > >> > > > > I've read Guido about why he didn't like inplace operations returning > self > > and found him convincing for a while. And then I listened to other folks > > express a preference for the freight train style and found them > convincing > > also. I think it comes down to a preference for one style over another > and I > > go back and forth myself. If I had to vote, I'd go for returning self, > but > > I'm not sure it's worth breaking python conventions to do so. > > > > Chuck > > I'm -1 on breaking with Python convention without very good reasons. > As an example I personally find following behavior highly counter intuitive. In []: p, P= rand(3, 1), rand(3, 5) In []: ((p- P)** 2).sum(0).argsort() Out[]: array([2, 4, 1, 3, 0]) In []: ((p- P)** 2).sum(0).sort().diff() ------------------------------------------------------------ Traceback (most recent call last): File "", line 1, in AttributeError: 'NoneType' object has no attribute 'diff' Regards, -eat > > Ray > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Sun Jan 20 21:10:30 2013 From: shish at keba.be (Olivier Delalleau) Date: Sun, 20 Jan 2013 21:10:30 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: 2013/1/18 Matthew Brett : > Hi, > > On Fri, Jan 18, 2013 at 7:58 PM, Chris Barker - NOAA Federal > wrote: >> On Fri, Jan 18, 2013 at 4:39 AM, Olivier Delalleau wrote: >>> Le vendredi 18 janvier 2013, Chris Barker - NOAA Federal a ?crit : >> >>> If you check again the examples in this thread exhibiting surprising / >>> unexpected behavior, you'll notice most of them are with integers. >>> The tricky thing about integers is that downcasting can dramatically change >>> your result. 
With floats, not so much: you get approximation errors (usually >>> what you want) and the occasional nan / inf creeping in (usally noticeable). >> >> fair enough. >> >> However my core argument is that people use non-standard (usually >> smaller) dtypes for a reason, and it should be hard to accidentally >> up-cast. >> >> This is in contrast with the argument that accidental down-casting can >> produce incorrect results, and thus it should be hard to accidentally >> down-cast -- same argument whether the incorrect results are drastic >> or not.... >> >> It's really a question of which of these we think should be prioritized. > > After thinking about it for a while, it seems to me Olivier's > suggestion is a good one. > > The rule becomes the following: > > array + scalar casting is the same as array + array casting except > array + scalar casting does not upcast floating point precision of the > array. > > Am I right (Chris, Perry?) that this deals with almost all your cases? > Meaning that it is upcasting of floats that is the main problem, not > upcasting of (u)ints? > > This rule seems to me not very far from the current 1.6 behavior; it > upcasts more - but the dtype is now predictable. It's easy to > explain. It avoids the obvious errors that the 1.6 rules were trying > to avoid. It doesn't seem too far to stretch to make a distinction > between rules about range (ints) and rules about precision (float, > complex). > > What do you'all think? Personally, I think the main issue with my suggestion is that it seems hard to go there from the current behavior -- without potentially breaking existing code in non-obvious ways. The main problematic case I foresee is the typical "small_int_array + 1", which would get upcasted while it wasn't the case before (neither in 1.5 nor in 1.6). That's why I think Nathaniel's proposal is more practical. -=- Olivier From pierre.haessig at crans.org Mon Jan 21 03:07:09 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 21 Jan 2013 09:07:09 +0100 Subject: [Numpy-discussion] New numpy functions: filled, filled_like In-Reply-To: References: <50F4400F.4040709@hawaii.edu> <50F44A95.2030202@crans.org> <50F80711.9010204@crans.org> <50F8757B.7060008@hawaii.edu> <50F952B4.4070208@crans.org> Message-ID: <50FCF72D.6040903@crans.org> Le 18/01/2013 23:22, Matthew Brett a ?crit : > I personally find 'fill' OK. I'd read: > > a = np.empty((10, 10), fill=np.nan) > > as > > "make an empty array shape (10, 10) and fill with nans" +1 (and now we have *two* verbs ! ) -- Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From ndbecker2 at gmail.com Mon Jan 21 08:41:52 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 21 Jan 2013 08:41:52 -0500 Subject: [Numpy-discussion] another little index puzzle Message-ID: I have an array to be used for indexing. It is 2d, where the rows are all the permutations of some numbers. So: array([[-2, -2, -2], [-2, -2, -1], [-2, -2, 0], [-2, -2, 1], [-2, -2, 2], ... [ 2, 1, 2], [ 2, 2, -2], [ 2, 2, -1], [ 2, 2, 0], [ 2, 2, 1], [ 2, 2, 2]]) Here the array is 125x3 I want to select all the rows of the array in which all the 3 elements are equal, so I can remove them. So for example, the 1st and last row. 
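One minimal sketch of a way to do this (the replies that follow give other,
equivalent approaches); the 125x3 array is rebuilt here with a list comprehension
as a stand-in for however the real permutation array was generated:

import numpy as np

vals = np.arange(-2, 3)
arr = np.array([(i, j, k) for i in vals for j in vals for k in vals])  # 125 x 3

# a row has all of its elements equal exactly when every column matches the first
all_equal = (arr == arr[:, :1]).all(axis=1)
trimmed = arr[~all_equal]    # drops rows like [-2, -2, -2] and [2, 2, 2]
print(trimmed.shape)         # (120, 3): the 5 all-equal rows are gone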
From robert.kern at gmail.com Mon Jan 21 08:50:53 2013 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 21 Jan 2013 14:50:53 +0100 Subject: [Numpy-discussion] another little index puzzle In-Reply-To: References: Message-ID: On Mon, Jan 21, 2013 at 2:41 PM, Neal Becker wrote: > I have an array to be used for indexing. It is 2d, where the rows are all the > permutations of some numbers. So: > > array([[-2, -2, -2], > [-2, -2, -1], > [-2, -2, 0], > [-2, -2, 1], > [-2, -2, 2], > ... > [ 2, 1, 2], > [ 2, 2, -2], > [ 2, 2, -1], > [ 2, 2, 0], > [ 2, 2, 1], > [ 2, 2, 2]]) > > Here the array is 125x3 > > I want to select all the rows of the array in which all the 3 elements are > equal, so I can remove them. So for example, the 1st and last row. all_equal_mask = np.logical_and.reduce(arr[:,1:] == arr[:,:-1], axis=1) some_unequal = arr[~all_equal_mask] -- Robert Kern From heng at cantab.net Mon Jan 21 09:02:14 2013 From: heng at cantab.net (Henry Gomersall) Date: Mon, 21 Jan 2013 14:02:14 +0000 Subject: [Numpy-discussion] another little index puzzle In-Reply-To: References: Message-ID: <1358776934.25855.49.camel@farnsworth> On Mon, 2013-01-21 at 08:41 -0500, Neal Becker wrote: > I have an array to be used for indexing. It is 2d, where the rows are > all the > permutations of some numbers. So: > > array([[-2, -2, -2], > [-2, -2, -1], > [-2, -2, 0], > [-2, -2, 1], > [-2, -2, 2], > ... > [ 2, 1, 2], > [ 2, 2, -2], > [ 2, 2, -1], > [ 2, 2, 0], > [ 2, 2, 1], > [ 2, 2, 2]]) > > Here the array is 125x3 > > I want to select all the rows of the array in which all the 3 elements > are > equal, so I can remove them. So for example, the 1st and last row. You can use a convolution to pick out the changes... conv_arr = numpy.array([[1, -1, 0], [0, 1, -1]]) equal_selector = ~numpy.any(numpy.dot(b, numpy.transpose(a)), 0) or unequal_selector = numpy.any(numpy.dot(b, numpy.transpose(a)), 0) hen From matthew.brett at gmail.com Mon Jan 21 17:46:55 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 21 Jan 2013 14:46:55 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: <50EDB1D4.5090909@astro.uio.no> Message-ID: Hi, On Sun, Jan 20, 2013 at 6:10 PM, Olivier Delalleau wrote: > 2013/1/18 Matthew Brett : >> Hi, >> >> On Fri, Jan 18, 2013 at 7:58 PM, Chris Barker - NOAA Federal >> wrote: >>> On Fri, Jan 18, 2013 at 4:39 AM, Olivier Delalleau wrote: >>>> Le vendredi 18 janvier 2013, Chris Barker - NOAA Federal a ?crit : >>> >>>> If you check again the examples in this thread exhibiting surprising / >>>> unexpected behavior, you'll notice most of them are with integers. >>>> The tricky thing about integers is that downcasting can dramatically change >>>> your result. With floats, not so much: you get approximation errors (usually >>>> what you want) and the occasional nan / inf creeping in (usally noticeable). >>> >>> fair enough. >>> >>> However my core argument is that people use non-standard (usually >>> smaller) dtypes for a reason, and it should be hard to accidentally >>> up-cast. >>> >>> This is in contrast with the argument that accidental down-casting can >>> produce incorrect results, and thus it should be hard to accidentally >>> down-cast -- same argument whether the incorrect results are drastic >>> or not.... >>> >>> It's really a question of which of these we think should be prioritized. >> >> After thinking about it for a while, it seems to me Olivier's >> suggestion is a good one. 
>> >> The rule becomes the following: >> >> array + scalar casting is the same as array + array casting except >> array + scalar casting does not upcast floating point precision of the >> array. >> >> Am I right (Chris, Perry?) that this deals with almost all your cases? >> Meaning that it is upcasting of floats that is the main problem, not >> upcasting of (u)ints? >> >> This rule seems to me not very far from the current 1.6 behavior; it >> upcasts more - but the dtype is now predictable. It's easy to >> explain. It avoids the obvious errors that the 1.6 rules were trying >> to avoid. It doesn't seem too far to stretch to make a distinction >> between rules about range (ints) and rules about precision (float, >> complex). >> >> What do you'all think? > > Personally, I think the main issue with my suggestion is that it seems > hard to go there from the current behavior -- without potentially > breaking existing code in non-obvious ways. The main problematic case > I foresee is the typical "small_int_array + 1", which would get > upcasted while it wasn't the case before (neither in 1.5 nor in 1.6). > That's why I think Nathaniel's proposal is more practical. It's important to establish the behavior we want in the long term, because it will likely affect the stop-gap solution we choose now. For example, let's say we think that the 1.5 behavior is desired in the long term - in that case Nathaniel's solution seems good (although it will change behavior from 1.6.x) If we think that your suggestion is preferable for the long term, sticking with 1.6. behavior is more attractive. It seems to me we need the use-cases laid out properly in order to decide, at the moment we are working somewhat blind, at least in my opinion. Cheers, Matthew From amueller at ais.uni-bonn.de Mon Jan 21 18:02:24 2013 From: amueller at ais.uni-bonn.de (Andreas Mueller) Date: Tue, 22 Jan 2013 00:02:24 +0100 Subject: [Numpy-discussion] ANN: scikit-learn 0.13 released! Message-ID: <50FDC900.9000600@ais.uni-bonn.de> Hi all. I am very happy to announce the release of scikit-learn 0.13. New features in this release include feature hashing for text processing, passive-agressive classifiers, faster random forests and many more. There have also been countless improvements in stability, consistency and usability. Details can be found on the what's new page. Sources and windows binaries are available on sourceforge, through pypi (http://pypi.python.org/pypi/scikit-learn/0.13) or can be installed directly using pip: pip install -U scikit-learn A big "thank you" to all the contributors who made this release possible! In parallel to the release, we started a small survey to get to know our user base a bit more. If you are using scikit-learn, it would be great if you could give us your input. Best, Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Mon Jan 21 18:19:58 2013 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Tue, 22 Jan 2013 00:19:58 +0100 Subject: [Numpy-discussion] ANN: scikit-learn 0.13 released! In-Reply-To: <50FDC900.9000600@ais.uni-bonn.de> References: <50FDC900.9000600@ais.uni-bonn.de> Message-ID: Congrats and thanks to Andreas and everyone involved in the release, the website fixes and the online survey setup. 
I posted Andreas blog post on HN and reddit: - http://news.ycombinator.com/item?id=5094319 - http://www.reddit.com/r/programming/comments/170oty/scikitlearn_013_is_out_machine_learning_in_python/ We might get some user feedback in the comments there as well. From toddrjen at gmail.com Tue Jan 22 04:21:03 2013 From: toddrjen at gmail.com (Todd) Date: Tue, 22 Jan 2013 10:21:03 +0100 Subject: [Numpy-discussion] Subclassing ndarray with concatenate Message-ID: I am trying to create a subclass of ndarray that has additional attributes. These attributes are maintained with most numpy functions if __array_finalize__ is used. The main exception I have found is concatenate (and hstack/vstack, which just wrap concatenate). In this case, __array_finalize__ is passed an array that has already been stripped of the additional attributes, and I don't see a way to recover this information. In my particular case at least, there are clear ways to handle corner cases (like being passed a class that lacks these attributes), so in principle there no problem handling concatenate in a general way, assuming I can get access to the attributes. So is there any way to subclass ndarray in such a way that concatenate can be handled properly? I have been looking extensively online, but have not been able to find a clear answer on how to do this, or if there even is a way. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Jan 22 07:44:33 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 22 Jan 2013 13:44:33 +0100 Subject: [Numpy-discussion] Subclassing ndarray with concatenate In-Reply-To: References: Message-ID: <1358858673.24631.20.camel@sebastian-laptop> Hey, On Tue, 2013-01-22 at 10:21 +0100, Todd wrote: > I am trying to create a subclass of ndarray that has additional > attributes. These attributes are maintained with most numpy functions > if __array_finalize__ is used. > You can cover a bit more if you also implement `__array_wrap__`, though unless you want to do something fancy, that just replaces the `__array_finalize__` for the most part. But some (very few) functions currently call `__array_wrap__` explicitly. > The main exception I have found is concatenate (and hstack/vstack, > which just wrap concatenate). In this case, __array_finalize__ is > passed an array that has already been stripped of the additional > attributes, and I don't see a way to recover this information. > There are quite a few functions that simply do not preserve subclasses (though I think more could/should call `__array_wrap__` probably, even if the documentation may say that it is about ufuncs, there are some example of this already). `np.concatenate` is one of these. It always returns a base array. In any case it gets a bit difficult if you have multiple input arrays (which may not matter for you). > In my particular case at least, there are clear ways to handle corner > cases (like being passed a class that lacks these attributes), so in > principle there no problem handling concatenate in a general way, > assuming I can get access to the attributes. > > > So is there any way to subclass ndarray in such a way that concatenate > can be handled properly? > Quite simply, no. If you compare masked arrays, they also provide their own concatenate for this reason. I hope that helps a bit... Regards, Sebastian > I have been looking extensively online, but have not been able to find > a clear answer on how to do this, or if there even is a way. 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Tue Jan 22 10:56:05 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 22 Jan 2013 16:56:05 +0100 Subject: [Numpy-discussion] Subclassing ndarray with concatenate In-Reply-To: <1358858673.24631.20.camel@sebastian-laptop> References: <1358858673.24631.20.camel@sebastian-laptop> Message-ID: <1358870165.1679.1.camel@sebastian-laptop> On Tue, 2013-01-22 at 13:44 +0100, Sebastian Berg wrote: > Hey, > > On Tue, 2013-01-22 at 10:21 +0100, Todd wrote: > > I am trying to create a subclass of ndarray that has additional > > attributes. These attributes are maintained with most numpy functions > > if __array_finalize__ is used. > > > You can cover a bit more if you also implement `__array_wrap__`, though > unless you want to do something fancy, that just replaces the > `__array_finalize__` for the most part. But some (very few) functions > currently call `__array_wrap__` explicitly. > Actually have to correct myself here. The default __array_wrap__ causes __array_finalize__ to be called as you would expect, so there is no need to use it unless you want to do something fancy. > > The main exception I have found is concatenate (and hstack/vstack, > > which just wrap concatenate). In this case, __array_finalize__ is > > passed an array that has already been stripped of the additional > > attributes, and I don't see a way to recover this information. > > > There are quite a few functions that simply do not preserve subclasses > (though I think more could/should call `__array_wrap__` probably, even > if the documentation may say that it is about ufuncs, there are some > example of this already). > `np.concatenate` is one of these. It always returns a base array. In any > case it gets a bit difficult if you have multiple input arrays (which > may not matter for you). > > > In my particular case at least, there are clear ways to handle corner > > cases (like being passed a class that lacks these attributes), so in > > principle there no problem handling concatenate in a general way, > > assuming I can get access to the attributes. > > > > > > So is there any way to subclass ndarray in such a way that concatenate > > can be handled properly? > > > Quite simply, no. If you compare masked arrays, they also provide their > own concatenate for this reason. > > I hope that helps a bit... > > Regards, > > Sebastian > > > I have been looking extensively online, but have not been able to find > > a clear answer on how to do this, or if there even is a way. > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From wesmckinn at gmail.com Tue Jan 22 11:32:03 2013 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 22 Jan 2013 11:32:03 -0500 Subject: [Numpy-discussion] ANN: pandas 0.10.1 is released Message-ID: hi all, We've released pandas 0.10.1 which includes many bug fixes from 0.10.0 (including a number of issues with the new file parser, e.g. 
reading multiple files in separate threads), various performance improvements, and major new PyTables/HDF5-based functionality contributed by Jeff Reback. I strongly recommend that all users upgrade. Thanks to all who contributed to this release, especially Chang She, Jeff Reback, and Yoval P. As always source archives and Windows installers are on PyPI. What's new: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html Installers: http://pypi.python.org/pypi/pandas $ git log v0.10.0..v0.10.1 --pretty=format:%aN | sort | uniq -c | sort -rn 66 jreback 59 Wes McKinney 43 Chang She 12 y-p 5 Vincent Arel-Bundock 4 Damien Garaud 3 Christopher Whelan 3 Andy Hayden 2 Jay Parlar 2 Dan Allan 1 Thouis (Ray) Jones 1 svaksha 1 herrfz 1 Garrett Drapala 1 elpres 1 Dieter Vandenbussche 1 Anton I. Sipos Happy data hacking! - Wes What is it ========== pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with relational, time series, or any other kind of labeled data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Links ===== Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst Documentation: http://pandas.pydata.org Installers: http://pypi.python.org/pypi/pandas Code Repository: http://github.com/pydata/pandas Mailing List: http://groups.google.com/group/pydata From jrocher at enthought.com Wed Jan 23 15:16:06 2013 From: jrocher at enthought.com (Jonathan Rocher) Date: Wed, 23 Jan 2013 14:16:06 -0600 Subject: [Numpy-discussion] [SCIPY2013] Feedback on mini-symposia themes In-Reply-To: References: Message-ID: Dear community members, [Sorry for the cross-post] We are making progress and building an awesome organization team for the SciPy2013 conference (Scientific Computing with Python) this June 24th-29th in Austin, TX. More on that later. Following my previous email, we have gotten lots of good answers to our survey about the themes the community would like to see at the for the mini-symposia *[1]*. We will leave *this survey open until Feb 7th*. So if you haven't done so, and would like to discuss scientific python tools with peers from the same industry/field, take a second to voice your opinion: http://www.surveygizmo.com/s3/1114631/SciPy-2013-Themes Thanks, The SciPy2013 organizers *[1] These mini-symposia are held to discuss scientific computing applied to a specific scientific domain/industry during a half afternoon after the general conference. Their goal is to promote industry specific libraries and tools, and gather people with similar interests for discussions. For example, the SciPy2012 edition successfully hosted 4 mini-symposia on Astronomy/Astrophysics, Bio-informatics, Meteorology, and Geophysics.* * * On Wed, Jan 9, 2013 at 4:32 PM, Jonathan Rocher wrote: > Dear community members, > > We are working hard to organize the SciPy2013 conference (Scientific > Computing with Python) , > this June 24th-29th in Austin, TX. We would like to probe the community > about the themes you would be interested in contributing to or > participating in for the mini-symposia at SciPy2013. > > These mini-symposia are held to discuss scientific computing applied to a > specific *scientific domain/industry* during a half afternoon after the > general conference. Their goal is to promote industry specific libraries > and tools, and gather people with similar interests for discussions. 
For > example, the SciPy2012 edition > successfully hosted 4 mini-symposia on Astronomy/Astrophysics, > Bio-informatics, Meteorology, and Geophysics. > > Please join us and voice your opinion to shape the next SciPy conference > at: > > http://www.surveygizmo.com/s3/1114631/SciPy-2013-Themes > > Thanks, > > The Scipy2013 organizers > > -- > Jonathan Rocher, PhD > Scientific software developer > Enthought, Inc. > jrocher at enthought.com > 1-512-536-1057 > http://www.enthought.com > > -- Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From raul at virtualmaterials.com Sat Jan 26 12:56:34 2013 From: raul at virtualmaterials.com (Raul Cota) Date: Sat, 26 Jan 2013 10:56:34 -0700 Subject: [Numpy-discussion] NumPy int32 array to Excel through COM server is failing Message-ID: <510418D2.3010706@virtualmaterials.com> Hello, We came across a problem trying to get an array across COM when wrapped as a COM server using the win32com extension. What caught our attention is that arrays of type float64 work fine but do not work for any other array. Does anyone know if there is something we could do at the NumPy level to make it work ? Arrays are mapped onto Variant/SafeArray and the conversion to a Variant seems to be failing for anything that is not a float64. It is not a huge deal for us to workaround the problem but it is kind of ugly and I just wanted to make sure there was not something simple that could be done (particularly if something like this would be considered a bug). I include working sample code below that reproduces the problem where Excel instantiates a com server and requests an array of size and dtype. On the Python side, the code sets up the com server and exposes the function that just returns an array of ones. Code in Excel to get an array: ======================================== Public Sub NumPyVariantTest() Dim npcom As Object Dim arr '... instantiate com object Set npcom = CreateObject("NPTest.COM") size = 7 '... test float64. Works ! arr = npcom.GetNPArray(size, "float64") '... test int32. Fails ! 
arr = npcom.GetNPArray(size, "int32") End Sub ======================================== Code in Python to set up com server and expose the GetNPArray function ======================================== import win32com.server.util import win32com.client from pythoncom import CLSCTX_LOCAL_SERVER, CLSCTX_INPROC import sys, os import numpy from numpy import zeros, ones, array class NPTestCOM(object): """COM accessible version of CommandInterface""" _reg_clsid_ = "{A0E551F5-2F22-4FB4-B28E-FF1B6809D21C}" _reg_desc_ = "NumPy COM Test" _reg_progid_ = "NPTest.COM" _reg_clsctx_ = CLSCTX_INPROC _public_methods_ = ['GetNPArray'] _public_attrs_ = [] _readonly_attrs_ = [] def GetNPArray(self, size, dtype): """ Return an arbitrary NumPy array of type dtype to check conversion to Variant""" return ones(size, dtype=dtype) if __name__ == '__main__': import win32com.server.register import _winreg dllkey = 'nptestdll' if len(sys.argv) > 1 and sys.argv[1] == 'unregister': win32com.server.register.UnregisterClasses(NPTestCOM) software_key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, 'SOFTWARE') vmg_key = _winreg.OpenKey(software_key, 'VMG') _winreg.DeleteKey(vmg_key, dllkey) _winreg.CloseKey(vmg_key) _winreg.CloseKey(software_key) else: win32com.server.register.UseCommandLine(NPTestCOM) software_key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, 'SOFTWARE') vmg_key = _winreg.CreateKey(software_key, 'VMG') _winreg.SetValue(vmg_key, dllkey, _winreg.REG_SZ, os.path.abspath(os.curdir)) _winreg.CloseKey(vmg_key) _winreg.CloseKey(software_key) ======================================== Regards, Raul -- Raul Cota (P.Eng., Ph.D. Chemical Engineering) Research & Development Manager Phone: (403) 457 4598 Fax: (403) 457 4637 Virtual Materials Group - Canada www.virtualmaterials.com From olli.wallin at elisanet.fi Sun Jan 27 14:40:49 2013 From: olli.wallin at elisanet.fi (olli.wallin at elisanet.fi) Date: Sun, 27 Jan 2013 21:40:49 +0200 (EET) Subject: [Numpy-discussion] Installing numpy-mkl binary on top of Python(x, y) Message-ID: <26797036.26795851359315649926.JavaMail.olli.wallin@elisanet.fi> Hi, if I want to have a painless Python installation build against Intel MKL on Windows, one obvious choice is to just buy the EPD package. However, as I already do have a C++ licence of the MKL library I was wondering if I could just install the Python(x,y) -distribution and then take one of the NumPy-MKL binaries provided by Christoph Gohlke. Is it simple as that? Any downsides, will SciPy work as well? On the plus side, I would get Spyder2 without hassle and it looks nice to a former Matlab user. I apologize for such a simple question, I would have tried it myself but this is for my work where only IT support has the admin rights and I have mac at home. I want it to be as clearcut for them as possible so I get things up and running. I did try to search the internet and the list but did not find a conclusive answer. Many thanks in advance for any help. 
All the best, Olli -- From cgohlke at uci.edu Sun Jan 27 14:54:44 2013 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sun, 27 Jan 2013 11:54:44 -0800 Subject: [Numpy-discussion] Installing numpy-mkl binary on top of Python(x, y) In-Reply-To: <26797036.26795851359315649926.JavaMail.olli.wallin@elisanet.fi> References: <26797036.26795851359315649926.JavaMail.olli.wallin@elisanet.fi> Message-ID: <51058604.3000906@uci.edu> On 1/27/2013 11:40 AM, olli.wallin at elisanet.fi wrote: > Hi, > > if I want to have a painless Python installation build against Intel MKL on Windows, one obvious choice is to just buy the EPD package. However, > as I already do have a C++ licence of the MKL library I was wondering if I could just install the Python(x,y) -distribution and then take one of the NumPy-MKL binaries provided > by Christoph Gohlke. Is it simple as that? Any downsides, will SciPy work as well? On the plus side, I would get Spyder2 without hassle and it looks nice to a former Matlab user. > > I apologize for such a simple question, I would have tried it myself but this is for my work where only IT support has the admin rights and I have mac at home. I want it to be as > clearcut for them as possible so I get things up and running. I did try to search the internet and the list but did not find a conclusive answer. > > Many thanks in advance for any help. > > All the best, > > Olli > Try WinPython . It repackages numpy-MKL and other packages from , contains Spyder and all dependencies, is available as 64 bit, and does not require admin rights to install. Christoph From josef.pktd at gmail.com Sun Jan 27 16:34:16 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 27 Jan 2013 16:34:16 -0500 Subject: [Numpy-discussion] Installing numpy-mkl binary on top of Python(x, y) In-Reply-To: <51058604.3000906@uci.edu> References: <26797036.26795851359315649926.JavaMail.olli.wallin@elisanet.fi> <51058604.3000906@uci.edu> Message-ID: On Sun, Jan 27, 2013 at 2:54 PM, Christoph Gohlke wrote: > On 1/27/2013 11:40 AM, olli.wallin at elisanet.fi wrote: >> Hi, >> >> if I want to have a painless Python installation build against Intel MKL on Windows, one obvious choice is to just buy the EPD package. However, >> as I already do have a C++ licence of the MKL library I was wondering if I could just install the Python(x,y) -distribution and then take one of the NumPy-MKL binaries provided >> by Christoph Gohlke. Is it simple as that? Any downsides, will SciPy work as well? On the plus side, I would get Spyder2 without hassle and it looks nice to a former Matlab user. >> >> I apologize for such a simple question, I would have tried it myself but this is for my work where only IT support has the admin rights and I have mac at home. I want it to be as >> clearcut for them as possible so I get things up and running. I did try to search the internet and the list but did not find a conclusive answer. >> >> Many thanks in advance for any help. >> >> All the best, >> >> Olli >> > > Try WinPython . It repackages > numpy-MKL and other packages from > , contains Spyder and all > dependencies, is available as 64 bit, and does not require admin rights > to install. You can replace python xy installed packages but it's necessary to watch out for dependencies. If you replace numpy with the mkl version, then you also have to replace scipy with the mkl version, as far as I understand. I initially installed python xy on a new computer and updated many packages since, using standard python not the python xy updates. 
The only problem I have is that I have some incompatibilities between QT, pyQT, pyside, spyder and the ipython qt console, the later doesn't work in my current setup. Josef > > Christoph > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From zdoor at xs4all.nl Mon Jan 28 10:06:59 2013 From: zdoor at xs4all.nl (Alex) Date: Mon, 28 Jan 2013 15:06:59 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Merging_structured_arrays_with_mixed?= =?utf-8?q?_dtypes_including_=27=7CO4=27?= Message-ID: Let's say I have two structured arrays with dtypes as per below >>> getdat.dtype dtype([('Tstamp', '|O4'), ('Vf', '>> out.dtype dtype([('Viscosity_cSt', '>> rfn.merge_arrays((getdat, out), flatten = True, usemask = False, asrecarray=False) Traceback (most recent call last): File "", line 1, in rfn.merge_arrays((getdat, out), flatten = True, usemask = False, asrecarray=False) File "C:\Python27\lib\site-packages\numpy\lib\recfunctions.py", line 458, in merge_arrays dtype=newdtype, count=maxlength) ValueError: cannot create object arrays from iterator The issue seems to be object field 'Tstamp' which contains python datetime objects. I can merge structured arrays with numeric formats. Any help much appreciated. Alex van der Spek From mail.till at gmx.de Mon Jan 28 11:31:26 2013 From: mail.till at gmx.de (Till Stensitzki) Date: Mon, 28 Jan 2013 16:31:26 +0000 (UTC) Subject: [Numpy-discussion] Matrix Expontial for differenr t. Message-ID: Hi group, is there a faster way to calculate the matrix exponential for different t's than this: def sol_matexp(A, tlist, y0): w, v = np.linalg.eig(A) out = np.zeros((tlist.size, y0.size)) for i, t in enumerate(tlist): sol_t = np.dot(v,np.diag(np.exp(-w*t))).dot(np.linalg.inv(v)).dot(y0) out[i, :] = sol_t return out This is the calculates exp(-Kt).dot(y0) for a list a ts. greetings Till From robert.kern at gmail.com Mon Jan 28 11:42:18 2013 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 28 Jan 2013 17:42:18 +0100 Subject: [Numpy-discussion] Matrix Expontial for differenr t. In-Reply-To: References: Message-ID: On Mon, Jan 28, 2013 at 5:31 PM, Till Stensitzki wrote: > Hi group, > is there a faster way to calculate the > matrix exponential for different t's > than this: > > def sol_matexp(A, tlist, y0): > w, v = np.linalg.eig(A) > out = np.zeros((tlist.size, y0.size)) > for i, t in enumerate(tlist): > sol_t = np.dot(v,np.diag(np.exp(-w*t))).dot(np.linalg.inv(v)).dot(y0) > out[i, :] = sol_t > return out > > This is the calculates exp(-Kt).dot(y0) for a list a ts. You can precalculate the latter part of the expression and avoid the inv() by using solve(). viy0 = np.linalg.solve(v, y0) for i, t in enumerate(tlist): # And no need to dot() the first part. Broadcasting works just fine. sol_t = (v * np.exp(-w*t)).dot(viy0) ... -- Robert Kern From nadavh at visionsense.com Mon Jan 28 11:48:27 2013 From: nadavh at visionsense.com (Nadav Horesh) Date: Mon, 28 Jan 2013 16:48:27 +0000 Subject: [Numpy-discussion] Matrix Expontial for differenr t. In-Reply-To: References: Message-ID: I did not try it, but I assume that you can build a stack of diagonal matrices as a MxNxN array and use tensordot with the matrix v (and it's inverse). The trivial way to accelerate the loop is to calculate in inverse of v before the loop. 
Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] on behalf of Till Stensitzki [mail.till at gmx.de] Sent: 28 January 2013 18:31 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] Matrix Expontial for differenr t. Hi group, is there a faster way to calculate the matrix exponential for different t's than this: def sol_matexp(A, tlist, y0): w, v = np.linalg.eig(A) out = np.zeros((tlist.size, y0.size)) for i, t in enumerate(tlist): sol_t = np.dot(v,np.diag(np.exp(-w*t))).dot(np.linalg.inv(v)).dot(y0) out[i, :] = sol_t return out This is the calculates exp(-Kt).dot(y0) for a list a ts. greetings Till _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From pierre.haessig at crans.org Mon Jan 28 11:52:21 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 28 Jan 2013 17:52:21 +0100 Subject: [Numpy-discussion] Matrix Expontial for differenr t. In-Reply-To: References: Message-ID: <5106ACC5.5010107@crans.org> Hi, Le 28/01/2013 17:31, Till Stensitzki a ?crit : > This is the calculates exp(-Kt).dot(y0) for a list a ts. If your time vector ts is *regularly* discretized with a timestep h, you could try an iterative computation I would (roughly) write this as : Ah = np.expm(A*h) # or use the "diagonalization + np.exp" method you mentionned y[0] = y0 for i in range(len(tlist)-1): y[i+1] = Ah*y[i] best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From mail.till at gmx.de Mon Jan 28 12:14:49 2013 From: mail.till at gmx.de (Till Stensitzki) Date: Mon, 28 Jan 2013 17:14:49 +0000 (UTC) Subject: [Numpy-discussion] Matrix Expontial for differenr t. References: Message-ID: Thanks for hints so far, i am especially searching for a way to get rid of the t loop. Making a NxMxM Matrix is quite memory inefficient in my case (N > M). On way would be just use cython, but i think this problem common enough to have a solution into scipy. (Solution of a simple compartment model.) thanks, Till From pierre.haessig at crans.org Mon Jan 28 12:24:57 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 28 Jan 2013 18:24:57 +0100 Subject: [Numpy-discussion] Matrix Expontial for differenr t. In-Reply-To: References: Message-ID: <5106B469.6030206@crans.org> Hi, Le 28/01/2013 18:14, Till Stensitzki a ?crit : > On way would be just use cython, but i think this problem > common enough to have a solution into scipy. > (Solution of a simple compartment model.) I see the solution you propose as a specialized ODE solver for linear systems. Then, what about using a general purpose ODE ? I guess there would be some integration errors as opposed to the exact integration method but errors bounds should be manageable. Maybe the performance would be increased thanks to ODE solvers being already written in C or Fortran? (At this point, please note that I'm handwaving a lot !) Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From mesanthu at gmail.com Mon Jan 28 12:58:56 2013 From: mesanthu at gmail.com (santhu kumar) Date: Mon, 28 Jan 2013 11:58:56 -0600 Subject: [Numpy-discussion] Numpy multiple instances Message-ID: Hello, I have embedded python/numpy scripts in an application that runs in parallel. But the python code is always invoked on the master node. So it could be assumed that at some point, there could be multiple instances of script being invoked and run. I have commented out import numpy part to see if both get the same sys.path and here are the sys.path: >From C : Replica ID value 0 >From Python : The replicaId 0 of 2 simulations running at 0.0 with 27319 atoms ['/usr/lib64/python24.zip', '/usr/lib64/python2.4', '/usr/lib64/python2.4/plat-linux2', '/usr/lib64/python2.4/lib-tk', '/usr/lib64/python2.4/lib-dynload', '/usr/lib64/python2.4/site-packages', '/usr/lib64/python2.4/site-packages/Numeric', '/usr/lib64/python2.4/site-packages/gtk-2.0', '/usr/lib/python2.4/site-packages', 'python_custom'] >From C : Replica ID value 1 >From Python : The replicaId 1 of 2 simulations running at 0.0 with 27319 atoms ['/usr/lib64/python24.zip', '/usr/lib64/python2.4', '/usr/lib64/python2.4/plat-linux2', '/usr/lib64/python2.4/lib-tk', '/usr/lib64/python2.4/lib-dynload', '/usr/lib64/python2.4/site-packages', '/usr/lib64/python2.4/site-packages/Numeric', '/usr/lib64/python2.4/site-packages/gtk-2.0', '/usr/lib/python2.4/site-packages', 'python_custom'] But once I uncomment, import numpy as np part in the script, >From C : Replica ID value 0 >From Python : The replicaId 0 of 2 simulations running at 0.0 with 27319 atoms ['/usr/lib64/python24.zip', '/usr/lib64/python2.4', '/usr/lib64/python2.4/plat-linux2', '/usr/lib64/python2.4/lib-tk', '/usr/lib64/python2.4/lib-dynload', '/usr/lib64/python2.4/site-packages', '/usr/lib64/python2.4/site-packages/Numeric', '/usr/lib64/python2.4/site-packages/gtk-2.0', '/usr/lib/python2.4/site-packages', 'python_custom'] >From C : Replica ID value 1 Traceback (most recent call last): File "python_custom/customF.py", line 3, in ? import numpy ImportError: No module named numpy Just giving some more information : When I embedded the python call in C, I had to comment out Py_Finalize() as numpy was throwing an error when trying to finalize. Any ideas/suggestions on whats happening ? Do I need to do something special to have multiple instances of python/numpy in C? Thanks Santhosh -------------- next part -------------- An HTML attachment was scrubbed... URL: From mesanthu at gmail.com Mon Jan 28 13:23:55 2013 From: mesanthu at gmail.com (santhu kumar) Date: Mon, 28 Jan 2013 12:23:55 -0600 Subject: [Numpy-discussion] Numpy multiple instances Message-ID: Please ignore the previous message. I have done some testing and found it to be running on a client node instead of the master node. The problem might be because node2, does not have numpy installed. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From xscript at gmx.net Mon Jan 28 17:15:00 2013 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Mon, 28 Jan 2013 23:15:00 +0100 Subject: [Numpy-discussion] numpythonically getting elements with the minimum sum Message-ID: <871ud5x8d7.fsf@fimbulvetr.bsc.es> Hi, I have a somewhat convoluted N-dimensional array that contains information of a set of experiments. 
The last dimension has as many entries as iterations in the experiment (an iterative application), and the penultimate dimension has as many entries as times I have run that experiment; the rest of dimensions describe the features of the experiment: data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS) So, what I want is to get the data for the best run of each experiment: best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS) by selecting, for each experiment, the run with the lowest total time (sum of the time of all iterations for that experiment). So far I've got the trivial part, but not the final indexing into "data": dsum = data.sum(axis = -1) dmin = dsum.min(axis = -1) best = data[???] I'm sure there must be some numpythonic and generic way to get what I want, but fancy indexing is beating me here :) Thanks a lot! Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From irving at naml.us Mon Jan 28 18:48:36 2013 From: irving at naml.us (Geoffrey Irving) Date: Mon, 28 Jan 2013 15:48:36 -0800 Subject: [Numpy-discussion] PyArray_FromAny silently converts None to a singleton nan Message-ID: I discovered this from C via the PyArray_FromAny function, but here it is in Python: >>> asarray(None,dtype=float) array(nan) Is this expected or documented behavior? It seems quite unintuitive and surprising that this wouldn't throw an exception. Is there a way to disable this behavior in PyArray_FromAny in order to catch bugs earlier on? In the situation where I discovered this I actually passed None to a wrapped C routine, and it complained that it didn't have rank 2 (since the resulting nan singleton had rank 0). It'd be much nicer to get something mentioning NoneType. I suppose I could check for None manually as long there aren't any other weird cases. Geoffrey From brad.froehle at gmail.com Mon Jan 28 20:09:50 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Mon, 28 Jan 2013 17:09:50 -0800 Subject: [Numpy-discussion] PyArray_FromAny silently converts None to a singleton nan In-Reply-To: References: Message-ID: >>> import numpy as np >>> np.double(None) nan On Mon, Jan 28, 2013 at 3:48 PM, Geoffrey Irving wrote: > I discovered this from C via the PyArray_FromAny function, but here it > is in Python: > > >>> asarray(None,dtype=float) > array(nan) > > Is this expected or documented behavior? -------------- next part -------------- An HTML attachment was scrubbed... URL: From irving at naml.us Mon Jan 28 20:27:45 2013 From: irving at naml.us (Geoffrey Irving) Date: Mon, 28 Jan 2013 17:27:45 -0800 Subject: [Numpy-discussion] PyArray_FromAny silently converts None to a singleton nan In-Reply-To: References: Message-ID: For comparison: >>> float32(None) nan >>> float(None) Traceback (most recent call last): File "", line 1, in TypeError: float() argument must be a string or a number On Mon, Jan 28, 2013 at 5:09 PM, Bradley M. Froehle wrote: >>>> import numpy as np >>>> np.double(None) > nan > > On Mon, Jan 28, 2013 at 3:48 PM, Geoffrey Irving wrote: >> >> I discovered this from C via the PyArray_FromAny function, but here it >> is in Python: >> >> >>> asarray(None,dtype=float) >> array(nan) >> >> Is this expected or documented behavior? 
> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From gregor.thalhammer at gmail.com Tue Jan 29 03:49:55 2013 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Tue, 29 Jan 2013 09:49:55 +0100 Subject: [Numpy-discussion] numpythonically getting elements with the minimum sum In-Reply-To: <871ud5x8d7.fsf@fimbulvetr.bsc.es> References: <871ud5x8d7.fsf@fimbulvetr.bsc.es> Message-ID: Am 28.1.2013 um 23:15 schrieb Llu?s: > Hi, > > I have a somewhat convoluted N-dimensional array that contains information of a > set of experiments. > > The last dimension has as many entries as iterations in the experiment (an > iterative application), and the penultimate dimension has as many entries as > times I have run that experiment; the rest of dimensions describe the features > of the experiment: > > data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS) > > So, what I want is to get the data for the best run of each experiment: > > best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS) > > by selecting, for each experiment, the run with the lowest total time (sum of > the time of all iterations for that experiment). > > > So far I've got the trivial part, but not the final indexing into "data": > > dsum = data.sum(axis = -1) > dmin = dsum.min(axis = -1) > best = data[???] > > > I'm sure there must be some numpythonic and generic way to get what I want, but > fancy indexing is beating me here :) Did you have a look at the argmin function? It delivers the indices of the minimum values along an axis. Untested guess: dmin_idx = argmin(dsum, axis = -1) best = data[..., dmin_idx, :] Gregor From valentin at haenel.co Tue Jan 29 04:49:07 2013 From: valentin at haenel.co (Valentin Haenel) Date: Tue, 29 Jan 2013 10:49:07 +0100 Subject: [Numpy-discussion] Question about documentation for SWIG and ctypes numpy support Message-ID: <20130129094907.GA30692@kudu.in-berlin.de> Hi, I need to link the documentation on ctypes and SWIG support for Numpy. For ctypes I found: http://www.scipy.org/Cookbook/Ctypes Which seems to be reasonably up-to-date. There are of course also: http://docs.scipy.org/doc/numpy/reference/routines.ctypeslib.html There are also the corresponding section from the API docs: http://docs.scipy.org/doc/numpy/reference/routines.ctypeslib.html http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ctypes.html So for numpy ctypes support, I would link those three. For SWIG I found: http://www.scipy.org/Cookbook/SWIG_NumPy_examples And this seems to be somewhat outdated, at least it references files from the numpy svn... :( There is also: http://docs.scipy.org/doc/numpy/reference/swig.interface-file.html Which seems to be more up-to-date, although it doesn't contain much information about the compilation procedure, like the cookbook does. I would probably only link that last one for numpy swig support. Is there any other documentation I should be aware of? V- From denis-bz-gg at t-online.de Tue Jan 29 06:16:43 2013 From: denis-bz-gg at t-online.de (denis) Date: Tue, 29 Jan 2013 11:16:43 +0000 (UTC) Subject: [Numpy-discussion] np.where: x and y need to have the same shape as condition ? 
Message-ID: Folks, the doc for `where` says "x and y need to have the same shape as condition" http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.where.html But surely "where is equivalent to: [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]" holds as long as len(condition) == len(x) == len(y) ? And `condition` can be broadcast ? n = 3 all01 = np.array([ t for t in np.ndindex( n * (2,) )]) # 000 001 ... x = np.zeros(n) y = np.ones(n) w = np.where( all01, y, x ) # 2^n x n Can anyone please help me understand `where` / extend "where is equivalent to ..." ? Thanks, cheers -- denis From xscript at gmx.net Tue Jan 29 08:53:29 2013 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Tue, 29 Jan 2013 14:53:29 +0100 Subject: [Numpy-discussion] numpythonically getting elements with the minimum sum In-Reply-To: (Gregor Thalhammer's message of "Tue, 29 Jan 2013 09:49:55 +0100") References: <871ud5x8d7.fsf@fimbulvetr.bsc.es> Message-ID: <87txq0t7s6.fsf@fimbulvetr.bsc.es> Gregor Thalhammer writes: > Am 28.1.2013 um 23:15 schrieb Llu?s: >> Hi, >> >> I have a somewhat convoluted N-dimensional array that contains information of a >> set of experiments. >> >> The last dimension has as many entries as iterations in the experiment (an >> iterative application), and the penultimate dimension has as many entries as >> times I have run that experiment; the rest of dimensions describe the features >> of the experiment: >> >> data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS) >> >> So, what I want is to get the data for the best run of each experiment: >> >> best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS) >> >> by selecting, for each experiment, the run with the lowest total time (sum of >> the time of all iterations for that experiment). >> >> >> So far I've got the trivial part, but not the final indexing into "data": >> >> dsum = data.sum(axis = -1) >> dmin = dsum.min(axis = -1) >> best = data[???] >> >> >> I'm sure there must be some numpythonic and generic way to get what I want, but >> fancy indexing is beating me here :) > Did you have a look at the argmin function? It delivers the indices of the minimum values along an axis. Untested guess: > dmin_idx = argmin(dsum, axis = -1) > best = data[..., dmin_idx, :] Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing with it does not exactly work as I expected: >>> d1.shape (2, 5, 10) >>> dsum = d1.sum(axis = -1) >>> dmin = d1.argmin(axis = -1) >>> dmin.shape (2,) >>> d1_best = d1[...,dmin,:] >>> d1_best.shape (2, 2, 10) Assuming 1st dimension is the test, 2nd the run and 10th the iterations, using this previous code with some example values: >>> dmin [4 3] >>> d1_best [[[ ... contents of d1[0,4,:] ...] [ ... contents of d1[0,3,:] ...]] [[ ... contents of d1[1,4,:] ...] [ ... contents of d1[1,3,:] ...]]] While I actually want this: [[ ... contents of d1[0,4,:] ...] [ ... contents of d1[1,3,:] ...]] Thanks, Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." 
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From sebastian at sipsolutions.net Tue Jan 29 09:11:55 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 29 Jan 2013 15:11:55 +0100 Subject: [Numpy-discussion] numpythonically getting elements with the minimum sum In-Reply-To: <87txq0t7s6.fsf@fimbulvetr.bsc.es> References: <871ud5x8d7.fsf@fimbulvetr.bsc.es> <87txq0t7s6.fsf@fimbulvetr.bsc.es> Message-ID: <1359468715.3559.10.camel@sebastian-laptop> On Tue, 2013-01-29 at 14:53 +0100, Llu?s wrote: > Gregor Thalhammer writes: > > > Am 28.1.2013 um 23:15 schrieb Llu?s: > > >> Hi, > >> > >> I have a somewhat convoluted N-dimensional array that contains information of a > >> set of experiments. > >> > >> The last dimension has as many entries as iterations in the experiment (an > >> iterative application), and the penultimate dimension has as many entries as > >> times I have run that experiment; the rest of dimensions describe the features > >> of the experiment: > >> > >> data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS) > >> > >> So, what I want is to get the data for the best run of each experiment: > >> > >> best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS) > >> > >> by selecting, for each experiment, the run with the lowest total time (sum of > >> the time of all iterations for that experiment). > >> > >> > >> So far I've got the trivial part, but not the final indexing into "data": > >> > >> dsum = data.sum(axis = -1) > >> dmin = dsum.min(axis = -1) > >> best = data[???] > >> > >> > >> I'm sure there must be some numpythonic and generic way to get what I want, but > >> fancy indexing is beating me here :) > > > Did you have a look at the argmin function? It delivers the indices of the minimum values along an axis. Untested guess: > > > dmin_idx = argmin(dsum, axis = -1) > > best = data[..., dmin_idx, :] > > Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing > with it does not exactly work as I expected: > > >>> d1.shape > (2, 5, 10) > >>> dsum = d1.sum(axis = -1) > >>> dmin = d1.argmin(axis = -1) > >>> dmin.shape > (2,) > >>> d1_best = d1[...,dmin,:] You need to use fancy indexing. Something like: >>> d1_best = d1[np.arange(2), dmin,:] Because the Ellipsis takes everything from the axis, while you want to pick from multiple axes at the same time. That can be achieved with fancy indexing (indexing with arrays). From another perspective, you want to get rid of two axes in favor of a new one, but a slice/Ellipsis always preserves the axis it works on. > >>> d1_best.shape > (2, 2, 10) > > > Assuming 1st dimension is the test, 2nd the run and 10th the iterations, using > this previous code with some example values: > > >>> dmin > [4 3] > >>> d1_best > [[[ ... contents of d1[0,4,:] ...] > [ ... contents of d1[0,3,:] ...]] > [[ ... contents of d1[1,4,:] ...] > [ ... contents of d1[1,3,:] ...]]] > > > While I actually want this: > > [[ ... contents of d1[0,4,:] ...] > [ ... 
contents of d1[1,3,:] ...]] > > > Thanks, > Lluis > From xscript at gmx.net Tue Jan 29 10:56:47 2013 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Tue, 29 Jan 2013 16:56:47 +0100 Subject: [Numpy-discussion] numpythonically getting elements with the minimum sum In-Reply-To: <1359468715.3559.10.camel@sebastian-laptop> (Sebastian Berg's message of "Tue, 29 Jan 2013 15:11:55 +0100") References: <871ud5x8d7.fsf@fimbulvetr.bsc.es> <87txq0t7s6.fsf@fimbulvetr.bsc.es> <1359468715.3559.10.camel@sebastian-laptop> Message-ID: <87sj5kq8xs.fsf@fimbulvetr.bsc.es> Sebastian Berg writes: > On Tue, 2013-01-29 at 14:53 +0100, Llu?s wrote: >> Gregor Thalhammer writes: >> >> > Am 28.1.2013 um 23:15 schrieb Llu?s: >> >> >> Hi, >> >> >> >> I have a somewhat convoluted N-dimensional array that contains information of a >> >> set of experiments. >> >> >> >> The last dimension has as many entries as iterations in the experiment (an >> >> iterative application), and the penultimate dimension has as many entries as >> >> times I have run that experiment; the rest of dimensions describe the features >> >> of the experiment: >> >> >> >> data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS) >> >> >> >> So, what I want is to get the data for the best run of each experiment: >> >> >> >> best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS) >> >> >> >> by selecting, for each experiment, the run with the lowest total time (sum of >> >> the time of all iterations for that experiment). >> >> >> >> >> >> So far I've got the trivial part, but not the final indexing into "data": >> >> >> >> dsum = data.sum(axis = -1) >> >> dmin = dsum.min(axis = -1) >> >> best = data[???] >> >> >> >> >> >> I'm sure there must be some numpythonic and generic way to get what I want, but >> >> fancy indexing is beating me here :) >> >> > Did you have a look at the argmin function? It delivers the indices of the minimum values along an axis. Untested guess: >> >> > dmin_idx = argmin(dsum, axis = -1) >> > best = data[..., dmin_idx, :] >> >> Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing >> with it does not exactly work as I expected: >> >> >>> d1.shape >> (2, 5, 10) >> >>> dsum = d1.sum(axis = -1) >> >>> dmin = d1.argmin(axis = -1) >> >>> dmin.shape >> (2,) >> >>> d1_best = d1[...,dmin,:] > You need to use fancy indexing. Something like: >>>> d1_best = d1[np.arange(2), dmin,:] > Because the Ellipsis takes everything from the axis, while you want to > pick from multiple axes at the same time. That can be achieved with > fancy indexing (indexing with arrays). From another perspective, you > want to get rid of two axes in favor of a new one, but a slice/Ellipsis > always preserves the axis it works on. Nice, thanks. That works for this specific example, but I couldn't get it to work with "d1.shape == (1, 2, 16, 5, 10)" (thus "dmin.shape == (1, 2, 16)"): >>> def get_best_run (data, field): ... """Returns the best run.""" ... data = data.view(np.ndarray) ... assert data.ndim >= 2 ... dsum = data[field].sum(axis=-1) ... dmin = dsum.argmin(axis=-1) ... idxs = [ np.arange(dlen) for dlen in data.shape[:-2] ] ... idxs += [ dmin ] ... idxs += [ slice(None) ] ... return data[tuple(idxs)] >>> d1.shape (2, 5, 10) >>> get_best_run(d1, "time") (2, 10) >>> d2.shape (1, 2, 16, 5, 10) >>> get_best_run(d2, "time") Traceback (most recent call last): ... 
File "./plot-user.py", line 89, in get_best_run res = data.view(np.ndarray)[tuple(idxs)] ValueError: shape mismatch: objects cannot be broadcast to a single shape After reading the "Advanced indexing section", my understanding is that the elements in "idxs" are not broadcastable to the same shape, but I'm not sure how I should build them to be broadcastable to what specific shape. Thanks a lot, Lluis >> >>> d1_best.shape >> (2, 2, 10) >> >> >> Assuming 1st dimension is the test, 2nd the run and 10th the iterations, using >> this previous code with some example values: >> >> >>> dmin >> [4 3] >> >>> d1_best >> [[[ ... contents of d1[0,4,:] ...] >> [ ... contents of d1[0,3,:] ...]] >> [[ ... contents of d1[1,4,:] ...] >> [ ... contents of d1[1,3,:] ...]]] >> >> >> While I actually want this: >> >> [[ ... contents of d1[0,4,:] ...] >> [ ... contents of d1[1,3,:] ...]] -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From xscript at gmx.net Tue Jan 29 13:07:03 2013 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Tue, 29 Jan 2013 19:07:03 +0100 Subject: [Numpy-discussion] numpythonically getting elements with the minimum sum In-Reply-To: <87sj5kq8xs.fsf@fimbulvetr.bsc.es> (=?utf-8?Q?=22Llu=C3=ADs?= =?utf-8?Q?=22's?= message of "Tue, 29 Jan 2013 16:56:47 +0100") References: <871ud5x8d7.fsf@fimbulvetr.bsc.es> <87txq0t7s6.fsf@fimbulvetr.bsc.es> <1359468715.3559.10.camel@sebastian-laptop> <87sj5kq8xs.fsf@fimbulvetr.bsc.es> Message-ID: <87y5fbq2wo.fsf@fimbulvetr.bsc.es> Llu?s writes: > Sebastian Berg writes: >> On Tue, 2013-01-29 at 14:53 +0100, Llu?s wrote: >>> Gregor Thalhammer writes: >>> >>> > Am 28.1.2013 um 23:15 schrieb Llu?s: >>> >>> >> Hi, >>> >> >>> >> I have a somewhat convoluted N-dimensional array that contains information of a >>> >> set of experiments. >>> >> >>> >> The last dimension has as many entries as iterations in the experiment (an >>> >> iterative application), and the penultimate dimension has as many entries as >>> >> times I have run that experiment; the rest of dimensions describe the features >>> >> of the experiment: >>> >> >>> >> data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS) >>> >> >>> >> So, what I want is to get the data for the best run of each experiment: >>> >> >>> >> best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS) >>> >> >>> >> by selecting, for each experiment, the run with the lowest total time (sum of >>> >> the time of all iterations for that experiment). >>> >> >>> >> >>> >> So far I've got the trivial part, but not the final indexing into "data": >>> >> >>> >> dsum = data.sum(axis = -1) >>> >> dmin = dsum.min(axis = -1) >>> >> best = data[???] >>> >> >>> >> >>> >> I'm sure there must be some numpythonic and generic way to get what I want, but >>> >> fancy indexing is beating me here :) >>> >>> > Did you have a look at the argmin function? It delivers the indices of the minimum values along an axis. Untested guess: >>> >>> > dmin_idx = argmin(dsum, axis = -1) >>> > best = data[..., dmin_idx, :] >>> >>> Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing >>> with it does not exactly work as I expected: >>> >>> >>> d1.shape >>> (2, 5, 10) >>> >>> dsum = d1.sum(axis = -1) >>> >>> dmin = d1.argmin(axis = -1) >>> >>> dmin.shape >>> (2,) >>> >>> d1_best = d1[...,dmin,:] >> You need to use fancy indexing. 
Something like: >>>>> d1_best = d1[np.arange(2), dmin,:] >> Because the Ellipsis takes everything from the axis, while you want to >> pick from multiple axes at the same time. That can be achieved with >> fancy indexing (indexing with arrays). From another perspective, you >> want to get rid of two axes in favor of a new one, but a slice/Ellipsis >> always preserves the axis it works on. > Nice, thanks. That works for this specific example, but I couldn't get it to > work with "d1.shape == (1, 2, 16, 5, 10)" (thus "dmin.shape == (1, 2, 16)"): >>>> def get_best_run (data, field): > ... """Returns the best run.""" > ... data = data.view(np.ndarray) > ... assert data.ndim >= 2 > ... dsum = data[field].sum(axis=-1) > ... dmin = dsum.argmin(axis=-1) > ... idxs = [ np.arange(dlen) for dlen in data.shape[:-2] ] > ... idxs += [ dmin ] > ... idxs += [ slice(None) ] > ... return data[tuple(idxs)] >>>> d1.shape > (2, 5, 10) >>>> get_best_run(d1, "time") > (2, 10) >>>> d2.shape > (1, 2, 16, 5, 10) >>>> get_best_run(d2, "time") > Traceback (most recent call last): > ... > File "./plot-user.py", line 89, in get_best_run > res = data.view(np.ndarray)[tuple(idxs)] > ValueError: shape mismatch: objects cannot be broadcast to a single shape > After reading the "Advanced indexing section", my understanding is that the > elements in "idxs" are not broadcastable to the same shape, but I'm not sure how > I should build them to be broadcastable to what specific shape. BTW, here's an equivalent that seems to work on all cases, although I would prefer to avoid control code to manually fill-in the result: >>> def get_best_run (data, field): ... """Returns the best run.""" ... data = data.view(np.ndarray) ... assert data.ndim >= 2 ... dsum = data[field].sum(axis=-1) ... dmin = dsum.argmin(axis=-1) ... ... res_shape = list(data.shape) ... del res_shape[-2] ... res = np.ndarray(res_shape, dtype = data.dtype) ... ... idxs = np.unravel_index(np.arange(dmin.size), dmin.shape) ... for idx in itertools.izip(*idxs): ... isum = dsum[idx] ... imin = dmin[idx] ... idata = data[idx] ... res[idx] = data[tuple(list(idx) + [imin])] ... ... return res >>> d1.shape (2, 5, 10) >>> get_best_run(d1, "time") (2, 10) >>> d2.shape (1, 2, 16, 5, 10) >>> get_best_run(d2, "time") (1, 2, 16, 10) Thanks, Lluis >>> >>> d1_best.shape >>> (2, 2, 10) >>> >>> >>> Assuming 1st dimension is the test, 2nd the run and 10th the iterations, using >>> this previous code with some example values: >>> >>> >>> dmin >>> [4 3] >>> >>> d1_best >>> [[[ ... contents of d1[0,4,:] ...] >>> [ ... contents of d1[0,3,:] ...]] >>> [[ ... contents of d1[1,4,:] ...] >>> [ ... contents of d1[1,3,:] ...]]] >>> >>> >>> While I actually want this: >>> >>> [[ ... contents of d1[0,4,:] ...] >>> [ ... contents of d1[1,3,:] ...]] -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From ben.root at ou.edu Tue Jan 29 17:19:53 2013 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 29 Jan 2013 17:19:53 -0500 Subject: [Numpy-discussion] np.where: x and y need to have the same shape as condition ? 
In-Reply-To: References: Message-ID: On Tue, Jan 29, 2013 at 6:16 AM, denis wrote: > Folks, > the doc for `where` says "x and y need to have the same shape as > condition" > http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.where.html > But surely > "where is equivalent to: > [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]" > holds as long as len(condition) == len(x) == len(y) ? > And `condition` can be broadcast ? > n = 3 > all01 = np.array([ t for t in np.ndindex( n * (2,) )]) # 000 001 ... > x = np.zeros(n) > y = np.ones(n) > w = np.where( all01, y, x ) # 2^n x n > > Can anyone please help me understand `where` > / extend "where is equivalent to ..." ? > Thanks, > cheers > -- denis > > Do keep in mind the difference between len() and shape (they aren't the same for 2 and greater dimension arrays). But, ultimately, yes, the arrays have to have the same shape, or use scalars. I haven't checked broadcast-ability though. Perhaps a note should be added into the documentation to explicitly say whether the arrays can be broadcastable. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Jan 30 04:24:22 2013 From: toddrjen at gmail.com (Todd) Date: Wed, 30 Jan 2013 10:24:22 +0100 Subject: [Numpy-discussion] Subclassing ndarray with concatenate In-Reply-To: <1358858673.24631.20.camel@sebastian-laptop> References: <1358858673.24631.20.camel@sebastian-laptop> Message-ID: On Tue, Jan 22, 2013 at 1:44 PM, Sebastian Berg wrote: > Hey, > > On Tue, 2013-01-22 at 10:21 +0100, Todd wrote: > > > The main exception I have found is concatenate (and hstack/vstack, > > which just wrap concatenate). In this case, __array_finalize__ is > > passed an array that has already been stripped of the additional > > attributes, and I don't see a way to recover this information. > > > There are quite a few functions that simply do not preserve subclasses > (though I think more could/should call `__array_wrap__` probably, even > if the documentation may say that it is about ufuncs, there are some > example of this already). > `np.concatenate` is one of these. It always returns a base array. In any > case it gets a bit difficult if you have multiple input arrays (which > may not matter for you). > I don't think this is right. I tried it and it doesn't return a base array, it returns an instance of the original array subclass. > > > In my particular case at least, there are clear ways to handle corner > > cases (like being passed a class that lacks these attributes), so in > > principle there no problem handling concatenate in a general way, > > assuming I can get access to the attributes. > > > > > > So is there any way to subclass ndarray in such a way that concatenate > > can be handled properly? > > > Quite simply, no. If you compare masked arrays, they also provide their > own concatenate for this reason. > > I hope that helps a bit... > > Is this something that should be available? For instance a method that provides both the new array and the arrays that were used to construct it. This would seem to be an extremely common use-case for array subclasses, so letting them gracefully handle this would seem to be very important. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Wed Jan 30 05:20:39 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 30 Jan 2013 11:20:39 +0100 Subject: [Numpy-discussion] Subclassing ndarray with concatenate In-Reply-To: References: <1358858673.24631.20.camel@sebastian-laptop> Message-ID: <1359541239.2496.14.camel@sebastian-laptop> On Wed, 2013-01-30 at 10:24 +0100, Todd wrote: > On Tue, Jan 22, 2013 at 1:44 PM, Sebastian Berg > wrote: > Hey, > > On Tue, 2013-01-22 at 10:21 +0100, Todd wrote: > > > > The main exception I have found is concatenate (and > hstack/vstack, > > which just wrap concatenate). In this case, > __array_finalize__ is > > passed an array that has already been stripped of the > additional > > attributes, and I don't see a way to recover this > information. > > > > There are quite a few functions that simply do not preserve > subclasses > (though I think more could/should call `__array_wrap__` > probably, even > if the documentation may say that it is about ufuncs, there > are some > example of this already). > `np.concatenate` is one of these. It always returns a base > array. In any > case it gets a bit difficult if you have multiple input arrays > (which > may not matter for you). > > > > I don't think this is right. I tried it and it doesn't return a base > array, it returns an instance of the original array subclass. Yes you are right it preserves type, I was fooled by `__array_priority__` being 0 as default, thought it defaulted to more then 0 (for ufuncs everything beats arrays, not sure if it really should) but so I missed. In any case, yes, it calls __array_finalize__, but as you noticed, it calls it without the original array. Now it would be very easy and harmless to change that, however I am not sure if giving only the parent array is very useful (ie. you only get the one with highest array priority). Another way to get around it would be maybe to call __array_wrap__ like ufuncs do (with a context, so you get all inputs, but then the non-array axis argument may not be reasonably placed into the context). In any case, if you think it would be helpful to at least get the single parent array, that would be a very simple change, but I feel the whole subclassing could use a bit thinking and quite a bit of work probably, since I am not quite convinced that calling __array_wrap__ with a complicated context from as many functions as possible is the right approach for allowing more complex subclasses. > > > > In my particular case at least, there are clear ways to > handle corner > > cases (like being passed a class that lacks these > attributes), so in > > principle there no problem handling concatenate in a general > way, > > assuming I can get access to the attributes. > > > > > > So is there any way to subclass ndarray in such a way that > concatenate > > can be handled properly? > > > > Quite simply, no. If you compare masked arrays, they also > provide their > own concatenate for this reason. > > I hope that helps a bit... > > > > Is this something that should be available? For instance a method > that provides both the new array and the arrays that were used to > construct it. This would seem to be an extremely common use-case for > array subclasses, so letting them gracefully handle this would seem to > be very important. 
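For what it's worth, the per-class workaround mentioned above (what numpy.ma does) can be quite small. A minimal, untested sketch, with a made-up `meta` attribute standing in for whatever extra state the subclass carries:

import numpy as np

class MyArray(np.ndarray):
    def __new__(cls, input_array, meta=None):
        obj = np.asarray(input_array).view(cls)
        obj.meta = meta
        return obj

    def __array_finalize__(self, obj):
        # obj is None for explicit construction; otherwise inherit what we can.
        self.meta = getattr(obj, 'meta', None)

def my_concatenate(arrays, axis=0):
    # Do the work on plain ndarrays, then re-attach the attributes from the
    # first input. Taking them from the first input is just one possible policy.
    plain = [np.asarray(a) for a in arrays]
    out = np.concatenate(plain, axis=axis).view(MyArray)
    out.meta = getattr(arrays[0], 'meta', None)
    return out

None of this addresses the general problem discussed above (np.concatenate itself passing all inputs on to the subclass); it only shows the kind of wrapper a subclass can ship today.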
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From opossumnano at gmail.com Thu Jan 31 05:56:28 2013 From: opossumnano at gmail.com (Tiziano Zito) Date: Thu, 31 Jan 2013 11:56:28 +0100 (CET) Subject: [Numpy-discussion] =?utf-8?q?=5BANN=5D_Summer_School_=22Advanced_?= =?utf-8?q?Scientific_Programming_in_Python=22_in_Z=C3=BCrich=2C_Switzerla?= =?utf-8?q?nd?= Message-ID: <20130131105628.5223E12E00D8@comms.bccn-berlin.de> Advanced Scientific Programming in Python ========================================= a Summer School by the G-Node and the Physik-Institut, University of Zurich Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only few scientists actually use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques, incorporating theoretical lectures and practical exercises tailored to the needs of a programming scientist. New skills will be tested in a real programming project: we will team up to develop an entertaining scientific computer game. We use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. We show how clean language design, ease of extensibility, and the great wealth of open source libraries for scientific computing and data visualization are driving Python to become a standard tool for the programming scientist. This school is targeted at Master or PhD students and Post-docs from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or Mathematica is absolutely required. Basic knowledge of Python is assumed. Participants without any prior experience with Python should work through the proposed introductory materials before the course. Date and Location ================= September 1?6, 2013. Z?rich, Switzerlandi. Preliminary Program =================== Day 0 (Sun Sept 1) ? Best Programming Practices - Best Practices, Development Methodologies and the Zen of Python - Version control with git - Object-oriented programming & design patterns Day 1 (Mon Sept 2) ? Software Carpentry - Test-driven development, unit testing & quality assurance - Debugging, profiling and benchmarking techniques - Best practices in data visualization - Programming in teams Day 2 (Tue Sept 3) ? Scientific Tools for Python - Advanced NumPy - The Quest for Speed (intro): Interfacing to C with Cython - Advanced Python I: idioms, useful built-in data structures, generators Day 3 (Wed Sept 4) ? The Quest for Speed - Writing parallel applications in Python - Programming project Day 4 (Thu Sept 5) ? Efficient Memory Management - When parallelization does not help: the starving CPUs problem - Advanced Python II: decorators and context managers - Programming project Day 5 (Fri Sept 6) ? Practical Software Development - Programming project - The Pelita Tournament Every evening we will have the tutors' consultation hour : Tutors will answer your questions and give suggestions for your own projects. Applications ============ You can apply on-line at http://python.g-node.org Applications must be submitted before 23:59 CEST, May 1, 2013. 
Notifications of acceptance will be sent by June 1, 2013. No fee is charged but participants should take care of travel, living, and accommodation expenses. Candidates will be selected on the basis of their profile. Places are limited: acceptance rate is usually around 20%. Prerequisites: You are supposed to know the basics of Python to participate in the lectures. You are encouraged to go through the introductory material available on the website.
Faculty
=======
- Francesc Alted, Continuum Analytics Inc., USA
- Pietro Berkes, Enthought Inc., UK
- Valentin Haenel, freelance developer and consultant, Berlin, Germany
- Zbigniew Jędrzejewski-Szmek, Krasnow Institute, George Mason University, USA
- Eilif Muller, Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland
- Emanuele Olivetti, NeuroInformatics Laboratory, Fondazione Bruno Kessler and University of Trento, Italy
- Rike-Benjamin Schuppner, Technologit GbR, Germany
- Bartosz Teleńczuk, Unité de Neurosciences Information et Complexité, CNRS, France
- Stéfan van der Walt, Applied Mathematics, Stellenbosch University, South Africa
- Bastian Venthur, Berlin Institute of Technology and Bernstein Focus Neurotechnology, Germany
- Niko Wilbert, TNG Technology Consulting GmbH, Germany
- Tiziano Zito, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany
Organized by Nicola Chiapolini and colleagues of the Physik-Institut, University of Zurich, and by Zbigniew Jędrzejewski-Szmek and Tiziano Zito for the German Neuroinformatics Node of the INCF. Website: http://python.g-node.org Contact: python-info at g-node.org
From oscar.villellas at continuum.io Thu Jan 31 11:43:23 2013 From: oscar.villellas at continuum.io (Oscar Villellas) Date: Thu, 31 Jan 2013 17:43:23 +0100 Subject: [Numpy-discussion] pull request: generalized ufunc signature fix and linear algebra generalized ufuncs Message-ID: Hello, At Continuum Analytics we've been working on a submodule implementing a set of linear algebra operations as generalized ufuncs. This allows specifying arrays of linear algebra problems to be computed with a single Python call, allowing broadcasting as well. As the vectorization is handled in the kernel, this gives a speed edge on the operations. We think this could be useful to the community and we want to share the work done. I've created a couple of pull-requests: The first one contains a fix for a bug in the handling of certain signatures in the gufuncs. This was found while building the submodule. The fix was done by Mark Wiebe, so credit should go to him :). https://github.com/numpy/numpy/pull/2953 The second pull request contains the submodule itself and builds on top of the previous fix. It contains a rst file that explains the submodule, enumerates the functions implemented and details some implementation bits. The entry point to the module is written in Python and contains detailed docstrings. https://github.com/numpy/numpy/pull/2954 We are open to discussion and to make improvements to the code if needed, in order to adapt to NumPy standards. Thanks, Oscar.
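To make the broadcasting point concrete, here is the kind of call pattern such gufuncs enable. The `gufunc_solve` name below is a placeholder, not the actual entry point of the submodule; see the pull request for the real interface:

import numpy as np

# A stack of 10000 independent 3x3 systems.
rng = np.random.RandomState(0)
A = rng.rand(10000, 3, 3) + 3 * np.eye(3)   # diagonally dominant, hence non-singular
b = rng.rand(10000, 3)

# Status quo: one Python-level call per matrix.
x_loop = np.array([np.linalg.solve(A[i], b[i]) for i in range(len(A))])

# With a generalized ufunc of signature (m,m),(m)->(m), the whole stack is
# handled in a single call and the leading dimensions broadcast like any ufunc:
# x_stacked = gufunc_solve(A, b)
# np.testing.assert_allclose(x_stacked, x_loop)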
From njs at pobox.com Thu Jan 31 14:44:05 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 31 Jan 2013 11:44:05 -0800 Subject: [Numpy-discussion] pull request: generalized ufunc signature fix and linear algebra generalized ufuncs In-Reply-To: References: Message-ID: On Thu, Jan 31, 2013 at 8:43 AM, Oscar Villellas wrote: > Hello, > > At Continuum Analytics we've been working on a submodule implementing > a set of linear algebra operations as generalized ufuncs. This allows > specifying arrays of linear algebra problems to be computed with a > single Python call, allowing broadcasting as well. As the > vectorization is handled in the kernel, this gives a speed edge on the > operations. We think this could be useful to the community and we want > to share the work done. It certainly does look useful. My question is -- why do we need two complete copies of the linear algebra routine interfaces? Can we just replace the existing linalg functions with these new implementations? Or if not, what prevents it? -n
From robert.kern at gmail.com Thu Jan 31 15:35:22 2013 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 31 Jan 2013 20:35:22 +0000 Subject: [Numpy-discussion] pull request: generalized ufunc signature fix and linear algebra generalized ufuncs In-Reply-To: References: Message-ID: On Thu, Jan 31, 2013 at 7:44 PM, Nathaniel Smith wrote: > On Thu, Jan 31, 2013 at 8:43 AM, Oscar Villellas > wrote: >> Hello, >> >> At Continuum Analytics we've been working on a submodule implementing >> a set of linear algebra operations as generalized ufuncs. This allows >> specifying arrays of linear algebra problems to be computed with a >> single Python call, allowing broadcasting as well. As the >> vectorization is handled in the kernel, this gives a speed edge on the >> operations. We think this could be useful to the community and we want >> to share the work done. > > It certainly does look useful. My question is -- why do we need two > complete copies of the linear algebra routine interfaces? Can we just > replace the existing linalg functions with these new implementations? > Or if not, what prevents it? The error reporting would have to be bodged back in. -- Robert Kern
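For context, "bodging the error reporting back in" would presumably mean a thin Python layer on top of the gufunc kernels. A minimal sketch, assuming (purely for illustration) a kernel that signals a failed factorization by returning NaNs; the actual submodule may use a different convention:

import numpy as np
from numpy.linalg import LinAlgError

def solve_with_errors(a, b, gufunc_solve):
    # gufunc_solve is the hypothetical low-level kernel; it is passed in here
    # rather than imported because its real name and location are not assumed.
    x = gufunc_solve(a, b)
    if not np.all(np.isfinite(x)):
        raise LinAlgError("Singular matrix")
    return x

The point is only that exceptions like LinAlgError fit naturally in a Python wrapper rather than in the gufunc inner loop, which is presumably what would need to be restored if the gufuncs replaced the existing linalg functions.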