From ralf.gommers at gmail.com Sun Dec 2 10:16:42 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 2 Dec 2012 16:16:42 +0100 Subject: [Numpy-discussion] allclose changed behaviour in 1.6.2 ? In-Reply-To: <50B8692F.4050706@smhi.se> References: <50B8692F.4050706@smhi.se> Message-ID: On Fri, Nov 30, 2012 at 9:07 AM, Martin Raspaud wrote: > Hi, > > We noticed that comparing arrays of different shapes with allclose > doesn't work anymore in numpy 1.6.2. > > Is this a feature or a bug ? :) > I vote for feature. Allclose does element-wise comparison, so using different size non-broadcastable inputs is an error in user code. It should have raised ValueError in 1.6.1 also. Ralf > > See the output in both 1.6.1 and 1.6.2 at the end of this mail. > > Best regards, > Martin > > 1.6.1:: > > In [1]: import numpy as np > > In [2]: np.__version__ > Out[2]: '1.6.1' > > In [3]: a = np.array([1, 2, 3]) > > In [4]: b = np.array([1, 2, 3, 4]) > > In [5]: np.allclose(a, b) > Out[5]: False > > > 1.6.2:: > > In[1]: import numpy as np > > In[2]: np.__version__ > Out[2]: '1.6.2' > > In [3]: a = np.array([1, 2, 3]) > > In[4]: b = np.array([1, 2, 3, 4]) > > In[5]: np.allclose(a, b) > Traceback (most recent call last): > File "", line 1, in > File > > "/home/maarten/pytroll/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/numeric.py", > line 1936, in allclose > return all(less_equal(abs(x-y), atol + rtol * abs(y))) > ValueError: operands could not be broadcast together with shapes (3) (4) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Dec 2 11:11:27 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 2 Dec 2012 16:11:27 +0000 Subject: [Numpy-discussion] allclose changed behaviour in 1.6.2 ? In-Reply-To: References: <50B8692F.4050706@smhi.se> Message-ID: On Sun, Dec 2, 2012 at 3:16 PM, Ralf Gommers wrote: > > On Fri, Nov 30, 2012 at 9:07 AM, Martin Raspaud > wrote: >> >> Hi, >> >> We noticed that comparing arrays of different shapes with allclose >> doesn't work anymore in numpy 1.6.2. >> >> Is this a feature or a bug ? :) > > > I vote for feature. Allclose does element-wise comparison, so using > different size non-broadcastable inputs is an error in user code. It should > have raised ValueError in 1.6.1 also. I think I agree... in retrospect maybe we should have left the change for 1.7 rather than 1.6.2, but it's too late to do much about that, at least for this particular issue. -n From charlesr.harris at gmail.com Sun Dec 2 16:07:21 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 2 Dec 2012 14:07:21 -0700 Subject: [Numpy-discussion] Euler-Mascheroni constant Message-ID: Hi All, I put in a PR to expose the Euler-Mascheroni constant as 'euler_gamma'. The name is open to discussion. Suggestions for alternatives welcome. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From raul at virtualmaterials.com Sun Dec 2 20:28:24 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Sun, 02 Dec 2012 18:28:24 -0700 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement Message-ID: <50BC0038.70105@virtualmaterials.com> Hello, First a quick summary of my problem and at the end I include the basic changes I am suggesting to the source (they may benefit others) I am ages behind in times and I am still using Numeric in Python 2.2.3. The main reason why it has taken so long to upgrade is because NumPy kills performance on several of my tests. I am sorry if this topic has been discussed before. I tried parsing the mailing list and also google and all I found were comments related to the fact that such is life when you use NumPy for small arrays. In my case I have several thousands of lines of code where data structures rely heavily on Numeric arrays but it is unpredictable if the problem at hand will result in large or small arrays. Furthermore, once the vectorized operations complete, the values could be assigned into scalars and just do simple math or loops. I am fairly sure the core of my problems is that the 'float64' objects start propagating all over the program data structures (not in arrays) and they are considerably slower for just about everything when compared to the native python float. Conclusion, it is not practical for me to do a massive re-structuring of code to improve speed on simple things like "a[0] < 4" (assuming "a" is an array) which is about 10 times slower than "b < 4" (assuming "b" is a float) I finally decided to track down the problem and I started by getting Python 2.6 from source and profiling it in one of my cases. By far the biggest bottleneck came out to be PyString_FromFormatV which is a function to assemble a string for a Python error caused by a failure to find an attribute when "multiarray" calls PyObject_GetAttrString. This function seems to get called way too often from NumPy. The real bottleneck of trying to find the attribute when it does not exist is not that it fails to find it, but that it builds a string to set a Python error. In other words, something as simple as "a[0] < 3.5" internally result in a call to set a python error . I downloaded NumPy code (for Python 2.6) and tracked down all the calls like this, ret = PyObject_GetAttrString(obj, "__array_priority__"); and changed to if (PyList_CheckExact(obj) || (Py_None == obj) || PyTuple_CheckExact(obj) || PyFloat_CheckExact(obj) || PyInt_CheckExact(obj) || PyString_CheckExact(obj) || PyUnicode_CheckExact(obj)){ //Avoid expensive calls when I am sure the attribute //does not exist ret = NULL; } else{ ret = PyObject_GetAttrString(obj, "__array_priority__"); ( I think I found about 7 spots ) I also noticed (not as bad in my case) that calls to PyObject_GetBuffer also resulted in Python errors being set thus unnecessarily slower code. With this change, something like this, for i in xrange(1000000): if a[1] < 35.0: pass went down from 0.8 seconds to 0.38 seconds. A bogus test like this, for i in xrange(1000000): a = array([1., 2., 3.]) went down from 8.5 seconds to 2.5 seconds. Altogether, these simple changes got me half way to the speed I used to get in Numeric and I could not see any slow down in any of my cases that benefit from heavy array manipulation. I am out of ideas on how to improve further though. Few questions: - Is there any interest for me to provide the exact details of the code I changed ? 
- I managed to compile NumPy through setup.py but I am not sure how to force it to generate pdb files from my Visual Studio Compiler. I need the pdb files such that I can run my profiler on NumPy. Anybody has any experience with this ? (Visual Studio) - The core of my problems I think boil down to things like this s = a[0] assigning a float64 into s as opposed to a native float ? Is there any way to hack code to change it to extract a native float instead ? (probably crazy talk, but I thought I'd ask :) ). I'd prefer to not use s = a.item(0) because I would have to change too much code and it is not even that much faster. For example, for i in xrange(1000000): if a.item(1) < 35.0: pass is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes) I apologize again if this topic has already been discussed. Regards, Raul From cgohlke at uci.edu Sun Dec 2 21:33:33 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sun, 02 Dec 2012 18:33:33 -0800 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement In-Reply-To: <50BC0038.70105@virtualmaterials.com> References: <50BC0038.70105@virtualmaterials.com> Message-ID: <50BC0F7D.8020602@uci.edu> On 12/2/2012 5:28 PM, Raul Cota wrote: > Hello, > > First a quick summary of my problem and at the end I include the basic > changes I am suggesting to the source (they may benefit others) > > I am ages behind in times and I am still using Numeric in Python 2.2.3. > The main reason why it has taken so long to upgrade is because NumPy > kills performance on several of my tests. > > I am sorry if this topic has been discussed before. I tried parsing the > mailing list and also google and all I found were comments related to > the fact that such is life when you use NumPy for small arrays. > > In my case I have several thousands of lines of code where data > structures rely heavily on Numeric arrays but it is unpredictable if the > problem at hand will result in large or small arrays. Furthermore, once > the vectorized operations complete, the values could be assigned into > scalars and just do simple math or loops. I am fairly sure the core of > my problems is that the 'float64' objects start propagating all over the > program data structures (not in arrays) and they are considerably slower > for just about everything when compared to the native python float. > > Conclusion, it is not practical for me to do a massive re-structuring of > code to improve speed on simple things like "a[0] < 4" (assuming "a" is > an array) which is about 10 times slower than "b < 4" (assuming "b" is a > float) > > > I finally decided to track down the problem and I started by getting > Python 2.6 from source and profiling it in one of my cases. By far the > biggest bottleneck came out to be PyString_FromFormatV which is a > function to assemble a string for a Python error caused by a failure to > find an attribute when "multiarray" calls PyObject_GetAttrString. This > function seems to get called way too often from NumPy. The real > bottleneck of trying to find the attribute when it does not exist is not > that it fails to find it, but that it builds a string to set a Python > error. In other words, something as simple as "a[0] < 3.5" internally > result in a call to set a python error . 
> > I downloaded NumPy code (for Python 2.6) and tracked down all the calls > like this, > > ret = PyObject_GetAttrString(obj, "__array_priority__"); > > and changed to > if (PyList_CheckExact(obj) || (Py_None == obj) || > PyTuple_CheckExact(obj) || > PyFloat_CheckExact(obj) || > PyInt_CheckExact(obj) || > PyString_CheckExact(obj) || > PyUnicode_CheckExact(obj)){ > //Avoid expensive calls when I am sure the attribute > //does not exist > ret = NULL; > } > else{ > ret = PyObject_GetAttrString(obj, "__array_priority__"); > > > > ( I think I found about 7 spots ) > > > I also noticed (not as bad in my case) that calls to PyObject_GetBuffer > also resulted in Python errors being set thus unnecessarily slower code. > > > With this change, something like this, > for i in xrange(1000000): > if a[1] < 35.0: > pass > > went down from 0.8 seconds to 0.38 seconds. > > A bogus test like this, > for i in xrange(1000000): > a = array([1., 2., 3.]) > > went down from 8.5 seconds to 2.5 seconds. > > > > Altogether, these simple changes got me half way to the speed I used to > get in Numeric and I could not see any slow down in any of my cases that > benefit from heavy array manipulation. I am out of ideas on how to > improve further though. > > Few questions: > - Is there any interest for me to provide the exact details of the code > I changed ? > > - I managed to compile NumPy through setup.py but I am not sure how to > force it to generate pdb files from my Visual Studio Compiler. I need > the pdb files such that I can run my profiler on NumPy. Anybody has any > experience with this ? (Visual Studio) Change the compiler and linker flags in Python\Lib\distutils\msvc9compiler.py to: self.compile_options = ['/nologo', '/Ox', '/MD', '/W3', '/DNDEBUG', '/Zi'] self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:YES', '/DEBUG'] Then rebuild numpy. Christoph > > - The core of my problems I think boil down to things like this > s = a[0] > assigning a float64 into s as opposed to a native float ? > Is there any way to hack code to change it to extract a native float > instead ? (probably crazy talk, but I thought I'd ask :) ). > I'd prefer to not use s = a.item(0) because I would have to change too > much code and it is not even that much faster. For example, > for i in xrange(1000000): > if a.item(1) < 35.0: > pass > is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes) > > > I apologize again if this topic has already been discussed. > > > Regards, > > Raul > > From travis at continuum.io Sun Dec 2 22:31:28 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 2 Dec 2012 21:31:28 -0600 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement In-Reply-To: <50BC0038.70105@virtualmaterials.com> References: <50BC0038.70105@virtualmaterials.com> Message-ID: <4F8FB56A-5C63-436D-8EFE-359C7BB70203@continuum.io> Raul, This is *fantastic work*. While many optimizations were done 6 years ago as people started to convert their code, that kind of report has trailed off in the last few years. I have not seen this kind of speed-comparison for some time --- but I think it's definitely beneficial. NumPy still has quite a bit that can be optimized. I think your example is really great. Perhaps it's worth making a C-API macro out of the short-cut to the attribute string so it can be used by others. It would be interesting to see where your other slow-downs are. I would be interested to see if the slow-math of float64 is hurting you. 
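To get a feel for that on your machine, something along these lines shows the gap quickly (just a rough sketch with made-up variable names; the exact numbers will of course vary):

import timeit
import numpy as np

py_x = 3.5              # plain Python float
np_x = np.float64(3.5)  # the kind of scalar that a[0] hands back

setup = "from __main__ import py_x, np_x"
# Identical expression; only the scalar type differs.
print(timeit.timeit("py_x * 1.1 + 2.0 < 4.0", setup=setup, number=1000000))
print(timeit.timeit("np_x * 1.1 + 2.0 < 4.0", setup=setup, number=1000000))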
It would be possible, for example, to do a simple subclass of the ndarray that overloads a[] to be the same as array.item(). The latter syntax returns python objects (i.e. floats) instead of array scalars. Also, it would not be too difficult to add fast-math paths for int64, float32, and float64 scalars (so they don't go through ufuncs but do scalar-math like the float and int objects in Python. A related thing we've been working on lately which might help you is Numba which might help speed up functions that have code like: "a[0] < 4" : http://numba.pydata.org. Numba will translate the expression a[0] < 4 to a machine-code address-lookup and math operation which is *much* faster when a is a NumPy array. Presently this requires you to wrap your function call in a decorator: from numba import autojit @autojit def function_to_speed_up(...): pass In the near future (2-4 weeks), numba will grow the experimental ability to basically replace all your function calls with @autojit versions in a Python function. I would love to see something like this work: python -m numba filename.py To get an effective autojit on all the filename.py functions (and optionally on all python modules it imports). The autojit works out of the box today --- you can get Numba from PyPI (or inside of the completely free Anaconda CE) to try it out. Best, -Travis On Dec 2, 2012, at 7:28 PM, Raul Cota wrote: > Hello, > > First a quick summary of my problem and at the end I include the basic > changes I am suggesting to the source (they may benefit others) > > I am ages behind in times and I am still using Numeric in Python 2.2.3. > The main reason why it has taken so long to upgrade is because NumPy > kills performance on several of my tests. > > I am sorry if this topic has been discussed before. I tried parsing the > mailing list and also google and all I found were comments related to > the fact that such is life when you use NumPy for small arrays. > > In my case I have several thousands of lines of code where data > structures rely heavily on Numeric arrays but it is unpredictable if the > problem at hand will result in large or small arrays. Furthermore, once > the vectorized operations complete, the values could be assigned into > scalars and just do simple math or loops. I am fairly sure the core of > my problems is that the 'float64' objects start propagating all over the > program data structures (not in arrays) and they are considerably slower > for just about everything when compared to the native python float. > > Conclusion, it is not practical for me to do a massive re-structuring of > code to improve speed on simple things like "a[0] < 4" (assuming "a" is > an array) which is about 10 times slower than "b < 4" (assuming "b" is a > float) > > > I finally decided to track down the problem and I started by getting > Python 2.6 from source and profiling it in one of my cases. By far the > biggest bottleneck came out to be PyString_FromFormatV which is a > function to assemble a string for a Python error caused by a failure to > find an attribute when "multiarray" calls PyObject_GetAttrString. This > function seems to get called way too often from NumPy. The real > bottleneck of trying to find the attribute when it does not exist is not > that it fails to find it, but that it builds a string to set a Python > error. In other words, something as simple as "a[0] < 3.5" internally > result in a call to set a python error . 
> > I downloaded NumPy code (for Python 2.6) and tracked down all the calls > like this, > > ret = PyObject_GetAttrString(obj, "__array_priority__"); > > and changed to > if (PyList_CheckExact(obj) || (Py_None == obj) || > PyTuple_CheckExact(obj) || > PyFloat_CheckExact(obj) || > PyInt_CheckExact(obj) || > PyString_CheckExact(obj) || > PyUnicode_CheckExact(obj)){ > //Avoid expensive calls when I am sure the attribute > //does not exist > ret = NULL; > } > else{ > ret = PyObject_GetAttrString(obj, "__array_priority__"); > > > > ( I think I found about 7 spots ) > > > I also noticed (not as bad in my case) that calls to PyObject_GetBuffer > also resulted in Python errors being set thus unnecessarily slower code. > > > With this change, something like this, > for i in xrange(1000000): > if a[1] < 35.0: > pass > > went down from 0.8 seconds to 0.38 seconds. > > A bogus test like this, > for i in xrange(1000000): > a = array([1., 2., 3.]) > > went down from 8.5 seconds to 2.5 seconds. > > > > Altogether, these simple changes got me half way to the speed I used to > get in Numeric and I could not see any slow down in any of my cases that > benefit from heavy array manipulation. I am out of ideas on how to > improve further though. > > Few questions: > - Is there any interest for me to provide the exact details of the code > I changed ? > > - I managed to compile NumPy through setup.py but I am not sure how to > force it to generate pdb files from my Visual Studio Compiler. I need > the pdb files such that I can run my profiler on NumPy. Anybody has any > experience with this ? (Visual Studio) > > - The core of my problems I think boil down to things like this > s = a[0] > assigning a float64 into s as opposed to a native float ? > Is there any way to hack code to change it to extract a native float > instead ? (probably crazy talk, but I thought I'd ask :) ). > I'd prefer to not use s = a.item(0) because I would have to change too > much code and it is not even that much faster. For example, > for i in xrange(1000000): > if a.item(1) < 35.0: > pass > is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes) > > > I apologize again if this topic has already been discussed. > > > Regards, > > Raul > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Mon Dec 3 06:14:13 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 3 Dec 2012 11:14:13 +0000 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement In-Reply-To: <50BC0038.70105@virtualmaterials.com> References: <50BC0038.70105@virtualmaterials.com> Message-ID: On Mon, Dec 3, 2012 at 1:28 AM, Raul Cota wrote: > I finally decided to track down the problem and I started by getting > Python 2.6 from source and profiling it in one of my cases. By far the > biggest bottleneck came out to be PyString_FromFormatV which is a > function to assemble a string for a Python error caused by a failure to > find an attribute when "multiarray" calls PyObject_GetAttrString. This > function seems to get called way too often from NumPy. The real > bottleneck of trying to find the attribute when it does not exist is not > that it fails to find it, but that it builds a string to set a Python > error. In other words, something as simple as "a[0] < 3.5" internally > result in a call to set a python error . 
> > I downloaded NumPy code (for Python 2.6) and tracked down all the calls > like this, > > ret = PyObject_GetAttrString(obj, "__array_priority__"); > > and changed to > if (PyList_CheckExact(obj) || (Py_None == obj) || > PyTuple_CheckExact(obj) || > PyFloat_CheckExact(obj) || > PyInt_CheckExact(obj) || > PyString_CheckExact(obj) || > PyUnicode_CheckExact(obj)){ > //Avoid expensive calls when I am sure the attribute > //does not exist > ret = NULL; > } > else{ > ret = PyObject_GetAttrString(obj, "__array_priority__"); > > ( I think I found about 7 spots ) If the problem is the exception construction, then maybe this would work about as well? if (PyObject_HasAttrString(obj, "__array_priority__") { ret = PyObject_GetAttrString(obj, "__array_priority__"); } else { ret = NULL; } If so then it would be an easier and more reliable way to accomplish this. > I also noticed (not as bad in my case) that calls to PyObject_GetBuffer > also resulted in Python errors being set thus unnecessarily slower code. > > With this change, something like this, > for i in xrange(1000000): > if a[1] < 35.0: > pass > > went down from 0.8 seconds to 0.38 seconds. Huh, why is PyObject_GetBuffer even getting called in this case? > A bogus test like this, > for i in xrange(1000000): > a = array([1., 2., 3.]) > > went down from 8.5 seconds to 2.5 seconds. I can see why we'd call PyObject_GetBuffer in this case, but not why it would take 2/3rds of the total run-time... > - The core of my problems I think boil down to things like this > s = a[0] > assigning a float64 into s as opposed to a native float ? > Is there any way to hack code to change it to extract a native float > instead ? (probably crazy talk, but I thought I'd ask :) ). > I'd prefer to not use s = a.item(0) because I would have to change too > much code and it is not even that much faster. For example, > for i in xrange(1000000): > if a.item(1) < 35.0: > pass > is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes) I'm confused here -- first you say that your problems would be fixed if a[0] gave you a native float, but then you say that a.item(0) (which is basically a[0] that gives a native float) is still too slow? (OTOH at 40% speedup is pretty good, even if it is just a microbenchmark :-).) Array scalars are definitely pretty slow: In [9]: timeit a[0] 1000000 loops, best of 3: 151 ns per loop In [10]: timeit a.item(0) 10000000 loops, best of 3: 169 ns per loop In [11]: timeit a[0] < 35.0 1000000 loops, best of 3: 989 ns per loop In [12]: timeit a.item(0) < 35.0 1000000 loops, best of 3: 233 ns per loop It is probably possible to make numpy scalars faster... I'm not even sure why they go through the ufunc machinery, like Travis said, since they don't even follow the ufunc rules: In [3]: np.array(2) * [1, 2, 3] # 0-dim array coerces and broadcasts Out[3]: array([2, 4, 6]) In [4]: np.array(2)[()] * [1, 2, 3] # scalar acts like python integer Out[4]: [1, 2, 3, 1, 2, 3] But you may want to experiment a bit more to make sure this is actually the problem. IME guesses about speed problems are almost always wrong (even when I take this rule into account and only guess when I'm *really* sure). 
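If scalar extraction does turn out to be the hot spot, one cheap experiment is the subclass idea Travis mentioned up-thread -- make plain integer indexing hand back Python floats instead of array scalars. A minimal, untested sketch (it only covers simple int indices; everything else falls through to the normal behaviour):

import numpy as np

class ItemArray(np.ndarray):
    # a[i] returns a Python scalar (like a.item(i)) when i is a plain int
    def __getitem__(self, index):
        if isinstance(index, int):
            return self.item(index)
        return np.ndarray.__getitem__(self, index)

a = np.array([1., 2., 3.]).view(ItemArray)
print(type(a[1]))    # plain Python float, not numpy.float64
print(type(a[0:2]))  # slicing still returns an array view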
-n From josef.pktd at gmail.com Mon Dec 3 08:56:55 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 3 Dec 2012 08:56:55 -0500 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement In-Reply-To: References: <50BC0038.70105@virtualmaterials.com> Message-ID: On Mon, Dec 3, 2012 at 6:14 AM, Nathaniel Smith wrote: > On Mon, Dec 3, 2012 at 1:28 AM, Raul Cota wrote: >> I finally decided to track down the problem and I started by getting >> Python 2.6 from source and profiling it in one of my cases. By far the >> biggest bottleneck came out to be PyString_FromFormatV which is a >> function to assemble a string for a Python error caused by a failure to >> find an attribute when "multiarray" calls PyObject_GetAttrString. This >> function seems to get called way too often from NumPy. The real >> bottleneck of trying to find the attribute when it does not exist is not >> that it fails to find it, but that it builds a string to set a Python >> error. In other words, something as simple as "a[0] < 3.5" internally >> result in a call to set a python error . >> >> I downloaded NumPy code (for Python 2.6) and tracked down all the calls >> like this, >> >> ret = PyObject_GetAttrString(obj, "__array_priority__"); >> >> and changed to >> if (PyList_CheckExact(obj) || (Py_None == obj) || >> PyTuple_CheckExact(obj) || >> PyFloat_CheckExact(obj) || >> PyInt_CheckExact(obj) || >> PyString_CheckExact(obj) || >> PyUnicode_CheckExact(obj)){ >> //Avoid expensive calls when I am sure the attribute >> //does not exist >> ret = NULL; >> } >> else{ >> ret = PyObject_GetAttrString(obj, "__array_priority__"); >> >> ( I think I found about 7 spots ) > > If the problem is the exception construction, then maybe this would > work about as well? > > if (PyObject_HasAttrString(obj, "__array_priority__") { > ret = PyObject_GetAttrString(obj, "__array_priority__"); > } else { > ret = NULL; > } > > If so then it would be an easier and more reliable way to accomplish this. > >> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer >> also resulted in Python errors being set thus unnecessarily slower code. >> >> With this change, something like this, >> for i in xrange(1000000): >> if a[1] < 35.0: >> pass >> >> went down from 0.8 seconds to 0.38 seconds. > > Huh, why is PyObject_GetBuffer even getting called in this case? > >> A bogus test like this, >> for i in xrange(1000000): >> a = array([1., 2., 3.]) >> >> went down from 8.5 seconds to 2.5 seconds. > > I can see why we'd call PyObject_GetBuffer in this case, but not why > it would take 2/3rds of the total run-time... > >> - The core of my problems I think boil down to things like this >> s = a[0] >> assigning a float64 into s as opposed to a native float ? >> Is there any way to hack code to change it to extract a native float >> instead ? (probably crazy talk, but I thought I'd ask :) ). >> I'd prefer to not use s = a.item(0) because I would have to change too >> much code and it is not even that much faster. For example, >> for i in xrange(1000000): >> if a.item(1) < 35.0: >> pass >> is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes) > > I'm confused here -- first you say that your problems would be fixed > if a[0] gave you a native float, but then you say that a.item(0) > (which is basically a[0] that gives a native float) is still too slow? > (OTOH at 40% speedup is pretty good, even if it is just a > microbenchmark :-).) 
Array scalars are definitely pretty slow: > > In [9]: timeit a[0] > 1000000 loops, best of 3: 151 ns per loop > > In [10]: timeit a.item(0) > 10000000 loops, best of 3: 169 ns per loop > > In [11]: timeit a[0] < 35.0 > 1000000 loops, best of 3: 989 ns per loop > > In [12]: timeit a.item(0) < 35.0 > 1000000 loops, best of 3: 233 ns per loop > > It is probably possible to make numpy scalars faster... I'm not even > sure why they go through the ufunc machinery, like Travis said, since > they don't even follow the ufunc rules: > > In [3]: np.array(2) * [1, 2, 3] # 0-dim array coerces and broadcasts > Out[3]: array([2, 4, 6]) > > In [4]: np.array(2)[()] * [1, 2, 3] # scalar acts like python integer > Out[4]: [1, 2, 3, 1, 2, 3] I thought it still behaves like a numpy "animal" >>> np.array(-2)[()] ** [1, 2, 3] array([-2, 4, -8]) >>> np.array(-2)[()] ** 0.5 nan >>> np.array(-2).item() ** [1, 2, 3] Traceback (most recent call last): File "", line 1, in TypeError: unsupported operand type(s) for ** or pow(): 'int' and 'list' >>> np.array(-2).item() ** 0.5 Traceback (most recent call last): File "", line 1, in ValueError: negative number cannot be raised to a fractional power >>> np.array(0)[()] ** (-1) inf >>> np.array(0).item() ** (-1) Traceback (most recent call last): File "", line 1, in ZeroDivisionError: 0.0 cannot be raised to a negative power and similar I often try to avoid python scalars to avoid "surprising" behavior, and try to work defensively or fixed bugs by switching to np.power(...) (for example in the distributions). Josef > > But you may want to experiment a bit more to make sure this is > actually the problem. IME guesses about speed problems are almost > always wrong (even when I take this rule into account and only guess > when I'm *really* sure). > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Mon Dec 3 10:14:44 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 3 Dec 2012 10:14:44 -0500 Subject: [Numpy-discussion] scalars and strange casting Message-ID: A followup on the previous thread on scalar speed. operations with numpy scalars I can *maybe* understand this >>> np.array(2)[()] * [0.5, 1] [0.5, 1, 0.5, 1] but don't understand this >>> np.array(2.+0.1j)[()] * [0.5, 1] __main__:1: ComplexWarning: Casting complex values to real discards the imaginary part [0.5, 1, 0.5, 1] The difference in behavior compared to the other operators, +,-, /,**, looks, at least, like an inconsistency to me. Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.array(2.+0.1j)[()] * [0.5, 1] __main__:1: ComplexWarning: Casting complex values to real discards the imaginary part [0.5, 1, 0.5, 1] >>> np.array(2.+0.1j)[()] ** [0.5, 1] array([ 1.41465516+0.0353443j, 2.00000000+0.1j ]) >>> np.array(2.+0.1j)[()] + [0.5, 1] array([ 2.5+0.1j, 3.0+0.1j]) >>> np.array(2.+0.1j)[()] / [0.5, 1] array([ 4.+0.2j, 2.+0.1j]) >>> np.array(2)[()] * [0.5, 1] [0.5, 1, 0.5, 1] >>> np.array(2)[()] / [0.5, 1] array([ 4., 2.]) >>> np.array(2)[()] ** [0.5, 1] array([ 1.41421356, 2. ]) >>> np.array(2)[()] - [0.5, 1] array([ 1.5, 1. 
]) >>> np.__version__ '1.5.1' or >>> np.array(-2.+0.1j)[()] * [0.5, 1] [] >>> np.multiply(np.array(-2.+0.1j)[()], [0.5, 1]) array([-1.+0.05j, -2.+0.1j ]) >>> np.array([-2.+0.1j])[0] * [0.5, 1] [] >>> np.multiply(np.array([-2.+0.1j])[0], [0.5, 1]) array([-1.+0.05j, -2.+0.1j ]) Josef defensive programming = don't use python, use numpy arrays, or at least remember which kind of animals you have From raul at virtualmaterials.com Mon Dec 3 10:33:23 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Mon, 03 Dec 2012 08:33:23 -0700 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement In-Reply-To: <50BC0F7D.8020602@uci.edu> References: <50BC0038.70105@virtualmaterials.com> <50BC0F7D.8020602@uci.edu> Message-ID: <50BCC643.7000601@virtualmaterials.com> Thanks Christoph. It seemed to work. Will do profile runs today/tomorrow and see what come out. Raul On 02/12/2012 7:33 PM, Christoph Gohlke wrote: > On 12/2/2012 5:28 PM, Raul Cota wrote: >> Hello, >> >> First a quick summary of my problem and at the end I include the basic >> changes I am suggesting to the source (they may benefit others) >> >> I am ages behind in times and I am still using Numeric in Python 2.2.3. >> The main reason why it has taken so long to upgrade is because NumPy >> kills performance on several of my tests. >> >> I am sorry if this topic has been discussed before. I tried parsing the >> mailing list and also google and all I found were comments related to >> the fact that such is life when you use NumPy for small arrays. >> >> In my case I have several thousands of lines of code where data >> structures rely heavily on Numeric arrays but it is unpredictable if the >> problem at hand will result in large or small arrays. Furthermore, once >> the vectorized operations complete, the values could be assigned into >> scalars and just do simple math or loops. I am fairly sure the core of >> my problems is that the 'float64' objects start propagating all over the >> program data structures (not in arrays) and they are considerably slower >> for just about everything when compared to the native python float. >> >> Conclusion, it is not practical for me to do a massive re-structuring of >> code to improve speed on simple things like "a[0] < 4" (assuming "a" is >> an array) which is about 10 times slower than "b < 4" (assuming "b" is a >> float) >> >> >> I finally decided to track down the problem and I started by getting >> Python 2.6 from source and profiling it in one of my cases. By far the >> biggest bottleneck came out to be PyString_FromFormatV which is a >> function to assemble a string for a Python error caused by a failure to >> find an attribute when "multiarray" calls PyObject_GetAttrString. This >> function seems to get called way too often from NumPy. The real >> bottleneck of trying to find the attribute when it does not exist is not >> that it fails to find it, but that it builds a string to set a Python >> error. In other words, something as simple as "a[0] < 3.5" internally >> result in a call to set a python error . 
>> >> I downloaded NumPy code (for Python 2.6) and tracked down all the calls >> like this, >> >> ret = PyObject_GetAttrString(obj, "__array_priority__"); >> >> and changed to >> if (PyList_CheckExact(obj) || (Py_None == obj) || >> PyTuple_CheckExact(obj) || >> PyFloat_CheckExact(obj) || >> PyInt_CheckExact(obj) || >> PyString_CheckExact(obj) || >> PyUnicode_CheckExact(obj)){ >> //Avoid expensive calls when I am sure the attribute >> //does not exist >> ret = NULL; >> } >> else{ >> ret = PyObject_GetAttrString(obj, "__array_priority__"); >> >> >> >> ( I think I found about 7 spots ) >> >> >> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer >> also resulted in Python errors being set thus unnecessarily slower code. >> >> >> With this change, something like this, >> for i in xrange(1000000): >> if a[1] < 35.0: >> pass >> >> went down from 0.8 seconds to 0.38 seconds. >> >> A bogus test like this, >> for i in xrange(1000000): >> a = array([1., 2., 3.]) >> >> went down from 8.5 seconds to 2.5 seconds. >> >> >> >> Altogether, these simple changes got me half way to the speed I used to >> get in Numeric and I could not see any slow down in any of my cases that >> benefit from heavy array manipulation. I am out of ideas on how to >> improve further though. >> >> Few questions: >> - Is there any interest for me to provide the exact details of the code >> I changed ? >> >> - I managed to compile NumPy through setup.py but I am not sure how to >> force it to generate pdb files from my Visual Studio Compiler. I need >> the pdb files such that I can run my profiler on NumPy. Anybody has any >> experience with this ? (Visual Studio) > > Change the compiler and linker flags in > Python\Lib\distutils\msvc9compiler.py to: > > self.compile_options = ['/nologo', '/Ox', '/MD', '/W3', '/DNDEBUG', '/Zi'] > self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:YES', '/DEBUG'] > > Then rebuild numpy. > > Christoph > > > >> - The core of my problems I think boil down to things like this >> s = a[0] >> assigning a float64 into s as opposed to a native float ? >> Is there any way to hack code to change it to extract a native float >> instead ? (probably crazy talk, but I thought I'd ask :) ). >> I'd prefer to not use s = a.item(0) because I would have to change too >> much code and it is not even that much faster. For example, >> for i in xrange(1000000): >> if a.item(1) < 35.0: >> pass >> is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes) >> >> >> I apologize again if this topic has already been discussed. >> >> >> Regards, >> >> Raul >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From raul at virtualmaterials.com Mon Dec 3 10:35:58 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Mon, 03 Dec 2012 08:35:58 -0700 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement In-Reply-To: <4F8FB56A-5C63-436D-8EFE-359C7BB70203@continuum.io> References: <50BC0038.70105@virtualmaterials.com> <4F8FB56A-5C63-436D-8EFE-359C7BB70203@continuum.io> Message-ID: <50BCC6DE.3090901@virtualmaterials.com> On 02/12/2012 8:31 PM, Travis Oliphant wrote: > Raul, > > This is *fantastic work*. While many optimizations were done 6 years ago as people started to convert their code, that kind of report has trailed off in the last few years. 
I have not seen this kind of speed-comparison for some time --- but I think it's definitely beneficial. I'll clean up a bit as a Macro and comment. > NumPy still has quite a bit that can be optimized. I think your example is really great. Perhaps it's worth making a C-API macro out of the short-cut to the attribute string so it can be used by others. It would be interesting to see where your other slow-downs are. I would be interested to see if the slow-math of float64 is hurting you. It would be possible, for example, to do a simple subclass of the ndarray that overloads a[] to be the same as array.item(). The latter syntax returns python objects (i.e. floats) instead of array scalars. > > Also, it would not be too difficult to add fast-math paths for int64, float32, and float64 scalars (so they don't go through ufuncs but do scalar-math like the float and int objects in Python. Thanks. I'll dig a bit more into the code. > > A related thing we've been working on lately which might help you is Numba which might help speed up functions that have code like: "a[0] < 4" : http://numba.pydata.org. > > Numba will translate the expression a[0] < 4 to a machine-code address-lookup and math operation which is *much* faster when a is a NumPy array. Presently this requires you to wrap your function call in a decorator: > > from numba import autojit > > @autojit > def function_to_speed_up(...): > pass > > In the near future (2-4 weeks), numba will grow the experimental ability to basically replace all your function calls with @autojit versions in a Python function. I would love to see something like this work: > > python -m numba filename.py > > To get an effective autojit on all the filename.py functions (and optionally on all python modules it imports). The autojit works out of the box today --- you can get Numba from PyPI (or inside of the completely free Anaconda CE) to try it out. This looks very interesting. Will check it out. > Best, > > -Travis > > > > > On Dec 2, 2012, at 7:28 PM, Raul Cota wrote: > >> Hello, >> >> First a quick summary of my problem and at the end I include the basic >> changes I am suggesting to the source (they may benefit others) >> >> I am ages behind in times and I am still using Numeric in Python 2.2.3. >> The main reason why it has taken so long to upgrade is because NumPy >> kills performance on several of my tests. >> >> I am sorry if this topic has been discussed before. I tried parsing the >> mailing list and also google and all I found were comments related to >> the fact that such is life when you use NumPy for small arrays. >> >> In my case I have several thousands of lines of code where data >> structures rely heavily on Numeric arrays but it is unpredictable if the >> problem at hand will result in large or small arrays. Furthermore, once >> the vectorized operations complete, the values could be assigned into >> scalars and just do simple math or loops. I am fairly sure the core of >> my problems is that the 'float64' objects start propagating all over the >> program data structures (not in arrays) and they are considerably slower >> for just about everything when compared to the native python float. 
>> >> Conclusion, it is not practical for me to do a massive re-structuring of >> code to improve speed on simple things like "a[0] < 4" (assuming "a" is >> an array) which is about 10 times slower than "b < 4" (assuming "b" is a >> float) >> >> >> I finally decided to track down the problem and I started by getting >> Python 2.6 from source and profiling it in one of my cases. By far the >> biggest bottleneck came out to be PyString_FromFormatV which is a >> function to assemble a string for a Python error caused by a failure to >> find an attribute when "multiarray" calls PyObject_GetAttrString. This >> function seems to get called way too often from NumPy. The real >> bottleneck of trying to find the attribute when it does not exist is not >> that it fails to find it, but that it builds a string to set a Python >> error. In other words, something as simple as "a[0] < 3.5" internally >> result in a call to set a python error . >> >> I downloaded NumPy code (for Python 2.6) and tracked down all the calls >> like this, >> >> ret = PyObject_GetAttrString(obj, "__array_priority__"); >> >> and changed to >> if (PyList_CheckExact(obj) || (Py_None == obj) || >> PyTuple_CheckExact(obj) || >> PyFloat_CheckExact(obj) || >> PyInt_CheckExact(obj) || >> PyString_CheckExact(obj) || >> PyUnicode_CheckExact(obj)){ >> //Avoid expensive calls when I am sure the attribute >> //does not exist >> ret = NULL; >> } >> else{ >> ret = PyObject_GetAttrString(obj, "__array_priority__"); >> >> >> >> ( I think I found about 7 spots ) >> >> >> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer >> also resulted in Python errors being set thus unnecessarily slower code. >> >> >> With this change, something like this, >> for i in xrange(1000000): >> if a[1] < 35.0: >> pass >> >> went down from 0.8 seconds to 0.38 seconds. >> >> A bogus test like this, >> for i in xrange(1000000): >> a = array([1., 2., 3.]) >> >> went down from 8.5 seconds to 2.5 seconds. >> >> >> >> Altogether, these simple changes got me half way to the speed I used to >> get in Numeric and I could not see any slow down in any of my cases that >> benefit from heavy array manipulation. I am out of ideas on how to >> improve further though. >> >> Few questions: >> - Is there any interest for me to provide the exact details of the code >> I changed ? >> >> - I managed to compile NumPy through setup.py but I am not sure how to >> force it to generate pdb files from my Visual Studio Compiler. I need >> the pdb files such that I can run my profiler on NumPy. Anybody has any >> experience with this ? (Visual Studio) >> >> - The core of my problems I think boil down to things like this >> s = a[0] >> assigning a float64 into s as opposed to a native float ? >> Is there any way to hack code to change it to extract a native float >> instead ? (probably crazy talk, but I thought I'd ask :) ). >> I'd prefer to not use s = a.item(0) because I would have to change too >> much code and it is not even that much faster. For example, >> for i in xrange(1000000): >> if a.item(1) < 35.0: >> pass >> is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes) >> >> >> I apologize again if this topic has already been discussed. 
>> >> >> Regards, >> >> Raul >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From raul at virtualmaterials.com Mon Dec 3 11:26:39 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Mon, 03 Dec 2012 09:26:39 -0700 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement In-Reply-To: References: <50BC0038.70105@virtualmaterials.com> Message-ID: <50BCD2BF.1040807@virtualmaterials.com> On 03/12/2012 4:14 AM, Nathaniel Smith wrote: > On Mon, Dec 3, 2012 at 1:28 AM, Raul Cota wrote: >> I finally decided to track down the problem and I started by getting >> Python 2.6 from source and profiling it in one of my cases. By far the >> biggest bottleneck came out to be PyString_FromFormatV which is a >> function to assemble a string for a Python error caused by a failure to >> find an attribute when "multiarray" calls PyObject_GetAttrString. This >> function seems to get called way too often from NumPy. The real >> bottleneck of trying to find the attribute when it does not exist is not >> that it fails to find it, but that it builds a string to set a Python >> error. In other words, something as simple as "a[0] < 3.5" internally >> result in a call to set a python error . >> >> I downloaded NumPy code (for Python 2.6) and tracked down all the calls >> like this, >> >> ret = PyObject_GetAttrString(obj, "__array_priority__"); >> >> and changed to >> if (PyList_CheckExact(obj) || (Py_None == obj) || >> PyTuple_CheckExact(obj) || >> PyFloat_CheckExact(obj) || >> PyInt_CheckExact(obj) || >> PyString_CheckExact(obj) || >> PyUnicode_CheckExact(obj)){ >> //Avoid expensive calls when I am sure the attribute >> //does not exist >> ret = NULL; >> } >> else{ >> ret = PyObject_GetAttrString(obj, "__array_priority__"); >> >> ( I think I found about 7 spots ) > If the problem is the exception construction, then maybe this would > work about as well? > > if (PyObject_HasAttrString(obj, "__array_priority__") { > ret = PyObject_GetAttrString(obj, "__array_priority__"); > } else { > ret = NULL; > } > > If so then it would be an easier and more reliable way to accomplish this. I did think of that one but at least in Python 2.6 the implementation is just a wrapper to PyObject_GetAttrSting that clears the error """ PyObject_HasAttrString(PyObject *v, const char *name) { PyObject *res = PyObject_GetAttrString(v, name); if (res != NULL) { Py_DECREF(res); return 1; } PyErr_Clear(); return 0; } """ so it is just as bad when it fails and a waste when it succeeds (it will end up finding it twice). In my opinion, Python's source code should offer a version of PyObject_GetAttrString that does not raise an error but that is a completely different topic. >> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer >> also resulted in Python errors being set thus unnecessarily slower code. >> >> With this change, something like this, >> for i in xrange(1000000): >> if a[1] < 35.0: >> pass >> >> went down from 0.8 seconds to 0.38 seconds. > Huh, why is PyObject_GetBuffer even getting called in this case? Sorry for being misleading in an already long and confusing email. PyObject_GetBuffer is not getting called doing an "if" call. 
This call showed up in my profiler as a time consuming task that raised python errors unnecessarily (not nearly as bad as often as PyObject_GetAttrString ) but since I was already there I decided to look into it as well. The point I was trying to make was that I did both changes (avoiding PyObject_GetBuffer, PyObject_GetAttrString) when I came up with the times. >> A bogus test like this, >> for i in xrange(1000000): >> a = array([1., 2., 3.]) >> >> went down from 8.5 seconds to 2.5 seconds. > I can see why we'd call PyObject_GetBuffer in this case, but not why > it would take 2/3rds of the total run-time... Same scenario. This total time includes both changes (avoiding PyObject_GetBuffer, PyObject_GetAttrString). If my memory helps, I believe PyObject_GetBuffer gets called once for every 9 times of a call to PyObject_GetAttrString in this scenario. >> - The core of my problems I think boil down to things like this >> s = a[0] >> assigning a float64 into s as opposed to a native float ? >> Is there any way to hack code to change it to extract a native float >> instead ? (probably crazy talk, but I thought I'd ask :) ). >> I'd prefer to not use s = a.item(0) because I would have to change too >> much code and it is not even that much faster. For example, >> for i in xrange(1000000): >> if a.item(1) < 35.0: >> pass >> is 0.23 seconds (as opposed to 0.38 seconds with my suggested changes) > I'm confused here -- first you say that your problems would be fixed > if a[0] gave you a native float, but then you say that a.item(0) > (which is basically a[0] that gives a native float) is still too slow? Don't get me wrong. I am confused too when it gets beyond my suggested changes :) . My "theory" for saying that a.item(1) is not the same to a[1] returning a float was that perhaps the overhead of the dot operator is too big. At the end of the day, I do want to profile NumPy and find out if there is anything I can do to speed things up. To bring things more into context, I don't really care to speed up a bogus loop with if statements. My bottom line is, - I am focusing on two cases from our software that take 141.8 seconds and 40 seconds respectively using Numeric and Python 2.2.3 . - These cases now take 229 seconds and 62 seconds respectively using NumPy and Python 2.6 . This is quite a bit of a slow down taking into account that Python code that uses only native objects is quite a bit faster in Python 2.6 Vs Python 2.2 Both cases (like most of our software) use array operations as much as possible and revert down to scalar operations when it is not practical to do otherwise. I am not saying it is impossible to optimize even more, it is just not practical. I ran the profiler on Python 2.6 and I found the bottlenecks I reported in this email. Both of my cases are now running at 170 and 50 seconds respectively. In other words, I am "almost" back to where I want to be. The improvement is huge, but in my opinion it still uncomfortably far from what it used to be in Numeric and I worry that there may be other spots in our software that may be affected on a more meaningful way that I just have not noticed. > (OTOH at 40% speedup is pretty good, even if it is just a > microbenchmark :-).) 
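To give a flavour of the kind of pattern I mean (a made-up toy example, not our actual code), it is roughly this shape:

import numpy as np

def toy_step(pressures, limit):
    # vectorized where it pays off ...
    scaled = pressures * 1.01325
    # ... but then values leave the array and live on as float64 scalars
    worst = scaled[0]
    for p in scaled:
        if p > worst:
            worst = p
    # from here on it is all scalar math/comparisons, and every one of them
    # pays the float64 overhead instead of running at native float speed
    return worst > limit

print(toy_step(np.array([1.2, 3.4, 0.7]), 3.0))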
Array scalars are definitely pretty slow: > > In [9]: timeit a[0] > 1000000 loops, best of 3: 151 ns per loop > > In [10]: timeit a.item(0) > 10000000 loops, best of 3: 169 ns per loop > > In [11]: timeit a[0] < 35.0 > 1000000 loops, best of 3: 989 ns per loop > > In [12]: timeit a.item(0) < 35.0 > 1000000 loops, best of 3: 233 ns per loop > > It is probably possible to make numpy scalars faster... I'm not even > sure why they go through the ufunc machinery, like Travis said, since > they don't even follow the ufunc rules: > > In [3]: np.array(2) * [1, 2, 3] # 0-dim array coerces and broadcasts > Out[3]: array([2, 4, 6]) > > In [4]: np.array(2)[()] * [1, 2, 3] # scalar acts like python integer > Out[4]: [1, 2, 3, 1, 2, 3] > > But you may want to experiment a bit more to make sure this is > actually the problem. IME guesses about speed problems are almost > always wrong (even when I take this rule into account and only guess > when I'm *really* sure). I agree 100% about the pitfalls of guessing. Thanks to Christoph's suggestion I should be able to profile NumPy now. Thanks for your comments, Raul > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Mon Dec 3 14:49:57 2012 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 3 Dec 2012 11:49:57 -0800 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement In-Reply-To: <50BCD2BF.1040807@virtualmaterials.com> References: <50BC0038.70105@virtualmaterials.com> <50BCD2BF.1040807@virtualmaterials.com> Message-ID: Raul, Thanks for doing this work -- both the profiling and actual suggestions for how to improve the code -- whoo hoo! In general, it seem that numpy performance for scalars and very small arrays (i.e (2,), (3,) maybe (3,3), the kind of thing that you'd use to hold a coordinate point or the like, not small as in "fits in cache") is pretty slow. In principle, a basic array scalar operation could be as fast as a numpy native numeric type, and it would be great is small array operations were, too. It may be that the route to those performance improvements is special-case code, which is ugly, but I think could really be worth it for the common types and operations. I'm really out of my depth for suggesting (or contributing) actual soluitons, but +1 for the idea! -Chris NOTE: Here's a example of what I'm talking about -- say you are scaling an (x,y) point by a (s_x, s_y) scale factor: def numpy_version(point, scale): return point * scale def tuple_version(point, scale): return (point[0] * scale[0], point[1] * scale[1]) In [36]: point_arr, sca scale scale_arr In [36]: point_arr, scale_arr Out[36]: (array([ 3., 5.]), array([ 2., 3.])) In [37]: timeit tuple_version(point, scale) 1000000 loops, best of 3: 397 ns per loop In [38]: timeit numpy_version(point_arr, scale_arr) 100000 loops, best of 3: 2.32 us per loop It would be great if numpy could get closer to tuple performance for this sor tof thing... -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From magnetotellurics at gmail.com Mon Dec 3 15:10:31 2012 From: magnetotellurics at gmail.com (Karl Kappler) Date: Mon, 3 Dec 2012 12:10:31 -0800 Subject: [Numpy-discussion] Apparently Non-deterministic behaviour of complex-array instantiation values Message-ID: Hello, This is a continuation of a problem I had last year, http://old.nabble.com/Apparently-non-deterministic-behaviour-of-complex-array-multiplication-tt32893004.html#a32931369 at least it seems to have similar symptoms. I am working again with complex valued arrays in numpy (python version 2.7.3). This time however, the dataset is not very large, and I am able to post a snippet of the code. I became aware of the problem working in and IDE: Spyder 2.1.9, where I was repeatedly running code by pushing f5 and checking that my numerical results were what I expect. What I found was that the output in spyder varied somewhat randomly. In particular, when initializing a 2x2 complex-valued numpy array on line 32 of the code called ?lowTri?. When I print the value of the upper right element (the one that should be zero) I often see 1.789+1.543j, or 0+1.543j, or 0+0j. This behavior happens when I run the code using f5 in spyder, and I thought it may be a Spyder issue, but further investigation has shown equally strange behavior on the command line as well. When I run this script on the command line the output is usually the same from run to run (although I have seen some variations, which I do not understand), but most remarkable, and reproducible, is that if I comment out line 17 (where the complex-valued array zzz is populated), the behavior of the initialization of lowTri varies. With Line 17 uncommented I usually get (on the command line): Lower Triangular [0,1]: 1.543j [[ 670.9 +1.22400000e-05j 0.0 +1.54300000e+00j] [ 195.8 -1.17300000e+02j 391.2 +1.46900000e-05j]] Lower Triangular [0,1]: 1.543j Real Part 0.0 and with line 17 it commented: Lower Triangular [0,1]: 0j [[ 670.9 +1.22400000e-05j 0.0 +0.00000000e+00j] [ 195.8 -1.17300000e+02j 391.2 +1.46900000e-05j]] Lower Triangular [0,1]: 0j Real Part 0.0 I.e.. the imaginary part is initialized to a different value. From reading up on forums I think I understand that when an array is allocated without specific values, it will be given random values which are very small, ie. ~1e-316 or so. But it would seem that sometimes initallization is done to a finite quantity. I know I can try to initialize the array using np.zeros() instead of np.ndarray(), but it is the principle I am concerned about. Last year it had been suggested that I had bad RAM, but these issues are reproducing on four computers, one of which has a new motherboard/RAM and AMD processor, and the others are Intel. Memtest has been run recently on at least two of the machines. Could someone try running this script with and without line 17 commented out and tell me if they are getting the same sort of behaviour? 
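For reference, the core of what I am puzzling over can be boiled down to this stripped-down sketch (separate from the full script below):

import numpy as np

# np.ndarray(...) only allocates; the contents are whatever bytes happen to
# be in that memory, so they can look like tiny ~1e-316 values or like
# perfectly ordinary finite numbers.
a = np.ndarray(shape=(2, 2), dtype=complex)
b = np.zeros(shape=(2, 2), dtype=complex)   # explicitly initialized
print(a)   # arbitrary, can change from run to run
print(b)   # always zeros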
Thanks, Karl ********************************** import numpy as np if __name__ == "__main__": a={} a[0] = '0.1537E+00 0.1610E+01 -0.4801E+01 -0.3175E+01' a[1] = '0.1789E+01 0.1543E+01 -0.5524E+00 -0.8423E+00' c = '0.6709E+03 0.1224E-04 0.1958E+03 -0.1173E+03 0.3912E+03 0.1469E-04' ztmp = np.zeros((4,2)) zzz = np.zeros((2,2)) + complex(0,1)*np.zeros((2,2)); line = [] for iE in range(2): line = a[iE].split() for iElt, elt in enumerate(line): ztmp[iElt,iE] = float(elt) zzz[:,0:2] = ztmp[[0,2],:] + complex(0,1)*ztmp[[1,3],:] #commenting this line seems to affect value of lowTri stemp = np.zeros((2,3)) nElts = np.prod(stemp.shape) v = [] line = c.split() for l in line: v.append(float(l)) N=len(v)/2 cVec = np.ndarray(shape=(N), dtype=complex) for i in range(N): cVec[i] = complex(float(v[2*i]),float(v[2*(i+1)-1])) lowTri = np.ndarray(shape=(2,2), dtype=complex) #lowTri = np.zeros(shape=(2,2), dtype=complex) print("Lower Triangular [0,1]: {}".format(lowTri[0,1])) TI = np.tril_indices(2) rows = TI[0] cols = TI[1] for iCell in range(len(rows)): lowTri[rows[iCell],cols[iCell]] = cVec[iCell] print lowTri print("Lower Triangular [0,1]: {}".format(lowTri[0,1])) print("Real Part {}".format(np.real(lowTri[0,1]))) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Dec 3 15:19:54 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 03 Dec 2012 22:19:54 +0200 Subject: [Numpy-discussion] Apparently Non-deterministic behaviour of complex-array instantiation values In-Reply-To: References: Message-ID: 03.12.2012 22:10, Karl Kappler kirjoitti: [clip] > I.e.. the imaginary part is initialized to a different value. From > reading up on forums I think I understand that when an array is > allocated without specific values, it will be given random values which > are very small, ie. ~1e-316 or so. But it would seem that sometimes > initallization is done to a finite quantity. I know I can try to > initialize the array using np.zeros() instead of np.ndarray(), but it is > the principle I am concerned about. The memory is not initialized in any way [*] if you get the array from np.empty(..) or np.ndarray(...). It contains whatever that happens to be at that location. It just happens that "typical memory content" when viewed in floating point often looks like that. [*] Except that the OS zeroes new memory pages given to the process. Processes however reuse the pages they are given. -- Pauli Virtanen From raul at virtualmaterials.com Mon Dec 3 17:12:57 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Mon, 03 Dec 2012 15:12:57 -0700 Subject: [Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement In-Reply-To: References: <50BC0038.70105@virtualmaterials.com> <50BCD2BF.1040807@virtualmaterials.com> Message-ID: <50BD23E9.1080903@virtualmaterials.com> Chris, thanks for the feedback, fyi, the minor changes I talked about have different performance enhancements depending on scenario, e.g, 1) Array * Array point = array( [2.0, 3.0]) scale = array( [2.4, 0.9] ) retVal = point * scale #The line above runs 1.1 times faster with my new code (but it runs 3 times faster in Numeric in Python 2.2) #i.e. pretty meaningless but still far from old Numeric 2) Array * Tuple (item by item) point = array( [2.0, 3.0]) scale = (2.4, 0.9 ) retVal = point[0] < scale[0], point[1] < scale[1] #The line above runs 1.8 times faster with my new code (but it runs 6.8 times faster in Numeric in Python 2.2) #i.e. 
pretty decent speed up but quite far from old Numeric I am not saying that I would ever do something exactly like (2) in my code nor am I saying that the changes in NumPy Vs Numeric are not beneficial. My point is that performance in small size problems is fairly far from what it used to be in Numeric particularly when dealing with scalars and it is problematic at least to me. I am currently looking around to see if there are practical ways to speed things up without slowing anything else down. Will keep you posted. regards, Raul On 03/12/2012 12:49 PM, Chris Barker - NOAA Federal wrote: > Raul, > > Thanks for doing this work -- both the profiling and actual > suggestions for how to improve the code -- whoo hoo! > > In general, it seem that numpy performance for scalars and very small > arrays (i.e (2,), (3,) maybe (3,3), the kind of thing that you'd use > to hold a coordinate point or the like, not small as in "fits in > cache") is pretty slow. In principle, a basic array scalar operation > could be as fast as a numpy native numeric type, and it would be great > is small array operations were, too. > > It may be that the route to those performance improvements is > special-case code, which is ugly, but I think could really be worth it > for the common types and operations. > > I'm really out of my depth for suggesting (or contributing) actual > soluitons, but +1 for the idea! > > -Chris > > NOTE: Here's a example of what I'm talking about -- say you are > scaling an (x,y) point by a (s_x, s_y) scale factor: > > def numpy_version(point, scale): > return point * scale > > > def tuple_version(point, scale): > return (point[0] * scale[0], point[1] * scale[1]) > > > In [36]: point_arr, sca > scale scale_arr > > In [36]: point_arr, scale_arr > Out[36]: (array([ 3., 5.]), array([ 2., 3.])) > > In [37]: timeit tuple_version(point, scale) > 1000000 loops, best of 3: 397 ns per loop > > In [38]: timeit numpy_version(point_arr, scale_arr) > 100000 loops, best of 3: 2.32 us per loop > > It would be great if numpy could get closer to tuple performance for > this sor tof thing... 
> > > -Chris > > From ondrej.certik at gmail.com Mon Dec 3 21:27:42 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Mon, 3 Dec 2012 18:27:42 -0800 Subject: [Numpy-discussion] Weird Travis-CI bugs in the release 1.7.x branch Message-ID: Hi, I started to work on the release again and noticed weird failures at Travis-CI: https://github.com/numpy/numpy/pull/2782 The first commit (8a18fc7) should not trigger this failure: ====================================================================== FAIL: test_iterator.test_iter_array_cast ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py", line 836, in test_iter_array_cast assert_equal(i.operands[0].strides, (-96,8,-32)) File "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", line 252, in assert_equal assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), verbose) File "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", line 314, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: item=0 ACTUAL: 96 DESIRED: -96 So I pushed a whitespace commit into the PR (516b478) yet it has the same failure. So it's there, it's not some random fluke at Travis. I created this testing PR: https://github.com/numpy/numpy/pull/2783 to try to nail it down. But I can't see what could have caused this, because the release branch was passing all tests last time I worked on it. Any ideas? Btw, I managed to reproduce the SPARC64 bug: https://github.com/numpy/numpy/issues/2668 so that's good. Now I just need to debug it. Ondrej P.S. My thesis was finally approved by the grad school today, doing some final changes took more time than expected, but I think that I am done now. From njs at pobox.com Mon Dec 3 22:10:50 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 4 Dec 2012 03:10:50 +0000 Subject: [Numpy-discussion] Weird Travis-CI bugs in the release 1.7.x branch In-Reply-To: References: Message-ID: On 4 Dec 2012 02:27, "Ond?ej ?ert?k" wrote: > > Hi, > > I started to work on the release again and noticed weird failures at Travis-CI: [?] > File "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py", The problem is that Travis started installing numpy in all python virtualenvs by default, and our Travis build script just runs setup.py install, which is too dumb to notice that there is a numpy already installed and just overwrites it. The file mentioned above doesn't even exist in 1.7, it's left over from the 1.6 install. I did a PR to fix this in master a few days ago, you want to back port that. (Sorry for lack of link, I'm on my phone.) > P.S. My thesis was finally approved by the grad school today, > doing some final changes took more time than expected, but > I think that I am done now. Congratulations Dr. ?ert?k! -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Tue Dec 4 08:57:12 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 04 Dec 2012 14:57:12 +0100 Subject: [Numpy-discussion] Allowing 0-d arrays in np.take Message-ID: <1354629432.21666.13.camel@sebastian-laptop> Hey, Maybe someone has an opinion about this (since in fact it is new behavior, so it is undefined). `np.take` used to not allow 0-d/scalar input but did allow any other dimensions for the indices. Thinking about changing this, meaning that: np.take(np.arange(5), 0) works. I was wondering if anyone has feelings about whether this should return a scalar or a 0-d array. Typically numpy prefers scalars for these cases (indexing would return a scalar too) for good reasons, so I guess that is correct. But since I noticed this wondering if maybe it returns a 0-d array, I thought I would ask here. Regards, Sebastian From ben.root at ou.edu Tue Dec 4 09:15:45 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 4 Dec 2012 09:15:45 -0500 Subject: [Numpy-discussion] Allowing 0-d arrays in np.take In-Reply-To: <1354629432.21666.13.camel@sebastian-laptop> References: <1354629432.21666.13.camel@sebastian-laptop> Message-ID: On Tue, Dec 4, 2012 at 8:57 AM, Sebastian Berg wrote: > Hey, > > Maybe someone has an opinion about this (since in fact it is new > behavior, so it is undefined). `np.take` used to not allow 0-d/scalar > input but did allow any other dimensions for the indices. Thinking about > changing this, meaning that: > > np.take(np.arange(5), 0) > > works. I was wondering if anyone has feelings about whether this should > return a scalar or a 0-d array. Typically numpy prefers scalars for > these cases (indexing would return a scalar too) for good reasons, so I > guess that is correct. But since I noticed this wondering if maybe it > returns a 0-d array, I thought I would ask here. > > Regards, > > Sebastian > > At first, I was thinking that the output type should be based on what the input type is. So, if a scalar index was used, then a scalar value should be returned. But this wouldn't be true if the array had other dimensions. So, perhaps it should always be an array. The only other option is to mimic the behavior of the array indexing, which wouldn't be a bad choice. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Tue Dec 4 11:14:31 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 4 Dec 2012 08:14:31 -0800 Subject: [Numpy-discussion] Weird Travis-CI bugs in the release 1.7.x branch In-Reply-To: References: Message-ID: On Mon, Dec 3, 2012 at 7:10 PM, Nathaniel Smith wrote: > On 4 Dec 2012 02:27, "Ond?ej ?ert?k" wrote: >> >> Hi, >> >> I started to work on the release again and noticed weird failures at >> Travis-CI: > [?] >> File >> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py", > > The problem is that Travis started installing numpy in all python > virtualenvs by default, and our Travis build script just runs setup.py > install, which is too dumb to notice that there is a numpy already installed > and just overwrites it. The file mentioned above doesn't even exist in 1.7, > it's left over from the 1.6 install. > > I did a PR to fix this in master a few days ago, you want to back port that. > (Sorry for lack of link, I'm on my phone.) Thanks! 
I backported it in: https://github.com/numpy/numpy/pull/2786 Nice, I was not aware of the fact that "pip install ." fixes this problem with setup.py --- I've burned myself with this so many times already and I always forget about this bug. > >> P.S. My thesis was finally approved by the grad school today, >> doing some final changes took more time than expected, but >> I think that I am done now. > > Congratulations Dr. ?ert?k! Thanks. I am glad it's over. Ondrej From sebastian at sipsolutions.net Tue Dec 4 12:08:32 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 04 Dec 2012 18:08:32 +0100 Subject: [Numpy-discussion] Numpy's definition of contiguous arrays Message-ID: <1354640912.21666.65.camel@sebastian-laptop> Hi, maybe someone has an opinion about how this can be handled and was not yet aware of this. In current numpy master (probably being reverted), the definition for contiguous arrays is changed such that it means "Contiguous in memory" and nothing more. What this means is this: 1. An array of size (1,3,1) is both C- and F-contiguous (Assuming `arr.strides[1] == arr.itemsize`). 2. However it is incorrect that `arr.strides[-1] == arr.itemsize` because the corresponding axes dimension is 1 so it does not matter for the memory layout. Also other similar assumptions about "clean strides" are incorrect. (This was always incorrect in corner cases) I think most will agree that this change reflects what these flags should indicate, because the exact value of the strides is not really important for the memory layout and for example for a row vector there is no reason to say it cannot be both C- and F-contiguous. However the change broke some code in scipy as well as sk-learn, that relied on `arr.strides[-1] == arr.itemsize` (for C-contiguous arrays). The fact that it was never noticed that this isn't quite correct indicates that there is certainly more code out there just like it. There is more discussion here: https://github.com/numpy/numpy/pull/2735 with suggestions for a possible deprecation process of having both definitions next to each other and deprecating the current, etc. I was personally wondering if it is good enough to ensure strides are cleaned up when an array is explicitly requested as contiguous which means: np.array(arr, copy=False, order='C').strides[-1] == arr.itemsize is always True, but: if arr.flags.c_contiguous: # It is possible that: arr.strides[-1] != arr.itemsize Which fixes the problems found yet since typically if you want to use the fact that an array is contiguous, you use this kind of command to make sure it is. But I guess it is likely too dangerous to assume that nobody only checks the flags and then continuous to do unwanted assumptions about strides. Best Regards, Sebastian From ondrej.certik at gmail.com Tue Dec 4 18:47:41 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 4 Dec 2012 15:47:41 -0800 Subject: [Numpy-discussion] Weird Travis-CI bugs in the release 1.7.x branch In-Reply-To: References: Message-ID: On Tue, Dec 4, 2012 at 8:14 AM, Ond?ej ?ert?k wrote: > On Mon, Dec 3, 2012 at 7:10 PM, Nathaniel Smith wrote: >> On 4 Dec 2012 02:27, "Ond?ej ?ert?k" wrote: >>> >>> Hi, >>> >>> I started to work on the release again and noticed weird failures at >>> Travis-CI: >> [?] 
>>> File >>> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py", >> >> The problem is that Travis started installing numpy in all python >> virtualenvs by default, and our Travis build script just runs setup.py >> install, which is too dumb to notice that there is a numpy already installed >> and just overwrites it. The file mentioned above doesn't even exist in 1.7, >> it's left over from the 1.6 install. >> >> I did a PR to fix this in master a few days ago, you want to back port that. >> (Sorry for lack of link, I'm on my phone.) > > Thanks! I backported it in: > > https://github.com/numpy/numpy/pull/2786 > > Nice, I was not aware of the fact that "pip install ." fixes this > problem with setup.py --- > I've burned myself with this so many times already and I always forget > about this bug. It's fixed in the release branch now. So both master and the release branch pass all tests on Travis again. Thanks for your help. Ondrej From markbak at gmail.com Wed Dec 5 16:35:32 2012 From: markbak at gmail.com (Mark Bakker) Date: Wed, 5 Dec 2012 22:35:32 +0100 Subject: [Numpy-discussion] how do I specify maximum line length when using savetxt? Message-ID: Hello List, I want to write a large array to file, and each line can only be 80 characters long. Can I use savetxt to do that? Where would I specify the maximum line length? Or is there a better way to do this? Thanks, Mark From markbak at gmail.com Wed Dec 5 16:42:27 2012 From: markbak at gmail.com (Mark Bakker) Date: Wed, 5 Dec 2012 22:42:27 +0100 Subject: [Numpy-discussion] turn off square brackets in set_print_options? Message-ID: Hello List, Is it possible to turn off printing the square brackets in set_print_options? Am I overlooking something? Thanks, Mark From paul.anton.letnes at gmail.com Wed Dec 5 16:56:55 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Wed, 5 Dec 2012 22:56:55 +0100 Subject: [Numpy-discussion] how do I specify maximum line length when using savetxt? In-Reply-To: References: Message-ID: <46DE5D18-345D-483F-808D-4776487B7884@gmail.com> On 5. des. 2012, at 22:35, Mark Bakker wrote: > Hello List, > > I want to write a large array to file, and each line can only be 80 > characters long. > Can I use savetxt to do that? Where would I specify the maximum line length? If you specify the format, %10.3f for instance, you will know the max line length if you also know the array shape. > Or is there a better way to do this? Probably 1000 ways to accomplish the same thing out there, sure. Cheers Paul > > Thanks, > > Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From markbak at gmail.com Wed Dec 5 18:40:17 2012 From: markbak at gmail.com (Mark Bakker) Date: Thu, 6 Dec 2012 00:40:17 +0100 Subject: [Numpy-discussion] how do I specify maximum line length when using savetxt? Message-ID: I guess I wasn't explicit enough. Say I have an array with 100 numbers and I want to write it to a file with 6 numbers on each line (and hence, only 4 on the last line). Can I use savetxt to do that? What other easy tool does numpy have to do that? Thanks, Mark On 5. des. 2012, at 22:35, Mark Bakker wrote: > Hello List, > > I want to write a large array to file, and each line can only be 80 > characters long. > Can I use savetxt to do that? Where would I specify the maximum line length? 
If you specify the format, %10.3f for instance, you will know the max line length if you also know the array shape. > Or is there a better way to do this? Probably 1000 ways to accomplish the same thing out there, sure. Cheers Paul From derek at astro.physik.uni-goettingen.de Wed Dec 5 19:27:42 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Thu, 6 Dec 2012 01:27:42 +0100 Subject: [Numpy-discussion] how do I specify maximum line length when using savetxt? In-Reply-To: References: Message-ID: On 06.12.2012, at 12:40AM, Mark Bakker wrote: > I guess I wasn't explicit enough. > Say I have an array with 100 numbers and I want to write it to a file > with 6 numbers on each line (and hence, only 4 on the last line). > Can I use savetxt to do that? > What other easy tool does numpy have to do that? I've just been looking into a similar case and I think there is no easy tool for this - i.e. nothing comparable to Fortran's '(6e10.3)' or the like, so if your array does not reshape to a Nx6 array, you'd probably have to write something customised yourself. I would not be terribly difficult to add such functionality to savetxt, but then, unless you want the output file to be more human-readable, there is not really a strong case for writing a shape (100,) array into 16 lines plus an incomplete one - it just would not play well with reading back in and then determining the right shape automatically? HTH, Derek From raul at virtualmaterials.com Wed Dec 5 19:30:44 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Wed, 05 Dec 2012 17:30:44 -0700 Subject: [Numpy-discussion] how do I specify maximum line length when using savetxt? In-Reply-To: References: Message-ID: <50BFE734.7090601@virtualmaterials.com> assuming savetxt does not support it, I modified a bit of code I had to do what I think you need ONLY works for a 1D array and wrapped it into a function that writes in properly formatted columns. I didn't really test it other than what is there. I "dressed" it like savetxt but the glaring difference is that it goes off significant digits as opposed to format. """ import numpy def mysavetxt_forvector(fname, x, sigDigs, delimiter, newline, maxCharsPLine, fmode='w'): padSize = sigDigs + 6 #How many characters per number including empty space fmt = str(padSize) + '.' + str(sigDigs) + 'g' #e.g. 13.7g' asTxtLst = map(lambda val: format(val, fmt), a) #from array to list of formatted strings #how many cols max ? cols = maxCharsPLine/(padSize + len(delimiter)) #write to file size = len(asTxtLst) col = 0 f = open(fname, fmode) while col < size: f.write(delimiter.join(asTxtLst[col:col+cols]) ) f.write(newline) col += cols f.close() #Test it a = numpy.ones(34, dtype='float64') * 7./3. a[3] = 123564234.0002345 a[5] = 1 a[7] = -123564234.0002345 a[9] = -.00000000000023453456345 sigDigs = 7 maxCharsPLine = 80 delimiter = ',' newline = '\n' fname = 'temp.out' mysavetxt_forvector(fname, a, sigDigs, delimiter, newline, maxCharsPLine) #append on this one maxCharsPLine = 33 mysavetxt_forvector(fname, a, sigDigs, delimiter, newline, maxCharsPLine, fmode='a') """ Raul On 05/12/2012 4:40 PM, Mark Bakker wrote: > I guess I wasn't explicit enough. > Say I have an array with 100 numbers and I want to write it to a file > with 6 numbers on each line (and hence, only 4 on the last line). > Can I use savetxt to do that? > What other easy tool does numpy have to do that? > Thanks, > Mark > > On 5. des. 
2012, at 22:35, Mark Bakker wrote: > >> Hello List, >> >> I want to write a large array to file, and each line can only be 80 >> characters long. >> Can I use savetxt to do that? Where would I specify the maximum line length? > > If you specify the format, %10.3f for instance, you will know the max > line length if you also know the array shape. > > >> Or is there a better way to do this? > > Probably 1000 ways to accomplish the same thing out there, sure. > > Cheers > Paul > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From pav at iki.fi Wed Dec 5 19:45:23 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 06 Dec 2012 02:45:23 +0200 Subject: [Numpy-discussion] Numpy Trac migration Message-ID: <50BFEAA3.1030306@iki.fi> Hi, For those whom it may concern: Since the Numpy Trac -> Github migration is complete, I went ahead and added redirects projects.scipy.org/numpy/register -> github.com/numpy/numpy/issues projects.scipy.org/numpy/newticket -> github.com/numpy/numpy/issues plus an ugly warning bar on top to direct potential users towards Github. Cheers, -- Pauli Virtanen From alex.eberspaecher at gmail.com Thu Dec 6 07:29:08 2012 From: alex.eberspaecher at gmail.com (Alexander =?ISO-8859-1?B?RWJlcnNw5GNoZXI=?=) Date: Thu, 6 Dec 2012 13:29:08 +0100 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: References: Message-ID: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> On Fri, 30 Nov 2012 12:13:58 -0800 "Bradley M. Froehle" wrote: > As far as I can tell, it's IMPOSSIBLE to create a site.cfg which will > link to ACML when a system installed ATLAS is present. setup.py respects environment variables. You can set ATLAS to None and force the setup to use $LAPACK and $BLAS. See also this link: http://www.der-schnorz.de/2012/06/optimized-linear-algebra-and-numpyscipy/ Greetings Alex From ondrej.certik at gmail.com Thu Dec 6 12:35:33 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 6 Dec 2012 09:35:33 -0800 Subject: [Numpy-discussion] Numpy Trac migration In-Reply-To: <50BFEAA3.1030306@iki.fi> References: <50BFEAA3.1030306@iki.fi> Message-ID: On Wed, Dec 5, 2012 at 4:45 PM, Pauli Virtanen wrote: > Hi, > > For those whom it may concern: Since the Numpy Trac -> Github migration > is complete, I went ahead and added redirects > > projects.scipy.org/numpy/register -> github.com/numpy/numpy/issues > projects.scipy.org/numpy/newticket -> github.com/numpy/numpy/issues > > plus an ugly warning bar on top to direct potential users towards Github. Thanks, that is excellent. Ondrej From brad.froehle at gmail.com Thu Dec 6 13:13:26 2012 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Thu, 6 Dec 2012 10:13:26 -0800 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> References: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> Message-ID: Thanks Alexander, that was quite helpful, but unfortunately does not actually work. 
The recommendations there are akin to a site.cfg file: [atlas] atlas_libs = library_dirs = [blas] blas_libs = cblas,acml library_dirs = /opt/acml5.2.0/gfortan64_fma4/lib [lapack] blas_libs = cblas,acml library_dirs = /opt/acml5.2.0/gfortan64_fma4/lib $ python setup.py build However this makes numpy think that there is no optimized blas available and prevents the numpy.core._dotblas module from being built. -Brad On Thu, Dec 6, 2012 at 4:29 AM, Alexander Ebersp?cher < alex.eberspaecher at gmail.com> wrote: > On Fri, 30 Nov 2012 12:13:58 -0800 > "Bradley M. Froehle" wrote: > > > As far as I can tell, it's IMPOSSIBLE to create a site.cfg which will > > link to ACML when a system installed ATLAS is present. > > setup.py respects environment variables. You can set ATLAS to None and > force the setup to use $LAPACK and $BLAS. See also this link: > > http://www.der-schnorz.de/2012/06/optimized-linear-algebra-and-numpyscipy/ > > Greetings > > Alex > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Thu Dec 6 13:34:50 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 6 Dec 2012 19:34:50 +0100 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: References: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> Message-ID: On Thu, Dec 6, 2012 at 7:13 PM, Bradley M. Froehle wrote: > Thanks Alexander, that was quite helpful, but unfortunately does not > actually work. The recommendations there are akin to a site.cfg file: > > [atlas] > atlas_libs = > library_dirs = > > [blas] > blas_libs = cblas,acml > library_dirs = /opt/acml5.2.0/gfortan64_fma4/lib > > [lapack] > blas_libs = cblas,acml > library_dirs = /opt/acml5.2.0/gfortan64_fma4/lib > $ python setup.py build > > However this makes numpy think that there is no optimized blas available and > prevents the numpy.core._dotblas module from being built. _dotblas is only built if *C*blas is available (atlas, accelerate and mkl only are supported ATM). David From brad.froehle at gmail.com Thu Dec 6 13:35:36 2012 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Thu, 6 Dec 2012 10:35:36 -0800 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: References: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> Message-ID: Right, but if I link to libcblas, cblas would be available, no? On Thu, Dec 6, 2012 at 10:34 AM, David Cournapeau wrote: > On Thu, Dec 6, 2012 at 7:13 PM, Bradley M. Froehle > wrote: > > Thanks Alexander, that was quite helpful, but unfortunately does not > > actually work. The recommendations there are akin to a site.cfg file: > > > > [atlas] > > atlas_libs = > > library_dirs = > > > > [blas] > > blas_libs = cblas,acml > > library_dirs = /opt/acml5.2.0/gfortan64_fma4/lib > > > > [lapack] > > blas_libs = cblas,acml > > library_dirs = /opt/acml5.2.0/gfortan64_fma4/lib > > $ python setup.py build > > > > However this makes numpy think that there is no optimized blas available > and > > prevents the numpy.core._dotblas module from being built. > > _dotblas is only built if *C*blas is available (atlas, accelerate and > mkl only are supported ATM). 
> > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgohlke at uci.edu Fri Dec 7 03:00:45 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Fri, 07 Dec 2012 00:00:45 -0800 Subject: [Numpy-discussion] ValueError: low level cast function is for unequal type numbers Message-ID: <50C1A22D.7050209@uci.edu> Hello, the following code using np.object_ data types works with numpy 1.5.1 but fails with 1.6.2. Is this intended or a regression? Other data types, np.float64 for example, seem to work. In [1]: import numpy as np In [2]: np.array(['a'], dtype='O').astype(('O', [('name', 'O')])) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 np.array(['a'], dtype='O').astype(('O', [('name', 'O')])) ValueError: low level cast function is for unequal type numbers In [3]: np.array([16], dtype='d').astype(('d', [('name', 'd')])) Out[3]: array([ 16.]) These downstream issues could be related: http://code.google.com/p/h5py/issues/detail?id=217 https://github.com/CellProfiler/CellProfiler/issues/421 Thank you, Christoph From cournape at gmail.com Fri Dec 7 07:01:37 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 7 Dec 2012 13:01:37 +0100 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: References: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> Message-ID: On Thu, Dec 6, 2012 at 7:35 PM, Bradley M. Froehle wrote: > Right, but if I link to libcblas, cblas would be available, no? No, because we don't explicitly check for CBLAS. We assume it is there if Atlas, Accelerate or MKL is found. cheers, David > > > On Thu, Dec 6, 2012 at 10:34 AM, David Cournapeau > wrote: >> >> On Thu, Dec 6, 2012 at 7:13 PM, Bradley M. Froehle >> wrote: >> > Thanks Alexander, that was quite helpful, but unfortunately does not >> > actually work. The recommendations there are akin to a site.cfg file: >> > >> > [atlas] >> > atlas_libs = >> > library_dirs = >> > >> > [blas] >> > blas_libs = cblas,acml >> > library_dirs = /opt/acml5.2.0/gfortan64_fma4/lib >> > >> > [lapack] >> > blas_libs = cblas,acml >> > library_dirs = /opt/acml5.2.0/gfortan64_fma4/lib >> > $ python setup.py build >> > >> > However this makes numpy think that there is no optimized blas available >> > and >> > prevents the numpy.core._dotblas module from being built. >> >> _dotblas is only built if *C*blas is available (atlas, accelerate and >> mkl only are supported ATM). >> >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From brad.froehle at gmail.com Fri Dec 7 13:09:00 2012 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Fri, 7 Dec 2012 10:09:00 -0800 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: References: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> Message-ID: <7C5D3880ABAC4B1AB19214CDCBA2227A@gmail.com> Aha, thanks for the clarification. I've always been surpassed that NumPy doesn't ship with a copy of CBLAS. 
It's easy to compile --- just a thin wrapper over BLAS, if I remember correctly. -Brad On Friday, December 7, 2012 at 4:01 AM, David Cournapeau wrote: > On Thu, Dec 6, 2012 at 7:35 PM, Bradley M. Froehle > wrote: > > Right, but if I link to libcblas, cblas would be available, no? > > > > No, because we don't explicitly check for CBLAS. We assume it is there > if Atlas, Accelerate or MKL is found. > > cheers, > David From d.s.seljebotn at astro.uio.no Fri Dec 7 13:58:26 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 07 Dec 2012 19:58:26 +0100 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: <7C5D3880ABAC4B1AB19214CDCBA2227A@gmail.com> References: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> <7C5D3880ABAC4B1AB19214CDCBA2227A@gmail.com> Message-ID: <50C23C52.3030204@astro.uio.no> One way of fixing this I'm sort of itching to do is to create a "pylapack" project which can iterate quickly on these build issues, run-time selection of LAPACK backend and so on. (With some templates generating some Cython code it shouldn't be more than a few days for an MVP.) Then patch NumPy to attempt to import pylapack (and if it's there, get a table of function pointers from it). The idea would be that powerusers could more easily build pylapack the way they wanted, in isolation, and then have other things switch to that without a rebuild (note again that it would need to export a table of function pointers). But I'm itching to do too many things, we'll see. Dag Sverre On 12/07/2012 07:09 PM, Bradley M. Froehle wrote: > Aha, thanks for the clarification. I've always been surpassed that NumPy doesn't ship with a copy of CBLAS. It's easy to compile --- just a thin wrapper over BLAS, if I remember correctly. > > -Brad > > > On Friday, December 7, 2012 at 4:01 AM, David Cournapeau wrote: > >> On Thu, Dec 6, 2012 at 7:35 PM, Bradley M. Froehle >> wrote: >>> Right, but if I link to libcblas, cblas would be available, no? >> >> >> >> No, because we don't explicitly check for CBLAS. We assume it is there >> if Atlas, Accelerate or MKL is found. >> >> cheers, >> David > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Fri Dec 7 16:00:20 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 7 Dec 2012 21:00:20 +0000 Subject: [Numpy-discussion] Fwd: [matplotlib-devel] GitHub attachments In-Reply-To: References: Message-ID: Heh, looks like we did the trac migration about a month too soon... ---------- Forwarded message ---------- From: Damon McDougall Date: Fri, Dec 7, 2012 at 8:15 PM Subject: [matplotlib-devel] GitHub attachments To: matplotlib-devel at lists.sourceforge.net Did everyone see that GitHub now allows attachments in issue comments? IMO, this is awesome. Attaching a visual *thing* of output when users get unexpected results from plotting calls is a massive plus. Drag and drop right into the browser, too! No more having to upload a file to imgur or dropbox and link it to the issue comment and then have it disappear when you delete it and forgot it was linked to GitHub. Win. -- Damon McDougall http://www.damon-is-a-geek.com Institute for Computational Engineering Sciences 201 E. 24th St. Stop C0200 The University of Texas at Austin Austin, TX 78712-1229 ------------------------------------------------------------------------------ LogMeIn Rescue: Anywhere, Anytime Remote support for IT. 
Free Trial Remotely access PCs and mobile devices and provide instant support Improve your efficiency, and focus on delivering more value-add services Discover what IT Professionals Know. Rescue delivers http://p.sf.net/sfu/logmein_12329d2d _______________________________________________ Matplotlib-devel mailing list Matplotlib-devel at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/matplotlib-devel From njs at pobox.com Fri Dec 7 16:21:25 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 7 Dec 2012 21:21:25 +0000 Subject: [Numpy-discussion] [matplotlib-devel] GitHub attachments In-Reply-To: References: Message-ID: Oh, never mind, I guess they *only* allow image files. So, uh, no test data files, but if we have any lolcats in the trac attachments, we can migrate those. On Fri, Dec 7, 2012 at 9:00 PM, Nathaniel Smith wrote: > Heh, looks like we did the trac migration about a month too soon... > > ---------- Forwarded message ---------- > From: Damon McDougall > Date: Fri, Dec 7, 2012 at 8:15 PM > Subject: [matplotlib-devel] GitHub attachments > To: matplotlib-devel at lists.sourceforge.net > > > Did everyone see that GitHub now allows attachments in issue comments? > IMO, this is awesome. Attaching a visual *thing* of output when users > get unexpected results from plotting calls is a massive plus. Drag and > drop right into the browser, too! No more having to upload a file to > imgur or dropbox and link it to the issue comment and then have it > disappear when you delete it and forgot it was linked to GitHub. > > Win. > > -- > Damon McDougall > http://www.damon-is-a-geek.com > Institute for Computational Engineering Sciences > 201 E. 24th St. > Stop C0200 > The University of Texas at Austin > Austin, TX 78712-1229 > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Matplotlib-devel mailing list > Matplotlib-devel at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/matplotlib-devel From jason-sage at creativetrax.com Fri Dec 7 17:16:00 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Fri, 07 Dec 2012 16:16:00 -0600 Subject: [Numpy-discussion] [matplotlib-devel] GitHub attachments In-Reply-To: References: Message-ID: <50C26AA0.7050409@creativetrax.com> On 12/7/12 3:21 PM, Nathaniel Smith wrote: > Oh, never mind, I guess they *only* allow image files. So, uh, no test > data files, but if we have any lolcats in the trac attachments, we can > migrate those. > It looks like what they do is just automatically upload it to their own cloud, and then substitute in their standard markup for embedding images. So it's just replacing the "upload your file to somewhere" to "we'll upload it automatically to our own cloud." That said, it is really important and very nice that they're doing this! Thanks, Jason From ralf.gommers at gmail.com Sat Dec 8 05:24:14 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 8 Dec 2012 11:24:14 +0100 Subject: [Numpy-discussion] turn off square brackets in set_print_options? 
In-Reply-To: References: Message-ID: On Wed, Dec 5, 2012 at 10:42 PM, Mark Bakker wrote: > Hello List, > > Is it possible to turn off printing the square brackets in > set_print_options? > Am I overlooking something? > You're not, those are hardcoded in _formatArray() in core/arrayprint.py Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Dec 8 06:15:06 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 8 Dec 2012 12:15:06 +0100 Subject: [Numpy-discussion] Hierarchical vs non-hierarchical ndarray.base and __array_interface__ In-Reply-To: <2B413C1254F1B44F908C9BE85A5DA0AD21B7FA@PRDEXMBX-04.the-lab.llnl.gov> References: <2B413C1254F1B44F908C9BE85A5DA0AD21B7FA@PRDEXMBX-04.the-lab.llnl.gov> Message-ID: On Sat, Nov 24, 2012 at 8:34 PM, Gamblin, Todd wrote: > Hi all, > > I posted on the change in semantics of ndarray.base here: > > > https://github.com/numpy/numpy/commit/6c0ad59#commitcomment-2153047 > > And some folks asked me to post my question to the numpy mailing list. > I've implemented a tool for mapping processes in parallel applications to > nodes in cartesian networks. It uses hierarchies of numpy arrays to > represent the domain decomposition of the application, as well as > corresponding groups of processes on the network. You can "map" an > application to the network using assignment of through views. The tool is > here if anyone is curious: https://github.com/tgamblin/rubik. I used > numpy to implement this because I wanted to be able to do mappings for > arbitrary-dimensional networks. Blue Gene/Q, for example, has a 5-D > network. > > The reason I bring this up is because I rely on the ndarray.base pointer > and some of the semantics in __array_interface__ to translate indices > within my hierarchy of views. e.g., if a value is at (0,0) in a view I > want to know that it's actually at (4,4) in its immediate parent array. > > After looking over the commit I linked to above, I realized I'm actually > relying on a lot of stuff that's not guaranteed by numpy. I rely on .base > pointing to its closest parent, and I rely on __array_interface__.data > containing the address of the array's memory and its strides. None of > these is guaranteed by the API docs: > > http://docs.scipy.org/doc/numpy/reference/arrays.interface.html > Are you saying that data/strides aren't guaranteed because they're marked optional on that page? My interpretation of "optional" is that these fields don't have to be present for all objects implementing something that qualifies as an array interface (for example ndarrays don't need a mask), but it does not mean that everything marked optional can be changed without warning for ndarrays. So I guess I have a few questions: > > 1. Is translating indices between base arrays and views something that > would be useful to other people? > > 2. Is there some better way to do this than using ndarray.base and > __array_interface__? > > 3. What's the numpy philosophy on this? Should views know about their > parents or not? They obviously have to know a little bit about their > memory, but whether or not they know how they were derived from their > owning array is a different question. There was some discussion on the > vagueness of .base here: > > > http://thread.gmane.org/gmane.comp.python.numeric.general/51688/focus=51703 > > But it doesn't look like you're deprecating .base in 1.7, only changing > its behavior, which I tend to agree is worse than deprecating it. 
> > After thinking about all this, I'm not sure what I would like to happen. > I can see the value of not keeping extra references around within numpy, > and my domain is pretty different from the ways that I imagine people use > numpy. I wouldn't have to change my code much to make it work without > .base, but I do rely on __array_interface__. If that doesn't include the > address and strides, t think I'm screwed as far as translating indices go. > > Any suggestions? > The discussion on .base seems to have converged. As for __array_interface__, you could write a test which captures the essence of your use of ndarray.__array_interface__ and send a PR for it. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From staywithpin at gmail.com Sat Dec 8 06:20:21 2012 From: staywithpin at gmail.com (Daniel Wu) Date: Sat, 08 Dec 2012 19:20:21 +0800 Subject: [Numpy-discussion] pandas dataframe memory layout Message-ID: <50C32275.6090203@gmail.com> For numpy array, we can choose to use either C style or Forran stype. For dataframe in Pandas, is it possible to choose memory layout as in numpy array? From d.s.seljebotn at astro.uio.no Sat Dec 8 08:48:31 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 08 Dec 2012 14:48:31 +0100 Subject: [Numpy-discussion] Numpy Trac migration In-Reply-To: <50BFEAA3.1030306@iki.fi> References: <50BFEAA3.1030306@iki.fi> Message-ID: <50C3452F.9050108@astro.uio.no> On 12/06/2012 01:45 AM, Pauli Virtanen wrote: > Hi, > > For those whom it may concern: Since the Numpy Trac -> Github migration > is complete, I went ahead and added redirects > > projects.scipy.org/numpy/register -> github.com/numpy/numpy/issues > projects.scipy.org/numpy/newticket -> github.com/numpy/numpy/issues > > plus an ugly warning bar on top to direct potential users towards Github. Related news: https://github.com/blog/1347-issue-attachments Dag Sverre From chaoyuejoy at gmail.com Sat Dec 8 10:07:24 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 8 Dec 2012 16:07:24 +0100 Subject: [Numpy-discussion] pandas dataframe memory layout In-Reply-To: <50C32275.6090203@gmail.com> References: <50C32275.6090203@gmail.com> Message-ID: I don't know. Maybe you can ask here: http://stackoverflow.com/questions/tagged/pandas chao On Sat, Dec 8, 2012 at 12:20 PM, Daniel Wu wrote: > For numpy array, we can choose to use either C style or Forran stype. > For dataframe in Pandas, is it possible to choose memory layout as in > numpy array? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kamauallan at gmail.com Mon Dec 10 05:57:04 2012 From: kamauallan at gmail.com (Allan Kamau) Date: Mon, 10 Dec 2012 13:57:04 +0300 Subject: [Numpy-discussion] ImportError: libatlas.so.3: cannot open shared object file Message-ID: I have built and installed numpy on Debian from source successfully as follows. 
export LAPACK=/usr/lib/lapack/liblapack.so;export ATLAS=/usr/lib/atlas-base/libatlas.so; python setup.py build; python setup.py install; Then I change directory from the numpy sources directory. Then I give the command "python -c 'import numpy; numpy.test()' I get the error pasted below. How can I resource this issue? Traceback (most recent call last): File "", line 1, in File "/home/allank/SmashCell/local/lib/python2.7/site-packages/numpy/__init__.py", line 130, in import add_newdocs File "/home/allank/SmashCell/local/lib/python2.7/site-packages/numpy/add_newdocs.py", line 9, in from lib import add_newdoc File "/home/allank/SmashCell/local/lib/python2.7/site-packages/numpy/lib/__init__.py", line 13, in from polynomial import * File "/home/allank/SmashCell/local/lib/python2.7/site-packages/numpy/lib/polynomial.py", line 18, in from numpy.linalg import eigvals, lstsq File "/home/allank/SmashCell/local/lib/python2.7/site-packages/numpy/linalg/__init__.py", line 47, in from linalg import * File "/home/allank/SmashCell/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 22, in from numpy.linalg import lapack_lite ImportError: libatlas.so.3: cannot open shared object file: No such file or directory -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.eberspaecher at gmail.com Mon Dec 10 06:54:08 2012 From: alex.eberspaecher at gmail.com (Alexander =?ISO-8859-1?B?RWJlcnNw5GNoZXI=?=) Date: Mon, 10 Dec 2012 12:54:08 +0100 Subject: [Numpy-discussion] ImportError: libatlas.so.3: cannot open shared object file In-Reply-To: References: Message-ID: <20121210125408.0c02d3f3@poetzsch.nat.uni-magdeburg.de> On Mon, 10 Dec 2012 13:57:04 +0300 Allan Kamau wrote: > I have built and installed numpy on Debian from source successfully as > follows. [...] > ImportError: libatlas.so.3: cannot open shared object file: No such > file or directory Are the paths to ATLAS in your $LD_LIBRARY_PATH? If not, try adding those. Hope that helps! Cheers, Alex From kamauallan at gmail.com Mon Dec 10 07:09:21 2012 From: kamauallan at gmail.com (Allan Kamau) Date: Mon, 10 Dec 2012 15:09:21 +0300 Subject: [Numpy-discussion] ImportError: libatlas.so.3: cannot open shared object file In-Reply-To: <20121210125408.0c02d3f3@poetzsch.nat.uni-magdeburg.de> References: <20121210125408.0c02d3f3@poetzsch.nat.uni-magdeburg.de> Message-ID: I did add the paths to LD_LIBRARY_PATH as advised (see below), then "python setup.py clean;python setup.py build;python setup.py install;" but the same error persists. export LAPACK=/usr/lib/lapack/liblapack.so;export ATLAS=/usr/lib/atlas-base/libatlas.so; export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/lapack:/usr/lib/atlas-base; On Mon, Dec 10, 2012 at 2:54 PM, Alexander Ebersp?cher < alex.eberspaecher at gmail.com> wrote: > On Mon, 10 Dec 2012 13:57:04 +0300 > Allan Kamau wrote: > > > I have built and installed numpy on Debian from source successfully as > > follows. > [...] > > ImportError: libatlas.so.3: cannot open shared object file: No such > > file or directory > > Are the paths to ATLAS in your $LD_LIBRARY_PATH? If not, try adding > those. > > Hope that helps! > > Cheers, > > Alex > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shish at keba.be Mon Dec 10 08:42:49 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 10 Dec 2012 08:42:49 -0500 Subject: [Numpy-discussion] ImportError: libatlas.so.3: cannot open shared object file In-Reply-To: References: <20121210125408.0c02d3f3@poetzsch.nat.uni-magdeburg.de> Message-ID: 2012/12/10 Allan Kamau > I did add the paths to LD_LIBRARY_PATH as advised (see below), then > "python setup.py clean;python setup.py build;python setup.py install;" but > the same error persists. > > export LAPACK=/usr/lib/lapack/liblapack.so;export > ATLAS=/usr/lib/atlas-base/libatlas.so; > export > LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/lapack:/usr/lib/atlas-base; > Is the file libatlas.so.3 present in /usr/lib/lapack:/usr/lib/atlas-base? -=- Olivier > On Mon, Dec 10, 2012 at 2:54 PM, Alexander Ebersp?cher < > alex.eberspaecher at gmail.com> wrote: > >> On Mon, 10 Dec 2012 13:57:04 +0300 >> Allan Kamau wrote: >> >> > I have built and installed numpy on Debian from source successfully as >> > follows. >> [...] >> > ImportError: libatlas.so.3: cannot open shared object file: No such >> > file or directory >> >> Are the paths to ATLAS in your $LD_LIBRARY_PATH? If not, try adding >> those. >> >> Hope that helps! >> >> Cheers, >> >> Alex >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Mon Dec 10 17:30:05 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Mon, 10 Dec 2012 23:30:05 +0100 Subject: [Numpy-discussion] np.dstack vs np.concatenate? Message-ID: Dear all, I want to concate 13 mXn arrays into mXnX13 array. np version 1.6.2 I know the correct way to do that like here: http://stackoverflow.com/questions/8898471/concatenate-two-numpy-arrays-in-the-4th-dimension yet the np.dstack documentation also gives something like this: Equivalent to ``np.concatenate(tup, axis=2)``. so I tried this: In [10]: a = np.arange(8).reshape(2,4) In [11]: b=np.arange(9,17).reshape(2,4) In [12]: abd = np.dstack((a,b)) In [13]: abd.shape Out[13]: (2, 4, 2) In [14]: np.testing.assert_array_equal(abd[...,0],a) In [15]: np.testing.assert_array_equal(abd[...,1],b) In [16]: np.concatenate((a,b),axis=2) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 np.concatenate((a,b),axis=2) ValueError: bad axis1 argument to swapaxes so I guess for the 13 arrays, np.dstack will work. I can also do something like: array_list_old = [arr1, arr2, arr3, arr4, arr5, arr6] array_list = [arr[...,np.newaxis] for arr in array_list_old] array = np.concatenate(tuple(array_list),axis=2) So is there some inconsistency in the documentation? thanks, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Mon Dec 10 17:38:50 2012 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Mon, 10 Dec 2012 14:38:50 -0800 Subject: [Numpy-discussion] np.dstack vs np.concatenate? 
In-Reply-To: References: Message-ID: The source for np.dstack would point the way towards a simpler implementation: array = np.concatenate(map(np.atleast_3d, (arr1, arr2, arr3, arr4, arr5, arr6)), axis=2) array_list_old = [arr1, arr2, arr3, arr4, arr5, arr6] > > array_list = [arr[...,np.newaxis] for arr in array_list_old] > array = np.concatenate(tuple(array_list),axis=2) > > So is there some inconsistency in the documentation? > Maybe. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Dec 10 19:39:39 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Dec 2012 19:39:39 -0500 Subject: [Numpy-discussion] segfaulting numpy with dot and datetime Message-ID: >>> np.__version__ '1.6.2' >>> aa array([1970-01-13 96:00:00, 1970-01-13 120:00:00, 1970-01-13 144:00:00, 1970-01-13 168:00:00, 1970-01-13 192:00:00], dtype=datetime64[ns]) >>> np.dot(aa, [1]) reported at http://stackoverflow.com/questions/13786209/regression-on-stock-data-using-pandas-and-matplotlib discussion https://groups.google.com/d/topic/pystatsmodels/5zpWBzSH8UE/discussion using linalg is "safe" but I doubt this casting to float is the desired result >>> np.linalg.pinv(aa[:,None]) array([[ 1.87768878e-19, 1.87784114e-19, 1.87799350e-19, 1.87814586e-19, 1.87829822e-19]]) >>> np.linalg.pinv(np.asarray(aa, float)[:,None]) array([[ 1.87768878e-19, 1.87784114e-19, 1.87799350e-19, 1.87814586e-19, 1.87829822e-19]]) Josef From charlesr.harris at gmail.com Mon Dec 10 20:26:35 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 Dec 2012 18:26:35 -0700 Subject: [Numpy-discussion] segfaulting numpy with dot and datetime In-Reply-To: References: Message-ID: On Mon, Dec 10, 2012 at 5:39 PM, wrote: > >>> np.__version__ > '1.6.2' > >>> aa > array([1970-01-13 96:00:00, 1970-01-13 120:00:00, 1970-01-13 144:00:00, > 1970-01-13 168:00:00, 1970-01-13 192:00:00], dtype=datetime64[ns]) > >>> np.dot(aa, [1]) > > > Hmm, I can't even get that array using current master, what with illegal hours and all. Datetime in 1.6.x wasn't quite up to snuff, so things might have been fixed in 1.7.0. Is there any way you can check that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Dec 10 20:54:42 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Dec 2012 20:54:42 -0500 Subject: [Numpy-discussion] segfaulting numpy with dot and datetime In-Reply-To: References: Message-ID: On Mon, Dec 10, 2012 at 8:26 PM, Charles R Harris wrote: > > On Mon, Dec 10, 2012 at 5:39 PM, wrote: >> >> >>> np.__version__ >> '1.6.2' >> >>> aa >> array([1970-01-13 96:00:00, 1970-01-13 120:00:00, 1970-01-13 144:00:00, >> 1970-01-13 168:00:00, 1970-01-13 192:00:00], dtype=datetime64[ns]) >> >>> np.dot(aa, [1]) >> >> > > Hmm, I can't even get that array using current master, what with illegal > hours and all. Datetime in 1.6.x wasn't quite up to snuff, so things might > have been fixed in 1.7.0. Is there any way you can check that. I didn't know the dates are illegal, they were created with pandas. Skipper said he couldn't replicate the segfault so it might be gone with a more recent numpy. I still have to setup a virtualenv for a 1.7.0 beta so I can start to test it. (I rely on binaries for numpy and scipy now.) 
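A possible interim workaround (only a sketch, not something tested against 1.6.2): since datetime64 values are stored as 64-bit integers, view the data as int64 before handing it to dot, so dot never sees a dtype it cannot handle.

import numpy as np

aa = np.arange(5).astype('datetime64[ns]')
# view the underlying int64 representation, then cast to float for dot
np.dot(aa.view('int64').astype('float64'), np.ones(5))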
Thanks, Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Dec 10 21:46:29 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Dec 2012 21:46:29 -0500 Subject: [Numpy-discussion] segfaulting numpy with dot and datetime In-Reply-To: References: Message-ID: On Mon, Dec 10, 2012 at 8:54 PM, wrote: > On Mon, Dec 10, 2012 at 8:26 PM, Charles R Harris > wrote: >> >> On Mon, Dec 10, 2012 at 5:39 PM, wrote: >>> >>> >>> np.__version__ >>> '1.6.2' >>> >>> aa >>> array([1970-01-13 96:00:00, 1970-01-13 120:00:00, 1970-01-13 144:00:00, >>> 1970-01-13 168:00:00, 1970-01-13 192:00:00], dtype=datetime64[ns]) >>> >>> np.dot(aa, [1]) >>> >>> >> >> Hmm, I can't even get that array using current master, what with illegal >> hours and all. Datetime in 1.6.x wasn't quite up to snuff, so things might >> have been fixed in 1.7.0. Is there any way you can check that. > > I didn't know the dates are illegal, they were created with pandas. here's the minimal numpy version >>> np.ones(5, 'datetime64[ns]') array([1970-01-01 00:00:00, 1970-01-01 00:00:00, 1970-01-01 00:00:00, 1970-01-01 00:00:00, 1970-01-01 00:00:00], dtype=datetime64[ns]) >>> b = np.ones(5, 'datetime64[ns]') >>> np.dot(b, [1]) virtualenv is next Josef > > Skipper said he couldn't replicate the segfault so it might be gone > with a more recent numpy. > I still have to setup a virtualenv for a 1.7.0 beta so I can start to test it. > (I rely on binaries for numpy and scipy now.) > > Thanks, > > Josef > >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From josef.pktd at gmail.com Mon Dec 10 22:28:53 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Dec 2012 22:28:53 -0500 Subject: [Numpy-discussion] segfaulting numpy with dot and datetime In-Reply-To: References: Message-ID: On Mon, Dec 10, 2012 at 9:46 PM, wrote: > On Mon, Dec 10, 2012 at 8:54 PM, wrote: >> On Mon, Dec 10, 2012 at 8:26 PM, Charles R Harris >> wrote: >>> >>> On Mon, Dec 10, 2012 at 5:39 PM, wrote: >>>> >>>> >>> np.__version__ >>>> '1.6.2' >>>> >>> aa >>>> array([1970-01-13 96:00:00, 1970-01-13 120:00:00, 1970-01-13 144:00:00, >>>> 1970-01-13 168:00:00, 1970-01-13 192:00:00], dtype=datetime64[ns]) >>>> >>> np.dot(aa, [1]) >>>> >>>> >>> >>> Hmm, I can't even get that array using current master, what with illegal >>> hours and all. Datetime in 1.6.x wasn't quite up to snuff, so things might >>> have been fixed in 1.7.0. Is there any way you can check that. >> >> I didn't know the dates are illegal, they were created with pandas. > > here's the minimal numpy version > >>>> np.ones(5, 'datetime64[ns]') > array([1970-01-01 00:00:00, 1970-01-01 00:00:00, 1970-01-01 00:00:00, > 1970-01-01 00:00:00, 1970-01-01 00:00:00], dtype=datetime64[ns]) >>>> b = np.ones(5, 'datetime64[ns]') >>>> np.dot(b, [1]) > > > virtualenv is next much better (py27d) E:\Josef\testing\tox\py27d\Scripts>python Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. 
>>> import numpy as np >>> np.__version__ '1.7.0b2' >>> b = np.ones(5, 'datetime64[ns]') >>> b array(['1969-12-31T19:00:00.000000001-0500', '1969-12-31T19:00:00.000000001-0500', '1969-12-31T19:00:00.000000001-0500', '1969-12-31T19:00:00.000000001-0500', '1969-12-31T19:00:00.000000001-0500'], dtype='datetime64[ns]') >>> np.dot(b, [1]) Traceback (most recent call last): File "", line 1, in TypeError: Cannot cast array data from dtype('>> b = np.arange(10).astype('datetime64[ns]').reshape(2,5) >>> b array([['1969-12-31T19:00:00.000000000-0500', '1969-12-31T19:00:00.000000001-0500', '1969-12-31T19:00:00.000000002-0500', '1969-12-31T19:00:00.000000003-0500', '1969-12-31T19:00:00.000000004-0500'], ['1969-12-31T19:00:00.000000005-0500', '1969-12-31T19:00:00.000000006-0500', '1969-12-31T19:00:00.000000007-0500', '1969-12-31T19:00:00.000000008-0500', '1969-12-31T19:00:00.000000009-0500']], dtype='datetime64[ns]') >>> np.linalg.pinv(b) array([[ -3.20000000e-01, 1.20000000e-01], [ -1.80000000e-01, 8.00000000e-02], [ -4.00000000e-02, 4.00000000e-02], [ 1.00000000e-01, -1.73472348e-17], [ 2.40000000e-01, -4.00000000e-02]]) >>> np.linalg.qr(b) (array([[ 0., -1.], [-1., 0.]]), array([[-5., -6., -7., -8., -9.], [ 0., -1., -2., -3., -4.]])) >>> c array([['2003-09-28T20:00:00.000000000-0400', '2003-09-29T20:00:00.000000000-0400', '2003-09-30T20:00:00.000000000-0400', '2003-10-01T20:00:00.000000000-0400', '2003-10-02T20:00:00.000000000-0400'], ['2003-10-05T20:00:00.000000000-0400', '2003-10-06T20:00:00.000000000-0400', '2003-10-07T20:00:00.000000000-0400', '2003-10-08T20:00:00.000000000-0400', '2003-10-09T20:00:00.000000000-0400']], dtype='datetime64[ns]') >>> np.linalg.pinv(c) array([[ -4.07870371e-12, 4.07638889e-12], [ -2.03951719e-12, 2.03835978e-12], [ -3.30684814e-16, 3.30684816e-16], [ 2.03885582e-12, -2.03769841e-12], [ 4.07804232e-12, -4.07572751e-12]]) Josef > > Josef > >> >> Skipper said he couldn't replicate the segfault so it might be gone >> with a more recent numpy. >> I still have to setup a virtualenv for a 1.7.0 beta so I can start to test it. >> (I rely on binaries for numpy and scipy now.) >> >> Thanks, >> >> Josef >> >>> >>> Chuck >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> From ndbecker2 at gmail.com Tue Dec 11 11:44:26 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 11 Dec 2012 11:44:26 -0500 Subject: [Numpy-discussion] non-integer index misfeature? Message-ID: I think it's a misfeature that a floating point is silently accepted as an index. I would prefer a warning for: bins = np.arange (...) for b in bins: ... w[b] = blah when I meant: for ib,b in enumerate (bins): w[ib] = blah From jason-sage at creativetrax.com Wed Dec 12 09:51:02 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Wed, 12 Dec 2012 08:51:02 -0600 Subject: [Numpy-discussion] IPython receives $1.15 million from Alfred P. Sloan Foundation Message-ID: <50C899D6.3030805@creativetrax.com> Hi everyone, Just FYI, IPython just received $1.15 million in funding from the Alfred P. Sloan Foundation to support development over the next 2 years. Fernando talks more about this in his post to the IPython mailing list: http://mail.scipy.org/pipermail/ipython-dev/2012-December/010799.html It's great to see a significant open-source python project that many of us use on a day-to-day basis get such great funding! 
Thanks, Jason -- Jason Grout From ioannis87 at gmail.com Tue Dec 11 05:51:24 2012 From: ioannis87 at gmail.com (ioannis syntychakis) Date: Tue, 11 Dec 2012 11:51:24 +0100 Subject: [Numpy-discussion] unsubscribe Message-ID: -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Dec 12 15:20:04 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 12 Dec 2012 21:20:04 +0100 Subject: [Numpy-discussion] non-integer index misfeature? In-Reply-To: References: Message-ID: On Tue, Dec 11, 2012 at 5:44 PM, Neal Becker wrote: > I think it's a misfeature that a floating point is silently accepted as an > index. I would prefer a warning for: > > bins = np.arange (...) > > for b in bins: > ... > w[b] = blah > > when I meant: > > for ib,b in enumerate (bins): > w[ib] = blah > Agreed. Scipy.special functions were just changed to generate warnings on truncation of float inputs where ints are expected (only if truncation changes the value, so 3.0 is silent and 3.1 is not). For numpy indexing this may not be appropriate though; checking every index value used could slow things down and/or be quite disruptive. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Dec 12 15:48:42 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 12 Dec 2012 20:48:42 +0000 Subject: [Numpy-discussion] non-integer index misfeature? In-Reply-To: References: Message-ID: On Wed, Dec 12, 2012 at 8:20 PM, Ralf Gommers wrote: > > On Tue, Dec 11, 2012 at 5:44 PM, Neal Becker wrote: >> >> I think it's a misfeature that a floating point is silently accepted as an >> index. I would prefer a warning for: >> >> bins = np.arange (...) >> >> for b in bins: >> ... >> w[b] = blah >> >> when I meant: >> >> for ib,b in enumerate (bins): >> w[ib] = blah > > > Agreed. Scipy.special functions were just changed to generate warnings on > truncation of float inputs where ints are expected (only if truncation > changes the value, so 3.0 is silent and 3.1 is not). > > For numpy indexing this may not be appropriate though; checking every index > value used could slow things down and/or be quite disruptive. I doubt this is measurable, and it only affects people who are using floats as indexes, which is a risky thing to be doing in the first place. The only good reason to use floats as indexes is if you're doing floating point arithmetic to calculate indexes -- but now you're going to get bitten as soon as some operation returns N - epsilon instead of N, and gets truncated to N - 1. I'd be +1 for a patch to make numpy warn when indexing with non-integer floats. (Heck, I'd probably be +1 on deprecating allowing floating point numbers as indexes at all... it's risky as heck and reminding people to think about rounding can only be a good thing, given that risk.) -n From d.warde.farley at gmail.com Wed Dec 12 16:09:56 2012 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Wed, 12 Dec 2012 16:09:56 -0500 Subject: [Numpy-discussion] non-integer index misfeature? In-Reply-To: References: Message-ID: On Wed, Dec 12, 2012 at 3:20 PM, Ralf Gommers wrote: > For numpy indexing this may not be appropriate though; checking every index > value used could slow things down and/or be quite disruptive. For array fancy indices, a dtype check on the entire array would suffice. 
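Something along these lines is what I have in mind -- purely a Python-level sketch of the kind of check, not the actual C indexing path, and the helper name here is made up for illustration:

    import warnings
    import numpy as np

    def _checked_index(index):
        # One dtype check per fancy-index array, independent of its size.
        idx = np.asarray(index)
        if idx.dtype.kind == 'f':
            warnings.warn("using a float array as an index is deprecated",
                          DeprecationWarning)
            idx = idx.astype(np.intp)
        return idx

For an integer index array that is a single dtype comparison, so the cost should be negligible next to the indexing itself.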
For lists and scalars, I doubt it'd be substantial amount of overhead compared to the overhead that already exists (for lists, I'm not totally sure whether these are coerced to arrays first -- if they are, the dtype check need only be performed once after that). At any rate, a benchmark of any proposed solution is in order. David From njs at pobox.com Wed Dec 12 16:56:32 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 12 Dec 2012 21:56:32 +0000 Subject: [Numpy-discussion] non-integer index misfeature? In-Reply-To: References: Message-ID: On Wed, Dec 12, 2012 at 9:09 PM, David Warde-Farley wrote: > On Wed, Dec 12, 2012 at 3:20 PM, Ralf Gommers wrote: >> For numpy indexing this may not be appropriate though; checking every index >> value used could slow things down and/or be quite disruptive. > > For array fancy indices, a dtype check on the entire array would > suffice. For lists and scalars, I doubt it'd be substantial amount of > overhead compared to the overhead that already exists (for lists, I'm > not totally sure whether these are coerced to arrays first -- if they > are, the dtype check need only be performed once after that). At any > rate, a benchmark of any proposed solution is in order. The current behaviour seems to be: Scalars are silently cast to int: In [6]: a = np.arange(10) In [7]: a[1.5] Out[7]: 1 So are lists: In [8]: a[[1.5, 2.5]] Out[8]: array([1, 2]) In fact, it looks like lists always get passed through np.array(mylist, dtype=int), because even a list of booleans gets cast to integer: In [21]: a[[False] * 10] Out[21]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) But arrays must have integer type to start with: In [10]: a[np.array([1.5, 2.5])] IndexError: arrays used as indices must be of integer (or boolean) type Complex scalars are also cast (but with a warning): In [13]: a[np.complex64(1.5)] /home/njs/.user-python2.7-64bit/bin/ipython:1: ComplexWarning: Casting complex values to real discards the imaginary part Out[13]: 1 So now I'm even more in favour of making floating point indexes a hard error (with a deprecation period). https://github.com/numpy/numpy/issues/2810 Also we should fix it so that a[list] acts exactly like a[np.array(list)]. (Again with a suitable deprecation period.) The current behaviour is really weird! https://github.com/numpy/numpy/issues/2811 -n P.S. to Neal: none of this is going to help your original code though, because your 'bins' array actually contains integers... From sebastian at sipsolutions.net Wed Dec 12 17:29:01 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 12 Dec 2012 23:29:01 +0100 Subject: [Numpy-discussion] non-integer index misfeature? In-Reply-To: References: Message-ID: <1355351341.2609.11.camel@sebastian-laptop> On Wed, 2012-12-12 at 20:48 +0000, Nathaniel Smith wrote: > On Wed, Dec 12, 2012 at 8:20 PM, Ralf Gommers wrote: > > > > On Tue, Dec 11, 2012 at 5:44 PM, Neal Becker wrote: > >> > >> I think it's a misfeature that a floating point is silently accepted as an > >> index. I would prefer a warning for: > >> > >> bins = np.arange (...) > >> > >> for b in bins: > >> ... > >> w[b] = blah > >> > >> when I meant: > >> > >> for ib,b in enumerate (bins): > >> w[ib] = blah > > > > > > Agreed. Scipy.special functions were just changed to generate warnings on > > truncation of float inputs where ints are expected (only if truncation > > changes the value, so 3.0 is silent and 3.1 is not). 
> > > > For numpy indexing this may not be appropriate though; checking every index > > value used could slow things down and/or be quite disruptive. > > I doubt this is measurable, and it only affects people who are using > floats as indexes, which is a risky thing to be doing in the first > place. The only good reason to use floats as indexes is if you're > doing floating point arithmetic to calculate indexes -- but now you're > going to get bitten as soon as some operation returns N - epsilon > instead of N, and gets truncated to N - 1. > > I'd be +1 for a patch to make numpy warn when indexing with > non-integer floats. (Heck, I'd probably be +1 on deprecating allowing > floating point numbers as indexes at all... it's risky as heck and > reminding people to think about rounding can only be a good thing, > given that risk.) > Personally +1 on just deprecating that stuff in the long run. Just if someone is interested I remember seeing this comment (which applies for the scalar case): /* * PyNumber_Index was introduced in Python 2.5 because of NumPy. * http://www.python.org/dev/peps/pep-0357/ * Let's use it for indexing! * * Unfortunately, SciPy and possibly other code seems to rely * on the lenient coercion. :( */ #if 0 /*PY_VERSION_HEX >= 0x02050000*/ PyObject *ind = PyNumber_Index(op); if (ind != NULL) { value = PyArray_PyIntAsIntp(ind); Py_DECREF(ind); } else { value = -1; } #else and is is somewhat related. But with a long deprecation process switching to using `__index__` would seem possible to me. Regards, Sebastian > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ndbecker2 at gmail.com Thu Dec 13 06:58:09 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 13 Dec 2012 06:58:09 -0500 Subject: [Numpy-discussion] non-integer index misfeature? References: <1355351341.2609.11.camel@sebastian-laptop> Message-ID: I'd be happy with disallowing floating point index at all. I would think it was almost always a mistake. From charlesr.harris at gmail.com Thu Dec 13 11:34:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Dec 2012 09:34:53 -0700 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 Message-ID: Time to raise this topic again. Opinions welcome. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Thu Dec 13 11:35:01 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Thu, 13 Dec 2012 10:35:01 -0600 Subject: [Numpy-discussion] www.numpy.org home page Message-ID: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> For people interested in the www.numpy.org home page: Jon Turner has officially transferred the www.numpy.org domain to NumFOCUS. Thank you, Jon for this donation and for being a care-taker of the domain-name. We have setup the domain registration to point to numpy.github.com and I've changed the CNAME in that repostiory to www.numpy.org I've sent an email to have the numpy.scipy.org page to redirect to www.numpy.org. The NumPy home page can still be edited in this repository: git at github.com:numpy/numpy.org.git. Pull requests are always welcome --- especially pull requests that improve the look and feel of the web-page. 
Two of the content changes that we need to make a decision about is 1) whether or not to put links to books published (Packt publishing for example has offered a higher percentage of their revenues if we put a prominent link on www.numpy.org) 2) whether or not to accept "Sponsored by" links on the home page for donations to the project (e.g. Continuum Analytics has sponsored Ondrej release management, other companies have sponsored pull requests, other companies may want to provide donations and we would want to recognize their contributions to the numpy project). These decisions should be made by the NumPy community which in my mind are interested people on this list. Who is interested in this kind of discussion? We could have these discussions on this list or on the numfocus at googlegroups.com list and keep this list completely technical (which I prefer, but I will do whatever the consensus is). Best regards, -Travis From travis at continuum.io Thu Dec 13 11:36:49 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 13 Dec 2012 10:36:49 -0600 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: A big +1 from me --- but I don't have anyone I know using 2.4 anymore.... -Travis On Dec 13, 2012, at 10:34 AM, Charles R Harris wrote: > Time to raise this topic again. Opinions welcome. > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Thu Dec 13 11:39:55 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 13 Dec 2012 11:39:55 -0500 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: As a point of reference, python 2.4 is on RH5/CentOS5. While RH6 is the current version, there are still enterprises that are using version 5. Of course, at this point, one really should be working on a migration plan and shouldn't be doing new development on those machines... Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Thu Dec 13 11:46:54 2012 From: scopatz at gmail.com (Anthony Scopatz) Date: Thu, 13 Dec 2012 10:46:54 -0600 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: +1, if someone wants to use an older version of Python they can use an older version of numpy. On Thu, Dec 13, 2012 at 10:36 AM, Travis Oliphant wrote: > A big +1 from me --- but I don't have anyone I know using 2.4 anymore.... > > -Travis > > On Dec 13, 2012, at 10:34 AM, Charles R Harris wrote: > > > Time to raise this topic again. Opinions welcome. > > > > Chuck > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Thu Dec 13 11:49:28 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Thu, 13 Dec 2012 11:49:28 -0500 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: +1, especially if this is for 1.8. 
What is the plan for when it will be released? It is 1.7 that will be the long term supported version? Fred On Thu, Dec 13, 2012 at 11:46 AM, Anthony Scopatz wrote: > +1, if someone wants to use an older version of Python they can use an older > version of numpy. > > > On Thu, Dec 13, 2012 at 10:36 AM, Travis Oliphant > wrote: >> >> A big +1 from me --- but I don't have anyone I know using 2.4 anymore.... >> >> -Travis >> >> On Dec 13, 2012, at 10:34 AM, Charles R Harris wrote: >> >> > Time to raise this topic again. Opinions welcome. >> > >> > Chuck >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Thu Dec 13 12:00:12 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 13 Dec 2012 18:00:12 +0100 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 5:34 PM, Charles R Harris wrote: > Time to raise this topic again. Opinions welcome. I am ok if 1.7 is the LTS. I would even go as far as dropping 2.5 as well then (RHEL 6 uses python 2.6). cheers, David From jsseabold at gmail.com Thu Dec 13 12:03:14 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 13 Dec 2012 12:03:14 -0500 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 12:00 PM, David Cournapeau wrote: > I would even go as far as dropping 2.5 as well then (RHEL 6 > uses python 2.6). +1 Skipper From sturla at molden.no Thu Dec 13 12:11:41 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 13 Dec 2012 18:11:41 +0100 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: <4E1C143A-B952-4716-BDBA-1A91A3EAD82C@molden.no> Yes, and ditto for SciPy. With dropped 2.4 support we can also use the new memoryview syntax instead of ndarray syntax in Cython. That is more important for SciPy, but it has some relevance for NumPy too. Sturla Sendt fra min iPad Den 13. des. 2012 kl. 17:34 skrev Charles R Harris : > Time to raise this topic again. Opinions welcome. > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From brad.froehle at gmail.com Thu Dec 13 12:12:38 2012 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Thu, 13 Dec 2012 09:12:38 -0800 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: Targeting >= 2.6 would be preferable to me. Several other packages including IPython, support only Python >= 2.6, >= 3.2. This change would help me from accidentally writing Python syntax which is allowable in 2.6 & 2.7 (but not in 2.4 or 2.5). Compiling a newer Python interpreter isn't very hard? probably about as difficult as installing NumPy. 
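For what it's worth, the usual offenders look something like this (fine on 2.6/2.7, broken on 2.4/2.5) -- just an illustrative snippet:

    def risky():
        raise ValueError("boom")

    try:
        risky()
    except ValueError as exc:        # 2.4/2.5 only accept "except ValueError, exc:"
        print(exc)

    data = b"raw bytes"              # b"" literals appeared in 2.6
    msg = "{0} arrays".format(3)     # str.format() appeared in 2.6

The first two are outright SyntaxErrors on 2.5; the last one fails at runtime.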
-Brad On Thursday, December 13, 2012 at 9:03 AM, Skipper Seabold wrote: > On Thu, Dec 13, 2012 at 12:00 PM, David Cournapeau wrote: > > > I would even go as far as dropping 2.5 as well then (RHEL 6 > > uses python 2.6). > > +1 From charlesr.harris at gmail.com Thu Dec 13 12:36:56 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Dec 2012 10:36:56 -0700 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 10:12 AM, Bradley M. Froehle wrote: > Targeting >= 2.6 would be preferable to me. Several other packages > including IPython, support only Python >= 2.6, >= 3.2. > > This change would help me from accidentally writing Python syntax which is > allowable in 2.6 & 2.7 (but not in 2.4 or 2.5). > > Compiling a newer Python interpreter isn't very hard? probably about as > difficult as installing NumPy. > > -Brad > > > On Thursday, December 13, 2012 at 9:03 AM, Skipper Seabold wrote: > > > On Thu, Dec 13, 2012 at 12:00 PM, David Cournapeau cournape at gmail.com)> wrote: > > > > > I would even go as far as dropping 2.5 as well then (RHEL 6 > > > uses python 2.6). > > > > +1 > > OK. Dropping support for python 2.4 looks like the majority opinion. I'll put up another post for 2.5. Do we need to coordinate with scipy? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Dec 13 12:38:39 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Dec 2012 10:38:39 -0700 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? Message-ID: The previous proposal to drop python 2.4 support garnered no opposition. How about dropping support for python 2.5 also? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jniehof at lanl.gov Thu Dec 13 12:39:29 2012 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Thu, 13 Dec 2012 10:39:29 -0700 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: <50CA12D1.6030406@lanl.gov> On 12/13/2012 09:39 AM, Benjamin Root wrote: > As a point of reference, python 2.4 is on RH5/CentOS5. While RH6 is the > current version, there are still enterprises that are using version 5. > Of course, at this point, one really should be working on a migration > plan and shouldn't be doing new development on those machines... FWIW this RHEL5* shop uses a local install of 2.6 rather than dealing with 2.4. Happy with dropping 2.4 support. Bonus from dropping 2.5: it's pretty easy to support a 2.6/3.x combined codebase without relying on 2to3. (*Not the same as RH5, from days of yore...) -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From shish at keba.be Thu Dec 13 12:46:00 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 13 Dec 2012 12:46:00 -0500 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: I'd say it's a good idea, although I hope 1.7.x will still be maintained for a while for those who are still stuck with Python 2.4-5 (sometimes you don't have a choice). -=- Olivier 2012/12/13 Charles R Harris > The previous proposal to drop python 2.4 support garnered no opposition. > How about dropping support for python 2.5 also? 
> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Thu Dec 13 13:04:23 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 13 Dec 2012 19:04:23 +0100 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: <50C23C52.3030204@astro.uio.no> References: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> <7C5D3880ABAC4B1AB19214CDCBA2227A@gmail.com> <50C23C52.3030204@astro.uio.no> Message-ID: <50CA18A7.3030703@astro.uio.no> On 12/07/2012 07:58 PM, Dag Sverre Seljebotn wrote: > One way of fixing this I'm sort of itching to do is to create a > "pylapack" project which can iterate quickly on these build issues, > run-time selection of LAPACK backend and so on. (With some templates > generating some Cython code it shouldn't be more than a few days for an > MVP.) > > Then patch NumPy to attempt to import pylapack (and if it's there, get a > table of function pointers from it). > > The idea would be that powerusers could more easily build pylapack the > way they wanted, in isolation, and then have other things switch to that > without a rebuild (note again that it would need to export a table of > function pointers). > > But I'm itching to do too many things, we'll see. Update: This can be tackled as part of Hashdist funding, so I'm hoping that I can 3-4 days on improving the LAPACK situation in January; as an option for power-users at first, hopefully something nice for everybody eventually. Opinions on the plan above welcome. Dag Sverre > > Dag Sverre > > On 12/07/2012 07:09 PM, Bradley M. Froehle wrote: >> Aha, thanks for the clarification. I've always been surpassed that NumPy doesn't ship with a copy of CBLAS. It's easy to compile --- just a thin wrapper over BLAS, if I remember correctly. >> >> -Brad >> >> >> On Friday, December 7, 2012 at 4:01 AM, David Cournapeau wrote: >> >>> On Thu, Dec 6, 2012 at 7:35 PM, Bradley M. Froehle >>> wrote: >>>> Right, but if I link to libcblas, cblas would be available, no? >>> >>> >>> >>> No, because we don't explicitly check for CBLAS. We assume it is there >>> if Atlas, Accelerate or MKL is found. >>> >>> cheers, >>> David >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ben.root at ou.edu Thu Dec 13 13:12:23 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 13 Dec 2012 13:12:23 -0500 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 12:38 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > The previous proposal to drop python 2.4 support garnered no opposition. > How about dropping support for python 2.5 also? > > Chuck > > matplotlib 1.2 supports py2.5. I haven't seen any plan to move off of that for 1.3. Is there a compelling reason for dropping 2.5? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben.root at ou.edu Thu Dec 13 13:14:54 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 13 Dec 2012 13:14:54 -0500 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: My apologies... we support 2.6 and above. +1 on dropping 2.5 support. Ben On Thu, Dec 13, 2012 at 1:12 PM, Benjamin Root wrote: > On Thu, Dec 13, 2012 at 12:38 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> The previous proposal to drop python 2.4 support garnered no opposition. >> How about dropping support for python 2.5 also? >> >> Chuck >> >> > matplotlib 1.2 supports py2.5. I haven't seen any plan to move off of > that for 1.3. Is there a compelling reason for dropping 2.5? > > Ben Root > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Thu Dec 13 13:57:42 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 13 Dec 2012 19:57:42 +0100 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 7:12 PM, Benjamin Root wrote: > On Thu, Dec 13, 2012 at 12:38 PM, Charles R Harris > wrote: >> >> The previous proposal to drop python 2.4 support garnered no opposition. >> How about dropping support for python 2.5 also? >> >> Chuck >> > > matplotlib 1.2 supports py2.5. I haven't seen any plan to move off of that > for 1.3. Is there a compelling reason for dropping 2.5? A rationale for the record: I don't think people who don't care about 2.4 care about 2.5, and 2.6 is a significant improvement compared to 2.5: - context manager - python 3-compatible exception syntax (writing code that works with 2 and 3 without any change is significantly easier if your baseline in 2.6 instead of 2.4/2.5) - json, ast, multiprocessing are available and potentially quite useful for NumPy itself. cheers, David From cournape at gmail.com Thu Dec 13 13:59:34 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 13 Dec 2012 19:59:34 +0100 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: <50C23C52.3030204@astro.uio.no> References: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> <7C5D3880ABAC4B1AB19214CDCBA2227A@gmail.com> <50C23C52.3030204@astro.uio.no> Message-ID: On Fri, Dec 7, 2012 at 7:58 PM, Dag Sverre Seljebotn wrote: > One way of fixing this I'm sort of itching to do is to create a > "pylapack" project which can iterate quickly on these build issues, > run-time selection of LAPACK backend and so on. (With some templates > generating some Cython code it shouldn't be more than a few days for an > MVP.) > > Then patch NumPy to attempt to import pylapack (and if it's there, get a > table of function pointers from it). > > The idea would be that powerusers could more easily build pylapack the > way they wanted, in isolation, and then have other things switch to that > without a rebuild (note again that it would need to export a table of > function pointers). > > But I'm itching to do too many things, we'll see. It would be hard to support in a cross platform way I think (windows being the elephant in the room). 
I would be more than happy to be proven wrong, though :) cheers, David From d.s.seljebotn at astro.uio.no Thu Dec 13 14:10:28 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 13 Dec 2012 20:10:28 +0100 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration In-Reply-To: References: <20121206132908.769707dd@poetzsch.nat.uni-magdeburg.de> <7C5D3880ABAC4B1AB19214CDCBA2227A@gmail.com> <50C23C52.3030204@astro.uio.no> Message-ID: <50CA2824.1050200@astro.uio.no> On 12/13/2012 07:59 PM, David Cournapeau wrote: > On Fri, Dec 7, 2012 at 7:58 PM, Dag Sverre Seljebotn > wrote: >> One way of fixing this I'm sort of itching to do is to create a >> "pylapack" project which can iterate quickly on these build issues, >> run-time selection of LAPACK backend and so on. (With some templates >> generating some Cython code it shouldn't be more than a few days for an >> MVP.) >> >> Then patch NumPy to attempt to import pylapack (and if it's there, get a >> table of function pointers from it). >> >> The idea would be that powerusers could more easily build pylapack the >> way they wanted, in isolation, and then have other things switch to that >> without a rebuild (note again that it would need to export a table of >> function pointers). >> >> But I'm itching to do too many things, we'll see. > > It would be hard to support in a cross platform way I think (windows > being the elephant in the room). I would be more than happy to be > proven wrong, though :) Right, perhaps I should have said "Linux users" rather than "power users" :-) But seriously, I'd like to have a few more details here. Is it something about calling conventions and "libc" and so on that makes the function-pointer-table approach difficult? (But the NumPy C API is exported in the same way?) Or is it about the build? Can't one just lift whatever numpy.distutils or the NumPy Bento build does for detection? Though what I'll aim for at first is the complete opposite, no auto-detection, but a Windows user should also be able to specify the right flags manually... Dag Sverre From philip at semanchuk.com Thu Dec 13 14:26:23 2012 From: philip at semanchuk.com (Philip Semanchuk) Date: Thu, 13 Dec 2012 14:26:23 -0500 Subject: [Numpy-discussion] Calling LAPACK function dbdsqr()? Message-ID: <9854A6C5-8458-4158-A999-42EA363D1A51@semanchuk.com> Hi all, I'm porting some Fortran code that makes use of a number of BLAS and LAPACK functions, including dbdsqr(). I've found all of the functions I need via scipy.linalg.lapack.get_lapack_funcs/get_blas_funcs() except for dbdsqr(). I see that the numpy source code (I looked at numpy-1.6.0b2) contains dbdsqr() in numpy/linalg/dlapack_lite.c, but grep can't find that string in the binary distribution on my Mac nor on Linux. If it's buried in a numpy binary somewhere, I'm comfortable with using ctypes to call it, but I suspect it isn't. Can anyone point me to a cross-platform (OS X, Linux & Windows) way I can call this function? I'm unfortunately quite na?ve about the math in the code I'm porting, so I'm porting the code blindly -- if you ask me what problem I'm trying to solve with dbdsqr(), I won't be able to explain. Thanks in advance for any suggestions, Philip From charlesr.harris at gmail.com Thu Dec 13 14:47:30 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Dec 2012 12:47:30 -0700 Subject: [Numpy-discussion] Calling LAPACK function dbdsqr()? 
In-Reply-To: <9854A6C5-8458-4158-A999-42EA363D1A51@semanchuk.com> References: <9854A6C5-8458-4158-A999-42EA363D1A51@semanchuk.com> Message-ID: On Thu, Dec 13, 2012 at 12:26 PM, Philip Semanchuk wrote: > Hi all, > I'm porting some Fortran code that makes use of a number of BLAS and > LAPACK functions, including dbdsqr(). I've found all of the functions I > need via scipy.linalg.lapack.get_lapack_funcs/get_blas_funcs() except for > dbdsqr(). > > I see that the numpy source code (I looked at numpy-1.6.0b2) contains > dbdsqr() in numpy/linalg/dlapack_lite.c, but grep can't find that string in > the binary distribution on my Mac nor on Linux. If it's buried in a numpy > binary somewhere, I'm comfortable with using ctypes to call it, but I > suspect it isn't. > > Can anyone point me to a cross-platform (OS X, Linux & Windows) way I can > call this function? > > I'm unfortunately quite na?ve about the math in the code I'm porting, so > I'm porting the code blindly -- if you ask me what problem I'm trying to > solve with dbdsqr(), I won't be able to explain. > > Not all the functions in lapack_lite are exposed in numpy. You might want to post on the scipy list, there have been recent additions supporting more lapack functions. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Dec 13 15:03:56 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 13 Dec 2012 22:03:56 +0200 Subject: [Numpy-discussion] Calling LAPACK function dbdsqr()? In-Reply-To: <9854A6C5-8458-4158-A999-42EA363D1A51@semanchuk.com> References: <9854A6C5-8458-4158-A999-42EA363D1A51@semanchuk.com> Message-ID: 13.12.2012 21:26, Philip Semanchuk kirjoitti: > I'm porting some Fortran code that makes use of a number of BLAS and > LAPACK functions, including dbdsqr(). I've found all of the functions > I need via scipy.linalg.lapack.get_lapack_funcs/get_blas_funcs() except > for dbdsqr(). [clip] If you tolerate having compiled code, you can relatively easily wrap the LAPACK routine you need using f2py. -- Pauli Virtanen From ralf.gommers at gmail.com Thu Dec 13 15:17:49 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 13 Dec 2012 21:17:49 +0100 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 6:36 PM, Charles R Harris wrote: > > > On Thu, Dec 13, 2012 at 10:12 AM, Bradley M. Froehle < > brad.froehle at gmail.com> wrote: > >> Targeting >= 2.6 would be preferable to me. Several other packages >> including IPython, support only Python >= 2.6, >= 3.2. >> >> This change would help me from accidentally writing Python syntax which >> is allowable in 2.6 & 2.7 (but not in 2.4 or 2.5). >> >> Compiling a newer Python interpreter isn't very hard? probably about as >> difficult as installing NumPy. >> >> -Brad >> >> >> On Thursday, December 13, 2012 at 9:03 AM, Skipper Seabold wrote: >> >> > On Thu, Dec 13, 2012 at 12:00 PM, David Cournapeau > cournape at gmail.com)> wrote: >> > >> > > I would even go as far as dropping 2.5 as well then (RHEL 6 >> > > uses python 2.6). >> > >> > +1 >> > +1 > > OK. Dropping support for python 2.4 looks like the majority opinion. I'll > put up another post for 2.5. Do we need to coordinate with scipy? > Not much to coordinate I think. I'll send a message to scipy-dev proposing to simply follow the Numpy decision. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From philip at semanchuk.com Thu Dec 13 15:34:08 2012 From: philip at semanchuk.com (Philip Semanchuk) Date: Thu, 13 Dec 2012 15:34:08 -0500 Subject: [Numpy-discussion] Calling LAPACK function dbdsqr()? In-Reply-To: References: <9854A6C5-8458-4158-A999-42EA363D1A51@semanchuk.com> Message-ID: <19E5F883-2F88-47E1-9860-0D389C3F2110@semanchuk.com> On Dec 13, 2012, at 3:03 PM, Pauli Virtanen wrote: > 13.12.2012 21:26, Philip Semanchuk kirjoitti: >> I'm porting some Fortran code that makes use of a number of BLAS and >> LAPACK functions, including dbdsqr(). I've found all of the functions >> I need via scipy.linalg.lapack.get_lapack_funcs/get_blas_funcs() except >> for dbdsqr(). > [clip] > > If you tolerate having compiled code, you can relatively easily wrap the > LAPACK routine you need using f2py. Hi Pauli, THanks for the tip. We already have access to a compiled version, but maintaining that across 3 platforms and 32/64-bit is a headache, which is why we're aiming for pure Python plus numpy/scipy. bye Philip From philip at semanchuk.com Thu Dec 13 15:40:34 2012 From: philip at semanchuk.com (Philip Semanchuk) Date: Thu, 13 Dec 2012 15:40:34 -0500 Subject: [Numpy-discussion] Calling LAPACK function dbdsqr()? In-Reply-To: References: <9854A6C5-8458-4158-A999-42EA363D1A51@semanchuk.com> Message-ID: <605A5ECB-FD91-4101-B5D8-8AE225866870@semanchuk.com> On Dec 13, 2012, at 2:47 PM, Charles R Harris wrote: > On Thu, Dec 13, 2012 at 12:26 PM, Philip Semanchuk wrote: > >> Hi all, >> I'm porting some Fortran code that makes use of a number of BLAS and >> LAPACK functions, including dbdsqr(). I've found all of the functions I >> need via scipy.linalg.lapack.get_lapack_funcs/get_blas_funcs() except for >> dbdsqr(). >> >> I see that the numpy source code (I looked at numpy-1.6.0b2) contains >> dbdsqr() in numpy/linalg/dlapack_lite.c, but grep can't find that string in >> the binary distribution on my Mac nor on Linux. If it's buried in a numpy >> binary somewhere, I'm comfortable with using ctypes to call it, but I >> suspect it isn't. >> >> Can anyone point me to a cross-platform (OS X, Linux & Windows) way I can >> call this function? >> >> I'm unfortunately quite na?ve about the math in the code I'm porting, so >> I'm porting the code blindly -- if you ask me what problem I'm trying to >> solve with dbdsqr(), I won't be able to explain. >> >> > Not all the functions in lapack_lite are exposed in numpy. You might want > to post on the scipy list, there have been recent additions supporting more > lapack functions. Thanks, I'll check scipy. In numpy, I can see that lapack_litemodule.c is the layer that exposes select lapack functions via the Python interface, and dbdsqr isn't present in that file which is why I figured I would have to use ctypes to call it. I just can't figure out why dbdsqr appears in the source code but neither grep nor nm find any reference to it in the compiled binary. Cheers Philip From chris.barker at noaa.gov Thu Dec 13 16:07:56 2012 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 13 Dec 2012 13:07:56 -0800 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: >>> How about dropping support for python 2.5 also? I"m still dumfounded that people are working on projects where they are free to use the latest an greatest numpy, but *have* to use a more-than-four-year-old-python: """ Python 2.6 (final) was released on October 1st, 2008. """ so +1 on moving forward! 
-Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From andrew.collette at gmail.com Thu Dec 13 16:57:23 2012 From: andrew.collette at gmail.com (Andrew Collette) Date: Thu, 13 Dec 2012 14:57:23 -0700 Subject: [Numpy-discussion] ValueError: low level cast function is for unequal type numbers In-Reply-To: <50C1A22D.7050209@uci.edu> References: <50C1A22D.7050209@uci.edu> Message-ID: Hi, > the following code using np.object_ data types works with numpy 1.5.1 > but fails with 1.6.2. Is this intended or a regression? Other data > types, np.float64 for example, seem to work. I am also seeing this problem; there was a change to how string types are handled in h5py 2.1.0 which triggers this bug. It's a serious inconvenience, as people can't do e.g. np.copy() any more. > These downstream issues could be related: > > http://code.google.com/p/h5py/issues/detail?id=217 Yes, this seems to be the cause of issue 217. Andrew From cournape at gmail.com Thu Dec 13 17:02:25 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 13 Dec 2012 23:02:25 +0100 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 10:07 PM, Chris Barker - NOAA Federal wrote: >>>> How about dropping support for python 2.5 also? > > I"m still dumfounded that people are working on projects where they > are free to use the latest an greatest numpy, but *have* to use a > more-than-four-year-old-python: It happens very easily in corporate environments. Compiling python it a major headache compared to numpy, not because of python itself, but because you need to recompile every single extension you're gonna use. Compiling numpy + its dependencies is at least feasable, but building things like pygtk when you don't have X11 headers installed is more or less impossible. David From cgohlke at uci.edu Thu Dec 13 17:02:45 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 13 Dec 2012 14:02:45 -0800 Subject: [Numpy-discussion] ValueError: low level cast function is for unequal type numbers In-Reply-To: References: <50C1A22D.7050209@uci.edu> Message-ID: <50CA5085.5080606@uci.edu> On 12/13/2012 1:57 PM, Andrew Collette wrote: > Hi, > >> the following code using np.object_ data types works with numpy 1.5.1 >> but fails with 1.6.2. Is this intended or a regression? Other data >> types, np.float64 for example, seem to work. > > I am also seeing this problem; there was a change to how string types > are handled in h5py 2.1.0 which triggers this bug. It's a serious > inconvenience, as people can't do e.g. np.copy() > any more. > >> These downstream issues could be related: >> >> http://code.google.com/p/h5py/issues/detail?id=217 > > Yes, this seems to be the cause of issue 217. > > Andrew Sorry, I forgot to mention that I submitted a pull request at . Christoph From brad.froehle at gmail.com Thu Dec 13 18:01:38 2012 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Thu, 13 Dec 2012 15:01:38 -0800 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: Yes, but the point was that since you can live with an older version on Python you can probably live with an older version of NumPy. 
On Thursday, December 13, 2012, David Cournapeau wrote: > > > I"m still dumfounded that people are working on projects where they > > are free to use the latest an greatest numpy, but *have* to use a > > more-than-four-year-old-python: > > It happens very easily in corporate environments. Compiling python it > a major headache compared to numpy, not because of python itself, but > because you need to recompile every single extension you're gonna use. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Thu Dec 13 19:39:12 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 13 Dec 2012 16:39:12 -0800 Subject: [Numpy-discussion] Travis failures with no errors Message-ID: Hi, I found these recent weird "failures" in Travis, but I can't find any problem with the log and all tests pass. Any ideas what is going on? https://travis-ci.org/numpy/numpy/jobs/3570123 https://travis-ci.org/numpy/numpy/jobs/3539549 https://travis-ci.org/numpy/numpy/jobs/3369629 Ondrej From chris.barker at noaa.gov Thu Dec 13 19:41:28 2012 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 13 Dec 2012 16:41:28 -0800 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 3:01 PM, Bradley M. Froehle wrote: > Yes, but the point was that since you can live with an older version on > Python you can probably live with an older version of NumPy. exactly -- also: How likely are you to nee the latest and greatest numpy but not a new PyGTK, or a new name_your_package_here. And, in fact, other packages drop support for older Python's too. However, what I can imagine is pretty irrelevant -- sorry I brought it up -- either there are a significant number of folks for whom support for old Pythons in important, or there aren't. -Chris > > On Thursday, December 13, 2012, David Cournapeau wrote: >> >> > I"m still dumfounded that people are working on projects where they >> > are free to use the latest an greatest numpy, but *have* to use a >> > more-than-four-year-old-python: >> >> It happens very easily in corporate environments. Compiling python it >> a major headache compared to numpy, not because of python itself, but >> because you need to recompile every single extension you're gonna use. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ondrej.certik at gmail.com Thu Dec 13 21:23:40 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 13 Dec 2012 18:23:40 -0800 Subject: [Numpy-discussion] Failure in test_iterator.py at Travis Message-ID: Hi, Another weird bug sometimes happen in numpy/core/tests/test_iterator.py, it looks like this: ====================================================================== FAIL: test_iterator.test_iter_array_cast ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/nose/case.py", line 197, in runTest self.test(*self.arg) File "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py", line 836, in test_iter_array_cast assert_equal(i.operands[0].strides, (-96,8,-32)) File "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", line 252, in assert_equal assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), verbose) File "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", line 314, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: item=0 ACTUAL: 96 DESIRED: -96 ---------------------------------------------------------------------- But the problem is that there is no numpy/core/tests/test_iterator.py file in current branches.... This error was triggered for example by these PRs: https://github.com/numpy/numpy/pull/2765 https://github.com/numpy/numpy/pull/2815 and here are links to the failing Travis tests: https://travis-ci.org/certik/numpy/builds/3656959 https://travis-ci.org/numpy/numpy/builds/3330234 Any idea what is happening here and how to fix it? 
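(My best guess, which I haven't verified yet, is that the virtualenv is picking up a stale numpy instead of the freshly built one -- something as simple as this, run inside the Travis job, should tell us:

    import numpy
    print(numpy.__version__)
    print(numpy.__file__)    # should point at the just-installed build, not a leftover older install

)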
Ondrej From charlesr.harris at gmail.com Thu Dec 13 22:04:59 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Dec 2012 20:04:59 -0700 Subject: [Numpy-discussion] Failure in test_iterator.py at Travis In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 7:23 PM, Ond?ej ?ert?k wrote: > Hi, > > Another weird bug sometimes happen in > numpy/core/tests/test_iterator.py, it looks like this: > > ====================================================================== > FAIL: test_iterator.test_iter_array_cast > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/nose/case.py", > line 197, in runTest > self.test(*self.arg) > File > "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py", > line 836, in test_iter_array_cast > assert_equal(i.operands[0].strides, (-96,8,-32)) > File > "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", > line 252, in assert_equal > assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), > verbose) > File > "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", > line 314, in assert_equal > raise AssertionError(msg) > AssertionError: > Items are not equal: > item=0 > > ACTUAL: 96 > DESIRED: -96 > > ---------------------------------------------------------------------- > > > But the problem is that there is no numpy/core/tests/test_iterator.py > file in current branches.... This error was triggered for example by > these PRs: > > https://github.com/numpy/numpy/pull/2765 > https://github.com/numpy/numpy/pull/2815 > > and here are links to the failing Travis tests: > > https://travis-ci.org/certik/numpy/builds/3656959 > https://travis-ci.org/numpy/numpy/builds/3330234 > > > Any idea what is happening here and how to fix it? > > That should have been fixed by Nathaniel's travis fix. Hmm... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Thu Dec 13 22:16:55 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Dec 2012 20:16:55 -0700 Subject: [Numpy-discussion] Failure in test_iterator.py at Travis In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 8:04 PM, Charles R Harris wrote: > > > On Thu, Dec 13, 2012 at 7:23 PM, Ond?ej ?ert?k wrote: > >> Hi, >> >> Another weird bug sometimes happen in >> numpy/core/tests/test_iterator.py, it looks like this: >> >> ====================================================================== >> FAIL: test_iterator.test_iter_array_cast >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/nose/case.py", >> line 197, in runTest >> self.test(*self.arg) >> File >> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py", >> line 836, in test_iter_array_cast >> assert_equal(i.operands[0].strides, (-96,8,-32)) >> File >> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", >> line 252, in assert_equal >> assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), >> verbose) >> File >> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", >> line 314, in assert_equal >> raise AssertionError(msg) >> AssertionError: >> Items are not equal: >> item=0 >> >> ACTUAL: 96 >> DESIRED: -96 >> >> ---------------------------------------------------------------------- >> >> >> But the problem is that there is no numpy/core/tests/test_iterator.py >> file in current branches.... This error was triggered for example by >> these PRs: >> >> https://github.com/numpy/numpy/pull/2765 >> https://github.com/numpy/numpy/pull/2815 >> >> and here are links to the failing Travis tests: >> >> https://travis-ci.org/certik/numpy/builds/3656959 >> https://travis-ci.org/numpy/numpy/builds/3330234 >> >> >> Any idea what is happening here and how to fix it? >> >> > That should have been fixed by Nathaniel's travis fix. Hmm... > > And a quick check here shows pip not removing a previous 1.6.2 install. Maybe it is a pip version problem, pip 1.0.2 here. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ondrej.certik at gmail.com Thu Dec 13 22:49:20 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 13 Dec 2012 19:49:20 -0800 Subject: [Numpy-discussion] Failure in test_iterator.py at Travis In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 7:16 PM, Charles R Harris wrote: > > > On Thu, Dec 13, 2012 at 8:04 PM, Charles R Harris > wrote: >> >> >> >> On Thu, Dec 13, 2012 at 7:23 PM, Ond?ej ?ert?k >> wrote: >>> >>> Hi, >>> >>> Another weird bug sometimes happen in >>> numpy/core/tests/test_iterator.py, it looks like this: >>> >>> ====================================================================== >>> FAIL: test_iterator.test_iter_array_cast >>> ---------------------------------------------------------------------- >>> Traceback (most recent call last): >>> File >>> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/nose/case.py", >>> line 197, in runTest >>> self.test(*self.arg) >>> File >>> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py", >>> line 836, in test_iter_array_cast >>> assert_equal(i.operands[0].strides, (-96,8,-32)) >>> File >>> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", >>> line 252, in assert_equal >>> assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), >>> verbose) >>> File >>> "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", >>> line 314, in assert_equal >>> raise AssertionError(msg) >>> AssertionError: >>> Items are not equal: >>> item=0 >>> >>> ACTUAL: 96 >>> DESIRED: -96 >>> >>> ---------------------------------------------------------------------- >>> >>> >>> But the problem is that there is no numpy/core/tests/test_iterator.py >>> file in current branches.... This error was triggered for example by >>> these PRs: >>> >>> https://github.com/numpy/numpy/pull/2765 >>> https://github.com/numpy/numpy/pull/2815 >>> >>> and here are links to the failing Travis tests: >>> >>> https://travis-ci.org/certik/numpy/builds/3656959 >>> https://travis-ci.org/numpy/numpy/builds/3330234 >>> >>> >>> Any idea what is happening here and how to fix it? >>> >> >> That should have been fixed by Nathaniel's travis fix. Hmm... >> > > And a quick check here shows pip not removing a previous 1.6.2 install. > Maybe it is a pip version problem, pip 1.0.2 here. That's what I thought as well. I think we need to update the Travis script to manually remove the installed numpy. Btw, Travis seems to be using pip 1.2.1, at least according to: https://travis-ci.org/numpy/numpy/jobs/3330236 Ondrej From shish at keba.be Thu Dec 13 23:00:48 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 13 Dec 2012 23:00:48 -0500 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: 2012/12/13 Chris Barker - NOAA Federal > On Thu, Dec 13, 2012 at 3:01 PM, Bradley M. Froehle > wrote: > > Yes, but the point was that since you can live with an older version on > > Python you can probably live with an older version of NumPy. > > exactly -- also: > > How likely are you to nee the latest and greatest numpy but not a new > PyGTK, or a new name_your_package_here. And, in fact, other packages > drop support for older Python's too. > > However, what I can imagine is pretty irrelevant -- sorry I brought it > up -- either there are a significant number of folks for whom support > for old Pythons in important, or there aren't. 
> I doubt it's a common situation, but just to give an example: I am developing some machine learning code that heavily relies on Numpy, and it is meant to run into a large Python 2.4 software environment, which can't easily be upgraded because it contains lots of libraries that have been built against Python 2.4. And even if I could rebuild it, they wouldn't let me ;) This Python code is mostly proprietary and doesn't require external dependencies to be upgraded... except my little module that may take advantage of Numpy improvements. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From raul at virtualmaterials.com Thu Dec 13 23:14:30 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Thu, 13 Dec 2012 21:14:30 -0700 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: <50CAA7A6.6030208@virtualmaterials.com> +1 from me For what is worth, we are just moving forward from Python 2.2 / Numeric and are going to 2.6 and it has been rather painful because of the several little details of extensions and other subtleties. I believe we will settle there for a while. For companies like ours, it is a big problem to upgrade versions. There is always this or that hiccup that "works great" in a version but not so much in another and we also have all sorts of extensions. Raul On 13/12/2012 9:34 AM, Charles R Harris wrote: > Time to raise this topic again. Opinions welcome. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From andrew.collette at gmail.com Fri Dec 14 00:37:20 2012 From: andrew.collette at gmail.com (Andrew Collette) Date: Thu, 13 Dec 2012 22:37:20 -0700 Subject: [Numpy-discussion] Attaching metadata to dtypes: what's the best way? Message-ID: Hi all, I have a question for the list sparked by this discussion of a bug in NumPy 1.6.2 and 1.7: http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064682.html and this open issue in h5py: https://code.google.com/p/h5py/issues/detail?id=217 In h5py we need to represent variable length strings and HDF5 object references within the existing NumPy dtype system. The way this is handled at the moment is with object (type "O") dtypes with a small amount of metadata attached; in other words, an "O" array could have a dtype marked as representing variable-length strings, and HDF5 would convert the Python string objects into the corresponding type in the HDF5 file. Likewise, an "O" dtype marked as containing HDF5 object references (h5py.Reference instances) would be converted to native HDF5 references when written. The trouble I'm having is trying to attach metadata to a dtype in such a way that it is preserved in NumPy. Right now I create an "O" dtype with a single field and store the information in the field "description", e.g.: dtype(('O', [( ({'type': bytes},'vlen'), 'O' )] )) This works (it's how special types have worked in h5py for years) but is quite unwieldy, and leads to interesting side effects. For example, because of the single field used, array[index] returns a 1-element NumPy array containing a Python object, instead of the Python object itself. Worse, our fix for this behavior (remove the field when returning data from h5py) triggered the above bug in NumPy. Is there a better way to add metadata to dtypes I'm not aware of? 
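Ideally something as lightweight as the sketch below. I'm assuming here that the undocumented "metadata" argument to np.dtype is actually available and preserved in the versions we support -- I haven't checked how far back it goes or how well it survives copies:

    import numpy as np

    special = np.dtype('O', metadata={'vlen': bytes})
    print(special.metadata)               # the mapping we attached
    arr = np.empty((10,), dtype=special)  # otherwise a plain object array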
Note I'm *not* interested in creating a custom type; one of the advantages of the current system is that people deal with the resulting "O" object arrays like any other object array in NumPy. Andrew Collette From klonuo at gmail.com Fri Dec 14 01:23:08 2012 From: klonuo at gmail.com (klo) Date: Fri, 14 Dec 2012 07:23:08 +0100 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: On Thu, 13 Dec 2012 17:35:01 +0100, Travis Oliphant wrote: > The NumPy home page can still be edited in this repository: > git at github.com:numpy/numpy.org.git. Pull requests are always welcome > --- especially pull requests that improve the look and feel of the > web-page. Hi, let me comment on it. 1. Page uses too intense color. Choosing color is sensitive question, but any reasonable suggestion should be better then current one 2. Numpy logo is of low quality: http://www.numpy.org/_static/numpy_logo.png and icon it's cropped carelessly 3. Links icons look messy as they look like patched from different themes - there are scipy shiny icons and then scipy matte icon and then RSS icon which links to blog. IMHO no one uses shiny icons anymore, and I don't know if these icons should have large scipy logo with thematic overlay as is. I'd suggest flat and informative icons if link icons are wanted 4. Favicon could be changed 5. Screenshot(s) could be added, but this is easier to say that suggest one. Maybe shots representing each numpy selected feature, and/or perhaps side links to Packt, as mentioned in your email, or similar, can add some dynamics in page look From sturla at molden.no Fri Dec 14 03:09:02 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 14 Dec 2012 09:09:02 +0100 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: <50CAA7A6.6030208@virtualmaterials.com> References: <50CAA7A6.6030208@virtualmaterials.com> Message-ID: <7A79C555-9D79-4878-B123-8EB4DB23A726@molden.no> So when upgrading everything you prefer to keep the bugs in 2.6 that were squashed in 2.7? Who has taught IT managers that older and more buggy versions of software are more "professional" and better for corporate environments? Sturla Den 14. des. 2012 kl. 05:14 skrev Raul Cota : > > +1 from me > > For what is worth, we are just moving forward from Python 2.2 / Numeric > and are going to 2.6 and it has been rather painful because of the > several little details of extensions and other subtleties. I believe we > will settle there for a while. For companies like ours, it is a big > problem to upgrade versions. There is always this or that hiccup that > "works great" in a version but not so much in another and we also have > all sorts of extensions. > > > > Raul > > > > > > On 13/12/2012 9:34 AM, Charles R Harris wrote: >> Time to raise this topic again. Opinions welcome. 
>> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Fri Dec 14 03:29:39 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 14 Dec 2012 08:29:39 +0000 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: <50CAA7A6.6030208@virtualmaterials.com> References: <50CAA7A6.6030208@virtualmaterials.com> Message-ID: On 14 Dec 2012 04:14, "Raul Cota" wrote: > > > +1 from me > > For what is worth, we are just moving forward from Python 2.2 / Numeric > and are going to 2.6 and it has been rather painful because of the > several little details of extensions and other subtleties. I believe we > will settle there for a while. For companies like ours, it is a big > problem to upgrade versions. There is always this or that hiccup that > "works great" in a version but not so much in another and we also have > all sorts of extensions. Unfortunately (and I know this is a tradeoff), one consequence of this strategy is that you give up the chance to influence numpy development and avoid those hiccups in the first place. We try to catch things, but there's a *lot* more we can do if a bug gets noticed before it makes it into a final release, or multiple final releases... (This is why 1.7 has been dragging on - people testing the RCs found a number of places that it broke there code, so we're fixing numpy instead of them having to fix their code.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Dec 14 03:32:43 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 14 Dec 2012 08:32:43 +0000 Subject: [Numpy-discussion] Failure in test_iterator.py at Travis In-Reply-To: References: Message-ID: I only checked this build: https://secure.travis-ci.org/#!/certik/numpy/jobs/3656960 But that log clearly shows 'python setup.py install' being used instead of 'pip install'. How certain are you that your branch actually has my fix? 
-n On 14 Dec 2012 03:49, "Ond?ej ?ert?k" wrote: > On Thu, Dec 13, 2012 at 7:16 PM, Charles R Harris > wrote: > > > > > > On Thu, Dec 13, 2012 at 8:04 PM, Charles R Harris > > wrote: > >> > >> > >> > >> On Thu, Dec 13, 2012 at 7:23 PM, Ond?ej ?ert?k > > >> wrote: > >>> > >>> Hi, > >>> > >>> Another weird bug sometimes happen in > >>> numpy/core/tests/test_iterator.py, it looks like this: > >>> > >>> ====================================================================== > >>> FAIL: test_iterator.test_iter_array_cast > >>> ---------------------------------------------------------------------- > >>> Traceback (most recent call last): > >>> File > >>> > "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/nose/case.py", > >>> line 197, in runTest > >>> self.test(*self.arg) > >>> File > >>> > "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/core/tests/test_iterator.py", > >>> line 836, in test_iter_array_cast > >>> assert_equal(i.operands[0].strides, (-96,8,-32)) > >>> File > >>> > "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", > >>> line 252, in assert_equal > >>> assert_equal(actual[k], desired[k], 'item=%r\n%s' % (k,err_msg), > >>> verbose) > >>> File > >>> > "/home/travis/virtualenv/python2.5/lib/python2.5/site-packages/numpy/testing/utils.py", > >>> line 314, in assert_equal > >>> raise AssertionError(msg) > >>> AssertionError: > >>> Items are not equal: > >>> item=0 > >>> > >>> ACTUAL: 96 > >>> DESIRED: -96 > >>> > >>> ---------------------------------------------------------------------- > >>> > >>> > >>> But the problem is that there is no numpy/core/tests/test_iterator.py > >>> file in current branches.... This error was triggered for example by > >>> these PRs: > >>> > >>> https://github.com/numpy/numpy/pull/2765 > >>> https://github.com/numpy/numpy/pull/2815 > >>> > >>> and here are links to the failing Travis tests: > >>> > >>> https://travis-ci.org/certik/numpy/builds/3656959 > >>> https://travis-ci.org/numpy/numpy/builds/3330234 > >>> > >>> > >>> Any idea what is happening here and how to fix it? > >>> > >> > >> That should have been fixed by Nathaniel's travis fix. Hmm... > >> > > > > And a quick check here shows pip not removing a previous 1.6.2 install. > > Maybe it is a pip version problem, pip 1.0.2 here. > > That's what I thought as well. I think we need to update the Travis > script to manually remove the installed numpy. > > Btw, Travis seems to be using pip 1.2.1, at least according to: > > https://travis-ci.org/numpy/numpy/jobs/3330236 > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sbos.net at gmail.com Fri Dec 14 04:17:23 2012 From: sbos.net at gmail.com (Sergey Bartunov) Date: Fri, 14 Dec 2012 13:17:23 +0400 Subject: [Numpy-discussion] Building numpy with OpenBLAS Message-ID: Hi. I'm trying to build numpy (1.6.2 and master from git) with OpenBLAS on Ubuntu server 11.10. I succeed with this just once and performance boost was really big for me, but unfortunately something went wrong with my application and I had to reinstall numpy. After that I couldn't reproduce this result and even just perform faster than default numpy installation with no external libraries anyhow. Now things went even worse. 
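Before rebuilding yet again, a quick diagnostic sketch like the following can show what a given install actually linked against and how fast its dot really is (output and timings will of course vary by machine and build):

import numpy as np

# Report the BLAS/LAPACK libraries this NumPy build detected at compile time.
np.show_config()

# np.dot is only accelerated when the optional _dotblas extension was built;
# if this import fails, dot uses the slow built-in fallback.
try:
    import numpy.core._dotblas
    print("accelerated dot (_dotblas) is available")
except ImportError:
    print("no _dotblas: np.dot falls back to the unoptimized routine")

# Rough timing of a large matrix product.
import time
a = np.random.rand(2000, 2000)
t0 = time.time()
np.dot(a, a)
print("np.dot on 2000x2000 took %.2f s" % (time.time() - t0))
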
I assume that numpy built with BLAS and LAPACK should do dot operation faster than "clean" installation on relatively large matirces (say 2000 x 2000). Here I don't use OpenBLAS anyway. I install libblas-dev and liblapack-dev by apt-get and after that build numpy from sources / by pip (that doesn't matter for the result). Building tool reports that BLAS and LAPACK are detected on my system, so says "numpy.distutils.system_info" after installation. But matrix multiplication by dot takes the same time as clean installation (12 s vs 0.16 s with OpenBLAS). That's the first thing I'm wondering about. Nevertheless I tried to compile numpy with OpenBLAS only. I have forced this by setting ATLAS="" BLAS=/usr/lib/libopenblas.a LAPACK=/usr/lib/libopenblas.a as I saw somewhere in the internet. I had installed numpy exactly this way at the first time when I was lucky. But now it doesn't work for me. I tryied installing OpenBLAS from sources and as libopenblas-dev ubuntu package. So how can I fix this? Many thanks in advance. From sturla at molden.no Fri Dec 14 07:04:14 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 14 Dec 2012 13:04:14 +0100 Subject: [Numpy-discussion] Building numpy with OpenBLAS In-Reply-To: References: Message-ID: <50CB15BE.4030605@molden.no> On 14.12.2012 10:17, Sergey Bartunov wrote: > Now things went even worse. I assume that numpy built with BLAS and > LAPACK should do dot operation faster than "clean" installation on > relatively large matirces (say 2000 x 2000). Here I don't use OpenBLAS > anyway. No, _dotblas is only built against ATLAS, MKL or Apple's accelerate framework. So with OpenBLAS you have to call e.g. DGEMM from OpenBLAS yourself instead of using np.dot. > So how can I fix this? Many thanks in advance. You might fix NumPy to build _dotblas against OpenBLAS as well :) Sturla From chaoyuejoy at gmail.com Fri Dec 14 08:57:30 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 14 Dec 2012 14:57:30 +0100 Subject: [Numpy-discussion] np.seterr doesn't work for masked array? Message-ID: Dear all, I tried to capture the zero divide error when I divide a masked array by another. It seems that np.seterr is not working for masked array? when I do np.divide on two masked array, it directly put the zero divides part as being masked. The np.seterr works if the two arrays for dividing are not masked arrays. could anyone explain? thanks!! np.__version__ = 1.6.2 In [87]: np.seterr(all='print') Out[87]: {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': 'print'} In [88]: a = np.arange(8,dtype=float).reshape(2,4) In [89]: b = np.ma.masked_less(a,4) In [90]: b[1,-2:] = 0. In [91]: b Out[91]: masked_array(data = [[-- -- -- --] [4.0 5.0 0.0 0.0]], mask = [[ True True True True] [False False False False]], fill_value = 1e+20) In [92]: c = a.copy() In [93]: c[1,-2:] = 0. 
In [94]: c Out[94]: array([[ 0., 1., 2., 3.], [ 4., 5., 0., 0.]]) In [95]: np.divide(a,b) Warning: divide by zero encountered in divide Out[95]: masked_array(data = [[-- -- -- --] [1.0 1.0 -- --]], mask = [[ True True True True] [False False True True]], fill_value = 1e+20) In [96]: np.divide(a,c) Warning: divide by zero encountered in divide Out[96]: array([[ nan, 1., 1., 1.], [ 1., 1., inf, inf]]) Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Dec 14 09:15:52 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 14 Dec 2012 14:15:52 +0000 Subject: [Numpy-discussion] np.seterr doesn't work for masked array? In-Reply-To: References: Message-ID: On Fri, Dec 14, 2012 at 1:57 PM, Chao YUE wrote: > Dear all, > > I tried to capture the zero divide error when I divide a masked array by > another. It seems that np.seterr is not working for masked array? > when I do np.divide on two masked array, it directly put the zero divides > part as being masked. The np.seterr works if the two arrays for dividing are > not masked arrays. > could anyone explain? thanks!! numpy.ma uses np.seterr(divide='ignore', invalid='ignore') for most of its operations, so it is overriding your settings. This is usually desirable since many of the masked values will be trip these errors spuriously even though they will be masked out in the result. -- Robert Kern From chanley at gmail.com Fri Dec 14 09:21:32 2012 From: chanley at gmail.com (Christopher Hanley) Date: Fri, 14 Dec 2012 09:21:32 -0500 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: We (STScI) are ending support for Python 2.5 in our stsci_python project and told our users as much last July. I have no objections to ending support for Python 2.5. Chris On Thu, Dec 13, 2012 at 12:38 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > The previous proposal to drop python 2.4 support garnered no opposition. > How about dropping support for python 2.5 also? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Fri Dec 14 09:34:28 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 14 Dec 2012 15:34:28 +0100 Subject: [Numpy-discussion] an easy way to know if a functions works or not for masked array? Message-ID: Dear numpy users, I think since long I am confused by if a function works as expected or not for masked array. like np.reshape works for masked array, but not np.sum (I mean, I expect np.sum to drop the masked elements when do summing, of course we have np.ma.sum). So I always use fuctions preceding by np.ma to make sure there is nothing going woring if I expect there will be masked array participating in the calculation in the data process chain. 
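A concrete version of the quick check being described -- comparing a plain numpy function against its np.ma counterpart on a tiny masked example before trusting it in a long processing chain -- might look like this (the function pairs are only illustrations):

import numpy as np

# Small masked example in the spirit of the session earlier in this thread.
a = np.ma.masked_less(np.arange(8, dtype=float).reshape(2, 4), 4)

# Compare each plain function with its numpy.ma counterpart; if the results
# differ, the plain function is not honouring the mask and the np.ma version
# (or the array method, e.g. a.sum()) should be used instead.
for plain, masked in [(np.sum, np.ma.sum), (np.mean, np.ma.mean)]:
    print(plain.__name__, plain(a), masked(a))
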
When I handle masked array, Before I use a function, normally I check if there is an equivalent np.ma function and if there is, I use np.mafunction; Then I check how the documentation says about np.func and np.ma.func and see there is anything mentioned explicitly on the handling of masked array. Howevery, In most cases I will try with simple arrays to test both np.ma.func and np.func before I use some function. but this is sometimes time consuming. Does anyone have similar situation? thanks! Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Fri Dec 14 09:40:21 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 14 Dec 2012 15:40:21 +0100 Subject: [Numpy-discussion] np.seterr doesn't work for masked array? In-Reply-To: References: Message-ID: Thanks. You mean actually when numpy handle masked array, it will first treat all the base data, and then apply the mask after the treatment. and normally the base data of maksed elements will very likely to intrigure these errors, and you will see a lot errory warning or print in the process, and will make it impossible to see the error information your want to see for those elements that are not masked but still can intriguer the error you would like to see or check. I didn't realize this. It's a good to overwrite the error setting. thanks for your explanation. Chao On Fri, Dec 14, 2012 at 3:15 PM, Robert Kern wrote: > On Fri, Dec 14, 2012 at 1:57 PM, Chao YUE wrote: > > Dear all, > > > > I tried to capture the zero divide error when I divide a masked array by > > another. It seems that np.seterr is not working for masked array? > > when I do np.divide on two masked array, it directly put the zero divides > > part as being masked. The np.seterr works if the two arrays for dividing > are > > not masked arrays. > > could anyone explain? thanks!! > > numpy.ma uses np.seterr(divide='ignore', invalid='ignore') for most of > its operations, so it is overriding your settings. This is usually > desirable since many of the masked values will be trip these errors > spuriously even though they will be masked out in the result. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Dec 14 09:42:36 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 14 Dec 2012 14:42:36 +0000 Subject: [Numpy-discussion] np.seterr doesn't work for masked array? In-Reply-To: References: Message-ID: On Fri, Dec 14, 2012 at 2:40 PM, Chao YUE wrote: > Thanks. 
You mean actually when numpy handle masked array, it will first > treat all the base data, and then apply the mask after the treatment. > and normally the base data of maksed elements will very likely to intrigure > these errors, and you will see a lot errory warning or print in the process, > and will make it impossible to see the error information your want to see > for those elements that are not masked but still can intriguer the error you > would like to see or check. I didn't realize this. It's a good to overwrite > the error setting. Precisely. -- Robert Kern From brad.froehle at gmail.com Fri Dec 14 12:10:15 2012 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Fri, 14 Dec 2012 09:10:15 -0800 Subject: [Numpy-discussion] Building numpy with OpenBLAS In-Reply-To: References: Message-ID: Hi Sergey: I recently ran into similar problems with ACML. See an original bug report (https://github.com/numpy/numpy/issues/2728) & documentation fix (https://github.com/numpy/numpy/pull/2809). Personally, I ended up using a patch similar to https://github.com/numpy/numpy/pull/2751 to force NumPy to respect site.cfg (so that I could put the libacml in the [blas_opt] & [lapack_opt] sections). But this seems unlikely to get merged into NumPy as it changes the behavior of site.cfg. Instead I think we should discuss adding a "have cblas" flag of some sort to the [blas] section so that the user can still get _dotblas to compile. -Brad On Friday, December 14, 2012 at 1:17 AM, Sergey Bartunov wrote: > Hi. I'm trying to build numpy (1.6.2 and master from git) with > OpenBLAS on Ubuntu server 11.10. > > I succeed with this just once and performance boost was really big for > me, but unfortunately something went wrong with my application and I > had to reinstall numpy. After that I couldn't reproduce this result > and even just perform faster than default numpy installation with no > external libraries anyhow. > > Now things went even worse. I assume that numpy built with BLAS and > LAPACK should do dot operation faster than "clean" installation on > relatively large matirces (say 2000 x 2000). Here I don't use OpenBLAS > anyway. > > I install libblas-dev and liblapack-dev by apt-get and after that > build numpy from sources / by pip (that doesn't matter for the > result). Building tool reports that BLAS and LAPACK are detected on my > system, so says "numpy.distutils.system_info" after installation. But > matrix multiplication by dot takes the same time as clean installation > (12 s vs 0.16 s with OpenBLAS). That's the first thing I'm wondering > about. > > Nevertheless I tried to compile numpy with OpenBLAS only. I have > forced this by setting ATLAS="" BLAS=/usr/lib/libopenblas.a > LAPACK=/usr/lib/libopenblas.a as I saw somewhere in the internet. I > had installed numpy exactly this way at the first time when I was > lucky. But now it doesn't work for me. I tryied installing OpenBLAS > from sources and as libopenblas-dev ubuntu package. > > So how can I fix this? Many thanks in advance. 
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org (mailto:NumPy-Discussion at scipy.org) > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ondrej.certik at gmail.com Fri Dec 14 12:20:31 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Fri, 14 Dec 2012 09:20:31 -0800 Subject: [Numpy-discussion] Failure in test_iterator.py at Travis In-Reply-To: References: Message-ID: On Fri, Dec 14, 2012 at 12:32 AM, Nathaniel Smith wrote: > I only checked this build: > https://secure.travis-ci.org/#!/certik/numpy/jobs/3656960 > But that log clearly shows 'python setup.py install' being used instead of > 'pip install'. How certain are you that your branch actually has my fix? Right. So is this one (using "python setup.py install"): https://travis-ci.org/numpy/numpy/jobs/3330235 No wonder that it fails. Since I rebased this on top of master and the master has this fix, and so does the 1.7 branch, I can't explain it. I know that the master at certik/numpy github does not have that fix (since I didn't push in there lately), but I don't see how that could matter, unless there is a bug at Travis. The branches that I created the pull requests from should have that fix. Ondrej From njs at pobox.com Fri Dec 14 12:49:37 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 14 Dec 2012 17:49:37 +0000 Subject: [Numpy-discussion] Failure in test_iterator.py at Travis In-Reply-To: References: Message-ID: The top of the build log has the actual git command they used to check out the source - it's some clever GitHub thing that gives the same thing as pressing the green button would iirc. You could copy the commands from the log to check that out locally though and see what the .travis.tml actually says. And if it's wrong poke around in the git history to figure out why. -n On 14 Dec 2012 17:20, "Ond?ej ?ert?k" wrote: > On Fri, Dec 14, 2012 at 12:32 AM, Nathaniel Smith wrote: > > I only checked this build: > > https://secure.travis-ci.org/#!/certik/numpy/jobs/3656960 > > But that log clearly shows 'python setup.py install' being used instead > of > > 'pip install'. How certain are you that your branch actually has my fix? > > Right. So is this one (using "python setup.py install"): > > https://travis-ci.org/numpy/numpy/jobs/3330235 > > No wonder that it fails. Since I rebased this on top of master and the > master > has this fix, and so does the 1.7 branch, I can't explain it. I know that > the master at certik/numpy github does not have that fix (since I didn't > push in > there lately), but I don't see how that could matter, unless there is > a bug at Travis. > The branches that I created the pull requests from should have that fix. > > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Fri Dec 14 13:43:14 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Fri, 14 Dec 2012 10:43:14 -0800 Subject: [Numpy-discussion] Failure in test_iterator.py at Travis In-Reply-To: References: Message-ID: On Fri, Dec 14, 2012 at 9:49 AM, Nathaniel Smith wrote: > The top of the build log has the actual git command they used to check out > the source - it's some clever GitHub thing that gives the same thing as > pressing the green button would iirc. 
You could copy the commands from the > log to check that out locally though and see what the .travis.tml actually > says. And if it's wrong poke around in the git history to figure out why. You are right! I followed the one: https://travis-ci.org/numpy/numpy/jobs/3330235 and .travis.yml indeed contains the old "python setup.py install"! I just don't understand why. My strong suspicion is that the actual commands are *not* equivalent to the github green merge button. It took the latest patches from the PR (which do not contain the .travis.yml fix), and merged into: commit 089bfa5865cd39e2b40099755e8563d8f0d04f5f Merge: 1065cc5 df2958e Author: Ralf Gommers Date: Sat Nov 24 02:54:48 2012 -0800 Merge pull request #2766 from g2p/master Assume we can use sys.stdout.fileno() and friends. I have no idea why. Unless --- it is merging into the master, which was current at the time the PR was created. That would explain it this particular failure. Ok. And finally, the one at certik/numpy fails, because that one is checking out the sources from certik/numpy, but those do not have the improvement. It is then the Travis's bug, that it links to certik/numpy log for PRs at numpy/numpy. So I think that all is explained now. Ondrej From ralf.gommers at gmail.com Fri Dec 14 16:01:41 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 14 Dec 2012 22:01:41 +0100 Subject: [Numpy-discussion] MKL licenses for core scientific Python projects Message-ID: Hi all, Intel has offered to provide free MKL licenses for main contributors to scientific Python projects - at least those listed at numfocus.org/projects/. Licenses for all OSes that are required can be provided, the condition is that they're used for building/testing our projects and not for broader purposes. If you're interested, please let me know your full name and what OS you need a license for. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From raul at virtualmaterials.com Fri Dec 14 20:34:58 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Fri, 14 Dec 2012 18:34:58 -0700 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: <7A79C555-9D79-4878-B123-8EB4DB23A726@molden.no> References: <50CAA7A6.6030208@virtualmaterials.com> <7A79C555-9D79-4878-B123-8EB4DB23A726@molden.no> Message-ID: <50CBD3C2.7090106@virtualmaterials.com> Point well taken. It is always a tradeoff / balancing act where you can have 'anything' but not 'everything'. Where would the fun be if we could have everything :) ? . In our situation, there were a couple of extensions that did not work (at least out of the box) in Python 2.7. Raul On 14/12/2012 1:09 AM, Sturla Molden wrote: > So when upgrading everything you prefer to keep the bugs in 2.6 that were squashed in 2.7? Who has taught IT managers that older and more buggy versions of software are more "professional" and better for corporate environments? > > Sturla > > > Den 14. des. 2012 kl. 05:14 skrev Raul Cota : > >> >> +1 from me >> >> For what is worth, we are just moving forward from Python 2.2 / Numeric >> and are going to 2.6 and it has been rather painful because of the >> several little details of extensions and other subtleties. I believe we >> will settle there for a while. For companies like ours, it is a big >> problem to upgrade versions. There is always this or that hiccup that >> "works great" in a version but not so much in another and we also have >> all sorts of extensions. 
>> >> >> >> Raul >> >> >> >> >> >> On 13/12/2012 9:34 AM, Charles R Harris wrote: >>> Time to raise this topic again. Opinions welcome. >>> >>> Chuck >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Fri Dec 14 23:06:36 2012 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 14 Dec 2012 20:06:36 -0800 Subject: [Numpy-discussion] MKL licenses for core scientific Python projects In-Reply-To: References: Message-ID: <4473522818305701545@unknownmsgid> Ralf, Do these licenses allow fully free distribution of binaries? And are those binaries themselves redistributive? I.e. with py2exe and friends? If so, that could be nice. -Chris On Dec 14, 2012, at 1:01 PM, Ralf Gommers wrote: Hi all, Intel has offered to provide free MKL licenses for main contributors to scientific Python projects - at least those listed at numfocus.org/projects/. Licenses for all OSes that are required can be provided, the condition is that they're used for building/testing our projects and not for broader purposes. If you're interested, please let me know your full name and what OS you need a license for. Cheers, Ralf _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Dec 15 08:13:11 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 15 Dec 2012 14:13:11 +0100 Subject: [Numpy-discussion] MKL licenses for core scientific Python projects In-Reply-To: <4473522818305701545@unknownmsgid> References: <4473522818305701545@unknownmsgid> Message-ID: On Sat, Dec 15, 2012 at 5:06 AM, Chris Barker - NOAA Federal < chris.barker at noaa.gov> wrote: > Ralf, > > Do these licenses allow fully free distribution of binaries? And are those > binaries themselves redistributive? I.e. with py2exe and friends? > > If so, that could be nice. > Good point. It's not entirely clear from the emails I received. I've asked for clarification. Ralf > > On Dec 14, 2012, at 1:01 PM, Ralf Gommers wrote: > > Hi all, > > Intel has offered to provide free MKL licenses for main contributors to > scientific Python projects - at least those listed at > numfocus.org/projects/. Licenses for all OSes that are required can be > provided, the condition is that they're used for building/testing our > projects and not for broader purposes. > > If you're interested, please let me know your full name and what OS you > need a license for. > > Cheers, > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aron at ahmadia.net Sat Dec 15 08:21:28 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Sat, 15 Dec 2012 08:21:28 -0500 Subject: [Numpy-discussion] MKL licenses for core scientific Python projects In-Reply-To: References: <4473522818305701545@unknownmsgid> Message-ID: Ralf, Does "performance testing" come under building/testing? If so, Aron Ahmadia OS X.8 Thanks, A On Sat, Dec 15, 2012 at 8:13 AM, Ralf Gommers wrote: > > > > On Sat, Dec 15, 2012 at 5:06 AM, Chris Barker - NOAA Federal < > chris.barker at noaa.gov> wrote: > >> Ralf, >> >> Do these licenses allow fully free distribution of binaries? And are >> those binaries themselves redistributive? I.e. with py2exe and friends? >> >> If so, that could be nice. >> > > Good point. It's not entirely clear from the emails I received. I've asked > for clarification. > > Ralf > > >> >> On Dec 14, 2012, at 1:01 PM, Ralf Gommers wrote: >> >> Hi all, >> >> Intel has offered to provide free MKL licenses for main contributors to >> scientific Python projects - at least those listed at >> numfocus.org/projects/. Licenses for all OSes that are required can be >> provided, the condition is that they're used for building/testing our >> projects and not for broader purposes. >> >> If you're interested, please let me know your full name and what OS you >> need a license for. >> >> Cheers, >> Ralf >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Dec 15 08:46:48 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 15 Dec 2012 14:46:48 +0100 Subject: [Numpy-discussion] MKL licenses for core scientific Python projects In-Reply-To: References: <4473522818305701545@unknownmsgid> Message-ID: On Sat, Dec 15, 2012 at 2:21 PM, Aron Ahmadia wrote: > Ralf, > > Does "performance testing" come under building/testing? > As long as it's for the project(s) that these licenses are for, and not for your own research. Would this be for PyClaw? Ralf > If so, > > Aron Ahmadia > OS X.8 > > Thanks, > A > > > On Sat, Dec 15, 2012 at 8:13 AM, Ralf Gommers wrote: > >> >> >> >> On Sat, Dec 15, 2012 at 5:06 AM, Chris Barker - NOAA Federal < >> chris.barker at noaa.gov> wrote: >> >>> Ralf, >>> >>> Do these licenses allow fully free distribution of binaries? And are >>> those binaries themselves redistributive? I.e. with py2exe and friends? >>> >>> If so, that could be nice. >>> >> >> Good point. It's not entirely clear from the emails I received. I've >> asked for clarification. >> >> Ralf >> >> >>> >>> On Dec 14, 2012, at 1:01 PM, Ralf Gommers >>> wrote: >>> >>> Hi all, >>> >>> Intel has offered to provide free MKL licenses for main contributors to >>> scientific Python projects - at least those listed at >>> numfocus.org/projects/. Licenses for all OSes that are required can be >>> provided, the condition is that they're used for building/testing our >>> projects and not for broader purposes. 
>>> >>> If you're interested, please let me know your full name and what OS you >>> need a license for. >>> >>> Cheers, >>> Ralf >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.warde.farley at gmail.com Sat Dec 15 09:21:30 2012 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Sat, 15 Dec 2012 09:21:30 -0500 Subject: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8 In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 11:34 AM, Charles R Harris wrote: > Time to raise this topic again. Opinions welcome. As you know from the pull request discussion, big +1 from me too. I'm also of the opinion with David C. and Brad that dropping 2.5 support would be a good thing too, as there's a lot of good stuff in 2.6+. Also, this is what IPython did a while back. David From tmp50 at ukr.net Sat Dec 15 10:13:31 2012 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 15 Dec 2012 17:13:31 +0200 Subject: [Numpy-discussion] [ANN] OpenOpt Suite release 0.43 Message-ID: <1943.1355584411.5558559243999444992@ffe6.ukr.net> Hi all, I'm glad to inform you about new OpenOpt release 0.43 (2012-Dec-15): * interalg now can solve SNLE in 2nd mode (parameter dataHandling = "raw", before - only "sorted") * Many other improvements for interalg * Some improvements for FuncDesigner kernel * FuncDesigner ODE now has 3 arguments instead of 4 (backward incompatibility!), e.g. {t: np.linspace(0,1,100)} or mere np.linspace(0,1,100) if your ODE right side is time-independend * FuncDesigner stochastic addon now can handle some problems with gradient-based NLP / NSP solvers * Many minor improvements and some bugfixes Visit openopt.org for more details. Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Sat Dec 15 12:31:45 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Sat, 15 Dec 2012 12:31:45 -0500 Subject: [Numpy-discussion] MKL licenses for core scientific Python projects In-Reply-To: References: <4473522818305701545@unknownmsgid> Message-ID: All open source software and research projects with numpy in the stack, including PyClaw and petsc4py. A On Saturday, December 15, 2012, Ralf Gommers wrote: > > > > On Sat, Dec 15, 2012 at 2:21 PM, Aron Ahmadia > > wrote: > >> Ralf, >> >> Does "performance testing" come under building/testing? >> > > As long as it's for the project(s) that these licenses are for, and not > for your own research. Would this be for PyClaw? 
> > Ralf > > >> If so, >> >> Aron Ahmadia >> OS X.8 >> >> Thanks, >> A >> >> >> On Sat, Dec 15, 2012 at 8:13 AM, Ralf Gommers >> > wrote: >> >>> >>> >>> >>> On Sat, Dec 15, 2012 at 5:06 AM, Chris Barker - NOAA Federal < >>> chris.barker at noaa.gov >> 'chris.barker at noaa.gov');>> wrote: >>> >>>> Ralf, >>>> >>>> Do these licenses allow fully free distribution of binaries? And are >>>> those binaries themselves redistributive? I.e. with py2exe and friends? >>>> >>>> If so, that could be nice. >>>> >>> >>> Good point. It's not entirely clear from the emails I received. I've >>> asked for clarification. >>> >>> Ralf >>> >>> >>>> >>>> On Dec 14, 2012, at 1:01 PM, Ralf Gommers > >>>> wrote: >>>> >>>> Hi all, >>>> >>>> Intel has offered to provide free MKL licenses for main contributors to >>>> scientific Python projects - at least those listed at >>>> numfocus.org/projects/. Licenses for all OSes that are required can be >>>> provided, the condition is that they're used for building/testing our >>>> projects and not for broader purposes. >>>> >>>> If you're interested, please let me know your full name and what OS you >>>> need a license for. >>>> >>>> Cheers, >>>> Ralf >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>> 'NumPy-Discussion at scipy.org');> >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>> 'NumPy-Discussion at scipy.org');> >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >> 'NumPy-Discussion at scipy.org');> >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org > 'NumPy-Discussion at scipy.org');> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Sat Dec 15 18:52:01 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sat, 15 Dec 2012 15:52:01 -0800 Subject: [Numpy-discussion] Status of the 1.7 release Message-ID: Hi, If you go to the issues for 1.7 and click "high priority": https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&state=open you will see 3 issues as of right now. Two of those have PR attached. It's been a lot of work to get to this point and I'd like to thank all of you for helping out with the issues. In particular, I have just fixed a very annoying segfault (#2738) in the PR: https://github.com/numpy/numpy/pull/2831 If you can review that one carefully, that would be highly appreciated. The more people the better, it's a reference counting issue and since this would go into the 1.7 release and it's in the core of numpy, I want to make sure that it's correct. So the last high priority issue is: https://github.com/numpy/numpy/issues/568 and that's the one I will be concentrating on now. After it's fixed, I think we are ready to release the rc1. There are more open issues (that are not "high priority"): https://github.com/numpy/numpy/issues?labels=&milestone=3&page=1&state=open But I don't think we should delay the release any longer because of them. Let me know if there are any objections. 
Of course, if you attach a PR fixing any of those, we'll merge it. Ondrej From njs at pobox.com Sat Dec 15 21:17:26 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 16 Dec 2012 02:17:26 +0000 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: #294 is a regression, so probably should be considered release critical. I can't tell if #2750 is a real problem or not. #378 looks serious, but afaict has actually been fixed even though the bug is still marked open? At least fixed in 1.7.x? On 15 Dec 2012 23:52, "Ond?ej ?ert?k" wrote: > Hi, > > If you go to the issues for 1.7 and click "high priority": > > > https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&state=open > > you will see 3 issues as of right now. Two of those have PR attached. > It's been a lot of work > to get to this point and I'd like to thank all of you for helping out > with the issues. > > > In particular, I have just fixed a very annoying segfault (#2738) in the > PR: > > https://github.com/numpy/numpy/pull/2831 > > If you can review that one carefully, that would be highly > appreciated. The more people the better, > it's a reference counting issue and since this would go into the 1.7 > release and it's in the core of numpy, > I want to make sure that it's correct. > > So the last high priority issue is: > > https://github.com/numpy/numpy/issues/568 > > and that's the one I will be concentrating on now. After it's fixed, I > think we are ready to release the rc1. > > There are more open issues (that are not "high priority"): > > https://github.com/numpy/numpy/issues?labels=&milestone=3&page=1&state=open > > But I don't think we should delay the release any longer because of > them. Let me know if there > are any objections. Of course, if you attach a PR fixing any of those, > we'll merge it. > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Dec 16 04:49:08 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 16 Dec 2012 10:49:08 +0100 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: On Sun, Dec 16, 2012 at 3:17 AM, Nathaniel Smith wrote: > #294 is a regression, so probably should be considered release critical. I > can't tell if #2750 is a real problem or not. #378 looks serious, but > afaict has actually been fixed even though the bug is still marked open? At > least fixed in 1.7.x? > On 15 Dec 2012 23:52, "Ond?ej ?ert?k" wrote: > >> Hi, >> >> If you go to the issues for 1.7 and click "high priority": >> >> >> https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&state=open >> >> you will see 3 issues as of right now. Two of those have PR attached. >> It's been a lot of work >> to get to this point and I'd like to thank all of you for helping out >> with the issues. >> >> >> In particular, I have just fixed a very annoying segfault (#2738) in the >> PR: >> >> https://github.com/numpy/numpy/pull/2831 >> >> If you can review that one carefully, that would be highly >> appreciated. The more people the better, >> it's a reference counting issue and since this would go into the 1.7 >> release and it's in the core of numpy, >> I want to make sure that it's correct. 
>> >> So the last high priority issue is: >> >> https://github.com/numpy/numpy/issues/568 >> >> and that's the one I will be concentrating on now. After it's fixed, I >> think we are ready to release the rc1. >> >> There are more open issues (that are not "high priority"): >> >> >> https://github.com/numpy/numpy/issues?labels=&milestone=3&page=1&state=open >> >> But I don't think we should delay the release any longer because of >> them. Let me know if there >> are any objections. Of course, if you attach a PR fixing any of those, >> we'll merge it. >> > Properly documenting .base (gh-2737) and casting rules (gh-561) changes should be finished before rc1. I agree that the Debian issues all shouldn't block the release. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Sun Dec 16 04:57:12 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Sun, 16 Dec 2012 04:57:12 -0500 Subject: [Numpy-discussion] MKL licenses for core scientific Python projects In-Reply-To: References: <4473522818305701545@unknownmsgid> Message-ID: All open source software and research projects with numpy in the stack, including PyClaw and petsc4py. A On Saturday, December 15, 2012, Ralf Gommers wrote: > > > > On Sat, Dec 15, 2012 at 2:21 PM, Aron Ahmadia > > wrote: > >> Ralf, >> >> Does "performance testing" come under building/testing? >> > > As long as it's for the project(s) that these licenses are for, and not > for your own research. Would this be for PyClaw? > > Ralf > > >> If so, >> >> Aron Ahmadia >> OS X.8 >> >> Thanks, >> A >> >> >> On Sat, Dec 15, 2012 at 8:13 AM, Ralf Gommers >> > wrote: >> >>> >>> >>> >>> On Sat, Dec 15, 2012 at 5:06 AM, Chris Barker - NOAA Federal < >>> chris.barker at noaa.gov >> 'chris.barker at noaa.gov');>> wrote: >>> >>>> Ralf, >>>> >>>> Do these licenses allow fully free distribution of binaries? And are >>>> those binaries themselves redistributive? I.e. with py2exe and friends? >>>> >>>> If so, that could be nice. >>>> >>> >>> Good point. It's not entirely clear from the emails I received. I've >>> asked for clarification. >>> >>> Ralf >>> >>> >>>> >>>> On Dec 14, 2012, at 1:01 PM, Ralf Gommers > >>>> wrote: >>>> >>>> Hi all, >>>> >>>> Intel has offered to provide free MKL licenses for main contributors to >>>> scientific Python projects - at least those listed at >>>> numfocus.org/projects/. Licenses for all OSes that are required can be >>>> provided, the condition is that they're used for building/testing our >>>> projects and not for broader purposes. >>>> >>>> If you're interested, please let me know your full name and what OS you >>>> need a license for. 
>>>> >>>> Cheers, >>>> Ralf >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>> 'NumPy-Discussion at scipy.org');> >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>> 'NumPy-Discussion at scipy.org');> >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >> 'NumPy-Discussion at scipy.org');> >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org > 'NumPy-Discussion at scipy.org');> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Dec 16 08:38:41 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 16 Dec 2012 14:38:41 +0100 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: On Thu, Dec 13, 2012 at 5:35 PM, Travis Oliphant wrote: > For people interested in the www.numpy.org home page: > > Jon Turner has officially transferred the www.numpy.org domain to > NumFOCUS. Thank you, Jon for this donation and for being a care-taker > of the domain-name. We have setup the domain registration to point to > numpy.github.com and I've changed the CNAME in that repostiory to > www.numpy.org > > I've sent an email to have the numpy.scipy.org page to redirect to > www.numpy.org. > > The NumPy home page can still be edited in this repository: > git at github.com:numpy/numpy.org.git. Pull requests are always welcome > --- especially pull requests that improve the look and feel of the web-page. > > Two of the content changes that we need to make a decision about is > > 1) whether or not to put links to books published (Packt > publishing for example has offered a higher percentage of their revenues if > we put a prominent link on www.numpy.org) > I'm +1 on showing links to books in a sidebar on the main page and/or on the documentation page, provided that (a) someone in this community can vouch for the quality of the book, and (b) we accept links for all books that are relevant and of sufficient quality. 2) whether or not to accept "Sponsored by" links on the home page > for donations to the project (e.g. Continuum Analytics has sponsored Ondrej > release management, other companies have sponsored pull requests, other > companies may want to provide donations and we would want to recognize > their contributions to the numpy project). > +1 for putting this on the main page. Something like the Support section on the IPython main page would be good. It lists specifically what the support was for. > These decisions should be made by the NumPy community which in my mind are > interested people on this list. Who is interested in this kind of > discussion? > > We could have these discussions on this list or on the > numfocus at googlegroups.com list and keep this list completely technical > (which I prefer, but I will do whatever the consensus is). 
> I'd prefer things that are cross-project to move to the numfocus list, but things that are specifically about NumPy (which numpy.org content is) to stay on this list. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Dec 16 08:52:28 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 16 Dec 2012 14:52:28 +0100 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: On Sun, Dec 16, 2012 at 2:38 PM, Ralf Gommers wrote: > > > > On Thu, Dec 13, 2012 at 5:35 PM, Travis Oliphant wrote: > >> For people interested in the www.numpy.org home page: >> >> Jon Turner has officially transferred the www.numpy.org domain to >> NumFOCUS. Thank you, Jon for this donation and for being a care-taker >> of the domain-name. We have setup the domain registration to point to >> numpy.github.com and I've changed the CNAME in that repostiory to >> www.numpy.org >> >> I've sent an email to have the numpy.scipy.org page to redirect to >> www.numpy.org. >> >> The NumPy home page can still be edited in this repository: >> git at github.com:numpy/numpy.org.git. Pull requests are always welcome >> --- especially pull requests that improve the look and feel of the web-page. >> >> Two of the content changes that we need to make a decision about is >> >> 1) whether or not to put links to books published (Packt >> publishing for example has offered a higher percentage of their revenues if >> we put a prominent link on www.numpy.org) >> > > I'm +1 on showing links to books in a sidebar on the main page and/or on > the documentation page, provided that (a) someone in this community can > vouch for the quality of the book, and (b) we accept links for all books > that are relevant and of sufficient quality. > Does anyone have an informed opinion on the quality of these books: "NumPy 1.5 Beginner's Guide", Ivan Idris, http://www.packtpub.com/numpy-1-5-using-real-world-examples-beginners-guide/book "NumPy Cookbook", Ivan Idris, http://www.packtpub.com/numpy-for-python-cookbook/book "Python for Data Analysis", Wes McKinney, http://shop.oreilly.com/product/0636920023784.do "SciPy and NumPy", Eli Bressert, http://shop.oreilly.com/product/0636920020219.do The first 5 books at http://stackoverflow.com/questions/4375094/numpy-what-are-the-authoritative-numpy-resources-e-g-documentation-tutorial Are there any more I missed? Ralf > 2) whether or not to accept "Sponsored by" links on the home page >> for donations to the project (e.g. Continuum Analytics has sponsored Ondrej >> release management, other companies have sponsored pull requests, other >> companies may want to provide donations and we would want to recognize >> their contributions to the numpy project). >> > > +1 for putting this on the main page. Something like the Support section > on the IPython main page would be good. It lists specifically what the > support was for. > > >> These decisions should be made by the NumPy community which in my mind >> are interested people on this list. Who is interested in this kind of >> discussion? >> >> We could have these discussions on this list or on the >> numfocus at googlegroups.com list and keep this list completely technical >> (which I prefer, but I will do whatever the consensus is). >> > > I'd prefer things that are cross-project to move to the numfocus list, but > things that are specifically about NumPy (which numpy.org content is) to > stay on this list. 
> > Ralf > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Sun Dec 16 09:17:57 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 16 Dec 2012 15:17:57 +0100 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: > Does anyone have an informed opinion on the quality of these books: > > "NumPy 1.5 Beginner's Guide", Ivan Idris, > http://www.packtpub.com/numpy-1-5-using-real-world-examples-beginners-guide/book > > "NumPy Cookbook", Ivan Idris, > http://www.packtpub.com/numpy-for-python-cookbook/book > Packt is looking for reviewers for this (new) book. I will do one in the next few weeks. Cheers, -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Dec 16 09:51:32 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 16 Dec 2012 14:51:32 +0000 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: On 16 Dec 2012 13:38, "Ralf Gommers" wrote: > > On Thu, Dec 13, 2012 at 5:35 PM, Travis Oliphant wrote: >> >> For people interested in the www.numpy.org home page: >> >> Jon Turner has officially transferred the www.numpy.org domain to NumFOCUS. Thank you, Jon for this donation and for being a care-taker of the domain-name. We have setup the domain registration to point to numpy.github.com and I've changed the CNAME in that repostiory to www.numpy.org >> >> I've sent an email to have the numpy.scipy.org page to redirect to www.numpy.org. >> >> The NumPy home page can still be edited in this repository: git at github.com:numpy/numpy.org.git. Pull requests are always welcome --- especially pull requests that improve the look and feel of the web-page. >> >> Two of the content changes that we need to make a decision about is >> >> 1) whether or not to put links to books published (Packt publishing for example has offered a higher percentage of their revenues if we put a prominent link on www.numpy.org) > > > I'm +1 on showing links to books in a sidebar on the main page and/or on the documentation page, provided that (a) someone in this community can vouch for the quality of the book, and (b) we accept links for all books that are relevant and of sufficient quality. I agree, so long as we're careful to avoid all the huge drama that could arise from trying to come up with "official" community judgements on the quality of books produced by members of our community. In practice I guess this means that we err on the side of inclusion, where all a book would need is one person who likes it, with no voting or vetoes possible. But that seems like a fine system - there are plenty of places to get more fine-gained recommendations. >> 2) whether or not to accept "Sponsored by" links on the home page for donations to the project (e.g. Continuum Analytics has sponsored Ondrej release management, other companies have sponsored pull requests, other companies may want to provide donations and we would want to recognize their contributions to the numpy project). > > > +1 for putting this on the main page. Something like the Support section on the IPython main page would be good. It lists specifically what the support was for. 
> >> >> These decisions should be made by the NumPy community which in my mind are interested people on this list. Who is interested in this kind of discussion? >> >> We could have these discussions on this list or on the numfocus at googlegroups.com list and keep this list completely technical (which I prefer, but I will do whatever the consensus is). > > > I'd prefer things that are cross-project to move to the numfocus list, but things that are specifically about NumPy (which numpy.org content is) to stay on this list. +1 -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From klonuo at gmail.com Sun Dec 16 10:48:17 2012 From: klonuo at gmail.com (klo) Date: Sun, 16 Dec 2012 16:48:17 +0100 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: On Sun, 16 Dec 2012 14:52:28 +0100, Ralf Gommers wrote: > > Does anyone have an informed opinion on the quality of these books: > > "NumPy 1.5 Beginner's Guide", Ivan Idris, > http://www.packtpub.com/numpy-1-5-using-real-world-examples-beginners-guide/book > > "NumPy Cookbook", Ivan Idris, > http://www.packtpub.com/numpy-for-python-cookbook/book Some reviews on first title: http://gael-varoquaux.info/blog/?p=161 http://glowingpython.blogspot.com/2011/12/book-review-numpy-15-beginners-guide.html Gael noted http://scipy-lectures.github.com/ which IMHO could be more promoted. Same for Travis' free Numpy book. The second title is very fresh, I don't know if anyone did review, but seems like good companion. > "Python for Data Analysis", Wes McKinney, > http://shop.oreilly.com/product/0636920023784.do This is already allover pandas, and although there is introduction to numpy, it's more focused on pandas data object model then numpy arrays, logically. > "SciPy and NumPy", Eli Bressert, > http://shop.oreilly.com/product/0636920020219.do This is very short introductory course to numpy and scipy in 40 pages and next 10 pages about scikit.learn and scikit.image > The first 5 books at > http://stackoverflow.com/questions/4375094/numpy-what-are-the-authoritative-numpy-resources-e-g-documentation-tutorial Voted answer contains great suggestions. All those books are very good companions, especially those Springer published. From charlesr.harris at gmail.com Sun Dec 16 12:28:34 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Dec 2012 10:28:34 -0700 Subject: [Numpy-discussion] Support for python 2.4 dropped. Should we drop 2.5 also? In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 10:38 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > The previous proposal to drop python 2.4 support garnered no opposition. > How about dropping support for python 2.5 also? > > The proposal to drop support for python 2.5 and 2.4 in numpy 1.8 has carried. It is now a todo issue on github . -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Dec 16 17:36:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Dec 2012 15:36:53 -0700 Subject: [Numpy-discussion] required nose version. Message-ID: Hi All, Looking at INSTALL.txt with an eye to updating it since we have dropped Python 2.4 -2.5 support, it looks like we could update the nose version also. The first version of nose to support Python 3 was 1.0, but I think 1.1 would better because of some bug fixes. IPython also requires nose 1.1. 
So I propose the required nose version be updated to 1.1. Thoughts? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Sun Dec 16 17:50:41 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sun, 16 Dec 2012 14:50:41 -0800 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: Thanks Ralf and Nathan, I have put high priority on the issues that need to be fixed before the rc1. There are now 4 issues: https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&page=1&state=open I am working on the mingw one, as that one is the most difficult. Ralf (or anyone else), do you know how to fix this one: https://github.com/numpy/numpy/issues/438 I am not very familiar with this part of numpy, so maybe you know how to document it well. The sooner we can fix these 4 issues, the sooner we can release. Ondrej On Sun, Dec 16, 2012 at 1:49 AM, Ralf Gommers wrote: > > > > On Sun, Dec 16, 2012 at 3:17 AM, Nathaniel Smith wrote: >> >> #294 is a regression, so probably should be considered release critical. I >> can't tell if #2750 is a real problem or not. #378 looks serious, but afaict >> has actually been fixed even though the bug is still marked open? At least >> fixed in 1.7.x? >> >> On 15 Dec 2012 23:52, "Ond?ej ?ert?k" wrote: >>> >>> Hi, >>> >>> If you go to the issues for 1.7 and click "high priority": >>> >>> >>> https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&state=open >>> >>> you will see 3 issues as of right now. Two of those have PR attached. >>> It's been a lot of work >>> to get to this point and I'd like to thank all of you for helping out >>> with the issues. >>> >>> >>> In particular, I have just fixed a very annoying segfault (#2738) in the >>> PR: >>> >>> https://github.com/numpy/numpy/pull/2831 >>> >>> If you can review that one carefully, that would be highly >>> appreciated. The more people the better, >>> it's a reference counting issue and since this would go into the 1.7 >>> release and it's in the core of numpy, >>> I want to make sure that it's correct. >>> >>> So the last high priority issue is: >>> >>> https://github.com/numpy/numpy/issues/568 >>> >>> and that's the one I will be concentrating on now. After it's fixed, I >>> think we are ready to release the rc1. >>> >>> There are more open issues (that are not "high priority"): >>> >>> >>> https://github.com/numpy/numpy/issues?labels=&milestone=3&page=1&state=open >>> >>> But I don't think we should delay the release any longer because of >>> them. Let me know if there >>> are any objections. Of course, if you attach a PR fixing any of those, >>> we'll merge it. > > > Properly documenting .base (gh-2737) and casting rules (gh-561) changes > should be finished before rc1. I agree that the Debian issues all shouldn't > block the release. > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sun Dec 16 18:01:21 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Dec 2012 16:01:21 -0700 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: On Sun, Dec 16, 2012 at 3:50 PM, Ond?ej ?ert?k wrote: > Thanks Ralf and Nathan, > > I have put high priority on the issues that need to be fixed before the > rc1. 
> There are now 4 issues: > > > https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&page=1&state=open > > I am working on the mingw one, as that one is the most difficult. > Ralf (or anyone else), do you know how to fix this one: > > https://github.com/numpy/numpy/issues/438 > > I am not very familiar with this part of numpy, so maybe you know how > to document it well. > > The sooner we can fix these 4 issues, the sooner we can release. > > I believe mingw was updated last month to a new compiler version. I don't know what other changes there were, but it is possible that some problems have been fixed. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Dec 16 18:26:22 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 16 Dec 2012 23:26:22 +0000 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: On 16 Dec 2012 23:01, "Charles R Harris" wrote: > > > > On Sun, Dec 16, 2012 at 3:50 PM, Ond?ej ?ert?k wrote: >> >> Thanks Ralf and Nathan, >> >> I have put high priority on the issues that need to be fixed before the rc1. >> There are now 4 issues: >> >> https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&page=1&state=open >> >> I am working on the mingw one, as that one is the most difficult. >> Ralf (or anyone else), do you know how to fix this one: >> >> https://github.com/numpy/numpy/issues/438 >> >> I am not very familiar with this part of numpy, so maybe you know how >> to document it well. >> >> The sooner we can fix these 4 issues, the sooner we can release. >> > > I believe mingw was updated last month to a new compiler version. I don't know what other changes there were, but it is possible that some problems have been fixed. It'd be worth checking in case it allows us to get off the (incredibly old) GCC that we currently require on windows. But that's a long-term problem that we probably shouldn't be messing with for 1.7 purposes. afaict all we need to do for 1.7 is switch to using our current POSIX code on win32 as well, instead of the (weird and broken) MS-specific API that we're currently using. (Plus suppress some totally spurious warnings): http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063346.html (Or I could be missing something, but I don't think any problems with that solution have been discussed on the list anyway.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From vs at it.uu.se Sun Dec 16 20:20:06 2012 From: vs at it.uu.se (Virgil Stokes) Date: Mon, 17 Dec 2012 02:20:06 +0100 Subject: [Numpy-discussion] On the difference of two positive definite matrices Message-ID: <50CE7346.4050002@it.uu.se> Suppose I have two positive definite matrices, A and B. Is it possible to use U*D*U^T factorizations of these matrices to obtain a numerically stable result for their difference, A - B ? My application is the "UD" factorization method for the Kalman filter followed by the Rauch-Tung-Striebel smoother --- this is where the difference of two positive definite matrices occurs. I hope that this question is appropriate for this list and does not offend any subscribers. 
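A minimal numerical sketch of the situation in the question above (the helper random_spd, the seed, and the 4x4 matrices A and B are arbitrary illustrations, not taken from the thread): it builds two symmetric positive definite matrices, checks the eigenvalues of their difference, and uses the generalized symmetric eigenproblem in scipy.linalg.eigh to diagonalize both matrices with a single congruence transform.

import numpy as np
from scipy.linalg import eigh

rng = np.random.RandomState(0)

def random_spd(n):
    # a random symmetric positive definite matrix
    m = rng.randn(n, n)
    return m.dot(m.T) + n * np.eye(n)

A = random_spd(4)
B = random_spd(4)

# The difference of two SPD matrices is symmetric but need not be
# positive definite; its eigenvalues can have mixed signs.
print(np.linalg.eigvalsh(A - B))

# Simultaneous diagonalization via the generalized problem A v = w B v.
# eigh normalizes the eigenvectors so that V.T * B * V = I, and then
# V.T * A * V = diag(w), so A - B is positive definite exactly when
# every generalized eigenvalue w is greater than 1.
w, V = eigh(A, B)
print(w)
print(V.T.dot(A).dot(V))  # approximately diag(w)
print(V.T.dot(B).dot(V))  # approximately the identity

Whether this helps make the U*D*U^T smoother update numerically stable is a separate question; the sketch only shows why the plain difference can lose definiteness and how the two factorizations can be related through one transform.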
From charlesr.harris at gmail.com Sun Dec 16 20:38:43 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Dec 2012 18:38:43 -0700 Subject: [Numpy-discussion] On the difference of two positive definite matrices In-Reply-To: <50CE7346.4050002@it.uu.se> References: <50CE7346.4050002@it.uu.se> Message-ID: On Sun, Dec 16, 2012 at 6:20 PM, Virgil Stokes wrote: > Suppose I have two positive definite matrices, A and B. Is it possible > to use U*D*U^T factorizations of these matrices to obtain a numerically > stable result for their difference, A - B ? > > My application is the "UD" factorization method for the Kalman filter > followed by the Rauch-Tung-Striebel smoother --- this is where the > difference of two positive definite matrices occurs. > > I hope that this question is appropriate for this list and does not > offend any subscribers. > > Not sure what you are asking, but there is a coordinate system in which they are both diagonal. Nevertheless, the difference may not be positive definite. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Dec 17 01:07:31 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 17 Dec 2012 00:07:31 -0600 Subject: [Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch Message-ID: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> Hello all, There is a lot happening in my life right now and I am spread quite thin among the various projects that I take an interest in. In particular, I am thrilled to publicly announce on this list that Continuum Analytics has received DARPA funding (to the tune of at least $3 million) for Blaze, Numba, and Bokeh which we are writing to take NumPy, SciPy, and visualization into the domain of very large data sets. This is part of the XDATA program, and I will be taking an active role in it. You can read more about Blaze here: http://blaze.pydata.org. You can read more about XDATA here: http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx I personally think Blaze is the future of array-oriented computing in Python. I will be putting efforts and resources next year behind making that case. How it interacts with future incarnations of NumPy, Pandas, or other projects is an interesting and open question. I have no doubt the future will be a rich ecosystem of interoperating array-oriented data-structures. I invite anyone interested in Blaze to participate in the discussions and development at https://groups.google.com/a/continuum.io/forum/#!forum/blaze-dev or watch the project on our public GitHub repo: https://github.com/ContinuumIO/blaze. Blaze is being incubated under the ContinuumIO GitHub project for now, but eventually I hope it will receive its own GitHub project page later next year. Development of Blaze is early but we are moving rapidly with it (and have deliverable deadlines --- thus while we will welcome input and pull requests we won't have a ton of time to respond to simple queries until at least May or June). There is more that we are working on behind the scenes with respect to Blaze that will be coming out next year as well but isn't quite ready to show yet. As I look at the coming months and years, my time for direct involvement in NumPy development is therefore only going to get smaller. As a result it is not appropriate that I remain as "head steward" of the NumPy project (a term I prefer to BFD12 or anything else). 
I'm sure that it is apparent that while I've tried to help personally where I can this year on the NumPy project, my role has been more one of coordination, seeking funding, and providing expert advice on certain sections of code. I fundamentally agree with Fernando Perez that the responsibility of care-taking open source projects is one of stewardship --- something akin to public service. I have tried to emulate that belief this year --- even while not always succeeding. It is time for me to make official what is already becoming apparent to observers of this community, namely, that I am stepping down as someone who might be considered "head steward" for the NumPy project and officially leaving the development of the project in the hands of others in the community. I don't think the project actually needs a new "head steward" --- especially from a development perspective. Instead I see a lot of strong developers offering key opinions for the project as well as a great set of new developers offering pull requests. My strong suggestion is that development discussions of the project continue on this list with consensus among the active participants being the goal for development. I don't think 100% consensus is a rigid requirement --- but certainly a super-majority should be the goal, and serious changes should not be made with out a clear consensus. I would pay special attention to under-represented people (users with intense usage of NumPy but small voices on this list). There are many of them. If you push me for specifics then at this point in NumPy's history, I would say that if Chuck, Nathaniel, and Ralf agree on a course of action, it will likely be a good thing for the project. I suspect that even if only 2 of the 3 agree at one time it might still be a good thing (but I would expect more detail and discussion). There are others whose opinion should be sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, David Cournapeau, Francesc Alted, and Mark Wiebe to name a few. For some questions, I might even seek input from people like Konrad Hinsen and Paul Dubois --- if they have time to give it. I will still be willing to offer my view from time to time and if I am asked. Greg Wilson (of Software Carpentry fame) asked me recently what letter I would have written to myself 5 years ago. What would I tell myself to do given the knowledge I have now? I've thought about that for a bit, and I have some answers. I don't know if these will help anyone, but I offer them as hopefully instructive: 1) Do not promise to not break the ABI of NumPy --- and in fact emphasize that it will be broken at least once in the 1.X series. NumPy was designed to add new data-types --- but not without breaking the ABI. NumPy has needed more data-types and still needs even more. While it's not beautifully simple to add new data-types, it can be done. But, it is impossible to add them without breaking the ABI in some fashion. The desire to add new data-types *and* keep ABI compatibility has led to significant pain. I think the ABI non-breakage goal has been amplified by the poor state of package management in Python. The fact that it's painful for someone to update their downstream packages when an upstream ABI breaks (on Windows and Mac in particular) has put a lot of unfortunate pressure on this community. Pressure that was not envisioned or understood when I was writing NumPy. 
(As an aside: This is one reason Continuum has invested resources in building the conda tool and a completely free set of binary packages called Anaconda CE which is becoming more and more usable thanks to the efforts of Bryan Van de Ven and Ilan Schnell and our testing team at Continuum. The conda tool: http://docs.continuum.io/conda/index.html is open source and BSD licensed and the next release will provide the ability to build packages, build indexes on package repositories and interface with pip. Expect a blog-post in the near future about how cool conda is!). 2) Don't create array-scalars. Instead, make the data-type object a meta-type object whose instances are the items returned from NumPy arrays. There is no need for a separate array-scalar object and in fact it's confusing to the type-system. I understand that now. I did not understand that 5 years ago. 3) Special-case small arrays to avoid the memory indirection and look at PDL so that generalized ufuncs are supported from the beginning. 4) Define missing-value data-types and labels on the dimensions and arrays 5) Define a standard "dictionary of NumPy arrays" interface as the basic "structure of arrays" concept to go with the "array of structures" that structured arrays provide. 6) Start work on SQL interface to NumPy arrays *now* Additional comments I would make to someone today: 1) Most of NumPy should be written in Python with Numba used as the compiler (particularly as soon as Numba gets the ability to create Python extension modules which is in the next release). 2) There are still many, many optimizations that can be made in NumPy run-time (especially in the face of modern hardware). I will continue to be available to answer questions and I may chime in here and there on pull requests. However, most of my time for NumPy will be on administrative aspects of the project where I will continue to take an active interest. To help make sure that this happens in a transparent way, I would like to propose that "administrative" support of the project be left to the NumFOCUS board of which I am currently 1 of 9 members. The other board members are currently: Ralf Gommers, Anthony Scopatz, Andy Terrel, Prabhu Ramachandran, Fernando Perez, Emmanuelle Gouillart, Jarrod Millman, and Perry Greenfield. While NumFOCUS basically seeks to promote and fund the entire scientific Python stack, I think it can also play a role in helping to administer some of the core projects which the board members themselves have a personal interest in. By administrative support, I mean decisions like "what should be done with any NumPy IP or web-domains" or "what kind of commercially-related ads or otherwise should go on the NumPy home page", or "what should be done with the NumPy github account", etc. --- basically anything that requires an executive decision that is not directly development related. I don't expect there to be many of these decisions. But, when they show up, I would like them to be made in as transparent and public of a way as possible. In practice, the way I see this working is that there are members of the NumPy community who are (like me) particularly interested in admin-related questions and serve on a NumPy team in the NumFOCUS organization. I just know I'll be attending NumFOCUS board meetings, and I would like to help move administrative decisions forward with NumPy as part of the time I spend thinking about NumFOCUS. 
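Referring back to points 2) and 5) above, a small sketch of the behaviour being described, assuming nothing beyond plain NumPy (the array contents and the field names 'x' and 'y' are arbitrary): items pulled out of an array are instances of separate array-scalar types such as numpy.float64 rather than of the data-type object itself, and a plain dictionary of one-dimensional arrays is the "structure of arrays" counterpart to a structured, "array of structures" array.

import numpy as np

a = np.array([1.0, 2.0, 3.0])
x = a[0]
print(type(x))                  # numpy.float64, an array-scalar type
print(isinstance(x, np.dtype))  # False: the scalar is not an instance
                                # of the data-type object
print(np.dtype(np.float64))     # dtype('float64'), a separate object
                                # that describes the array's elements

# "array of structures": one structured array, fields interleaved in memory
aos = np.zeros(3, dtype=[('x', 'f8'), ('y', 'f8')])
print(aos['x'])

# "structure of arrays": a plain dict of homogeneous arrays, one per field
soa = {'x': np.zeros(3), 'y': np.zeros(3)}
print(soa['x'])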
If people on this list would like to play an active role in those admin discussions, then I would heartily welcome them into NumFOCUS membership where they would work with interested members of the NumFOCUS board (like me and Ralf) to help direct that organization. I would really love to have someone from this list volunteer to serve on the NumPy team as part of the NumFOCUS project. I am certainly going to be interested in the opinions of people who are active participants on this list and on GitHub pages for NumPy on anything admin related to NumPy, and I expect Ralf would also be very interested in those views. One admin discussion that I will bring up in another email (as this one is already too long) is about making 2 or 3 lists for NumPy such as numpy-admin at numpy.org, numpy-dev at numpy.org, and numpy-users at numpy-org. Just because I'll be spending more time on Blaze, Numba, Bokeh, and the PyData ecosystem does not mean that I won't be around for NumPy. I will continue to promote NumPy. My involvement with Continuum connects me to NumPy as Continuum continues to offer commercial support contracts for NumPy (and SciPy and other open source projects). Continuum will also continue to maintain its Github NumPy project which will contain pull requests from our company that we are working to get into the mainline branch. Continuum will also continue to provide resources for release-management of NumPy (we have been funding Ondrej in this role for the past 6 months --- though I would like to see this happen through NumFOCUS in the future even if Continuum provides much of the money). We also offer optimized versions of NumPy in our commercial Anaconda distribution (Anaconda CE is free and open source). Also, I will still be available for questions and help (I'm not disappearing --- just making it clear that I'm stepping back into an occasional NumPy developer role). It has been extremely gratifying to see the number of pull-requests, GitHub-conversations, and code contributions increase this year. Even though the 1.7 release has taken a long time to stabilize, there have been a lot of people participating in the discussion and in helping to track down the problems, figure out what to do, and fix them. It even makes it possible for people to think about 1.7 as a long-term release. I will continue to hope that the spirit of openness, tolerance, respect, and gratitude continue to permeate this mailing list, and that we continue to seek to resolve any differences with trust and mutual respect. I know I have offended people in the past with quick remarks and actions made sometimes in haste without fully realizing how they might be taken. But, I also know that like many of you I have always done the very best I could for moving Python for scientific computing forward in the best way I know how. Thank you for the great memories. If you will forgive a little sentiment: My daughter who is in college now was 3 years old when I began working with this community and went down a road that would lead to my involvement with SciPy and NumPy. I have marked the building of my family and the passage of time with where the Python for Scientific Computing Community was at. Like many of you, I have given a great deal of attention and time to building this community. That sacrifice and time has led me to love what we have created. I know that I leave this segment of the community with the tools in better hands than mine. 
I am hopeful that NumPy will continue to be a useful array library for the Python community for many years to come even as we all continue to build new tools for the future. Very best regards, -Travis From ralf.gommers at gmail.com Mon Dec 17 02:07:03 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 17 Dec 2012 08:07:03 +0100 Subject: [Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch In-Reply-To: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> References: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> Message-ID: On Mon, Dec 17, 2012 at 7:07 AM, Travis Oliphant wrote: > Hello all, > > There is a lot happening in my life right now and I am spread quite thin > among the various projects that I take an interest in. In particular, I > am thrilled to publicly announce on this list that Continuum Analytics has > received DARPA funding (to the tune of at least $3 million) for Blaze, > Numba, and Bokeh which we are writing to take NumPy, SciPy, and > visualization into the domain of very large data sets. This is part of > the XDATA program, and I will be taking an active role in it. You can > read more about Blaze here: http://blaze.pydata.org. You can read more > about XDATA here: http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx > Hi Travis, that is fantastic news, congratulations! I can't wait to see what you guys will come up with in the near future. Also thank you for the rest of this thoughtful post; it'll take me some time to digest but I enjoyed the reflection on the past. Best, Ralf > > I personally think Blaze is the future of array-oriented computing in > Python. I will be putting efforts and resources next year behind making > that case. How it interacts with future incarnations of NumPy, Pandas, or > other projects is an interesting and open question. I have no doubt the > future will be a rich ecosystem of interoperating array-oriented > data-structures. I invite anyone interested in Blaze to participate in > the discussions and development at > https://groups.google.com/a/continuum.io/forum/#!forum/blaze-dev or watch > the project on our public GitHub repo: > https://github.com/ContinuumIO/blaze. Blaze is being incubated under the > ContinuumIO GitHub project for now, but eventually I hope it will receive > its own GitHub project page later next year. Development of Blaze is > early but we are moving rapidly with it (and have deliverable deadlines --- > thus while we will welcome input and pull requests we won't have a ton of > time to respond to simple queries until > at least May or June). There is more that we are working on behind > the scenes with respect to Blaze that will be coming out next year as well > but isn't quite ready to show yet. > > As I look at the coming months and years, my time for direct involvement > in NumPy development is therefore only going to get smaller. As a result > it is not appropriate that I remain as "head steward" of the NumPy project > (a term I prefer to BFD12 or anything else). I'm sure that it is apparent > that while I've tried to help personally where I can this year on the NumPy > project, my role has been more one of coordination, seeking funding, and > providing expert advice on certain sections of code. I fundamentally > agree with Fernando Perez that the responsibility of care-taking open > source projects is one of stewardship --- something akin to public service. > I have tried to emulate that belief this year --- even while not always > succeeding. 
> > It is time for me to make official what is already becoming apparent to > observers of this community, namely, that I am stepping down as someone who > might be considered "head steward" for the NumPy project and officially > leaving the development of the project in the hands of others in the > community. I don't think the project actually needs a new "head steward" > --- especially from a development perspective. Instead I see a lot of > strong developers offering key opinions for the project as well as a great > set of new developers offering pull requests. > > My strong suggestion is that development discussions of the project > continue on this list with consensus among the active participants being > the goal for development. I don't think 100% consensus is a rigid > requirement --- but certainly a super-majority should be the goal, and > serious changes should not be made with out a clear consensus. I would > pay special attention to under-represented people (users with intense usage > of NumPy but small voices on this list). There are many of them. If > you push me for specifics then at this point in NumPy's history, I would > say that if Chuck, Nathaniel, and Ralf agree on a course of action, it will > likely be a good thing for the project. I suspect that even if only 2 of > the 3 agree at one time it might still be a good thing (but I would expect > more detail and discussion). There are others whose opinion should be > sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, David > Cournapeau, Francesc Alted, and Mark Wiebe to > name a few. For some questions, I might even seek input from people > like Konrad Hinsen and Paul Dubois --- if they have time to give it. I > will still be willing to offer my view from time to time and if I am asked. > > Greg Wilson (of Software Carpentry fame) asked me recently what letter I > would have written to myself 5 years ago. What would I tell myself to do > given the knowledge I have now? I've thought about that for a bit, and > I have some answers. I don't know if these will help anyone, but I offer > them as hopefully instructive: > > 1) Do not promise to not break the ABI of NumPy --- and in fact > emphasize that it will be broken at least once in the 1.X series. NumPy > was designed to add new data-types --- but not without breaking the ABI. > NumPy has needed more data-types and still needs even more. While it's > not beautifully simple to add new data-types, it can be done. But, it is > impossible to add them without breaking the ABI in some fashion. The > desire to add new data-types *and* keep ABI compatibility has led to > significant pain. I think the ABI non-breakage goal has been amplified by > the poor state of package management in Python. The fact that it's > painful for someone to update their downstream packages when an upstream > ABI breaks (on Windows and Mac in particular) has put a lot of unfortunate > pressure on this community. Pressure that was not envisioned or > understood when I was writing NumPy. > > (As an aside: This is one reason Continuum has invested resources in > building the conda tool and a completely free set of binary packages called > Anaconda CE which is becoming more and more usable thanks to the efforts of > Bryan Van de Ven and Ilan Schnell and our testing team at Continuum. 
The > conda tool: http://docs.continuum.io/conda/index.html is open source and > BSD licensed and the next release will provide the ability to build > packages, build indexes on package repositories and interface with pip. > Expect a blog-post in the near future about how cool conda is!). > > 2) Don't create array-scalars. Instead, make the data-type object > a meta-type object whose instances are the items returned from NumPy > arrays. There is no need for a separate array-scalar object and in fact > it's confusing to the type-system. I understand that now. I did not > understand that 5 years ago. > > 3) Special-case small arrays to avoid the memory indirection and > look at PDL so that generalized ufuncs are supported from the beginning. > > 4) Define missing-value data-types and labels on the dimensions > and arrays > > 5) Define a standard "dictionary of NumPy arrays" interface as the > basic "structure of arrays" concept to go with the "array of structures" > that structured arrays provide. > > 6) Start work on SQL interface to NumPy arrays *now* > > Additional comments I would make to someone today: > > 1) Most of NumPy should be written in Python with Numba used as > the compiler (particularly as soon as Numba gets the ability to create > Python extension modules which is in the next release). > 2) There are still many, many optimizations that can be made in > NumPy run-time (especially in the face of modern hardware). > > I will continue to be available to answer questions and I may chime in > here and there on pull requests. However, most of my time for NumPy will > be on administrative aspects of the project where I will continue to take > an active interest. To help make sure that this happens in a transparent > way, I would like to propose that "administrative" support of the project > be left to the NumFOCUS board of which I am currently 1 of 9 members. The > other board members are currently: Ralf Gommers, Anthony Scopatz, Andy > Terrel, Prabhu Ramachandran, Fernando Perez, Emmanuelle Gouillart, Jarrod > Millman, and Perry Greenfield. While NumFOCUS basically seeks to > promote and fund the entire scientific Python stack, I think it can also > play a role in helping to administer some of the core projects which the > board members themselves have a personal interest in. > > By administrative support, I mean decisions like "what should be done with > any NumPy IP or web-domains" or "what kind of commercially-related ads or > otherwise should go on the NumPy home page", or "what should be done with > the NumPy github account", etc. --- basically anything that requires an > executive decision that is not directly development related. I don't > expect there to be many of these decisions. But, when they show up, I > would like them to be made in as transparent and public of a way as > possible. In practice, the way I see this working is that there are > members of the NumPy community who are (like me) particularly interested in > admin-related questions and serve on a NumPy team in the NumFOCUS > organization. I just know I'll be attending NumFOCUS board meetings, > and I would like to help move administrative decisions forward with NumPy > as part of the time I spend thinking about NumFOCUS. > > If people on this list would like to play an active role in those admin > discussions, then I would heartily welcome them into NumFOCUS membership > where they would work with interested members of the NumFOCUS board (like > me and Ralf) to help direct that organization. 
I would really love to > have someone from this list volunteer to serve on the NumPy team as part of > the NumFOCUS project. I am certainly going to be interested in the > opinions of people who are active participants on this list and on GitHub > pages for NumPy on anything admin related to NumPy, and I expect Ralf would > also be very interested in those views. > > One admin discussion that I will bring up in another email (as this one is > already too long) is about making 2 or 3 lists for NumPy such as > numpy-admin at numpy.org, numpy-dev at numpy.org, and numpy-users at numpy-org. > > Just because I'll be spending more time on Blaze, Numba, Bokeh, and the > PyData ecosystem does not mean that I won't be around for NumPy. I will > continue to promote NumPy. My involvement with Continuum connects me to > NumPy as Continuum continues to offer commercial support contracts for > NumPy (and SciPy and other open source projects). Continuum will also > continue to maintain its Github NumPy project which will contain pull > requests from our company that we are working to get into the mainline > branch. Continuum will also continue to provide resources for > release-management of NumPy (we have been funding Ondrej in this role for > the past 6 months --- though I would like to see this happen through > NumFOCUS in the future even if Continuum provides much of the money). We > also offer optimized versions of NumPy in our commercial Anaconda > distribution (Anaconda CE is free and open source). > > Also, I will still be available for questions and help (I'm not > disappearing --- just making it clear that I'm stepping back into an > occasional NumPy developer role). It has been extremely gratifying to see > the number of pull-requests, GitHub-conversations, and code contributions > increase this year. Even though the 1.7 release has taken a long time to > stabilize, there have been a lot of people participating in the discussion > and in helping to track down the problems, figure out what to do, and fix > them. It even makes it possible for people to think about 1.7 as a > long-term release. > > I will continue to hope that the spirit of openness, tolerance, respect, > and gratitude continue to permeate this mailing list, and that we continue > to seek to resolve any differences with trust and mutual respect. I know > I have offended people in the past with quick remarks and actions made > sometimes in haste without fully realizing how they might be taken. But, > I also know that like many of you I have always done the very best I could > for moving Python for scientific computing forward in the best way I know > how. > > Thank you for the great memories. If you will forgive a little > sentiment: My daughter who is in college now was 3 years old when I began > working with this community and went down a road that would lead to my > involvement with SciPy and NumPy. I have marked the building of my family > and the passage of time with where the Python for Scientific Computing > Community was at. Like many of you, I have given a great deal of > attention and time to building this community. That sacrifice and time > has led me to love what we have created. I know that I leave this > segment of the community with the tools in better hands than mine. I am > hopeful that NumPy will continue to be a useful array library for the > Python community for many years to come even as we all continue to build > new tools for the future. 
> > Very best regards, > > -Travis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Dec 17 02:11:17 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 17 Dec 2012 08:11:17 +0100 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: On Mon, Dec 17, 2012 at 12:26 AM, Nathaniel Smith wrote: > On 16 Dec 2012 23:01, "Charles R Harris" > wrote: > > > > > > > > On Sun, Dec 16, 2012 at 3:50 PM, Ond?ej ?ert?k > wrote: > >> > >> Thanks Ralf and Nathan, > >> > >> I have put high priority on the issues that need to be fixed before the > rc1. > >> There are now 4 issues: > >> > >> > https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&page=1&state=open > >> > >> I am working on the mingw one, as that one is the most difficult. > >> Ralf (or anyone else), do you know how to fix this one: > >> > >> https://github.com/numpy/numpy/issues/438 > >> > >> I am not very familiar with this part of numpy, so maybe you know how > >> to document it well. > >> > >> The sooner we can fix these 4 issues, the sooner we can release. > >> > > > > I believe mingw was updated last month to a new compiler version. I > don't know what other changes there were, but it is possible that some > problems have been fixed. > > It'd be worth checking in case it allows us to get off the (incredibly > old) GCC that we currently require on windows. But that's a long-term > problem that we probably shouldn't be messing with for 1.7 purposes. afaict > all we need to do for 1.7 is switch to using our current POSIX code on > win32 as well, instead of the (weird and broken) MS-specific API that we're > currently using. (Plus suppress some totally spurious warnings): > http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063346.html > > (Or I could be missing something, but I don't think any problems with that > solution have been discussed on the list anyway.) > AFAICT Nathaniel's suggestion in the thread linked above is the way to go. Trying again to go to gcc 4.x doesn't sound like a good idea. Probably David C. already has a good idea about whether recent changes to MinGW have made a difference to the issue he ran into about a year ago. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From denis-bz-gg at t-online.de Mon Dec 17 06:52:57 2012 From: denis-bz-gg at t-online.de (denis) Date: Mon, 17 Dec 2012 11:52:57 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?On_the_difference_of_two_positive_de?= =?utf-8?q?finite=09matrices?= References: <50CE7346.4050002@it.uu.se> Message-ID: Charles R Harris gmail.com> writes: > > On Sun, Dec 16, 2012 at 6:20 PM, Virgil Stokes it.uu.se> wrote: > Suppose I have two positive definite matrices, A and B. Is it possible > to use U*D*U^T ?factorizations of these matrices to obtain a numerically > stable result for their difference, A - B ? ... > Not sure what you are asking, but there is a coordinate system in which they > are both diagonal. Nevertheless, the difference may not be positive > definite.Chuck http://en.wikipedia.org/wiki/Positive-definite_matrix #Simultaneous_diagonalization shows how to do that, but "note that this is no longer an orthogonal diagonalization"; orthogonal can do it iff A and B commute. 
cheers -- denis From pierre.raybaut at gmail.com Mon Dec 17 08:57:28 2012 From: pierre.raybaut at gmail.com (Pierre Raybaut) Date: Mon, 17 Dec 2012 14:57:28 +0100 Subject: [Numpy-discussion] ANN: WinPython v2.7.3.2 Message-ID: Hi all, I'm pleased to announce that WinPython v2.7.3.2 has been released for 32-bit and 64-bit Windows platforms: http://code.google.com/p/winpython/ This is mainly a maintenance release (many packages have been updated since v2.7.3.1). WinPython is a free open-source portable distribution of Python for Windows, designed for scientists. It is a full-featured (see http://code.google.com/p/winpython/wiki/PackageIndex) Python-based scientific environment: * Designed for scientists (thanks to the integrated libraries NumPy, SciPy, Matplotlib, guiqwt, etc.: * Regular *scientific users*: interactive data processing and visualization using Python with Spyder * *Advanced scientific users and software developers*: Python applications development with Spyder, version control with Mercurial and other development tools (like gettext) * *Portable*: preconfigured, it should run out of the box on any machine under Windows (without any installation requirements) and the folder containing WinPython can be moved to any location (local, network or removable drive) * *Flexible*: one can install (or should I write "use" as it's portable) as many WinPython versions as necessary (like isolated and self-consistent environments), even if those versions are running different versions of Python (2.7, 3.x in the near future) or different architectures (32bit or 64bit) on the same machine * *Customizable*: using the integrated package manager (wppm, as WinPython Package Manager), it's possible to install, uninstall or upgrade Python packages (see http://code.google.com/p/winpython/wiki/WPPM for more details on supported package formats). *WinPython is not an attempt to replace Python(x,y)*, this is just something different (see http://code.google.com/p/winpython/wiki/Roadmap): more flexible, easier to maintain, movable and less invasive for the OS, but certainly less user-friendly, with less packages/contents and without any integration to Windows explorer [*]. [*] Actually there is an optional integration into Windows explorer, providing the same features as the official Python installer regarding file associations and context menu entry (this option may be activated through the WinPython Control Panel). Enjoy! -Pierre From nouiz at nouiz.org Mon Dec 17 09:42:34 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Mon, 17 Dec 2012 09:42:34 -0500 Subject: [Numpy-discussion] required nose version. In-Reply-To: References: Message-ID: This is fine for us. Fr?d?ric On Sun, Dec 16, 2012 at 5:36 PM, Charles R Harris wrote: > Hi All, > > Looking at INSTALL.txt with an eye to updating it since we have dropped > Python 2.4 -2.5 support, it looks like we could update the nose version > also. The first version of nose to support Python 3 was 1.0, but I think 1.1 > would better because of some bug fixes. IPython also requires nose 1.1. So I > propose the required nose version be updated to 1.1. Thoughts? 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From nouiz at nouiz.org Mon Dec 17 11:17:59 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Mon, 17 Dec 2012 11:17:59 -0500 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: Hi, I added a new issue that is a regression about numpy.ndindex() that we already talked. But it was a duplicate[1], so I closed it. I think it got lost as the ticket wasn't marked for 1.7 milestone. Ccan someone do it? I don't have the right. This regression break something in Theano. We could work around it, this also break stuff in SciPy from a comment in that ticket. Fred [1] github.com/numpy/numpy/issues/2781 On Mon, Dec 17, 2012 at 2:11 AM, Ralf Gommers wrote: > > > > On Mon, Dec 17, 2012 at 12:26 AM, Nathaniel Smith wrote: >> >> On 16 Dec 2012 23:01, "Charles R Harris" >> wrote: >> > >> > >> > >> > On Sun, Dec 16, 2012 at 3:50 PM, Ond?ej ?ert?k >> > wrote: >> >> >> >> Thanks Ralf and Nathan, >> >> >> >> I have put high priority on the issues that need to be fixed before the >> >> rc1. >> >> There are now 4 issues: >> >> >> >> >> >> https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&page=1&state=open >> >> >> >> I am working on the mingw one, as that one is the most difficult. >> >> Ralf (or anyone else), do you know how to fix this one: >> >> >> >> https://github.com/numpy/numpy/issues/438 >> >> >> >> I am not very familiar with this part of numpy, so maybe you know how >> >> to document it well. >> >> >> >> The sooner we can fix these 4 issues, the sooner we can release. >> >> >> > >> > I believe mingw was updated last month to a new compiler version. I >> > don't know what other changes there were, but it is possible that some >> > problems have been fixed. >> >> It'd be worth checking in case it allows us to get off the (incredibly >> old) GCC that we currently require on windows. But that's a long-term >> problem that we probably shouldn't be messing with for 1.7 purposes. afaict >> all we need to do for 1.7 is switch to using our current POSIX code on win32 >> as well, instead of the (weird and broken) MS-specific API that we're >> currently using. (Plus suppress some totally spurious warnings): >> http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063346.html >> >> (Or I could be missing something, but I don't think any problems with that >> solution have been discussed on the list anyway.) > > AFAICT Nathaniel's suggestion in the thread linked above is the way to go. > > Trying again to go to gcc 4.x doesn't sound like a good idea. Probably David > C. already has a good idea about whether recent changes to MinGW have made a > difference to the issue he ran into about a year ago. > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From d.warde.farley at gmail.com Mon Dec 17 11:46:03 2012 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Mon, 17 Dec 2012 11:46:03 -0500 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: A bit off-topic, but could someone have a look at https://github.com/numpy/numpy/pull/2699 and provide some feedback? If 1.7 is meant to be an LTS release, this would be a nice wart to have out of the way. 
The Travis failure was a spurious one that has since been fixed. On Sat, Dec 15, 2012 at 6:52 PM, Ond?ej ?ert?k wrote: > Hi, > > If you go to the issues for 1.7 and click "high priority": > > https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&state=open > > you will see 3 issues as of right now. Two of those have PR attached. > It's been a lot of work > to get to this point and I'd like to thank all of you for helping out > with the issues. > > > In particular, I have just fixed a very annoying segfault (#2738) in the PR: > > https://github.com/numpy/numpy/pull/2831 > > If you can review that one carefully, that would be highly > appreciated. The more people the better, > it's a reference counting issue and since this would go into the 1.7 > release and it's in the core of numpy, > I want to make sure that it's correct. > > So the last high priority issue is: > > https://github.com/numpy/numpy/issues/568 > > and that's the one I will be concentrating on now. After it's fixed, I > think we are ready to release the rc1. > > There are more open issues (that are not "high priority"): > > https://github.com/numpy/numpy/issues?labels=&milestone=3&page=1&state=open > > But I don't think we should delay the release any longer because of > them. Let me know if there > are any objections. Of course, if you attach a PR fixing any of those, > we'll merge it. > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Mon Dec 17 12:09:44 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Mon, 17 Dec 2012 12:09:44 -0500 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: While we are at it, back-porting https://github.com/numpy/numpy/pull/2730 Would give a good speed up for an LTS. I made a new PR that do this back-port: https://github.com/numpy/numpy/pull/2847 Fred On Mon, Dec 17, 2012 at 11:46 AM, David Warde-Farley wrote: > A bit off-topic, but could someone have a look at > https://github.com/numpy/numpy/pull/2699 and provide some feedback? > > If 1.7 is meant to be an LTS release, this would be a nice wart to > have out of the way. The Travis failure was a spurious one that has > since been fixed. > > On Sat, Dec 15, 2012 at 6:52 PM, Ond?ej ?ert?k wrote: >> Hi, >> >> If you go to the issues for 1.7 and click "high priority": >> >> https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&state=open >> >> you will see 3 issues as of right now. Two of those have PR attached. >> It's been a lot of work >> to get to this point and I'd like to thank all of you for helping out >> with the issues. >> >> >> In particular, I have just fixed a very annoying segfault (#2738) in the PR: >> >> https://github.com/numpy/numpy/pull/2831 >> >> If you can review that one carefully, that would be highly >> appreciated. The more people the better, >> it's a reference counting issue and since this would go into the 1.7 >> release and it's in the core of numpy, >> I want to make sure that it's correct. >> >> So the last high priority issue is: >> >> https://github.com/numpy/numpy/issues/568 >> >> and that's the one I will be concentrating on now. After it's fixed, I >> think we are ready to release the rc1. 
>> >> There are more open issues (that are not "high priority"): >> >> https://github.com/numpy/numpy/issues?labels=&milestone=3&page=1&state=open >> >> But I don't think we should delay the release any longer because of >> them. Let me know if there >> are any objections. Of course, if you attach a PR fixing any of those, >> we'll merge it. >> >> Ondrej >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From wesmckinn at gmail.com Mon Dec 17 12:19:49 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 17 Dec 2012 12:19:49 -0500 Subject: [Numpy-discussion] ANN: pandas 0.10.0 released Message-ID: hi all, I'm super excited to announce the pandas 0.10.0 release. This is a major release including a new high performance file reading engine with tons of new user-facing functionality as well, a bunch of work on the HDF5/PyTables integration layer, much-expanded Unicode support, a new option/configuration interface, integration with the Google Analytics API, and a wide array of other new features, bug fixes, and performance improvements. I strongly recommend that all users get upgraded as soon as feasible. Many performance improvements made are quite substantial over 0.9.x, see vbenchmarks at the end of the e-mail. As of this release, we are no longer supporting Python 2.5. Also, this is the first release to officially support Python 3.3. Note: there are a number of minor, but necessary API changes that long-time pandas users should pay attention to in the What's New. Thanks to all who contributed to this release, especially Chang She, Yoval P, and Jeff Reback (and everyone else listed in the commit log!). As always source archives and Windows installers are on PyPI. What's new: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html Installers: http://pypi.python.org/pypi/pandas $ git log v0.9.1..v0.10.0 --pretty=format:%aN | sort | uniq -c | sort -rn 246 Wes McKinney 140 y-p 99 Chang She 45 jreback 18 Abraham Flaxman 17 Jeff Reback 14 locojaydev 11 Keith Hughitt 5 Adam Obeng 2 Dieter Vandenbussche 1 zach powers 1 Luke Lee 1 Laurent Gautier 1 Ken Van Haren 1 Jay Bourque 1 Donald Curtis 1 Chris Mulligan 1 alex arsenovic 1 A. Flaxman Happy data hacking! - Wes What is it ========== pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with relational, time series, or any other kind of labeled data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Links ===== Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst Documentation: http://pandas.pydata.org Installers: http://pypi.python.org/pypi/pandas Code Repository: http://github.com/pydata/pandas Mailing List: http://groups.google.com/group/pydata Performance vs. 
v0.9.0 ====================== Benchmarks from https://github.com/pydata/pandas/tree/master/vb_suite Ratio < 1 means that v0.10.0 is faster v0.10.0 v0.9.0 ratio name unstack_sparse_keyspace 1.2813 144.1262 0.0089 groupby_frame_apply_overhead 20.1520 337.3330 0.0597 read_csv_comment2 25.3097 363.2860 0.0697 groupbym_frame_apply 75.1554 504.1661 0.1491 frame_iteritems_cached 0.0711 0.3919 0.1815 read_csv_thou_vb 35.2690 191.9360 0.1838 concat_small_frames 12.9019 55.3561 0.2331 join_dataframe_integer_2key 5.8184 21.5823 0.2696 series_value_counts_strings 5.3824 19.1262 0.2814 append_frame_single_homogenous 0.3413 0.9319 0.3662 read_csv_vb 18.4084 46.9500 0.3921 read_csv_standard 12.0651 29.9940 0.4023 panel_from_dict_all_different_indexes 73.6860 158.2949 0.4655 frame_constructor_ndarray 0.0471 0.0958 0.4918 groupby_first 3.8502 7.1988 0.5348 groupby_last 3.6962 6.7792 0.5452 panel_from_dict_two_different_indexes 50.7428 86.4980 0.5866 append_frame_single_mixed 1.2950 2.1930 0.5905 frame_get_numeric_data 0.0695 0.1119 0.6212 replace_fillna 4.6349 7.0540 0.6571 frame_to_csv 281.9340 427.7921 0.6590 replace_replacena 4.7154 7.1207 0.6622 frame_iteritems 2.5862 3.7463 0.6903 series_align_int64_index 29.7370 41.2791 0.7204 join_dataframe_integer_key 1.7980 2.4303 0.7398 groupby_multi_size 31.0066 41.7001 0.7436 groupby_frame_singlekey_integer 2.3579 3.1649 0.7450 write_csv_standard 326.8259 427.3241 0.7648 groupby_simple_compress_timing 41.2113 52.3993 0.7865 frame_fillna_inplace 16.2843 20.0491 0.8122 reindex_fillna_backfill 0.1364 0.1667 0.8181 groupby_multi_series_op 15.2914 18.6651 0.8193 groupby_multi_cython 17.2169 20.4420 0.8422 frame_fillna_many_columns_pad 14.9510 17.5114 0.8538 panel_from_dict_equiv_indexes 25.8427 29.9682 0.8623 merge_2intkey_nosort 19.0755 22.1138 0.8626 sparse_series_to_frame 167.8529 192.9920 0.8697 reindex_fillna_pad 0.1410 0.1617 0.8720 merge_2intkey_sort 44.7863 51.3315 0.8725 reshape_stack_simple 2.6698 3.0502 0.8753 groupby_indices 7.2264 8.2314 0.8779 sort_level_one 4.3845 4.9902 0.8786 sort_level_zero 4.3362 4.9198 0.8814 write_store 16.0587 18.2042 0.8821 frame_reindex_both_axes 0.3726 0.4183 0.8907 groupby_multi_different_numpy_functions 13.4164 15.0509 0.8914 index_int64_intersection 25.3705 28.1867 0.9001 groupby_frame_median 7.7491 8.6011 0.9009 frame_drop_dup_na_inplace 2.6290 2.9155 0.9017 dataframe_reindex_columns 0.3052 0.3372 0.9049 join_dataframe_index_multi 20.5651 22.6893 0.9064 frame_ctor_list_of_dict 101.7439 112.2260 0.9066 groupby_pivot_table 18.4551 20.3184 0.9083 reindex_frame_level_align 0.9644 1.0531 0.9158 stat_ops_level_series_sum_multiple 7.3637 8.0230 0.9178 write_store_mixed 38.2528 41.6604 0.9182 frame_reindex_both_axes_ix 0.4550 0.4950 0.9192 stat_ops_level_frame_sum_multiple 8.1975 8.9055 0.9205 panel_from_dict_same_index 25.7938 28.0147 0.9207 groupby_series_simple_cython 5.1310 5.5624 0.9224 frame_sort_index_by_columns 41.9577 45.1816 0.9286 groupby_multi_python 54.9727 59.0400 0.9311 datetimeindex_add_offset 0.2417 0.2584 0.9356 frame_boolean_row_select 0.2905 0.3100 0.9373 frame_reindex_axis1 2.9760 3.1742 0.9376 stat_ops_level_series_sum 2.3382 2.4937 0.9376 groupby_multi_different_functions 14.0333 14.9571 0.9382 timeseries_timestamp_tzinfo_cons 0.0159 0.0169 0.9397 stats_rolling_mean 1.6904 1.7959 0.9413 melt_dataframe 1.5236 1.6181 0.9416 timeseries_asof_single 0.0548 0.0582 0.9416 frame_ctor_nested_dict_int64 134.3100 142.6389 0.9416 join_dataframe_index_single_key_bigger 15.6578 16.5949 0.9435 stat_ops_level_frame_sum 
3.2475 3.4414 0.9437 indexing_dataframe_boolean_rows 0.2382 0.2518 0.9459 timeseries_asof_nan 10.0433 10.6006 0.9474 frame_reindex_axis0 1.4403 1.5184 0.9485 concat_series_axis1 69.2988 72.8099 0.9518 join_dataframe_index_single_key_small 6.8492 7.1847 0.9533 dataframe_reindex_daterange 0.4054 0.4240 0.9562 join_dataframe_index_single_key_bigger 6.4616 6.7578 0.9562 timeseries_timestamp_downsample_mean 4.5849 4.7787 0.9594 frame_fancy_lookup 2.5498 2.6544 0.9606 series_value_counts_int64 2.5569 2.6581 0.9619 frame_fancy_lookup_all 30.7510 31.8465 0.9656 index_int64_union 82.2279 85.1500 0.9657 indexing_dataframe_boolean_rows_object 0.4809 0.4977 0.9662 frame_ctor_nested_dict 91.6129 94.8122 0.9663 stat_ops_series_std 0.2450 0.2533 0.9673 groupby_frame_cython_many_columns 3.7642 3.8894 0.9678 timeseries_asof 10.4352 10.7721 0.9687 series_ctor_from_dict 3.7707 3.8749 0.9731 frame_drop_dup_inplace 3.0007 3.0746 0.9760 timeseries_large_lookup_value 0.0242 0.0248 0.9764 read_table_multiple_date_baseline 1201.2930 1224.3881 0.9811 dti_reset_index 0.6339 0.6457 0.9817 read_table_multiple_date 2600.7280 2647.8729 0.9822 reindex_frame_level_reindex 0.9524 0.9674 0.9845 reindex_multiindex 1.3483 1.3685 0.9853 frame_insert_500_columns 102.1249 103.4329 0.9874 frame_drop_duplicates 19.3780 19.6157 0.9879 reindex_daterange_backfill 0.1870 0.1889 0.9899 stats_rank2d_axis0_average 25.0480 25.2801 0.9908 series_align_left_monotonic 13.1929 13.2558 0.9953 timeseries_add_irregular 22.4635 22.5122 0.9978 read_store_mixed 13.4398 13.4560 0.9988 lib_fast_zip 11.1289 11.1354 0.9994 match_strings 0.3831 0.3833 0.9995 read_store 5.5526 5.5290 1.0043 timeseries_sort_index 22.7172 22.5976 1.0053 timeseries_1min_5min_mean 0.6224 0.6175 1.0079 stats_rank2d_axis1_average 14.6569 14.5339 1.0085 reindex_daterange_pad 0.1886 0.1867 1.0102 timeseries_period_downsample_mean 6.4241 6.3480 1.0120 frame_drop_duplicates_na 19.3303 19.0970 1.0122 stats_rank_average_int 23.3569 22.9996 1.0155 lib_fast_zip_fillna 14.1394 13.8473 1.0211 index_datetime_intersection 17.2626 16.8986 1.0215 timeseries_1min_5min_ohlc 0.7054 0.6891 1.0237 stats_rank_average 31.3440 30.3845 1.0316 timeseries_infer_freq 10.9854 10.6439 1.0321 timeseries_slice_minutely 0.0637 0.0611 1.0418 index_datetime_union 17.9083 17.1640 1.0434 series_align_irregular_string 89.9470 85.1344 1.0565 series_constructor_ndarray 0.0127 0.0119 1.0742 indexing_panel_subset 0.5692 0.5214 1.0917 groupby_apply_dict_return 46.3497 42.3220 1.0952 reshape_unstack_simple 3.2901 2.9089 1.1310 timeseries_to_datetime_iso8601 4.2305 3.6015 1.1746 frame_to_string_floats 53.6217 37.2041 1.4413 reshape_pivot_time_series 170.4340 107.9068 1.5795 sparse_frame_constructor 6.2714 3.5053 1.7891 datetimeindex_normalize 37.2718 6.9329 5.3761 Columns: test_name | target_duration [ms] | baseline_duration [ms] | ratio From charlesr.harris at gmail.com Mon Dec 17 13:50:44 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 17 Dec 2012 11:50:44 -0700 Subject: [Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch In-Reply-To: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> References: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> Message-ID: Hi Travis, On Sun, Dec 16, 2012 at 11:07 PM, Travis Oliphant wrote: > Hello all, > > There is a lot happening in my life right now and I am spread quite thin > among the various projects that I take an interest in. 
In particular, I > am thrilled to publicly announce on this list that Continuum Analytics has > received DARPA funding (to the tune of at least $3 million) for Blaze, > Numba, and Bokeh which we are writing to take NumPy, SciPy, and > visualization into the domain of very large data sets. This is part of > the XDATA program, and I will be taking an active role in it. You can > read more about Blaze here: http://blaze.pydata.org. You can read more > about XDATA here: http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx > > I personally think Blaze is the future of array-oriented computing in > Python. I will be putting efforts and resources next year behind making > that case. How it interacts with future incarnations of NumPy, Pandas, or > other projects is an interesting and open question. I have no doubt the > future will be a rich ecosystem of interoperating array-oriented > data-structures. I invite anyone interested in Blaze to participate in > the discussions and development at > https://groups.google.com/a/continuum.io/forum/#!forum/blaze-dev or watch > the project on our public GitHub repo: > https://github.com/ContinuumIO/blaze. Blaze is being incubated under the > ContinuumIO GitHub project for now, but eventually I hope it will receive > its own GitHub project page later next year. Development of Blaze is > early but we are moving rapidly with it (and have deliverable deadlines --- > thus while we will welcome input and pull requests we won't have a ton of > time to respond to simple queries until > at least May or June). There is more that we are working on behind > the scenes with respect to Blaze that will be coming out next year as well > but isn't quite ready to show yet. > > As I look at the coming months and years, my time for direct involvement > in NumPy development is therefore only going to get smaller. As a result > it is not appropriate that I remain as "head steward" of the NumPy project > (a term I prefer to BFD12 or anything else). I'm sure that it is apparent > that while I've tried to help personally where I can this year on the NumPy > project, my role has been more one of coordination, seeking funding, and > providing expert advice on certain sections of code. I fundamentally > agree with Fernando Perez that the responsibility of care-taking open > source projects is one of stewardship --- something akin to public service. > I have tried to emulate that belief this year --- even while not always > succeeding. > > It is time for me to make official what is already becoming apparent to > observers of this community, namely, that I am stepping down as someone who > might be considered "head steward" for the NumPy project and officially > leaving the development of the project in the hands of others in the > community. I don't think the project actually needs a new "head steward" > --- especially from a development perspective. Instead I see a lot of > strong developers offering key opinions for the project as well as a great > set of new developers offering pull requests. > > My strong suggestion is that development discussions of the project > continue on this list with consensus among the active participants being > the goal for development. I don't think 100% consensus is a rigid > requirement --- but certainly a super-majority should be the goal, and > serious changes should not be made with out a clear consensus. I would > pay special attention to under-represented people (users with intense usage > of NumPy but small voices on this list). 
There are many of them. If > you push me for specifics then at this point in NumPy's history, I would > say that if Chuck, Nathaniel, and Ralf agree on a course of action, it will > likely be a good thing for the project. I suspect that even if only 2 of > the 3 agree at one time it might still be a good thing (but I would expect > more detail and discussion). There are others whose opinion should be > sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, David > Cournapeau, Francesc Alted, and Mark Wiebe to > name a few. For some questions, I might even seek input from people > like Konrad Hinsen and Paul Dubois --- if they have time to give it. I > will still be willing to offer my view from time to time and if I am asked. > > Greg Wilson (of Software Carpentry fame) asked me recently what letter I > would have written to myself 5 years ago. What would I tell myself to do > given the knowledge I have now? I've thought about that for a bit, and > I have some answers. I don't know if these will help anyone, but I offer > them as hopefully instructive: > > 1) Do not promise to not break the ABI of NumPy --- and in fact > emphasize that it will be broken at least once in the 1.X series. NumPy > was designed to add new data-types --- but not without breaking the ABI. > NumPy has needed more data-types and still needs even more. While it's > not beautifully simple to add new data-types, it can be done. But, it is > impossible to add them without breaking the ABI in some fashion. The > desire to add new data-types *and* keep ABI compatibility has led to > significant pain. I think the ABI non-breakage goal has been amplified by > the poor state of package management in Python. The fact that it's > painful for someone to update their downstream packages when an upstream > ABI breaks (on Windows and Mac in particular) has put a lot of unfortunate > pressure on this community. Pressure that was not envisioned or > understood when I was writing NumPy. > > (As an aside: This is one reason Continuum has invested resources in > building the conda tool and a completely free set of binary packages called > Anaconda CE which is becoming more and more usable thanks to the efforts of > Bryan Van de Ven and Ilan Schnell and our testing team at Continuum. The > conda tool: http://docs.continuum.io/conda/index.html is open source and > BSD licensed and the next release will provide the ability to build > packages, build indexes on package repositories and interface with pip. > Expect a blog-post in the near future about how cool conda is!). > > 2) Don't create array-scalars. Instead, make the data-type object > a meta-type object whose instances are the items returned from NumPy > arrays. There is no need for a separate array-scalar object and in fact > it's confusing to the type-system. I understand that now. I did not > understand that 5 years ago. > > 3) Special-case small arrays to avoid the memory indirection and > look at PDL so that generalized ufuncs are supported from the beginning. > > 4) Define missing-value data-types and labels on the dimensions > and arrays > > 5) Define a standard "dictionary of NumPy arrays" interface as the > basic "structure of arrays" concept to go with the "array of structures" > that structured arrays provide. 
> > 6) Start work on SQL interface to NumPy arrays *now* > > Additional comments I would make to someone today: > > 1) Most of NumPy should be written in Python with Numba used as > the compiler (particularly as soon as Numba gets the ability to create > Python extension modules which is in the next release). > 2) There are still many, many optimizations that can be made in > NumPy run-time (especially in the face of modern hardware). > > I will continue to be available to answer questions and I may chime in > here and there on pull requests. However, most of my time for NumPy will > be on administrative aspects of the project where I will continue to take > an active interest. To help make sure that this happens in a transparent > way, I would like to propose that "administrative" support of the project > be left to the NumFOCUS board of which I am currently 1 of 9 members. The > other board members are currently: Ralf Gommers, Anthony Scopatz, Andy > Terrel, Prabhu Ramachandran, Fernando Perez, Emmanuelle Gouillart, Jarrod > Millman, and Perry Greenfield. While NumFOCUS basically seeks to > promote and fund the entire scientific Python stack, I think it can also > play a role in helping to administer some of the core projects which the > board members themselves have a personal interest in. > > By administrative support, I mean decisions like "what should be done with > any NumPy IP or web-domains" or "what kind of commercially-related ads or > otherwise should go on the NumPy home page", or "what should be done with > the NumPy github account", etc. --- basically anything that requires an > executive decision that is not directly development related. I don't > expect there to be many of these decisions. But, when they show up, I > would like them to be made in as transparent and public of a way as > possible. In practice, the way I see this working is that there are > members of the NumPy community who are (like me) particularly interested in > admin-related questions and serve on a NumPy team in the NumFOCUS > organization. I just know I'll be attending NumFOCUS board meetings, > and I would like to help move administrative decisions forward with NumPy > as part of the time I spend thinking about NumFOCUS. > > If people on this list would like to play an active role in those admin > discussions, then I would heartily welcome them into NumFOCUS membership > where they would work with interested members of the NumFOCUS board (like > me and Ralf) to help direct that organization. I would really love to > have someone from this list volunteer to serve on the NumPy team as part of > the NumFOCUS project. I am certainly going to be interested in the > opinions of people who are active participants on this list and on GitHub > pages for NumPy on anything admin related to NumPy, and I expect Ralf would > also be very interested in those views. > > One admin discussion that I will bring up in another email (as this one is > already too long) is about making 2 or 3 lists for NumPy such as > numpy-admin at numpy.org, numpy-dev at numpy.org, and numpy-users at numpy-org. > > Just because I'll be spending more time on Blaze, Numba, Bokeh, and the > PyData ecosystem does not mean that I won't be around for NumPy. I will > continue to promote NumPy. My involvement with Continuum connects me to > NumPy as Continuum continues to offer commercial support contracts for > NumPy (and SciPy and other open source projects). 
Continuum will also > continue to maintain its Github NumPy project which will contain pull > requests from our company that we are working to get into the mainline > branch. Continuum will also continue to provide resources for > release-management of NumPy (we have been funding Ondrej in this role for > the past 6 months --- though I would like to see this happen through > NumFOCUS in the future even if Continuum provides much of the money). We > also offer optimized versions of NumPy in our commercial Anaconda > distribution (Anaconda CE is free and open source). > > Also, I will still be available for questions and help (I'm not > disappearing --- just making it clear that I'm stepping back into an > occasional NumPy developer role). It has been extremely gratifying to see > the number of pull-requests, GitHub-conversations, and code contributions > increase this year. Even though the 1.7 release has taken a long time to > stabilize, there have been a lot of people participating in the discussion > and in helping to track down the problems, figure out what to do, and fix > them. It even makes it possible for people to think about 1.7 as a > long-term release. > > I will continue to hope that the spirit of openness, tolerance, respect, > and gratitude continue to permeate this mailing list, and that we continue > to seek to resolve any differences with trust and mutual respect. I know > I have offended people in the past with quick remarks and actions made > sometimes in haste without fully realizing how they might be taken. But, > I also know that like many of you I have always done the very best I could > for moving Python for scientific computing forward in the best way I know > how. > > Thank you for the great memories. If you will forgive a little > sentiment: My daughter who is in college now was 3 years old when I began > working with this community and went down a road that would lead to my > involvement with SciPy and NumPy. I have marked the building of my family > and the passage of time with where the Python for Scientific Computing > Community was at. Like many of you, I have given a great deal of > attention and time to building this community. That sacrifice and time > has led me to love what we have created. I know that I leave this > segment of the community with the tools in better hands than mine. I am > hopeful that NumPy will continue to be a useful array library for the > Python community for many years to come even as we all continue to build > new tools for the future. > > Congratulations on the DARPA grant and best wishes for the success of your enterprises. We will all do our best to keep Numpy moving forward and hope that Blaze will contribute to that. One administrative detail you might want to deal with at the point is ownership of the Numpy github repositories. I note that the Scipy repositories have a number of owners, but you are currently the sole owner of the Numpy site. May I suggest adding a few more owners? I'd recommend Ralf, Pauli, Nathaniel, and myself as additions. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.collette at gmail.com Mon Dec 17 14:55:54 2012 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 17 Dec 2012 12:55:54 -0700 Subject: [Numpy-discussion] Status of the 1.7 release In-Reply-To: References: Message-ID: I am not very familiar with the NumPy development and release strategy, but is there any chance this fix could be included in 1.7.0? 
https://github.com/numpy/numpy/pull/2798 This is the source of a recently reported bug in h5py and there is nothing I can do to work around it without breaking other parts of the project. If I can be of any help just let me know how. Andrew Collette From chris.barker at noaa.gov Mon Dec 17 15:30:04 2012 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 17 Dec 2012 12:30:04 -0800 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: On Sun, Dec 16, 2012 at 7:48 AM, klo wrote: >> "NumPy 1.5 Beginner's Guide", Ivan Idris, >> http://www.packtpub.com/numpy-1-5-using-real-world-examples-beginners-guide/book > Some reviews on first title: > > http://gael-varoquaux.info/blog/?p=161 > http://glowingpython.blogspot.com/2011/12/book-review-numpy-15-beginners-guide.html Interesting -- I was asked to review the Numpy 1.5 Beginner's Guide, and I did read through the whole thing, and make notes, but never wrote up a full review. One reason is that I found it hard to motivate myself to write what would have been a bad review. I don't really disagree with the summaries provided in the two reviews above, but I found two things: 1) There was too much wasted space in the book -- places where there would be a nice example, then another one that was the same thing, but, for example, using a different random number distribution -- we really don't need to see almost identical code and plots take up another bunch of pages -- i.e. I think the book could have been maybe half as long with about the same content. 2) -- and this is worse -- The author did not seem to be all that familiar with the real strengths of numpy and idiomatic numpy code -- it felt like almost a translation of MATLAB material, or at least written by someone that had not yet made the full transition to numpy from another language. Aside from small style issues was the glaring omission of any discussion of array broadcasting! So it may be hard to reach a consensus on a book being "good"! That being said, I don't have a problem with listing this book on numpy sites -- though it would be nice to have easy access of reviews right there, too. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pivanov314 at gmail.com Mon Dec 17 20:50:34 2012 From: pivanov314 at gmail.com (Paul Ivanov) Date: Mon, 17 Dec 2012 17:50:34 -0800 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: On Mon, Dec 17, 2012 at 12:30 PM, Chris Barker - NOAA Federal wrote: > Interesting -- I was asked to review the Numpy 1.5 Beginner's Guide, > and I did read through the whole thing, and make notes, but never > wrote up a full review. One reason is that I found it hard to motivate > myself to write what would have been a bad review. This was also my experience. I would go so far as to say that it would be a disservice to our community to link to that book. Our documentation is better. 
-- Paul Ivanov 314 address only used for lists, off-list direct email at: http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7 From jason-sage at creativetrax.com Mon Dec 17 21:10:05 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Mon, 17 Dec 2012 19:10:05 -0700 Subject: [Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch In-Reply-To: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> References: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> Message-ID: <50CFD07D.6050802@creativetrax.com> On 12/16/12 11:07 PM, Travis Oliphant wrote: > Hello all, > > There is a lot happening in my life right now and I am spread quite > thin among the various projects that I take an interest in. In > particular, I am thrilled to publicly announce on this list that > Continuum Analytics has received DARPA funding (to the tune of at > least $3 million) for Blaze, Numba, and Bokeh which we are writing to > take NumPy, SciPy, and visualization into the domain of very large > data sets. This is part of the XDATA program, and I will be taking > an active role in it. You can read more about Blaze here: > http://blaze.pydata.org. You can read more about XDATA here: > http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx This is awesome. As with the recent IPython grant, it would be great if you guys got some good publicity from this. For example, I see an article up on Hacker News about blaze, but there doesn't seem to be a mention of big funding. Has someone written a press release? Has someone submitted the grant news to Hacker News or Slashdot, where you might attract attention and mindshare? Thanks, Jason From pivanov314 at gmail.com Mon Dec 17 21:19:29 2012 From: pivanov314 at gmail.com (Paul Ivanov) Date: Mon, 17 Dec 2012 18:19:29 -0800 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: On Mon, Dec 17, 2012 at 5:50 PM, Paul Ivanov wrote: > On Mon, Dec 17, 2012 at 12:30 PM, Chris Barker - NOAA Federal > wrote: >> Interesting -- I was asked to review the Numpy 1.5 Beginner's Guide, >> and I did read through the whole thing, and make notes, but never >> wrote up a full review. One reason is that I found it hard to motivate >> myself to write what would have been a bad review. > > This was also my experience. I would go so far as to say that it would > be a disservice to our community to link to that book. Our > documentation is better. I dug up the skeleton of the review that I had written up until I lost steam and interest in going further, I think it may shed more light on my negative opinion of this book. Packt publishing approached me about doing a review of one of their newest Python books: _NumPy 1.5 Beginner's Guide_ I think it's great that publishers are making it easier for folks to get started in this hot area of computing, though obviously, being a vested member of the Scientific Python community, I'm not exactly unbiased in my opinions on the topic. I received a complementary e-book copy. 
Here are my thoughts It correctly mentions that NumPy can use a LAPACK implementation if one is available on your system, and also correctly mentions that NumPy provides its own implementation if it can't find one - but neglects to state the important fact that this will be very slow relative to a traditional LAPACK implementation No mention of CPython - since there are so many other flavors of Python out there, these days, and NumPy doesn't work on most of them Mentions that NumPy is "open source" and "free as in beer", but neglects to specifically state its license. Numpy 1.5 is in the book title - but NumPy 2.0.0.dev20100915 listed under "Here is a list of software used to develop and test the code examples" in the Preface. - the need to register for an account in order to download the sample code is annoying The author of the book uses a 64-bit machine. How do I know that? The first code example provided in the book does not work for the second set of inputs. 20:32 at ch1code$ python vectorsum.py 2000 The last 2 elements of the sum [7980015996L, 7992002000L] PythonSum elapsed time in microseconds 4943 Warning: invalid value encountered in power The last 2 elements of the sum [-2143491644 -2143487647] NumPySum elapsed time in microseconds 722 20:32 at ch1code$ Indeed the vectorsum example: doesn't work for values > 1291 ( int overflow on my 32 bit machine ) python vectorsum.py 1291 20:32 at ch1code$ python vectorsum.py 1291 The last 2 elements of the sum [2143362090, 2148353100L] PythonSum elapsed time in microseconds 1771 The last 2 elements of the sum [ 2143362090 -2146614196] NumPySum elapsed time in microseconds 374 So though the answer is attained way faster using numpy, it's wrong in this case! "What just happened?" section a bit annoying - just space filler without utility - reminiscent of closing curling brackets or line-ending semicolons of those other programming languages :) Ditto with the "time for action" headings. import numpy as np would have been nice - since that's the convention used in numpy's own doc strings. "What does arange(5) do" - namespaces are one honking good idea... Re: IPython: "The Pylab switch imports all the Scipy, NumPy, and Matplotlib packages. Without this switch, we would have to import every package we need ourselves." - biggest use of ``--pylab`` is to get a separate event loop for plots. It's confusing to have a Numpy book that, in the first chapter, dives into IPython! The book is unfocused in its presentation. A simple mention of %quickref would have sufficed for pointing out features of IPython TypeError for ints and floats is a property of the Python programming language - it's not specific to Numpy. Reshape function should have mentioned that it only changes only the metadata, so reshaping a really large array takes the same amount of time as a small one. More style problems: "This selects the first floor" Mention of ravel() and flatten() without mention of flat (which is mentioned several pages later). Mention of a.transpose() without a.T - which is mentioned seven pages later. Something I didn't know: "The flat attribute is settable. Setting the value of the flat attribute leads to overwriting the values of the whole array" (p 46). And then I figured out why I didn't know this (it's slow): In [31]: timeit a.flat=1 10 loops, best of 3: 80.4 ms per loop In [32]: timeit a.fill(1) 100 loops, best of 3: 5.62 ms per loop Unfortunately, the fill() method is *not* mentioned here, it's mentioned 25 pages later. VWA-what? 
too simplistic explanation to be useful. Mean - cheeky language, but the fact that it's also a method on numpy array is only mentioned in passing four pages later. That you can specify ``axis`` keyword to get means across rows or columns, etc., is also not mentioned here. The same thing for min, max, ptp. Uses the memory-copying numpy.msort() to sort an array, instead of the in-place a.sort(), or talking about the more general numpy.sort() function. The explanation of numpy.var include the important point about how "some books tell us to divide by the number of elements in the array minus one," but then fails to mention the ddof keyword argument to var and std. A shout-out to numpy.std would have been nice here, too. I don't care for the finance-focus of the example in this book, but I do care about being talked down to: In academic literature it is more common to base analysis on stock returns and log returns of the close price. Simple returns are just the rate of change from one value to the next. Logarithmic returns or log returns are determined by taking the log of all the prices and calculating the differences between them. In high school, we learned that the difference between the log of "a" and the log of "b" is equal to the log of "a divided by b". Log return, therefore, also measures rate of change. Indexing with masks (e.g. all positive values of an array as a[a>0]), not mentioned until chapter X, whereas it had a natural fit in either the indexing section of Chapter 2, or along side the use of numpy.where() in Chapter 3. The book talks about how it's useful for scientists and engineers, and then has a heavy focus on stock-market-related finance one-off examples. My eyes glazed over with the acronyms and, what to me, are meaningless sets of quantities. Image: ATR - if it's not important, don't tell me about it! Bollinger Bands: gimme a break! np.piecewise - I've never used it - looks quite useful As a newcomer to NumPy, I would have been too distracted by the financial focus of all of the example to get a general picture of what NumPy goodies. inconsistencies in style: numpy.arange followed by plot() plot() plot() show() no discussion of broadcasting. go ahead and time it graphic: no mention of ipython's timeit magic, or even just the python standard library timeit module. dot function described independent of matrix class multiplication. Typos / Errata -------------- numpy.arange - "the arange function was imported, that's why it is prefixed with numpy" - should be "the arange function was *not* imported..." page 17 hstack and vstack visuals on page 39 are identical in the resulting array, and should not be. poly = numpy.polyfit(t, bhp - vale, int(sys.argv[1])) on page 85 should make reference to how the script must be passed an argument for the degree of the polynomial, or the int(sys.argv[1]) should be changed to just 3 to suit the result. 
pg 86 - "extremums" should read "extrema" -- Paul Ivanov 314 address only used for lists, off-list direct email at: http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7 From d.s.seljebotn at astro.uio.no Tue Dec 18 02:06:10 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 18 Dec 2012 08:06:10 +0100 Subject: [Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch In-Reply-To: <50CFD07D.6050802@creativetrax.com> References: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> <50CFD07D.6050802@creativetrax.com> Message-ID: <50D015E2.3090705@astro.uio.no> On 12/18/2012 03:10 AM, Jason Grout wrote: > On 12/16/12 11:07 PM, Travis Oliphant wrote: >> Hello all, >> >> There is a lot happening in my life right now and I am spread quite >> thin among the various projects that I take an interest in. In >> particular, I am thrilled to publicly announce on this list that >> Continuum Analytics has received DARPA funding (to the tune of at >> least $3 million) for Blaze, Numba, and Bokeh which we are writing to >> take NumPy, SciPy, and visualization into the domain of very large >> data sets. This is part of the XDATA program, and I will be taking >> an active role in it. You can read more about Blaze here: >> http://blaze.pydata.org. You can read more about XDATA here: >> http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx > > > This is awesome. As with the recent IPython grant, it would be great if > you guys got some good publicity from this. For example, I see an > article up on Hacker News about blaze, but there doesn't seem to be a > mention of big funding. Has someone written a press release? Has > someone submitted the grant news to Hacker News or Slashdot, where you > might attract attention and mindshare? The IPython grant was on HN front page for a day. Dag Sverre From d.s.seljebotn at astro.uio.no Tue Dec 18 02:06:35 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 18 Dec 2012 08:06:35 +0100 Subject: [Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch In-Reply-To: <50D015E2.3090705@astro.uio.no> References: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> <50CFD07D.6050802@creativetrax.com> <50D015E2.3090705@astro.uio.no> Message-ID: <50D015FB.60108@astro.uio.no> On 12/18/2012 08:06 AM, Dag Sverre Seljebotn wrote: > On 12/18/2012 03:10 AM, Jason Grout wrote: >> On 12/16/12 11:07 PM, Travis Oliphant wrote: >>> Hello all, >>> >>> There is a lot happening in my life right now and I am spread quite >>> thin among the various projects that I take an interest in. In >>> particular, I am thrilled to publicly announce on this list that >>> Continuum Analytics has received DARPA funding (to the tune of at >>> least $3 million) for Blaze, Numba, and Bokeh which we are writing to >>> take NumPy, SciPy, and visualization into the domain of very large >>> data sets. This is part of the XDATA program, and I will be taking >>> an active role in it. You can read more about Blaze here: >>> http://blaze.pydata.org. You can read more about XDATA here: >>> http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx >> >> >> This is awesome. As with the recent IPython grant, it would be great if >> you guys got some good publicity from this. For example, I see an >> article up on Hacker News about blaze, but there doesn't seem to be a >> mention of big funding. Has someone written a press release? Has >> someone submitted the grant news to Hacker News or Slashdot, where you >> might attract attention and mindshare? 
> > The IPython grant was on HN front page for a day. Oh. Misread. Sorry. DS > > Dag Sverre From travis at continuum.io Tue Dec 18 02:14:35 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 18 Dec 2012 01:14:35 -0600 Subject: [Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch In-Reply-To: References: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> Message-ID: <79B07804-4EF6-4C28-BA42-B022AEFAE609@continuum.io> Thanks Charles, for the reminder and for the well wishes. I added the suggested names as owners. I have no doubt you will all do very well for NumPy in the future as you have in the past. All the best, -Travis On Dec 17, 2012, at 12:50 PM, Charles R Harris wrote: > Hi Travis, > > On Sun, Dec 16, 2012 at 11:07 PM, Travis Oliphant wrote: > Hello all, > > There is a lot happening in my life right now and I am spread quite thin among the various projects that I take an interest in. In particular, I am thrilled to publicly announce on this list that Continuum Analytics has received DARPA funding (to the tune of at least $3 million) for Blaze, Numba, and Bokeh which we are writing to take NumPy, SciPy, and visualization into the domain of very large data sets. This is part of the XDATA program, and I will be taking an active role in it. You can read more about Blaze here: http://blaze.pydata.org. You can read more about XDATA here: http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx > > I personally think Blaze is the future of array-oriented computing in Python. I will be putting efforts and resources next year behind making that case. How it interacts with future incarnations of NumPy, Pandas, or other projects is an interesting and open question. I have no doubt the future will be a rich ecosystem of interoperating array-oriented data-structures. I invite anyone interested in Blaze to participate in the discussions and development at https://groups.google.com/a/continuum.io/forum/#!forum/blaze-dev or watch the project on our public GitHub repo: https://github.com/ContinuumIO/blaze. Blaze is being incubated under the ContinuumIO GitHub project for now, but eventually I hope it will receive its own GitHub project page later next year. Development of Blaze is early but we are moving rapidly with it (and have deliverable deadlines --- thus while we will welcome input and pull requests we won't have a ton of time to respond to simple queries until > at least May or June). There is more that we are working on behind the scenes with respect to Blaze that will be coming out next year as well but isn't quite ready to show yet. > > As I look at the coming months and years, my time for direct involvement in NumPy development is therefore only going to get smaller. As a result it is not appropriate that I remain as "head steward" of the NumPy project (a term I prefer to BFD12 or anything else). I'm sure that it is apparent that while I've tried to help personally where I can this year on the NumPy project, my role has been more one of coordination, seeking funding, and providing expert advice on certain sections of code. I fundamentally agree with Fernando Perez that the responsibility of care-taking open source projects is one of stewardship --- something akin to public service. I have tried to emulate that belief this year --- even while not always succeeding. 
> > It is time for me to make official what is already becoming apparent to observers of this community, namely, that I am stepping down as someone who might be considered "head steward" for the NumPy project and officially leaving the development of the project in the hands of others in the community. I don't think the project actually needs a new "head steward" --- especially from a development perspective. Instead I see a lot of strong developers offering key opinions for the project as well as a great set of new developers offering pull requests. > > My strong suggestion is that development discussions of the project continue on this list with consensus among the active participants being the goal for development. I don't think 100% consensus is a rigid requirement --- but certainly a super-majority should be the goal, and serious changes should not be made with out a clear consensus. I would pay special attention to under-represented people (users with intense usage of NumPy but small voices on this list). There are many of them. If you push me for specifics then at this point in NumPy's history, I would say that if Chuck, Nathaniel, and Ralf agree on a course of action, it will likely be a good thing for the project. I suspect that even if only 2 of the 3 agree at one time it might still be a good thing (but I would expect more detail and discussion). There are others whose opinion should be sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, David Cournapeau, Francesc Alted, and Mark Wiebe to > name a few. For some questions, I might even seek input from people like Konrad Hinsen and Paul Dubois --- if they have time to give it. I will still be willing to offer my view from time to time and if I am asked. > > Greg Wilson (of Software Carpentry fame) asked me recently what letter I would have written to myself 5 years ago. What would I tell myself to do given the knowledge I have now? I've thought about that for a bit, and I have some answers. I don't know if these will help anyone, but I offer them as hopefully instructive: > > 1) Do not promise to not break the ABI of NumPy --- and in fact emphasize that it will be broken at least once in the 1.X series. NumPy was designed to add new data-types --- but not without breaking the ABI. NumPy has needed more data-types and still needs even more. While it's not beautifully simple to add new data-types, it can be done. But, it is impossible to add them without breaking the ABI in some fashion. The desire to add new data-types *and* keep ABI compatibility has led to significant pain. I think the ABI non-breakage goal has been amplified by the poor state of package management in Python. The fact that it's painful for someone to update their downstream packages when an upstream ABI breaks (on Windows and Mac in particular) has put a lot of unfortunate pressure on this community. Pressure that was not envisioned or understood when I was writing NumPy. > > (As an aside: This is one reason Continuum has invested resources in building the conda tool and a completely free set of binary packages called Anaconda CE which is becoming more and more usable thanks to the efforts of Bryan Van de Ven and Ilan Schnell and our testing team at Continuum. The conda tool: http://docs.continuum.io/conda/index.html is open source and BSD licensed and the next release will provide the ability to build packages, build indexes on package repositories and interface with pip. Expect a blog-post in the near future about how cool conda is!). 
> > 2) Don't create array-scalars. Instead, make the data-type object a meta-type object whose instances are the items returned from NumPy arrays. There is no need for a separate array-scalar object and in fact it's confusing to the type-system. I understand that now. I did not understand that 5 years ago. > > 3) Special-case small arrays to avoid the memory indirection and look at PDL so that generalized ufuncs are supported from the beginning. > > 4) Define missing-value data-types and labels on the dimensions and arrays > > 5) Define a standard "dictionary of NumPy arrays" interface as the basic "structure of arrays" concept to go with the "array of structures" that structured arrays provide. > > 6) Start work on SQL interface to NumPy arrays *now* > > Additional comments I would make to someone today: > > 1) Most of NumPy should be written in Python with Numba used as the compiler (particularly as soon as Numba gets the ability to create Python extension modules which is in the next release). > 2) There are still many, many optimizations that can be made in NumPy run-time (especially in the face of modern hardware). > > I will continue to be available to answer questions and I may chime in here and there on pull requests. However, most of my time for NumPy will be on administrative aspects of the project where I will continue to take an active interest. To help make sure that this happens in a transparent way, I would like to propose that "administrative" support of the project be left to the NumFOCUS board of which I am currently 1 of 9 members. The other board members are currently: Ralf Gommers, Anthony Scopatz, Andy Terrel, Prabhu Ramachandran, Fernando Perez, Emmanuelle Gouillart, Jarrod Millman, and Perry Greenfield. While NumFOCUS basically seeks to promote and fund the entire scientific Python stack, I think it can also play a role in helping to administer some of the core projects which the board members themselves have a personal interest in. > > By administrative support, I mean decisions like "what should be done with any NumPy IP or web-domains" or "what kind of commercially-related ads or otherwise should go on the NumPy home page", or "what should be done with the NumPy github account", etc. --- basically anything that requires an executive decision that is not directly development related. I don't expect there to be many of these decisions. But, when they show up, I would like them to be made in as transparent and public of a way as possible. In practice, the way I see this working is that there are members of the NumPy community who are (like me) particularly interested in admin-related questions and serve on a NumPy team in the NumFOCUS organization. I just know I'll be attending NumFOCUS board meetings, and I would like to help move administrative decisions forward with NumPy as part of the time I spend thinking about NumFOCUS. > > If people on this list would like to play an active role in those admin discussions, then I would heartily welcome them into NumFOCUS membership where they would work with interested members of the NumFOCUS board (like me and Ralf) to help direct that organization. I would really love to have someone from this list volunteer to serve on the NumPy team as part of the NumFOCUS project. I am certainly going to be interested in the opinions of people who are active participants on this list and on GitHub pages for NumPy on anything admin related to NumPy, and I expect Ralf would also be very interested in those views. 
> > One admin discussion that I will bring up in another email (as this one is already too long) is about making 2 or 3 lists for NumPy such as numpy-admin at numpy.org, numpy-dev at numpy.org, and numpy-users at numpy-org. > > Just because I'll be spending more time on Blaze, Numba, Bokeh, and the PyData ecosystem does not mean that I won't be around for NumPy. I will continue to promote NumPy. My involvement with Continuum connects me to NumPy as Continuum continues to offer commercial support contracts for NumPy (and SciPy and other open source projects). Continuum will also continue to maintain its Github NumPy project which will contain pull requests from our company that we are working to get into the mainline branch. Continuum will also continue to provide resources for release-management of NumPy (we have been funding Ondrej in this role for the past 6 months --- though I would like to see this happen through NumFOCUS in the future even if Continuum provides much of the money). We also offer optimized versions of NumPy in our commercial Anaconda distribution (Anaconda CE is free and open source). > > Also, I will still be available for questions and help (I'm not disappearing --- just making it clear that I'm stepping back into an occasional NumPy developer role). It has been extremely gratifying to see the number of pull-requests, GitHub-conversations, and code contributions increase this year. Even though the 1.7 release has taken a long time to stabilize, there have been a lot of people participating in the discussion and in helping to track down the problems, figure out what to do, and fix them. It even makes it possible for people to think about 1.7 as a long-term release. > > I will continue to hope that the spirit of openness, tolerance, respect, and gratitude continue to permeate this mailing list, and that we continue to seek to resolve any differences with trust and mutual respect. I know I have offended people in the past with quick remarks and actions made sometimes in haste without fully realizing how they might be taken. But, I also know that like many of you I have always done the very best I could for moving Python for scientific computing forward in the best way I know how. > > Thank you for the great memories. If you will forgive a little sentiment: My daughter who is in college now was 3 years old when I began working with this community and went down a road that would lead to my involvement with SciPy and NumPy. I have marked the building of my family and the passage of time with where the Python for Scientific Computing Community was at. Like many of you, I have given a great deal of attention and time to building this community. That sacrifice and time has led me to love what we have created. I know that I leave this segment of the community with the tools in better hands than mine. I am hopeful that NumPy will continue to be a useful array library for the Python community for many years to come even as we all continue to build new tools for the future. > > > Congratulations on the DARPA grant and best wishes for the success of your enterprises. We will all do our best to keep Numpy moving forward and hope that Blaze will contribute to that. > > One administrative detail you might want to deal with at the point is ownership of the Numpy github repositories. I note that the Scipy repositories have a number of owners, but you are currently the sole owner of the Numpy site. May I suggest adding a few more owners? 
I'd recommend Ralf, Pauli, Nathaniel, and myself as additions. > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jniehof at lanl.gov Tue Dec 18 10:12:23 2012 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Tue, 18 Dec 2012 08:12:23 -0700 Subject: [Numpy-discussion] www.numpy.org home page In-Reply-To: References: <6E409FEC-FD9F-41F6-BC02-F227EE42F077@gmail.com> Message-ID: <50D087D7.7050406@lanl.gov> On 12/17/2012 07:19 PM, Paul Ivanov wrote: > pg 86 - "extremums" should read "extrema" In case anyone was wondering, it *is* possible to snort a bagel up into one's nose. It's also painful. (Although not as painful as that pluralization.) Thanks for the notes. -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From charlesr.harris at gmail.com Tue Dec 18 10:41:38 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Dec 2012 08:41:38 -0700 Subject: [Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch In-Reply-To: <79B07804-4EF6-4C28-BA42-B022AEFAE609@continuum.io> References: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> <79B07804-4EF6-4C28-BA42-B022AEFAE609@continuum.io> Message-ID: On Tue, Dec 18, 2012 at 12:14 AM, Travis Oliphant wrote: > Thanks Charles, > > for the reminder and for the well wishes. > > I added the suggested names as owners. I have no doubt you will all do > very well for NumPy in the future as you have in the past. > > Thanks Travis. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Wed Dec 19 03:40:39 2012 From: heng at cantab.net (Henry Gomersall) Date: Wed, 19 Dec 2012 08:40:39 +0000 Subject: [Numpy-discussion] Byte aligned arrays Message-ID: <1355906439.3456.9.camel@farnsworth> I've written a few simple cython routines for assisting in creating byte-aligned numpy arrays. The point being for the arrays to work with SSE/AVX code. https://github.com/hgomersall/pyFFTW/blob/master/pyfftw/utils.pxi The change recently has been to add a check on the CPU as to what flags are supported (though it's not complete, I should make the default return 0 or something). It occurred to me that this is something that (a) other people almost certainly need and are solving themselves and (b) I lack the necessary platforms to test all the possible CPU/OS combinations to make sure something sensible happens in all cases. Is this something that can be rolled into Numpy (the feature, not my particular implementation or interface - though I'd be happy for it to be so)? Regarding (b), I've written a test case that works for Linux on x86-64 with GCC (my platform!). I can test it on 32-bit windows, but that's it. Is ARM supported by Numpy? Neon would be great to include as well. What other platforms might need this? 
Cheers, Henry From njs at pobox.com Wed Dec 19 09:43:58 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Dec 2012 14:43:58 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1355906439.3456.9.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> Message-ID: On Wed, Dec 19, 2012 at 8:40 AM, Henry Gomersall wrote: > I've written a few simple cython routines for assisting in creating > byte-aligned numpy arrays. The point being for the arrays to work with > SSE/AVX code. > > https://github.com/hgomersall/pyFFTW/blob/master/pyfftw/utils.pxi > > The change recently has been to add a check on the CPU as to what flags > are supported (though it's not complete, I should make the default > return 0 or something). > > It occurred to me that this is something that (a) other people almost > certainly need and are solving themselves and (b) I lack the necessary > platforms to test all the possible CPU/OS combinations to make sure > something sensible happens in all cases. > > Is this something that can be rolled into Numpy (the feature, not my > particular implementation or interface - though I'd be happy for it to > be so)? > > Regarding (b), I've written a test case that works for Linux on x86-64 > with GCC (my platform!). I can test it on 32-bit windows, but that's it. > Is ARM supported by Numpy? Neon would be great to include as well. What > other platforms might need this? Your code looks simple and portable to me (at least the alignment part). I can see a good argument for adding this sort of functionality directly to numpy with a nice interface, though, since these kind of requirements seem quite common these days. Maybe an interface like a = np.asarray([1, 2, 3], base_alignment=32) # should this be in bits or in bytes? b = np.empty((10, 10), order="C", base_alignment=32) # etc. assert a.base_alignment == 32 which underneath tries to use posix_memalign/_aligned_malloc when possible, or falls back on the overallocation trick otherwise? -n From charlesr.harris at gmail.com Wed Dec 19 09:57:51 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 19 Dec 2012 07:57:51 -0700 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: References: <1355906439.3456.9.camel@farnsworth> Message-ID: On Wed, Dec 19, 2012 at 7:43 AM, Nathaniel Smith wrote: > On Wed, Dec 19, 2012 at 8:40 AM, Henry Gomersall wrote: > > I've written a few simple cython routines for assisting in creating > > byte-aligned numpy arrays. The point being for the arrays to work with > > SSE/AVX code. > > > > https://github.com/hgomersall/pyFFTW/blob/master/pyfftw/utils.pxi > > > > The change recently has been to add a check on the CPU as to what flags > > are supported (though it's not complete, I should make the default > > return 0 or something). > > > > It occurred to me that this is something that (a) other people almost > > certainly need and are solving themselves and (b) I lack the necessary > > platforms to test all the possible CPU/OS combinations to make sure > > something sensible happens in all cases. > > > > Is this something that can be rolled into Numpy (the feature, not my > > particular implementation or interface - though I'd be happy for it to > > be so)? > > > > Regarding (b), I've written a test case that works for Linux on x86-64 > > with GCC (my platform!). I can test it on 32-bit windows, but that's it. > > Is ARM supported by Numpy? Neon would be great to include as well. What > > other platforms might need this? 
> > Your code looks simple and portable to me (at least the alignment > part). I can see a good argument for adding this sort of functionality > directly to numpy with a nice interface, though, since these kind of > requirements seem quite common these days. Maybe an interface like > a = np.asarray([1, 2, 3], base_alignment=32) # should this be in > bits or in bytes? > b = np.empty((10, 10), order="C", base_alignment=32) > # etc. > assert a.base_alignment == 32 > which underneath tries to use posix_memalign/_aligned_malloc when > possible, or falls back on the overallocation trick otherwise? > > There is a thread about this from several years back. IIRC, David Cournapeau was interested in the same problem. At first glance, the alignment keyword looks interesting. One possible concern is keeping alignment for rows, views, etc., which is probably not possible in any sensible way. But people who need this most likely know what they are doing and just need memory allocated on the proper boundary. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Dec 19 10:10:24 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Dec 2012 15:10:24 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: References: <1355906439.3456.9.camel@farnsworth> Message-ID: On Wed, Dec 19, 2012 at 2:57 PM, Charles R Harris wrote: > > > On Wed, Dec 19, 2012 at 7:43 AM, Nathaniel Smith wrote: >> >> On Wed, Dec 19, 2012 at 8:40 AM, Henry Gomersall wrote: >> > I've written a few simple cython routines for assisting in creating >> > byte-aligned numpy arrays. The point being for the arrays to work with >> > SSE/AVX code. >> > >> > https://github.com/hgomersall/pyFFTW/blob/master/pyfftw/utils.pxi >> > >> > The change recently has been to add a check on the CPU as to what flags >> > are supported (though it's not complete, I should make the default >> > return 0 or something). >> > >> > It occurred to me that this is something that (a) other people almost >> > certainly need and are solving themselves and (b) I lack the necessary >> > platforms to test all the possible CPU/OS combinations to make sure >> > something sensible happens in all cases. >> > >> > Is this something that can be rolled into Numpy (the feature, not my >> > particular implementation or interface - though I'd be happy for it to >> > be so)? >> > >> > Regarding (b), I've written a test case that works for Linux on x86-64 >> > with GCC (my platform!). I can test it on 32-bit windows, but that's it. >> > Is ARM supported by Numpy? Neon would be great to include as well. What >> > other platforms might need this? >> >> Your code looks simple and portable to me (at least the alignment >> part). I can see a good argument for adding this sort of functionality >> directly to numpy with a nice interface, though, since these kind of >> requirements seem quite common these days. Maybe an interface like >> a = np.asarray([1, 2, 3], base_alignment=32) # should this be in >> bits or in bytes? >> b = np.empty((10, 10), order="C", base_alignment=32) >> # etc. >> assert a.base_alignment == 32 >> which underneath tries to use posix_memalign/_aligned_malloc when >> possible, or falls back on the overallocation trick otherwise? >> > > There is a thread about this from several years back. IIRC, David Cournapeau > was interested in the same problem. At first glance, the alignment keyword > looks interesting. 
One possible concern is keeping alignment for rows, > views, etc., which is probably not possible in any sensible way. But people > who need this most likely know what they are doing and just need memory > allocated on the proper boundary. Right, my intuition is that it's like order="C" -- if you make a new array by, say, indexing, then it may or may not have order="C", no guarantees. So when you care, you call asarray(a, order="C") and that either makes a copy or not as needed. Similarly for base alignment. I guess to push this analogy even further we could define a set of array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only power-of-2 alignment matters, I think, so the number of flags would remain manageable?) That would make the C API easier to deal with too, no need to add PyArray_FromAnyAligned. -n From charlesr.harris at gmail.com Wed Dec 19 10:27:54 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 19 Dec 2012 08:27:54 -0700 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: References: <1355906439.3456.9.camel@farnsworth> Message-ID: On Wed, Dec 19, 2012 at 8:10 AM, Nathaniel Smith wrote: > On Wed, Dec 19, 2012 at 2:57 PM, Charles R Harris > wrote: > > > > > > On Wed, Dec 19, 2012 at 7:43 AM, Nathaniel Smith wrote: > >> > >> On Wed, Dec 19, 2012 at 8:40 AM, Henry Gomersall > wrote: > >> > I've written a few simple cython routines for assisting in creating > >> > byte-aligned numpy arrays. The point being for the arrays to work with > >> > SSE/AVX code. > >> > > >> > https://github.com/hgomersall/pyFFTW/blob/master/pyfftw/utils.pxi > >> > > >> > The change recently has been to add a check on the CPU as to what > flags > >> > are supported (though it's not complete, I should make the default > >> > return 0 or something). > >> > > >> > It occurred to me that this is something that (a) other people almost > >> > certainly need and are solving themselves and (b) I lack the necessary > >> > platforms to test all the possible CPU/OS combinations to make sure > >> > something sensible happens in all cases. > >> > > >> > Is this something that can be rolled into Numpy (the feature, not my > >> > particular implementation or interface - though I'd be happy for it to > >> > be so)? > >> > > >> > Regarding (b), I've written a test case that works for Linux on x86-64 > >> > with GCC (my platform!). I can test it on 32-bit windows, but that's > it. > >> > Is ARM supported by Numpy? Neon would be great to include as well. > What > >> > other platforms might need this? > >> > >> Your code looks simple and portable to me (at least the alignment > >> part). I can see a good argument for adding this sort of functionality > >> directly to numpy with a nice interface, though, since these kind of > >> requirements seem quite common these days. Maybe an interface like > >> a = np.asarray([1, 2, 3], base_alignment=32) # should this be in > >> bits or in bytes? > >> b = np.empty((10, 10), order="C", base_alignment=32) > >> # etc. > >> assert a.base_alignment == 32 > >> which underneath tries to use posix_memalign/_aligned_malloc when > >> possible, or falls back on the overallocation trick otherwise? > >> > > > > There is a thread about this from several years back. IIRC, David > Cournapeau > > was interested in the same problem. At first glance, the alignment > keyword > > looks interesting. One possible concern is keeping alignment for rows, > > views, etc., which is probably not possible in any sensible way. 
But > people > > who need this most likely know what they are doing and just need memory > > allocated on the proper boundary. > > Right, my intuition is that it's like order="C" -- if you make a new > array by, say, indexing, then it may or may not have order="C", no > guarantees. So when you care, you call asarray(a, order="C") and that > either makes a copy or not as needed. Similarly for base alignment. > > I guess to push this analogy even further we could define a set of > array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only power-of-2 > alignment matters, I think, so the number of flags would remain > manageable?) That would make the C API easier to deal with too, no > need to add PyArray_FromAnyAligned. > > Another possibility is an aligned datatype, basically an aligned structured array with floats/ints in chunks of the appropriate size. IIRC, gcc support for sse is something like that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Dec 19 10:57:47 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Dec 2012 15:57:47 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: References: <1355906439.3456.9.camel@farnsworth> Message-ID: On Wed, Dec 19, 2012 at 3:27 PM, Charles R Harris wrote: > > > On Wed, Dec 19, 2012 at 8:10 AM, Nathaniel Smith wrote: >> Right, my intuition is that it's like order="C" -- if you make a new >> array by, say, indexing, then it may or may not have order="C", no >> guarantees. So when you care, you call asarray(a, order="C") and that >> either makes a copy or not as needed. Similarly for base alignment. >> >> I guess to push this analogy even further we could define a set of >> array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only power-of-2 >> alignment matters, I think, so the number of flags would remain >> manageable?) That would make the C API easier to deal with too, no >> need to add PyArray_FromAnyAligned. >> > > Another possibility is an aligned datatype, basically an aligned structured > array with floats/ints in chunks of the appropriate size. IIRC, gcc support > for sse is something like that. True; right now it looks like structured dtypes have no special alignment: In [13]: np.dtype("f4,f4").alignment Out[13]: 1 So for this approach we'd need a way to create structured dtypes with .alignment == .itemsize, and we'd need some way to request dtype-aligned memory from array allocation functions. I guess existing NPY_ALIGNED is a good enough public interface for the latter, but AFAICT the current implementation is to just assume that whatever malloc() returns will always be ALIGNED. This is true for all base C types, but not for more exotic record types with larger alignment requirements -- that would require some fancier allocation scheme. Not sure which interface is more useful to users. On the one hand, using funny dtypes makes regular non-SIMD access more cumbersome, and it forces your array size to be a multiple of the SIMD word size, which might be inconvenient if your code is smart enough to handle arbitrary-sized arrays with partial SIMD acceleration (i.e., using SIMD for most of the array, and then a slow path to handle any partial word at the end). 
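To make the "funny dtype" idea a bit more concrete, here is a rough sketch of the kind of dtype I mean -- purely illustrative, since the .alignment part is exactly what numpy doesn't give us today:

import numpy as np

# One hypothetical "SIMD word": four packed float32s seen as a single record.
simd4f = np.dtype([('v', np.float32, (4,))])

simd4f.itemsize    # 16, i.e. one 128-bit SSE word
simd4f.alignment   # currently 1; the idea would need this to be 16

a = np.zeros(8, dtype=simd4f)    # 8 SIMD words == 32 floats
a['v'][3, 0] = 1.0               # ordinary element access has to go via the field
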
OTOH, if your code *is* that smart, you should probably just make it smart enough to handle a partial word at the beginning as well and then you won't need any special alignment in the first place, and representing each SIMD word as a single numpy scalar is an intuitively appealing model of how SIMD works. OTOOH, just adding a single argument np.array() is a much simpler to explain than some elaborate scheme involving the creation of special custom dtypes. -n From heng at cantab.net Wed Dec 19 11:47:25 2012 From: heng at cantab.net (Henry Gomersall) Date: Wed, 19 Dec 2012 16:47:25 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: References: <1355906439.3456.9.camel@farnsworth> Message-ID: <1355935645.3456.22.camel@farnsworth> On Wed, 2012-12-19 at 15:57 +0000, Nathaniel Smith wrote: > Not sure which interface is more useful to users. On the one hand, > using funny dtypes makes regular non-SIMD access more cumbersome, and > it forces your array size to be a multiple of the SIMD word size, > which might be inconvenient if your code is smart enough to handle > arbitrary-sized arrays with partial SIMD acceleration (i.e., using > SIMD for most of the array, and then a slow path to handle any partial > word at the end). OTOH, if your code *is* that smart, you should > probably just make it smart enough to handle a partial word at the > beginning as well and then you won't need any special alignment in the > first place, and representing each SIMD word as a single numpy scalar > is an intuitively appealing model of how SIMD works. OTOOH, just > adding a single argument np.array() is a much simpler to explain than > some elaborate scheme involving the creation of special custom dtypes. If it helps, my use-case is in wrapping the FFTW library. This _is_ smart enough to deal with unaligned arrays, but it just results in a performance penalty. In the case of an FFT, there are clearly going to be issues with the powers of two indices in the array not lying on a suitable n-byte boundary (which would be the case with a misaligned array), but I imagine it's not unique. The other point is that it's easy to create a suitable power of two array that should always bypass any special case unaligned code (e.g. with floats, any multiple of 4 array length will fill every 16-byte word). Finally, I think there is significant value in auto-aligning the array based on an appropriate inspection of the cpu capabilities (or alternatively, a function that reports back the appropriate SIMD alignment). Again, this makes it easier to wrap libraries that may function with any alignment, but benefit from optimum alignment. Cheers, Henry From francesc at continuum.io Wed Dec 19 13:03:56 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 19 Dec 2012 19:03:56 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1355935645.3456.22.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> Message-ID: <50D2018C.4080106@continuum.io> On 12/19/12 5:47 PM, Henry Gomersall wrote: > On Wed, 2012-12-19 at 15:57 +0000, Nathaniel Smith wrote: >> Not sure which interface is more useful to users. 
On the one hand, >> using funny dtypes makes regular non-SIMD access more cumbersome, and >> it forces your array size to be a multiple of the SIMD word size, >> which might be inconvenient if your code is smart enough to handle >> arbitrary-sized arrays with partial SIMD acceleration (i.e., using >> SIMD for most of the array, and then a slow path to handle any partial >> word at the end). OTOH, if your code *is* that smart, you should >> probably just make it smart enough to handle a partial word at the >> beginning as well and then you won't need any special alignment in the >> first place, and representing each SIMD word as a single numpy scalar >> is an intuitively appealing model of how SIMD works. OTOOH, just >> adding a single argument np.array() is a much simpler to explain than >> some elaborate scheme involving the creation of special custom dtypes. > If it helps, my use-case is in wrapping the FFTW library. This _is_ > smart enough to deal with unaligned arrays, but it just results in a > performance penalty. In the case of an FFT, there are clearly going to > be issues with the powers of two indices in the array not lying on a > suitable n-byte boundary (which would be the case with a misaligned > array), but I imagine it's not unique. > > The other point is that it's easy to create a suitable power of two > array that should always bypass any special case unaligned code (e.g. > with floats, any multiple of 4 array length will fill every 16-byte > word). > > Finally, I think there is significant value in auto-aligning the array > based on an appropriate inspection of the cpu capabilities (or > alternatively, a function that reports back the appropriate SIMD > alignment). Again, this makes it easier to wrap libraries that may > function with any alignment, but benefit from optimum alignment. Hmm, NumPy seems to return data blocks that are aligned to 16 bytes on systems (Linux and Mac OSX): In []: np.empty(1).data Out[]: In []: np.empty(1).data Out[]: In []: np.empty(1).data Out[]: In []: np.empty(1).data Out[]: [Check that the last digit in the addresses above is always 0] The only scenario that I see that this would create unaligned arrays is for machines having AVX. But provided that the Intel architecture is making great strides in fetching unaligned data, I'd be surprised that the difference in performance would be even noticeable. Can you tell us which difference in performance are you seeing for an AVX-aligned array and other that is not AVX-aligned? Just curious. -- Francesc Alted From heng at cantab.net Wed Dec 19 13:25:34 2012 From: heng at cantab.net (Henry Gomersall) Date: Wed, 19 Dec 2012 18:25:34 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D2018C.4080106@continuum.io> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> Message-ID: <1355941534.10732.5.camel@farnsworth> On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote: > > Finally, I think there is significant value in auto-aligning the > array > > based on an appropriate inspection of the cpu capabilities (or > > alternatively, a function that reports back the appropriate SIMD > > alignment). Again, this makes it easier to wrap libraries that may > > function with any alignment, but benefit from optimum alignment. > > Hmm, NumPy seems to return data blocks that are aligned to 16 bytes > on > systems (Linux and Mac OSX): That is not true at least under Windows 32-bit. 
I think also it's not true for Linux 32-bit from my vague recollections of testing in a virtual machine. (disclaimer: both those statements _may_ be out of date). But yes, under Linux 64-bit I always get my arrays aligned to 16 bytes. > > The only scenario that I see that this would create unaligned arrays > is > for machines having AVX. But provided that the Intel architecture is > making great strides in fetching unaligned data, I'd be surprised > that > the difference in performance would be even noticeable. > > Can you tell us which difference in performance are you seeing for an > AVX-aligned array and other that is not AVX-aligned? Just curious. I don't know; I don't own a machine with AVX ;) It might be that the difference is negligible, though I do think it would be _nice_ to have the arrays properly aligned if it's not too difficult. Cheers, Henry From njs at pobox.com Wed Dec 19 13:48:29 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Dec 2012 18:48:29 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1355941534.10732.5.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> Message-ID: On Wed, Dec 19, 2012 at 6:25 PM, Henry Gomersall wrote: > On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote: > >> > Finally, I think there is significant value in auto-aligning the >> array >> > based on an appropriate inspection of the cpu capabilities (or >> > alternatively, a function that reports back the appropriate SIMD >> > alignment). Again, this makes it easier to wrap libraries that may >> > function with any alignment, but benefit from optimum alignment. >> >> Hmm, NumPy seems to return data blocks that are aligned to 16 bytes >> on >> systems (Linux and Mac OSX): > > > That is not true at least under Windows 32-bit. I think also it's not > true for Linux 32-bit from my vague recollections of testing in a > virtual machine. (disclaimer: both those statements _may_ be out of > date). > > But yes, under Linux 64-bit I always get my arrays aligned to 16 bytes. Currently numpy just uses whatever the system malloc() returns, so the alignment guarantees are entirely determined by your libc. -n From cournape at gmail.com Wed Dec 19 18:18:31 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 19 Dec 2012 23:18:31 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D2018C.4080106@continuum.io> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> Message-ID: On Wed, Dec 19, 2012 at 6:03 PM, Francesc Alted wrote: > On 12/19/12 5:47 PM, Henry Gomersall wrote: >> On Wed, 2012-12-19 at 15:57 +0000, Nathaniel Smith wrote: >>> Not sure which interface is more useful to users. On the one hand, >>> using funny dtypes makes regular non-SIMD access more cumbersome, and >>> it forces your array size to be a multiple of the SIMD word size, >>> which might be inconvenient if your code is smart enough to handle >>> arbitrary-sized arrays with partial SIMD acceleration (i.e., using >>> SIMD for most of the array, and then a slow path to handle any partial >>> word at the end). 
OTOH, if your code *is* that smart, you should >>> probably just make it smart enough to handle a partial word at the >>> beginning as well and then you won't need any special alignment in the >>> first place, and representing each SIMD word as a single numpy scalar >>> is an intuitively appealing model of how SIMD works. OTOOH, just >>> adding a single argument np.array() is a much simpler to explain than >>> some elaborate scheme involving the creation of special custom dtypes. >> If it helps, my use-case is in wrapping the FFTW library. This _is_ >> smart enough to deal with unaligned arrays, but it just results in a >> performance penalty. In the case of an FFT, there are clearly going to >> be issues with the powers of two indices in the array not lying on a >> suitable n-byte boundary (which would be the case with a misaligned >> array), but I imagine it's not unique. >> >> The other point is that it's easy to create a suitable power of two >> array that should always bypass any special case unaligned code (e.g. >> with floats, any multiple of 4 array length will fill every 16-byte >> word). >> >> Finally, I think there is significant value in auto-aligning the array >> based on an appropriate inspection of the cpu capabilities (or >> alternatively, a function that reports back the appropriate SIMD >> alignment). Again, this makes it easier to wrap libraries that may >> function with any alignment, but benefit from optimum alignment. > > Hmm, NumPy seems to return data blocks that are aligned to 16 bytes on > systems (Linux and Mac OSX): Only by accident, at least on linux. The pointers returned by the gnu libc malloc are at least 8 bytes aligned, but they may not be 16 bytes when you're above the threshold where mmap is used for malloc. The difference between aligned and unaligned ram <-> sse registers (e.g. movaps, movups) used to be significant. Don't know if that's still the case for recent CPUs. David From heng at cantab.net Thu Dec 20 03:12:28 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 08:12:28 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: References: <1355906439.3456.9.camel@farnsworth> Message-ID: <1355991148.10732.14.camel@farnsworth> On Wed, 2012-12-19 at 15:10 +0000, Nathaniel Smith wrote: > >> > Is this something that can be rolled into Numpy (the feature, not > my > >> > particular implementation or interface - though I'd be happy for > it to > >> > be so)? > >> > > >> > Regarding (b), I've written a test case that works for Linux on > x86-64 > >> > with GCC (my platform!). I can test it on 32-bit windows, but > that's it. > >> > Is ARM supported by Numpy? Neon would be great to include as > well. What > >> > other platforms might need this? > >> > >> Your code looks simple and portable to me (at least the alignment > >> part). I can see a good argument for adding this sort of > functionality > >> directly to numpy with a nice interface, though, since these kind > of > >> requirements seem quite common these days. Maybe an interface like > >> a = np.asarray([1, 2, 3], base_alignment=32) # should this be in > >> bits or in bytes? > >> b = np.empty((10, 10), order="C", base_alignment=32) > >> # etc. > >> assert a.base_alignment == 32 > >> which underneath tries to use posix_memalign/_aligned_malloc when > >> possible, or falls back on the overallocation trick otherwise? > >> > > > > There is a thread about this from several years back. IIRC, David > Cournapeau > > was interested in the same problem. 
At first glance, the alignment > keyword > > looks interesting. One possible concern is keeping alignment for > rows, > > views, etc., which is probably not possible in any sensible way. But > people > > who need this most likely know what they are doing and just need > memory > > allocated on the proper boundary. > > Right, my intuition is that it's like order="C" -- if you make a new > array by, say, indexing, then it may or may not have order="C", no > guarantees. So when you care, you call asarray(a, order="C") and that > either makes a copy or not as needed. Similarly for base alignment. > > I guess to push this analogy even further we could define a set of > array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only power-of-2 > alignment matters, I think, so the number of flags would remain > manageable?) That would make the C API easier to deal with too, no > need to add PyArray_FromAnyAligned. So, if I were to implement this, I presume the proper way would be through modifications to multiarray? Would this basic description be a reasonable initial target? Henry From heng at cantab.net Thu Dec 20 03:16:51 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 08:16:51 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1355991148.10732.14.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355991148.10732.14.camel@farnsworth> Message-ID: <1355991411.10732.16.camel@farnsworth> On Thu, 2012-12-20 at 08:12 +0000, Henry Gomersall wrote: > On Wed, 2012-12-19 at 15:10 +0000, Nathaniel Smith wrote: > > > >> > Is this something that can be rolled into Numpy (the feature, > not > > my > > >> > particular implementation or interface - though I'd be happy > for > > it to > > >> > be so)? > > >> > > > >> > Regarding (b), I've written a test case that works for Linux on > > x86-64 > > >> > with GCC (my platform!). I can test it on 32-bit windows, but > > that's it. > > >> > Is ARM supported by Numpy? Neon would be great to include as > > well. What > > >> > other platforms might need this? > > >> > > >> Your code looks simple and portable to me (at least the alignment > > >> part). I can see a good argument for adding this sort of > > functionality > > >> directly to numpy with a nice interface, though, since these kind > > of > > >> requirements seem quite common these days. Maybe an interface > like > > >> a = np.asarray([1, 2, 3], base_alignment=32) # should this be > in > > >> bits or in bytes? > > >> b = np.empty((10, 10), order="C", base_alignment=32) > > >> # etc. > > >> assert a.base_alignment == 32 > > >> which underneath tries to use posix_memalign/_aligned_malloc when > > >> possible, or falls back on the overallocation trick otherwise? > > >> > > > > > > There is a thread about this from several years back. IIRC, David > > Cournapeau > > > was interested in the same problem. At first glance, the alignment > > keyword > > > looks interesting. One possible concern is keeping alignment for > > rows, > > > views, etc., which is probably not possible in any sensible way. > But > > people > > > who need this most likely know what they are doing and just need > > memory > > > allocated on the proper boundary. > > > > Right, my intuition is that it's like order="C" -- if you make a new > > array by, say, indexing, then it may or may not have order="C", no > > guarantees. So when you care, you call asarray(a, order="C") and > that > > either makes a copy or not as needed. Similarly for base alignment. 
> > > > I guess to push this analogy even further we could define a set of > > array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only > power-of-2 > > alignment matters, I think, so the number of flags would remain > > manageable?) That would make the C API easier to deal with too, no > > need to add PyArray_FromAnyAligned. > > So, if I were to implement this, I presume the proper way would be > through modifications to multiarray? > > Would this basic description be a reasonable initial target? There is this patch: http://projects.scipy.org/numpy/attachment/ticket/568/aligned_v1.patch Which, rather amusingly to me, was written by Steven Johnson of FFTW. It looks like a good starting point. Cheers, Henry From heng at cantab.net Thu Dec 20 03:53:31 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 08:53:31 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D2018C.4080106@continuum.io> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> Message-ID: <1355993611.10732.18.camel@farnsworth> On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote: > The only scenario that I see that this would create unaligned arrays > is > for machines having AVX. But provided that the Intel architecture is > making great strides in fetching unaligned data, I'd be surprised > that > the difference in performance would be even noticeable. > > Can you tell us which difference in performance are you seeing for an > AVX-aligned array and other that is not AVX-aligned? Just curious. Further to this point, from an Intel article... http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors "Aligning data to vector length is always recommended. When using Intel SSE and Intel SSE2 instructions, loaded data should be aligned to 16 bytes. Similarly, to achieve best results use Intel AVX instructions on 32-byte vectors that are 32-byte aligned. The use of Intel AVX instructions on unaligned 32-byte vectors means that every second load will be across a cache-line split, since the cache line is 64 bytes. This doubles the cache line split rate compared to Intel SSE code that uses 16-byte vectors. A high cache-line split rate in memory-intensive code is extremely likely to cause performance degradation. For that reason, it is highly recommended to align the data to 32 bytes for use with Intel AVX." Though it would be nice to put together a little example of this! Henry From njs at pobox.com Thu Dec 20 07:15:15 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 20 Dec 2012 12:15:15 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1355991148.10732.14.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355991148.10732.14.camel@farnsworth> Message-ID: On Thu, Dec 20, 2012 at 8:12 AM, Henry Gomersall wrote: > On Wed, 2012-12-19 at 15:10 +0000, Nathaniel Smith wrote: > >> >> > Is this something that can be rolled into Numpy (the feature, not >> my >> >> > particular implementation or interface - though I'd be happy for >> it to >> >> > be so)? >> >> > >> >> > Regarding (b), I've written a test case that works for Linux on >> x86-64 >> >> > with GCC (my platform!). I can test it on 32-bit windows, but >> that's it. >> >> > Is ARM supported by Numpy? Neon would be great to include as >> well. What >> >> > other platforms might need this? >> >> >> >> Your code looks simple and portable to me (at least the alignment >> >> part). 
I can see a good argument for adding this sort of >> functionality >> >> directly to numpy with a nice interface, though, since these kind >> of >> >> requirements seem quite common these days. Maybe an interface like >> >> a = np.asarray([1, 2, 3], base_alignment=32) # should this be in >> >> bits or in bytes? >> >> b = np.empty((10, 10), order="C", base_alignment=32) >> >> # etc. >> >> assert a.base_alignment == 32 >> >> which underneath tries to use posix_memalign/_aligned_malloc when >> >> possible, or falls back on the overallocation trick otherwise? >> >> >> > >> > There is a thread about this from several years back. IIRC, David >> Cournapeau >> > was interested in the same problem. At first glance, the alignment >> keyword >> > looks interesting. One possible concern is keeping alignment for >> rows, >> > views, etc., which is probably not possible in any sensible way. But >> people >> > who need this most likely know what they are doing and just need >> memory >> > allocated on the proper boundary. >> >> Right, my intuition is that it's like order="C" -- if you make a new >> array by, say, indexing, then it may or may not have order="C", no >> guarantees. So when you care, you call asarray(a, order="C") and that >> either makes a copy or not as needed. Similarly for base alignment. >> >> I guess to push this analogy even further we could define a set of >> array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only power-of-2 >> alignment matters, I think, so the number of flags would remain >> manageable?) That would make the C API easier to deal with too, no >> need to add PyArray_FromAnyAligned. > > So, if I were to implement this, I presume the proper way would be > through modifications to multiarray? Yes, numpy/core/src/multiarray/ is the code you'd be modifying. > Would this basic description be a reasonable initial target? My feeling is that we're at the stage where we need to get more information and feedback from people with experience in this area before we'll be able to merge anything into numpy proper (since that implies nailing down and committing to an API). One way to get that might be to go ahead and implement something to experiment with, and this "basic description" does seem like one of the plausible options, so... yes it seems like a reasonable initial target to work on, but I don't want to mislead you into thinking that I think it would necessarily be a reasonable initial target to ship in 1.8 or whatever. I feel like I don't have enough information to make a judgement there. There must be other people working with SIMD and numpy, right? If you're interested in this problem, another thing that might help would be to spend some effort finding those people and convincing them to get involved in discussing what they need. -n From francesc at continuum.io Thu Dec 20 09:23:01 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 20 Dec 2012 15:23:01 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1355993611.10732.18.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355993611.10732.18.camel@farnsworth> Message-ID: <50D31F44.5050306@continuum.io> On 12/20/12 9:53 AM, Henry Gomersall wrote: > On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote: >> The only scenario that I see that this would create unaligned arrays >> is >> for machines having AVX. 
But provided that the Intel architecture is >> making great strides in fetching unaligned data, I'd be surprised >> that >> the difference in performance would be even noticeable. >> >> Can you tell us which difference in performance are you seeing for an >> AVX-aligned array and other that is not AVX-aligned? Just curious. > Further to this point, from an Intel article... > > http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors > > "Aligning data to vector length is always recommended. When using Intel > SSE and Intel SSE2 instructions, loaded data should be aligned to 16 > bytes. Similarly, to achieve best results use Intel AVX instructions on > 32-byte vectors that are 32-byte aligned. The use of Intel AVX > instructions on unaligned 32-byte vectors means that every second load > will be across a cache-line split, since the cache line is 64 bytes. > This doubles the cache line split rate compared to Intel SSE code that > uses 16-byte vectors. A high cache-line split rate in memory-intensive > code is extremely likely to cause performance degradation. For that > reason, it is highly recommended to align the data to 32 bytes for use > with Intel AVX." > > Though it would be nice to put together a little example of this! Indeed, an example is what I was looking for. So provided that I have access to an AVX capable machine (having 6 physical cores), and that MKL 10.3 has support for AVX, I have made some comparisons using the Anaconda Python distribution (it ships with most packages linked against MKL 10.3). Here it is a first example using a DGEMM operation. First using a NumPy that is not turbo-loaded with MKL: In [34]: a = np.linspace(0,1,1e7) In [35]: b = a.reshape(1000, 10000) In [36]: c = a.reshape(10000, 1000) In [37]: time d = np.dot(b,c) CPU times: user 7.56 s, sys: 0.03 s, total: 7.59 s Wall time: 7.63 s In [38]: time d = np.dot(c,b) CPU times: user 78.52 s, sys: 0.18 s, total: 78.70 s Wall time: 78.89 s This is getting around 2.6 GFlop/s. Now, with a MKL 10.3 NumPy and AVX-unaligned data: In [7]: p = ctypes.create_string_buffer(int(8e7)); hex(ctypes.addressof(p)) Out[7]: '0x7fcdef3b4010' # 16 bytes alignment In [8]: a = np.ndarray(1e7, "f8", p) In [9]: a[:] = np.linspace(0,1,1e7) In [10]: b = a.reshape(1000, 10000) In [11]: c = a.reshape(10000, 1000) In [37]: %timeit d = np.dot(b,c) 10 loops, best of 3: 164 ms per loop In [38]: %timeit d = np.dot(c,b) 1 loops, best of 3: 1.65 s per loop That is around 120 GFlop/s (i.e. almost 50x faster than without MKL/AVX). Now, using MKL 10.3 and AVX-aligned data: In [21]: p2 = ctypes.create_string_buffer(int(8e7+16)); hex(ctypes.addressof(p)) Out[21]: '0x7f8cb9598010' In [22]: a2 = np.ndarray(1e7+2, "f8", p2)[2:] # skip the first 16 bytes (now is 32-bytes aligned) In [23]: a2[:] = np.linspace(0,1,1e7) In [24]: b2 = a2.reshape(1000, 10000) In [25]: c2 = a2.reshape(10000, 1000) In [35]: %timeit d2 = np.dot(b2,c2) 10 loops, best of 3: 163 ms per loop In [36]: %timeit d2 = np.dot(c2,b2) 1 loops, best of 3: 1.67 s per loop So, again, around 120 GFlop/s, and the difference wrt to unaligned AVX data is negligible. One may argue that DGEMM is CPU-bounded and that memory access plays little role here, and that is certainly true. So, let's go with a more memory-bounded problem, like computing a transcendental function with numexpr. 
First with a with NumPy and numexpr with no MKL support: In [8]: a = np.linspace(0,1,1e8) In [9]: %time b = np.sin(a) CPU times: user 1.20 s, sys: 0.22 s, total: 1.42 s Wall time: 1.42 s In [10]: import numexpr as ne In [12]: %time b = ne.evaluate("sin(a)") CPU times: user 1.42 s, sys: 0.27 s, total: 1.69 s Wall time: 0.37 s This time is around 4x faster than regular 'sin' in libc, and about the same speed than a memcpy(): In [13]: %time c = a.copy() CPU times: user 0.19 s, sys: 0.20 s, total: 0.39 s Wall time: 0.39 s Now, with a MKL-aware numexpr and non-AVX alignment: In [8]: p = ctypes.create_string_buffer(int(8e8)); hex(ctypes.addressof(p)) Out[8]: '0x7fce435da010' # 16 bytes alignment In [9]: a = np.ndarray(1e8, "f8", p) In [10]: a[:] = np.linspace(0,1,1e8) In [11]: %time b = ne.evaluate("sin(a)") CPU times: user 0.44 s, sys: 0.27 s, total: 0.71 s Wall time: 0.15 s That is, more than 2x faster than a memcpy() in this system, meaning that the problem is truly memory-bounded. So now, with an AVX aligned buffer: In [14]: a2 = a[2:] # skip the first 16 bytes In [15]: %time b = ne.evaluate("sin(a2)") CPU times: user 0.40 s, sys: 0.28 s, total: 0.69 s Wall time: 0.16 s Again, times are very close. Just to make sure, let's use the timeit magic: In [16]: %timeit b = ne.evaluate("sin(a)") 10 loops, best of 3: 159 ms per loop # unaligned In [17]: %timeit b = ne.evaluate("sin(a2)") 10 loops, best of 3: 154 ms per loop # aligned All in all, it is not clear that AVX alignment would have an advantage, even for memory-bounded problems. But of course, if Intel people are saying that AVX alignment is important is because they have use cases for asserting this. It is just that I'm having a difficult time to find these cases. -- Francesc Alted From sturla at molden.no Thu Dec 20 11:26:28 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 20 Dec 2012 17:26:28 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1355906439.3456.9.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> Message-ID: <50D33C34.7050506@molden.no> On 19.12.2012 09:40, Henry Gomersall wrote: > I've written a few simple cython routines for assisting in creating > byte-aligned numpy arrays. The point being for the arrays to work with > SSE/AVX code. > > https://github.com/hgomersall/pyFFTW/blob/master/pyfftw/utils.pxi Why use Cython? http://mail.scipy.org/pipermail/scipy-user/2009-March/020289.html def aligned_zeros(shape, boundary=16, dtype=float, order='C'): N = np.prod(shape) d = np.dtype(dtype) tmp = np.zeros(N * d.itemsize + boundary, dtype=np.uint8) address = tmp.__array_interface__['data'][0] offset = (boundary - address % boundary) % boundary return tmp[offset:offset+N]\ .view(dtype=d)\ .reshape(shape, order=order) Sturla From heng at cantab.net Thu Dec 20 11:45:43 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 16:45:43 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D33C34.7050506@molden.no> References: <1355906439.3456.9.camel@farnsworth> <50D33C34.7050506@molden.no> Message-ID: <1356021943.10732.29.camel@farnsworth> On Thu, 2012-12-20 at 17:26 +0100, Sturla Molden wrote: > On 19.12.2012 09:40, Henry Gomersall wrote: > > I've written a few simple cython routines for assisting in creating > > byte-aligned numpy arrays. The point being for the arrays to work > with > > SSE/AVX code. > > > > https://github.com/hgomersall/pyFFTW/blob/master/pyfftw/utils.pxi > > Why use Cython? 
> http://mail.scipy.org/pipermail/scipy-user/2009-March/020289.html > > > def aligned_zeros(shape, boundary=16, dtype=float, order='C'): > N = np.prod(shape) > d = np.dtype(dtype) > tmp = np.zeros(N * d.itemsize + boundary, dtype=np.uint8) > address = tmp.__array_interface__['data'][0] > offset = (boundary - address % boundary) % boundary > return tmp[offset:offset+N]\ > .view(dtype=d)\ > .reshape(shape, order=order) Initially because it kept my module in a single file. That's legacy now, but since I'm already in the Cython domain, it makes sense to get the advantages (like speed - creating a 1000 length array with n_byte_align_empty is about 7 times faster than with the code above). The alignment functions is just a utility function for the FFTW wrapper. Cheers, Henry From heng at cantab.net Thu Dec 20 11:47:50 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 16:47:50 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D33C34.7050506@molden.no> References: <1355906439.3456.9.camel@farnsworth> <50D33C34.7050506@molden.no> Message-ID: <1356022070.10732.31.camel@farnsworth> On Thu, 2012-12-20 at 17:26 +0100, Sturla Molden wrote: > return tmp[offset:offset+N]\ > .view(dtype=d)\ > .reshape(shape, order=order) Also, just for the email record, that should be return tmp[offset:offset+N*d.itemsize]\ .view(dtype=d)\ .reshape(shape, order=order) Cheers, Henry From sturla at molden.no Thu Dec 20 11:48:21 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 20 Dec 2012 17:48:21 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1355941534.10732.5.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> Message-ID: <50D34155.2040105@molden.no> On 19.12.2012 19:25, Henry Gomersall wrote: > That is not true at least under Windows 32-bit. I think also it's not > true for Linux 32-bit from my vague recollections of testing in a > virtual machine. (disclaimer: both those statements _may_ be out of > date). malloc is required to return memory on 16 byte boundary on Windows. http://msdn.microsoft.com/en-us/library/ycsb6wwf.aspx On Windows we can also use _aligned_malloc and _aligned_realloc to produce the alignment we want. 
_aligned_malloc(N, 4096); // page-aligned memory Sturla From sturla at molden.no Thu Dec 20 11:53:00 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 20 Dec 2012 17:53:00 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1356022070.10732.31.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <50D33C34.7050506@molden.no> <1356022070.10732.31.camel@farnsworth> Message-ID: <50D3426C.4010002@molden.no> On 20.12.2012 17:47, Henry Gomersall wrote: > On Thu, 2012-12-20 at 17:26 +0100, Sturla Molden wrote: >> return tmp[offset:offset+N]\ >> .view(dtype=d)\ >> .reshape(shape, order=order) > > Also, just for the email record, that should be > > return tmp[offset:offset+N*d.itemsize]\ > .view(dtype=d)\ > .reshape(shape, order=order) Oops, yes that's right :) Sturla From heng at cantab.net Thu Dec 20 12:38:58 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 17:38:58 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D34155.2040105@molden.no> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> Message-ID: <1356025138.12003.2.camel@farnsworth> On Thu, 2012-12-20 at 17:48 +0100, Sturla Molden wrote: > On 19.12.2012 19:25, Henry Gomersall wrote: > > > That is not true at least under Windows 32-bit. I think also it's > not > > true for Linux 32-bit from my vague recollections of testing in a > > virtual machine. (disclaimer: both those statements _may_ be out of > > date). > > malloc is required to return memory on 16 byte boundary on Windows. > > http://msdn.microsoft.com/en-us/library/ycsb6wwf.aspx > > On Windows we can also use _aligned_malloc and _aligned_realloc to > produce the alignment we want. > > _aligned_malloc(N, 4096); // page-aligned memory Except I build with MinGW. Please don't tell me I need to install Visual Studio... I have about 1GB free on my windows partition! hen From bahtiyor_zohidov at mail.ru Thu Dec 20 13:32:01 2012 From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=) Date: Thu, 20 Dec 2012 22:32:01 +0400 Subject: [Numpy-discussion] =?utf-8?q?=3D=3D=3D___I_REALLY_NEED_YOUR_HELP_?= =?utf-8?q?OUR_PYTHON_USERS__=3D=3D=3D=3D?= Message-ID: <1356028321.485046337@f352.mail.ru> ?Hi Python users,? First of all, Marry coming Cristmas!!! ALL THE BEST TO YOU AND YOUR FAMILY I need solution of integration under trapz() rule: There are following functions: ? def ?F1 (const1, x): ? ? ? """several calculations depending on bessel functions(mathematical functions) jn(), yv() exists in Python""" ? ? ? ?return a,b def ? F2(const1 ,const2, D) : ? ? ? ?"""Several calculation process""" ? ? ? ? x = D / const2 ? ? ? ? [a , b] = F1 (?const1, x) ? ? ?# Where x - the same as in F1() function ? ? ? ? S= a*b ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? # This is (a*b) just an example for simply explanation ? ? ? ?return S def F3(D, R): ? ? ? ? ? ? ? """Here I also calculated some process. So:""" ? ? ? ?return arg1**arg3 ? # Just for example def ?Integrate_all(const1, const2, min1, step1, max1): ? ? ? ?? ? ? ? ?R=arange(min1, max1,?step1) ? ? ?# This is for function "F3" ? ? ? ?D = arange ( 0.1, 7.0, 0.0001) ? ? ? ?M = zeros ( size(R) ) ? ? ? ?for i in range(0,size(R)): ? ? ? ? ? ? ? ? ?M [ i ] = integrate. trapz ( ( F2 ( const1, const2, D ) * F3 ( D ,R)) , x=D) ? ? ? return M ? ? ? const1=complex number, const2= float,? 
The aim of the calculation is to use Integrate_all function for integration function above!!!!!!! When I use those functions directly like one by one separately from python shell it works very accurately, BUT when I do it as shown above : ?ERROR OCCURS:? ??C:\calculation.py:194: RuntimeWarning: invalid value encountered in divide!!!!!! (I think this is occuring in F1()) --> bessel functions >>> jn(n,x) and yv(n,x) HERE IS THE PROBLEM!!!!! I HAVE BEEN TRYING FOR MORE THAN 1.5 MONTHS, UNFORTUNATELY I AM LOOSING MY INTEREST, and TIME. PLEEEEEAAAASEEEEEE HELP ME MY FRIENDS!!!!!!!!!? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? ? ? ? ?? ? ? ? ? ? ? ? ?? ? ? ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Thu Dec 20 13:35:20 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 18:35:20 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D31F44.5050306@continuum.io> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355993611.10732.18.camel@farnsworth> <50D31F44.5050306@continuum.io> Message-ID: <1356028520.12003.9.camel@farnsworth> On Thu, 2012-12-20 at 15:23 +0100, Francesc Alted wrote: > On 12/20/12 9:53 AM, Henry Gomersall wrote: > > On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote: > >> The only scenario that I see that this would create unaligned > arrays > >> is > >> for machines having AVX. But provided that the Intel architecture > is > >> making great strides in fetching unaligned data, I'd be surprised > >> that > >> the difference in performance would be even noticeable. > >> > >> Can you tell us which difference in performance are you seeing for > an > >> AVX-aligned array and other that is not AVX-aligned? Just curious. > > Further to this point, from an Intel article... > > > > > http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors > > > > "Aligning data to vector length is always recommended. When using > Intel > > SSE and Intel SSE2 instructions, loaded data should be aligned to 16 > > bytes. Similarly, to achieve best results use Intel AVX instructions > on > > 32-byte vectors that are 32-byte aligned. The use of Intel AVX > > instructions on unaligned 32-byte vectors means that every second > load > > will be across a cache-line split, since the cache line is 64 bytes. > > This doubles the cache line split rate compared to Intel SSE code > that > > uses 16-byte vectors. A high cache-line split rate in > memory-intensive > > code is extremely likely to cause performance degradation. For that > > reason, it is highly recommended to align the data to 32 bytes for > use > > with Intel AVX." > > > > Though it would be nice to put together a little example of this! > > Indeed, an example is what I was looking for. So provided that I > have > access to an AVX capable machine (having 6 physical cores), and that > MKL > 10.3 has support for AVX, I have made some comparisons using the > Anaconda Python distribution (it ships with most packages linked > against > MKL 10.3). > All in all, it is not clear that AVX alignment would have an > advantage, > even for memory-bounded problems. But of course, if Intel people are > saying that AVX alignment is important is because they have use cases > for asserting this. It is just that I'm having a difficult time to > find > these cases. Thanks for those examples, they were very interesting. 
I managed to temporarily get my hands on a machine with AVX and I have shown some speed-up with aligned arrays. FFT (using my wrappers) gives about a 15% speedup. Also this convolution code: https://github.com/hgomersall/SSE-convolution/blob/master/convolve.c Shows a small but repeatable speed-up (a few %) when using some aligned loads (as many as I can work out to use!). Cheers, Henry From d.s.seljebotn at astro.uio.no Thu Dec 20 14:33:21 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 20 Dec 2012 20:33:21 +0100 Subject: [Numpy-discussion] === I REALLY NEED YOUR HELP OUR PYTHON USERS ==== In-Reply-To: <1356028321.485046337@f352.mail.ru> References: <1356028321.485046337@f352.mail.ru> Message-ID: <50D36801.9090204@astro.uio.no> On 12/20/2012 07:32 PM, Happyman wrote: > Hi Python users, > > First of all, Marry coming Cristmas!!! ALL THE BEST TO YOU AND YOUR FAMILY > > I need solution of integration under trapz() rule: > There are following functions: > > def F1 (const1, x): > """several calculations depending on bessel > functions(mathematical functions) jn(), yv() exists in Python""" > return a,b > > def F2(const1 ,const2, D) : > > """Several calculation process""" > x = D / const2 > [a , b] = F1 ( const1, x) # Where x - the same as in F1() > function > S= a*b # This is (a*b) just an > example for simply explanation > return S > > def F3(D, R): > > """Here I also calculated some process. So:""" > return arg1**arg3 # Just for example > > def Integrate_all(const1, const2, min1, step1, max1): > > R=arange(min1, max1, step1) # This is for function "F3" > D = arange ( 0.1, 7.0, 0.0001) > > M = zeros ( size(R) ) > > for i in range(0,size(R)): > M [ i ] = integrate. trapz ( ( F2 ( const1, const2, D > ) * F3 ( D ,R)) , x=D) > return M > > const1=complex number, const2= float, > > The aim of the calculation is to use Integrate_all function for > integration function above!!!!!!! > > When I use those functions directly like one by one separately from > python shell it works very accurately, BUT when I do it as shown above : > ERROR OCCURS: C:\calculation.py:194: RuntimeWarning: invalid value > encountered in divide!!!!!! (I think this is occuring in F1()) --> > bessel functions >>> jn(n,x) and yv(n,x) Yet you didn't supply the source code for F1(), so nobody will be able to help you. (But what you should do is a) figure out which argument range F1 will be evaluated in ("print const1, x" should get you started if you don't know), b) write a seperate function that *only* evaluates F1 in various points in this range (perhaps plots it etc.). That should probably give you a clue about what you are doing wrong. The key is to isolate the problem. That will also help you produce a version of F1 that you feel confident about posting to the list. Also, please read http://www.catb.org/esr/faqs/smart-questions.html Dag Sverre From sturla at molden.no Thu Dec 20 14:50:41 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 20 Dec 2012 20:50:41 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1356025138.12003.2.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> Message-ID: <50D36C11.6030602@molden.no> On 20.12.2012 18:38, Henry Gomersall wrote: > Except I build with MinGW. Please don't tell me I need to install Visual > Studio... I have about 1GB free on my windows partition! 
The same DLL is used as CRT. Sturla From heng at cantab.net Thu Dec 20 14:52:52 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 19:52:52 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D36C11.6030602@molden.no> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> Message-ID: <1356033172.12003.12.camel@farnsworth> On Thu, 2012-12-20 at 20:50 +0100, Sturla Molden wrote: > On 20.12.2012 18:38, Henry Gomersall wrote: > > > Except I build with MinGW. Please don't tell me I need to install > Visual > > Studio... I have about 1GB free on my windows partition! > > The same DLL is used as CRT. Perhaps the DLL should go and read MS's edicts! hen From sturla at molden.no Thu Dec 20 14:57:31 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 20 Dec 2012 20:57:31 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1356033172.12003.12.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> <1356033172.12003.12.camel@farnsworth> Message-ID: <50D36DAB.8080203@molden.no> On 20.12.2012 20:52, Henry Gomersall wrote: > Perhaps the DLL should go and read MS's edicts! Do you link with same same CRT as Python? (msvcr90.dll) You should always use -lmsvcr90. If you don't, you will link with msvcrt.dll. Sturla From heng at cantab.net Thu Dec 20 15:03:44 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 20:03:44 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D36DAB.8080203@molden.no> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> <1356033172.12003.12.camel@farnsworth> <50D36DAB.8080203@molden.no> Message-ID: <1356033824.12003.14.camel@farnsworth> On Thu, 2012-12-20 at 20:57 +0100, Sturla Molden wrote: > On 20.12.2012 20:52, Henry Gomersall wrote: > > > Perhaps the DLL should go and read MS's edicts! > > Do you link with same same CRT as Python? (msvcr90.dll) > > You should always use -lmsvcr90. > > If you don't, you will link with msvcrt.dll. Hmmm, plausibly not. Why is it important? (for my own understanding) Cheers, Henry From sturla at molden.no Thu Dec 20 15:05:42 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 20 Dec 2012 21:05:42 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D36DAB.8080203@molden.no> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> <1356033172.12003.12.camel@farnsworth> <50D36DAB.8080203@molden.no> Message-ID: <50D36F96.1030908@molden.no> On 20.12.2012 20:57, Sturla Molden wrote: > On 20.12.2012 20:52, Henry Gomersall wrote: > >> Perhaps the DLL should go and read MS's edicts! > > Do you link with same same CRT as Python? (msvcr90.dll) > > You should always use -lmsvcr90. > > If you don't, you will link with msvcrt.dll. 
Here is VS2008, which uses malloc from msvcr90.dll: http://msdn.microsoft.com/en-us/library/ycsb6wwf(v=vs.90).aspx "malloc is required to return memory on a 16-byte boundary" If this does not happen, you are linking with the wrong CRT. When building C extensions for Python, we should always link with the same CRT as Python uses, unless you are 100% certain that CRT resources are never shared with Python. Sturla From heng at cantab.net Thu Dec 20 15:09:24 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 20:09:24 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D36F96.1030908@molden.no> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> <1356033172.12003.12.camel@farnsworth> <50D36DAB.8080203@molden.no> <50D36F96.1030908@molden.no> Message-ID: <1356034164.12003.16.camel@farnsworth> On Thu, 2012-12-20 at 21:05 +0100, Sturla Molden wrote: > On 20.12.2012 20:57, Sturla Molden wrote: > > On 20.12.2012 20:52, Henry Gomersall wrote: > > > >> Perhaps the DLL should go and read MS's edicts! > > > > Do you link with same same CRT as Python? (msvcr90.dll) > > > > You should always use -lmsvcr90. > > > > If you don't, you will link with msvcrt.dll. > > Here is VS2008, which uses malloc from msvcr90.dll: > > http://msdn.microsoft.com/en-us/library/ycsb6wwf(v=vs.90).aspx > > "malloc is required to return memory on a 16-byte boundary" > > If this does not happen, you are linking with the wrong CRT. When > building C extensions for Python, we should always link with the same > CRT as Python uses, unless you are 100% certain that CRT resources > are > never shared with Python. Is this something that can be established when using Cython? I haven't experienced any problems so far (other than a now solved problem trying to free memory allocated in FFTW which was down to a runtime mismatch). Cheers, Henry From sturla at molden.no Thu Dec 20 15:13:53 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 20 Dec 2012 21:13:53 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1356033824.12003.14.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> <1356033172.12003.12.camel@farnsworth> <50D36DAB.8080203@molden.no> <1356033824.12003.14.camel@farnsworth> Message-ID: <50D37181.5000301@molden.no> On 20.12.2012 21:03, Henry Gomersall wrote: > Why is it important? (for my own understanding) Because if CRT resources are shared between different CRT versions, bad things will happen (the ABIs are not equivalent, errno and other globals are at different addresses, etc.) Cython code tends to share CRT resources with Python. For example, your Cython code might (invisibly to us) invoke malloc to allocate space for an objent, and then Python will later call free. This should perferably go through the same DLL. Another thing is that msvcrt.dll is for Windows own system resources, not for user apps. 
Sturla From sturla at molden.no Thu Dec 20 15:22:46 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 20 Dec 2012 21:22:46 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D37181.5000301@molden.no> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> <1356033172.12003.12.camel@farnsworth> <50D36DAB.8080203@molden.no> <1356033824.12003.14.camel@farnsworth> <50D37181.5000301@molden.no> Message-ID: <50D37396.6010800@molden.no> On 20.12.2012 21:13, Sturla Molden wrote: > Because if CRT resources are shared between different CRT versions, bad > things will happen (the ABIs are not equivalent, errno and other globals > are at different addresses, etc.) For example, PyErr_SetFromErrno will return garbage if CRTs are shared. Sturla From heng at cantab.net Thu Dec 20 15:24:05 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 20:24:05 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D37181.5000301@molden.no> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> <1356033172.12003.12.camel@farnsworth> <50D36DAB.8080203@molden.no> <1356033824.12003.14.camel@farnsworth> <50D37181.5000301@molden.no> Message-ID: <1356035045.12003.20.camel@farnsworth> On Thu, 2012-12-20 at 21:13 +0100, Sturla Molden wrote: > On 20.12.2012 21:03, Henry Gomersall wrote: > > > Why is it important? (for my own understanding) > > Because if CRT resources are shared between different CRT versions, > bad > things will happen (the ABIs are not equivalent, errno and other > globals > are at different addresses, etc.) Cython code tends to share CRT > resources with Python. For example, your Cython code might (invisibly > to > us) invoke malloc to allocate space for an objent, and then Python > will > later call free. This should perferably go through the same DLL. > I understood the general point about it being bad, but it was more specific to Cython, which I think you answered. Presumably it's less of an issue with pure C libs where calls to libc are more obvious. > Another thing is that msvcrt.dll is for Windows own system resources, > not for user apps. I didn't know that. It's a real pain having so many libc libs knocking around. I have little experience of Windows, as you may have guessed! hen From sturla at molden.no Thu Dec 20 15:45:07 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 20 Dec 2012 21:45:07 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1356035045.12003.20.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> <1356033172.12003.12.camel@farnsworth> <50D36DAB.8080203@molden.no> <1356033824.12003.14.camel@farnsworth> <50D37181.5000301@molden.no> <1356035045.12003.20.camel@farnsworth> Message-ID: <50D378D3.6040407@molden.no> On 20.12.2012 21:24, Henry Gomersall wrote: > I didn't know that. It's a real pain having so many libc libs knocking > around. I have little experience of Windows, as you may have guessed! 
Originally there was only one system-wide CRT on Windows (msvcrt.dll), which is why MinGW linkes with that by default. But starting with the release of VS2003, Microsoft decided to reserve msvcrt.dll for system resources and create a libc "DLL Hell" for user apps. Visual Studio 2003 came with static and dynamic versions of the CRT library, as well as single- and multithreaded ones... Then everyone building apps that used DLLs or COM objects just had to make sure that nothing conflicted. And for every later version of Visual Studio they have released further more CRT versions, adding to the confusion. Currently: The official Python 2.7 binaries are built with Visual Studio 2008 and linked with msvcr90.dll. MinGW has import libraries for the other CRTs Microsoft has released, so just add -lmsvcr90 to your final linkage. Python's distutils will control the build process for extensions automatically. Adding -lmsvcr90 is one of the things that distutils will do. Sturla From heng at cantab.net Thu Dec 20 15:50:41 2012 From: heng at cantab.net (Henry Gomersall) Date: Thu, 20 Dec 2012 20:50:41 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D378D3.6040407@molden.no> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355941534.10732.5.camel@farnsworth> <50D34155.2040105@molden.no> <1356025138.12003.2.camel@farnsworth> <50D36C11.6030602@molden.no> <1356033172.12003.12.camel@farnsworth> <50D36DAB.8080203@molden.no> <1356033824.12003.14.camel@farnsworth> <50D37181.5000301@molden.no> <1356035045.12003.20.camel@farnsworth> <50D378D3.6040407@molden.no> Message-ID: <1356036641.12003.24.camel@farnsworth> On Thu, 2012-12-20 at 21:45 +0100, Sturla Molden wrote: > On 20.12.2012 21:24, Henry Gomersall wrote: > > > I didn't know that. It's a real pain having so many libc libs > knocking > > around. I have little experience of Windows, as you may have > guessed! > > Originally there was only one system-wide CRT on Windows > (msvcrt.dll), > which is why MinGW linkes with that by default. But starting with the > release of VS2003, Microsoft decided to reserve msvcrt.dll for system > resources and create a libc "DLL Hell" for user apps. Visual Studio > 2003 > came with static and dynamic versions of the CRT library, as well as > single- and multithreaded ones... Then everyone building apps that > used > DLLs or COM objects just had to make sure that nothing conflicted. > And > for every later version of Visual Studio they have released further > more > CRT versions, adding to the confusion. > > Currently: The official Python 2.7 binaries are built with Visual > Studio > 2008 and linked with msvcr90.dll. > > MinGW has import libraries for the other CRTs Microsoft has released, > so > just add -lmsvcr90 to your final linkage. > > Python's distutils will control the build process for extensions > automatically. Adding -lmsvcr90 is one of the things that distutils > will > do. Well, I _am_ using distutils, so I should expect it to happen then. Probably my alignment concerns are based on some previous stuff when I was building pure C libs under Windows. Anyway, thanks for all the assistance! 
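For reference, the distutils route described above amounts to something like the following minimal sketch (the module name "myext" and its source file are placeholders); when it is built with "python setup.py build_ext --compiler=mingw32" against the python.org 2.7 build, it is distutils that adds the -lmsvcr90 import library to the MinGW link step:

    # setup.py -- minimal sketch, placeholder names
    from distutils.core import setup, Extension

    setup(
        name="myext",
        ext_modules=[Extension("myext", sources=["myext.c"])],
    )

    # build on Windows with MinGW:
    #   python setup.py build_ext --compiler=mingw32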
hen From bahtiyor_zohidov at mail.ru Thu Dec 20 16:58:30 2012 From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=) Date: Fri, 21 Dec 2012 01:58:30 +0400 Subject: [Numpy-discussion] =?utf-8?q?=3D=3D=3D_RuntimeWarning=3A_INVALID_?= =?utf-8?q?VALUE_=3E_encountered_in_divide_=3D=3D=3D=3D?= In-Reply-To: <50D36801.9090204@astro.uio.no> References: <1356028321.485046337@f352.mail.ru> <50D36801.9090204@astro.uio.no> Message-ID: <1356040710.964528114@f220.mail.ru> Well, If F1 is run in Python shell, everything is properly working, BUT if call through the functions it is wrongly answering!!! ? def F1 (const1, x): # const1 should be complex number T1=round(2+x+4*x**(1.0/3.0)) T2=const1*x T3=const1**2 x1,x2,x3,x4 = sph_jnyn(T1, x) --> 1-standard function in Python x1=x1[1:] x2=x2[1:] x3=x3[1:] x4=x4[1:] a1=x1+1.0j*x3 a2=x2+1.0j*x4 y1,y2= sph_jn(T1,T2) --> 2- standard function in Python y1=y1[1:] y2=y2[1:] b1=x1+x*x2 b2=y1+T2*y2 b3=a1+x*a2 an1= T3*y1*b1-x1*b2 an2= T3*y1*b3-a1*b2 a=an1/an2 bn1= y1*b1-x1*b2 bn2= y1*b3-a1*b2 b=bn1/bn2 return a,b ? ? ???????, 20 ??????? 2012, 20:33 ?? Dag Sverre Seljebotn : >On 12/20/2012 07:32 PM, Happyman wrote: >> Hi Python users, >> >> First of all, Marry coming Cristmas!!! ALL THE BEST TO YOU AND YOUR FAMILY >> >> I need solution of integration under trapz() rule: >> There are following functions: >> >> def F1 (const1, x): >> """several calculations depending on bessel >> functions(mathematical functions) jn(), yv() exists in Python""" >> return a,b >> >> def F2(const1 ,const2, D) : >> >> """Several calculation process""" >> x = D / const2 >> [a , b] = F1 ( const1, x) # Where x - the same as in F1() >> function >> S= a*b # This is (a*b) just an >> example for simply explanation >> return S >> >> def F3(D, R): >> >> """Here I also calculated some process. So:""" >> return arg1**arg3 # Just for example >> >> def Integrate_all(const1, const2, min1, step1, max1): >> >> R=arange(min1, max1, step1) # This is for function "F3" >> D = arange ( 0.1, 7.0, 0.0001) >> >> M = zeros ( size(R) ) >> >> for i in range(0,size(R)): >> M [ i ] = integrate. trapz ( ( F2 ( const1, const2, D >> ) * F3 ( D ,R)) , x=D) >> return M >> >> const1=complex number, const2= float, >> >> The aim of the calculation is to use Integrate_all function for >> integration function above!!!!!!! >> >> When I use those functions directly like one by one separately from >> python shell it works very accurately, BUT when I do it as shown above : >> ERROR OCCURS: C:\calculation.py:194: RuntimeWarning: invalid value >> encountered in divide!!!!!! (I think this is occuring in F1()) --> >> bessel functions >>> jn(n,x) and yv(n,x) > >Yet you didn't supply the source code for F1(), so nobody will be able >to help you. > >(But what you should do is a) figure out which argument range F1 will be >evaluated in ("print const1, x" should get you started if you don't >know), b) write a seperate function that *only* evaluates F1 in various >points in this range (perhaps plots it etc.). That should probably give >you a clue about what you are doing wrong. > >The key is to isolate the problem. That will also help you produce a >version of F1 that you feel confident about posting to the list. > >Also, please read > >http://www.catb.org/esr/faqs/smart-questions.html > >Dag Sverre >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
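Following up on the advice above to isolate the problem: an easy way to find the exact expression that triggers the "invalid value encountered in divide" warning is to tell numpy to raise an exception instead of warning, so the traceback points at the offending line. A small self-contained sketch of the idea (the toy 0/0 division only stands in for whatever expression inside F1/F2 misbehaves; wrap the call to Integrate_all() the same way):

    import numpy as np

    with np.errstate(invalid='raise', divide='raise'):
        try:
            x = np.array([0.0, 1.0])
            y = x / x               # 0/0 -> raises FloatingPointError here
        except FloatingPointError as err:
            print("caught:", err)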
URL: From ralf.gommers at gmail.com Thu Dec 20 17:02:42 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 20 Dec 2012 23:02:42 +0100 Subject: [Numpy-discussion] MKL licenses for core scientific Python projects In-Reply-To: References: <4473522818305701545@unknownmsgid> Message-ID: On Sat, Dec 15, 2012 at 2:13 PM, Ralf Gommers wrote: > > > > On Sat, Dec 15, 2012 at 5:06 AM, Chris Barker - NOAA Federal < > chris.barker at noaa.gov> wrote: > >> Ralf, >> >> Do these licenses allow fully free distribution of binaries? And are >> those binaries themselves redistributive? I.e. with py2exe and friends? >> >> If so, that could be nice. >> > > Good point. It's not entirely clear from the emails I received. I've asked > for clarification. > Yes, redistribution is fine. Intel would even be quite pleased if we would offer official binaries built against MKL next to or instead of ATLAS. They do ask us to include attribution on the website, which I don't see a problem with. Actually I'd like to see that better organized on the website anyway - we should mention companies like Enthought, Continuum, Github and others who are contributing resources now or have done so in the past. Ralf > Ralf > > >> >> On Dec 14, 2012, at 1:01 PM, Ralf Gommers wrote: >> >> Hi all, >> >> Intel has offered to provide free MKL licenses for main contributors to >> scientific Python projects - at least those listed at >> numfocus.org/projects/. Licenses for all OSes that are required can be >> provided, the condition is that they're used for building/testing our >> projects and not for broader purposes. >> >> If you're interested, please let me know your full name and what OS you >> need a license for. >> >> Cheers, >> Ralf >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Dec 20 18:46:48 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 20 Dec 2012 15:46:48 -0800 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) Message-ID: Hi, [Travis wrote...] > My strong suggestion is that development discussions of the project continue on > this list with consensus among the active participants being the goal for > development. I don't think 100% consensus is a rigid requirement --- but > certainly a super-majority should be the goal, and serious changes should not be > made with out a clear consensus. I would pay special attention to > under-represented people (users with intense usage of NumPy but small voices on > this list). There are many of them. If you push me for specifics then at > this point in NumPy's history, I would say that if Chuck, Nathaniel, and Ralf > agree on a course of action, it will likely be a good thing for the project. I > suspect that even if only 2 of the 3 agree at one time it might still be a good > thing (but I would expect more detail and discussion). There are others whose > opinion should be sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, > David Cournapeau, Francesc Alted, and Mark Wiebe to name a few. 
For some > questions, I might even seek input from people like Konrad Hinsen and Paul > Dubois --- if they have time to give it. I will still be willing to offer my > view from time to time and if I am asked. Thank you for starting this discussion. I am more or less offline at the moment in Cuba and flying, but I hope very much this will be an opportunity for a good discussion on the best way forward for numpy. Travis - I think you are suggesting that there should be no one person in charge of numpy, and I think this is very unlikely to work well. Perhaps there are good examples of well-led projects where there is not a clear leader, but I can't think of any myself at the moment. My worry would be that, without a clear leader, it will be unclear how decisions are made, and that will make it very hard to take strategic decisions. I would like to humbly suggest the following in the hope that it spurs discussion. As first pass, Ralf, Chuck and Nathaniel decide on a core group of people that will form the succession committee. Maybe this could be the list of people you listed above. Ralf, Chuck and Nathaniel then ask for people to wish to propose themselves as the leader of numpy. Anyone proposing themselves to lead numpy would remove themselves from the succession committee. The proposed leaders of numpy write a short manifesto saying why they are the right choice for the job, and what they intend to do if elected. The manifestos and issues arising are discussed in public on the mailing list - the equivalent of an online presidential debate. In due course - say after 2 weeks or after the discussion seems to be dying out - the succession committee votes on the leader. I propose that these votes should be public, but I can see the opposite argument. How does that sound? Best, Matthew From ondrej.certik at gmail.com Thu Dec 20 20:23:40 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 20 Dec 2012 17:23:40 -0800 Subject: [Numpy-discussion] Travis-CI stopped supporting Python 3.1, but added 3.3 Message-ID: Hi, I noticed that the 3.1 tests are now failing. After clarification with the Travis guys: https://groups.google.com/d/topic/travis-ci/02iRu6kmwY8/discussion I've submitted a fix to our .travis.yml (and backported to 1.7): https://github.com/numpy/numpy/pull/2850 https://github.com/numpy/numpy/pull/2851 In case you were wondering. Do we need to support Python 3.1? We could in principle test 3.1 just like we test 2.4. I don't know if it is worth the pain. Ondrej From ondrej.certik at gmail.com Thu Dec 20 20:25:15 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 20 Dec 2012 17:25:15 -0800 Subject: [Numpy-discussion] Travis failures with no errors In-Reply-To: References: Message-ID: On Thu, Dec 13, 2012 at 4:39 PM, Ond?ej ?ert?k wrote: > Hi, > > I found these recent weird "failures" in Travis, but I can't find any > problem with the log and all tests pass. Any ideas what is going on? 
> > https://travis-ci.org/numpy/numpy/jobs/3570123 > https://travis-ci.org/numpy/numpy/jobs/3539549 > https://travis-ci.org/numpy/numpy/jobs/3369629 And here is another one: https://travis-ci.org/numpy/numpy/jobs/3768782 Ondrej From njs at pobox.com Thu Dec 20 20:39:19 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 21 Dec 2012 01:39:19 +0000 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: References: Message-ID: On Thu, Dec 20, 2012 at 11:46 PM, Matthew Brett wrote: > Travis - I think you are suggesting that there should be no one > person in charge of numpy, and I think this is very unlikely to work > well. Perhaps there are good examples of well-led projects where > there is not a clear leader, but I can't think of any myself at the > moment. My worry would be that, without a clear leader, it will be > unclear how decisions are made, and that will make it very hard to > take strategic decisions. Curious; my feeling is the opposite, that among mature and successful FOSS projects, having a clear leader is the uncommon case. GCC doesn't, Glibc not only has no leader but they recently decided to get rid of their formal steering committee, I'm pretty sure git doesn't, Apache certainly doesn't, Samba doesn't really, etc. As usual Karl Fogel has sensible comments on this: http://producingoss.com/en/consensus-democracy.html In practice the main job of a successful FOSS leader is to refuse to make decisions, nudge people to work things out, and then if they refuse to work things out tell them to go away until they do: https://lwn.net/Articles/105375/ and what actually gives people influence in a project is the respect of the other members. The former stuff is stuff anyone can do, and the latter isn't something you can confer or take away with a vote. Nor do we necessarily have a great track record for executive decisions actually working things out. -n From ondrej.certik at gmail.com Thu Dec 20 20:39:56 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 20 Dec 2012 17:39:56 -0800 Subject: [Numpy-discussion] DARPA funding for Blaze and passing the NumPy torch In-Reply-To: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> References: <5EA78D17-1494-4057-A2C6-89BFC116C69F@continuum.io> Message-ID: Hi Travis, On Sun, Dec 16, 2012 at 10:07 PM, Travis Oliphant wrote: > Hello all, > > There is a lot happening in my life right now and I am spread quite thin among the various projects that I take an interest in. In particular, I am thrilled to publicly announce on this list that Continuum Analytics has received DARPA funding (to the tune of at least $3 million) for Blaze, Numba, and Bokeh which we are writing to take NumPy, SciPy, and visualization into the domain of very large data sets. This is part of the XDATA program, and I will be taking an active role in it. You can read more about Blaze here: http://blaze.pydata.org. You can read more about XDATA here: http://www.darpa.mil/Our_Work/I2O/Programs/XDATA.aspx First of all, congratulations! > > I personally think Blaze is the future of array-oriented computing in Python. I will be putting efforts and resources next year behind making that case. How it interacts with future incarnations of NumPy, Pandas, or other projects is an interesting and open question. I have no doubt the future will be a rich ecosystem of interoperating array-oriented data-structures. 
I invite anyone interested in Blaze to participate in the discussions and development at https://groups.google.com/a/continuum.io/forum/#!forum/blaze-dev or watch the project on our public GitHub repo: https://github.com/ContinuumIO/blaze. Blaze is being incubated under the ContinuumIO GitHub project for now, but eventually I hope it will receive its own GitHub project page later next year. Development of Blaze is early but we are moving rapidly with it (and have deliverable deadlines --- thus while we will welcome input and pull requests we won't have a ton of time to respond to simple queries until > at least May or June). There is more that we are working on behind the scenes with respect to Blaze that will be coming out next year as well but isn't quite ready to show yet. > > As I look at the coming months and years, my time for direct involvement in NumPy development is therefore only going to get smaller. As a result it is not appropriate that I remain as "head steward" of the NumPy project (a term I prefer to BFD12 or anything else). I'm sure that it is apparent that while I've tried to help personally where I can this year on the NumPy project, my role has been more one of coordination, seeking funding, and providing expert advice on certain sections of code. I fundamentally agree with Fernando Perez that the responsibility of care-taking open source projects is one of stewardship --- something akin to public service. I have tried to emulate that belief this year --- even while not always succeeding. > > It is time for me to make official what is already becoming apparent to observers of this community, namely, that I am stepping down as someone who might be considered "head steward" for the NumPy project and officially leaving the development of the project in the hands of others in the community. I don't think the project actually needs a new "head steward" --- especially from a development perspective. Instead I see a lot of strong developers offering key opinions for the project as well as a great set of new developers offering pull requests. > > My strong suggestion is that development discussions of the project continue on this list with consensus among the active participants being the goal for development. I don't think 100% consensus is a rigid requirement --- but certainly a super-majority should be the goal, and serious changes should not be made with out a clear consensus. I would pay special attention to under-represented people (users with intense usage of NumPy but small voices on this list). There are many of them. If you push me for specifics then at this point in NumPy's history, I would say that if Chuck, Nathaniel, and Ralf agree on a course of action, it will likely be a good thing for the project. I suspect that even if only 2 of the 3 agree at one time it might still be a good thing (but I would expect more detail and discussion). There are others whose opinion should be sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, David Cournapeau, Francesc Alted, and Mark Wiebe to > name a few. For some questions, I might even seek input from people like Konrad Hinsen and Paul Dubois --- if they have time to give it. I will still be willing to offer my view from time to time and if I am asked. 
After being involved with the numpy development/release lately, Chuck, Nathaniel, and Ralf would be my (independent) choice as well, in fact I have always sought their (and yours) approval to any PR that I had doubts merging, and I have been treating them as the de-facto leaders of the project... So it's good that you made it official. > > Greg Wilson (of Software Carpentry fame) asked me recently what letter I would have written to myself 5 years ago. What would I tell myself to do given the knowledge I have now? I've thought about that for a bit, and I have some answers. I don't know if these will help anyone, but I offer them as hopefully instructive: > > 1) Do not promise to not break the ABI of NumPy --- and in fact emphasize that it will be broken at least once in the 1.X series. NumPy was designed to add new data-types --- but not without breaking the ABI. NumPy has needed more data-types and still needs even more. While it's not beautifully simple to add new data-types, it can be done. But, it is impossible to add them without breaking the ABI in some fashion. The desire to add new data-types *and* keep ABI compatibility has led to significant pain. I think the ABI non-breakage goal has been amplified by the poor state of package management in Python. The fact that it's painful for someone to update their downstream packages when an upstream ABI breaks (on Windows and Mac in particular) has put a lot of unfortunate pressure on this community. Pressure that was not envisioned or understood when I was writing NumPy. > > (As an aside: This is one reason Continuum has invested resources in building the conda tool and a completely free set of binary packages called Anaconda CE which is becoming more and more usable thanks to the efforts of Bryan Van de Ven and Ilan Schnell and our testing team at Continuum. The conda tool: http://docs.continuum.io/conda/index.html is open source and BSD licensed and the next release will provide the ability to build packages, build indexes on package repositories and interface with pip. Expect a blog-post in the near future about how cool conda is!). > > 2) Don't create array-scalars. Instead, make the data-type object a meta-type object whose instances are the items returned from NumPy arrays. There is no need for a separate array-scalar object and in fact it's confusing to the type-system. I understand that now. I did not understand that 5 years ago. > > 3) Special-case small arrays to avoid the memory indirection and look at PDL so that generalized ufuncs are supported from the beginning. > > 4) Define missing-value data-types and labels on the dimensions and arrays > > 5) Define a standard "dictionary of NumPy arrays" interface as the basic "structure of arrays" concept to go with the "array of structures" that structured arrays provide. > > 6) Start work on SQL interface to NumPy arrays *now* > > Additional comments I would make to someone today: > > 1) Most of NumPy should be written in Python with Numba used as the compiler (particularly as soon as Numba gets the ability to create Python extension modules which is in the next release). > 2) There are still many, many optimizations that can be made in NumPy run-time (especially in the face of modern hardware). > > I will continue to be available to answer questions and I may chime in here and there on pull requests. However, most of my time for NumPy will be on administrative aspects of the project where I will continue to take an active interest. 
To help make sure that this happens in a transparent way, I would like to propose that "administrative" support of the project be left to the NumFOCUS board of which I am currently 1 of 9 members. The other board members are currently: Ralf Gommers, Anthony Scopatz, Andy Terrel, Prabhu Ramachandran, Fernando Perez, Emmanuelle Gouillart, Jarrod Millman, and Perry Greenfield. While NumFOCUS basically seeks to promote and fund the entire scientific Python stack, I think it can also play a role in helping to administer some of the core projects which the board members themselves have a personal interest in. > > By administrative support, I mean decisions like "what should be done with any NumPy IP or web-domains" or "what kind of commercially-related ads or otherwise should go on the NumPy home page", or "what should be done with the NumPy github account", etc. --- basically anything that requires an executive decision that is not directly development related. I don't expect there to be many of these decisions. But, when they show up, I would like them to be made in as transparent and public of a way as possible. In practice, the way I see this working is that there are members of the NumPy community who are (like me) particularly interested in admin-related questions and serve on a NumPy team in the NumFOCUS organization. I just know I'll be attending NumFOCUS board meetings, and I would like to help move administrative decisions forward with NumPy as part of the time I spend thinking about NumFOCUS. > > If people on this list would like to play an active role in those admin discussions, then I would heartily welcome them into NumFOCUS membership where they would work with interested members of the NumFOCUS board (like me and Ralf) to help direct that organization. I would really love to have someone from this list volunteer to serve on the NumPy team as part of the NumFOCUS project. I am certainly going to be interested in the opinions of people who are active participants on this list and on GitHub pages for NumPy on anything admin related to NumPy, and I expect Ralf would also be very interested in those views. > > One admin discussion that I will bring up in another email (as this one is already too long) is about making 2 or 3 lists for NumPy such as numpy-admin at numpy.org, numpy-dev at numpy.org, and numpy-users at numpy-org. > > Just because I'll be spending more time on Blaze, Numba, Bokeh, and the PyData ecosystem does not mean that I won't be around for NumPy. I will continue to promote NumPy. My involvement with Continuum connects me to NumPy as Continuum continues to offer commercial support contracts for NumPy (and SciPy and other open source projects). Continuum will also continue to maintain its Github NumPy project which will contain pull requests from our company that we are working to get into the mainline branch. Continuum will also continue to provide resources for release-management of NumPy (we have been funding Ondrej in this role for the past 6 months --- though I would like to see this happen through NumFOCUS in the future even if Continuum provides much of the money). We also offer optimized versions of NumPy in our commercial Anaconda distribution (Anaconda CE is free and open source). > > Also, I will still be available for questions and help (I'm not disappearing --- just making it clear that I'm stepping back into an occasional NumPy developer role). 
It has been extremely gratifying to see the number of pull-requests, GitHub-conversations, and code contributions increase this year. Even though the 1.7 release has taken a long time to stabilize, there have been a lot of people participating in the discussion and in helping to track down the problems, figure out what to do, and fix them. It even makes it possible for people to think about 1.7 as a long-term release. > > I will continue to hope that the spirit of openness, tolerance, respect, and gratitude continue to permeate this mailing list, and that we continue to seek to resolve any differences with trust and mutual respect. I know I have offended people in the past with quick remarks and actions made sometimes in haste without fully realizing how they might be taken. But, I also know that like many of you I have always done the very best I could for moving Python for scientific computing forward in the best way I know how. > > Thank you for the great memories. If you will forgive a little sentiment: My daughter who is in college now was 3 years old when I began working with this community and went down a road that would lead to my involvement with SciPy and NumPy. I have marked the building of my family and the passage of time with where the Python for Scientific Computing Community was at. Like many of you, I have given a great deal of attention and time to building this community. That sacrifice and time has led me to love what we have created. I know that I leave this segment of the community with the tools in better hands than mine. I am hopeful that NumPy will continue to be a useful array library for the Python community for many years to come even as we all continue to build new tools for the future. Thank you for all the work you are doing, it gives a huge boost to the whole Python scientific/numeric/data community. I discovered it when I first came to the scipy conference in 2007. I was 23. All I could think of was wow, this is pretty awesome. I remember your energy and enthusiasm. Now I am 29. Since then I tried to contribute in my own little ways as well. Ondrej From ondrej.certik at gmail.com Thu Dec 20 20:48:18 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 20 Dec 2012 17:48:18 -0800 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: References: Message-ID: Hi Matthew, On Thu, Dec 20, 2012 at 3:46 PM, Matthew Brett wrote: > Hi, > > [Travis wrote...] >> My strong suggestion is that development discussions of the project continue on >> this list with consensus among the active participants being the goal for >> development. I don't think 100% consensus is a rigid requirement --- but >> certainly a super-majority should be the goal, and serious changes should not be >> made with out a clear consensus. I would pay special attention to >> under-represented people (users with intense usage of NumPy but small voices on >> this list). There are many of them. If you push me for specifics then at >> this point in NumPy's history, I would say that if Chuck, Nathaniel, and Ralf >> agree on a course of action, it will likely be a good thing for the project. I >> suspect that even if only 2 of the 3 agree at one time it might still be a good >> thing (but I would expect more detail and discussion). There are others whose >> opinion should be sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, >> David Cournapeau, Francesc Alted, and Mark Wiebe to name a few. 
For some >> questions, I might even seek input from people like Konrad Hinsen and Paul >> Dubois --- if they have time to give it. I will still be willing to offer my >> view from time to time and if I am asked. > > Thank you for starting this discussion. > > I am more or less offline at the moment in Cuba and flying, but I hope > very much this will be an opportunity for a good discussion on the > best way forward for numpy. > > Travis - I think you are suggesting that there should be no one > person in charge of numpy, and I think this is very unlikely to work > well. Perhaps there are good examples of well-led projects where > there is not a clear leader, but I can't think of any myself at the > moment. My worry would be that, without a clear leader, it will be > unclear how decisions are made, and that will make it very hard to > take strategic decisions. > > I would like to humbly suggest the following in the hope that it spurs > discussion. > > As first pass, Ralf, Chuck and Nathaniel decide on a core group of > people that will form the succession committee. Maybe this could be > the list of people you listed above. > > Ralf, Chuck and Nathaniel then ask for people to wish to propose > themselves as the leader of numpy. Anyone proposing themselves to > lead numpy would remove themselves from the succession committee. > > The proposed leaders of numpy write a short manifesto saying why they > are the right choice for the job, and what they intend to do if > elected. The manifestos and issues arising are discussed in public on > the mailing list - the equivalent of an online presidential debate. > > In due course - say after 2 weeks or after the discussion seems to be > dying out - the succession committee votes on the leader. I propose > that these votes should be public, but I can see the opposite > argument. Travis has very clearly made "Chuck, Nathaniel, and Ralf" as the leaders of the project. But as always ---- he didn't pick them because he needed to create leaders. They were already the de-facto leaders of the project due to their actions, involvements and respect in the community, so he just made it official. > How does that sound? To me that sounds like a bad idea. That being said, if Chuck, Nathaniel, and Ralf agree that it would be a good idea, that's fine with me too. Ondrej From ondrej.certik at gmail.com Thu Dec 20 20:51:49 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 20 Dec 2012 17:51:49 -0800 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: References: Message-ID: On Thu, Dec 20, 2012 at 5:48 PM, Ond?ej ?ert?k wrote: > Hi Matthew, > > On Thu, Dec 20, 2012 at 3:46 PM, Matthew Brett wrote: >> Hi, >> >> [Travis wrote...] >>> My strong suggestion is that development discussions of the project continue on >>> this list with consensus among the active participants being the goal for >>> development. I don't think 100% consensus is a rigid requirement --- but >>> certainly a super-majority should be the goal, and serious changes should not be >>> made with out a clear consensus. I would pay special attention to >>> under-represented people (users with intense usage of NumPy but small voices on >>> this list). There are many of them. If you push me for specifics then at >>> this point in NumPy's history, I would say that if Chuck, Nathaniel, and Ralf >>> agree on a course of action, it will likely be a good thing for the project. 
I >>> suspect that even if only 2 of the 3 agree at one time it might still be a good >>> thing (but I would expect more detail and discussion). There are others whose >>> opinion should be sought as well: Ondrej Certik, Perry Greenfield, Robert Kern, >>> David Cournapeau, Francesc Alted, and Mark Wiebe to name a few. For some >>> questions, I might even seek input from people like Konrad Hinsen and Paul >>> Dubois --- if they have time to give it. I will still be willing to offer my >>> view from time to time and if I am asked. >> >> Thank you for starting this discussion. >> >> I am more or less offline at the moment in Cuba and flying, but I hope >> very much this will be an opportunity for a good discussion on the >> best way forward for numpy. >> >> Travis - I think you are suggesting that there should be no one >> person in charge of numpy, and I think this is very unlikely to work >> well. Perhaps there are good examples of well-led projects where >> there is not a clear leader, but I can't think of any myself at the >> moment. My worry would be that, without a clear leader, it will be >> unclear how decisions are made, and that will make it very hard to >> take strategic decisions. >> >> I would like to humbly suggest the following in the hope that it spurs >> discussion. >> >> As first pass, Ralf, Chuck and Nathaniel decide on a core group of >> people that will form the succession committee. Maybe this could be >> the list of people you listed above. >> >> Ralf, Chuck and Nathaniel then ask for people to wish to propose >> themselves as the leader of numpy. Anyone proposing themselves to >> lead numpy would remove themselves from the succession committee. >> >> The proposed leaders of numpy write a short manifesto saying why they >> are the right choice for the job, and what they intend to do if >> elected. The manifestos and issues arising are discussed in public on >> the mailing list - the equivalent of an online presidential debate. >> >> In due course - say after 2 weeks or after the discussion seems to be >> dying out - the succession committee votes on the leader. I propose >> that these votes should be public, but I can see the opposite >> argument. > > Travis has very clearly made "Chuck, Nathaniel, and Ralf" as the leaders > of the project. But as always ---- he didn't pick them because he > needed to create leaders. They were already the de-facto leaders > of the project due to their actions, involvements and respect in the community, > so he just made it official. > > >> How does that sound? > > To me that sounds like a bad idea. That being said, if Chuck, > Nathaniel, and Ralf > agree that it would be a good idea, that's fine with me too. I forgot to add --- Matthew, I'll be happy to discuss this over phone once you get back to the US. I know we had discussions over this in the past, but I couldn't offer much advice, as I didn't understand the inner working of the numpy development. I now understand it much better. Ondrej From charlesr.harris at gmail.com Thu Dec 20 21:32:58 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 20 Dec 2012 19:32:58 -0700 Subject: [Numpy-discussion] Travis failures with no errors In-Reply-To: References: Message-ID: On Thu, Dec 20, 2012 at 6:25 PM, Ond?ej ?ert?k wrote: > On Thu, Dec 13, 2012 at 4:39 PM, Ond?ej ?ert?k > wrote: > > Hi, > > > > I found these recent weird "failures" in Travis, but I can't find any > > problem with the log and all tests pass. Any ideas what is going on? 
> > > > https://travis-ci.org/numpy/numpy/jobs/3570123 > > https://travis-ci.org/numpy/numpy/jobs/3539549 > > https://travis-ci.org/numpy/numpy/jobs/3369629 > > And here is another one: > > https://travis-ci.org/numpy/numpy/jobs/3768782 > Hmm, that is strange indeed. The first three are old, >= 12 days, but the last is new, although the run time was getting up there. Might try running the last one again. I don't know if the is an easy way to do that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri Dec 21 03:14:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 21 Dec 2012 02:14:53 -0600 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: References: Message-ID: <766F069B-9C36-4BC1-8A51-BB91155104C1@continuum.io> On Dec 20, 2012, at 7:39 PM, Nathaniel Smith wrote: > On Thu, Dec 20, 2012 at 11:46 PM, Matthew Brett wrote: >> Travis - I think you are suggesting that there should be no one >> person in charge of numpy, and I think this is very unlikely to work >> well. Perhaps there are good examples of well-led projects where >> there is not a clear leader, but I can't think of any myself at the >> moment. My worry would be that, without a clear leader, it will be >> unclear how decisions are made, and that will make it very hard to >> take strategic decisions. > > Curious; my feeling is the opposite, that among mature and successful > FOSS projects, having a clear leader is the uncommon case. GCC > doesn't, Glibc not only has no leader but they recently decided to get > rid of their formal steering committee, I'm pretty sure git doesn't, > Apache certainly doesn't, Samba doesn't really, etc. As usual Karl > Fogel has sensible comments on this: > http://producingoss.com/en/consensus-democracy.html > > In practice the main job of a successful FOSS leader is to refuse to > make decisions, nudge people to work things out, and then if they > refuse to work things out tell them to go away until they do: > https://lwn.net/Articles/105375/ > and what actually gives people influence in a project is the respect > of the other members. The former stuff is stuff anyone can do, and the > latter isn't something you can confer or take away with a vote. > I will strongly voice my opinion that NumPy does not need an official single "leader". What it needs are committed, experienced, service-oriented developers and users who are willing to express their concerns and requests because they are used to being treated well. It also needs new developers who are willing to dive into code, contribute to discussions, tackle issues, make pull requests, and review pull requests. As people do this regularly, the leaders of the project will emerge as they have done in the past. Even though I called out three people explicitly --- there are many more contributors to NumPy whose voices deserve attention. But, you don't need me to point out the obvious to what the Github record shows about who is shepherding NumPy these days. But, the Github record is not the only one that matters. I would love to see NumPy developers continue to pay attention to and deeply respect the users (especially of downstream projects that depend on NumPy). I plan to continue using NumPy myself and plan to continue to encourage others around me to contribute patches, fixes and features. Obviously, there are people who have rights to merge pull-requests to the repository. 
But, this group seems always open to new, willing help. From a practical matter, this group is the head development group of the official NumPy fork. I believe this group will continue to be open enough to new, motivated contributors which will allow it to grow to the degree that such developers are available. > Nor do we necessarily have a great track record for executive > decisions actually working things out. > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From francesc at continuum.io Fri Dec 21 05:34:39 2012 From: francesc at continuum.io (Francesc Alted) Date: Fri, 21 Dec 2012 11:34:39 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1356028520.12003.9.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355993611.10732.18.camel@farnsworth> <50D31F44.5050306@continuum.io> <1356028520.12003.9.camel@farnsworth> Message-ID: <50D43B3F.4030208@continuum.io> On 12/20/12 7:35 PM, Henry Gomersall wrote: > On Thu, 2012-12-20 at 15:23 +0100, Francesc Alted wrote: >> On 12/20/12 9:53 AM, Henry Gomersall wrote: >>> On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote: >>>> The only scenario that I see that this would create unaligned >> arrays >>>> is >>>> for machines having AVX. But provided that the Intel architecture >> is >>>> making great strides in fetching unaligned data, I'd be surprised >>>> that >>>> the difference in performance would be even noticeable. >>>> >>>> Can you tell us which difference in performance are you seeing for >> an >>>> AVX-aligned array and other that is not AVX-aligned? Just curious. >>> Further to this point, from an Intel article... >>> >>> >> http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors >>> "Aligning data to vector length is always recommended. When using >> Intel >>> SSE and Intel SSE2 instructions, loaded data should be aligned to 16 >>> bytes. Similarly, to achieve best results use Intel AVX instructions >> on >>> 32-byte vectors that are 32-byte aligned. The use of Intel AVX >>> instructions on unaligned 32-byte vectors means that every second >> load >>> will be across a cache-line split, since the cache line is 64 bytes. >>> This doubles the cache line split rate compared to Intel SSE code >> that >>> uses 16-byte vectors. A high cache-line split rate in >> memory-intensive >>> code is extremely likely to cause performance degradation. For that >>> reason, it is highly recommended to align the data to 32 bytes for >> use >>> with Intel AVX." >>> >>> Though it would be nice to put together a little example of this! >> Indeed, an example is what I was looking for. So provided that I >> have >> access to an AVX capable machine (having 6 physical cores), and that >> MKL >> 10.3 has support for AVX, I have made some comparisons using the >> Anaconda Python distribution (it ships with most packages linked >> against >> MKL 10.3). > > >> All in all, it is not clear that AVX alignment would have an >> advantage, >> even for memory-bounded problems. But of course, if Intel people are >> saying that AVX alignment is important is because they have use cases >> for asserting this. It is just that I'm having a difficult time to >> find >> these cases. > Thanks for those examples, they were very interesting. 
I managed to > temporarily get my hands on a machine with AVX and I have shown some > speed-up with aligned arrays. > > FFT (using my wrappers) gives about a 15% speedup. > > Also this convolution code: > https://github.com/hgomersall/SSE-convolution/blob/master/convolve.c > > Shows a small but repeatable speed-up (a few %) when using some aligned > loads (as many as I can work out to use!). Okay, so a 15% is significant, yes. I'm still wondering why I did not get any speedup at all using MKL, but probably the reason is that it manages the unaligned corners of the datasets first, and then uses an aligned access for the rest of the data (but just guessing here). -- Francesc Alted From heng at cantab.net Fri Dec 21 05:58:45 2012 From: heng at cantab.net (Henry Gomersall) Date: Fri, 21 Dec 2012 10:58:45 +0000 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D43B3F.4030208@continuum.io> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355993611.10732.18.camel@farnsworth> <50D31F44.5050306@continuum.io> <1356028520.12003.9.camel@farnsworth> <50D43B3F.4030208@continuum.io> Message-ID: <1356087525.3473.13.camel@farnsworth> On Fri, 2012-12-21 at 11:34 +0100, Francesc Alted wrote: > > Also this convolution code: > > https://github.com/hgomersall/SSE-convolution/blob/master/convolve.c > > > > Shows a small but repeatable speed-up (a few %) when using some > aligned > > loads (as many as I can work out to use!). > > Okay, so a 15% is significant, yes. I'm still wondering why I did > not > get any speedup at all using MKL, but probably the reason is that it > manages the unaligned corners of the datasets first, and then uses an > aligned access for the rest of the data (but just guessing here). With SSE in that convolution code example above (in which all alignments need be considered for each output element), I note a significant speedup by creating 4 copies of the float input array using memcopy, each shifted by 1 float (so the 5th element is aligned again). Despite all the extra copies its still quicker than using an unaligned load. However, when one tries the same trick with 8 copies for AVX it's actually slower than the SSE case. The fastest AVX (and any) implementation I have so far is with 16-aligned arrays (made with 4 copies as with SSE), with alternate aligned and unaligned loads (which is always at worst 16-byte aligned). Fascinating stuff! hen From bahtiyor_zohidov at mail.ru Fri Dec 21 06:29:12 2012 From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=) Date: Fri, 21 Dec 2012 15:29:12 +0400 Subject: [Numpy-discussion] =?utf-8?q?NaN_=28Not_a_Number=29_occurs_in_cal?= =?utf-8?q?culation_of_complex_number_for_Bessel_functions?= Message-ID: <1356089352.739240573@f308.mail.ru> DEAR PYTHON USERS DO MATHEMATICAL FUNCTIONS HAVE LIMITATION IN PYTHON in comparison with other programming languages ???? I have two mathematical functions: from scipy.special import sph_jn, ?sph_jnyn 1) ?sph_jn (n, z) ---> n is float, z is complex number for example: ?a,b=sph_jn ( 2.0 , 5+0.4j ) ?gives the following result: >>>? a ? ? ? ? array( [ - 0.20416243 + 0.03963597j, - 0.10714653 - 0.06227716j,?0.13731305 - 0.07165432j ] ) >>>b ? ? ?? array(? [ 0.10714653 + 0.06227716j, -0.15959617 + 0.06098154j,?-0.18559289 - 0.01300886j ] ) 2) ?sph_jnyn(n , x) --> n-float, x - float ?,for example:? c,d,e,f=sph_jnyn(2.0 , 3.0) >>> c ? ? ? ? array( [ 0.04704 , 0.3456775, 0.2986375 ] ) >>> d ? ? ? ? 
array( [-0.3456775 , -0.18341166, 0.04704 ] ) >>> e ? ? ? ?array( [ 0.3299975 , 0.06295916, -0.26703834 ] ) >>> f ? ? ? ? array( [ -0.06295916, 0.28802472, 0.3299975 ] ) PROBLEM IS HERE!!! BUT , IF I GIVE ( it is necessary value for my program ): a , b =?sph_jn ( 536 , 2513.2741228718346 + 201.0619298974676j ) I would like to see even very very deep? comments as specific as possible!!!!!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Fri Dec 21 06:45:41 2012 From: francesc at continuum.io (Francesc Alted) Date: Fri, 21 Dec 2012 12:45:41 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <1356087525.3473.13.camel@farnsworth> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355993611.10732.18.camel@farnsworth> <50D31F44.5050306@continuum.io> <1356028520.12003.9.camel@farnsworth> <50D43B3F.4030208@continuum.io> <1356087525.3473.13.camel@farnsworth> Message-ID: <50D44BE5.4090401@continuum.io> On 12/21/12 11:58 AM, Henry Gomersall wrote: > On Fri, 2012-12-21 at 11:34 +0100, Francesc Alted wrote: >>> Also this convolution code: >>> https://github.com/hgomersall/SSE-convolution/blob/master/convolve.c >>> >>> Shows a small but repeatable speed-up (a few %) when using some >> aligned >>> loads (as many as I can work out to use!). >> Okay, so a 15% is significant, yes. I'm still wondering why I did >> not >> get any speedup at all using MKL, but probably the reason is that it >> manages the unaligned corners of the datasets first, and then uses an >> aligned access for the rest of the data (but just guessing here). > With SSE in that convolution code example above (in which all alignments > need be considered for each output element), I note a significant > speedup by creating 4 copies of the float input array using memcopy, > each shifted by 1 float (so the 5th element is aligned again). Despite > all the extra copies its still quicker than using an unaligned load. > However, when one tries the same trick with 8 copies for AVX it's > actually slower than the SSE case. > > The fastest AVX (and any) implementation I have so far is with > 16-aligned arrays (made with 4 copies as with SSE), with alternate > aligned and unaligned loads (which is always at worst 16-byte aligned). > > Fascinating stuff! Yes, to say the least. And it supports the fact that, when fine tuning memory access performance, there is no replacement for experimentation (in some weird ways many times :) -- Francesc Alted From d.s.seljebotn at astro.uio.no Fri Dec 21 07:35:36 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 21 Dec 2012 13:35:36 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D31F44.5050306@continuum.io> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355993611.10732.18.camel@farnsworth> <50D31F44.5050306@continuum.io> Message-ID: <50D45798.10205@astro.uio.no> On 12/20/2012 03:23 PM, Francesc Alted wrote: > On 12/20/12 9:53 AM, Henry Gomersall wrote: >> On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote: >>> The only scenario that I see that this would create unaligned arrays >>> is >>> for machines having AVX. But provided that the Intel architecture is >>> making great strides in fetching unaligned data, I'd be surprised >>> that >>> the difference in performance would be even noticeable. 
>>> >>> Can you tell us which difference in performance are you seeing for an >>> AVX-aligned array and other that is not AVX-aligned? Just curious. >> Further to this point, from an Intel article... >> >> http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors >> >> "Aligning data to vector length is always recommended. When using Intel >> SSE and Intel SSE2 instructions, loaded data should be aligned to 16 >> bytes. Similarly, to achieve best results use Intel AVX instructions on >> 32-byte vectors that are 32-byte aligned. The use of Intel AVX >> instructions on unaligned 32-byte vectors means that every second load >> will be across a cache-line split, since the cache line is 64 bytes. >> This doubles the cache line split rate compared to Intel SSE code that >> uses 16-byte vectors. A high cache-line split rate in memory-intensive >> code is extremely likely to cause performance degradation. For that >> reason, it is highly recommended to align the data to 32 bytes for use >> with Intel AVX." >> >> Though it would be nice to put together a little example of this! > > Indeed, an example is what I was looking for. So provided that I have > access to an AVX capable machine (having 6 physical cores), and that MKL > 10.3 has support for AVX, I have made some comparisons using the > Anaconda Python distribution (it ships with most packages linked against > MKL 10.3). > > Here it is a first example using a DGEMM operation. First using a NumPy > that is not turbo-loaded with MKL: > > In [34]: a = np.linspace(0,1,1e7) > > In [35]: b = a.reshape(1000, 10000) > > In [36]: c = a.reshape(10000, 1000) > > In [37]: time d = np.dot(b,c) > CPU times: user 7.56 s, sys: 0.03 s, total: 7.59 s > Wall time: 7.63 s > > In [38]: time d = np.dot(c,b) > CPU times: user 78.52 s, sys: 0.18 s, total: 78.70 s > Wall time: 78.89 s > > This is getting around 2.6 GFlop/s. Now, with a MKL 10.3 NumPy and > AVX-unaligned data: > > In [7]: p = ctypes.create_string_buffer(int(8e7)); hex(ctypes.addressof(p)) > Out[7]: '0x7fcdef3b4010' # 16 bytes alignment > > In [8]: a = np.ndarray(1e7, "f8", p) > > In [9]: a[:] = np.linspace(0,1,1e7) > > In [10]: b = a.reshape(1000, 10000) > > In [11]: c = a.reshape(10000, 1000) > > In [37]: %timeit d = np.dot(b,c) > 10 loops, best of 3: 164 ms per loop > > In [38]: %timeit d = np.dot(c,b) > 1 loops, best of 3: 1.65 s per loop > > That is around 120 GFlop/s (i.e. almost 50x faster than without MKL/AVX). > > Now, using MKL 10.3 and AVX-aligned data: > > In [21]: p2 = ctypes.create_string_buffer(int(8e7+16)); > hex(ctypes.addressof(p)) > Out[21]: '0x7f8cb9598010' > > In [22]: a2 = np.ndarray(1e7+2, "f8", p2)[2:] # skip the first 16 bytes > (now is 32-bytes aligned) > > In [23]: a2[:] = np.linspace(0,1,1e7) > > In [24]: b2 = a2.reshape(1000, 10000) > > In [25]: c2 = a2.reshape(10000, 1000) > > In [35]: %timeit d2 = np.dot(b2,c2) > 10 loops, best of 3: 163 ms per loop > > In [36]: %timeit d2 = np.dot(c2,b2) > 1 loops, best of 3: 1.67 s per loop > > So, again, around 120 GFlop/s, and the difference wrt to unaligned AVX > data is negligible. > > One may argue that DGEMM is CPU-bounded and that memory access plays > little role here, and that is certainly true. So, let's go with a more > memory-bounded problem, like computing a transcendental function with > numexpr. 
First with a with NumPy and numexpr with no MKL support: > > In [8]: a = np.linspace(0,1,1e8) > > In [9]: %time b = np.sin(a) > CPU times: user 1.20 s, sys: 0.22 s, total: 1.42 s > Wall time: 1.42 s > > In [10]: import numexpr as ne > > In [12]: %time b = ne.evaluate("sin(a)") > CPU times: user 1.42 s, sys: 0.27 s, total: 1.69 s > Wall time: 0.37 s > > This time is around 4x faster than regular 'sin' in libc, and about the > same speed than a memcpy(): > > In [13]: %time c = a.copy() > CPU times: user 0.19 s, sys: 0.20 s, total: 0.39 s > Wall time: 0.39 s > > Now, with a MKL-aware numexpr and non-AVX alignment: > > In [8]: p = ctypes.create_string_buffer(int(8e8)); hex(ctypes.addressof(p)) > Out[8]: '0x7fce435da010' # 16 bytes alignment > > In [9]: a = np.ndarray(1e8, "f8", p) > > In [10]: a[:] = np.linspace(0,1,1e8) > > In [11]: %time b = ne.evaluate("sin(a)") > CPU times: user 0.44 s, sys: 0.27 s, total: 0.71 s > Wall time: 0.15 s > > That is, more than 2x faster than a memcpy() in this system, meaning > that the problem is truly memory-bounded. So now, with an AVX aligned > buffer: > > In [14]: a2 = a[2:] # skip the first 16 bytes > > In [15]: %time b = ne.evaluate("sin(a2)") > CPU times: user 0.40 s, sys: 0.28 s, total: 0.69 s > Wall time: 0.16 s > > Again, times are very close. Just to make sure, let's use the timeit magic: > > In [16]: %timeit b = ne.evaluate("sin(a)") > 10 loops, best of 3: 159 ms per loop # unaligned > > In [17]: %timeit b = ne.evaluate("sin(a2)") > 10 loops, best of 3: 154 ms per loop # aligned > > All in all, it is not clear that AVX alignment would have an advantage, > even for memory-bounded problems. But of course, if Intel people are > saying that AVX alignment is important is because they have use cases > for asserting this. It is just that I'm having a difficult time to find > these cases. Hmm, I think it is the opposite, that it is for CPU-bound problems that alignment would have an effect? I.e. the MOVUPD would be doing some shuffling etc. to get around the non-alignment, which only matters if the data is already in cache. (There are other instructions, like the STREAM instructions and the direct writes and so on, which are much more important for the non-cached case. At least that's my understanding.) Dag Sverre From pav at iki.fi Fri Dec 21 07:46:26 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 21 Dec 2012 12:46:26 +0000 (UTC) Subject: [Numpy-discussion] NaN (Not a Number) occurs in calculation of complex number for Bessel functions References: <1356089352.739240573@f308.mail.ru> Message-ID: Happyman mail.ru> writes: [clip] > IF I GIVE ( it is necessary value for my program ): > a , b =?sph_jn ( 536 , 2513.2741228718346 + 201.0619298974676j ) The implementation of the spherical Bessel functions is through this Fortran code: https://github.com/scipy/scipy/blob/master/scipy/special/specfun/specfun.f#L1091 It does not have asymptotic expansions for dealing with parts of the complex plane where the computation via the recurrence does not work. 
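A quick way to see this concretely is to compare the recurrence-based routine against the direct half-integer-order relation j_n(z) = sqrt(pi/(2 z)) * J_{n+1/2}(z). The sketch below is only a diagnostic, not a fix; it assumes the scipy.special of this era, where sph_jn returns the pair (jn, jnp) and accepts a complex argument as in the report above, and the numbers it prints will depend on the scipy build at hand:

    import numpy as np
    from scipy.special import sph_jn, jv

    z = 2513.2741228718346 + 201.0619298974676j    # argument from the report above
    nmax = 536

    jn_rec, _ = sph_jn(nmax, z)                    # recurrence-based values, n = 0..nmax
    n = np.arange(nmax + 1)
    jn_dir = jv(n + 0.5, z) * np.sqrt(np.pi / (2 * z))   # direct relation (AMOS-backed jv)

    bad = ~np.isfinite(jn_rec)                     # where the recurrence broke down
    if bad.any():
        print("recurrence first goes non-finite at n = %d" % n[bad][0])
    print("direct relation stays finite: %s" % np.isfinite(jn_dir).all())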
-- Pauli Virtanen From francesc at continuum.io Fri Dec 21 07:48:03 2012 From: francesc at continuum.io (Francesc Alted) Date: Fri, 21 Dec 2012 13:48:03 +0100 Subject: [Numpy-discussion] Byte aligned arrays In-Reply-To: <50D45798.10205@astro.uio.no> References: <1355906439.3456.9.camel@farnsworth> <1355935645.3456.22.camel@farnsworth> <50D2018C.4080106@continuum.io> <1355993611.10732.18.camel@farnsworth> <50D31F44.5050306@continuum.io> <50D45798.10205@astro.uio.no> Message-ID: <50D45A83.10706@continuum.io> On 12/21/12 1:35 PM, Dag Sverre Seljebotn wrote: > On 12/20/2012 03:23 PM, Francesc Alted wrote: >> On 12/20/12 9:53 AM, Henry Gomersall wrote: >>> On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote: >>>> The only scenario that I see that this would create unaligned arrays >>>> is >>>> for machines having AVX. But provided that the Intel architecture is >>>> making great strides in fetching unaligned data, I'd be surprised >>>> that >>>> the difference in performance would be even noticeable. >>>> >>>> Can you tell us which difference in performance are you seeing for an >>>> AVX-aligned array and other that is not AVX-aligned? Just curious. >>> Further to this point, from an Intel article... >>> >>> http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors >>> >>> "Aligning data to vector length is always recommended. When using Intel >>> SSE and Intel SSE2 instructions, loaded data should be aligned to 16 >>> bytes. Similarly, to achieve best results use Intel AVX instructions on >>> 32-byte vectors that are 32-byte aligned. The use of Intel AVX >>> instructions on unaligned 32-byte vectors means that every second load >>> will be across a cache-line split, since the cache line is 64 bytes. >>> This doubles the cache line split rate compared to Intel SSE code that >>> uses 16-byte vectors. A high cache-line split rate in memory-intensive >>> code is extremely likely to cause performance degradation. For that >>> reason, it is highly recommended to align the data to 32 bytes for use >>> with Intel AVX." >>> >>> Though it would be nice to put together a little example of this! >> Indeed, an example is what I was looking for. So provided that I have >> access to an AVX capable machine (having 6 physical cores), and that MKL >> 10.3 has support for AVX, I have made some comparisons using the >> Anaconda Python distribution (it ships with most packages linked against >> MKL 10.3). >> >> Here it is a first example using a DGEMM operation. First using a NumPy >> that is not turbo-loaded with MKL: >> >> In [34]: a = np.linspace(0,1,1e7) >> >> In [35]: b = a.reshape(1000, 10000) >> >> In [36]: c = a.reshape(10000, 1000) >> >> In [37]: time d = np.dot(b,c) >> CPU times: user 7.56 s, sys: 0.03 s, total: 7.59 s >> Wall time: 7.63 s >> >> In [38]: time d = np.dot(c,b) >> CPU times: user 78.52 s, sys: 0.18 s, total: 78.70 s >> Wall time: 78.89 s >> >> This is getting around 2.6 GFlop/s. Now, with a MKL 10.3 NumPy and >> AVX-unaligned data: >> >> In [7]: p = ctypes.create_string_buffer(int(8e7)); hex(ctypes.addressof(p)) >> Out[7]: '0x7fcdef3b4010' # 16 bytes alignment >> >> In [8]: a = np.ndarray(1e7, "f8", p) >> >> In [9]: a[:] = np.linspace(0,1,1e7) >> >> In [10]: b = a.reshape(1000, 10000) >> >> In [11]: c = a.reshape(10000, 1000) >> >> In [37]: %timeit d = np.dot(b,c) >> 10 loops, best of 3: 164 ms per loop >> >> In [38]: %timeit d = np.dot(c,b) >> 1 loops, best of 3: 1.65 s per loop >> >> That is around 120 GFlop/s (i.e. 
almost 50x faster than without MKL/AVX). >> >> Now, using MKL 10.3 and AVX-aligned data: >> >> In [21]: p2 = ctypes.create_string_buffer(int(8e7+16)); >> hex(ctypes.addressof(p)) >> Out[21]: '0x7f8cb9598010' >> >> In [22]: a2 = np.ndarray(1e7+2, "f8", p2)[2:] # skip the first 16 bytes >> (now is 32-bytes aligned) >> >> In [23]: a2[:] = np.linspace(0,1,1e7) >> >> In [24]: b2 = a2.reshape(1000, 10000) >> >> In [25]: c2 = a2.reshape(10000, 1000) >> >> In [35]: %timeit d2 = np.dot(b2,c2) >> 10 loops, best of 3: 163 ms per loop >> >> In [36]: %timeit d2 = np.dot(c2,b2) >> 1 loops, best of 3: 1.67 s per loop >> >> So, again, around 120 GFlop/s, and the difference wrt to unaligned AVX >> data is negligible. >> >> One may argue that DGEMM is CPU-bounded and that memory access plays >> little role here, and that is certainly true. So, let's go with a more >> memory-bounded problem, like computing a transcendental function with >> numexpr. First with a with NumPy and numexpr with no MKL support: >> >> In [8]: a = np.linspace(0,1,1e8) >> >> In [9]: %time b = np.sin(a) >> CPU times: user 1.20 s, sys: 0.22 s, total: 1.42 s >> Wall time: 1.42 s >> >> In [10]: import numexpr as ne >> >> In [12]: %time b = ne.evaluate("sin(a)") >> CPU times: user 1.42 s, sys: 0.27 s, total: 1.69 s >> Wall time: 0.37 s >> >> This time is around 4x faster than regular 'sin' in libc, and about the >> same speed than a memcpy(): >> >> In [13]: %time c = a.copy() >> CPU times: user 0.19 s, sys: 0.20 s, total: 0.39 s >> Wall time: 0.39 s >> >> Now, with a MKL-aware numexpr and non-AVX alignment: >> >> In [8]: p = ctypes.create_string_buffer(int(8e8)); hex(ctypes.addressof(p)) >> Out[8]: '0x7fce435da010' # 16 bytes alignment >> >> In [9]: a = np.ndarray(1e8, "f8", p) >> >> In [10]: a[:] = np.linspace(0,1,1e8) >> >> In [11]: %time b = ne.evaluate("sin(a)") >> CPU times: user 0.44 s, sys: 0.27 s, total: 0.71 s >> Wall time: 0.15 s >> >> That is, more than 2x faster than a memcpy() in this system, meaning >> that the problem is truly memory-bounded. So now, with an AVX aligned >> buffer: >> >> In [14]: a2 = a[2:] # skip the first 16 bytes >> >> In [15]: %time b = ne.evaluate("sin(a2)") >> CPU times: user 0.40 s, sys: 0.28 s, total: 0.69 s >> Wall time: 0.16 s >> >> Again, times are very close. Just to make sure, let's use the timeit magic: >> >> In [16]: %timeit b = ne.evaluate("sin(a)") >> 10 loops, best of 3: 159 ms per loop # unaligned >> >> In [17]: %timeit b = ne.evaluate("sin(a2)") >> 10 loops, best of 3: 154 ms per loop # aligned >> >> All in all, it is not clear that AVX alignment would have an advantage, >> even for memory-bounded problems. But of course, if Intel people are >> saying that AVX alignment is important is because they have use cases >> for asserting this. It is just that I'm having a difficult time to find >> these cases. > Hmm, I think it is the opposite, that it is for CPU-bound problems that > alignment would have an effect? I.e. the MOVUPD would be doing some > shuffling etc. to get around the non-alignment, which only matters if > the data is already in cache. > > (There are other instructions, like the STREAM instructions and the > direct writes and so on, which are much more important for the > non-cached case. At least that's my understanding.) Yes, I think you are right. It is just that I was a bit disappointed with the DGEMM not being affected by non-AVX alignment and tried a memory-bound problem, just in case. 
But as I said before, probably Intel people have dealt with both aligned and unaligned data. -- Francesc Alted From bahtiyor_zohidov at mail.ru Fri Dec 21 07:56:08 2012 From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=) Date: Fri, 21 Dec 2012 16:56:08 +0400 Subject: [Numpy-discussion] =?utf-8?q?NaN_=28Not_a_Number=29_occurs_in_cal?= =?utf-8?q?culation_of_complex_number_for_Bessel_functions?= In-Reply-To: References: <1356089352.739240573@f308.mail.ru> Message-ID: <1356094568.754851858@f299.mail.ru> Thanks?Pauli But I have already very shortly built ?for bessel?function, but the code you gave me is in Fortran.. I also used f2py but I could not manage to read fortran codes..that is why I have asked in Python what is wrong?? ???????, 21 ??????? 2012, 12:46 UTC ?? Pauli Virtanen : >Happyman mail.ru> writes: >[clip] >> IF I GIVE ( it is necessary value for my program ): >> a , b =?sph_jn ( 536 , 2513.2741228718346 + 201.0619298974676j ) > >The implementation of the spherical Bessel functions is through >this Fortran code: > >https://github.com/scipy/scipy/blob/master/scipy/special/specfun/specfun.f#L1091 > >It does not have asymptotic expansions for dealing with parts of >the complex plane where the computation via the recurrence does not >work. > >-- >Pauli Virtanen > > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Dec 21 08:17:21 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 21 Dec 2012 13:17:21 +0000 (UTC) Subject: [Numpy-discussion] NaN (Not a Number) occurs in calculation of complex number for Bessel functions References: <1356089352.739240573@f308.mail.ru> <1356094568.754851858@f299.mail.ru> Message-ID: Happyman mail.ru> writes: > Thanks?Pauli But I have already very shortly built ?for bessel >?function, but the code you gave me is in Fortran.. I also used > f2py but I could not manage to read fortran codes..that is why > I have asked in Python what is wrong?? That Fortran code is `sph_jn`, which you used. It works using f2py. Only some of the special functions in scipy.special are written using Python as the language. Most of them are in C or in Fortran, using some existing special function library not written by us. Some of the implementations provided by these libraries are not complete, and do not cover the whole complex plane (or the real axis). Other functions (the more common ones), however, have very good implementations. -- Pauli Virtanen From bahtiyor_zohidov at mail.ru Fri Dec 21 08:30:20 2012 From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=) Date: Fri, 21 Dec 2012 17:30:20 +0400 Subject: [Numpy-discussion] =?utf-8?q?NaN_=28Not_a_Number=29_occurs_in_cal?= =?utf-8?q?culation_of_complex_number_for_Bessel_functions?= In-Reply-To: References: <1356089352.739240573@f308.mail.ru> Message-ID: <1356096620.594767210@f323.mail.ru> I have everything in C or Fortran...According to my friends recommendations I started learning Python for my research... Do you mean the functions which gave Nan result has not been developed properly yet in Python, Don't you???? For about 1.5 months I have been facing the same problem for Bessel functions.. I think the code that I showed like an example is not working in Python. What to do ??? ???????, 21 ??????? 2012, 13:17 ?? 
Pauli Virtanen : >Happyman mail.ru> writes: >> Thanks?Pauli But I have already very shortly built ?for bessel >>?function, but the code you gave me is in Fortran.. I also used >> f2py but I could not manage to read fortran codes..that is why >> I have asked in Python what is wrong?? > >That Fortran code is `sph_jn`, which you used. It works using f2py. > >Only some of the special functions in scipy.special are written >using Python as the language. Most of them are in C or in Fortran, >using some existing special function library not written by us. > >Some of the implementations provided by these libraries are not >complete, and do not cover the whole complex plane (or the real axis). >Other functions (the more common ones), however, have very good >implementations. > >-- >Pauli Virtanen > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Fri Dec 21 08:40:27 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 21 Dec 2012 14:40:27 +0100 Subject: [Numpy-discussion] NaN (Not a Number) occurs in calculation of complex number for Bessel functions In-Reply-To: <1356096620.594767210@f323.mail.ru> References: <1356089352.739240573@f308.mail.ru> <1356096620.594767210@f323.mail.ru> Message-ID: <50D466CB.1@astro.uio.no> On 12/21/2012 02:30 PM, Happyman wrote: > I have everything in C or Fortran...According to my friends > recommendations I started learning Python for my research... > > Do you mean the functions which gave Nan result has not been developed > properly yet in Python, Don't you???? The way most of NumPy and SciPy works is by calling into C and Fortran code. > > For about 1.5 months I have been facing the same problem for Bessel > functions.. I think the code that I showed like an example is not > working in Python. > What to do ??? Do you have an implemention of the Bessel functions that work as you wish in C or Fortran? If so, that could be wrapped and called from Python. Dag Sverre From pav at iki.fi Fri Dec 21 08:59:02 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 21 Dec 2012 13:59:02 +0000 (UTC) Subject: [Numpy-discussion] NaN (Not a Number) occurs in calculation of complex number for Bessel functions References: <1356089352.739240573@f308.mail.ru> <1356096620.594767210@f323.mail.ru> <50D466CB.1@astro.uio.no> Message-ID: Dag Sverre Seljebotn astro.uio.no> writes: [clip] > Do you have an implemention of the Bessel functions that work as you > wish in C or Fortran? If so, that could be wrapped and called from Python. For spherical Bessel functions it's possible to also use the relation to Bessel functions, which have a better implementation (AMOS) in Scipy: import numpy as np from scipy.special import jv def sph_jn(n, z): return jv(n + 0.5, z) * np.sqrt(np.pi / (2 * z)) print sph_jn(536, 2513.2741228718346 + 201.0619298974676j) # (2.5167666386507171e+81+3.3576357192536334e+81j) This should solve the problem. 
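The same half-integer-order trick extends to the spherical Bessel function of the second kind, and therefore to the spherical Hankel function that the scattering code later in this thread also needs. A minimal sketch using the standard identities (the helper names are made up, not part of scipy; yv, like jv, is AMOS-backed and accepts complex arguments):

    import numpy as np
    from scipy.special import jv, yv

    def sph_jn_via_jv(n, z):
        # j_n(z) = sqrt(pi/(2 z)) * J_{n+1/2}(z)
        return jv(n + 0.5, z) * np.sqrt(np.pi / (2 * z))

    def sph_yn_via_yv(n, z):
        # y_n(z) = sqrt(pi/(2 z)) * Y_{n+1/2}(z)
        return yv(n + 0.5, z) * np.sqrt(np.pi / (2 * z))

    def sph_h1(n, z):
        # h_n^(1)(z) = j_n(z) + i * y_n(z)
        return sph_jn_via_jv(n, z) + 1j * sph_yn_via_yv(n, z)

    print(sph_h1(536, 2513.2741228718346 + 201.0619298974676j))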
-- Pauli Virtanen From lev at columbia.edu Fri Dec 21 09:11:34 2012 From: lev at columbia.edu (Lev Givon) Date: Fri, 21 Dec 2012 09:11:34 -0500 Subject: [Numpy-discussion] NaN (Not a Number) occurs in calculation of complex number for Bessel functions In-Reply-To: References: <1356089352.739240573@f308.mail.ru> <1356096620.594767210@f323.mail.ru> <50D466CB.1@astro.uio.no> Message-ID: <20121221141134.GA6618@avicenna.ee.columbia.edu> Received from Pauli Virtanen on Fri, Dec 21, 2012 at 08:59:02AM EST: > Dag Sverre Seljebotn astro.uio.no> writes: > [clip] > > Do you have an implemention of the Bessel functions that work as you > > wish in C or Fortran? If so, that could be wrapped and called from Python. > > For spherical Bessel functions it's possible to also use the relation > to Bessel functions, which have a better implementation (AMOS) in Scipy: > > import numpy as np > from scipy.special import jv > > def sph_jn(n, z): > return jv(n + 0.5, z) * np.sqrt(np.pi / (2 * z)) > > print sph_jn(536, 2513.2741228718346 + 201.0619298974676j) > # (2.5167666386507171e+81+3.3576357192536334e+81j) > > This should solve the problem. You can also try the spherical Bessel function in Sympy; it should be able to handle this case as well [1]. L.G. [1] http://docs.sympy.org/0.7.2/modules/functions/special.html#bessel-type-functions From bahtiyor_zohidov at mail.ru Fri Dec 21 10:14:45 2012 From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=) Date: Fri, 21 Dec 2012 19:14:45 +0400 Subject: [Numpy-discussion] =?utf-8?q?NaN_=28Not_a_Number=29_occurs_in_cal?= =?utf-8?q?culation_of_complex_number_for_Bessel_functions?= In-Reply-To: References: <1356089352.739240573@f308.mail.ru> Message-ID: <1356102885.831374538@f279.mail.ru> I think you advised about the code which is the same appearance. ========================================================================== Problem is not here Sir.... I will give you exactly what I was talking about. I have ready codes already(It would be kind of you if you checked the following codes, may be): ------------------------------------------------------------------------------------------------------------------------------ ## Bessel function of the first kind # mathematical form: Jn(x)--> n=arg1, x=arg2 # returns --> Jn(x) value in a complex form ------------------------------ def bes_1(arg1,arg2): ? ? ? nu=arg1+0.5 ? ? ? ? ? ? ? ? ? ? ? ? ?# nu=n+0.5: Jn(x) --> Jnu(x) ? ? ? return sqrt(pi/(2*arg2))*np.round(jv(nu,arg2),5) ? ? ?# jv(nu,arg2)--> from 'numpy.special' in PYTHON --------------------------------------------------------------------------------------------------------------------------------? ## Bessel function of the second kind # mathematical form: Yn(x)--> n=arg1, x=arg2 # returns --> Yn(x) value in a complex form --------------------------------- def bes_2 ( arg1, arg2 ): ? ? ? ?nu = arg1 + 0.5 ? ? ? ? ? ? ? ? ? ? ? ?# nu=n+0.5: Yn(x)--> Ynu(x) ? ? ? ?return sqrt ( pi / ( 2 * arg2 ) ) * np.round ( yv ( nu , arg2 ) , 5) ? ? ? # yv(nu,arg2)--> from 'numpy.special' in PYTHON ---------------------------------?------------------------------------------------------------------------------------------------------- ## Hankel function of the first kind # mathematical form: Hn(x)= Jn(x)+Yn(x)j # returns --> Hn(x) value in a complex form --------------------------------- def hank_1 ( arg1, arg2 ): ? ? ? ?return bes_1 ( arg1 , arg2 ) + bes_2 ( arg1 , arg2 ) * 1.0j ? ? ? ? ? 
?# Hn(x)= Jn(x)+Yn(x)j -------------------------------------------------------------------------------------------------------------------------------------------? ## Bessel function of the first kind derivative # mathematical form: d(z*jn(z))=z*jn-1(z)-n*jn(z) where, z=x or m*x --------------------------------- def bes_der1 ( arg1 , arg2 ): ? ? ? return arg2 * bes_1 ( arg1 - 1, arg2 ) - arg1 * bes_1 ( arg1 , arg2 ) ---------------------------------------------------------------------------------------------------------------------------------------------? ## Hankel function of the first kind derivative # mathematical form: d(z*hankeln(z))=z*hankeln-1(z)-n*hankeln(z) where, z=x or m*x def hank_der1 ( arg1 , arg2 ): ? ? ? return arg2 * hank_1 ( arg1 - 1, arg2 ) - arg1 * hank_1( arg1, arg2 ) ---------------------------------------------------------------------------------------------------------------------------------------------? FOR MY CASE: m =?2513.2741228718346 + 201.0619298974676j? ? ? ? ? ? ? ? ? ? ? ? ? ? x = 502.6548245743669 def F(m,x): nmax = x + 2.0 + 4.0 * x ** ( 1.0 / 3.0 ) ? ? ? ? ?# ? ?nmax= gives 536.0 as expected value nstop = np.round( nmax ) n = np.arange ( 0.0 ,nstop, dtype = float)? z = m * x m2 = m * m? val1 = m2 * bes_1 ( en , z ) * bes_der1 ( en, x) val2 = bes_1 ( en , x ) * bes_der1 ( en , z ) val3 = m2 * bes_1 ( en , z ) * hank_der1 ( en , x )? val4 = hank_1 ( en , x ) * bes_der1 ( en , z )? an = ( val1 - val2 ) / ( val3 - val4 ) val5 = bes_1 ( en , z ) * bes_der1 ( en, x ) val6 = bes_1 ( en , x ) * bes_der1 ( en, z ) val7 = bes_1 ( en , z ) * hank_der1 ( en, x ) val8 = hank_1 ( en , x ) * bes_der1 ( en , z ) bn = ( val5 - val6 ) / ( val7 - val8 ) return an, bn !!!! ?PROBLEM IS RETURNING THE an, bn at a given value which I showed before F(m,x) function WHAT IS WRONG WITH THIS?????? ???????, 21 ??????? 2012, 13:59 ?? Pauli Virtanen : >Dag Sverre Seljebotn astro.uio.no> writes: >[clip] >> Do you have an implemention of the Bessel functions that work as you >> wish in C or Fortran? If so, that could be wrapped and called from Python. > >For spherical Bessel functions it's possible to also use the relation >to Bessel functions, which have a better implementation (AMOS) in Scipy: > >import numpy as np >from scipy.special import jv > >def sph_jn(n, z): >????return jv(n + 0.5, z) * np.sqrt(np.pi / (2 * z)) > >print sph_jn(536, 2513.2741228718346 + 201.0619298974676j) ># (2.5167666386507171e+81+3.3576357192536334e+81j) > > >This should solve the problem. > >-- >Pauli Virtanen > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Dec 21 10:45:40 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 21 Dec 2012 15:45:40 +0000 (UTC) Subject: [Numpy-discussion] NaN (Not a Number) occurs in calculation of complex number for Bessel functions References: <1356089352.739240573@f308.mail.ru> <1356102885.831374538@f279.mail.ru> Message-ID: Hi, Your code tries to to evaluate z = 1263309.3633394379 + 101064.74910119522j jv(536, z) # -> (inf+inf*j) In reality, this number is not infinite, but jv(536, z) == -2.3955170861527422e+43888 + 9.6910119847300024e+43887 These numbers (~ 10^43888) are too large for the floating point numbers that computers use (maximum ~ 10^308). This is why you get infinities and NaNs in the result. 
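If values of that size genuinely have to be evaluated, one option is an arbitrary-precision library such as mpmath (suggested a little further down), whose floats carry an essentially unbounded exponent. A sketch of how one might check the figure quoted above (it assumes mpmath is installed, and evaluation at such a large order can be slow; the digits you get depend on the working precision chosen):

    import mpmath

    mpmath.mp.dps = 30                               # working precision, in decimal digits
    z = mpmath.mpc('1263309.3633394379', '101064.74910119522')
    print(mpmath.besselj(536, z))                    # magnitude of order 10**43888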
The same is true for the spherical Bessel functions. You will not be able to do this calculation using any software that uses only floating point numbers (Scipy, Matlab, ...). You need to use analytical properties of your problem to get rid of such large numbers. Alternatively, you can use arbitrary precision numbers. Python has libraries for that: http://code.google.com/p/mpmath/ By the way, the proper place for this discussion is the following mailing list: http://mail.scipy.org/mailman/listinfo/scipy-user -- Pauli Virtanen From raul at virtualmaterials.com Fri Dec 21 14:20:58 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Fri, 21 Dec 2012 12:20:58 -0700 Subject: [Numpy-discussion] Numpy speed ups to simple tasks - final findings and suggestions Message-ID: <50D4B69A.7000409@virtualmaterials.com> Hello, On Dec/2/2012 I sent an email about some meaningful speed problems I was facing when porting our core program from Numeric (Python 2.2) to Numpy (Python 2.6). Some of our tests went from 30 seconds to 90 seconds for example. I saw interest from some people in this list and I left the topic saying I would do a more complete profile of the program and report back anything meaningful. It took me quite a bit to get through things because I ended up having to figure out how to create a Visual Studio project that I could debug and compile from the IDE. First, the obvious, Everything that relies heavily on Numpy for speed (mid to large arrays) is pretty much the same speed when compared to Numeric. The areas that are considerably slower in Numpy Vs Numeric are the trivial tasks that we end up using either for convenience (small arrays) or because scalar types such as 'float64' propagate everywhere throughout the program and creep into several of our data structures. This email is really only relevant to people stuck with doing trivial operations with Numpy and want a meaningful speed boost. I focused on float64. * NOTE: I ended up doing everything in Numpy 1.6.2 as opposed to using the latest stuff. I am going to guess all my findings still apply but I will only be able to confirm until later. ========================================================= In this email I include, 1) Main bottlenecks I found which I list and refer to as (a), (b) and (c). 2) The benchmark tests I made and their speed ups 3) Details on the actual changes to the C code ========================================================= Summary of conclusions, - Our code is finally running as fast as it used to by doing some changes in Numpy and also some minor changes in our code. Half of our problems were caused by instantiating small arrays several times which is fairly slow in Numpy. The other half of our problems were are caused by the slow math performance of Numpy scalars. We did find a particular python function in our code that was a big candidate to be rewritten in C and just got it done. - In Numpy I did four sets of changes in the source code. I believe three of them are relevant to every one using Numpy and one of them is probably not going to be very popular. - The main speed up is in float64 scalar operations and creation of small arrays from lists or tuples. The speed up in small array operations is only marginal but I believe there is potential to get them at least twice as fast. 
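A rough micro-benchmark along the following lines is enough to reproduce the kind of overhead described above before digging into the details below. The absolute numbers are obviously machine- and NumPy-version-dependent; the point is only the relative cost of float64 scalars versus native Python floats:

    # save as bench_scalars.py and run with the interpreter under test
    import timeit
    import numpy as np

    py_f = 3.1                      # native Python float
    np_f = np.float64(3.1)          # NumPy float64 scalar
    vec = np.array([2.0, 3.1])

    tests = [
        ("PyFloat * PyFloat", "py_f * py_f"),
        ("Float64 * Float64", "np_f * np_f"),
        ("PyFloat * Float64", "py_f * np_f"),
        ("PyFloat < Float64", "py_f < np_f"),
        ("a[0] < 3.5",        "vec[0] < 3.5"),
        ("array from list",   "np.array([0.2, 0.3])"),
    ]
    setup = "from __main__ import py_f, np_f, vec; import numpy as np"
    for name, stmt in tests:
        t = timeit.timeit(stmt, setup=setup, number=100000)
        print("%-18s %.4f s per 100000 loops" % (name, t))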
========================================================= 1) By profiling the program I found three generic types of bottlenecks in Numpy that were affecting our code, a) Operations that result in Python internally raising an error e.g. PyObject_GetAttrString(obj, "__array_priority__") when __array_priority__ is not an attribute of obj b) Creation / destruction of scalar array types . In some places this was happening unnecessarily . c) Ufuncs trying to figure out the proper type for an operation (e.g. if I multiply a float64 array by a float64 array, a fair amount of time is spent deciding that it should use float64) I came up with specific changes to address (a) and (b) . I gave up on (c) for now since I couldn't think of a way to speed it up without a large re-write and I really don't know the Numpy code (never saw it before this work). ========================================================= 2) The tests I did were (some are python natives for reference), 1) Array * Array 2) PyFloat * Array 3) Float64 * Array 4) PyFloat + Array 5) Float64 + Array 6) PyFloat * PyFloat 7) Float64 * Float64 8) PyFloat * Float64 9) PyFloat * vector1[1] 10) PyFloat + Float64 11) PyFloat < Float64 12) if PyFloat < Float64: 13) Create array from list 14) Assign PyFloat to all 15) Assign Float64 to all 16) Float64 * Float64 * Float64 * Float64 * Float64 17) Float64 * Float64 * Float64 * Float64 * Float64 18) Float64 ** 2 19) PyFloat ** 2 where Array -> Numpy array of float64 of two elements (vector1 = array( [2.0, 3.1] )). PyFloat -> pyFloat = 3.1 Float64 -> Numpy scalar 'float64' (scalarFloat64 = vector1[1]) Create array from list -> newVec = array([0.2, 0.3], dtype="float64") Assign PyFloat to all -> vector1[:] = pyFloat Assign Float64 to all -> vector1[:] = scalarFloat64 I ran every test 100000 and timed it in seconds. These are the base timings with the original Numpy TIME[s] TEST 1) 0.2003 Array * Array 2) 0.2502 PyFloat * Array 3) 0.2689 Float64 * Array 4) 0.2469 PyFloat + Array 5) 0.2640 Float64 + Array 6) 0.0055 PyFloat * PyFloat 7) 0.0278 Float64 * Float64 8) 0.0778 PyFloat * Float64 9) 0.0893 PyFloat * vector1[1] 10) 0.0767 PyFloat + Float64 11) 0.0532 PyFloat < Float64 12) 0.0543 if PyFloat < Float64 : 13) 0.6788 Create array from list 14) 0.0708 Assign PyFloat to all 15) 0.0775 Assign Float64 to all 16) 0.2994 Float64 * pyFloat * pyFloat * pyFloat * pyFloat 17) 0.1053 Float64 * Float64 * Float64 * Float64 * Float64 18) 0.0918 Float64 ** 2 19) 0.0156 pyFloat ** 2 - Test (13) is the operation that takes the longest overall - PyFloat * Float64 is 14 times slower than PyFloat * PyFloat By addressing bottleneck (a) I got the following ratios of time (BaseTime/NewTime) i.e. RATIO > 1 means GOOD . RATIO TEST 1) 1.1 Array * Array 2) 1.1 PyFloat * Array 3) 1.1 Float64 * Array 4) 1.1 PyFloat + Array 5) 1.2 Float64 + Array 6) 1.0 PyFloat * PyFloat 7) 1.7 Float64 * Float64 8) 2.8 PyFloat * Float64 9) 2.1 PyFloat * vector1[1] 10) 2.8 PyFloat + Float64 11) 3.3 PyFloat < Float64 12) 3.3 if PyFloat < Float64: 13) 3.2 Create array from list 14) 1.2 Assign PyFloat to all 15) 1.2 Assign Float64 to all 16) 2.9 Float64 * pyFloat * pyFloat * pyFloat * pyFloat 17) 1.7 Float64 * Float64 * Float64 * Float64 * Float64 18) 2.4 Float64 ** 2 19) 1.0 pyFloat ** 2 Speed up from Test (13) and (16) resulted in a big speed boost in our code Keeping the changes above. By addressing (b) in a way that did not change the data types of the return values I got the following ratios of time (BaseTime/NewTime) i.e. RATIO > 1 means GOOD . 
RATIO TEST 1) 1.1 Array * Array 2) 1.1 PyFloat * Array 3) 1.2 Float64 * Array 4) 1.1 PyFloat + Array 5) 1.2 Float64 + Array 6) 1.0 PyFloat * PyFloat 7) 1.7 Float64 * Float64 8) 4.3 PyFloat * Float64 9) 3.1 PyFloat * vector1[1] 10) 4.4 PyFloat + Float64 11) 9.3 PyFloat < Float64 12) 9.2 if PyFloat < Float64 : 13) 3.2 Create array from list 14) 1.2 Assign PyFloat to all 15) 1.2 Assign Float64 to all 16) 4.7 Float64 * pyFloat * pyFloat * pyFloat * pyFloat 17) 1.8 Float64 * Float64 * Float64 * Float64 * Float64 18) 2.4 Float64 ** 2 19) 1.0 pyFloat ** 2 - Scalar operations are quite a bit faster but PyFloat * Float64 is 2.9 times slower than PyFloat * PyFloat I decided to then tackle (b) even further by changing things like PyFloat * Float64 to return a PyFloat as opposed to a Float64. This is the change that I don't think is going to be very popular. This is what I got, 1) 1.1 Array * Array 2) 1.1 PyFloat * Array 3) 1.2 Float64 * Array 4) 1.1 PyFloat + Array 5) 1.2 Float64 + Array 6) 1.0 PyFloat * PyFloat 7) 3.2 Float64 * Float64 8) 8.1 PyFloat * Float64 9) 4.1 PyFloat * vector1[1] 10) 8.3 PyFloat + Float64 11) 9.4 PyFloat < Float64 12) 9.2 if PyFloat < Float64 : 13) 3.2 Create array from list 14) 1.2 Assign PyFloat to all 15) 1.2 Assign Float64 to all 16) 17.3 Float64 * pyFloat * pyFloat * pyFloat * pyFloat 17) 3.3 Float64 * Float64 * Float64 * Float64 * Float64 18) 2.4 Float64 ** 2 19) 1.0 pyFloat ** 2 - Test (16) shows how only one Float64 spoils the speed of trivial math. Now imagine the effect in hundreds of lines like that. - Even Test (17) got faster which uses only Float64 - Test (18) Float64 ** 2 is still returning a float64 in this run. Regarding bottleneck (c) . Deciding the type of UFunc. I hacked a version for testing purposes to check the potential speed up (some dirty changes in generate_umath.py). This version avoided the overhead of the call to the calls to find the matching ufunc. The ratio of speed up for something like Array * Array was only 1.6 . This was not too exciting so I walked away for now. ========================================================= 3) These are the actual changes to the C code, For bottleneck (a) In general, - avoid calls to PyObject_GetAttrString when I know the type is List, None, Tuple, Float, Int, String or Unicode - avoid calls to PyObject_GetBuffer when I know the type is List, None or Tuple a.1) In arrayobject.h after the line #include "npy_interrupt.h" I added a couple of #define //Check for exact native types that for sure do not //support array related methods. Useful for faster checks when //validating if an object supports these methods #define ISEXACT_NATIVE_PYTYPE(op) (PyList_CheckExact(op) || (Py_None == op) || PyTuple_CheckExact(op) || PyFloat_CheckExact(op) || PyInt_CheckExact(op) || PyString_CheckExact(op) || PyUnicode_CheckExact(op)) //Check for exact native types that for sure do not //support buffer protocol. Useful for faster checks when //validating if an object supports the buffer protocol. 
#define NEVERSUPPORTS_BUFFER_PROTOCOL(op) ( PyList_CheckExact(op) || (Py_None == op) || PyTuple_CheckExact(op) ) a.2) In common.c above the line if ((ip=PyObject_GetAttrString(op, "__array_interface__"))!=NULL) { I added if (ISEXACT_NATIVE_PYTYPE(op)){ ip = NULL; } else{ and close the } before the line #if !defined(NPY_PY3K) In common.c above the line if (PyObject_HasAttrString(op, "__array__")) { I added if (ISEXACT_NATIVE_PYTYPE(op)){ } else{ and close the } before the line #if defined(NPY_PY3K) In common.c above the line if (PyObject_GetBuffer(op, &buffer_view, PyBUF_FORMAT|PyBUF_STRIDES I added if ( NEVERSUPPORTS_BUFFER_PROTOCOL(op) ){ } else{ and close the } before the line #endif a.3) In ctors.c above the line if ((e = PyObject_GetAttrString(s, "__array_struct__")) != NULL) { I added if (ISEXACT_NATIVE_PYTYPE(s)){ e = NULL; } else{ and close the } before the line n = PySequence_Size(s); In ctors.c above the line attr = PyObject_GetAttrString(input, "__array_struct__"); I added if (ISEXACT_NATIVE_PYTYPE(input)){ attr = NULL; return Py_NotImplemented; } else{ and close the } before the line if (!NpyCapsule_Check(attr)) { In ctors.c above the line inter = PyObject_GetAttrString(input, "__array_interface__"); I added if (ISEXACT_NATIVE_PYTYPE(input)){ inter = NULL; return Py_NotImplemented; } else{ and close the } before the line if (!PyDict_Check(inter)) { In ctors.c above the line array_meth = PyObject_GetAttrString(op, "__array__"); I added if (ISEXACT_NATIVE_PYTYPE(op)){ array_meth = NULL; return Py_NotImplemented; } else{ and close the } before the line if (context == NULL) { In ctors.c above the line if (PyObject_GetBuffer(s, &buffer_view, PyBUF_STRIDES) == 0 || I added if ( NEVERSUPPORTS_BUFFER_PROTOCOL(s) ){ } else{ and close the } before the line #endif a.4) In multiarraymodule.c above the line ret = PyObject_GetAttrString(obj, "__array_priority__"); I added if (ISEXACT_NATIVE_PYTYPE(obj)){ ret = NULL; } else{ and close the } before the line if (PyErr_Occurred()) { For bottleneck (b) b.1) I noticed that PyFloat * Float64 resulted in an unnecessary "on the fly" conversion of the PyFloat into a Float64 to extract its underlying C double value. This happened in the function _double_convert_to_ctype which comes from the pattern, _ at name@_convert_to_ctype I ended up splitting _ at name@_convert_to_ctype into two sections. One for double types and one for the rest of the types where I extract the C value directly if it passes the check to PyFloat_CheckExact (It could be extended for other types). in scalarmathmodule.c.src I added, /**begin repeat * #name = double# * #Name = Double# * #NAME = DOUBLE# * #PYCHECKEXACT = PyFloat_CheckExact# * #PYEXTRACTCTYPE = PyFloat_AS_DOUBLE# */ static int _ at name@_convert_to_ctype(PyObject *a, npy_ at name@ *arg1) { PyObject *temp; if (@PYCHECKEXACT@(a)){ *arg1 = @PYEXTRACTCTYPE@(a); return 0; } ... The rest of this function is the implementation of the original _ at name@_convert_to_ctype(PyObject *a, npy_ at name@ *arg1) The original implementation of _ at name@_convert_to_ctype does not include double anymore, i.e. 
/**begin repeat * #name = byte, ubyte, short, ushort, int, uint, long, ulong, longlong, * ulonglong, half, float, longdouble, cfloat, cdouble, clongdouble# * #Name = Byte, UByte, Short, UShort, Int, UInt, Long, ULong, LongLong, * ULongLong, Half, Float, LongDouble, CFloat, CDouble, CLongDouble# * #NAME = BYTE, UBYTE, SHORT, USHORT, INT, UINT, LONG, ULONG, LONGLONG, * ULONGLONG, HALF, FLOAT, LONGDOUBLE, CFLOAT, CDOUBLE, CLONGDOUBLE# */ static int _ at name@_convert_to_ctype(PyObject *a, npy_ at name@ *arg1) b.2) This is the change that may not be very popular among Numpy users. I modified Float64 operations to return a Float instead of Float64. I could not think or see any ill effects and I got a fairly decent speed boost. in scalarmathmodule.c.src I modified to this, /**begin repeat * #name=(byte,ubyte,short,ushort,int,uint,long,ulong,longlong,ulonglong)*13, * (half, float, double, longdouble, cfloat, cdouble, clongdouble)*6, * (half, float, double, longdouble)*2# * #Name=(Byte,UByte,Short,UShort,Int,UInt,Long,ULong,LongLong,ULongLong)*13, * (Half, Float, Double, LongDouble, CFloat, CDouble, CLongDouble)*6, * (Half, Float, Double, LongDouble)*2# * #oper=add*10, subtract*10, multiply*10, divide*10, remainder*10, * divmod*10, floor_divide*10, lshift*10, rshift*10, and*10, * or*10, xor*10, true_divide*10, * add*7, subtract*7, multiply*7, divide*7, floor_divide*7, true_divide*7, * divmod*4, remainder*4# * #fperr=1*70,0*50,1*10, * 1*42, * 1*8# * #twoout=0*50,1*10,0*70, * 0*42, * 1*4,0*4# * #otyp=(byte,ubyte,short,ushort,int,uint,long,ulong,longlong,ulonglong)*12, * float*4, double*6, * (half, float, double, longdouble, cfloat, cdouble, clongdouble)*6, * (half, float, double, longdouble)*2# * #OName=(Byte,UByte,Short,UShort,Int,UInt,Long,ULong,LongLong,ULongLong)*12, * Float*4, Double*6, * (Half, Float, Double, LongDouble, CFloat, CDouble, CLongDouble)*6, * (Half, Float, Double, LongDouble)*2# * #OutUseName=(Byte,UByte,Short,UShort,Int,UInt,Long,ULong,LongLong,ULongLong)*12, * Float*4, out*6, * (Half, Float, out, LongDouble, CFloat, CDouble, CLongDouble)*6, * (Half, Float, out, LongDouble)*2# * #AsScalarArr=(1,1,1,1,1,1,1,1,1,1)*12, * 1*4, 0*6, * (1, 1, 0, 1, 1, 1, 1)*6, * (1, 1, 0, 1)*2# * #RetValCreate=(PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New,PyArrayScalar_New)*12, * PyArrayScalar_New*4, PyFloat_FromDouble*6, * (PyArrayScalar_New, PyArrayScalar_New, PyFloat_FromDouble, PyArrayScalar_New, PyArrayScalar_New, PyArrayScalar_New, PyArrayScalar_New)*6, * (PyArrayScalar_New, PyArrayScalar_New, PyFloat_FromDouble, PyArrayScalar_New)*2# */ #if !defined(CODEGEN_SKIP_ at oper@_FLAG) static PyObject * @name at _@oper@(PyObject *a, PyObject *b) { ... Same as before and ends with... #else ret = @RetValCreate@(@OutUseName@); if (ret == NULL) { return NULL; } if (@AsScalarArr@) PyArrayScalar_ASSIGN(ret, @OName@, out); #endif return ret; } #endif /**end repeat**/ I still need to do the section for when there are two return values and the power function. I am not sure what else could be there. ========================================================= That's about it. Sorry for the long email. I tried to summarize as much as possible. Let me know if you have any questions or if you want the actual files I modified. 
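Until changes along these lines are in a released NumPy, a purely Python-level workaround for the 'float64 creeps into every data structure' part of the problem is to convert scalars back to native floats at the point where they leave the array world, for instance with item(). A small sketch (the helper name is made up; whether the extra call pays off depends on how much scalar arithmetic follows):

    import numpy as np

    def as_pyfloat(x):
        # Return the native Python scalar for NumPy scalars and 0-d arrays,
        # and leave anything else untouched.
        if isinstance(x, np.generic) or (isinstance(x, np.ndarray) and x.ndim == 0):
            return x.item()
        return x

    a = np.array([2.0, 3.1])
    v = as_pyfloat(a[1])            # plain Python float from here on
    assert type(v) is float

    # subsequent pure-Python comparisons and arithmetic now take the fast
    # native-float path instead of the float64 scalar path
    if v < 3.5:
        v = v * 2.0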
Cheers, Raul Cota From bahtiyor_zohidov at mail.ru Fri Dec 21 14:44:10 2012 From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=) Date: Fri, 21 Dec 2012 23:44:10 +0400 Subject: [Numpy-discussion] =?utf-8?q?NaN_=28Not_a_Number=29_occurs_in_cal?= =?utf-8?q?culation_of_complex_number_for_Bessel_functions?= In-Reply-To: References: <1356089352.739240573@f308.mail.ru> Message-ID: <1356119050.307555288@f390.i.mail.ru> Thanks But I could find for Win64 bit windows???? Second question: Did you mean that I have to put lens limits of those number????? ???????, 21 ??????? 2012, 15:45 UTC ?? Pauli Virtanen : >Hi, > >Your code tries to to evaluate > >????z = 1263309.3633394379 + 101064.74910119522j >????jv(536, z) >????# -> (inf+inf*j) > >In reality, this number is not infinite, but > >????jv(536, z) == -2.3955170861527422e+43888 + 9.6910119847300024e+43887 > >These numbers (~ 10^43888) are too large for the floating point >numbers that computers use (maximum ~ 10^308). This is why you get >infinities and NaNs in the result. The same is true for the spherical >Bessel functions. > >You will not be able to do this calculation using any software >that uses only floating point numbers (Scipy, Matlab, ...). > >You need to use analytical properties of your problem to >get rid of such large numbers. Alternatively, you can use arbitrary >precision numbers. Python has libraries for that: > >http://code.google.com/p/mpmath/ > > >By the way, the proper place for this discussion is the following >mailing list: > >http://mail.scipy.org/mailman/listinfo/scipy-user > >-- >Pauli Virtanen > > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Dec 21 14:54:01 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 21 Dec 2012 20:54:01 +0100 Subject: [Numpy-discussion] Travis-CI stopped supporting Python 3.1, but added 3.3 In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 2:23 AM, Ond?ej ?ert?k wrote: > Hi, > > I noticed that the 3.1 tests are now failing. After clarification with > the Travis guys: > > https://groups.google.com/d/topic/travis-ci/02iRu6kmwY8/discussion > > I've submitted a fix to our .travis.yml (and backported to 1.7): > > https://github.com/numpy/numpy/pull/2850 > https://github.com/numpy/numpy/pull/2851 > > In case you were wondering. Do we need to support Python 3.1? > We could in principle test 3.1 just like we test 2.4. I don't know if > it is worth the pain. > I think for 1.7.x we should. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Dec 21 15:05:14 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 21 Dec 2012 20:05:14 +0000 Subject: [Numpy-discussion] Travis-CI stopped supporting Python 3.1, but added 3.3 In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 1:23 AM, Ond?ej ?ert?k wrote: > Hi, > > I noticed that the 3.1 tests are now failing. After clarification with > the Travis guys: > > https://groups.google.com/d/topic/travis-ci/02iRu6kmwY8/discussion > > I've submitted a fix to our .travis.yml (and backported to 1.7): > > https://github.com/numpy/numpy/pull/2850 > https://github.com/numpy/numpy/pull/2851 > > In case you were wondering. Do we need to support Python 3.1? > We could in principle test 3.1 just like we test 2.4. 
I don't know if > it is worth the pain. It probably isn't much pain, since Python is easy to compile, and I don't think we've been running into many cases where supporting 3.1 required workarounds yet? (As compared to 2.4, where we get compatibility-breaking patches constantly.) It's a crude metric and I'm not sure what conclusion it suggests, but we have download stats for the different python-version-specific pre-built numpy binaries: http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/ http://pypi.python.org/pypi/numpy/1.6.1 http://sourceforge.net/projects/numpy/files/NumPy/1.6.2/ http://pypi.python.org/pypi/numpy/1.6.2 It looks like python 2.5 is several times more popular than 3.1, and its popularity is dropping quickly. (30% fewer downloads so far for 1.6.2 than 1.6.1, even though 1.6.2 has more downloads overall.) -n From ondrej.certik at gmail.com Fri Dec 21 15:36:44 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Fri, 21 Dec 2012 12:36:44 -0800 Subject: [Numpy-discussion] Travis-CI stopped supporting Python 3.1, but added 3.3 In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 12:05 PM, Nathaniel Smith wrote: > On Fri, Dec 21, 2012 at 1:23 AM, Ond?ej ?ert?k wrote: >> Hi, >> >> I noticed that the 3.1 tests are now failing. After clarification with >> the Travis guys: >> >> https://groups.google.com/d/topic/travis-ci/02iRu6kmwY8/discussion >> >> I've submitted a fix to our .travis.yml (and backported to 1.7): >> >> https://github.com/numpy/numpy/pull/2850 >> https://github.com/numpy/numpy/pull/2851 >> >> In case you were wondering. Do we need to support Python 3.1? >> We could in principle test 3.1 just like we test 2.4. I don't know if >> it is worth the pain. > > It probably isn't much pain, since Python is easy to compile, and I > don't think we've been running into many cases where supporting 3.1 > required workarounds yet? (As compared to 2.4, where we get > compatibility-breaking patches constantly.) Yes, I was going to suggest that it really is not a big deal to support 3.1, as far as I see it by my experience with Travis tests on numpy PRs in the last half a year. We can compile it like we do for 2.4, or we can even provide a prebuild binary and just install it in the Travis VM each time. So I'll try to provide a PR which implements testing of 3.1, unless someone beats me to it. Ondrej From chang at lambdafoundry.com Fri Dec 21 16:13:32 2012 From: chang at lambdafoundry.com (Chang She) Date: Fri, 21 Dec 2012 16:13:32 -0500 Subject: [Numpy-discussion] [pystatsmodels] Re: ANN: pandas 0.10.0 released In-Reply-To: <925e10ec-bfb9-42be-a4aa-e26d197a58a0@googlegroups.com> References: <925e10ec-bfb9-42be-a4aa-e26d197a58a0@googlegroups.com> Message-ID: On Dec 21, 2012, at 3:27 PM, Collin Sellman wrote: > Thanks, Wes and team. I've been looking through the new features, but haven't found any documentation on the integration with the Google Analytics API. I was just in the midst of trying to pull data into Pandas from GA in v.0.9.0, so would love to try what you built in .10. > > -Collin > > On Monday, December 17, 2012 10:19:49 AM UTC-7, Wes McKinney wrote: > hi all, > > I'm super excited to announce the pandas 0.10.0 release. 
This is > a major release including a new high performance file reading > engine with tons of new user-facing functionality as well, a > bunch of work on the HDF5/PyTables integration layer, > much-expanded Unicode support, a new option/configuration > interface, integration with the Google Analytics API, and a wide > array of other new features, bug fixes, and performance > improvements. I strongly recommend that all users get upgraded as > soon as feasible. Many performance improvements made are quite > substantial over 0.9.x, see vbenchmarks at the end of the e-mail. > > As of this release, we are no longer supporting Python 2.5. Also, > this is the first release to officially support Python 3.3. > > Note: there are a number of minor, but necessary API changes that > long-time pandas users should pay attention to in the What's New. > > Thanks to all who contributed to this release, especially Chang > She, Yoval P, and Jeff Reback (and everyone else listed in the > commit log!). > > As always source archives and Windows installers are on PyPI. > > What's new: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html > Installers: http://pypi.python.org/pypi/pandas > > $ git log v0.9.1..v0.10.0 --pretty=format:%aN | sort | uniq -c | sort -rn > 246 Wes McKinney > 140 y-p > 99 Chang She > 45 jreback > 18 Abraham Flaxman > 17 Jeff Reback > 14 locojaydev > 11 Keith Hughitt > 5 Adam Obeng > 2 Dieter Vandenbussche > 1 zach powers > 1 Luke Lee > 1 Laurent Gautier > 1 Ken Van Haren > 1 Jay Bourque > 1 Donald Curtis > 1 Chris Mulligan > 1 alex arsenovic > 1 A. Flaxman > > Happy data hacking! > > - Wes > > What is it > ========== > pandas is a Python package providing fast, flexible, and > expressive data structures designed to make working with > relational, time series, or any other kind of labeled data both > easy and intuitive. It aims to be the fundamental high-level > building block for doing practical, real world data analysis in > Python. > > Links > ===== > Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst > Documentation: http://pandas.pydata.org > Installers: http://pypi.python.org/pypi/pandas > Code Repository: http://github.com/pydata/pandas > Mailing List: http://groups.google.com/group/pydata > > Performance vs. 
v0.9.0 > ====================== > > Benchmarks from https://github.com/pydata/pandas/tree/master/vb_suite > Ratio < 1 means that v0.10.0 is faster > > v0.10.0 v0.9.0 ratio > name > unstack_sparse_keyspace 1.2813 144.1262 0.0089 > groupby_frame_apply_overhead 20.1520 337.3330 0.0597 > read_csv_comment2 25.3097 363.2860 0.0697 > groupbym_frame_apply 75.1554 504.1661 0.1491 > frame_iteritems_cached 0.0711 0.3919 0.1815 > read_csv_thou_vb 35.2690 191.9360 0.1838 > concat_small_frames 12.9019 55.3561 0.2331 > join_dataframe_integer_2key 5.8184 21.5823 0.2696 > series_value_counts_strings 5.3824 19.1262 0.2814 > append_frame_single_homogenous 0.3413 0.9319 0.3662 > read_csv_vb 18.4084 46.9500 0.3921 > read_csv_standard 12.0651 29.9940 0.4023 > panel_from_dict_all_different_indexes 73.6860 158.2949 0.4655 > frame_constructor_ndarray 0.0471 0.0958 0.4918 > groupby_first 3.8502 7.1988 0.5348 > groupby_last 3.6962 6.7792 0.5452 > panel_from_dict_two_different_indexes 50.7428 86.4980 0.5866 > append_frame_single_mixed 1.2950 2.1930 0.5905 > frame_get_numeric_data 0.0695 0.1119 0.6212 > replace_fillna 4.6349 7.0540 0.6571 > frame_to_csv 281.9340 427.7921 0.6590 > replace_replacena 4.7154 7.1207 0.6622 > frame_iteritems 2.5862 3.7463 0.6903 > series_align_int64_index 29.7370 41.2791 0.7204 > join_dataframe_integer_key 1.7980 2.4303 0.7398 > groupby_multi_size 31.0066 41.7001 0.7436 > groupby_frame_singlekey_integer 2.3579 3.1649 0.7450 > write_csv_standard 326.8259 427.3241 0.7648 > groupby_simple_compress_timing 41.2113 52.3993 0.7865 > frame_fillna_inplace 16.2843 20.0491 0.8122 > reindex_fillna_backfill 0.1364 0.1667 0.8181 > groupby_multi_series_op 15.2914 18.6651 0.8193 > groupby_multi_cython 17.2169 20.4420 0.8422 > frame_fillna_many_columns_pad 14.9510 17.5114 0.8538 > panel_from_dict_equiv_indexes 25.8427 29.9682 0.8623 > merge_2intkey_nosort 19.0755 22.1138 0.8626 > sparse_series_to_frame 167.8529 192.9920 0.8697 > reindex_fillna_pad 0.1410 0.1617 0.8720 > merge_2intkey_sort 44.7863 51.3315 0.8725 > reshape_stack_simple 2.6698 3.0502 0.8753 > groupby_indices 7.2264 8.2314 0.8779 > sort_level_one 4.3845 4.9902 0.8786 > sort_level_zero 4.3362 4.9198 0.8814 > write_store 16.0587 18.2042 0.8821 > frame_reindex_both_axes 0.3726 0.4183 0.8907 > groupby_multi_different_numpy_functions 13.4164 15.0509 0.8914 > index_int64_intersection 25.3705 28.1867 0.9001 > groupby_frame_median 7.7491 8.6011 0.9009 > frame_drop_dup_na_inplace 2.6290 2.9155 0.9017 > dataframe_reindex_columns 0.3052 0.3372 0.9049 > join_dataframe_index_multi 20.5651 22.6893 0.9064 > frame_ctor_list_of_dict 101.7439 112.2260 0.9066 > groupby_pivot_table 18.4551 20.3184 0.9083 > reindex_frame_level_align 0.9644 1.0531 0.9158 > stat_ops_level_series_sum_multiple 7.3637 8.0230 0.9178 > write_store_mixed 38.2528 41.6604 0.9182 > frame_reindex_both_axes_ix 0.4550 0.4950 0.9192 > stat_ops_level_frame_sum_multiple 8.1975 8.9055 0.9205 > panel_from_dict_same_index 25.7938 28.0147 0.9207 > groupby_series_simple_cython 5.1310 5.5624 0.9224 > frame_sort_index_by_columns 41.9577 45.1816 0.9286 > groupby_multi_python 54.9727 59.0400 0.9311 > datetimeindex_add_offset 0.2417 0.2584 0.9356 > frame_boolean_row_select 0.2905 0.3100 0.9373 > frame_reindex_axis1 2.9760 3.1742 0.9376 > stat_ops_level_series_sum 2.3382 2.4937 0.9376 > groupby_multi_different_functions 14.0333 14.9571 0.9382 > timeseries_timestamp_tzinfo_cons 0.0159 0.0169 0.9397 > stats_rolling_mean 1.6904 1.7959 0.9413 > melt_dataframe 1.5236 1.6181 0.9416 > timeseries_asof_single 0.0548 
0.0582 0.9416 > frame_ctor_nested_dict_int64 134.3100 142.6389 0.9416 > join_dataframe_index_single_key_bigger 15.6578 16.5949 0.9435 > stat_ops_level_frame_sum 3.2475 3.4414 0.9437 > indexing_dataframe_boolean_rows 0.2382 0.2518 0.9459 > timeseries_asof_nan 10.0433 10.6006 0.9474 > frame_reindex_axis0 1.4403 1.5184 0.9485 > concat_series_axis1 69.2988 72.8099 0.9518 > join_dataframe_index_single_key_small 6.8492 7.1847 0.9533 > dataframe_reindex_daterange 0.4054 0.4240 0.9562 > join_dataframe_index_single_key_bigger 6.4616 6.7578 0.9562 > timeseries_timestamp_downsample_mean 4.5849 4.7787 0.9594 > frame_fancy_lookup 2.5498 2.6544 0.9606 > series_value_counts_int64 2.5569 2.6581 0.9619 > frame_fancy_lookup_all 30.7510 31.8465 0.9656 > index_int64_union 82.2279 85.1500 0.9657 > indexing_dataframe_boolean_rows_object 0.4809 0.4977 0.9662 > frame_ctor_nested_dict 91.6129 94.8122 0.9663 > stat_ops_series_std 0.2450 0.2533 0.9673 > groupby_frame_cython_many_columns 3.7642 3.8894 0.9678 > timeseries_asof 10.4352 10.7721 0.9687 > series_ctor_from_dict 3.7707 3.8749 0.9731 > frame_drop_dup_inplace 3.0007 3.0746 0.9760 > timeseries_large_lookup_value 0.0242 0.0248 0.9764 > read_table_multiple_date_baseline 1201.2930 1224.3881 0.9811 > dti_reset_index 0.6339 0.6457 0.9817 > read_table_multiple_date 2600.7280 2647.8729 0.9822 > reindex_frame_level_reindex 0.9524 0.9674 0.9845 > reindex_multiindex 1.3483 1.3685 0.9853 > frame_insert_500_columns 102.1249 103.4329 0.9874 > frame_drop_duplicates 19.3780 19.6157 0.9879 > reindex_daterange_backfill 0.1870 0.1889 0.9899 > stats_rank2d_axis0_average 25.0480 25.2801 0.9908 > series_align_left_monotonic 13.1929 13.2558 0.9953 > timeseries_add_irregular 22.4635 22.5122 0.9978 > read_store_mixed 13.4398 13.4560 0.9988 > lib_fast_zip 11.1289 11.1354 0.9994 > match_strings 0.3831 0.3833 0.9995 > read_store 5.5526 5.5290 1.0043 > timeseries_sort_index 22.7172 22.5976 1.0053 > timeseries_1min_5min_mean 0.6224 0.6175 1.0079 > stats_rank2d_axis1_average 14.6569 14.5339 1.0085 > reindex_daterange_pad 0.1886 0.1867 1.0102 > timeseries_period_downsample_mean 6.4241 6.3480 1.0120 > frame_drop_duplicates_na 19.3303 19.0970 1.0122 > stats_rank_average_int 23.3569 22.9996 1.0155 > lib_fast_zip_fillna 14.1394 13.8473 1.0211 > index_datetime_intersection 17.2626 16.8986 1.0215 > timeseries_1min_5min_ohlc 0.7054 0.6891 1.0237 > stats_rank_average 31.3440 30.3845 1.0316 > timeseries_infer_freq 10.9854 10.6439 1.0321 > timeseries_slice_minutely 0.0637 0.0611 1.0418 > index_datetime_union 17.9083 17.1640 1.0434 > series_align_irregular_string 89.9470 85.1344 1.0565 > series_constructor_ndarray 0.0127 0.0119 1.0742 > indexing_panel_subset 0.5692 0.5214 1.0917 > groupby_apply_dict_return 46.3497 42.3220 1.0952 > reshape_unstack_simple 3.2901 2.9089 1.1310 > timeseries_to_datetime_iso8601 4.2305 3.6015 1.1746 > frame_to_string_floats 53.6217 37.2041 1.4413 > reshape_pivot_time_series 170.4340 107.9068 1.5795 > sparse_frame_constructor 6.2714 3.5053 1.7891 > datetimeindex_normalize 37.2718 6.9329 5.3761 > > Columns: test_name | target_duration [ms] | baseline_duration [ms] | ratio Hi Collin, I didn't add it to the official docs because of the authentication step complicating the doc build, but you can reference this brief blog post I wrote here: http://quantabee.wordpress.com/2012/12/17/google-analytics-pandas/ Best, Chang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Sat Dec 22 12:36:58 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 22 Dec 2012 09:36:58 -0800 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: References: Message-ID: Hi, On Thu, Dec 20, 2012 at 5:39 PM, Nathaniel Smith wrote: > On Thu, Dec 20, 2012 at 11:46 PM, Matthew Brett wrote: >> Travis - I think you are suggesting that there should be no one >> person in charge of numpy, and I think this is very unlikely to work >> well. Perhaps there are good examples of well-led projects where >> there is not a clear leader, but I can't think of any myself at the >> moment. My worry would be that, without a clear leader, it will be >> unclear how decisions are made, and that will make it very hard to >> take strategic decisions. > > Curious; my feeling is the opposite, that among mature and successful > FOSS projects, having a clear leader is the uncommon case. GCC > doesn't, Glibc not only has no leader but they recently decided to get > rid of their formal steering committee, I'm pretty sure git doesn't, > Apache certainly doesn't, Samba doesn't really, etc. As usual Karl > Fogel has sensible comments on this: > http://producingoss.com/en/consensus-democracy.html Ah yes - that is curious. My - er - speculation was based on: Numpy - Travis golden age in which we still bask Sympy - Ondrej, then Aaron - evolving into group decision making AFAICT IPython - Fernando, evolving into group decision making, AFAICT Cython - Robert Bradshaw - evolving into ... - you get the idea. and then reading about businesses particularly Good to Great, Built to Last, the disaster at HP when they didn't take care about succession. In general, that reading gave me the impression that successful organizations take enormous care about succession. I can't think of any case in the business literature I've read where a successful leader handed over to a group of three. > In practice the main job of a successful FOSS leader is to refuse to > make decisions, nudge people to work things out, and then if they > refuse to work things out tell them to go away until they do: > https://lwn.net/Articles/105375/ > and what actually gives people influence in a project is the respect > of the other members. The former stuff is stuff anyone can do, and the > latter isn't something you can confer or take away with a vote. Right. My impression is - I'm happy to be corrected with better information - that the leader of a to-be-successful organization is very good at encouraging a spirit of free and vigorous debate, strong opinion, and reasoned decisions - and that may be the main gift they give to the organization. At that point, usually under that leader's supervision, the decision making starts diffusing over the group, as they learn to discuss and make decisions together. As I was teaching my niece and nephew to say to their parents in the car - Daddy - are we there yet? If we are not already there, how are we going to get there? > Nor do we necessarily have a great track record for executive > decisions actually working things out. No, I agree, the right leader will help form the group well for making good group decisions. I think. In the mean-time - now that there is a change - could I ask - where do you three see Numpy going in the next five years? What do you see as the challenges to solve? What are the big risks? What are the big possibilities? 
Cheers, Matthew From matthew.brett at gmail.com Sat Dec 22 17:19:02 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 22 Dec 2012 14:19:02 -0800 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: <766F069B-9C36-4BC1-8A51-BB91155104C1@continuum.io> References: <766F069B-9C36-4BC1-8A51-BB91155104C1@continuum.io> Message-ID: Hi, On Fri, Dec 21, 2012 at 12:14 AM, Travis Oliphant wrote: > > On Dec 20, 2012, at 7:39 PM, Nathaniel Smith wrote: > >> On Thu, Dec 20, 2012 at 11:46 PM, Matthew Brett wrote: >>> Travis - I think you are suggesting that there should be no one >>> person in charge of numpy, and I think this is very unlikely to work >>> well. Perhaps there are good examples of well-led projects where >>> there is not a clear leader, but I can't think of any myself at the >>> moment. My worry would be that, without a clear leader, it will be >>> unclear how decisions are made, and that will make it very hard to >>> take strategic decisions. >> >> Curious; my feeling is the opposite, that among mature and successful >> FOSS projects, having a clear leader is the uncommon case. GCC >> doesn't, Glibc not only has no leader but they recently decided to get >> rid of their formal steering committee, I'm pretty sure git doesn't, >> Apache certainly doesn't, Samba doesn't really, etc. As usual Karl >> Fogel has sensible comments on this: >> http://producingoss.com/en/consensus-democracy.html >> >> In practice the main job of a successful FOSS leader is to refuse to >> make decisions, nudge people to work things out, and then if they >> refuse to work things out tell them to go away until they do: >> https://lwn.net/Articles/105375/ >> and what actually gives people influence in a project is the respect >> of the other members. The former stuff is stuff anyone can do, and the >> latter isn't something you can confer or take away with a vote. >> > > I will strongly voice my opinion that NumPy does not need an official single "leader". I am sorry, I have a feeling this question might be unwelcome - but I think it's reasonable to say that having three people in joint charge is an unusual choice. I suppose it has various risks and advantages. Would you mind saying a little bit about why you chose this option instead of the more common one of having a single lead? What problems do you think might arise? How can they be detected and avoided? Thanks a lot, Matthew From matthew.brett at gmail.com Sun Dec 23 00:40:43 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 23 Dec 2012 05:40:43 +0000 Subject: [Numpy-discussion] Many failing doctests - release blocker? Enable for default test runs? Message-ID: Hi, I noticed that enabling the doctests on the 1.7.x maintenance branch caused lots and lots of doctest failures. (np-devel)[mb312 at blair ~/dev_trees]$ python -c 'import numpy as np; np.test(doctests=True)' 1.7.0rc1.dev-1e8fcdf Running unit tests and doctests for numpy NumPy version 1.7.0rc1.dev-1e8fcdf NumPy is installed in /Users/mb312/.virtualenvs/np-devel/lib/python2.6/site-packages/numpy Python version 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) [GCC 4.0.1 (Apple Inc. build 5493)] nose version 1.1.2 ... Ran 3839 tests in 59.928s FAILED (KNOWNFAIL=4, SKIP=4, errors=23, failures=175) The doctests also throw up somewhere round 10 matplotlib plots, so presumably those would fail as well on a machine without a display without forcing the import of an 'Agg' backend or similar. 
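(As an aside, a minimal sketch of the workaround - untested here, and assuming matplotlib is importable - is just to select the non-interactive backend before anything pulls in pyplot, and then start the doctest run:

import matplotlib
matplotlib.use('Agg')   # must be called before pyplot is imported anywhere

import numpy as np
np.test(doctests=True)

That only avoids the display requirement of course; the other doctest failures reported above would still need fixing.)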
I have never checked the doctests on Python 3. Has anyone run those recently? For the projects I work on most, we enable doctests for the default test run - as in 'doctests=True' by default in the numpy testing machinery. Do ya'll see any disadvantage in doing that for numpy? In case someone gets to this before I do, we've also got some logic for doing conditional skips of doctests when optional packages are not available such as matplotlib, inspired by something similar in IPython: https://github.com/nipy/nipy/blob/master/nipy/testing/doctester.py#L193 If Christmas allows I'll send a pull request with something like that in the next few days. Cheers, Matthew From d.s.seljebotn at astro.uio.no Sun Dec 23 03:56:22 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 23 Dec 2012 09:56:22 +0100 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: References: Message-ID: <50D6C736.6020908@astro.uio.no> On 12/22/2012 06:36 PM, Matthew Brett wrote: > Hi, > > On Thu, Dec 20, 2012 at 5:39 PM, Nathaniel Smith wrote: >> On Thu, Dec 20, 2012 at 11:46 PM, Matthew Brett wrote: >>> Travis - I think you are suggesting that there should be no one >>> person in charge of numpy, and I think this is very unlikely to work >>> well. Perhaps there are good examples of well-led projects where >>> there is not a clear leader, but I can't think of any myself at the >>> moment. My worry would be that, without a clear leader, it will be >>> unclear how decisions are made, and that will make it very hard to >>> take strategic decisions. >> >> Curious; my feeling is the opposite, that among mature and successful >> FOSS projects, having a clear leader is the uncommon case. GCC >> doesn't, Glibc not only has no leader but they recently decided to get >> rid of their formal steering committee, I'm pretty sure git doesn't, >> Apache certainly doesn't, Samba doesn't really, etc. As usual Karl >> Fogel has sensible comments on this: >> http://producingoss.com/en/consensus-democracy.html > > Ah yes - that is curious. My - er - speculation was based on: > > Numpy - Travis golden age in which we still bask > Sympy - Ondrej, then Aaron - evolving into group decision making AFAICT > IPython - Fernando, evolving into group decision making, AFAICT > Cython - Robert Bradshaw - evolving into ... - you get the idea. I don't really want to prolong this thread, but I feel like I should correct a factual error. Cython started with Robert Bradshaw (and other Sage members) and Stefan Behnel exchanging patches on top of Pyrex; there was definitely no leader at that point. Then I came along; there was no leader at that point either (but I was aware that the two others had a longer track record of course). Robert Bradshaw was declared leader in order to break the tie when I and Stefan Behnel had argued for a 100-post long thread and could not reach a conclusion. And at least in this case, we were able to settle on a leadership structure then, when we needed it, and didn't regret not doing it earlier. Dag Sverre > > and then reading about businesses particularly Good to Great, Built to > Last, the disaster at HP when they didn't take care about succession. > In general, that reading gave me the impression that successful > organizations take enormous care about succession. I can't think of > any case in the business literature I've read where a successful > leader handed over to a group of three. 
> >> In practice the main job of a successful FOSS leader is to refuse to >> make decisions, nudge people to work things out, and then if they >> refuse to work things out tell them to go away until they do: >> https://lwn.net/Articles/105375/ >> and what actually gives people influence in a project is the respect >> of the other members. The former stuff is stuff anyone can do, and the >> latter isn't something you can confer or take away with a vote. > > Right. My impression is - I'm happy to be corrected with better > information - that the leader of a to-be-successful organization is > very good at encouraging a spirit of free and vigorous debate, strong > opinion, and reasoned decisions - and that may be the main gift they > give to the organization. At that point, usually under that leader's > supervision, the decision making starts diffusing over the group, as > they learn to discuss and make decisions together. > > As I was teaching my niece and nephew to say to their parents in the > car - Daddy - are we there yet? > > If we are not already there, how are we going to get there? > >> Nor do we necessarily have a great track record for executive >> decisions actually working things out. > > No, I agree, the right leader will help form the group well for making > good group decisions. I think. > > In the mean-time - now that there is a change - could I ask - where do > you three see Numpy going in the next five years? What do you see > as the challenges to solve? What are the big risks? What are the big > possibilities? > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Sun Dec 23 08:33:02 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 23 Dec 2012 13:33:02 +0000 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: <50D6C736.6020908@astro.uio.no> References: <50D6C736.6020908@astro.uio.no> Message-ID: Hi, On Sun, Dec 23, 2012 at 8:56 AM, Dag Sverre Seljebotn wrote: > On 12/22/2012 06:36 PM, Matthew Brett wrote: >> Hi, >> >> On Thu, Dec 20, 2012 at 5:39 PM, Nathaniel Smith wrote: >>> On Thu, Dec 20, 2012 at 11:46 PM, Matthew Brett wrote: >>>> Travis - I think you are suggesting that there should be no one >>>> person in charge of numpy, and I think this is very unlikely to work >>>> well. Perhaps there are good examples of well-led projects where >>>> there is not a clear leader, but I can't think of any myself at the >>>> moment. My worry would be that, without a clear leader, it will be >>>> unclear how decisions are made, and that will make it very hard to >>>> take strategic decisions. >>> >>> Curious; my feeling is the opposite, that among mature and successful >>> FOSS projects, having a clear leader is the uncommon case. GCC >>> doesn't, Glibc not only has no leader but they recently decided to get >>> rid of their formal steering committee, I'm pretty sure git doesn't, >>> Apache certainly doesn't, Samba doesn't really, etc. As usual Karl >>> Fogel has sensible comments on this: >>> http://producingoss.com/en/consensus-democracy.html >> >> Ah yes - that is curious. 
My - er - speculation was based on: >> >> Numpy - Travis golden age in which we still bask >> Sympy - Ondrej, then Aaron - evolving into group decision making AFAICT >> IPython - Fernando, evolving into group decision making, AFAICT >> Cython - Robert Bradshaw - evolving into ... - you get the idea. > > I don't really want to prolong this thread, but I feel like I should > correct a factual error. Cython started with Robert Bradshaw (and other > Sage members) and Stefan Behnel exchanging patches on top of Pyrex; > there was definitely no leader at that point. Then I came along; there > was no leader at that point either (but I was aware that the two others > had a longer track record of course). > > Robert Bradshaw was declared leader in order to break the tie when I and > Stefan Behnel had argued for a 100-post long thread and could not reach > a conclusion. And at least in this case, we were able to settle on a > leadership structure then, when we needed it, and didn't regret not > doing it earlier. Thanks for correcting the error - sorry for passing on my half-understood knowledge, better history is useful. I am sure not discussing stuff works for some groups, but I very much doubt that it will work for numpy. The masked array discussion was a particularly good example where the attempt to shut down the discussion led to a great deal of wasted time and effort and lots of bad feeling, when some good time spent to hammer out the issues would (probably) have been a much more efficient use of energy. I hope that this time, instead of trying to shut down the conversation as fast as possible, we can have a productive and reasoned discussion about what to do next, in order to make the best possible decision. See you, Matthew From ondrej.certik at gmail.com Sun Dec 23 13:54:49 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sun, 23 Dec 2012 10:54:49 -0800 Subject: [Numpy-discussion] Many failing doctests - release blocker? Enable for default test runs? In-Reply-To: References: Message-ID: Hi Matthew, On Sat, Dec 22, 2012 at 9:40 PM, Matthew Brett wrote: > Hi, > > I noticed that enabling the doctests on the 1.7.x maintenance branch > caused lots and lots of doctest failures. > > (np-devel)[mb312 at blair ~/dev_trees]$ python -c 'import numpy as np; > np.test(doctests=True)' > 1.7.0rc1.dev-1e8fcdf > Running unit tests and doctests for numpy > NumPy version 1.7.0rc1.dev-1e8fcdf > NumPy is installed in > /Users/mb312/.virtualenvs/np-devel/lib/python2.6/site-packages/numpy > Python version 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) [GCC 4.0.1 > (Apple Inc. build 5493)] > nose version 1.1.2 > ... > Ran 3839 tests in 59.928s > > FAILED (KNOWNFAIL=4, SKIP=4, errors=23, failures=175) > > The doctests also throw up somewhere round 10 matplotlib plots, so > presumably those would fail as well on a machine without a display > without forcing the import of an 'Agg' backend or similar. > > I have never checked the doctests on Python 3. Has anyone run those recently? > > For the projects I work on most, we enable doctests for the default > test run - as in 'doctests=True' by default in the numpy testing > machinery. Do ya'll see any disadvantage in doing that for numpy? 
> > In case someone gets to this before I do, we've also got some logic > for doing conditional skips of doctests when optional packages are not > available such as matplotlib, inspired by something similar in > IPython: > > https://github.com/nipy/nipy/blob/master/nipy/testing/doctester.py#L193 > > If Christmas allows I'll send a pull request with something like that > in the next few days. Thanks for pointing this out. I think in the long term, we should definitely run doctests as part of the test suite on Travis-CI. Because what use is a doctest if it doesn't work? Matthew, do you know if doctests fail for the 1.6 release as well? I am swamped with other bugs for the 1.7 release and since I assume they also fail for 1.6, I want to get the release out as soon as we fix our current issues. However, I think it's a good idea to run doctests automatically on Travis, once they are all fixed. Ondrej From njs at pobox.com Sun Dec 23 14:00:10 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 23 Dec 2012 19:00:10 +0000 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: References: Message-ID: On Sat, Dec 22, 2012 at 5:36 PM, Matthew Brett wrote: > Hi, > > On Thu, Dec 20, 2012 at 5:39 PM, Nathaniel Smith wrote: >> On Thu, Dec 20, 2012 at 11:46 PM, Matthew Brett wrote: >>> Travis - I think you are suggesting that there should be no one >>> person in charge of numpy, and I think this is very unlikely to work >>> well. Perhaps there are good examples of well-led projects where >>> there is not a clear leader, but I can't think of any myself at the >>> moment. My worry would be that, without a clear leader, it will be >>> unclear how decisions are made, and that will make it very hard to >>> take strategic decisions. >> >> Curious; my feeling is the opposite, that among mature and successful >> FOSS projects, having a clear leader is the uncommon case. GCC >> doesn't, Glibc not only has no leader but they recently decided to get >> rid of their formal steering committee, I'm pretty sure git doesn't, >> Apache certainly doesn't, Samba doesn't really, etc. As usual Karl >> Fogel has sensible comments on this: >> http://producingoss.com/en/consensus-democracy.html > > Ah yes - that is curious. My - er - speculation was based on: > > Numpy - Travis golden age in which we still bask > Sympy - Ondrej, then Aaron - evolving into group decision making AFAICT > IPython - Fernando, evolving into group decision making, AFAICT > Cython - Robert Bradshaw - evolving into ... - you get the idea. > > and then reading about businesses particularly Good to Great, Built to > Last, the disaster at HP when they didn't take care about succession. > In general, that reading gave me the impression that successful > organizations take enormous care about succession. I can't think of > any case in the business literature I've read where a successful > leader handed over to a group of three. I think this is just a case of different organizational styles working differently. If organizations were optimisation algorithms, good businesses would be Newton's method, and good FOSS projects would be simulated annealing, or maybe GAs. Slower and less focused, but more robust against noise and local minima, and less susceptible to perturbations. They depend much less on the focused attention of visionary leaders. (Also I wouldn't consider numpy to have a formal "group of three leaders" now just because Travis mentioned three names in his email. 
Leadership is something people do, not something people are, so it's a fuzzy category in the first place.) >> In practice the main job of a successful FOSS leader is to refuse to >> make decisions, nudge people to work things out, and then if they >> refuse to work things out tell them to go away until they do: >> https://lwn.net/Articles/105375/ >> and what actually gives people influence in a project is the respect >> of the other members. The former stuff is stuff anyone can do, and the >> latter isn't something you can confer or take away with a vote. > > Right. My impression is - I'm happy to be corrected with better > information - that the leader of a to-be-successful organization is > very good at encouraging a spirit of free and vigorous debate, strong > opinion, and reasoned decisions - and that may be the main gift they > give to the organization. At that point, usually under that leader's > supervision, the decision making starts diffusing over the group, as > they learn to discuss and make decisions together. > > As I was teaching my niece and nephew to say to their parents in the > car - Daddy - are we there yet? > > If we are not already there, how are we going to get there? > >> Nor do we necessarily have a great track record for executive >> decisions actually working things out. > > No, I agree, the right leader will help form the group well for making > good group decisions. I think. > > In the mean-time - now that there is a change - could I ask - where do > you three see Numpy going in the next five years? What do you see > as the challenges to solve? What are the big risks? What are the big > possibilities? Personally I'd like to see NA support and sparse ndarrays in numpy proper, but I'm not going to have the time to write them myself in the forseeable future... In the long run of course everyone wants a version of numpy+python that can do automatic loop fusion (since that's the core feature for achieving throughput on modern CPUs) without giving up the ability to interface with C code and CPython compatibility. In my dreams the PyPy people will get their act together WRT interfacing with C code, the Cython people will take advantage of this to write a Cython-to-RPython compiler that lets the PyPy optimizer see the internals of Cython-written code, and then we port numpy to Cython and get a single compatible code-base that can run fast on both CPython and PyPy. But who knows what will actually make sense, if anything; as they say, it's very hard to make predictions, especially about the future. And of course the actual long-term strategic plan is "review PRs, merge the good ones". -n From aronne.merrelli at gmail.com Sun Dec 23 14:48:31 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Sun, 23 Dec 2012 13:48:31 -0600 Subject: [Numpy-discussion] help with f2py Message-ID: Hi, I'm trying to run f2py and running into some trouble. Starting from http://www.scipy.org/Cookbook/F2Py, and the very simple 'Wrapping Hermite Polynomial' example, I can get the pyf file created with no issues. 
The system I am using is RedHat linux, and has several Fortran compilers: $ f2py -c --help-fcompiler Fortran compilers found: --fcompiler=g95 G95 Fortran Compiler (0.92) --fcompiler=gnu GNU Fortran 77 compiler (3.4.6) --fcompiler=gnu95 GNU Fortran 95 compiler (4.1.2) --fcompiler=intelem Intel Fortran Compiler for 64-bit apps (11.1) All of these will successfully create the .so file except for g95, but when I try to import into python I get this ImportError for any of the other three compilers: In [5]: import hermite ImportError: ./hermite.so: undefined symbol: c06ebf_ If I look at the shared object I find that symbol here: $ nm hermite.so 00000000000043c0 T array_from_pyobj U c06eaf_ U c06ebf_ And that about hits my limit of compiler knowledge, as I am pretty much a novice with these things. Any ideas on what is going wrong here, or suggestions of things to try? Thanks, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Dec 23 15:53:21 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 23 Dec 2012 21:53:21 +0100 Subject: [Numpy-discussion] Many failing doctests - release blocker? Enable for default test runs? In-Reply-To: References: Message-ID: On Sun, Dec 23, 2012 at 7:54 PM, Ond?ej ?ert?k wrote: > Hi Matthew, > > On Sat, Dec 22, 2012 at 9:40 PM, Matthew Brett > wrote: > > Hi, > > > > I noticed that enabling the doctests on the 1.7.x maintenance branch > > caused lots and lots of doctest failures. > > > > (np-devel)[mb312 at blair ~/dev_trees]$ python -c 'import numpy as np; > > np.test(doctests=True)' > > 1.7.0rc1.dev-1e8fcdf > > Running unit tests and doctests for numpy > > NumPy version 1.7.0rc1.dev-1e8fcdf > > NumPy is installed in > > /Users/mb312/.virtualenvs/np-devel/lib/python2.6/site-packages/numpy > > Python version 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) [GCC 4.0.1 > > (Apple Inc. build 5493)] > > nose version 1.1.2 > > ... > > Ran 3839 tests in 59.928s > > > > FAILED (KNOWNFAIL=4, SKIP=4, errors=23, failures=175) > > > > The doctests also throw up somewhere round 10 matplotlib plots, so > > presumably those would fail as well on a machine without a display > > without forcing the import of an 'Agg' backend or similar. > > > > I have never checked the doctests on Python 3. Has anyone run those > recently? > > > > For the projects I work on most, we enable doctests for the default > > test run - as in 'doctests=True' by default in the numpy testing > > machinery. Do ya'll see any disadvantage in doing that for numpy? > Yes, I do. The doctest framework and reproducibility of reprs across Python versions and platforms are too poor to do this. And failing tests give new users a bad impression of the quality of numpy. I'm +1 on enabling doctests on Travis for one Python version (2.7 probably) in order to reduce the number of out-of-date examples, -1 on default doctests=True. > > > In case someone gets to this before I do, we've also got some logic > > for doing conditional skips of doctests when optional packages are not > > available such as matplotlib, inspired by something similar in > > IPython: > > > > https://github.com/nipy/nipy/blob/master/nipy/testing/doctester.py#L193 > > > > If Christmas allows I'll send a pull request with something like that > > in the next few days. > > Thanks for pointing this out. I think in the long term, we should > definitely > run doctests as part of the test suite on Travis-CI. Because what use > is a doctest if it doesn't work? 
> Since a "doctest" is an example and not a test, still quite useful. > Matthew, do you know if doctests fail for the 1.6 release as well? > > I am swamped with other bugs for the 1.7 release and since I assume > they also fail for 1.6, I want to get the release out as soon as we fix our > current issues. > Agreed that this shouldn't be a release blocker. Ralf > > However, I think it's a good idea to run doctests automatically on Travis, > once they are all fixed. > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Sun Dec 23 16:11:16 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sun, 23 Dec 2012 13:11:16 -0800 Subject: [Numpy-discussion] help with f2py In-Reply-To: References: Message-ID: On Sun, Dec 23, 2012 at 11:48 AM, Aronne Merrelli wrote: > Hi, > > I'm trying to run f2py and running into some trouble. Starting from > http://www.scipy.org/Cookbook/F2Py, and the very simple 'Wrapping Hermite > Polynomial' example, I can get the pyf file created with no issues. The > system I am using is RedHat linux, and has several Fortran compilers: > > $ f2py -c --help-fcompiler > > Fortran compilers found: > --fcompiler=g95 G95 Fortran Compiler (0.92) > --fcompiler=gnu GNU Fortran 77 compiler (3.4.6) > --fcompiler=gnu95 GNU Fortran 95 compiler (4.1.2) > --fcompiler=intelem Intel Fortran Compiler for 64-bit apps (11.1) > > All of these will successfully create the .so file except for g95, but when > I try to import into python I get this ImportError for any of the other > three compilers: > > In [5]: import hermite > ImportError: ./hermite.so: undefined symbol: c06ebf_ > > If I look at the shared object I find that symbol here: > > $ nm hermite.so > > 00000000000043c0 T array_from_pyobj > U c06eaf_ > U c06ebf_ This "U" here means "undefined" (see "man nm"). I don't know if this symbol is something that f2py introduces, or if it is present in the original Fortran sources. One way to know for sure is to examine all .f90 and .c files generated by f2py and search for this symbol and make sure that these subroutines/functions are linked in. > And that about hits my limit of compiler knowledge, as I am pretty much a > novice with these things. Any ideas on what is going wrong here, or > suggestions of things to try? I personally wrap Fortran like any other C code using the iso_c_binding Fortran module. I use Cython, but you can also use ctypes or any other method. See here: http://fortran90.org/src/best-practices.html#interfacing-with-c http://fortran90.org/src/best-practices.html#interfacing-with-python That way it is easy to see what is going on under the hood. Ondrej From matthew.brett at gmail.com Mon Dec 24 03:15:42 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 24 Dec 2012 08:15:42 +0000 Subject: [Numpy-discussion] Many failing doctests - release blocker? Enable for default test runs? In-Reply-To: References: Message-ID: Hi, On Sun, Dec 23, 2012 at 8:53 PM, Ralf Gommers wrote: > > > > On Sun, Dec 23, 2012 at 7:54 PM, Ond?ej ?ert?k > wrote: >> >> Hi Matthew, >> >> On Sat, Dec 22, 2012 at 9:40 PM, Matthew Brett >> wrote: >> > Hi, >> > >> > I noticed that enabling the doctests on the 1.7.x maintenance branch >> > caused lots and lots of doctest failures. 
>> > >> > (np-devel)[mb312 at blair ~/dev_trees]$ python -c 'import numpy as np; >> > np.test(doctests=True)' >> > 1.7.0rc1.dev-1e8fcdf >> > Running unit tests and doctests for numpy >> > NumPy version 1.7.0rc1.dev-1e8fcdf >> > NumPy is installed in >> > /Users/mb312/.virtualenvs/np-devel/lib/python2.6/site-packages/numpy >> > Python version 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) [GCC 4.0.1 >> > (Apple Inc. build 5493)] >> > nose version 1.1.2 >> > ... >> > Ran 3839 tests in 59.928s >> > >> > FAILED (KNOWNFAIL=4, SKIP=4, errors=23, failures=175) >> > >> > The doctests also throw up somewhere round 10 matplotlib plots, so >> > presumably those would fail as well on a machine without a display >> > without forcing the import of an 'Agg' backend or similar. >> > >> > I have never checked the doctests on Python 3. Has anyone run those >> > recently? >> > >> > For the projects I work on most, we enable doctests for the default >> > test run - as in 'doctests=True' by default in the numpy testing >> > machinery. Do ya'll see any disadvantage in doing that for numpy? > > > Yes, I do. The doctest framework and reproducibility of reprs across Python > versions and platforms are too poor to do this. And failing tests give new > users a bad impression of the quality of numpy. I believe the repr problems are fairly easily soluble by using minor extensions to the current numpy doctest machinery. I think I was the last person to do big modifications to that bit of the numpy codebase and I've been using small tweaks to that framework to run cross version and cross platform doctest runs by default for a while on lots of numpy stuff in nipy: https://github.com/nipy/nipy/blob/master/nipy/testing/doctester.py#L155 Cheers, Matthew From ralf.gommers at gmail.com Mon Dec 24 04:05:24 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 24 Dec 2012 10:05:24 +0100 Subject: [Numpy-discussion] Many failing doctests - release blocker? Enable for default test runs? In-Reply-To: References: Message-ID: On Mon, Dec 24, 2012 at 9:15 AM, Matthew Brett wrote: > Hi, > > On Sun, Dec 23, 2012 at 8:53 PM, Ralf Gommers > wrote: > > > > > > > > On Sun, Dec 23, 2012 at 7:54 PM, Ond?ej ?ert?k > > wrote: > >> > >> Hi Matthew, > >> > >> On Sat, Dec 22, 2012 at 9:40 PM, Matthew Brett > > >> wrote: > >> > Hi, > >> > > >> > I noticed that enabling the doctests on the 1.7.x maintenance branch > >> > caused lots and lots of doctest failures. > >> > > >> > (np-devel)[mb312 at blair ~/dev_trees]$ python -c 'import numpy as np; > >> > np.test(doctests=True)' > >> > 1.7.0rc1.dev-1e8fcdf > >> > Running unit tests and doctests for numpy > >> > NumPy version 1.7.0rc1.dev-1e8fcdf > >> > NumPy is installed in > >> > /Users/mb312/.virtualenvs/np-devel/lib/python2.6/site-packages/numpy > >> > Python version 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) [GCC 4.0.1 > >> > (Apple Inc. build 5493)] > >> > nose version 1.1.2 > >> > ... > >> > Ran 3839 tests in 59.928s > >> > > >> > FAILED (KNOWNFAIL=4, SKIP=4, errors=23, failures=175) > >> > > >> > The doctests also throw up somewhere round 10 matplotlib plots, so > >> > presumably those would fail as well on a machine without a display > >> > without forcing the import of an 'Agg' backend or similar. > >> > > >> > I have never checked the doctests on Python 3. Has anyone run those > >> > recently? > >> > > >> > For the projects I work on most, we enable doctests for the default > >> > test run - as in 'doctests=True' by default in the numpy testing > >> > machinery. 
Do ya'll see any disadvantage in doing that for numpy? > > > > > > Yes, I do. The doctest framework and reproducibility of reprs across > Python > > versions and platforms are too poor to do this. And failing tests give > new > > users a bad impression of the quality of numpy. > > I believe the repr problems are fairly easily soluble by using minor > extensions to the current numpy doctest machinery. I think I was the > last person to do big modifications to that bit of the numpy codebase > and I've been using small tweaks to that framework to run cross > version and cross platform doctest runs by default for a while on lots > of numpy stuff in nipy: > > https://github.com/nipy/nipy/blob/master/nipy/testing/doctester.py#L155 > My experience is different, but I'm happy to be proven wrong. Let's first see it running on all Python versions on Travis without issues for a while, then consider turning it on by default. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Dec 24 04:24:38 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 24 Dec 2012 09:24:38 +0000 Subject: [Numpy-discussion] Future of numpy (was: DARPA funding for Blaze and passing the NumPy torch) In-Reply-To: References: Message-ID: Hi, On Sun, Dec 23, 2012 at 7:00 PM, Nathaniel Smith wrote: > On Sat, Dec 22, 2012 at 5:36 PM, Matthew Brett wrote: >> Hi, >> >> On Thu, Dec 20, 2012 at 5:39 PM, Nathaniel Smith wrote: >>> On Thu, Dec 20, 2012 at 11:46 PM, Matthew Brett wrote: >>>> Travis - I think you are suggesting that there should be no one >>>> person in charge of numpy, and I think this is very unlikely to work >>>> well. Perhaps there are good examples of well-led projects where >>>> there is not a clear leader, but I can't think of any myself at the >>>> moment. My worry would be that, without a clear leader, it will be >>>> unclear how decisions are made, and that will make it very hard to >>>> take strategic decisions. >>> >>> Curious; my feeling is the opposite, that among mature and successful >>> FOSS projects, having a clear leader is the uncommon case. GCC >>> doesn't, Glibc not only has no leader but they recently decided to get >>> rid of their formal steering committee, I'm pretty sure git doesn't, >>> Apache certainly doesn't, Samba doesn't really, etc. As usual Karl >>> Fogel has sensible comments on this: >>> http://producingoss.com/en/consensus-democracy.html >> >> Ah yes - that is curious. My - er - speculation was based on: >> >> Numpy - Travis golden age in which we still bask >> Sympy - Ondrej, then Aaron - evolving into group decision making AFAICT >> IPython - Fernando, evolving into group decision making, AFAICT >> Cython - Robert Bradshaw - evolving into ... - you get the idea. >> >> and then reading about businesses particularly Good to Great, Built to >> Last, the disaster at HP when they didn't take care about succession. >> In general, that reading gave me the impression that successful >> organizations take enormous care about succession. I can't think of >> any case in the business literature I've read where a successful >> leader handed over to a group of three. > > I think this is just a case of different organizational styles working > differently. If organizations were optimisation algorithms, good > businesses would be Newton's method, and good FOSS projects would be > simulated annealing, or maybe GAs. 
Slower and less focused, but more > robust against noise and local minima, and less susceptible to > perturbations. They depend much less on the focused attention of > visionary leaders. You seem to be implying that any management organization for numpy would have the same effect as any other as far as we know, and if that were true, it would certainly not be worth discussing in any detail. But I doubt very much that is true, and, following the optimization strategy logic, there may be a reason that having 3 people lead an organization has not been a common and visible option, and that is that it doesn't work very well. > (Also I wouldn't consider numpy to have a formal "group of three > leaders" now just because Travis mentioned three names in his email. > Leadership is something people do, not something people are, so it's a > fuzzy category in the first place.) In Travis' email I saw the three of you and a 2 to 1 voting suggestion. There doesn't seem to be much appetite for discussing alternatives and neither Chuck nor Ralf have joined this discussion, so that seems to be the only option on the table. Is there another one? Or are you thinking that something will gradually evolve from - essentially - no prescription of how things work. Again, I doubt very much that "no prescription" has been a successful option in the past, even for FOSS. It just never happens in businesses or successful governments - does it? The problem is obviously going to be what to do when we get problems, as we have in the past. When there is no or little structure to use, then decisions don't get made, or get made by default, and the debate deteriorates because there is no-one to moderate it. That seems to me the situation numpy is in, and doing nothing or having poorly defined management can only prolong that or even make it worse. >>> In practice the main job of a successful FOSS leader is to refuse to >>> make decisions, nudge people to work things out, and then if they >>> refuse to work things out tell them to go away until they do: >>> https://lwn.net/Articles/105375/ >>> and what actually gives people influence in a project is the respect >>> of the other members. The former stuff is stuff anyone can do, and the >>> latter isn't something you can confer or take away with a vote. >> >> Right. My impression is - I'm happy to be corrected with better >> information - that the leader of a to-be-successful organization is >> very good at encouraging a spirit of free and vigorous debate, strong >> opinion, and reasoned decisions - and that may be the main gift they >> give to the organization. At that point, usually under that leader's >> supervision, the decision making starts diffusing over the group, as >> they learn to discuss and make decisions together. >> >> As I was teaching my niece and nephew to say to their parents in the >> car - Daddy - are we there yet? >> >> If we are not already there, how are we going to get there? For example - you didn't answer this one. What are your thoughts? >>> Nor do we necessarily have a great track record for executive >>> decisions actually working things out. >> >> No, I agree, the right leader will help form the group well for making >> good group decisions. I think. >> >> In the mean-time - now that there is a change - could I ask - where do >> you three see Numpy going in the next five years? What do you see >> as the challenges to solve? What are the big risks? What are the big >> possibilities? 
> > Personally I'd like to see NA support and sparse ndarrays in numpy > proper, but I'm not going to have the time to write them myself in the > forseeable future... > > In the long run of course everyone wants a version of numpy+python > that can do automatic loop fusion (since that's the core feature for > achieving throughput on modern CPUs) without giving up the ability to > interface with C code and CPython compatibility. In my dreams the PyPy > people will get their act together WRT interfacing with C code, the > Cython people will take advantage of this to write a Cython-to-RPython > compiler that lets the PyPy optimizer see the internals of > Cython-written code, and then we port numpy to Cython and get a single > compatible code-base that can run fast on both CPython and PyPy. But > who knows what will actually make sense, if anything; as they say, > it's very hard to make predictions, especially about the future. > > And of course the actual long-term strategic plan is "review PRs, > merge the good ones". The contrast between these last two paragraphs is strong. The first is a vision for how numpy might be - to coin a phrase - "the next generation of numpy". It seems exciting and interesting, but you don't hold out much hope of getting there, and having not-much-management makes it less likely we will have any real new direction to the project. That's your second paragraph - keep on keeping on. In effect it condemns numpy to be a slow-moving development effort waiting for something more interesting to overtake it. Is that really necessary? Can we not hope for better? Is there any real chance of attracting a force of new developers in that situation? Best, Matthew From matthew.brett at gmail.com Mon Dec 24 13:08:03 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 24 Dec 2012 18:08:03 +0000 Subject: [Numpy-discussion] Many failing doctests - release blocker? Enable for default test runs? In-Reply-To: References: Message-ID: Hi, On Sun, Dec 23, 2012 at 6:54 PM, Ond?ej ?ert?k wrote: > Hi Matthew, > > On Sat, Dec 22, 2012 at 9:40 PM, Matthew Brett wrote: >> Hi, >> >> I noticed that enabling the doctests on the 1.7.x maintenance branch >> caused lots and lots of doctest failures. >> >> (np-devel)[mb312 at blair ~/dev_trees]$ python -c 'import numpy as np; >> np.test(doctests=True)' >> 1.7.0rc1.dev-1e8fcdf >> Running unit tests and doctests for numpy >> NumPy version 1.7.0rc1.dev-1e8fcdf >> NumPy is installed in >> /Users/mb312/.virtualenvs/np-devel/lib/python2.6/site-packages/numpy >> Python version 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) [GCC 4.0.1 >> (Apple Inc. build 5493)] >> nose version 1.1.2 >> ... >> Ran 3839 tests in 59.928s >> >> FAILED (KNOWNFAIL=4, SKIP=4, errors=23, failures=175) >> >> The doctests also throw up somewhere round 10 matplotlib plots, so >> presumably those would fail as well on a machine without a display >> without forcing the import of an 'Agg' backend or similar. >> >> I have never checked the doctests on Python 3. Has anyone run those recently? >> >> For the projects I work on most, we enable doctests for the default >> test run - as in 'doctests=True' by default in the numpy testing >> machinery. Do ya'll see any disadvantage in doing that for numpy? 
>> >> In case someone gets to this before I do, we've also got some logic >> for doing conditional skips of doctests when optional packages are not >> available such as matplotlib, inspired by something similar in >> IPython: >> >> https://github.com/nipy/nipy/blob/master/nipy/testing/doctester.py#L193 >> >> If Christmas allows I'll send a pull request with something like that >> in the next few days. > > Thanks for pointing this out. I think in the long term, we should definitely > run doctests as part of the test suite on Travis-CI. Because what use > is a doctest if it doesn't work? > > Matthew, do you know if doctests fail for the 1.6 release as well? On 1.6.2: FAILED (KNOWNFAIL=5, SKIP=3, errors=43, failures=167) On Python 3.2, current 1.7.x maintenance: FAILED (KNOWNFAIL=5, SKIP=4, errors=24, failures=211) The last time I looked I had the impression that we were not doing 2to3 conversion on doctests, but that was a while ago. See you, Matthew From eric.emsellem at eso.org Wed Dec 26 04:09:42 2012 From: eric.emsellem at eso.org (Eric Emsellem) Date: Wed, 26 Dec 2012 10:09:42 +0100 Subject: [Numpy-discussion] Efficient way of binning points and applying functions to these groups Message-ID: <50DABED6.6020706@eso.org> Hi! I am looking for an efficient way of doing some simple binning of points and then applying some functions to points within each bin. I have tried several ways, including crude looping over the indices, or using digitize (see below) but I cannot manage to get it as efficient as I need it to be. I have a comparison with a similar (although complex) code in idl, and thought I would ask the forum. In idl there is a way to "invert" an histogram and get a reverse set of indices (via the histogram function) which seems to do the trick (or maybe it is faster for another reason). Below I provide a (dummy) example of what I wish to achieve. Any hint on how to do this EFFICIENTLY using numpy is most welcome. I need to speed things up quite a bit (at the moment, the version I have, see below, is 10 times slower than the more complex idl routine..., I must be doing something wrong!). thanks!! Eric ======================================================== # I have a random set of data points in 2D with coordinates x,y : import numpy as np x = np.random.random(1000) y = np.random.random(1000) # I have now a 2D grid given by let's say 10x10 grid points: nx = 11 ny = 21 lx = linspace(0,1,nx) ly = linspace(0,1,ny) gx, gy = np.meshgrid(lx, ly) # So my set of 2D bins are (not needed in the solution I present but just for clarity) bins = np.dstack((gx.ravel(), gy.ravel()))[0] # Now I want to have the list of points in each bin and # if the number of points in that bin is larger than 10, apply (dummy) function func1 (see below) # If less than 10, apply (dummy) function func2 so (dum?) 
# if 0, do nothing # for two dummy functions like (for example): def func1(x) : return x.mean() def func2(x) : return x.std() # One solution would be to use digitize in 1D and histogram in 2D (don't need gx, gy for this one): h = histogram2d(x, y, bins=[lx, ly])[0] digitX = np.digitize(x, lx) digitY = np.digitize(y, ly) # create the output array, with -999 values to make sure I see which ones are not filled in result = np.zeros_like(h) - 999 for i in range(nx-1) : for j in range(ny-1) : selectionofpoints = (digitX == i+1) & (digitY == j+1) if h[i,j] > 10 : result[i,j] = func1(x[selectionofpoints]) elif h[i,j] > 0 : result[i,j] = func2(x[selectionofpoints]) From davidmenhur at gmail.com Wed Dec 26 05:21:10 2012 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Wed, 26 Dec 2012 11:21:10 +0100 Subject: [Numpy-discussion] Efficient way of binning points and applying functions to these groups In-Reply-To: <50DABED6.6020706@eso.org> References: <50DABED6.6020706@eso.org> Message-ID: This looks like the perfect work for cython. It it's great opp optimizing loops. Another option is the new Numba, an automatic compiler. David. El 26/12/2012 10:09, "Eric Emsellem" escribi?: > Hi! > > I am looking for an efficient way of doing some simple binning of points > and then applying some functions to points within each bin. > > I have tried several ways, including crude looping over the indices, or > using digitize (see below) but I cannot manage to get it as efficient as > I need it to be. I have a comparison with a similar (although complex) > code in idl, and thought I would ask the forum. In idl there is a way to > "invert" an histogram and get a reverse set of indices (via the > histogram function) which seems to do the trick (or maybe it is faster > for another reason). > > Below I provide a (dummy) example of what I wish to achieve. Any hint on > how to do this EFFICIENTLY using numpy is most welcome. I need to speed > things up quite a bit (at the moment, the version I have, see below, is > 10 times slower than the more complex idl routine..., I must be doing > something wrong!). > > thanks!! > Eric > ======================================================== > # I have a random set of data points in 2D with coordinates x,y : > import numpy as np > x = np.random.random(1000) > y = np.random.random(1000) > > # I have now a 2D grid given by let's say 10x10 grid points: > nx = 11 > ny = 21 > lx = linspace(0,1,nx) > ly = linspace(0,1,ny) > gx, gy = np.meshgrid(lx, ly) > > # So my set of 2D bins are (not needed in the solution I present but > just for clarity) > bins = np.dstack((gx.ravel(), gy.ravel()))[0] > > # Now I want to have the list of points in each bin and > # if the number of points in that bin is larger than 10, apply (dummy) > function func1 (see below) > # If less than 10, apply (dummy) function func2 so (dum?) 
> # if 0, do nothing > # for two dummy functions like (for example): > def func1(x) : return x.mean() > > def func2(x) : return x.std() > > # One solution would be to use digitize in 1D and histogram in 2D (don't > need gx, gy for this one): > > h = histogram2d(x, y, bins=[lx, ly])[0] > > digitX = np.digitize(x, lx) > digitY = np.digitize(y, ly) > > # create the output array, with -999 values to make sure I see which > ones are not filled in > result = np.zeros_like(h) - 999 > > for i in range(nx-1) : > for j in range(ny-1) : > selectionofpoints = (digitX == i+1) & (digitY == j+1) > if h[i,j] > 10 : result[i,j] = func1(x[selectionofpoints]) > elif h[i,j] > 0 : result[i,j] = func2(x[selectionofpoints]) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Dec 26 07:33:58 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 26 Dec 2012 13:33:58 +0100 Subject: [Numpy-discussion] Efficient way of binning points and applying functions to these groups In-Reply-To: <50DABED6.6020706@eso.org> References: <50DABED6.6020706@eso.org> Message-ID: On Wed, Dec 26, 2012 at 10:09 AM, Eric Emsellem wrote: > Hi! > > I am looking for an efficient way of doing some simple binning of points > and then applying some functions to points within each bin. > That's exactly what scipy.stats.binned_statistic does: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.stats.binned_statistic.html binned_statistic uses np.digitize as well, but I'm not sure that in your code below digitize is the bottleneck - the nested for-loop looks like the more likely suspect. Ralf > > I have tried several ways, including crude looping over the indices, or > using digitize (see below) but I cannot manage to get it as efficient as > I need it to be. I have a comparison with a similar (although complex) > code in idl, and thought I would ask the forum. In idl there is a way to > "invert" an histogram and get a reverse set of indices (via the > histogram function) which seems to do the trick (or maybe it is faster > for another reason). > > Below I provide a (dummy) example of what I wish to achieve. Any hint on > how to do this EFFICIENTLY using numpy is most welcome. I need to speed > things up quite a bit (at the moment, the version I have, see below, is > 10 times slower than the more complex idl routine..., I must be doing > something wrong!). > > thanks!! > Eric > ======================================================== > # I have a random set of data points in 2D with coordinates x,y : > import numpy as np > x = np.random.random(1000) > y = np.random.random(1000) > > # I have now a 2D grid given by let's say 10x10 grid points: > nx = 11 > ny = 21 > lx = linspace(0,1,nx) > ly = linspace(0,1,ny) > gx, gy = np.meshgrid(lx, ly) > > # So my set of 2D bins are (not needed in the solution I present but > just for clarity) > bins = np.dstack((gx.ravel(), gy.ravel()))[0] > > # Now I want to have the list of points in each bin and > # if the number of points in that bin is larger than 10, apply (dummy) > function func1 (see below) > # If less than 10, apply (dummy) function func2 so (dum?) 
> # if 0, do nothing > # for two dummy functions like (for example): > def func1(x) : return x.mean() > > def func2(x) : return x.std() > > # One solution would be to use digitize in 1D and histogram in 2D (don't > need gx, gy for this one): > > h = histogram2d(x, y, bins=[lx, ly])[0] > > digitX = np.digitize(x, lx) > digitY = np.digitize(y, ly) > > # create the output array, with -999 values to make sure I see which > ones are not filled in > result = np.zeros_like(h) - 999 > > for i in range(nx-1) : > for j in range(ny-1) : > selectionofpoints = (digitX == i+1) & (digitY == j+1) > if h[i,j] > 10 : result[i,j] = func1(x[selectionofpoints]) > elif h[i,j] > 0 : result[i,j] = func2(x[selectionofpoints]) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nicolas.Rougier at inria.fr Wed Dec 26 15:09:23 2012 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Wed, 26 Dec 2012 21:09:23 +0100 Subject: [Numpy-discussion] dtype "reduction" Message-ID: <88940E94-C5A2-4CBB-8D44-554193B9CF05@inria.fr> Hi all, I'm looking for a way to "reduce" dtype1 into dtype2 (when it is possible of course). Is there some easy way to do that by any chance ? dtype1 = np.dtype( [ ('vertex', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('normal', [('x', 'f4'), ('y', 'f4'), ('z', 'f4')]), ('color', [('r', 'f4'), ('g', 'f4'), ('b', 'f4'), ('a', 'f4')]) ] ) dtype2 = np.dtype( [ ('vertex', 'f4', 3), ('normal', 'f4', 3), ('color', 'f4', 4)] ) Nicolas From chaoyuejoy at gmail.com Wed Dec 26 18:23:13 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 27 Dec 2012 00:23:13 +0100 Subject: [Numpy-discussion] numpy.testing.asserts and masked array Message-ID: Dear all, I found here http://mail.scipy.org/pipermail/numpy-discussion/2009-January/039681.html that to use* numpy.ma.testutils.assert_almost_equal* for masked array assertion, but I cannot find the np.ma.testutils module? Am I getting somewhere wrong? my numpy version is 1.6.2 thanks! Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Dec 26 19:32:37 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 27 Dec 2012 00:32:37 +0000 Subject: [Numpy-discussion] dtype "reduction" In-Reply-To: <88940E94-C5A2-4CBB-8D44-554193B9CF05@inria.fr> References: <88940E94-C5A2-4CBB-8D44-554193B9CF05@inria.fr> Message-ID: On Wed, Dec 26, 2012 at 8:09 PM, Nicolas Rougier wrote: > > > Hi all, > > > I'm looking for a way to "reduce" dtype1 into dtype2 (when it is possible of course). > Is there some easy way to do that by any chance ? 
> > > dtype1 = np.dtype( [ ('vertex', [('x', 'f4'), > ('y', 'f4'), > ('z', 'f4')]), > ('normal', [('x', 'f4'), > ('y', 'f4'), > ('z', 'f4')]), > ('color', [('r', 'f4'), > ('g', 'f4'), > ('b', 'f4'), > ('a', 'f4')]) ] ) > > dtype2 = np.dtype( [ ('vertex', 'f4', 3), > ('normal', 'f4', 3), > ('color', 'f4', 4)] ) > If you have an array whose dtype is dtype1, and you want to convert it into an array with dtype2, then you just do my_dtype2_array = my_dtype1_array.view(dtype2) If you have dtype1 and you want to programmaticaly construct dtype2, then that's a little more fiddly and depends on what exactly you're trying to do, but start by poking around with dtype1.names and dtype1.fields, which contain information on how dtype1 is put together in the form of regular python structures. -n From eric.emsellem at eso.org Thu Dec 27 02:25:48 2012 From: eric.emsellem at eso.org (Eric Emsellem) Date: Thu, 27 Dec 2012 08:25:48 +0100 Subject: [Numpy-discussion] Efficient way of binning points and, applying functions to these groups Message-ID: <50DBF7FC.5050209@eso.org> Thanks Ralf! this module looks great in fact. I didn't know it existed, and in fact It is only available in Scipy 0.11.0 (had to install from source since an Ubuntu 12.04 bin is not available). Too bad that the User-defined function only accepts one single array. If that function should take more input you need to rely on a trick to basically duplicate the input coordinates and concatenate the input arrays you need. But apart from the fact that the programme looks much cleaner now, I just tested it and it is rather SLOW in fact. Since I have to repeat this 2 or 3 times (I have in fact 3 functions to apply), it takes about 20 seconds for a full test, while with the changes I made (see below) it takes now 3 seconds or so [I am talking about the real code, not the example I give below]. So I managed to speed up things a bit by doing two things: - keeping the first loop but replacing the second one with a loop ONLY on bins which contains the right number of points - and more importantly not addressing the full array at each loop iteration but first selecting the right points (to reduce the size). So something as shown below. it is still a factor of 2 slower than the idl routine and I have no clue why. I will analyse it further. The idl routine has similar loops etc, so there is no reason for this. if anybody has an idea .... THANKS (using cython is a bit too much on my side - being a low-profile python developer. As for numba, will have a look) and thanks again for your help! Eric ======================================================= import numpy as np x = np.random.random(1000) y = np.random.random(1000) data = np.random.random(1000) # I have now a 2D grid given by let's say 10x10 grid points: nx = 11 ny = 21 lx = linspace(0,1,nx) ly = linspace(0,1,ny) gx, gy = np.meshgrid(lx, ly) # So my set of 2D bins are (not needed in the solution I present but just for clarity) bins = np.dstack((gx.ravel(), gy.ravel()))[0] # Now I want to have the list of points in each bin and # if the number of points in that bin is larger than 10, apply (dummy) function func1 (see below) # If less than 10, apply (dummy) function func2 so (dum?) 
# if 0, do nothing # for two dummy functions like (for example): def func1(x) : return x.mean() def func2(x) : return x.std() h = histogram2d(x, y, bins=[lx, ly])[0] digitX = np.digitize(x, lx) digitY = np.digitize(y, ly) # create the output array, with -999 values to make sure I see which ones are not filled in result = np.zeros_like(h) - 999 for i in range(nx-1) : selX = (digitX == i+1) dataX = data[selX] selH10 = np.where(h > 10) selH0 = np.where((h > 0) & (h <= 10)) for j in selH10 : selectionofpoints = (digitY == j+1) result[i,j] = func1(data[selectionofpoints]) for j in selH0 : selectionofpoints = (digitY == j+1) result[i,j] = func2(data[selectionofpoints]) > Hi! > > I am looking for an efficient way of doing some simple binning of points > and then applying some functions to points within each bin. > That's exactly what scipy.stats.binned_statistic does: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.stats.binned_statistic.html binned_statistic uses np.digitize as well, but I'm not sure that in your code below digitize is the bottleneck - the nested for-loop looks like the more likely suspect. Ralf From Nicolas.Rougier at inria.fr Thu Dec 27 03:11:05 2012 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Thu, 27 Dec 2012 09:11:05 +0100 Subject: [Numpy-discussion] dtype "reduction" In-Reply-To: References: <88940E94-C5A2-4CBB-8D44-554193B9CF05@inria.fr> Message-ID: Yep, I'm trying to construct dtype2 programmaticaly and was hoping for some function giving me a "canonical" expression of the dtype. I've started playing with fields but it's just a bit harder than I though (lot of different cases and recursion). Thanks for the answer. Nicolas On Dec 27, 2012, at 1:32 , Nathaniel Smith wrote: > On Wed, Dec 26, 2012 at 8:09 PM, Nicolas Rougier > wrote: >> >> >> Hi all, >> >> >> I'm looking for a way to "reduce" dtype1 into dtype2 (when it is possible of course). >> Is there some easy way to do that by any chance ? >> >> >> dtype1 = np.dtype( [ ('vertex', [('x', 'f4'), >> ('y', 'f4'), >> ('z', 'f4')]), >> ('normal', [('x', 'f4'), >> ('y', 'f4'), >> ('z', 'f4')]), >> ('color', [('r', 'f4'), >> ('g', 'f4'), >> ('b', 'f4'), >> ('a', 'f4')]) ] ) >> >> dtype2 = np.dtype( [ ('vertex', 'f4', 3), >> ('normal', 'f4', 3), >> ('color', 'f4', 4)] ) >> > > If you have an array whose dtype is dtype1, and you want to convert it > into an array with dtype2, then you just do > my_dtype2_array = my_dtype1_array.view(dtype2) > > If you have dtype1 and you want to programmaticaly construct dtype2, > then that's a little more fiddly and depends on what exactly you're > trying to do, but start by poking around with dtype1.names and > dtype1.fields, which contain information on how dtype1 is put together > in the form of regular python structures. > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From Nikolaus at rath.org Thu Dec 27 11:44:12 2012 From: Nikolaus at rath.org (Nikolaus Rath) Date: Thu, 27 Dec 2012 08:44:12 -0800 Subject: [Numpy-discussion] Pre-allocate array Message-ID: <50DC7ADC.4090305@rath.org> Hello, I have an array that I know will need to grow to X elements. However, I will need to work with it before it's completely filled. 
I see two ways of doing this: bigarray = np.empty(X) current_size = 0 for i in something: buf = produce_data(i) bigarray[current_size:current_size+len(buf)] = buf current_size += len(buf) # Do things with bigarray[:current_size] This avoids having to allocate new buffers and copying data around, but I have to separately manage the current array size. Alternatively, I could do bigarray = np.empty(0) current_size = 0 for i in something: buf = produce_data(i) bigarray.resize(len(bigarray)+len(buf)) bigarray[-len(buf):] = buf # Do things with bigarray this is much more elegant, but the resize() calls may have to copy data around. Is there any way to tell numpy to allocate all the required memory while using only a part of it for the array? Something like: bigarray = np.empty(50, will_grow_to=X) bigarray.resize(X) # Guaranteed to work without copying stuff around Thanks, -Nikolaus From chris.barker at noaa.gov Thu Dec 27 12:40:46 2012 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 27 Dec 2012 09:40:46 -0800 Subject: [Numpy-discussion] Pre-allocate array In-Reply-To: <50DC7ADC.4090305@rath.org> References: <50DC7ADC.4090305@rath.org> Message-ID: On Thu, Dec 27, 2012 at 8:44 AM, Nikolaus Rath wrote: > I have an array that I know will need to grow to X elements. However, I > will need to work with it before it's completely filled. what sort of "work with it" do you mean? -- resize() is dangerous if there are any other views on the data block... > bigarray = np.empty(X) > current_size = 0 > for i in something: > buf = produce_data(i) > bigarray[current_size:current_size+len(buf)] = buf > current_size += len(buf) > # Do things with bigarray[:current_size] > > This avoids having to allocate new buffers and copying data around, but > I have to separately manage the current array size. yup -- but not a bad option, really. > Alternatively, I > could do > > bigarray = np.empty(0) > current_size = 0 > for i in something: > buf = produce_data(i) > bigarray.resize(len(bigarray)+len(buf)) > bigarray[-len(buf):] = buf > # Do things with bigarray > > this is much more elegant, but the resize() calls may have to copy data > around. Yes, they will -- but whether that's a problem or not depends on your use-case. If you are adding elements one-by-one, the re-allocatiing and copying of memory could be a big overhead. But if buf is not that "small", then the overhead gets lost in teh wash. Yopu'd have to profile to be sure, but I found that if, in this case, "buf" is on order of larger than 1/16 of the size of bigarray, you'll not see it (vague memory...) > Is there any way to tell numpy to allocate all the required memory while > using only a part of it for the array? Something like: > > bigarray = np.empty(50, will_grow_to=X) > bigarray.resize(X) # Guaranteed to work without copying stuff around no -- though you could probably fudge it by messing with the strides -- though you'd need to either keep track of how much memory was originally allocated, or how much is currently used yourself, like you did above. NOTE: I've written a couple of "growable array" classes for just this problem. One in pure Python, and one in Cython that isn't quite finished. I've enclosed the pure python one, let me know if your interested in the Cython version (it may need some work to b fully functional). -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: accumulator.py Type: application/octet-stream Size: 4171 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_accumulator.py Type: application/octet-stream Size: 5154 bytes Desc: not available URL: From otrov at hush.ai Thu Dec 27 15:20:39 2012 From: otrov at hush.ai (deb) Date: Thu, 27 Dec 2012 21:20:39 +0100 Subject: [Numpy-discussion] Manipulate neighboring points in 2D array Message-ID: <20121227202040.24578E6726@smtp.hushmail.com> Hi, I have 2D array, let's say: `np.random.random((100,100))` and I want to do simple manipulation on each point neighbors, like divide their values by 3. So for each array value, x, and it neighbors n: n n n n/3 n/3 n/3 n x n -> n/3 x n/3 n n n n/3 n/3 n/3 I searched a bit, and found about scipy ndimage filters, but if I'm not wrong, there is no such function. Of course me being wrong is quite possible, as I did not comprehend whole ndimage module, but I tried generic filter for example and browser other functions. Is there better way to make above manipulation, instead using for loop over every array element? TIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Thu Dec 27 16:35:23 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 27 Dec 2012 22:35:23 +0100 Subject: [Numpy-discussion] numpy.testing.asserts and masked array In-Reply-To: References: Message-ID: On Thu, Dec 27, 2012 at 12:23 AM, Chao YUE wrote: > Dear all, > > I found here > http://mail.scipy.org/pipermail/numpy-discussion/2009-January/039681.html > that to use* numpy.ma.testutils.assert_almost_equal* for masked array > assertion, but I cannot find the np.ma.testutils module? > Am I getting somewhere wrong? my numpy version is 1.6.2 thanks! "from numpy.ma import testutils" works for me. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Thu Dec 27 17:15:41 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 27 Dec 2012 23:15:41 +0100 Subject: [Numpy-discussion] numpy.testing.asserts and masked array In-Reply-To: References: Message-ID: Thanks. I tried again, it works. On Thu, Dec 27, 2012 at 10:35 PM, Ralf Gommers wrote: > from numpy.ma import testutils > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zachary.pincus at yale.edu Thu Dec 27 18:28:50 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 27 Dec 2012 16:28:50 -0700 Subject: [Numpy-discussion] Manipulate neighboring points in 2D array In-Reply-To: <20121227202040.24578E6726@smtp.hushmail.com> References: <20121227202040.24578E6726@smtp.hushmail.com> Message-ID: <75B8BBD4-7E3B-461A-92B4-A355EB1A6D65@yale.edu> > I have 2D array, let's say: `np.random.random((100,100))` and I want to do simple manipulation on each point neighbors, like divide their values by 3. > > So for each array value, x, and it neighbors n: > > n n n n/3 n/3 n/3 > n x n -> n/3 x n/3 > n n n n/3 n/3 n/3 > > I searched a bit, and found about scipy ndimage filters, but if I'm not wrong, there is no such function. Of course me being wrong is quite possible, as I did not comprehend whole ndimage module, but I tried generic filter for example and browser other functions. > > Is there better way to make above manipulation, instead using for loop over every array element? I am not sure I understand the above manipulation... typically neighborhood operators take an array element and the its neighborhood and then give a single output that becomes the value of the new array at that point. That is, a 3x3 neighborhood filter would act as a function F(R^{3x3}) -> R. It appears that what you're talking about above is a function F(R^{3x3}) -> R^{3x3}. But how is this output to map onto the original array positions? Is the function to be applied to non-overlapping neighborhoods? Is it to be applied to all neighborhoods and then summed at each position to give the output array? If you can describe the problem in a bit more detail, with perhaps some sample input and output for what you desire (and/or with some pseudocode describing how it would work in a looping-over-each-element approach), I'm sure folks can figure out how best to do this in numpy. Zach From otrov at hush.ai Fri Dec 28 10:00:06 2012 From: otrov at hush.ai (deb) Date: Fri, 28 Dec 2012 16:00:06 +0100 Subject: [Numpy-discussion] Manipulate neighboring points in 2D array Message-ID: <20121228150006.401C1E6739@smtp.hushmail.com> Thanks Zach You are right. I needed generic filter - to update current point, and not the neighbors as I wrote. Initial code is slow loop over 2D python lists, which I'm trying to convert to numpy and make it useful. In that loop there is inner loop for calculating neighbors properties, which confused me yesterday, and mislead to search for something that probably does not make sense. It's clear now :) Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Fri Dec 28 19:02:01 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Fri, 28 Dec 2012 16:02:01 -0800 Subject: [Numpy-discussion] ANN: NumPy 1.7.0rc1 release Message-ID: Hi, I'm pleased to announce the availability of the first release candidate of NumPy 1.7.0rc1. Sources and binary installers can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.7.0rc1/ We have fixed all issues known to us since the 1.7.0b2 release. The only remaining issue is a documentation improvement: https://github.com/numpy/numpy/issues/561 Please test this release and report any issues on the numpy-discussion mailing list. If there are no more problems, we'll release the final version soon. I'll wait at least a week and please write me an email if you need more time for testing. 
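A quick way to exercise an installed package is to run the bundled test
suite; something along these lines should work (the 'full' label needs
nose installed):

    import numpy
    numpy.test('full')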
I would like to thank Sebastian Berg, Ralf Gommers, Han Genuit, Nathaniel J. Smith, Jay Bourque, Gael Varoquaux, Mark Wiebe, Matthew Brett, Skipper Seabold, Peter Cock, Charles Harris, Frederic, Gabriel, Luis Pedro Coelho, Pauli Virtanen, Travis E. Oliphant and cgohlke for sending patches and fixes for this release since 1.7.0b2. Cheers, Ondrej P.S. Source code is uploaded to sourceforge, and I'll upload the rest of the Windows and Mac binaries in a few hours as they finish building. From zachary.pincus at yale.edu Fri Dec 28 21:13:01 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Fri, 28 Dec 2012 19:13:01 -0700 Subject: [Numpy-discussion] Manipulate neighboring points in 2D array In-Reply-To: <20121228150006.401C1E6739@smtp.hushmail.com> References: <20121228150006.401C1E6739@smtp.hushmail.com> Message-ID: <14606163-7B33-46E4-B7A8-C4B3C9865A61@yale.edu> > You are right. I needed generic filter - to update current point, and not the neighbors as I wrote. > Initial code is slow loop over 2D python lists, which I'm trying to convert to numpy and make it useful. In that loop there is inner loop for calculating neighbors properties, which confused me yesterday, and mislead to search for something that probably does not make sense. > > It's clear now :) It's possible that some generic filter operations can be cast in terms of pure-numpy operations, or composed out of existing filters available in scipy.ndimage. If you can describe the filter operation you wish to perform, perhaps someone can make some suggestions. Alternately, scipy.ndimage.generic_filter can take an arbitrary python function. Though it's not really fast... Zach From travis at continuum.io Sat Dec 29 01:38:54 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 29 Dec 2012 00:38:54 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.7.0rc1 release In-Reply-To: References: Message-ID: <849D607B-2F73-40F2-843E-17A785CCB95E@continuum.io> Fantastic job everyone! Hats of to you Ondrej! -Travis On Dec 28, 2012, at 6:02 PM, Ond?ej ?ert?k wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.7.0rc1. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.7.0rc1/ > > We have fixed all issues known to us since the 1.7.0b2 release. > The only remaining issue is a documentation improvement: > > https://github.com/numpy/numpy/issues/561 > > Please test this release and report any issues on the numpy-discussion > mailing list. If there are no more problems, we'll release the final > version soon. I'll wait at least a week and please write me an email > if you need more time for testing. > > I would like to thank Sebastian Berg, Ralf Gommers, Han Genuit, > Nathaniel J. Smith, Jay Bourque, Gael Varoquaux, Mark Wiebe, > Matthew Brett, Skipper Seabold, Peter Cock, Charles Harris, Frederic, > Gabriel, Luis Pedro Coelho, Pauli Virtanen, Travis E. Oliphant > and cgohlke for sending patches and fixes for this release since > 1.7.0b2. > > Cheers, > Ondrej > > P.S. Source code is uploaded to sourceforge, and I'll upload the > rest of the Windows and Mac binaries in a few hours as they finish building. 
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sat Dec 29 02:07:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 29 Dec 2012 00:07:49 -0700 Subject: [Numpy-discussion] A small challenge Message-ID: Hi All, I propose a challenge: express the dtype grammar in EBNF. That's all. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Dec 29 06:14:13 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 29 Dec 2012 11:14:13 +0000 Subject: [Numpy-discussion] A small challenge In-Reply-To: References: Message-ID: On Sat, Dec 29, 2012 at 7:07 AM, Charles R Harris wrote: > Hi All, > > I propose a challenge: express the dtype grammar in EBNF. That's all. Not sure I understand. Do you mean just the little string-parsing DSL for specifying dtypes ("i4,datetime64[ms]"), or is there some way to write EBNF for describing arbitrary nested Python structures? -n From cgohlke at uci.edu Sat Dec 29 08:46:29 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sat, 29 Dec 2012 05:46:29 -0800 Subject: [Numpy-discussion] ANN: NumPy 1.7.0rc1 release In-Reply-To: References: Message-ID: <50DEF435.8010103@uci.edu> On 12/28/2012 4:02 PM, Ond?ej ?ert?k wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.7.0rc1. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.7.0rc1/ > > We have fixed all issues known to us since the 1.7.0b2 release. > The only remaining issue is a documentation improvement: > > https://github.com/numpy/numpy/issues/561 > > Please test this release and report any issues on the numpy-discussion > mailing list. If there are no more problems, we'll release the final > version soon. I'll wait at least a week and please write me an email > if you need more time for testing. > > I would like to thank Sebastian Berg, Ralf Gommers, Han Genuit, > Nathaniel J. Smith, Jay Bourque, Gael Varoquaux, Mark Wiebe, > Matthew Brett, Skipper Seabold, Peter Cock, Charles Harris, Frederic, > Gabriel, Luis Pedro Coelho, Pauli Virtanen, Travis E. Oliphant > and cgohlke for sending patches and fixes for this release since > 1.7.0b2. > > Cheers, > Ondrej > > P.S. Source code is uploaded to sourceforge, and I'll upload the > rest of the Windows and Mac binaries in a few hours as they finish building. Looks good so far. I tested numpy-MKL-1.7.0rc1.win-amd64-py2.7 with some packages that were compiled with numpy 1.6.x . There are a few additional test failures in bottleneck and Cython, but they don't look serious. The rc works well on Python 3.3 too . Christoph From charlesr.harris at gmail.com Sat Dec 29 09:57:22 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 29 Dec 2012 07:57:22 -0700 Subject: [Numpy-discussion] A small challenge In-Reply-To: References: Message-ID: On Sat, Dec 29, 2012 at 4:14 AM, Nathaniel Smith wrote: > On Sat, Dec 29, 2012 at 7:07 AM, Charles R Harris > wrote: > > Hi All, > > > > I propose a challenge: express the dtype grammar in EBNF. That's all. > > Not sure I understand. Do you mean just the little string-parsing DSL > for specifying dtypes ("i4,datetime64[ms]"), or is there some way to > write EBNF for describing arbitrary nested Python structures? > > Heh, pinning that down is part of the problem. 
I think a good start is the string parsing, but dtypes can also be constructed using python tuples, lists and types. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Sat Dec 29 12:35:36 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Sat, 29 Dec 2012 12:35:36 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.7.0rc1 release References: Message-ID: Are release notes available? From charlesr.harris at gmail.com Sat Dec 29 12:37:21 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 29 Dec 2012 10:37:21 -0700 Subject: [Numpy-discussion] A small challenge In-Reply-To: References: Message-ID: On Sat, Dec 29, 2012 at 7:57 AM, Charles R Harris wrote: > > > On Sat, Dec 29, 2012 at 4:14 AM, Nathaniel Smith wrote: > >> On Sat, Dec 29, 2012 at 7:07 AM, Charles R Harris >> wrote: >> > Hi All, >> > >> > I propose a challenge: express the dtype grammar in EBNF. That's all. >> >> Not sure I understand. Do you mean just the little string-parsing DSL >> for specifying dtypes ("i4,datetime64[ms]"), or is there some way to >> write EBNF for describing arbitrary nested Python structures? >> >> > Heh, pinning that down is part of the problem. I think a good start is the > string parsing, but dtypes can also be constructed using python tuples, > lists and types. > > The idea is to see if the dtype constructor can be regarded as a parser, and if so, how to describe the grammar it parses. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Sat Dec 29 14:37:29 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sat, 29 Dec 2012 11:37:29 -0800 Subject: [Numpy-discussion] ANN: NumPy 1.7.0rc1 release In-Reply-To: References: Message-ID: Hi Neal, On Sat, Dec 29, 2012 at 9:35 AM, Neal Becker wrote: > Are release notes available? Yes. There are here: http://sourceforge.net/projects/numpy/files/NumPy/1.7.0rc1/ if you slide the page down a little bit (sf.net just shows the file README.txt). I am posting them here as well for reference (I forgot to do it in my release email). Ondrej ------------------- ========================= NumPy 1.7.0 Release Notes ========================= This release includes several new features as well as numerous bug fixes and refactorings. It supports Python 2.4 - 2.7 and 3.1 - 3.3 and is the last release that supports Python 2.4 - 2.5. Highlights ========== * ``where=`` parameter to ufuncs (allows the use of boolean arrays to choose where a computation should be done) * ``vectorize`` improvements (added 'excluded' and 'cache' keyword, general cleanup and bug fixes) * ``numpy.random.choice`` (random sample generating function) Compatibility notes =================== In a future version of numpy, the functions np.diag, np.diagonal, and the diagonal method of ndarrays will return a view onto the original array, instead of producing a copy as they do now. This makes a difference if you write to the array returned by any of these functions. To facilitate this transition, numpy 1.7 produces a FutureWarning if it detects that you may be attempting to write to such an array. See the documentation for np.diagonal for details. Similar to np.diagonal above, in a future version of numpy, indexing a record array by a list of field names will return a view onto the original array, instead of producing a copy as they do now. 
As with np.diagonal, numpy 1.7 produces a FutureWarning if it detects that you may be attempting to write to such an array. See the documentation for array indexing for details. In a future version of numpy, the default casting rule for UFunc out= parameters will be changed from 'unsafe' to 'same_kind'. (This also applies to in-place operations like a += b, which is equivalent to np.add(a, b, out=a).) Most usages which violate the 'same_kind' rule are likely bugs, so this change may expose previously undetected errors in projects that depend on NumPy. In this version of numpy, such usages will continue to succeed, but will raise a DeprecationWarning. Full-array boolean indexing has been optimized to use a different, optimized code path. This code path should produce the same results, but any feedback about changes to your code would be appreciated. Attempting to write to a read-only array (one with ``arr.flags.writeable`` set to ``False``) used to raise either a RuntimeError, ValueError, or TypeError inconsistently, depending on which code path was taken. It now consistently raises a ValueError. The .reduce functions evaluate some reductions in a different order than in previous versions of NumPy, generally providing higher performance. Because of the nature of floating-point arithmetic, this may subtly change some results, just as linking NumPy to a different BLAS implementations such as MKL can. If upgrading from 1.5, then generally in 1.6 and 1.7 there have been substantial code added and some code paths altered, particularly in the areas of type resolution and buffered iteration over universal functions. This might have an impact on your code particularly if you relied on accidental behavior in the past. New features ============ Reduction UFuncs Generalize axis= Parameter ------------------------------------------- Any ufunc.reduce function call, as well as other reductions like sum, prod, any, all, max and min support the ability to choose a subset of the axes to reduce over. Previously, one could say axis=None to mean all the axes or axis=# to pick a single axis. Now, one can also say axis=(#,#) to pick a list of axes for reduction. Reduction UFuncs New keepdims= Parameter ---------------------------------------- There is a new keepdims= parameter, which if set to True, doesn't throw away the reduction axes but instead sets them to have size one. When this option is set, the reduction result will broadcast correctly to the original operand which was reduced. Datetime support ---------------- .. note:: The datetime API is *experimental* in 1.7.0, and may undergo changes in future versions of NumPy. There have been a lot of fixes and enhancements to datetime64 compared to NumPy 1.6: * the parser is quite strict about only accepting ISO 8601 dates, with a few convenience extensions * converts between units correctly * datetime arithmetic works correctly * business day functionality (allows the datetime to be used in contexts where only certain days of the week are valid) The notes in `doc/source/reference/arrays.datetime.rst `_ (also available in the online docs at `arrays.datetime.html `_) should be consulted for more details. Custom formatter for printing arrays ------------------------------------ See the new ``formatter`` parameter of the ``numpy.set_printoptions`` function. New function numpy.random.choice --------------------------------- A generic sampling function has been added which will generate samples from a given array-like. 
The samples can be with or without replacement, and with uniform or given non-uniform probabilities. New function isclose -------------------- Returns a boolean array where two arrays are element-wise equal within a tolerance. Both relative and absolute tolerance can be specified. Preliminary multi-dimensional support in the polynomial package --------------------------------------------------------------- Axis keywords have been added to the integration and differentiation functions and a tensor keyword was added to the evaluation functions. These additions allow multi-dimensional coefficient arrays to be used in those functions. New functions for evaluating 2-D and 3-D coefficient arrays on grids or sets of points were added together with 2-D and 3-D pseudo-Vandermonde matrices that can be used for fitting. Ability to pad rank-n arrays ---------------------------- A pad module containing functions for padding n-dimensional arrays has been added. The various private padding functions are exposed as options to a public 'pad' function. Example:: pad(a, 5, mode='mean') Current modes are ``constant``, ``edge``, ``linear_ramp``, ``maximum``, ``mean``, ``median``, ``minimum``, ``reflect``, ``symmetric``, ``wrap``, and ````. New argument to searchsorted ---------------------------- The function searchsorted now accepts a 'sorter' argument that is a permutation array that sorts the array to search. C API ----- New function ``PyArray_RequireWriteable`` provides a consistent interface for checking array writeability -- any C code which works with arrays whose WRITEABLE flag is not known to be True a priori, should make sure to call this function before writing. NumPy C Style Guide added (``doc/C_STYLE_GUIDE.rst.txt``). Changes ======= General ------- The function np.concatenate tries to match the layout of its input arrays. Previously, the layout did not follow any particular reason, and depended in an undesirable way on the particular axis chosen for concatenation. A bug was also fixed which silently allowed out of bounds axis arguments. The ufuncs logical_or, logical_and, and logical_not now follow Python's behavior with object arrays, instead of trying to call methods on the objects. For example the expression (3 and 'test') produces the string 'test', and now np.logical_and(np.array(3, 'O'), np.array('test', 'O')) produces 'test' as well. The ``.base`` attribute on ndarrays, which is used on views to ensure that the underlying array owning the memory is not deallocated prematurely, now collapses out references when you have a view-of-a-view. For example:: a = np.arange(10) b = a[1:] c = b[1:] In numpy 1.6, ``c.base`` is ``b``, and ``c.base.base`` is ``a``. In numpy 1.7, ``c.base`` is ``a``. To increase backwards compatibility for software which relies on the old behaviour of ``.base``, we only 'skip over' objects which have exactly the same type as the newly created view. This makes a difference if you use ``ndarray`` subclasses. For example, if we have a mix of ``ndarray`` and ``matrix`` objects which are all views on the same original ``ndarray``:: a = np.arange(10) b = np.asmatrix(a) c = b[0, 1:] d = c[0, 1:] then ``d.base`` will be ``b``. This is because ``d`` is a ``matrix`` object, and so the collapsing process only continues so long as it encounters other ``matrix`` objects. It considers ``c``, ``b``, and ``a`` in that order, and ``b`` is the last entry in that list which is a ``matrix`` object. 
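In terms of the example above, this means::

    d.base is b    # True: b is the last matrix view in the chain
    b.base is a    # True: a is the ndarray that owns the memory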
Deprecations ============ General ------- Specifying a custom string formatter with a `_format` array attribute is deprecated. The new `formatter` keyword in ``numpy.set_printoptions`` or ``numpy.array2string`` can be used instead. The deprecated imports in the polynomial package have been removed. ``concatenate`` now raises DepractionWarning for 1D arrays if ``axis != 0``. Versions of numpy < 1.7.0 ignored axis argument value for 1D arrays. We allow this for now, but in due course we will raise an error. C-API ----- Direct access to the fields of PyArrayObject* has been deprecated. Direct access has been recommended against for many releases. Expect similar deprecations for PyArray_Descr* and other core objects in the future as preparation for NumPy 2.0. The macros in old_defines.h are deprecated and will be removed in the next major release (>= 2.0). The sed script tools/replace_old_macros.sed can be used to replace these macros with the newer versions. You can test your code against the deprecated C API by #defining NPY_NO_DEPRECATED_API to the target version number, for example NPY_1_7_API_VERSION, before including any NumPy headers. The ``NPY_CHAR`` member of the ``NPY_TYPES`` enum is deprecated and will be removed in NumPy 1.8. See the discussion at `gh-2801 `_ for more details. Checksums ========= 0abe9356c7fc5e2dc3ff3a1f7292db23 release/installers/numpy-1.7.0rc1.zip ea4268cb12cc759a33861b8c04535f3b release/installers/numpy-1.7.0rc1-win32-superpack-python3.3.exe b5ba5ae858b8d1b4d50742aefe20e151 release/installers/numpy-1.7.0rc1-win32-superpack-python2.6.exe 6cc692e53df87e7c2a9c5dd742fa3556 release/installers/numpy-1.7.0rc1-win32-superpack-python2.5.exe e164beae6c43d514f1ebba5a34aa4162 release/installers/numpy-1.7.0rc1-win32-superpack-python3.1.exe a4719f5a1853bc0f8892a5956d5c4229 release/installers/numpy-1.7.0rc1.tar.gz ca0151c50c79c5843083c3f8817e5c20 release/installers/numpy-1.7.0rc1-win32-superpack-python3.2.exe 329c3e1560332248e2fb6efdd150e421 release/installers/numpy-1.7.0rc1-win32-superpack-python2.7.exe From ondrej.certik at gmail.com Sat Dec 29 14:48:08 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sat, 29 Dec 2012 11:48:08 -0800 Subject: [Numpy-discussion] ANN: NumPy 1.7.0rc1 release In-Reply-To: <50DEF435.8010103@uci.edu> References: <50DEF435.8010103@uci.edu> Message-ID: Hi Christoph, On Sat, Dec 29, 2012 at 5:46 AM, Christoph Gohlke wrote: > Looks good so far. > > I tested numpy-MKL-1.7.0rc1.win-amd64-py2.7 with some packages that were > compiled with numpy 1.6.x > . > There are a few additional test failures in bottleneck and Cython, but > they don't look serious. > > The rc works well on Python 3.3 too > . Thanks! I created an issue for it here: https://github.com/numpy/numpy/issues/2870 Ondrej P.S. Would you mind adding your name to your github profile (https://github.com/cgohlke) please? I was trying to figure out your full name so that I could thank you in the release email, but I could only find your handle "cgohlke" both at github and in the commit history. My apologies for that. Now I'll remember to google it in my gmail. :) But if you could also update it at github, that'd be the easiest. I can see that in master we have an updated .mailmap which fixes precisely this issue. I should have used it -- I was running it on the release branch which does not have it yet. 
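(For reference, a contributor list like the one in the announcement can be
regenerated with something along the lines of "git shortlog -ns
v1.7.0b2..HEAD" run on a checkout that has the updated .mailmap; shortlog
picks the file up automatically.)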
From otrov at hush.ai Sun Dec 30 06:21:38 2012 From: otrov at hush.ai (deb) Date: Sun, 30 Dec 2012 12:21:38 +0100 Subject: [Numpy-discussion] Manipulate neighboring points in 2D array Message-ID: <20121230112138.D9234E6726@smtp.hushmail.com> Thanks Zach for your interest I was thinking about ndimage.generic_filter when I wrote about generic filter. For generic_filter I used trivial function that returns .sum() but I can't seem to make the code any faster than it is. This is the code: http://code.activestate.com/recipes/578390-snowflake-simulation-using-reiter-cellular-automat/ As commenter suggested I thought to try and make it in numpy Interestingly, the first thing I tried before trying to use numpy was change range() loops with xrange(), as xrange is considered faster and more efficient, but result was that code was twice slower. Anyway I give up, and concluded that my numpy skills are far below I expected :D > It's possible that some generic filter operations can be cast in > terms of pure-numpy operations, or composed out of existing filters > available in scipy.ndimage. If you can describe the filter operation > you wish to perform, perhaps someone can make some suggestions. > Alternately, scipy.ndimage.generic_filter can take an arbitrary > python function. Though it's not really fast... From bahtiyor_zohidov at mail.ru Sun Dec 30 06:41:26 2012 From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=) Date: Sun, 30 Dec 2012 15:41:26 +0400 Subject: [Numpy-discussion] =?utf-8?q?3D_array_problem_in_Python?= Message-ID: <1356867686.200644432@f373.mail.ru> Hello I have 3 dimensional array ?which I want ?to calculate in a huge process. Everything is working well if I use ordinary way which is unsuitable in Python like the following: nums=32 rows=120 cols=150 for k in range(0,nums): ? ? ? ? ? for i in range(0,rows): ? ? ? ? ? ? ? ? ? ? ?for j in range(0,cols): ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? if float ( R[ k ] [ i ] [ j ] ) == 0.0: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?val11 [ i ] =0.0 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? else: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?val11[ i ] [ j ], val22[ i ][ j ] = integrate.quad( lambda x : ?F1(x)*F2(x) , 0 , pi) But, this calculation takes so long time, let's say about ?1 hour (theoretically)... Is there any better way to easily and fast calculate the process such as [ F( i ) for i in xlist ] or something like that rather than using for loop? -------------- next part -------------- An HTML attachment was scrubbed... URL: From oc-spam66 at laposte.net Sun Dec 30 07:13:18 2012 From: oc-spam66 at laposte.net (oc-spam66) Date: Sun, 30 Dec 2012 13:13:18 +0100 Subject: [Numpy-discussion] 3D array problem in Python In-Reply-To: <1356867686.200644432@f373.mail.ru> References: <1356867686.200644432@f373.mail.ru> Message-ID: <50E02FDE.2060004@laposte.net> Hello, > else: > val11[i][j], val22[i][j] = integrate.quad(lambda x: F1(x)*F2(x), 0, pi) > But, this calculation takes so long time, let's say about 1 hour > (theoretically)... Is there any better way to easily and fast calculate > the process such as [ F( i ) for i in xlist ] or something like that > rather than using for loop? * What are F1() and F2()? Do they depend on anything else than 'x'? Maybe you meant Fi() and Fj(). In that case, can you benefit of a symmetry property? * It's likely that all the computing time is in the "integrate" operation (check it with a profiler? %prun under ipython for example). 
In this situation, there's no improvement possible, apart from using a
simpler function than integrate() that might be vectorized (this depends
on the definition of Fi())

From bahtiyor_zohidov at mail.ru  Sun Dec 30 07:47:26 2012
From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=)
Date: Sun, 30 Dec 2012 16:47:26 +0400
Subject: [Numpy-discussion] =?utf-8?q?3D_array_problem_in_Python?=
In-Reply-To: <50E02FDE.2060004@laposte.net>
References: <1356867686.200644432@f373.mail.ru>
	<50E02FDE.2060004@laposte.net>
Message-ID: <1356871646.828630843@f84.mail.ru>

Actually, the two functions F1 and F2 are an exponential and a Bessel
function respectively, and I cannot change their analytic form.

Is there a way to get the result more quickly? The two functions could be
combined into one, but the dimensions I showed should not be changed.

Do you think the problem is the three dimensions, or something else?
Thanks in advance for your answer!

Sunday, 30 December 2012, 13:13 +01:00 from oc-spam66 :
>Hello,
>
>> else:
>> val11[i][j], val22[i][j] = integrate.quad(lambda x: F1(x)*F2(x), 0, pi)
>
>> But, this calculation takes so long time, let's say about 1 hour
>> (theoretically)... Is there any better way to easily and fast calculate
>> the process such as [ F( i ) for i in xlist ] or something like that
>> rather than using for loop?
>
>* What are F1() and F2()? Do they depend on anything else than 'x'?
>Maybe you meant Fi() and Fj(). In that case, can you benefit of a
>symmetry property?
>* It's likely that all the computing time is in the "integrate"
>operation (check it with a profiler? %prun under ipython for example).
>In this situation, there's no improvement possible, apart from using a
>simpler function than integrate() that might be vectorized (this depends
>on the definition of Fi())
>_______________________________________________
>NumPy-Discussion mailing list
>NumPy-Discussion at scipy.org
>http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From silva at lma.cnrs-mrs.fr  Sun Dec 30 09:11:22 2012
From: silva at lma.cnrs-mrs.fr (Fabrice Silva)
Date: Sun, 30 Dec 2012 15:11:22 +0100
Subject: [Numpy-discussion] 3D array problem in Python
In-Reply-To: <1356871646.828630843@f84.mail.ru>
References: <1356867686.200644432@f373.mail.ru>
	<50E02FDE.2060004@laposte.net>
	<1356871646.828630843@f84.mail.ru>
Message-ID: <1356876682.5622.6.camel@laptop-101>

On Sunday 30 December 2012 at 16:47 +0400, Happyman wrote:
> Actually, the two functions F1 and F2 are an exponential and a Bessel
> function respectively, and I cannot change their analytic form.
>
> Is there a way to get the result more quickly? The two functions could be
> combined into one, but the dimensions I showed should not be changed.
>
> Do you think the problem is the three dimensions, or something else?
> Thanks in advance for your answer!

The question was: do F1 and F2 change depending on i, j or k? The answer
seems to be yes.

I don't think any improvement through vectorisation is possible while
using integrate.quad or similar. This function does adaptive meshing of
the integration interval. Singular points and/or discontinuities may
arise at different points for the several integrands, so that
vectorisation could even slow the computation down!

Maybe have a look at Romberg's method, which supports a vector form.
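Something like this (untested) sketches the idea, assuming F1 and F2 can
take an array-valued x:

    import numpy as np
    from scipy import integrate

    # vec_func=True lets romberg evaluate the integrand on an array of
    # sample points in one call, instead of point by point as quad does
    val = integrate.romberg(lambda x: F1(x) * F2(x), 0.0, np.pi,
                            vec_func=True)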
From Nicolas.Rougier at inria.fr Sun Dec 30 09:54:09 2012 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Sun, 30 Dec 2012 15:54:09 +0100 Subject: [Numpy-discussion] Manipulate neighboring points in 2D array In-Reply-To: <20121230112138.D9234E6726@smtp.hushmail.com> References: <20121230112138.D9234E6726@smtp.hushmail.com> Message-ID: You might want to have a look at : http://code.google.com/p/glumpy/source/browse/demos/gray-scott.py which implements a Gray-Scott reaction-diffusion system. The 'convolution_matrix(src, dst, kernel, toric)' build a sparse matrix such that multiplying an array with this matrix will result in the convolution. This is very fast if your kernel is small (like for cellular automata) and if you intend to repeat the convolution several times. Note that you only need to build the matrix once. Example: >>> S = np.ones((3,3)) >>> K = np.ones((3,3)) >>> M = convolution_matrix(S,S,K,True) >> print (M*S.ravel()).reshape(S.shape) [[ 9. 9. 9.] [ 9. 9. 9.] [ 9. 9. 9.]] >>> M = convolution_matrix(S,S,K,False) >>> print (M*S.ravel()).reshape(S.shape) [[ 4. 6. 4.] [ 6. 9. 6.] [ 4. 6. 4.]] the 'dst' parameter won't be useful in your case so you have to set it to 'src'. Nicolas On Dec 30, 2012, at 12:21 , deb wrote: > Thanks Zach for your interest > > I was thinking about ndimage.generic_filter when I wrote about generic filter. > For generic_filter I used trivial function that returns .sum() but I can't seem to make the code any faster than it is. > > This is the code: http://code.activestate.com/recipes/578390-snowflake-simulation-using-reiter-cellular-automat/ > As commenter suggested I thought to try and make it in numpy > > Interestingly, the first thing I tried before trying to use numpy was change range() loops with xrange(), as xrange is considered faster and more efficient, but result was that code was twice slower. > > Anyway I give up, and concluded that my numpy skills are far below I expected :D > > >> It's possible that some generic filter operations can be cast in >> terms of pure-numpy operations, or composed out of existing filters >> available in scipy.ndimage. If you can describe the filter operation >> you wish to perform, perhaps someone can make some suggestions. > >> Alternately, scipy.ndimage.generic_filter can take an arbitrary >> python function. Though it's not really fast... > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From morph at debian.org Sun Dec 30 19:17:50 2012 From: morph at debian.org (Sandro Tosi) Date: Mon, 31 Dec 2012 01:17:50 +0100 Subject: [Numpy-discussion] ANN: NumPy 1.7.0rc1 release In-Reply-To: References: Message-ID: Hi Ondrej & al, On Sat, Dec 29, 2012 at 1:02 AM, Ond?ej ?ert?k wrote: > I'm pleased to announce the availability of the first release candidate of > NumPy 1.7.0rc1. Congrats on this RC release! I've uploaded this version to Debian and updated some of the issues related to it. There are also a couple of minor PR you might want to consider for 1.7: 2872 and 2873. 
Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From chris.barker at noaa.gov  Sun Dec 30 19:35:35 2012
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Sun, 30 Dec 2012 16:35:35 -0800
Subject: [Numpy-discussion] 3D array problem in Python
In-Reply-To: <1356867686.200644432@f373.mail.ru>
References: <1356867686.200644432@f373.mail.ru>
Message-ID: 

On Sun, Dec 30, 2012 at 3:41 AM, Happyman wrote:
> nums=32
> rows=120
> cols=150
>
> for k in range(0,nums):
>     for i in range(0,rows):
>         for j in range(0,cols):
>             if float ( R[ k ] [ i ] [ j ] ) == 0.0:

why the float() -- what data type is R?

>             else:
>                 val11[ i ] [ j ], val22[ i ][ j ] = integrate.quad( lambda x : F1(x)*F2(x) , 0 , pi)

this is odd -- Do F1 and F2 depend on i, j, or k somehow? or are you
somehow integrating over the k-dimension? In which case, I'm guessing
that integration is your time killer anyway -- do some profiling to
know for sure.

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From bahtiyor_zohidov at mail.ru  Sun Dec 30 21:40:14 2012
From: bahtiyor_zohidov at mail.ru (=?UTF-8?B?SGFwcHltYW4=?=)
Date: Mon, 31 Dec 2012 06:40:14 +0400
Subject: [Numpy-discussion] =?utf-8?q?3D_array_problem_challenging_in_Pyth?=
	=?utf-8?q?on?=
References: <1356867686.200644432@f373.mail.ru>
Message-ID: <1356921614.45639691@f150.mail.ru>

Hi Chris,

I think I did not explain my request properly, I'm afraid (sorry about
that!). Let me explain precisely.

Last time I processed only one file, and it took exactly 237.713999987
seconds, roughly 4 minutes. But I have to process more than 30 files, one
after the other (if one file takes 4 minutes, 30 files will take about 120
minutes), and each file holds 18 000 values (for testing, any array of that
size will do). In other words, following earlier advice from the list, I
created a two-dimensional matrix R(rows, cols) and collected all the
processed data into R.

Below are the pieces of my code that work together; F1() and F2() are shown
as well. The problem is the same as before: I want to optimize the code to
avoid the loops and get an answer as quickly as possible, but it is really
confusing, so help from Python programmers would be great!

==================================
The codes here:
=================================================================

import numpy as np
import scipy.special as ss
from scipy.special import sph_jnyn,sph_jn,jv,yv
from scipy import integrate
import time
import os

---------------------------
1) Problem: no problem in this F0() function
---------------------------
Inputs: m = 5+0.4j  - complex number as an example!
        x = one value - float!
---------------------------
# This function returns the an, bn coefficients. I don't want it to be
# vectorized because that is already done; it is working well!

def F0(m, x):
    nmax = np.round(2.0+x+4.0*x**(1.0/3.0))
    mx = m * x

    j_x,jd_x,y_x,yd_x = ss.sph_jnyn(nmax, x)    # sph_jnyn - from scipy special functions
    j_x = j_x[1:]
    jd_x = jd_x[1:]
    y_x = y_x[1:]
    yd_x = yd_x[1:]

    h1_x = j_x + 1.0j*y_x
    h1d_x = jd_x + 1.0j*yd_x

    j_mx,jd_mx = ss.sph_jn(nmax, mx)            # sph_jn - from scipy special functions
    j_mx = j_mx[1:]
    jd_mx = jd_mx[1:]

    j_xp = j_x + x*jd_x
    j_mxp = j_mx + mx*jd_mx
    h1_xp = h1_x + x*h1d_x

    m2 = m * m
    an = (m2 * j_mx * j_xp - j_x * j_mxp)/(m2 * j_mx * h1_xp - h1_x * j_mxp)
    bn = (j_mx * j_xp - j_x * j_mxp)/(j_mx * h1_xp - h1_x * j_mxp)
    return an, bn

--------------------------------------
2) Problem: 1) to avoid the loop
            2) to return values from the function (below) no matter whether
               'a' is an array or a scalar!
--------------------------------------
Inputs: m = 5+0.4j  - for example
        L = 30      - for example
        a - array (one dimensional)
--------------------------------------

def F1(m,L,a):
    xs = np.pi * a / L

    if(m.imag < 0.0):
        m = np.conj(m)

    # Want to make sure we can accept single arguments or arrays
    try:
        xs.size
        xlist = xs
    except:
        xlist = np.array(xs)

    q = [ ]
    for i,s in enumerate(xlist.flat):
        if float(s) == 0.0:          # To avoid a singularity at x=0
            q.append(0.0)
        else:
            x = np.round(s,7)
            an,bn = F0(m,x)
            n = np.arange(1.0, an.size + 1.0)
            c = 2.0 * n + 1.0
            q.append((L*L)/(2*np.pi) * (c * (an.real + bn.real)).sum())
    return np.array(q)

-----------------------------
3) Problem: 1) I used "try" to check whether 'D' is an array or a single
               value! Is there a better way besides this?
            2) Is there any way to avoid the loop here in this case?
-----------------------------
Inputs: a - array (one dimensional, the same as above)
        s - array (one dimensional)
-----------------------------

def F2(a,s):
    try:            # "try" checks whether the input is an array or a single value. Is there a better way?
        a.size
        Dslist = a
    except:
        Dslist = np.array(a)

    K = np.zeros( np.size( Dslist ) )
    for i,d in enumerate(Dslist.flat):   # Is there any way to avoid the loop here in this case?
        if float(d) == 0.0:
            K[i] = 0.0
        else:
            K[i] = np.exp(s**(-0.45)) * d
    return K

----------------------
4) Problem: F3 depends on F1 and F2
----------------------

def F3(m=20,L=30):
    F_file = np.loadtxt('filename')    # Instead, any file of size 120x150 (18 000 values) can be used here, as explained above.
    val = [integrate.quad(lambda x: F1(m,L,x)*F2(x,i), 0.0, 7.0) for i in F_file]
    return np.array(val)

F3 is where I have really tried to get a more efficient result;
unfortunately, I have spent about a month on it and got nothing! I really
need help, I am stuck....

Sunday, 30 December 2012, 16:35 -08:00 from Chris Barker - NOAA Federal :
>On Sun, Dec 30, 2012 at 3:41 AM, Happyman < bahtiyor_zohidov at mail.ru > wrote:
>> nums=32
>> rows=120
>> cols=150
>>
>> for k in range(0,nums):
>>     for i in range(0,rows):
>>         for j in range(0,cols):
>>             if float ( R[ k ] [ i ] [ j ] ) == 0.0:
>
>why the float() -- what data type is R?
>
>>             else:
>>                 val11[ i ] [ j ], val22[ i ][ j ] = integrate.quad( lambda x : F1(x)*F2(x) , 0 , pi)
>
>this is odd -- Do F1 and F2 depend on i, j, or k somehow? or are you
>somehow integrating over the k-dimension? In which case, I'm guessing
>that integration is your time killer anyway -- do some profiling to
>know for sure.
>
>-Chris
>
>--
>
>Christopher Barker, Ph.D.
>Oceanographer > >Emergency Response Division >NOAA/NOS/OR&R (206) 526-6959 voice >7600 Sand Point Way NE (206) 526-6329 fax >Seattle, WA 98115 (206) 526-6317 main reception > >Chris.Barker at noaa.gov >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From raul at virtualmaterials.com Mon Dec 31 13:42:17 2012 From: raul at virtualmaterials.com (Raul Cota) Date: Mon, 31 Dec 2012 11:42:17 -0700 Subject: [Numpy-discussion] 3D array problem in Python In-Reply-To: References: <1356867686.200644432@f373.mail.ru> Message-ID: <50E1DC89.50404@virtualmaterials.com> Quick comment since I was working on timing trivial operations, If I run the triple loop with R with only zeros thus avoiding the integration, the loop takes in my computer about 1 second with the float() function and about 1.5 without it if R is dtype='float64' and 3.3 seconds if dtype='float32'. I didn't bother trying the other obvious speed up of avoiding the dot operator quad = integrate.quad and using quad inside the triple loop. I do those things out of habit but they barely ever make a meaningful difference. Conclusions: 1) The overhead of the triple loop is meaningless if the whole operation takes minutes to complete. 2) Using float() does make it faster but in this scenario the speed up is meaningless in the grand scheme of things. * It is besides the point, but for what is worth, with the modified code for numpy I suggested a week ago, using the float() function is not needed to get it to run in 1 second. Raul Cota On 30/12/2012 5:35 PM, Chris Barker - NOAA Federal wrote: > On Sun, Dec 30, 2012 at 3:41 AM, Happyman wrote: >> nums=32 >> rows=120 >> cols=150 >> >> for k in range(0,nums): >> for i in range(0,rows): >> for j in range(0,cols): >> if float ( R[ k ] [ i ] [ j ] ) == >> 0.0: > why the float() -- what data type is R? > >> else: >> val11[ i ] [ j ], val22[ i >> ][ j ] = integrate.quad( lambda x : F1(x)*F2(x) , 0 , pi) > this is odd -- Do F1 and F2 depend on i,j, or k somehow? or are you > somehow integerting over the k-dimension? In which case, I'm guessing > that integration is you time killer anyway -- do some profiling to > know for sure. > > -Chris >