From ndbecker2 at gmail.com Fri Nov 2 10:18:32 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 02 Nov 2012 10:18:32 -0400 Subject: [Numpy-discussion] yet another trivial question Message-ID: I'm trying to convert some matlab code. I see this: b(1)=[]; AFAICT, this removes the first element of the array, shifting the others. What is the preferred numpy equivalent? I'm not sure if b[:] = b[1:] is safe or not From jkington at wisc.edu Fri Nov 2 11:16:21 2012 From: jkington at wisc.edu (Joe Kington) Date: Fri, 02 Nov 2012 10:16:21 -0500 Subject: [Numpy-discussion] yet another trivial question In-Reply-To: References: Message-ID: On Fri, Nov 2, 2012 at 9:18 AM, Neal Becker wrote: > I'm trying to convert some matlab code. I see this: > > b(1)=[]; > > AFAICT, this removes the first element of the array, shifting the others. > > What is the preferred numpy equivalent? > > I'm not sure if > > b[:] = b[1:] > Unless I'm missing something, don't you just want: b = b[1:] > > is safe or not > It's not exactly the same as the matlab equivalent, as matlab will always make a copy, and this will be a view of the same array. For example, if you do something like this: import numpy as np a = np.arange(10) b = a[1:] b[3] = 1000 print a print b You'll see that modifying "b" in-place will modify "a" as well, as "b" is just a view into "a". This wouldn't be the case in matlab (if I remember correctly, anyway...). > _________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.warde.farley at gmail.com Fri Nov 2 15:31:12 2012 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Fri, 2 Nov 2012 15:31:12 -0400 Subject: [Numpy-discussion] yet another trivial question In-Reply-To: References: Message-ID: On Fri, Nov 2, 2012 at 11:16 AM, Joe Kington wrote: > On Fri, Nov 2, 2012 at 9:18 AM, Neal Becker wrote: > >> I'm trying to convert some matlab code. I see this: >> >> b(1)=[]; >> >> AFAICT, this removes the first element of the array, shifting the others. >> >> What is the preferred numpy equivalent? >> >> I'm not sure if >> >> b[:] = b[1:] >> > > Unless I'm missing something, don't you just want: > > b = b[1:] > > >> >> is safe or not >> > > > It's not exactly the same as the matlab equivalent, as matlab will always > make a copy, and this will be a view of the same array. For example, if > you do something like this: > > import numpy as np > > a = np.arange(10) > > b = a[1:] > > b[3] = 1000 > > print a > print b > > You'll see that modifying "b" in-place will modify "a" as well, as "b" is > just a view into "a". This wouldn't be the case in matlab (if I remember > correctly, anyway...) > AFAIK that's correct. The closest thing to Matlab's semantics would probably "b = b[1:].copy()". b[:] = b[1:] would probably result in an error, as the RHS has a first-dimension shape one less than the LHS. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Nov 2 15:54:33 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 2 Nov 2012 19:54:33 +0000 Subject: [Numpy-discussion] yet another trivial question In-Reply-To: References: Message-ID: On Fri, Nov 2, 2012 at 2:18 PM, Neal Becker wrote: > I'm trying to convert some matlab code. I see this: > > b(1)=[]; > > AFAICT, this removes the first element of the array, shifting the others. > > What is the preferred numpy equivalent? 
Perhaps np.delete is what you're looking for. (Esp. if you have multiple items to remove like this, or want to remove something besides the first element in the array.) -n From ralf.gommers at gmail.com Sun Nov 4 14:31:35 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 4 Nov 2012 20:31:35 +0100 Subject: [Numpy-discussion] ticket 2228: Scientific package seeing ABI change in 1.6.x In-Reply-To: References: Message-ID: On Wed, Oct 31, 2012 at 1:05 PM, Charles R Harris wrote: > > > On Tue, Oct 30, 2012 at 9:26 PM, Travis Oliphant wrote: > >> The NPY_CHAR is not a "real type". There are no type-coercion functions >> attached to it nor ufuncs nor a full dtype object. However, it is used >> to mimic old Numeric character arrays (especially for copying a string). >> >> It should have been deprecated before changing the ABI. I don't think it >> was realized that it was part of the ABI (mostly for older codes that >> depended on Numeric). I think it was just another oversight that >> inserting type-codes changes this part of the ABI. >> >> The positive side is that It's a small part of the ABI and not many codes >> should depend on it. At this point, I'm not sure what can be done, except >> to document that NPY_CHAR has been deprecated in 1.7.0 and remove it in >> 1.8.0 to avoid future ABI difficulties. >> >> The short answer, is that codes that use NPY_CHAR must be recompiled to >> be compatible with 1.6.0. >> >> > IIRC, it was proposed to remove it at one point, but the STScI folks > wanted to keep it because their software depended on it. > I can't find that discussion in the list archives. If you know who from STScI to ask about this, can you do so? Is replacing NPY_CHAR with NPY_STRING supposed to just work? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sun Nov 4 21:47:12 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 4 Nov 2012 20:47:12 -0600 Subject: [Numpy-discussion] ticket 2228: Scientific package seeing ABI change in 1.6.x In-Reply-To: References: Message-ID: <346E696A-CE25-48A0-9536-095C795851F5@continuum.io> On Nov 4, 2012, at 1:31 PM, Ralf Gommers wrote: > > > > On Wed, Oct 31, 2012 at 1:05 PM, Charles R Harris wrote: > > > On Tue, Oct 30, 2012 at 9:26 PM, Travis Oliphant wrote: > The NPY_CHAR is not a "real type". There are no type-coercion functions attached to it nor ufuncs nor a full dtype object. However, it is used to mimic old Numeric character arrays (especially for copying a string). > > It should have been deprecated before changing the ABI. I don't think it was realized that it was part of the ABI (mostly for older codes that depended on Numeric). I think it was just another oversight that inserting type-codes changes this part of the ABI. > > The positive side is that It's a small part of the ABI and not many codes should depend on it. At this point, I'm not sure what can be done, except to document that NPY_CHAR has been deprecated in 1.7.0 and remove it in 1.8.0 to avoid future ABI difficulties. > > The short answer, is that codes that use NPY_CHAR must be recompiled to be compatible with 1.6.0. > > > IIRC, it was proposed to remove it at one point, but the STScI folks wanted to keep it because their software depended on it. > > I can't find that discussion in the list archives. If you know who from STScI to ask about this, can you do so? > > Is replacing NPY_CHAR with NPY_STRING supposed to just work? No, it's a little more complicated than that, but not too much. 
Code that uses the NPY_CHAR type can be changed fairly easily to use the NPY_STRING type, but it does take some re-writing. The NPY_CHAR field was added so that code written for Numeric (like ScientificPython's netcdf reader) would continue to "just work" with no changes and behave similarly to how it behaved with Numeric's character type. Unfortunately, adding it to the end of the TYPE list does prevent adding any more types without breaking at least this part of the ABI. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Nov 6 02:33:41 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 6 Nov 2012 01:33:41 -0600 Subject: [Numpy-discussion] NumFOCUS has received 501(c)3 status Message-ID: <746F39AF-FF3B-4E03-8AB2-332E94E6F5A2@continuum.io> Hello all, I'm really happy to report that NumFOCUS has received it's 501(c)3 status from the IRS. You can now make tax-deductible donations to NumFOCUS for the support of NumPy. We will put a NumPy-specific button on the home-page of NumPy soon so you can specifically direct your funds. But, for now you can go to http://numfocus.org/donate and be confident that your funds will support: 1) Continuous integration 2) The John Hunter Technical fellowships (which are awards made to students and post-docs and their mentors who will contribute substantially to a supported project during a 3-18 month period). 3) Equipment grants 4) Development sprints 5) Student travel to conferences 6) Project specific grants For example, most of Ondrej's time to work on the release of NumPy 1.7.0 has been paid for by donations to NumFOCUS from Continuum Analytics. NumFOCUS is also seeking nominations for 5 new board members (to bring the total to 9). If you would like to nominate someone please subscribe to numfocus at googlegroups.com (by sending an email to numfocus+subscribe at googlegroups.com) and then send your nomination. Alternatively, you can email me or one of the other directors directly. Best, -Travis From travis at continuum.io Tue Nov 6 02:33:50 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 6 Nov 2012 01:33:50 -0600 Subject: [Numpy-discussion] 1.7.0 release Message-ID: Hey all, Ondrej has been tied up finishing his PhD for the past several weeks. He is defending his work shortly and should be available to continue to help with the 1.7.0 release around the first of December. He and I have been in contact during this process, and I've been helping where I can. Fortunately, other NumPy developers have been active closing tickets and reviewing pull requests which has helped the process substantially. The release has taken us longer than we expected, but I'm really glad that we've received the bug-reports and issues that we have seen because it will help the 1.7.0 release be a more stable series. Also, the merging of the Trac issues with Git has exposed over-looked problems as well and will hopefully encourage more Git-focused participation by users. We are targeting getting the final release of 1.7.0 out by mid December (based on Ondrej's availability). But, I would like to find out which issues are seen as blockers by people on this list. I think most of the issues that I had as blockers have been resolved. If there are no more remaining blockers, then we may be able to accelerate the final release of 1.7.0 to just after Thanksgiving. 
Best regards, -Travis From lists at hilboll.de Tue Nov 6 09:59:38 2012 From: lists at hilboll.de (Andreas Hilboll) Date: Tue, 6 Nov 2012 15:59:38 +0100 Subject: [Numpy-discussion] pyhdf packaging Message-ID: Hi, I would like to package pyhdf for Ubuntu and make the package publicly available. Since the license is not totally clear to me (I cannot find any information in the sources, and the cheeseshop says "public", which doesn't mean anything to me), I tried to contact the maintainer, Andre Gosselin, however the email bounces, so I guess he's gone. Can anyone point me to how to proceed from here? Cheers, Andreas. From p.j.a.cock at googlemail.com Tue Nov 6 12:49:39 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 6 Nov 2012 17:49:39 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 Message-ID: Dear all, Since the NumPy 1.7.0b2 release didn't include a Windows (32 bit) installer for Python 3.3, I am considering compiling it myself for local testing. What compiler is recommended? Thanks, Peter From nouiz at nouiz.org Tue Nov 6 16:16:23 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 6 Nov 2012 16:16:23 -0500 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: Hi, I updated the numpy master and recompiled it. I still have the compilation error I got from Theano. I'll pop up that email thread again to have the history and I made a PR for this. Also, I think I said that numpy.ndindex changed its interface, in the past numpy.ndindex() was valid, not this raise and error: >>> import numpy >>> a=numpy.ndindex() >>> a.next() () >>> a.next() Traceback (most recent call last): File "", line 1, in File "/opt/lisa/os/epd-7.1.2/lib/python2.7/site-packages/numpy/lib/index_tricks.py", line 577, in next raise StopIteration StopIteration >>> numpy.__version__ '1.6.1' The error I have with master: [...] ValueError: __array_interface__ shape must be at least size 1 That is the only stopper I saw, but I didn't followed what was needed for other people. Fred On Tue, Nov 6, 2012 at 2:33 AM, Travis Oliphant wrote: > Hey all, > > Ondrej has been tied up finishing his PhD for the past several weeks. He > is defending his work shortly and should be available to continue to help > with the 1.7.0 release around the first of December. He and I have been > in contact during this process, and I've been helping where I can. > Fortunately, other NumPy developers have been active closing tickets and > reviewing pull requests which has helped the process substantially. > > The release has taken us longer than we expected, but I'm really glad that > we've received the bug-reports and issues that we have seen because it will > help the 1.7.0 release be a more stable series. Also, the merging of the > Trac issues with Git has exposed over-looked problems as well and will > hopefully encourage more Git-focused participation by users. > > We are targeting getting the final release of 1.7.0 out by mid December > (based on Ondrej's availability). But, I would like to find out which > issues are seen as blockers by people on this list. I think most of the > issues that I had as blockers have been resolved. If there are no more > remaining blockers, then we may be able to accelerate the final release of > 1.7.0 to just after Thanksgiving. 
> > Best regards, > > -Travis > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Tue Nov 6 16:17:09 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 6 Nov 2012 16:17:09 -0500 Subject: [Numpy-discussion] np 1.7b2 PyArray_BYTES(obj)=ptr fail In-Reply-To: References: Message-ID: Hi, I made a PR with my fix: https://github.com/numpy/numpy/pull/2709 Fr?d?ric On Tue, Oct 2, 2012 at 6:18 PM, Charles R Harris wrote: > > > On Tue, Oct 2, 2012 at 1:44 PM, Fr?d?ric Bastien wrote: > >> With numpy 1.6.2, it is working. So this is an interface change. Are >> you sure you want this? This break existing code. >> >> I do not understand what you mean by slot? >> > > Pythonese for structure member ;) > > >> >> I'm not sure is the PyArray_SWAP is a good long term idea. I would not >> make it if it is only for temporarily. >> > > The C++ stdlib provides something similar for std::vector. One common use > case would be to pass in a vector by reference that gets swapped with one > on the stack. When the function exits the one on the stack is cleaned up > and the vector that was passed in has the new data, but it has to be the > same type. > > For PyArray_SWAP I was thinking of swapping everything: type, dims, > strides, data, etc. That is what f2py does. > > >> To set the base ptr, there is PyArray_SetBaseObject() fct that is new >> in 1.7. Is a similar fct useful in the long term for numpy? In the >> case where we implement differently the ndarray object, I think it >> won't be useful. We will also need to know how the memory is laid out >> by numpy for performance critical code. We we will need an attribute >> that tell the intern structure used. >> >> So do you want to force this interface change in numpy 1.7 so that I >> need to write codes now or can I wait to do it when you force the new >> interface? >> >> > Well, no we don't want to force you to use the new interface. If you don't > define NPY_NO_DEPRECATED_API things should still work. Although if it is > defined the function returns an rvalue, so some other method needs to be > provided for what you are doing. > > >> Currently the used code for PyArray_BYTES is: >> >> #define PyArray_BYTES(obj) ((char *)(((PyArrayObject_fields >> *)(obj))->data)) >> >> if I change it to >> >> #define PyArray_BYTES(obj) ((((PyArrayObject_fields *)(obj))->data)) >> >> it work! I don't understand why removing the case make it work. the >> data field is already an (char*), so this should not make a difference >> to my underderstanding. But I'm missing something here, do someone >> know? >> > > What I find strange is that it is the same macro in 1.7 and 1.6, only the > name of the structure was changed. Hmm... This looks almost like some > compiler subtlety, I wonder if the compiler version/optimization flags have > changed? In any case, I think the second form would be more correct for the > lvalue since the structure member is, as you say, already a char*. > > We want things to work for you as they should, so we need to understand > this and fix it. 
> > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From phillip.m.feldman at gmail.com Tue Nov 6 21:27:21 2012 From: phillip.m.feldman at gmail.com (Phillip Feldman) Date: Tue, 6 Nov 2012 18:27:21 -0800 Subject: [Numpy-discussion] strange behavior of numpy.unique Message-ID: numpy.unique behaves as I would expect for small inputs like the following: In [12]: x= [0, 0, 1, 0, 1, 2, 0, 1, 2, 3] In [13]: unique(x, return_index=True) Out[13]: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64)) But, when I give it something larger, the return index values do not always correspond to the first occurrences in the input. The documentation is silent on the question of how the return index values are chosen when a given element of x appears more than once. Either the documentation should be clarified, or better yet, the behavior should be changed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Tue Nov 6 21:52:24 2012 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Tue, 6 Nov 2012 20:52:24 -0600 Subject: [Numpy-discussion] strange behavior of numpy.unique In-Reply-To: References: Message-ID: On Tue, Nov 6, 2012 at 8:27 PM, Phillip Feldman wrote: > numpy.unique behaves as I would expect for small inputs like the following: > > In [12]: x= [0, 0, 1, 0, 1, 2, 0, 1, 2, 3] > > In [13]: unique(x, return_index=True) > Out[13]: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64)) > > But, when I give it something larger, the return index values do not > always correspond to the first occurrences in the input. The documentation > is silent on the question of how the return index values are chosen when a > given element of x appears more than once. Either the documentation should > be > clarified, or better yet, the behavior should be changed. > In fact, it was changed (in the master branch on github) several months ago, but there has not yet been a release with the changes. The sort method that np.unique passes to np.argsort is now 'mergesort', and the docstring states that the indices returned are for the first occurrences of the unique elements. The new docstring is here: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.unique.html#numpy.unique See https://github.com/numpy/numpy/commit/dbf235169ed3386b359caaa9217f5280bf1d6749for the commit, and https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py for the latest version of the source. Warren > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Wed Nov 7 07:35:24 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 07 Nov 2012 07:35:24 -0500 Subject: [Numpy-discussion] testing with amd libm/acml Message-ID: I'm trying to do a bit of benchmarking to see if amd libm/acml will help me. I got an idea that instead of building all of numpy/scipy and all of my custom modules against these libraries, I could simply use: LD_PRELOAD=/opt/amdlibm-3.0.2/lib/dynamic/libamdlibm.so:/opt/acml5.2.0/gfortran64/lib/libacml.so I'm hoping that both numpy and my own dll's then will take advantage of these libraries. 
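One way I could sanity-check whether the preload actually rebinds anything (a rough sketch only; the paths are just the ones above, and I'm assuming a glibc dynamic linker, since LD_DEBUG is a glibc feature):

LD_PRELOAD=/opt/amdlibm-3.0.2/lib/dynamic/libamdlibm.so:/opt/acml5.2.0/gfortran64/lib/libacml.so \
LD_DEBUG=bindings \
python -c "import numpy as np; np.exp(np.ones(10))" 2>&1 | grep -E "symbol .(exp|log)"

# if the preload takes effect, the "binding ... symbol" lines for exp/log should
# point at libamdlibm.so rather than the system libm.so.6
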
Do you think this will work? From cournape at gmail.com Wed Nov 7 08:34:17 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Nov 2012 13:34:17 +0000 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: References: Message-ID: On Wed, Nov 7, 2012 at 12:35 PM, Neal Becker wrote: > I'm trying to do a bit of benchmarking to see if amd libm/acml will help me. > > I got an idea that instead of building all of numpy/scipy and all of my custom > modules against these libraries, I could simply use: > > LD_PRELOAD=/opt/amdlibm-3.0.2/lib/dynamic/libamdlibm.so:/opt/acml5.2.0/gfortran64/lib/libacml.so > > > I'm hoping that both numpy and my own dll's then will take advantage of these > libraries. > > Do you think this will work? Quite unlikely depending on your configuration, because those libraries are rarely if ever ABI compatible (that's why it is such a pain to support). David From ndbecker2 at gmail.com Wed Nov 7 08:56:41 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 07 Nov 2012 08:56:41 -0500 Subject: [Numpy-discussion] testing with amd libm/acml References: Message-ID: David Cournapeau wrote: > On Wed, Nov 7, 2012 at 12:35 PM, Neal Becker wrote: >> I'm trying to do a bit of benchmarking to see if amd libm/acml will help me. >> >> I got an idea that instead of building all of numpy/scipy and all of my >> custom modules against these libraries, I could simply use: >> >> LD_PRELOAD=/opt/amdlibm-3.0.2/lib/dynamic/libamdlibm.so:/opt/acml5.2.0/gfortran64/lib/libacml.so >> >> >> I'm hoping that both numpy and my own dll's then will take advantage of these >> libraries. >> >> Do you think this will work? > > Quite unlikely depending on your configuration, because those > libraries are rarely if ever ABI compatible (that's why it is such a > pain to support). > > David When you say quite unlikely (to work), you mean a) unlikely that libm/acml will be used to resolve symbols in numpy/dlls at runtime (e.g., exp)? or b) program may produce wrong results and/or crash ? From cournape at gmail.com Wed Nov 7 09:06:41 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Nov 2012 14:06:41 +0000 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: References: Message-ID: On Wed, Nov 7, 2012 at 1:56 PM, Neal Becker wrote: > David Cournapeau wrote: > >> On Wed, Nov 7, 2012 at 12:35 PM, Neal Becker wrote: >>> I'm trying to do a bit of benchmarking to see if amd libm/acml will help me. >>> >>> I got an idea that instead of building all of numpy/scipy and all of my >>> custom modules against these libraries, I could simply use: >>> >>> > LD_PRELOAD=/opt/amdlibm-3.0.2/lib/dynamic/libamdlibm.so:/opt/acml5.2.0/gfortran64/lib/libacml.so >>> >>> >>> I'm hoping that both numpy and my own dll's then will take advantage of these >>> libraries. >>> >>> Do you think this will work? >> >> Quite unlikely depending on your configuration, because those >> libraries are rarely if ever ABI compatible (that's why it is such a >> pain to support). >> >> David > > When you say quite unlikely (to work), you mean > > a) unlikely that libm/acml will be used to resolve symbols in numpy/dlls at > runtime (e.g., exp)? > > or > > b) program may produce wrong results and/or crash ? Both, actually. That's not something I would use myself. Did you try openblas ? 
It is open source, simple to build, and is pretty fast, David From ndbecker2 at gmail.com Wed Nov 7 09:28:32 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 07 Nov 2012 09:28:32 -0500 Subject: [Numpy-discussion] testing with amd libm/acml References: Message-ID: David Cournapeau wrote: > On Wed, Nov 7, 2012 at 1:56 PM, Neal Becker wrote: >> David Cournapeau wrote: >> >>> On Wed, Nov 7, 2012 at 12:35 PM, Neal Becker wrote: >>>> I'm trying to do a bit of benchmarking to see if amd libm/acml will help >>>> me. >>>> >>>> I got an idea that instead of building all of numpy/scipy and all of my >>>> custom modules against these libraries, I could simply use: >>>> >>>> >> LD_PRELOAD=/opt/amdlibm-3.0.2/lib/dynamic/libamdlibm.so:/opt/acml5.2.0/gfortran64/lib/libacml.so >>>> >>>> >>>> I'm hoping that both numpy and my own dll's then will take advantage of >>>> these libraries. >>>> >>>> Do you think this will work? >>> >>> Quite unlikely depending on your configuration, because those >>> libraries are rarely if ever ABI compatible (that's why it is such a >>> pain to support). >>> >>> David >> >> When you say quite unlikely (to work), you mean >> >> a) unlikely that libm/acml will be used to resolve symbols in numpy/dlls at >> runtime (e.g., exp)? >> >> or >> >> b) program may produce wrong results and/or crash ? > > Both, actually. That's not something I would use myself. Did you try > openblas ? It is open source, simple to build, and is pretty fast, > > David Actually, for my current work, I'm more concerned with speeding up operations such as exp, log and basic vector arithmetic. Any thoughts on that? From ndbecker2 at gmail.com Wed Nov 7 09:30:28 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 07 Nov 2012 09:30:28 -0500 Subject: [Numpy-discussion] testing with amd libm/acml References: Message-ID: David Cournapeau wrote: > On Wed, Nov 7, 2012 at 1:56 PM, Neal Becker wrote: >> David Cournapeau wrote: >> >>> On Wed, Nov 7, 2012 at 12:35 PM, Neal Becker wrote: >>>> I'm trying to do a bit of benchmarking to see if amd libm/acml will help >>>> me. >>>> >>>> I got an idea that instead of building all of numpy/scipy and all of my >>>> custom modules against these libraries, I could simply use: >>>> >>>> >> LD_PRELOAD=/opt/amdlibm-3.0.2/lib/dynamic/libamdlibm.so:/opt/acml5.2.0/gfortran64/lib/libacml.so >>>> >>>> >>>> I'm hoping that both numpy and my own dll's then will take advantage of >>>> these libraries. >>>> >>>> Do you think this will work? >>> >>> Quite unlikely depending on your configuration, because those >>> libraries are rarely if ever ABI compatible (that's why it is such a >>> pain to support). >>> >>> David >> >> When you say quite unlikely (to work), you mean >> >> a) unlikely that libm/acml will be used to resolve symbols in numpy/dlls at >> runtime (e.g., exp)? >> >> or >> >> b) program may produce wrong results and/or crash ? > > Both, actually. That's not something I would use myself. Did you try > openblas ? 
It is open source, simple to build, and is pretty fast, > > David In my current work, probably the largest bottlenecks are 'max*', which are log (\sum e^(x_i)) From d.s.seljebotn at astro.uio.no Wed Nov 7 09:52:28 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 07 Nov 2012 15:52:28 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: References: Message-ID: <509A75AC.3010206@astro.uio.no> On 11/07/2012 03:30 PM, Neal Becker wrote: > David Cournapeau wrote: > >> On Wed, Nov 7, 2012 at 1:56 PM, Neal Becker wrote: >>> David Cournapeau wrote: >>> >>>> On Wed, Nov 7, 2012 at 12:35 PM, Neal Becker wrote: >>>>> I'm trying to do a bit of benchmarking to see if amd libm/acml will help >>>>> me. >>>>> >>>>> I got an idea that instead of building all of numpy/scipy and all of my >>>>> custom modules against these libraries, I could simply use: >>>>> >>>>> >>> > LD_PRELOAD=/opt/amdlibm-3.0.2/lib/dynamic/libamdlibm.so:/opt/acml5.2.0/gfortran64/lib/libacml.so >>>>> >>>>> >>>>> I'm hoping that both numpy and my own dll's then will take advantage of >>>>> these libraries. >>>>> >>>>> Do you think this will work? >>>> >>>> Quite unlikely depending on your configuration, because those >>>> libraries are rarely if ever ABI compatible (that's why it is such a >>>> pain to support). >>>> >>>> David >>> >>> When you say quite unlikely (to work), you mean >>> >>> a) unlikely that libm/acml will be used to resolve symbols in numpy/dlls at >>> runtime (e.g., exp)? >>> >>> or >>> >>> b) program may produce wrong results and/or crash ? >> >> Both, actually. That's not something I would use myself. Did you try >> openblas ? It is open source, simple to build, and is pretty fast, >> >> David > > In my current work, probably the largest bottlenecks are 'max*', which are > > log (\sum e^(x_i)) numexpr with Intel VML is the solution I know of that doesn't require you to dig into compiling C code yourself. Did you look into that or is using Intel VML/MKL not an option? Fast exps depend on the CPU evaluating many exp's at the same time (both explicit through vector registers, and implicit through pipelining); even if you get what you try to work (which is unlikely I think) the approach is inherently slow, since just passing a single number at the time through the "exp" function can't be efficient. Dag Sverre From josef.pktd at gmail.com Wed Nov 7 12:24:04 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 7 Nov 2012 12:24:04 -0500 Subject: [Numpy-discussion] strange behavior of numpy.unique In-Reply-To: References: Message-ID: On Tue, Nov 6, 2012 at 9:52 PM, Warren Weckesser wrote: > > > On Tue, Nov 6, 2012 at 8:27 PM, Phillip Feldman > wrote: >> >> numpy.unique behaves as I would expect for small inputs like the >> following: >> >> In [12]: x= [0, 0, 1, 0, 1, 2, 0, 1, 2, 3] >> >> In [13]: unique(x, return_index=True) >> Out[13]: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64)) >> >> But, when I give it something larger, the return index values do not >> always correspond to the first occurrences in the input. The documentation >> is silent on the question of how the return index values are chosen when a >> given element of x appears more than once. Either the documentation should >> be >> clarified, or better yet, the behavior should be changed. > > > > In fact, it was changed (in the master branch on github) several months ago, > but there has not yet been a release with the changes. 
The sort method that > np.unique passes to np.argsort is now 'mergesort', and the docstring states > that the indices returned are for the first occurrences of the unique > elements. The new docstring is here: > http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.unique.html#numpy.unique > > See > https://github.com/numpy/numpy/commit/dbf235169ed3386b359caaa9217f5280bf1d6749 > for the commit, and > https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py for the > latest version of the source. I think it's in 1.6.2 and it broke return_index for structured dtypes, IIRC. Josef > > Warren > > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ocefpaf at gmail.com Wed Nov 7 13:28:23 2012 From: ocefpaf at gmail.com (Filipe Pires Alvarenga Fernandes) Date: Wed, 7 Nov 2012 16:28:23 -0200 Subject: [Numpy-discussion] Help compiling numpy with new gcc Message-ID: Hi I am trying to compile numpy with gcc 4.7.1 and I am having the following issue. "RuntimeError: Broken toolchain: cannot link a simple C program" I noticed that I need to pass the flag '-fno-use-linker-plugin' to be able to compile it. However, even though I did pass it by exporting the CFLAGS, it does not work. I guess that numpy do not use the CFLAGS for its internal extensions. How can I pass that option to it? Error below: [ 11s] compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.7 -c' [ 11s] gcc: _configtest.c [ 11s] gcc -pthread _configtest.o -o _configtest [ 11s] gcc: fatal error: -fuse-linker-plugin, but liblto_plugin.so not found [ 11s] compilation terminated. [ 11s] gcc: fatal error: -fuse-linker-plugin, but liblto_plugin.so not found [ 11s] compilation terminated. [ 11s] failure. 
[ 11s] removing: _configtest.c _configtest.o [ 11s] Traceback (most recent call last): [ 11s] File "setup.py", line 214, in [ 11s] setup_package() [ 11s] File "setup.py", line 207, in setup_package [ 11s] configuration=configuration ) [ 11s] File "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/core.py", line 186, in setup [ 11s] return old_setup(**new_attr) [ 11s] File "/usr/lib64/python2.7/distutils/core.py", line 152, in setup [ 11s] dist.run_commands() [ 11s] File "/usr/lib64/python2.7/distutils/dist.py", line 953, in run_commands [ 11s] self.run_command(cmd) [ 11s] File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command [ 11s] cmd_obj.run() [ 11s] File "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build.py", line 37, in run [ 11s] old_build.run(self) [ 11s] File "/usr/lib64/python2.7/distutils/command/build.py", line 127, in run [ 11s] self.run_command(cmd_name) [ 11s] File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in run_command [ 11s] self.distribution.run_command(command) [ 11s] File "/usr/lib64/python2.7/distutils/dist.py", line 972, in run_command [ 11s] cmd_obj.run() [ 11s] File "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build_src.py", line 152, in run [ 11s] self.build_sources() [ 11s] File "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build_src.py", line 163, in build_sources [ 11s] self.build_library_sources(*libname_info) [ 11s] File "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build_src.py", line 298, in build_library_sources [ 11s] sources = self.generate_sources(sources, (lib_name, build_info)) [ 11s] File "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build_src.py", line 385, in generate_sources [ 11s] source = func(extension, build_dir) [ 11s] File "numpy/core/setup.py", line 648, in get_mathlib_info [ 11s] raise RuntimeError("Broken toolchain: cannot link a simple C program") [ 11s] RuntimeError: Broken toolchain: cannot link a simple C program [ 11s] error: Bad exit status from /var/tmp/rpm-tmp.yO2SIE (%build) [ 11s] [ 11s] [ 11s] RPM build errors: [ 11s] Bad exit status from /var/tmp/rpm-tmp.yO2SIE (%build) From warren.weckesser at gmail.com Wed Nov 7 14:18:08 2012 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 7 Nov 2012 13:18:08 -0600 Subject: [Numpy-discussion] strange behavior of numpy.unique In-Reply-To: References: Message-ID: On Wed, Nov 7, 2012 at 11:24 AM, wrote: > On Tue, Nov 6, 2012 at 9:52 PM, Warren Weckesser > wrote: > > > > > > On Tue, Nov 6, 2012 at 8:27 PM, Phillip Feldman > > wrote: > >> > >> numpy.unique behaves as I would expect for small inputs like the > >> following: > >> > >> In [12]: x= [0, 0, 1, 0, 1, 2, 0, 1, 2, 3] > >> > >> In [13]: unique(x, return_index=True) > >> Out[13]: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64)) > >> > >> But, when I give it something larger, the return index values do not > >> always correspond to the first occurrences in the input. The > documentation > >> is silent on the question of how the return index values are chosen > when a > >> given element of x appears more than once. Either the documentation > should > >> be > >> clarified, or better yet, the behavior should be changed. > > > > > > > > In fact, it was changed (in the master branch on github) several months > ago, > > but there has not yet been a release with the changes. 
The sort method > that > > np.unique passes to np.argsort is now 'mergesort', and the docstring > states > > that the indices returned are for the first occurrences of the unique > > elements. The new docstring is here: > > > http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.unique.html#numpy.unique > > > > See > > > https://github.com/numpy/numpy/commit/dbf235169ed3386b359caaa9217f5280bf1d6749 > > for the commit, and > > https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py for > the > > latest version of the source. > > I think it's in 1.6.2 and it broke return_index for structured dtypes, > IIRC. > > You are correct, Josef, that change is in 1.6.2. Thanks. Warren Josef > > > > > > Warren > > > > > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Wed Nov 7 14:41:22 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 07 Nov 2012 14:41:22 -0500 Subject: [Numpy-discussion] testing with amd libm/acml References: <509A75AC.3010206@astro.uio.no> Message-ID: Would you expect numexpr without MKL to give a significant boost? From charlesr.harris at gmail.com Wed Nov 7 16:48:05 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Nov 2012 14:48:05 -0700 Subject: [Numpy-discussion] strange behavior of numpy.unique In-Reply-To: References: Message-ID: On Tue, Nov 6, 2012 at 7:52 PM, Warren Weckesser wrote: > > > On Tue, Nov 6, 2012 at 8:27 PM, Phillip Feldman < > phillip.m.feldman at gmail.com> wrote: > >> numpy.unique behaves as I would expect for small inputs like the >> following: >> >> In [12]: x= [0, 0, 1, 0, 1, 2, 0, 1, 2, 3] >> >> In [13]: unique(x, return_index=True) >> Out[13]: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64)) >> >> But, when I give it something larger, the return index values do not >> always correspond to the first occurrences in the input. The documentation >> is silent on the question of how the return index values are chosen when a >> given element of x appears more than once. Either the documentation should >> be >> clarified, or better yet, the behavior should be changed. >> > > > In fact, it was changed (in the master branch on github) several months > ago, but there has not yet been a release with the changes. The sort > method that np.unique passes to np.argsort is now 'mergesort', and the > docstring states that the indices returned are for the first occurrences of > the unique elements. The new docstring is here: > http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.unique.html#numpy.unique > > See > https://github.com/numpy/numpy/commit/dbf235169ed3386b359caaa9217f5280bf1d6749for the commit, and > https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py for > the latest version of the source. > > That change was backported to 1.6.2, but doesn't work for record/object arrays. That oversight is fixed in master. 
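For reference, on a build with the fix the small example from the start of this thread comes out deterministically (just a minimal check; the expected values are the first-occurrence indices already shown earlier in the thread):

import numpy as np

x = np.array([0, 0, 1, 0, 1, 2, 0, 1, 2, 3])
vals, idx = np.unique(x, return_index=True)
# np.unique now sorts with a stable 'mergesort', so idx points at the
# first occurrence of each unique value
print(vals)   # [0 1 2 3]
print(idx)    # [0 2 5 9]
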
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From scheffer.nicolas at gmail.com Wed Nov 7 17:41:37 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Wed, 7 Nov 2012 14:41:37 -0800 Subject: [Numpy-discussion] Scipy dot Message-ID: Hi, I've written a snippet of code that we could call scipy.dot, a drop-in replacement for numpy.dot. It's dead easy, and just answer the need of calling the right blas function depending on the type of arrays, C or F order (aka slowness of np.dot(A, A.T)) While this is not the scipy mailing list, I was wondering if this snippet would relevant and/or useful to others, numpy folks, scipy folks or could be integrated directly in numpy (so that we keep the nice A.dot(B) syntax) This bottleneck of temporary copies has been a problem for lots of users and it seems everybody has their own snippets. This code is probably not written as it should, I hope the community can help improving it! ;) First FIXME is to make it work for arrays of dimensions other than 2. Suggestions highly appreciated! Thanks! === Code (also on http://pastebin.com/QrRk0kEf) def dot(A, B, out=None): """ A drop in replaement for numpy.dot Computes A.B optimized using fblas call note: unlike in numpy the returned array is in F order""" import scipy.linalg as sp gemm = sp.get_blas_funcs('gemm', arrays=(A,B)) if out is None: lda, x, y, ldb = A.shape + B.shape if x != y: raise ValueError("matrices are not aligned") dtype = np.max([x.dtype for x in (A, B)]) out = np.empty((lda, ldb), dtype, order='F') if A.flags.c_contiguous and B.flags.c_contiguous: gemm(alpha=1., a=A.T, b=B.T, trans_a=True, trans_b=True, c=out, overwrite_c=True) if A.flags.c_contiguous and B.flags.f_contiguous: gemm(alpha=1., a=A.T, b=B, trans_a=True, c=out, overwrite_c=True) if A.flags.f_contiguous and B.flags.c_contiguous: gemm(alpha=1., a=A, b=B.T, trans_b=True, c=out, overwrite_c=True) if A.flags.f_contiguous and B.flags.f_contiguous: gemm(alpha=1., a=A, b=B, c=out, overwrite_c=True) return out == Timing (EPD, MKL): In [15]: A = np.array(np.random.randn(1000, 1000), 'f') In [16]: %timeit np.dot(A, A) 100 loops, best of 3: 7.19 ms per loop In [17]: %timeit np.dot(A.T, A.T) 10 loops, best of 3: 27.7 ms per loop In [18]: %timeit np.dot(A, A.T) 100 loops, best of 3: 18.3 ms per loop In [19]: %timeit np.dot(A.T, A) 100 loops, best of 3: 18.7 ms per loop In [20]: %timeit dot(A, A) 100 loops, best of 3: 7.16 ms per loop In [21]: %timeit dot(A.T, A.T) 100 loops, best of 3: 6.67 ms per loop In [22]: %timeit dot(A, A.T) 100 loops, best of 3: 6.79 ms per loop In [23]: %timeit dot(A.T, A) 100 loops, best of 3: 7.02 ms per loop From chris.barker at noaa.gov Wed Nov 7 18:35:55 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 7 Nov 2012 15:35:55 -0800 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: References: <509A75AC.3010206@astro.uio.no> Message-ID: On Wed, Nov 7, 2012 at 11:41 AM, Neal Becker wrote: > Would you expect numexpr without MKL to give a significant boost? It can, depending on the use case: -- It can remove a lot of uneccessary temporary creation. -- IIUC, it works on blocks of data at a time, and thus can keep things in cache more when working with large data sets. -- It can (optionally) use multiple threads for easy parallelization. All you can do is try it on your use-case and see what you get. It's a pretty light lift to try. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From francesc at continuum.io Thu Nov 8 04:33:36 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 08 Nov 2012 10:33:36 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: References: <509A75AC.3010206@astro.uio.no> Message-ID: <509B7C70.9010508@continuum.io> On 11/7/12 8:41 PM, Neal Becker wrote: > Would you expect numexpr without MKL to give a significant boost? Yes. Have a look at how numexpr's own multi-threaded virtual machine compares with numexpr using VML: http://code.google.com/p/numexpr/wiki/NumexprVML As it can be seen, the best results are obtained by using the multi-threaded VM in numexpr in combination with a single-threaded VML engine. Caution: I did these benchmarks some time ago (couple of years?), so it might be that multi-threaded VML would have improved by now. If performance is critical, some experiments should be done first so as to find the optimal configuration. At any rate, VML will let you to optimally leverage the SIMD instructions in the cores, allowing to compute, for example, exp() in 1 or 2 clock cycles (depending on the vector length, the number of cores in your system and the data precision): http://software.intel.com/sites/products/documentation/hpc/mkl/vml/functions/exp.html Pretty amazing. -- Francesc Alted From francesc at continuum.io Thu Nov 8 05:22:08 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 08 Nov 2012 11:22:08 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: References: <509A75AC.3010206@astro.uio.no> Message-ID: <509B87D0.60905@continuum.io> On 11/8/12 12:35 AM, Chris Barker wrote: > On Wed, Nov 7, 2012 at 11:41 AM, Neal Becker wrote: >> Would you expect numexpr without MKL to give a significant boost? > It can, depending on the use case: > -- It can remove a lot of uneccessary temporary creation. > -- IIUC, it works on blocks of data at a time, and thus can keep > things in cache more when working with large data sets. Well, the temporaries are still created, but the thing is that, by working with small blocks at a time, these temporaries fit in CPU cache, preventing copies into main memory. I like to name this the 'blocking technique', as explained in slide 26 (and following) in: https://python.g-node.org/wiki/_media/starving_cpu/starving-cpu.pdf A better technique is to reduce the block size to the minimal expression (1 element), so temporaries are stored in registers in CPU instead of small blocks in cache, hence preventing copies even in *cache*. Numba (https://github.com/numba/numba) follows this approach, which is pretty optimal as can be seen in slide 37 of the lecture above. > -- It can (optionally) use multiple threads for easy parallelization. No, the *total* amount of cores detected in the system is the default in numexpr; if you want less, you will need to use set_num_threads(nthreads) function. But agreed, sometimes using too many threads could effectively be counter-producing. -- Francesc Alted From njs at pobox.com Thu Nov 8 06:28:21 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 8 Nov 2012 11:28:21 +0000 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: Message-ID: On Wed, Nov 7, 2012 at 10:41 PM, Nicolas SCHEFFER wrote: > Hi, > > I've written a snippet of code that we could call scipy.dot, a drop-in > replacement for numpy.dot. 
> It's dead easy, and just answer the need of calling the right blas > function depending on the type of arrays, C or F order (aka slowness > of np.dot(A, A.T)) > > While this is not the scipy mailing list, I was wondering if this > snippet would relevant and/or useful to others, numpy folks, scipy > folks or could be integrated directly in numpy (so that we keep the > nice A.dot(B) syntax) I think everyone would be very happy to see numpy.dot modified to do this automatically. But adding a scipy.dot IMHO would be fixing things in the wrong place and just create extra confusion. Is it possible to avoid changing the default output order from C to F? (E.g. by transposing everything relative to what you have now?) That seems like a change that would be good to avoid if it's easy. -n From gael.varoquaux at normalesup.org Thu Nov 8 07:07:25 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 8 Nov 2012 13:07:25 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: Message-ID: <20121108120725.GL313@phare.normalesup.org> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: > I think everyone would be very happy to see numpy.dot modified to do > this automatically. But adding a scipy.dot IMHO would be fixing things > in the wrong place and just create extra confusion. I am not sure I agree: numpy is often compiled without lapack support, as it is not necessary. On the other hand scipy is always compiled with lapack. Thus this makes more sens in scipy. > Is it possible to avoid changing the default output order from C to F? +1 G From d.s.seljebotn at astro.uio.no Thu Nov 8 07:12:08 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 08 Nov 2012 13:12:08 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: <20121108120725.GL313@phare.normalesup.org> References: <20121108120725.GL313@phare.normalesup.org> Message-ID: <509BA198.4070301@astro.uio.no> On 11/08/2012 01:07 PM, Gael Varoquaux wrote: > On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: >> I think everyone would be very happy to see numpy.dot modified to do >> this automatically. But adding a scipy.dot IMHO would be fixing things >> in the wrong place and just create extra confusion. > > I am not sure I agree: numpy is often compiled without lapack support, as > it is not necessary. On the other hand scipy is always compiled with > lapack. Thus this makes more sens in scipy. Well, numpy.dot already contains multiple fallback cases for when it is compiled with BLAS and not. So I'm +1 on just making this an improvement on numpy.dot. I don't think there's a time when you would not want to use this (provided the output order issue is fixed), and it doesn't make sense to not have old codes take advantage of the speed improvement. BTW, something this doesn't fix is that you still have to do "np.dot(x.conjugate().t, x)" in the complex case which needlessly copies the data since LAPACK can do the conjugation. DS > >> Is it possible to avoid changing the default output order from C to F? > > +1 > > G > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ndbecker2 at gmail.com Thu Nov 8 07:37:53 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 08 Nov 2012 07:37:53 -0500 Subject: [Numpy-discussion] numexpr question Message-ID: I'm interested in trying numexpr, but have a question (not sure where's the best forum to ask). 
The examples I see use ne.evaluate ("some string...") When used within a loop, I would expect the compilation from the string form to add significant overhead. I would have thought a pre-compiled form would be available, similar to a precompiled regexp. yes/no? From d.s.seljebotn at astro.uio.no Thu Nov 8 07:41:45 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 08 Nov 2012 13:41:45 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: References: <509A75AC.3010206@astro.uio.no> Message-ID: <509BA889.9070102@astro.uio.no> On 11/07/2012 08:41 PM, Neal Becker wrote: > Would you expect numexpr without MKL to give a significant boost? If you need higher performance than what numexpr can give without using MKL, you could look at code such as this: https://github.com/herumi/fmath/blob/master/fmath.hpp#L480 But that means going to C (e.g., by wrapping that function in Cython). Pay attention to what range you evaluate the function in though (my eyes may deceive me but it seems that the test program only test for arguments drawn from the standard Gaussian which is a bit limited..) Dag Sverre From cournape at gmail.com Thu Nov 8 08:06:53 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 8 Nov 2012 13:06:53 +0000 Subject: [Numpy-discussion] Scipy dot In-Reply-To: <509BA198.4070301@astro.uio.no> References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> Message-ID: On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn wrote: > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: >>> I think everyone would be very happy to see numpy.dot modified to do >>> this automatically. But adding a scipy.dot IMHO would be fixing things >>> in the wrong place and just create extra confusion. >> >> I am not sure I agree: numpy is often compiled without lapack support, as >> it is not necessary. On the other hand scipy is always compiled with >> lapack. Thus this makes more sens in scipy. > > Well, numpy.dot already contains multiple fallback cases for when it is > compiled with BLAS and not. So I'm +1 on just making this an improvement > on numpy.dot. I don't think there's a time when you would not want to > use this (provided the output order issue is fixed), and it doesn't make > sense to not have old codes take advantage of the speed improvement. Indeed, there is no reason not to make this available in NumPy. Nicolas, can you prepare a patch for numpy ? David From francesc at continuum.io Thu Nov 8 10:17:23 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 08 Nov 2012 16:17:23 +0100 Subject: [Numpy-discussion] numexpr question In-Reply-To: References: Message-ID: <509BCD03.1020900@continuum.io> On 11/8/12 1:37 PM, Neal Becker wrote: > I'm interested in trying numexpr, but have a question (not sure where's the best > forum to ask). > > The examples I see use > > ne.evaluate ("some string...") > > When used within a loop, I would expect the compilation from the string form to > add significant overhead. I would have thought a pre-compiled form would be > available, similar to a precompiled regexp. yes/no? numexpr comes with an internal cache for recent expressions, so if ne.evaluate() is in a loop, the compiled expression will be re-used without problems. So you don't have to worry about caching it yourself. 
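For example, in a loop like the following (a minimal sketch, the array names are made up), only the first call pays the compilation cost and later iterations hit the expression cache:

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)

for i in range(100):
    # the expression string is identical on every pass, so numexpr finds
    # the already-compiled version in its cache and re-uses it
    result = ne.evaluate("2*a + 3*b")
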
The best forum for discussing numexpr is this: https://groups.google.com/forum/?fromgroups#!forum/numexpr -- Francesc Alted From chris.barker at noaa.gov Thu Nov 8 11:50:44 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 8 Nov 2012 08:50:44 -0800 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: <509B87D0.60905@continuum.io> References: <509A75AC.3010206@astro.uio.no> <509B87D0.60905@continuum.io> Message-ID: On Thu, Nov 8, 2012 at 2:22 AM, Francesc Alted wrote: >> -- It can remove a lot of uneccessary temporary creation. > Well, the temporaries are still created, but the thing is that, by > working with small blocks at a time, these temporaries fit in CPU cache, > preventing copies into main memory. hmm -- I thought it was "smart" enough to remove some unnecessary temporaries altogether. Shows what I know. But apparently it does, indeed, avoid creating the full-size temporary arrays. pretty cool stuff, in any case. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From francesc at continuum.io Thu Nov 8 12:06:17 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 08 Nov 2012 18:06:17 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: <509BA889.9070102@astro.uio.no> References: <509A75AC.3010206@astro.uio.no> <509BA889.9070102@astro.uio.no> Message-ID: <509BE689.9050804@continuum.io> On 11/8/12 1:41 PM, Dag Sverre Seljebotn wrote: > On 11/07/2012 08:41 PM, Neal Becker wrote: >> Would you expect numexpr without MKL to give a significant boost? > If you need higher performance than what numexpr can give without using > MKL, you could look at code such as this: > > https://github.com/herumi/fmath/blob/master/fmath.hpp#L480 Hey, that's cool. I was a bit disappointed not finding this sort of work in open space. It seems that this lacks threading support, but that should be easy to implement by using OpenMP directives. -- Francesc Alted From scopatz at gmail.com Thu Nov 8 12:29:19 2012 From: scopatz at gmail.com (Anthony Scopatz) Date: Thu, 8 Nov 2012 11:29:19 -0600 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> Message-ID: On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau wrote: > On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn > wrote: > > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: > >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: > >>> I think everyone would be very happy to see numpy.dot modified to do > >>> this automatically. But adding a scipy.dot IMHO would be fixing things > >>> in the wrong place and just create extra confusion. > >> > >> I am not sure I agree: numpy is often compiled without lapack support, > as > >> it is not necessary. On the other hand scipy is always compiled with > >> lapack. Thus this makes more sens in scipy. > > > > Well, numpy.dot already contains multiple fallback cases for when it is > > compiled with BLAS and not. So I'm +1 on just making this an improvement > > on numpy.dot. I don't think there's a time when you would not want to > > use this (provided the output order issue is fixed), and it doesn't make > > sense to not have old codes take advantage of the speed improvement. > > Indeed, there is no reason not to make this available in NumPy. > > Nicolas, can you prepare a patch for numpy ? 
> +1, I agree, this should be a fix in numpy, not scipy. Be Well Anthony > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Thu Nov 8 12:38:47 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 08 Nov 2012 18:38:47 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: <509BE689.9050804@continuum.io> References: <509A75AC.3010206@astro.uio.no> <509BA889.9070102@astro.uio.no> <509BE689.9050804@continuum.io> Message-ID: <509BEE27.4020307@astro.uio.no> On 11/08/2012 06:06 PM, Francesc Alted wrote: > On 11/8/12 1:41 PM, Dag Sverre Seljebotn wrote: >> On 11/07/2012 08:41 PM, Neal Becker wrote: >>> Would you expect numexpr without MKL to give a significant boost? >> If you need higher performance than what numexpr can give without using >> MKL, you could look at code such as this: >> >> https://github.com/herumi/fmath/blob/master/fmath.hpp#L480 > > Hey, that's cool. I was a bit disappointed not finding this sort of > work in open space. It seems that this lacks threading support, but > that should be easy to implement by using OpenMP directives. IMO this is the wrong place to introduce threading; each thread should call expd_v on its chunks. (Which I think is how you said numexpr currently uses VML anyway.) Dag Sverre From nouiz at nouiz.org Thu Nov 8 12:42:47 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Thu, 8 Nov 2012 12:42:47 -0500 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> Message-ID: Hi, I also think it should go into numpy.dot and that the output order should not be changed. A new point, what about the additional overhead for small ndarray? To remove this, I would suggest to put this code into the C function that do the actual work (at least, from memory it is a c function, not a python one). HTH Fred On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz wrote: > On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau wrote: > >> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn >> wrote: >> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: >> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: >> >>> I think everyone would be very happy to see numpy.dot modified to do >> >>> this automatically. But adding a scipy.dot IMHO would be fixing things >> >>> in the wrong place and just create extra confusion. >> >> >> >> I am not sure I agree: numpy is often compiled without lapack support, >> as >> >> it is not necessary. On the other hand scipy is always compiled with >> >> lapack. Thus this makes more sens in scipy. >> > >> > Well, numpy.dot already contains multiple fallback cases for when it is >> > compiled with BLAS and not. So I'm +1 on just making this an improvement >> > on numpy.dot. I don't think there's a time when you would not want to >> > use this (provided the output order issue is fixed), and it doesn't make >> > sense to not have old codes take advantage of the speed improvement. >> >> Indeed, there is no reason not to make this available in NumPy. >> >> Nicolas, can you prepare a patch for numpy ? >> > > +1, I agree, this should be a fix in numpy, not scipy. 
> > Be Well > Anthony > > >> >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Thu Nov 8 12:59:41 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 08 Nov 2012 18:59:41 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: <509BEE27.4020307@astro.uio.no> References: <509A75AC.3010206@astro.uio.no> <509BA889.9070102@astro.uio.no> <509BE689.9050804@continuum.io> <509BEE27.4020307@astro.uio.no> Message-ID: <509BF30D.8060903@continuum.io> On 11/8/12 6:38 PM, Dag Sverre Seljebotn wrote: > On 11/08/2012 06:06 PM, Francesc Alted wrote: >> On 11/8/12 1:41 PM, Dag Sverre Seljebotn wrote: >>> On 11/07/2012 08:41 PM, Neal Becker wrote: >>>> Would you expect numexpr without MKL to give a significant boost? >>> If you need higher performance than what numexpr can give without using >>> MKL, you could look at code such as this: >>> >>> https://github.com/herumi/fmath/blob/master/fmath.hpp#L480 >> Hey, that's cool. I was a bit disappointed not finding this sort of >> work in open space. It seems that this lacks threading support, but >> that should be easy to implement by using OpenMP directives. > IMO this is the wrong place to introduce threading; each thread should > call expd_v on its chunks. (Which I think is how you said numexpr > currently uses VML anyway.) Oh sure, but then you need a blocked engine for performing the computations too. And yes, by default numexpr uses its own threading code rather than the existing one in VML (but that can be changed by playing with set_num_threads/set_vml_num_threads). It always stroked to me as a little strange that the internal threading in numexpr was more efficient than VML one, but I suppose this is because the latter is more optimized to deal with large blocks instead of those of medium size (4K) in numexpr. -- Francesc Alted From d.s.seljebotn at astro.uio.no Thu Nov 8 13:55:20 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 08 Nov 2012 19:55:20 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: <509BF30D.8060903@continuum.io> References: <509A75AC.3010206@astro.uio.no> <509BA889.9070102@astro.uio.no> <509BE689.9050804@continuum.io> <509BEE27.4020307@astro.uio.no> <509BF30D.8060903@continuum.io> Message-ID: <509C0018.8020503@astro.uio.no> On 11/08/2012 06:59 PM, Francesc Alted wrote: > On 11/8/12 6:38 PM, Dag Sverre Seljebotn wrote: >> On 11/08/2012 06:06 PM, Francesc Alted wrote: >>> On 11/8/12 1:41 PM, Dag Sverre Seljebotn wrote: >>>> On 11/07/2012 08:41 PM, Neal Becker wrote: >>>>> Would you expect numexpr without MKL to give a significant boost? >>>> If you need higher performance than what numexpr can give without using >>>> MKL, you could look at code such as this: >>>> >>>> https://github.com/herumi/fmath/blob/master/fmath.hpp#L480 >>> Hey, that's cool. I was a bit disappointed not finding this sort of >>> work in open space. It seems that this lacks threading support, but >>> that should be easy to implement by using OpenMP directives. >> IMO this is the wrong place to introduce threading; each thread should >> call expd_v on its chunks. 
(Which I think is how you said numexpr >> currently uses VML anyway.) > > Oh sure, but then you need a blocked engine for performing the > computations too. And yes, by default numexpr uses its own threading I just meant that you can use a chunked OpenMP for-loop wherever in your code that you call expd_v. A "five-line blocked engine", if you like :-) IMO that's the right location since entering/exiting OpenMP blocks takes some time. > code rather than the existing one in VML (but that can be changed by > playing with set_num_threads/set_vml_num_threads). It always stroked to > me as a little strange that the internal threading in numexpr was more > efficient than VML one, but I suppose this is because the latter is more > optimized to deal with large blocks instead of those of medium size (4K) > in numexpr. I don't know enough about numexpr to understand this :-) I guess I just don't see the motivation to use VML threading or why it should be faster? If you pass a single 4K block to a threaded VML call then I could easily see lots of performance problems: a) starting/stopping threads or signalling the threads of a pool is a constant overhead per "parallel section", b) unless you're very careful to only have VML touch the data, and VML always schedules elements in the exact same way, you're going to have the cache lines of that 4K block shuffled between L1 caches of different cores for different operations... As I said, I'm mostly ignorant about how numexpr works, that's probably showing :-) Dag Sverre From d.s.seljebotn at astro.uio.no Thu Nov 8 13:56:27 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 08 Nov 2012 19:56:27 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: <509C0018.8020503@astro.uio.no> References: <509A75AC.3010206@astro.uio.no> <509BA889.9070102@astro.uio.no> <509BE689.9050804@continuum.io> <509BEE27.4020307@astro.uio.no> <509BF30D.8060903@continuum.io> <509C0018.8020503@astro.uio.no> Message-ID: <509C005B.2020305@astro.uio.no> On 11/08/2012 07:55 PM, Dag Sverre Seljebotn wrote: > On 11/08/2012 06:59 PM, Francesc Alted wrote: >> On 11/8/12 6:38 PM, Dag Sverre Seljebotn wrote: >>> On 11/08/2012 06:06 PM, Francesc Alted wrote: >>>> On 11/8/12 1:41 PM, Dag Sverre Seljebotn wrote: >>>>> On 11/07/2012 08:41 PM, Neal Becker wrote: >>>>>> Would you expect numexpr without MKL to give a significant boost? >>>>> If you need higher performance than what numexpr can give without >>>>> using >>>>> MKL, you could look at code such as this: >>>>> >>>>> https://github.com/herumi/fmath/blob/master/fmath.hpp#L480 >>>> Hey, that's cool. I was a bit disappointed not finding this sort of >>>> work in open space. It seems that this lacks threading support, but >>>> that should be easy to implement by using OpenMP directives. >>> IMO this is the wrong place to introduce threading; each thread should >>> call expd_v on its chunks. (Which I think is how you said numexpr >>> currently uses VML anyway.) >> >> Oh sure, but then you need a blocked engine for performing the >> computations too. And yes, by default numexpr uses its own threading > > I just meant that you can use a chunked OpenMP for-loop wherever in your > code that you call expd_v. A "five-line blocked engine", if you like :-) > > IMO that's the right location since entering/exiting OpenMP blocks takes > some time. > >> code rather than the existing one in VML (but that can be changed by >> playing with set_num_threads/set_vml_num_threads). 
It always stroked to >> me as a little strange that the internal threading in numexpr was more >> efficient than VML one, but I suppose this is because the latter is more >> optimized to deal with large blocks instead of those of medium size (4K) >> in numexpr. > > I don't know enough about numexpr to understand this :-) > > I guess I just don't see the motivation to use VML threading or why it > should be faster? If you pass a single 4K block to a threaded VML call > then I could easily see lots of performance problems: a) > starting/stopping threads or signalling the threads of a pool is a > constant overhead per "parallel section", b) unless you're very careful > to only have VML touch the data, and VML always schedules elements in > the exact same way, you're going to have the cache lines of that 4K > block shuffled between L1 caches of different cores for different > operations... c) Your "effective block size" is then 4KB/ncores. (Unless you scale the block size by ncores). DS From francesc at continuum.io Thu Nov 8 14:37:30 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 08 Nov 2012 20:37:30 +0100 Subject: [Numpy-discussion] testing with amd libm/acml In-Reply-To: <509C0018.8020503@astro.uio.no> References: <509A75AC.3010206@astro.uio.no> <509BA889.9070102@astro.uio.no> <509BE689.9050804@continuum.io> <509BEE27.4020307@astro.uio.no> <509BF30D.8060903@continuum.io> <509C0018.8020503@astro.uio.no> Message-ID: <509C09FA.8090409@continuum.io> On 11/8/12 7:55 PM, Dag Sverre Seljebotn wrote: > On 11/08/2012 06:59 PM, Francesc Alted wrote: >> On 11/8/12 6:38 PM, Dag Sverre Seljebotn wrote: >>> On 11/08/2012 06:06 PM, Francesc Alted wrote: >>>> On 11/8/12 1:41 PM, Dag Sverre Seljebotn wrote: >>>>> On 11/07/2012 08:41 PM, Neal Becker wrote: >>>>>> Would you expect numexpr without MKL to give a significant boost? >>>>> If you need higher performance than what numexpr can give without using >>>>> MKL, you could look at code such as this: >>>>> >>>>> https://github.com/herumi/fmath/blob/master/fmath.hpp#L480 >>>> Hey, that's cool. I was a bit disappointed not finding this sort of >>>> work in open space. It seems that this lacks threading support, but >>>> that should be easy to implement by using OpenMP directives. >>> IMO this is the wrong place to introduce threading; each thread should >>> call expd_v on its chunks. (Which I think is how you said numexpr >>> currently uses VML anyway.) >> Oh sure, but then you need a blocked engine for performing the >> computations too. And yes, by default numexpr uses its own threading > I just meant that you can use a chunked OpenMP for-loop wherever in your > code that you call expd_v. A "five-line blocked engine", if you like :-) > > IMO that's the right location since entering/exiting OpenMP blocks takes > some time. Yes, I meant precisely this first hand. >> code rather than the existing one in VML (but that can be changed by >> playing with set_num_threads/set_vml_num_threads). It always stroked to >> me as a little strange that the internal threading in numexpr was more >> efficient than VML one, but I suppose this is because the latter is more >> optimized to deal with large blocks instead of those of medium size (4K) >> in numexpr. > I don't know enough about numexpr to understand this :-) > > I guess I just don't see the motivation to use VML threading or why it > should be faster? 
If you pass a single 4K block to a threaded VML call > then I could easily see lots of performance problems: a) > starting/stopping threads or signalling the threads of a pool is a > constant overhead per "parallel section", b) unless you're very careful > to only have VML touch the data, and VML always schedules elements in > the exact same way, you're going to have the cache lines of that 4K > block shuffled between L1 caches of different cores for different > operations... > > As I said, I'm mostly ignorant about how numexpr works, that's probably > showing :-) No, on the contrary, you rather hit the core of the issue (or part of it). On one hand, VML needs large blocks in order to maximize the performance of the pipeline and in the other hand numexpr tries to minimize block size in order to make temporaries as small as possible (so avoiding the use of the higher level caches). From this tension (and some benchmarking work) the size of 4K (btw, this is the number of *elements*, so the size is actually either 16 KB and 32 KB for single and double precision respectively) was derived. Incidentally, for numexpr with no VML support, the size is reduced to 1K elements (and perhaps it could be reduced a bit more, but anyways). Anyway, this is way too low level to be discussed here, although we can continue on the numexpr list if you are interested in more details. -- Francesc Alted From scheffer.nicolas at gmail.com Thu Nov 8 14:38:31 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Thu, 8 Nov 2012 11:38:31 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> Message-ID: Thanks for all the responses folks. This is indeed a nice problem to solve. Few points: I. Change the order from 'F' to 'C': I'll look into it. II. Integration with scipy / numpy: opinions are diverging here. Let's wait a bit to get more responses on what people think. One thing though: I'd need the same functionality as get_blas_funcs in numpy. Since numpy does not require lapack, what functions can I get? III. Complex arrays I unfortunately don't have enough knowledge here. If someone could propose a fix, that'd be great. IV. C Writing this in C sounds like a good idea. I'm not sure I'd be the right person to this though. V. Patch in numpy I'd love to do that and learn to do it as a byproduct. Let's make sure we agree this can go in numpy first and that all FIXME can be fixed. Although I guess we can resolve fixmes using git. Let me know how you'd like to proceed, Thanks! FIXMEs: - Fix for ndim != 2 - Fix for dtype == np.complex* - Fix order of output array On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien wrote: > Hi, > > I also think it should go into numpy.dot and that the output order should > not be changed. > > A new point, what about the additional overhead for small ndarray? To remove > this, I would suggest to put this code into the C function that do the > actual work (at least, from memory it is a c function, not a python one). > > HTH > > Fred > > > > On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz wrote: >> >> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau >> wrote: >>> >>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn >>> wrote: >>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: >>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: >>> >>> I think everyone would be very happy to see numpy.dot modified to do >>> >>> this automatically. 
But adding a scipy.dot IMHO would be fixing >>> >>> things >>> >>> in the wrong place and just create extra confusion. >>> >> >>> >> I am not sure I agree: numpy is often compiled without lapack support, >>> >> as >>> >> it is not necessary. On the other hand scipy is always compiled with >>> >> lapack. Thus this makes more sens in scipy. >>> > >>> > Well, numpy.dot already contains multiple fallback cases for when it is >>> > compiled with BLAS and not. So I'm +1 on just making this an >>> > improvement >>> > on numpy.dot. I don't think there's a time when you would not want to >>> > use this (provided the output order issue is fixed), and it doesn't >>> > make >>> > sense to not have old codes take advantage of the speed improvement. >>> >>> Indeed, there is no reason not to make this available in NumPy. >>> >>> Nicolas, can you prepare a patch for numpy ? >> >> >> +1, I agree, this should be a fix in numpy, not scipy. >> >> Be Well >> Anthony >> >>> >>> >>> David >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From scheffer.nicolas at gmail.com Thu Nov 8 15:06:04 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Thu, 8 Nov 2012 12:06:04 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> Message-ID: I've made the necessary changes to get the proper order for the output array. Also, a pass of pep8 and some tests (fixmes are in failing tests) http://pastebin.com/M8TfbURi -n On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER wrote: > Thanks for all the responses folks. This is indeed a nice problem to solve. > > Few points: > I. Change the order from 'F' to 'C': I'll look into it. > II. Integration with scipy / numpy: opinions are diverging here. > Let's wait a bit to get more responses on what people think. > One thing though: I'd need the same functionality as get_blas_funcs in numpy. > Since numpy does not require lapack, what functions can I get? > III. Complex arrays > I unfortunately don't have enough knowledge here. If someone could > propose a fix, that'd be great. > IV. C > Writing this in C sounds like a good idea. I'm not sure I'd be the > right person to this though. > V. Patch in numpy > I'd love to do that and learn to do it as a byproduct. > Let's make sure we agree this can go in numpy first and that all FIXME > can be fixed. > Although I guess we can resolve fixmes using git. > > Let me know how you'd like to proceed, > > Thanks! > > FIXMEs: > - Fix for ndim != 2 > - Fix for dtype == np.complex* > - Fix order of output array > > On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien wrote: >> Hi, >> >> I also think it should go into numpy.dot and that the output order should >> not be changed. >> >> A new point, what about the additional overhead for small ndarray? To remove >> this, I would suggest to put this code into the C function that do the >> actual work (at least, from memory it is a c function, not a python one). 
>> >> HTH >> >> Fred >> >> >> >> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz wrote: >>> >>> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau >>> wrote: >>>> >>>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn >>>> wrote: >>>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: >>>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: >>>> >>> I think everyone would be very happy to see numpy.dot modified to do >>>> >>> this automatically. But adding a scipy.dot IMHO would be fixing >>>> >>> things >>>> >>> in the wrong place and just create extra confusion. >>>> >> >>>> >> I am not sure I agree: numpy is often compiled without lapack support, >>>> >> as >>>> >> it is not necessary. On the other hand scipy is always compiled with >>>> >> lapack. Thus this makes more sens in scipy. >>>> > >>>> > Well, numpy.dot already contains multiple fallback cases for when it is >>>> > compiled with BLAS and not. So I'm +1 on just making this an >>>> > improvement >>>> > on numpy.dot. I don't think there's a time when you would not want to >>>> > use this (provided the output order issue is fixed), and it doesn't >>>> > make >>>> > sense to not have old codes take advantage of the speed improvement. >>>> >>>> Indeed, there is no reason not to make this available in NumPy. >>>> >>>> Nicolas, can you prepare a patch for numpy ? >>> >>> >>> +1, I agree, this should be a fix in numpy, not scipy. >>> >>> Be Well >>> Anthony >>> >>>> >>>> >>>> David >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From scheffer.nicolas at gmail.com Thu Nov 8 17:44:17 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Thu, 8 Nov 2012 14:44:17 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> Message-ID: Well, hinted by what Fabien said, I looked at the C level dot function. Quite verbose! But starting line 757, we can see that it shouldn't be too much work to fix that bug (well there is even a comment there that states just that) https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757 I now think that should be the cleanest. This would only work for gemm though. I don't know what the benefit is for gemv for instance, but we should make that kind of changes everywhere we can. The evil PyArray_Copy is there twice and that's what we want to get rid of. I'm not sure, but it looks to me that removing the copy and doing the following would do the work: Order = CblasRowMajor; Trans1 = CblasNoTrans; Trans2 = CblasNoTrans; if (!PyArray_ISCONTIGUOUS(ap1)) { Trans1 = CblasTrans; } if (!PyArray_ISCONTIGUOUS(ap2)) { Trans2 = CblasTrans; } might be too easy to be true. On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER wrote: > I've made the necessary changes to get the proper order for the output array. 
> Also, a pass of pep8 and some tests (fixmes are in failing tests) > http://pastebin.com/M8TfbURi > > -n > > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER > wrote: >> Thanks for all the responses folks. This is indeed a nice problem to solve. >> >> Few points: >> I. Change the order from 'F' to 'C': I'll look into it. >> II. Integration with scipy / numpy: opinions are diverging here. >> Let's wait a bit to get more responses on what people think. >> One thing though: I'd need the same functionality as get_blas_funcs in numpy. >> Since numpy does not require lapack, what functions can I get? >> III. Complex arrays >> I unfortunately don't have enough knowledge here. If someone could >> propose a fix, that'd be great. >> IV. C >> Writing this in C sounds like a good idea. I'm not sure I'd be the >> right person to this though. >> V. Patch in numpy >> I'd love to do that and learn to do it as a byproduct. >> Let's make sure we agree this can go in numpy first and that all FIXME >> can be fixed. >> Although I guess we can resolve fixmes using git. >> >> Let me know how you'd like to proceed, >> >> Thanks! >> >> FIXMEs: >> - Fix for ndim != 2 >> - Fix for dtype == np.complex* >> - Fix order of output array >> >> On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien wrote: >>> Hi, >>> >>> I also think it should go into numpy.dot and that the output order should >>> not be changed. >>> >>> A new point, what about the additional overhead for small ndarray? To remove >>> this, I would suggest to put this code into the C function that do the >>> actual work (at least, from memory it is a c function, not a python one). >>> >>> HTH >>> >>> Fred >>> >>> >>> >>> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz wrote: >>>> >>>> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau >>>> wrote: >>>>> >>>>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn >>>>> wrote: >>>>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: >>>>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: >>>>> >>> I think everyone would be very happy to see numpy.dot modified to do >>>>> >>> this automatically. But adding a scipy.dot IMHO would be fixing >>>>> >>> things >>>>> >>> in the wrong place and just create extra confusion. >>>>> >> >>>>> >> I am not sure I agree: numpy is often compiled without lapack support, >>>>> >> as >>>>> >> it is not necessary. On the other hand scipy is always compiled with >>>>> >> lapack. Thus this makes more sens in scipy. >>>>> > >>>>> > Well, numpy.dot already contains multiple fallback cases for when it is >>>>> > compiled with BLAS and not. So I'm +1 on just making this an >>>>> > improvement >>>>> > on numpy.dot. I don't think there's a time when you would not want to >>>>> > use this (provided the output order issue is fixed), and it doesn't >>>>> > make >>>>> > sense to not have old codes take advantage of the speed improvement. >>>>> >>>>> Indeed, there is no reason not to make this available in NumPy. >>>>> >>>>> Nicolas, can you prepare a patch for numpy ? >>>> >>>> >>>> +1, I agree, this should be a fix in numpy, not scipy. 
>>>> >>>> Be Well >>>> Anthony >>>> >>>>> >>>>> >>>>> David >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> From sebastian at sipsolutions.net Thu Nov 8 18:24:43 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 09 Nov 2012 00:24:43 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> Message-ID: <1352417083.30611.4.camel@sebastian-laptop> Hey, On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote: > Well, hinted by what Fabien said, I looked at the C level dot function. > Quite verbose! > > But starting line 757, we can see that it shouldn't be too much work > to fix that bug (well there is even a comment there that states just > that) > https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757 > I now think that should be the cleanest. > > This would only work for gemm though. I don't know what the benefit is > for gemv for instance, but we should make that kind of changes > everywhere we can. > The evil PyArray_Copy is there twice and that's what we want to get rid of. > > I'm not sure, but it looks to me that removing the copy and doing the > following would do the work: > Order = CblasRowMajor; > Trans1 = CblasNoTrans; > Trans2 = CblasNoTrans; > if (!PyArray_ISCONTIGUOUS(ap1)) { > Trans1 = CblasTrans; > } > if (!PyArray_ISCONTIGUOUS(ap2)) { > Trans2 = CblasTrans; > } > might be too easy to be true. > Sounds nice, though don't forget that the array may also be neither C- or F-Contiguous, in which case you need a copy in any case. So it would probably be more like: if (PyArray_IS_C_CONTIGUOUS(ap1)) { Trans1 = CblasNoTrans; } else if (PyArray_IS_F_CONTIGUOUS(ap1)) { Trans1 = CblasTrans; } else { Trans1 = CblasNoTrans; PyObject *new = PyArray_Copy(ap1); Py_DECREF(ap1); ap1 = (PyArrayObject *)new; } Regards, Sebastian > > > On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER > wrote: > > I've made the necessary changes to get the proper order for the output array. > > Also, a pass of pep8 and some tests (fixmes are in failing tests) > > http://pastebin.com/M8TfbURi > > > > -n > > > > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER > > wrote: > >> Thanks for all the responses folks. This is indeed a nice problem to solve. > >> > >> Few points: > >> I. Change the order from 'F' to 'C': I'll look into it. > >> II. Integration with scipy / numpy: opinions are diverging here. > >> Let's wait a bit to get more responses on what people think. > >> One thing though: I'd need the same functionality as get_blas_funcs in numpy. > >> Since numpy does not require lapack, what functions can I get? > >> III. Complex arrays > >> I unfortunately don't have enough knowledge here. If someone could > >> propose a fix, that'd be great. > >> IV. C > >> Writing this in C sounds like a good idea. I'm not sure I'd be the > >> right person to this though. > >> V. Patch in numpy > >> I'd love to do that and learn to do it as a byproduct. 
> >> Let's make sure we agree this can go in numpy first and that all FIXME > >> can be fixed. > >> Although I guess we can resolve fixmes using git. > >> > >> Let me know how you'd like to proceed, > >> > >> Thanks! > >> > >> FIXMEs: > >> - Fix for ndim != 2 > >> - Fix for dtype == np.complex* > >> - Fix order of output array > >> > >> On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien wrote: > >>> Hi, > >>> > >>> I also think it should go into numpy.dot and that the output order should > >>> not be changed. > >>> > >>> A new point, what about the additional overhead for small ndarray? To remove > >>> this, I would suggest to put this code into the C function that do the > >>> actual work (at least, from memory it is a c function, not a python one). > >>> > >>> HTH > >>> > >>> Fred > >>> > >>> > >>> > >>> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz wrote: > >>>> > >>>> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau > >>>> wrote: > >>>>> > >>>>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn > >>>>> wrote: > >>>>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: > >>>>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: > >>>>> >>> I think everyone would be very happy to see numpy.dot modified to do > >>>>> >>> this automatically. But adding a scipy.dot IMHO would be fixing > >>>>> >>> things > >>>>> >>> in the wrong place and just create extra confusion. > >>>>> >> > >>>>> >> I am not sure I agree: numpy is often compiled without lapack support, > >>>>> >> as > >>>>> >> it is not necessary. On the other hand scipy is always compiled with > >>>>> >> lapack. Thus this makes more sens in scipy. > >>>>> > > >>>>> > Well, numpy.dot already contains multiple fallback cases for when it is > >>>>> > compiled with BLAS and not. So I'm +1 on just making this an > >>>>> > improvement > >>>>> > on numpy.dot. I don't think there's a time when you would not want to > >>>>> > use this (provided the output order issue is fixed), and it doesn't > >>>>> > make > >>>>> > sense to not have old codes take advantage of the speed improvement. > >>>>> > >>>>> Indeed, there is no reason not to make this available in NumPy. > >>>>> > >>>>> Nicolas, can you prepare a patch for numpy ? > >>>> > >>>> > >>>> +1, I agree, this should be a fix in numpy, not scipy. 
> >>>> > >>>> Be Well > >>>> Anthony > >>>> > >>>>> > >>>>> > >>>>> David > >>>>> _______________________________________________ > >>>>> NumPy-Discussion mailing list > >>>>> NumPy-Discussion at scipy.org > >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> NumPy-Discussion mailing list > >>>> NumPy-Discussion at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>>> > >>> > >>> > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Thu Nov 8 18:58:29 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 09 Nov 2012 00:58:29 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: <1352417083.30611.4.camel@sebastian-laptop> References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <1352417083.30611.4.camel@sebastian-laptop> Message-ID: <1352419109.30611.10.camel@sebastian-laptop> On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote: > Hey, > > On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote: > > Well, hinted by what Fabien said, I looked at the C level dot function. > > Quite verbose! > > > > But starting line 757, we can see that it shouldn't be too much work > > to fix that bug (well there is even a comment there that states just > > that) > > https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757 > > I now think that should be the cleanest. > > > > This would only work for gemm though. I don't know what the benefit is > > for gemv for instance, but we should make that kind of changes > > everywhere we can. > > The evil PyArray_Copy is there twice and that's what we want to get rid of. > > > > I'm not sure, but it looks to me that removing the copy and doing the > > following would do the work: > > Order = CblasRowMajor; > > Trans1 = CblasNoTrans; > > Trans2 = CblasNoTrans; > > if (!PyArray_ISCONTIGUOUS(ap1)) { > > Trans1 = CblasTrans; > > } > > if (!PyArray_ISCONTIGUOUS(ap2)) { > > Trans2 = CblasTrans; > > } > > might be too easy to be true. > > > > Sounds nice, though don't forget that the array may also be neither C- > or F-Contiguous, in which case you need a copy in any case. So it would > probably be more like: > > if (PyArray_IS_C_CONTIGUOUS(ap1)) { > Trans1 = CblasNoTrans; > } > else if (PyArray_IS_F_CONTIGUOUS(ap1)) { > Trans1 = CblasTrans; > } > else { > Trans1 = CblasNoTrans; > PyObject *new = PyArray_Copy(ap1); > Py_DECREF(ap1); > ap1 = (PyArrayObject *)new; > } > Well, of course I forgot error checking there, and maybe you need to set some of the other parameters differently, but it looks like its probably that easy, and I am sure everyone will welcome a PR with such changes. > Regards, > > Sebastian > > > > > > > On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER > > wrote: > > > I've made the necessary changes to get the proper order for the output array. > > > Also, a pass of pep8 and some tests (fixmes are in failing tests) > > > http://pastebin.com/M8TfbURi > > > > > > -n > > > > > > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER > > > wrote: > > >> Thanks for all the responses folks. This is indeed a nice problem to solve. 
> > >> > > >> Few points: > > >> I. Change the order from 'F' to 'C': I'll look into it. > > >> II. Integration with scipy / numpy: opinions are diverging here. > > >> Let's wait a bit to get more responses on what people think. > > >> One thing though: I'd need the same functionality as get_blas_funcs in numpy. > > >> Since numpy does not require lapack, what functions can I get? > > >> III. Complex arrays > > >> I unfortunately don't have enough knowledge here. If someone could > > >> propose a fix, that'd be great. > > >> IV. C > > >> Writing this in C sounds like a good idea. I'm not sure I'd be the > > >> right person to this though. > > >> V. Patch in numpy > > >> I'd love to do that and learn to do it as a byproduct. > > >> Let's make sure we agree this can go in numpy first and that all FIXME > > >> can be fixed. > > >> Although I guess we can resolve fixmes using git. > > >> > > >> Let me know how you'd like to proceed, > > >> > > >> Thanks! > > >> > > >> FIXMEs: > > >> - Fix for ndim != 2 > > >> - Fix for dtype == np.complex* > > >> - Fix order of output array > > >> > > >> On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien wrote: > > >>> Hi, > > >>> > > >>> I also think it should go into numpy.dot and that the output order should > > >>> not be changed. > > >>> > > >>> A new point, what about the additional overhead for small ndarray? To remove > > >>> this, I would suggest to put this code into the C function that do the > > >>> actual work (at least, from memory it is a c function, not a python one). > > >>> > > >>> HTH > > >>> > > >>> Fred > > >>> > > >>> > > >>> > > >>> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz wrote: > > >>>> > > >>>> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau > > >>>> wrote: > > >>>>> > > >>>>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn > > >>>>> wrote: > > >>>>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: > > >>>>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: > > >>>>> >>> I think everyone would be very happy to see numpy.dot modified to do > > >>>>> >>> this automatically. But adding a scipy.dot IMHO would be fixing > > >>>>> >>> things > > >>>>> >>> in the wrong place and just create extra confusion. > > >>>>> >> > > >>>>> >> I am not sure I agree: numpy is often compiled without lapack support, > > >>>>> >> as > > >>>>> >> it is not necessary. On the other hand scipy is always compiled with > > >>>>> >> lapack. Thus this makes more sens in scipy. > > >>>>> > > > >>>>> > Well, numpy.dot already contains multiple fallback cases for when it is > > >>>>> > compiled with BLAS and not. So I'm +1 on just making this an > > >>>>> > improvement > > >>>>> > on numpy.dot. I don't think there's a time when you would not want to > > >>>>> > use this (provided the output order issue is fixed), and it doesn't > > >>>>> > make > > >>>>> > sense to not have old codes take advantage of the speed improvement. > > >>>>> > > >>>>> Indeed, there is no reason not to make this available in NumPy. > > >>>>> > > >>>>> Nicolas, can you prepare a patch for numpy ? > > >>>> > > >>>> > > >>>> +1, I agree, this should be a fix in numpy, not scipy. 
> > >>>> > > >>>> Be Well > > >>>> Anthony > > >>>> > > >>>>> > > >>>>> > > >>>>> David > > >>>>> _______________________________________________ > > >>>>> NumPy-Discussion mailing list > > >>>>> NumPy-Discussion at scipy.org > > >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > >>>> > > >>>> > > >>>> > > >>>> _______________________________________________ > > >>>> NumPy-Discussion mailing list > > >>>> NumPy-Discussion at scipy.org > > >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > >>>> > > >>> > > >>> > > >>> _______________________________________________ > > >>> NumPy-Discussion mailing list > > >>> NumPy-Discussion at scipy.org > > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > >>> > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scheffer.nicolas at gmail.com Thu Nov 8 20:03:45 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Thu, 8 Nov 2012 17:03:45 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: <1352419109.30611.10.camel@sebastian-laptop> References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <1352417083.30611.4.camel@sebastian-laptop> <1352419109.30611.10.camel@sebastian-laptop> Message-ID: Thanks Sebastien, didn't think of that. Well I went ahead and tried the change, and it's indeed straightforward. I've run some tests, among which: nosetests numpy/numpy/core/tests/test_blasdot.py and it looks ok. I'm assuming this is good news. I've copy-pasting the diff below, but I have that in my branch and can create a PR if we agree on it. I still cannot believe it's that easy (well this has been bugging me a while... ;)) So I wouldn't mind waiting a day or two to see reactions on the list before moving ahead. diff --git a/numpy/core/blasdot/_dotblas.c b/numpy/core/blasdot/_dotblas.c index c73dd6a..2b4be7c 100644 --- a/numpy/core/blasdot/_dotblas.c +++ b/numpy/core/blasdot/_dotblas.c @@ -770,7 +770,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), PyObject *args, PyObject* kwa * using appropriate values of Order, Trans1, and Trans2. */ - if (!PyArray_ISCONTIGUOUS(ap2)) { + if (!PyArray_IS_C_CONTIGUOUS(ap2) && !PyArray_IS_F_CONTIGUOUS(ap2)) { PyObject *new = PyArray_Copy(ap2); Py_DECREF(ap2); @@ -779,7 +779,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), PyObject *args, PyObject* kwa goto fail; } } - if (!PyArray_ISCONTIGUOUS(ap1)) { + if (!PyArray_IS_C_CONTIGUOUS(ap1) && !PyArray_IS_F_CONTIGUOUS(ap1)) { PyObject *new = PyArray_Copy(ap1); Py_DECREF(ap1); @@ -800,6 +800,19 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), PyObject *args, PyObject* kwa lda = (PyArray_DIM(ap1, 1) > 1 ? PyArray_DIM(ap1, 1) : 1); ldb = (PyArray_DIM(ap2, 1) > 1 ? PyArray_DIM(ap2, 1) : 1); ldc = (PyArray_DIM(ret, 1) > 1 ? PyArray_DIM(ret, 1) : 1); + + /* + * Avoid temporary copies for arrays in Fortran order + */ + if (PyArray_IS_F_CONTIGUOUS(ap1)) { + Trans1 = CblasTrans; + lda = (PyArray_DIM(ap1, 0) > 1 ? PyArray_DIM(ap1, 0) : 1); + } + if (PyArray_IS_F_CONTIGUOUS(ap2)) { + Trans2 = CblasTrans; + ldb = (PyArray_DIM(ap2, 0) > 1 ? 
PyArray_DIM(ap2, 0) : 1); + } + if (typenum == NPY_DOUBLE) { cblas_dgemm(Order, Trans1, Trans2, L, N, M, On Thu, Nov 8, 2012 at 3:58 PM, Sebastian Berg wrote: > On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote: >> Hey, >> >> On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote: >> > Well, hinted by what Fabien said, I looked at the C level dot function. >> > Quite verbose! >> > >> > But starting line 757, we can see that it shouldn't be too much work >> > to fix that bug (well there is even a comment there that states just >> > that) >> > https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757 >> > I now think that should be the cleanest. >> > >> > This would only work for gemm though. I don't know what the benefit is >> > for gemv for instance, but we should make that kind of changes >> > everywhere we can. >> > The evil PyArray_Copy is there twice and that's what we want to get rid of. >> > >> > I'm not sure, but it looks to me that removing the copy and doing the >> > following would do the work: >> > Order = CblasRowMajor; >> > Trans1 = CblasNoTrans; >> > Trans2 = CblasNoTrans; >> > if (!PyArray_ISCONTIGUOUS(ap1)) { >> > Trans1 = CblasTrans; >> > } >> > if (!PyArray_ISCONTIGUOUS(ap2)) { >> > Trans2 = CblasTrans; >> > } >> > might be too easy to be true. >> > >> >> Sounds nice, though don't forget that the array may also be neither C- >> or F-Contiguous, in which case you need a copy in any case. So it would >> probably be more like: >> >> if (PyArray_IS_C_CONTIGUOUS(ap1)) { >> Trans1 = CblasNoTrans; >> } >> else if (PyArray_IS_F_CONTIGUOUS(ap1)) { >> Trans1 = CblasTrans; >> } >> else { >> Trans1 = CblasNoTrans; >> PyObject *new = PyArray_Copy(ap1); >> Py_DECREF(ap1); >> ap1 = (PyArrayObject *)new; >> } >> > > Well, of course I forgot error checking there, and maybe you need to set > some of the other parameters differently, but it looks like its probably > that easy, and I am sure everyone will welcome a PR with such changes. > >> Regards, >> >> Sebastian >> >> > >> > >> > On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER >> > wrote: >> > > I've made the necessary changes to get the proper order for the output array. >> > > Also, a pass of pep8 and some tests (fixmes are in failing tests) >> > > http://pastebin.com/M8TfbURi >> > > >> > > -n >> > > >> > > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER >> > > wrote: >> > >> Thanks for all the responses folks. This is indeed a nice problem to solve. >> > >> >> > >> Few points: >> > >> I. Change the order from 'F' to 'C': I'll look into it. >> > >> II. Integration with scipy / numpy: opinions are diverging here. >> > >> Let's wait a bit to get more responses on what people think. >> > >> One thing though: I'd need the same functionality as get_blas_funcs in numpy. >> > >> Since numpy does not require lapack, what functions can I get? >> > >> III. Complex arrays >> > >> I unfortunately don't have enough knowledge here. If someone could >> > >> propose a fix, that'd be great. >> > >> IV. C >> > >> Writing this in C sounds like a good idea. I'm not sure I'd be the >> > >> right person to this though. >> > >> V. Patch in numpy >> > >> I'd love to do that and learn to do it as a byproduct. >> > >> Let's make sure we agree this can go in numpy first and that all FIXME >> > >> can be fixed. >> > >> Although I guess we can resolve fixmes using git. >> > >> >> > >> Let me know how you'd like to proceed, >> > >> >> > >> Thanks! 
>> > >> >> > >> FIXMEs: >> > >> - Fix for ndim != 2 >> > >> - Fix for dtype == np.complex* >> > >> - Fix order of output array >> > >> >> > >> On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien wrote: >> > >>> Hi, >> > >>> >> > >>> I also think it should go into numpy.dot and that the output order should >> > >>> not be changed. >> > >>> >> > >>> A new point, what about the additional overhead for small ndarray? To remove >> > >>> this, I would suggest to put this code into the C function that do the >> > >>> actual work (at least, from memory it is a c function, not a python one). >> > >>> >> > >>> HTH >> > >>> >> > >>> Fred >> > >>> >> > >>> >> > >>> >> > >>> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz wrote: >> > >>>> >> > >>>> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau >> > >>>> wrote: >> > >>>>> >> > >>>>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn >> > >>>>> wrote: >> > >>>>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: >> > >>>>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith wrote: >> > >>>>> >>> I think everyone would be very happy to see numpy.dot modified to do >> > >>>>> >>> this automatically. But adding a scipy.dot IMHO would be fixing >> > >>>>> >>> things >> > >>>>> >>> in the wrong place and just create extra confusion. >> > >>>>> >> >> > >>>>> >> I am not sure I agree: numpy is often compiled without lapack support, >> > >>>>> >> as >> > >>>>> >> it is not necessary. On the other hand scipy is always compiled with >> > >>>>> >> lapack. Thus this makes more sens in scipy. >> > >>>>> > >> > >>>>> > Well, numpy.dot already contains multiple fallback cases for when it is >> > >>>>> > compiled with BLAS and not. So I'm +1 on just making this an >> > >>>>> > improvement >> > >>>>> > on numpy.dot. I don't think there's a time when you would not want to >> > >>>>> > use this (provided the output order issue is fixed), and it doesn't >> > >>>>> > make >> > >>>>> > sense to not have old codes take advantage of the speed improvement. >> > >>>>> >> > >>>>> Indeed, there is no reason not to make this available in NumPy. >> > >>>>> >> > >>>>> Nicolas, can you prepare a patch for numpy ? >> > >>>> >> > >>>> >> > >>>> +1, I agree, this should be a fix in numpy, not scipy. 
>> > >>>> >> > >>>> Be Well >> > >>>> Anthony >> > >>>> >> > >>>>> >> > >>>>> >> > >>>>> David >> > >>>>> _______________________________________________ >> > >>>>> NumPy-Discussion mailing list >> > >>>>> NumPy-Discussion at scipy.org >> > >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >>>> >> > >>>> >> > >>>> >> > >>>> _______________________________________________ >> > >>>> NumPy-Discussion mailing list >> > >>>> NumPy-Discussion at scipy.org >> > >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >>>> >> > >>> >> > >>> >> > >>> _______________________________________________ >> > >>> NumPy-Discussion mailing list >> > >>> NumPy-Discussion at scipy.org >> > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >>> >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Thu Nov 8 20:34:15 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Thu, 8 Nov 2012 20:34:15 -0500 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <1352417083.30611.4.camel@sebastian-laptop> <1352419109.30611.10.camel@sebastian-laptop> Message-ID: Hi, I suspect the current tests are not enought. You need to test all the combination for the 3 inputs with thoses strides: c-contiguous f-contiguous something else like strided. Also, try with matrix with shape of 1 in each dimensions. Not all blas libraries accept the strides that numpy use in that cases. Also, not all blas version accept the same stuff, so if this isn't in the current version, there will be probably some adjustment later on that side. What blas do you use? I think ATLAS was one that was causing problem. When we did this in Theano, it was more complicated then this diff... But much of the code is boillerplate code. Fred On Thu, Nov 8, 2012 at 8:03 PM, Nicolas SCHEFFER wrote: > Thanks Sebastien, didn't think of that. > > Well I went ahead and tried the change, and it's indeed straightforward. > > I've run some tests, among which: > nosetests numpy/numpy/core/tests/test_blasdot.py > and it looks ok. I'm assuming this is good news. > > I've copy-pasting the diff below, but I have that in my branch and can > create a PR if we agree on it. > I still cannot believe it's that easy (well this has been bugging me a > while... ;)) > So I wouldn't mind waiting a day or two to see reactions on the list > before moving ahead. > > diff --git a/numpy/core/blasdot/_dotblas.c b/numpy/core/blasdot/_dotblas.c > index c73dd6a..2b4be7c 100644 > --- a/numpy/core/blasdot/_dotblas.c > +++ b/numpy/core/blasdot/_dotblas.c > @@ -770,7 +770,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), > PyObject *args, PyObject* kwa > * using appropriate values of Order, Trans1, and Trans2. 
> */ > > - if (!PyArray_ISCONTIGUOUS(ap2)) { > + if (!PyArray_IS_C_CONTIGUOUS(ap2) && > !PyArray_IS_F_CONTIGUOUS(ap2)) { > PyObject *new = PyArray_Copy(ap2); > > Py_DECREF(ap2); > @@ -779,7 +779,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), > PyObject *args, PyObject* kwa > goto fail; > } > } > - if (!PyArray_ISCONTIGUOUS(ap1)) { > + if (!PyArray_IS_C_CONTIGUOUS(ap1) && > !PyArray_IS_F_CONTIGUOUS(ap1)) { > PyObject *new = PyArray_Copy(ap1); > > Py_DECREF(ap1); > @@ -800,6 +800,19 @@ dotblas_matrixproduct(PyObject > *NPY_UNUSED(dummy), PyObject *args, PyObject* kwa > lda = (PyArray_DIM(ap1, 1) > 1 ? PyArray_DIM(ap1, 1) : 1); > ldb = (PyArray_DIM(ap2, 1) > 1 ? PyArray_DIM(ap2, 1) : 1); > ldc = (PyArray_DIM(ret, 1) > 1 ? PyArray_DIM(ret, 1) : 1); > + > + /* > + * Avoid temporary copies for arrays in Fortran order > + */ > + if (PyArray_IS_F_CONTIGUOUS(ap1)) { > + Trans1 = CblasTrans; > + lda = (PyArray_DIM(ap1, 0) > 1 ? PyArray_DIM(ap1, 0) : 1); > + } > + if (PyArray_IS_F_CONTIGUOUS(ap2)) { > + Trans2 = CblasTrans; > + ldb = (PyArray_DIM(ap2, 0) > 1 ? PyArray_DIM(ap2, 0) : 1); > + } > + > if (typenum == NPY_DOUBLE) { > cblas_dgemm(Order, Trans1, Trans2, > L, N, M, > > On Thu, Nov 8, 2012 at 3:58 PM, Sebastian Berg > wrote: > > On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote: > >> Hey, > >> > >> On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote: > >> > Well, hinted by what Fabien said, I looked at the C level dot > function. > >> > Quite verbose! > >> > > >> > But starting line 757, we can see that it shouldn't be too much work > >> > to fix that bug (well there is even a comment there that states just > >> > that) > >> > > https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757 > >> > I now think that should be the cleanest. > >> > > >> > This would only work for gemm though. I don't know what the benefit is > >> > for gemv for instance, but we should make that kind of changes > >> > everywhere we can. > >> > The evil PyArray_Copy is there twice and that's what we want to get > rid of. > >> > > >> > I'm not sure, but it looks to me that removing the copy and doing the > >> > following would do the work: > >> > Order = CblasRowMajor; > >> > Trans1 = CblasNoTrans; > >> > Trans2 = CblasNoTrans; > >> > if (!PyArray_ISCONTIGUOUS(ap1)) { > >> > Trans1 = CblasTrans; > >> > } > >> > if (!PyArray_ISCONTIGUOUS(ap2)) { > >> > Trans2 = CblasTrans; > >> > } > >> > might be too easy to be true. > >> > > >> > >> Sounds nice, though don't forget that the array may also be neither C- > >> or F-Contiguous, in which case you need a copy in any case. So it would > >> probably be more like: > >> > >> if (PyArray_IS_C_CONTIGUOUS(ap1)) { > >> Trans1 = CblasNoTrans; > >> } > >> else if (PyArray_IS_F_CONTIGUOUS(ap1)) { > >> Trans1 = CblasTrans; > >> } > >> else { > >> Trans1 = CblasNoTrans; > >> PyObject *new = PyArray_Copy(ap1); > >> Py_DECREF(ap1); > >> ap1 = (PyArrayObject *)new; > >> } > >> > > > > Well, of course I forgot error checking there, and maybe you need to set > > some of the other parameters differently, but it looks like its probably > > that easy, and I am sure everyone will welcome a PR with such changes. > > > >> Regards, > >> > >> Sebastian > >> > >> > > >> > > >> > On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER > >> > wrote: > >> > > I've made the necessary changes to get the proper order for the > output array. 
> >> > > Also, a pass of pep8 and some tests (fixmes are in failing tests) > >> > > http://pastebin.com/M8TfbURi > >> > > > >> > > -n > >> > > > >> > > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER > >> > > wrote: > >> > >> Thanks for all the responses folks. This is indeed a nice problem > to solve. > >> > >> > >> > >> Few points: > >> > >> I. Change the order from 'F' to 'C': I'll look into it. > >> > >> II. Integration with scipy / numpy: opinions are diverging here. > >> > >> Let's wait a bit to get more responses on what people think. > >> > >> One thing though: I'd need the same functionality as > get_blas_funcs in numpy. > >> > >> Since numpy does not require lapack, what functions can I get? > >> > >> III. Complex arrays > >> > >> I unfortunately don't have enough knowledge here. If someone could > >> > >> propose a fix, that'd be great. > >> > >> IV. C > >> > >> Writing this in C sounds like a good idea. I'm not sure I'd be the > >> > >> right person to this though. > >> > >> V. Patch in numpy > >> > >> I'd love to do that and learn to do it as a byproduct. > >> > >> Let's make sure we agree this can go in numpy first and that all > FIXME > >> > >> can be fixed. > >> > >> Although I guess we can resolve fixmes using git. > >> > >> > >> > >> Let me know how you'd like to proceed, > >> > >> > >> > >> Thanks! > >> > >> > >> > >> FIXMEs: > >> > >> - Fix for ndim != 2 > >> > >> - Fix for dtype == np.complex* > >> > >> - Fix order of output array > >> > >> > >> > >> On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien > wrote: > >> > >>> Hi, > >> > >>> > >> > >>> I also think it should go into numpy.dot and that the output > order should > >> > >>> not be changed. > >> > >>> > >> > >>> A new point, what about the additional overhead for small > ndarray? To remove > >> > >>> this, I would suggest to put this code into the C function that > do the > >> > >>> actual work (at least, from memory it is a c function, not a > python one). > >> > >>> > >> > >>> HTH > >> > >>> > >> > >>> Fred > >> > >>> > >> > >>> > >> > >>> > >> > >>> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz < > scopatz at gmail.com> wrote: > >> > >>>> > >> > >>>> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau < > cournape at gmail.com> > >> > >>>> wrote: > >> > >>>>> > >> > >>>>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn > >> > >>>>> wrote: > >> > >>>>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: > >> > >>>>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith > wrote: > >> > >>>>> >>> I think everyone would be very happy to see numpy.dot > modified to do > >> > >>>>> >>> this automatically. But adding a scipy.dot IMHO would be > fixing > >> > >>>>> >>> things > >> > >>>>> >>> in the wrong place and just create extra confusion. > >> > >>>>> >> > >> > >>>>> >> I am not sure I agree: numpy is often compiled without > lapack support, > >> > >>>>> >> as > >> > >>>>> >> it is not necessary. On the other hand scipy is always > compiled with > >> > >>>>> >> lapack. Thus this makes more sens in scipy. > >> > >>>>> > > >> > >>>>> > Well, numpy.dot already contains multiple fallback cases for > when it is > >> > >>>>> > compiled with BLAS and not. So I'm +1 on just making this an > >> > >>>>> > improvement > >> > >>>>> > on numpy.dot. I don't think there's a time when you would not > want to > >> > >>>>> > use this (provided the output order issue is fixed), and it > doesn't > >> > >>>>> > make > >> > >>>>> > sense to not have old codes take advantage of the speed > improvement. 
> >> > >>>>> > >> > >>>>> Indeed, there is no reason not to make this available in NumPy. > >> > >>>>> > >> > >>>>> Nicolas, can you prepare a patch for numpy ? > >> > >>>> > >> > >>>> > >> > >>>> +1, I agree, this should be a fix in numpy, not scipy. > >> > >>>> > >> > >>>> Be Well > >> > >>>> Anthony > >> > >>>> > >> > >>>>> > >> > >>>>> > >> > >>>>> David > >> > >>>>> _______________________________________________ > >> > >>>>> NumPy-Discussion mailing list > >> > >>>>> NumPy-Discussion at scipy.org > >> > >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >>>> > >> > >>>> > >> > >>>> > >> > >>>> _______________________________________________ > >> > >>>> NumPy-Discussion mailing list > >> > >>>> NumPy-Discussion at scipy.org > >> > >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >>>> > >> > >>> > >> > >>> > >> > >>> _______________________________________________ > >> > >>> NumPy-Discussion mailing list > >> > >>> NumPy-Discussion at scipy.org > >> > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >>> > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scheffer.nicolas at gmail.com Fri Nov 9 01:18:32 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Thu, 8 Nov 2012 22:18:32 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <1352417083.30611.4.camel@sebastian-laptop> <1352419109.30611.10.camel@sebastian-laptop> Message-ID: Fred, Thanks for the advice. The code will only affect the part in _dotblas.c where gemm is called. There's tons of check before that make sure both matrices are of ndim 2. We should check though if we can do these tricks in other parts of the function. Otherwise: - I've built against ATLAS 3.10 - I'm happy to add a couple more test for C and F-contiguous. I'm not sure how to get the third type (strided), would you have an example? The following test for instance checks integrity against multiarray.dot, which I believe is default when not compiled with BLAS. Dot is a hard function to test imho, so if anybody has ideas on what kind of test they'd like to see, please let me know. If that's ok I might now be able to: - Check for more bugs, I need to dig a bit more in the gemm call, make sure everything is ok. - Create an issue on github and link to this discussion - Make a commit in a seperate branch - Move forward like that. 
== import numpy as np from time import time from numpy.testing import assert_almost_equal def test_dot_regression(): """ Test numpy dot by comparing with multiarray dot """ np.random.seed(7) a = np.random.randn(3, 3) b = np.random.randn(3, 2) c = np.random.randn(2, 3) _dot = np.core.multiarray.dot assert_almost_equal(np.dot(a, a), _dot(a, a)) assert_almost_equal(np.dot(b, c), _dot(b, c)) assert_almost_equal(np.dot(b.T, c.T), _dot(b.T, c.T)) assert_almost_equal(np.dot(a.T, a), _dot(a.T, a)) assert_almost_equal(np.dot(a, a.T), _dot(a, a.T)) assert_almost_equal(np.dot(a.T, a.T), _dot(a.T, a.T)) On Thu, Nov 8, 2012 at 5:34 PM, Fr?d?ric Bastien wrote: > Hi, > > I suspect the current tests are not enought. You need to test all the > combination for the 3 inputs with thoses strides: > > c-contiguous > f-contiguous > something else like strided. > > Also, try with matrix with shape of 1 in each dimensions. Not all blas > libraries accept the strides that numpy use in that cases. Also, not all > blas version accept the same stuff, so if this isn't in the current version, > there will be probably some adjustment later on that side. What blas do you > use? I think ATLAS was one that was causing problem. > > > When we did this in Theano, it was more complicated then this diff... But > much of the code is boillerplate code. > > Fred > > > > On Thu, Nov 8, 2012 at 8:03 PM, Nicolas SCHEFFER > wrote: >> >> Thanks Sebastien, didn't think of that. >> >> Well I went ahead and tried the change, and it's indeed straightforward. >> >> I've run some tests, among which: >> nosetests numpy/numpy/core/tests/test_blasdot.py >> and it looks ok. I'm assuming this is good news. >> >> I've copy-pasting the diff below, but I have that in my branch and can >> create a PR if we agree on it. >> I still cannot believe it's that easy (well this has been bugging me a >> while... ;)) >> So I wouldn't mind waiting a day or two to see reactions on the list >> before moving ahead. >> >> diff --git a/numpy/core/blasdot/_dotblas.c b/numpy/core/blasdot/_dotblas.c >> index c73dd6a..2b4be7c 100644 >> --- a/numpy/core/blasdot/_dotblas.c >> +++ b/numpy/core/blasdot/_dotblas.c >> @@ -770,7 +770,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), >> PyObject *args, PyObject* kwa >> * using appropriate values of Order, Trans1, and Trans2. >> */ >> >> - if (!PyArray_ISCONTIGUOUS(ap2)) { >> + if (!PyArray_IS_C_CONTIGUOUS(ap2) && >> !PyArray_IS_F_CONTIGUOUS(ap2)) { >> PyObject *new = PyArray_Copy(ap2); >> >> Py_DECREF(ap2); >> @@ -779,7 +779,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), >> PyObject *args, PyObject* kwa >> goto fail; >> } >> } >> - if (!PyArray_ISCONTIGUOUS(ap1)) { >> + if (!PyArray_IS_C_CONTIGUOUS(ap1) && >> !PyArray_IS_F_CONTIGUOUS(ap1)) { >> PyObject *new = PyArray_Copy(ap1); >> >> Py_DECREF(ap1); >> @@ -800,6 +800,19 @@ dotblas_matrixproduct(PyObject >> *NPY_UNUSED(dummy), PyObject *args, PyObject* kwa >> lda = (PyArray_DIM(ap1, 1) > 1 ? PyArray_DIM(ap1, 1) : 1); >> ldb = (PyArray_DIM(ap2, 1) > 1 ? PyArray_DIM(ap2, 1) : 1); >> ldc = (PyArray_DIM(ret, 1) > 1 ? PyArray_DIM(ret, 1) : 1); >> + >> + /* >> + * Avoid temporary copies for arrays in Fortran order >> + */ >> + if (PyArray_IS_F_CONTIGUOUS(ap1)) { >> + Trans1 = CblasTrans; >> + lda = (PyArray_DIM(ap1, 0) > 1 ? PyArray_DIM(ap1, 0) : 1); >> + } >> + if (PyArray_IS_F_CONTIGUOUS(ap2)) { >> + Trans2 = CblasTrans; >> + ldb = (PyArray_DIM(ap2, 0) > 1 ? 
PyArray_DIM(ap2, 0) : 1); >> + } >> + >> if (typenum == NPY_DOUBLE) { >> cblas_dgemm(Order, Trans1, Trans2, >> L, N, M, >> >> On Thu, Nov 8, 2012 at 3:58 PM, Sebastian Berg >> wrote: >> > On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote: >> >> Hey, >> >> >> >> On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote: >> >> > Well, hinted by what Fabien said, I looked at the C level dot >> >> > function. >> >> > Quite verbose! >> >> > >> >> > But starting line 757, we can see that it shouldn't be too much work >> >> > to fix that bug (well there is even a comment there that states just >> >> > that) >> >> > >> >> > https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757 >> >> > I now think that should be the cleanest. >> >> > >> >> > This would only work for gemm though. I don't know what the benefit >> >> > is >> >> > for gemv for instance, but we should make that kind of changes >> >> > everywhere we can. >> >> > The evil PyArray_Copy is there twice and that's what we want to get >> >> > rid of. >> >> > >> >> > I'm not sure, but it looks to me that removing the copy and doing the >> >> > following would do the work: >> >> > Order = CblasRowMajor; >> >> > Trans1 = CblasNoTrans; >> >> > Trans2 = CblasNoTrans; >> >> > if (!PyArray_ISCONTIGUOUS(ap1)) { >> >> > Trans1 = CblasTrans; >> >> > } >> >> > if (!PyArray_ISCONTIGUOUS(ap2)) { >> >> > Trans2 = CblasTrans; >> >> > } >> >> > might be too easy to be true. >> >> > >> >> >> >> Sounds nice, though don't forget that the array may also be neither C- >> >> or F-Contiguous, in which case you need a copy in any case. So it would >> >> probably be more like: >> >> >> >> if (PyArray_IS_C_CONTIGUOUS(ap1)) { >> >> Trans1 = CblasNoTrans; >> >> } >> >> else if (PyArray_IS_F_CONTIGUOUS(ap1)) { >> >> Trans1 = CblasTrans; >> >> } >> >> else { >> >> Trans1 = CblasNoTrans; >> >> PyObject *new = PyArray_Copy(ap1); >> >> Py_DECREF(ap1); >> >> ap1 = (PyArrayObject *)new; >> >> } >> >> >> > >> > Well, of course I forgot error checking there, and maybe you need to set >> > some of the other parameters differently, but it looks like its probably >> > that easy, and I am sure everyone will welcome a PR with such changes. >> > >> >> Regards, >> >> >> >> Sebastian >> >> >> >> > >> >> > >> >> > On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER >> >> > wrote: >> >> > > I've made the necessary changes to get the proper order for the >> >> > > output array. >> >> > > Also, a pass of pep8 and some tests (fixmes are in failing tests) >> >> > > http://pastebin.com/M8TfbURi >> >> > > >> >> > > -n >> >> > > >> >> > > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER >> >> > > wrote: >> >> > >> Thanks for all the responses folks. This is indeed a nice problem >> >> > >> to solve. >> >> > >> >> >> > >> Few points: >> >> > >> I. Change the order from 'F' to 'C': I'll look into it. >> >> > >> II. Integration with scipy / numpy: opinions are diverging here. >> >> > >> Let's wait a bit to get more responses on what people think. >> >> > >> One thing though: I'd need the same functionality as >> >> > >> get_blas_funcs in numpy. >> >> > >> Since numpy does not require lapack, what functions can I get? >> >> > >> III. Complex arrays >> >> > >> I unfortunately don't have enough knowledge here. If someone could >> >> > >> propose a fix, that'd be great. >> >> > >> IV. C >> >> > >> Writing this in C sounds like a good idea. I'm not sure I'd be the >> >> > >> right person to this though. >> >> > >> V. 
Patch in numpy >> >> > >> I'd love to do that and learn to do it as a byproduct. >> >> > >> Let's make sure we agree this can go in numpy first and that all >> >> > >> FIXME >> >> > >> can be fixed. >> >> > >> Although I guess we can resolve fixmes using git. >> >> > >> >> >> > >> Let me know how you'd like to proceed, >> >> > >> >> >> > >> Thanks! >> >> > >> >> >> > >> FIXMEs: >> >> > >> - Fix for ndim != 2 >> >> > >> - Fix for dtype == np.complex* >> >> > >> - Fix order of output array >> >> > >> >> >> > >> On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien >> >> > >> wrote: >> >> > >>> Hi, >> >> > >>> >> >> > >>> I also think it should go into numpy.dot and that the output >> >> > >>> order should >> >> > >>> not be changed. >> >> > >>> >> >> > >>> A new point, what about the additional overhead for small >> >> > >>> ndarray? To remove >> >> > >>> this, I would suggest to put this code into the C function that >> >> > >>> do the >> >> > >>> actual work (at least, from memory it is a c function, not a >> >> > >>> python one). >> >> > >>> >> >> > >>> HTH >> >> > >>> >> >> > >>> Fred >> >> > >>> >> >> > >>> >> >> > >>> >> >> > >>> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz >> >> > >>> wrote: >> >> > >>>> >> >> > >>>> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau >> >> > >>>> >> >> > >>>> wrote: >> >> > >>>>> >> >> > >>>>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn >> >> > >>>>> wrote: >> >> > >>>>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: >> >> > >>>>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith >> >> > >>>>> >> wrote: >> >> > >>>>> >>> I think everyone would be very happy to see numpy.dot >> >> > >>>>> >>> modified to do >> >> > >>>>> >>> this automatically. But adding a scipy.dot IMHO would be >> >> > >>>>> >>> fixing >> >> > >>>>> >>> things >> >> > >>>>> >>> in the wrong place and just create extra confusion. >> >> > >>>>> >> >> >> > >>>>> >> I am not sure I agree: numpy is often compiled without >> >> > >>>>> >> lapack support, >> >> > >>>>> >> as >> >> > >>>>> >> it is not necessary. On the other hand scipy is always >> >> > >>>>> >> compiled with >> >> > >>>>> >> lapack. Thus this makes more sens in scipy. >> >> > >>>>> > >> >> > >>>>> > Well, numpy.dot already contains multiple fallback cases for >> >> > >>>>> > when it is >> >> > >>>>> > compiled with BLAS and not. So I'm +1 on just making this an >> >> > >>>>> > improvement >> >> > >>>>> > on numpy.dot. I don't think there's a time when you would not >> >> > >>>>> > want to >> >> > >>>>> > use this (provided the output order issue is fixed), and it >> >> > >>>>> > doesn't >> >> > >>>>> > make >> >> > >>>>> > sense to not have old codes take advantage of the speed >> >> > >>>>> > improvement. >> >> > >>>>> >> >> > >>>>> Indeed, there is no reason not to make this available in NumPy. >> >> > >>>>> >> >> > >>>>> Nicolas, can you prepare a patch for numpy ? >> >> > >>>> >> >> > >>>> >> >> > >>>> +1, I agree, this should be a fix in numpy, not scipy. 
>> >> > >>>> >> >> > >>>> Be Well >> >> > >>>> Anthony >> >> > >>>> >> >> > >>>>> >> >> > >>>>> >> >> > >>>>> David >> >> > >>>>> _______________________________________________ >> >> > >>>>> NumPy-Discussion mailing list >> >> > >>>>> NumPy-Discussion at scipy.org >> >> > >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >>>> >> >> > >>>> >> >> > >>>> >> >> > >>>> _______________________________________________ >> >> > >>>> NumPy-Discussion mailing list >> >> > >>>> NumPy-Discussion at scipy.org >> >> > >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >>>> >> >> > >>> >> >> > >>> >> >> > >>> _______________________________________________ >> >> > >>> NumPy-Discussion mailing list >> >> > >>> NumPy-Discussion at scipy.org >> >> > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >>> >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Fri Nov 9 03:32:52 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Nov 2012 08:32:52 +0000 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <1352417083.30611.4.camel@sebastian-laptop> <1352419109.30611.10.camel@sebastian-laptop> Message-ID: On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER wrote: > Fred, > > Thanks for the advice. > The code will only affect the part in _dotblas.c where gemm is called. > There's tons of check before that make sure both matrices are of ndim 2. > We should check though if we can do these tricks in other parts of the function. > > Otherwise: > - I've built against ATLAS 3.10 > - I'm happy to add a couple more test for C and F-contiguous. I'm not > sure how to get the third type (strided), would you have an example? def with_memory_order(a, order): assert order in ("C", "F", "discontig") assert a.ndim == 2 if order in ("C", "F"): return np.asarray(a, order=order) else: buf = np.empty((a.shape[0] * 2, a.shape[1] * 2), dtype=a.dtype) buf[::2, ::2] = a # This returns a view onto every other element of 'buf': result = buf[::2, ::2] assert not result.flags.c_contiguous and not result.flags.f_contiguous return result > The following test for instance checks integrity against > multiarray.dot, which I believe is default when not compiled with > BLAS. > Dot is a hard function to test imho, so if anybody has ideas on what > kind of test they'd like to see, please let me know. > > If that's ok I might now be able to: > - Check for more bugs, I need to dig a bit more in the gemm call, make > sure everything is ok. 
> - Create an issue on github and link to this discussion > - Make a commit in a seperate branch > - Move forward like that. > > == > import numpy as np > from time import time > from numpy.testing import assert_almost_equal > > def test_dot_regression(): > """ Test numpy dot by comparing with multiarray dot > """ > np.random.seed(7) > a = np.random.randn(3, 3) > b = np.random.randn(3, 2) > c = np.random.randn(2, 3) > > _dot = np.core.multiarray.dot > > assert_almost_equal(np.dot(a, a), _dot(a, a)) > assert_almost_equal(np.dot(b, c), _dot(b, c)) > assert_almost_equal(np.dot(b.T, c.T), _dot(b.T, c.T)) > > assert_almost_equal(np.dot(a.T, a), _dot(a.T, a)) > assert_almost_equal(np.dot(a, a.T), _dot(a, a.T)) > assert_almost_equal(np.dot(a.T, a.T), _dot(a.T, a.T)) You should check that the result is C-contiguous in all cases too. for a_order in ("C", "F", "discontig"): for b_order in ("C", "F", "discontig"): this_a = with_memory_order(a, a_order) this_b = with_memory_order(b, b_order) result = np.dot(this_a, this_b) assert_almost_equal(result, expected) assert result.flags.c_contiguous You could also wrap the above in yet another loop to try a few different combinations of a and b matrices (perhaps after sticking the code into a utility function, like run_dot_tests(a, b, expected), so the indentation doesn't get out of hand ;-)). Then you can easily test some of the edge cases, like Nx1 matrices. -n > On Thu, Nov 8, 2012 at 5:34 PM, Fr?d?ric Bastien wrote: >> Hi, >> >> I suspect the current tests are not enought. You need to test all the >> combination for the 3 inputs with thoses strides: >> >> c-contiguous >> f-contiguous >> something else like strided. >> >> Also, try with matrix with shape of 1 in each dimensions. Not all blas >> libraries accept the strides that numpy use in that cases. Also, not all >> blas version accept the same stuff, so if this isn't in the current version, >> there will be probably some adjustment later on that side. What blas do you >> use? I think ATLAS was one that was causing problem. >> >> >> When we did this in Theano, it was more complicated then this diff... But >> much of the code is boillerplate code. >> >> Fred >> >> >> >> On Thu, Nov 8, 2012 at 8:03 PM, Nicolas SCHEFFER >> wrote: >>> >>> Thanks Sebastien, didn't think of that. >>> >>> Well I went ahead and tried the change, and it's indeed straightforward. >>> >>> I've run some tests, among which: >>> nosetests numpy/numpy/core/tests/test_blasdot.py >>> and it looks ok. I'm assuming this is good news. >>> >>> I've copy-pasting the diff below, but I have that in my branch and can >>> create a PR if we agree on it. >>> I still cannot believe it's that easy (well this has been bugging me a >>> while... ;)) >>> So I wouldn't mind waiting a day or two to see reactions on the list >>> before moving ahead. >>> >>> diff --git a/numpy/core/blasdot/_dotblas.c b/numpy/core/blasdot/_dotblas.c >>> index c73dd6a..2b4be7c 100644 >>> --- a/numpy/core/blasdot/_dotblas.c >>> +++ b/numpy/core/blasdot/_dotblas.c >>> @@ -770,7 +770,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), >>> PyObject *args, PyObject* kwa >>> * using appropriate values of Order, Trans1, and Trans2. 
>>> */ >>> >>> - if (!PyArray_ISCONTIGUOUS(ap2)) { >>> + if (!PyArray_IS_C_CONTIGUOUS(ap2) && >>> !PyArray_IS_F_CONTIGUOUS(ap2)) { >>> PyObject *new = PyArray_Copy(ap2); >>> >>> Py_DECREF(ap2); >>> @@ -779,7 +779,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), >>> PyObject *args, PyObject* kwa >>> goto fail; >>> } >>> } >>> - if (!PyArray_ISCONTIGUOUS(ap1)) { >>> + if (!PyArray_IS_C_CONTIGUOUS(ap1) && >>> !PyArray_IS_F_CONTIGUOUS(ap1)) { >>> PyObject *new = PyArray_Copy(ap1); >>> >>> Py_DECREF(ap1); >>> @@ -800,6 +800,19 @@ dotblas_matrixproduct(PyObject >>> *NPY_UNUSED(dummy), PyObject *args, PyObject* kwa >>> lda = (PyArray_DIM(ap1, 1) > 1 ? PyArray_DIM(ap1, 1) : 1); >>> ldb = (PyArray_DIM(ap2, 1) > 1 ? PyArray_DIM(ap2, 1) : 1); >>> ldc = (PyArray_DIM(ret, 1) > 1 ? PyArray_DIM(ret, 1) : 1); >>> + >>> + /* >>> + * Avoid temporary copies for arrays in Fortran order >>> + */ >>> + if (PyArray_IS_F_CONTIGUOUS(ap1)) { >>> + Trans1 = CblasTrans; >>> + lda = (PyArray_DIM(ap1, 0) > 1 ? PyArray_DIM(ap1, 0) : 1); >>> + } >>> + if (PyArray_IS_F_CONTIGUOUS(ap2)) { >>> + Trans2 = CblasTrans; >>> + ldb = (PyArray_DIM(ap2, 0) > 1 ? PyArray_DIM(ap2, 0) : 1); >>> + } >>> + >>> if (typenum == NPY_DOUBLE) { >>> cblas_dgemm(Order, Trans1, Trans2, >>> L, N, M, >>> >>> On Thu, Nov 8, 2012 at 3:58 PM, Sebastian Berg >>> wrote: >>> > On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote: >>> >> Hey, >>> >> >>> >> On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote: >>> >> > Well, hinted by what Fabien said, I looked at the C level dot >>> >> > function. >>> >> > Quite verbose! >>> >> > >>> >> > But starting line 757, we can see that it shouldn't be too much work >>> >> > to fix that bug (well there is even a comment there that states just >>> >> > that) >>> >> > >>> >> > https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757 >>> >> > I now think that should be the cleanest. >>> >> > >>> >> > This would only work for gemm though. I don't know what the benefit >>> >> > is >>> >> > for gemv for instance, but we should make that kind of changes >>> >> > everywhere we can. >>> >> > The evil PyArray_Copy is there twice and that's what we want to get >>> >> > rid of. >>> >> > >>> >> > I'm not sure, but it looks to me that removing the copy and doing the >>> >> > following would do the work: >>> >> > Order = CblasRowMajor; >>> >> > Trans1 = CblasNoTrans; >>> >> > Trans2 = CblasNoTrans; >>> >> > if (!PyArray_ISCONTIGUOUS(ap1)) { >>> >> > Trans1 = CblasTrans; >>> >> > } >>> >> > if (!PyArray_ISCONTIGUOUS(ap2)) { >>> >> > Trans2 = CblasTrans; >>> >> > } >>> >> > might be too easy to be true. >>> >> > >>> >> >>> >> Sounds nice, though don't forget that the array may also be neither C- >>> >> or F-Contiguous, in which case you need a copy in any case. So it would >>> >> probably be more like: >>> >> >>> >> if (PyArray_IS_C_CONTIGUOUS(ap1)) { >>> >> Trans1 = CblasNoTrans; >>> >> } >>> >> else if (PyArray_IS_F_CONTIGUOUS(ap1)) { >>> >> Trans1 = CblasTrans; >>> >> } >>> >> else { >>> >> Trans1 = CblasNoTrans; >>> >> PyObject *new = PyArray_Copy(ap1); >>> >> Py_DECREF(ap1); >>> >> ap1 = (PyArrayObject *)new; >>> >> } >>> >> >>> > >>> > Well, of course I forgot error checking there, and maybe you need to set >>> > some of the other parameters differently, but it looks like its probably >>> > that easy, and I am sure everyone will welcome a PR with such changes. 
>>> > >>> >> Regards, >>> >> >>> >> Sebastian >>> >> >>> >> > >>> >> > >>> >> > On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER >>> >> > wrote: >>> >> > > I've made the necessary changes to get the proper order for the >>> >> > > output array. >>> >> > > Also, a pass of pep8 and some tests (fixmes are in failing tests) >>> >> > > http://pastebin.com/M8TfbURi >>> >> > > >>> >> > > -n >>> >> > > >>> >> > > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER >>> >> > > wrote: >>> >> > >> Thanks for all the responses folks. This is indeed a nice problem >>> >> > >> to solve. >>> >> > >> >>> >> > >> Few points: >>> >> > >> I. Change the order from 'F' to 'C': I'll look into it. >>> >> > >> II. Integration with scipy / numpy: opinions are diverging here. >>> >> > >> Let's wait a bit to get more responses on what people think. >>> >> > >> One thing though: I'd need the same functionality as >>> >> > >> get_blas_funcs in numpy. >>> >> > >> Since numpy does not require lapack, what functions can I get? >>> >> > >> III. Complex arrays >>> >> > >> I unfortunately don't have enough knowledge here. If someone could >>> >> > >> propose a fix, that'd be great. >>> >> > >> IV. C >>> >> > >> Writing this in C sounds like a good idea. I'm not sure I'd be the >>> >> > >> right person to this though. >>> >> > >> V. Patch in numpy >>> >> > >> I'd love to do that and learn to do it as a byproduct. >>> >> > >> Let's make sure we agree this can go in numpy first and that all >>> >> > >> FIXME >>> >> > >> can be fixed. >>> >> > >> Although I guess we can resolve fixmes using git. >>> >> > >> >>> >> > >> Let me know how you'd like to proceed, >>> >> > >> >>> >> > >> Thanks! >>> >> > >> >>> >> > >> FIXMEs: >>> >> > >> - Fix for ndim != 2 >>> >> > >> - Fix for dtype == np.complex* >>> >> > >> - Fix order of output array >>> >> > >> >>> >> > >> On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien >>> >> > >> wrote: >>> >> > >>> Hi, >>> >> > >>> >>> >> > >>> I also think it should go into numpy.dot and that the output >>> >> > >>> order should >>> >> > >>> not be changed. >>> >> > >>> >>> >> > >>> A new point, what about the additional overhead for small >>> >> > >>> ndarray? To remove >>> >> > >>> this, I would suggest to put this code into the C function that >>> >> > >>> do the >>> >> > >>> actual work (at least, from memory it is a c function, not a >>> >> > >>> python one). >>> >> > >>> >>> >> > >>> HTH >>> >> > >>> >>> >> > >>> Fred >>> >> > >>> >>> >> > >>> >>> >> > >>> >>> >> > >>> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz >>> >> > >>> wrote: >>> >> > >>>> >>> >> > >>>> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau >>> >> > >>>> >>> >> > >>>> wrote: >>> >> > >>>>> >>> >> > >>>>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn >>> >> > >>>>> wrote: >>> >> > >>>>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: >>> >> > >>>>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith >>> >> > >>>>> >> wrote: >>> >> > >>>>> >>> I think everyone would be very happy to see numpy.dot >>> >> > >>>>> >>> modified to do >>> >> > >>>>> >>> this automatically. But adding a scipy.dot IMHO would be >>> >> > >>>>> >>> fixing >>> >> > >>>>> >>> things >>> >> > >>>>> >>> in the wrong place and just create extra confusion. >>> >> > >>>>> >> >>> >> > >>>>> >> I am not sure I agree: numpy is often compiled without >>> >> > >>>>> >> lapack support, >>> >> > >>>>> >> as >>> >> > >>>>> >> it is not necessary. On the other hand scipy is always >>> >> > >>>>> >> compiled with >>> >> > >>>>> >> lapack. 
Thus this makes more sens in scipy. >>> >> > >>>>> > >>> >> > >>>>> > Well, numpy.dot already contains multiple fallback cases for >>> >> > >>>>> > when it is >>> >> > >>>>> > compiled with BLAS and not. So I'm +1 on just making this an >>> >> > >>>>> > improvement >>> >> > >>>>> > on numpy.dot. I don't think there's a time when you would not >>> >> > >>>>> > want to >>> >> > >>>>> > use this (provided the output order issue is fixed), and it >>> >> > >>>>> > doesn't >>> >> > >>>>> > make >>> >> > >>>>> > sense to not have old codes take advantage of the speed >>> >> > >>>>> > improvement. >>> >> > >>>>> >>> >> > >>>>> Indeed, there is no reason not to make this available in NumPy. >>> >> > >>>>> >>> >> > >>>>> Nicolas, can you prepare a patch for numpy ? >>> >> > >>>> >>> >> > >>>> >>> >> > >>>> +1, I agree, this should be a fix in numpy, not scipy. >>> >> > >>>> >>> >> > >>>> Be Well >>> >> > >>>> Anthony >>> >> > >>>> >>> >> > >>>>> >>> >> > >>>>> >>> >> > >>>>> David >>> >> > >>>>> _______________________________________________ >>> >> > >>>>> NumPy-Discussion mailing list >>> >> > >>>>> NumPy-Discussion at scipy.org >>> >> > >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> > >>>> >>> >> > >>>> >>> >> > >>>> >>> >> > >>>> _______________________________________________ >>> >> > >>>> NumPy-Discussion mailing list >>> >> > >>>> NumPy-Discussion at scipy.org >>> >> > >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> > >>>> >>> >> > >>> >>> >> > >>> >>> >> > >>> _______________________________________________ >>> >> > >>> NumPy-Discussion mailing list >>> >> > >>> NumPy-Discussion at scipy.org >>> >> > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> > >>> >>> >> > _______________________________________________ >>> >> > NumPy-Discussion mailing list >>> >> > NumPy-Discussion at scipy.org >>> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> > >>> >> >>> >> >>> >> _______________________________________________ >>> >> NumPy-Discussion mailing list >>> >> NumPy-Discussion at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> > >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From gael.varoquaux at normalesup.org Fri Nov 9 03:48:26 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 9 Nov 2012 09:48:26 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> Message-ID: <20121109084826.GB19371@phare.normalesup.org> On Thu, Nov 08, 2012 at 11:29:19AM -0600, Anthony Scopatz wrote: > Indeed, there is no reason not to make this available in NumPy. > +1, I agree, this should be a fix in numpy, not scipy. I agree. 
My point was a bit of a side issue: given a user's computer, I have no
guarantee that numpy will be compiled with an optimized blas. In fact, I
have often seen, on computing clusters where people tend to compile things
by hand, numpy built against the embedded lapack_lite. On the other hand,
scipy is necessarily compiled against a blas. This is why, as a developer,
I always write code that uses scipy.linalg rather than numpy.linalg.
However, for 'dot', I have no way of ensuring that the blas used to
compile scipy is used.

Cheers,

Gaël

From gnurser at gmail.com  Fri Nov  9 04:32:53 2012
From: gnurser at gmail.com (George Nurser)
Date: Fri, 9 Nov 2012 09:32:53 +0000
Subject: [Numpy-discussion] Scipy dot
In-Reply-To: 
References: <20121108120725.GL313@phare.normalesup.org>
	<509BA198.4070301@astro.uio.no>
	<1352417083.30611.4.camel@sebastian-laptop>
	<1352419109.30611.10.camel@sebastian-laptop>
Message-ID: 

Hi,
It's really good to see this work being done. It would be great if this
could somehow also be applied to the np.einsum function, which currently
doesn't use blas at all and is consequently substantially slower than the
current np.dot (timings at
http://mail.scipy.org/pipermail/numpy-discussion/2012-October/064259.html ).
I really do think that the Einstein summation convention is a very clear
way of writing array operations, and its use should be encouraged.

--George

On 9 November 2012 06:18, Nicolas SCHEFFER wrote:

> Fred,
>
> Thanks for the advice.
> The code will only affect the part in _dotblas.c where gemm is called.
> There are tons of checks before that which make sure both matrices have
> ndim 2.
> We should check, though, whether we can do these tricks in other parts
> of the function.
>
> Otherwise:
> - I've built against ATLAS 3.10
> - I'm happy to add a couple more tests for C- and F-contiguous arrays.
> I'm not sure how to get the third type (strided); would you have an
> example?
>
> The following test, for instance, checks integrity against
> multiarray.dot, which I believe is the default when numpy is not
> compiled with BLAS.
> Dot is a hard function to test imho, so if anybody has ideas on what
> kind of tests they'd like to see, please let me know.
>
> If that's ok I might now be able to:
> - Check for more bugs; I need to dig a bit more into the gemm call and
> make sure everything is ok.
> - Create an issue on github and link to this discussion
> - Make a commit in a separate branch
> - Move forward like that.
>
> ==
> import numpy as np
> from time import time
> from numpy.testing import assert_almost_equal
>
> def test_dot_regression():
>     """ Test numpy dot by comparing with multiarray dot
>     """
>     np.random.seed(7)
>     a = np.random.randn(3, 3)
>     b = np.random.randn(3, 2)
>     c = np.random.randn(2, 3)
>
>     _dot = np.core.multiarray.dot
>
>     assert_almost_equal(np.dot(a, a), _dot(a, a))
>     assert_almost_equal(np.dot(b, c), _dot(b, c))
>     assert_almost_equal(np.dot(b.T, c.T), _dot(b.T, c.T))
>
>     assert_almost_equal(np.dot(a.T, a), _dot(a.T, a))
>     assert_almost_equal(np.dot(a, a.T), _dot(a, a.T))
>     assert_almost_equal(np.dot(a.T, a.T), _dot(a.T, a.T))
>
> On Thu, Nov 8, 2012 at 5:34 PM, Frédéric Bastien wrote:
> > Hi,
> >
> > I suspect the current tests are not enough. You need to test all the
> > combinations of the 3 inputs with these strides:
> >
> > c-contiguous
> > f-contiguous
> > something else, like strided.
> >
> > Also, try with matrices with a shape of 1 in each dimension. Not all blas
> > libraries accept the strides that numpy uses in those cases.
Also, not all > > blas version accept the same stuff, so if this isn't in the current > version, > > there will be probably some adjustment later on that side. What blas do > you > > use? I think ATLAS was one that was causing problem. > > > > > > When we did this in Theano, it was more complicated then this diff... But > > much of the code is boillerplate code. > > > > Fred > > > > > > > > On Thu, Nov 8, 2012 at 8:03 PM, Nicolas SCHEFFER > > wrote: > >> > >> Thanks Sebastien, didn't think of that. > >> > >> Well I went ahead and tried the change, and it's indeed straightforward. > >> > >> I've run some tests, among which: > >> nosetests numpy/numpy/core/tests/test_blasdot.py > >> and it looks ok. I'm assuming this is good news. > >> > >> I've copy-pasting the diff below, but I have that in my branch and can > >> create a PR if we agree on it. > >> I still cannot believe it's that easy (well this has been bugging me a > >> while... ;)) > >> So I wouldn't mind waiting a day or two to see reactions on the list > >> before moving ahead. > >> > >> diff --git a/numpy/core/blasdot/_dotblas.c > b/numpy/core/blasdot/_dotblas.c > >> index c73dd6a..2b4be7c 100644 > >> --- a/numpy/core/blasdot/_dotblas.c > >> +++ b/numpy/core/blasdot/_dotblas.c > >> @@ -770,7 +770,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), > >> PyObject *args, PyObject* kwa > >> * using appropriate values of Order, Trans1, and Trans2. > >> */ > >> > >> - if (!PyArray_ISCONTIGUOUS(ap2)) { > >> + if (!PyArray_IS_C_CONTIGUOUS(ap2) && > >> !PyArray_IS_F_CONTIGUOUS(ap2)) { > >> PyObject *new = PyArray_Copy(ap2); > >> > >> Py_DECREF(ap2); > >> @@ -779,7 +779,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), > >> PyObject *args, PyObject* kwa > >> goto fail; > >> } > >> } > >> - if (!PyArray_ISCONTIGUOUS(ap1)) { > >> + if (!PyArray_IS_C_CONTIGUOUS(ap1) && > >> !PyArray_IS_F_CONTIGUOUS(ap1)) { > >> PyObject *new = PyArray_Copy(ap1); > >> > >> Py_DECREF(ap1); > >> @@ -800,6 +800,19 @@ dotblas_matrixproduct(PyObject > >> *NPY_UNUSED(dummy), PyObject *args, PyObject* kwa > >> lda = (PyArray_DIM(ap1, 1) > 1 ? PyArray_DIM(ap1, 1) : 1); > >> ldb = (PyArray_DIM(ap2, 1) > 1 ? PyArray_DIM(ap2, 1) : 1); > >> ldc = (PyArray_DIM(ret, 1) > 1 ? PyArray_DIM(ret, 1) : 1); > >> + > >> + /* > >> + * Avoid temporary copies for arrays in Fortran order > >> + */ > >> + if (PyArray_IS_F_CONTIGUOUS(ap1)) { > >> + Trans1 = CblasTrans; > >> + lda = (PyArray_DIM(ap1, 0) > 1 ? PyArray_DIM(ap1, 0) : 1); > >> + } > >> + if (PyArray_IS_F_CONTIGUOUS(ap2)) { > >> + Trans2 = CblasTrans; > >> + ldb = (PyArray_DIM(ap2, 0) > 1 ? PyArray_DIM(ap2, 0) : 1); > >> + } > >> + > >> if (typenum == NPY_DOUBLE) { > >> cblas_dgemm(Order, Trans1, Trans2, > >> L, N, M, > >> > >> On Thu, Nov 8, 2012 at 3:58 PM, Sebastian Berg > >> wrote: > >> > On Fri, 2012-11-09 at 00:24 +0100, Sebastian Berg wrote: > >> >> Hey, > >> >> > >> >> On Thu, 2012-11-08 at 14:44 -0800, Nicolas SCHEFFER wrote: > >> >> > Well, hinted by what Fabien said, I looked at the C level dot > >> >> > function. > >> >> > Quite verbose! > >> >> > > >> >> > But starting line 757, we can see that it shouldn't be too much > work > >> >> > to fix that bug (well there is even a comment there that states > just > >> >> > that) > >> >> > > >> >> > > https://github.com/numpy/numpy/blob/master/numpy/core/blasdot/_dotblas.c#L757 > >> >> > I now think that should be the cleanest. > >> >> > > >> >> > This would only work for gemm though. 
I don't know what the benefit > >> >> > is > >> >> > for gemv for instance, but we should make that kind of changes > >> >> > everywhere we can. > >> >> > The evil PyArray_Copy is there twice and that's what we want to get > >> >> > rid of. > >> >> > > >> >> > I'm not sure, but it looks to me that removing the copy and doing > the > >> >> > following would do the work: > >> >> > Order = CblasRowMajor; > >> >> > Trans1 = CblasNoTrans; > >> >> > Trans2 = CblasNoTrans; > >> >> > if (!PyArray_ISCONTIGUOUS(ap1)) { > >> >> > Trans1 = CblasTrans; > >> >> > } > >> >> > if (!PyArray_ISCONTIGUOUS(ap2)) { > >> >> > Trans2 = CblasTrans; > >> >> > } > >> >> > might be too easy to be true. > >> >> > > >> >> > >> >> Sounds nice, though don't forget that the array may also be neither > C- > >> >> or F-Contiguous, in which case you need a copy in any case. So it > would > >> >> probably be more like: > >> >> > >> >> if (PyArray_IS_C_CONTIGUOUS(ap1)) { > >> >> Trans1 = CblasNoTrans; > >> >> } > >> >> else if (PyArray_IS_F_CONTIGUOUS(ap1)) { > >> >> Trans1 = CblasTrans; > >> >> } > >> >> else { > >> >> Trans1 = CblasNoTrans; > >> >> PyObject *new = PyArray_Copy(ap1); > >> >> Py_DECREF(ap1); > >> >> ap1 = (PyArrayObject *)new; > >> >> } > >> >> > >> > > >> > Well, of course I forgot error checking there, and maybe you need to > set > >> > some of the other parameters differently, but it looks like its > probably > >> > that easy, and I am sure everyone will welcome a PR with such changes. > >> > > >> >> Regards, > >> >> > >> >> Sebastian > >> >> > >> >> > > >> >> > > >> >> > On Thu, Nov 8, 2012 at 12:06 PM, Nicolas SCHEFFER > >> >> > wrote: > >> >> > > I've made the necessary changes to get the proper order for the > >> >> > > output array. > >> >> > > Also, a pass of pep8 and some tests (fixmes are in failing tests) > >> >> > > http://pastebin.com/M8TfbURi > >> >> > > > >> >> > > -n > >> >> > > > >> >> > > On Thu, Nov 8, 2012 at 11:38 AM, Nicolas SCHEFFER > >> >> > > wrote: > >> >> > >> Thanks for all the responses folks. This is indeed a nice > problem > >> >> > >> to solve. > >> >> > >> > >> >> > >> Few points: > >> >> > >> I. Change the order from 'F' to 'C': I'll look into it. > >> >> > >> II. Integration with scipy / numpy: opinions are diverging here. > >> >> > >> Let's wait a bit to get more responses on what people think. > >> >> > >> One thing though: I'd need the same functionality as > >> >> > >> get_blas_funcs in numpy. > >> >> > >> Since numpy does not require lapack, what functions can I get? > >> >> > >> III. Complex arrays > >> >> > >> I unfortunately don't have enough knowledge here. If someone > could > >> >> > >> propose a fix, that'd be great. > >> >> > >> IV. C > >> >> > >> Writing this in C sounds like a good idea. I'm not sure I'd be > the > >> >> > >> right person to this though. > >> >> > >> V. Patch in numpy > >> >> > >> I'd love to do that and learn to do it as a byproduct. > >> >> > >> Let's make sure we agree this can go in numpy first and that all > >> >> > >> FIXME > >> >> > >> can be fixed. > >> >> > >> Although I guess we can resolve fixmes using git. > >> >> > >> > >> >> > >> Let me know how you'd like to proceed, > >> >> > >> > >> >> > >> Thanks! 
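Point II above asks for get_blas_funcs-like functionality; as a hedged illustration (assuming SciPy is installed, since get_blas_funcs lives in scipy.linalg), this is roughly how it selects a dtype-matched BLAS routine:

import numpy as np
from scipy.linalg import get_blas_funcs

a = np.random.randn(4, 4)
b = np.random.randn(4, 4)

# Returns the BLAS routine matching the dtypes of the array arguments:
# dgemm here, sgemm for float32 inputs, zgemm for complex128.
gemm, = get_blas_funcs(("gemm",), (a, b))
c = gemm(1.0, a, b)          # computes 1.0 * dot(a, b)
print(c.shape)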
> >> >> > >> > >> >> > >> FIXMEs: > >> >> > >> - Fix for ndim != 2 > >> >> > >> - Fix for dtype == np.complex* > >> >> > >> - Fix order of output array > >> >> > >> > >> >> > >> On Thu, Nov 8, 2012 at 9:42 AM, Fr?d?ric Bastien < > nouiz at nouiz.org> > >> >> > >> wrote: > >> >> > >>> Hi, > >> >> > >>> > >> >> > >>> I also think it should go into numpy.dot and that the output > >> >> > >>> order should > >> >> > >>> not be changed. > >> >> > >>> > >> >> > >>> A new point, what about the additional overhead for small > >> >> > >>> ndarray? To remove > >> >> > >>> this, I would suggest to put this code into the C function that > >> >> > >>> do the > >> >> > >>> actual work (at least, from memory it is a c function, not a > >> >> > >>> python one). > >> >> > >>> > >> >> > >>> HTH > >> >> > >>> > >> >> > >>> Fred > >> >> > >>> > >> >> > >>> > >> >> > >>> > >> >> > >>> On Thu, Nov 8, 2012 at 12:29 PM, Anthony Scopatz > >> >> > >>> wrote: > >> >> > >>>> > >> >> > >>>> On Thu, Nov 8, 2012 at 7:06 AM, David Cournapeau > >> >> > >>>> > >> >> > >>>> wrote: > >> >> > >>>>> > >> >> > >>>>> On Thu, Nov 8, 2012 at 12:12 PM, Dag Sverre Seljebotn > >> >> > >>>>> wrote: > >> >> > >>>>> > On 11/08/2012 01:07 PM, Gael Varoquaux wrote: > >> >> > >>>>> >> On Thu, Nov 08, 2012 at 11:28:21AM +0000, Nathaniel Smith > >> >> > >>>>> >> wrote: > >> >> > >>>>> >>> I think everyone would be very happy to see numpy.dot > >> >> > >>>>> >>> modified to do > >> >> > >>>>> >>> this automatically. But adding a scipy.dot IMHO would be > >> >> > >>>>> >>> fixing > >> >> > >>>>> >>> things > >> >> > >>>>> >>> in the wrong place and just create extra confusion. > >> >> > >>>>> >> > >> >> > >>>>> >> I am not sure I agree: numpy is often compiled without > >> >> > >>>>> >> lapack support, > >> >> > >>>>> >> as > >> >> > >>>>> >> it is not necessary. On the other hand scipy is always > >> >> > >>>>> >> compiled with > >> >> > >>>>> >> lapack. Thus this makes more sens in scipy. > >> >> > >>>>> > > >> >> > >>>>> > Well, numpy.dot already contains multiple fallback cases > for > >> >> > >>>>> > when it is > >> >> > >>>>> > compiled with BLAS and not. So I'm +1 on just making this > an > >> >> > >>>>> > improvement > >> >> > >>>>> > on numpy.dot. I don't think there's a time when you would > not > >> >> > >>>>> > want to > >> >> > >>>>> > use this (provided the output order issue is fixed), and it > >> >> > >>>>> > doesn't > >> >> > >>>>> > make > >> >> > >>>>> > sense to not have old codes take advantage of the speed > >> >> > >>>>> > improvement. > >> >> > >>>>> > >> >> > >>>>> Indeed, there is no reason not to make this available in > NumPy. > >> >> > >>>>> > >> >> > >>>>> Nicolas, can you prepare a patch for numpy ? > >> >> > >>>> > >> >> > >>>> > >> >> > >>>> +1, I agree, this should be a fix in numpy, not scipy. 
> >> >> > >>>> > >> >> > >>>> Be Well > >> >> > >>>> Anthony > >> >> > >>>> > >> >> > >>>>> > >> >> > >>>>> > >> >> > >>>>> David > >> >> > >>>>> _______________________________________________ > >> >> > >>>>> NumPy-Discussion mailing list > >> >> > >>>>> NumPy-Discussion at scipy.org > >> >> > >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > >>>> > >> >> > >>>> > >> >> > >>>> > >> >> > >>>> _______________________________________________ > >> >> > >>>> NumPy-Discussion mailing list > >> >> > >>>> NumPy-Discussion at scipy.org > >> >> > >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > >>>> > >> >> > >>> > >> >> > >>> > >> >> > >>> _______________________________________________ > >> >> > >>> NumPy-Discussion mailing list > >> >> > >>> NumPy-Discussion at scipy.org > >> >> > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > >>> > >> >> > _______________________________________________ > >> >> > NumPy-Discussion mailing list > >> >> > NumPy-Discussion at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > > >> >> > >> >> > >> >> _______________________________________________ > >> >> NumPy-Discussion mailing list > >> >> NumPy-Discussion at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Fri Nov 9 09:17:46 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 09 Nov 2012 09:17:46 -0500 Subject: [Numpy-discussion] random.choice Message-ID: <509D108A.1060405@gmail.com> I just noticed that 1.7 is scheduled to add a random.choice function. I wonder if the best structure has been chosen. Specifically, it does not provide for array flattening, and it does not provide for subarray choice. Back in 2006 (?) Robert Kern suggested something like the below (forgive any mistranslation). I think the included functionality does belong in a choice function. I do like the provision for repeated sampling in the current proposal, however. Cheers, Alan Isaac def choice(x, axis=None): """Select an element or subarray uniformly randomly. If axis is None, then a single element is chosen from the entire array. Otherwise, a subarray is chosen from the given axis. 
""" x = np.asarray(x) if axis is None: length = np.multiply.reduce(x.shape) n = random.randint(length) return x.flat[n] else: n = random.randint(x.shape[axis]) idx = map(slice, x.shape) idx[axis] = n return x[tuple(idx)] From nouiz at nouiz.org Fri Nov 9 09:45:35 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 9 Nov 2012 09:45:35 -0500 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <1352417083.30611.4.camel@sebastian-laptop> <1352419109.30611.10.camel@sebastian-laptop> Message-ID: On Fri, Nov 9, 2012 at 3:32 AM, Nathaniel Smith wrote: > On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER > wrote: > > Fred, > > > > Thanks for the advice. > > The code will only affect the part in _dotblas.c where gemm is called. > > There's tons of check before that make sure both matrices are of ndim 2. > > We should check though if we can do these tricks in other parts of the > function. > > > > Otherwise: > > - I've built against ATLAS 3.10 > > - I'm happy to add a couple more test for C and F-contiguous. I'm not > > sure how to get the third type (strided), would you have an example? > > def with_memory_order(a, order): > assert order in ("C", "F", "discontig") > assert a.ndim == 2 > if order in ("C", "F"): > return np.asarray(a, order=order) > else: > buf = np.empty((a.shape[0] * 2, a.shape[1] * 2), dtype=a.dtype) > buf[::2, ::2] = a > # This returns a view onto every other element of 'buf': > result = buf[::2, ::2] > assert not result.flags.c_contiguous and not > result.flags.f_contiguous > return result > > > The following test for instance checks integrity against > > multiarray.dot, which I believe is default when not compiled with > > BLAS. > > Dot is a hard function to test imho, so if anybody has ideas on what > > kind of test they'd like to see, please let me know. > > > > If that's ok I might now be able to: > > - Check for more bugs, I need to dig a bit more in the gemm call, make > > sure everything is ok. > > - Create an issue on github and link to this discussion > > - Make a commit in a seperate branch > > - Move forward like that. > > > > == > > import numpy as np > > from time import time > > from numpy.testing import assert_almost_equal > > > > def test_dot_regression(): > > """ Test numpy dot by comparing with multiarray dot > > """ > > np.random.seed(7) > > a = np.random.randn(3, 3) > > b = np.random.randn(3, 2) > > c = np.random.randn(2, 3) > > > > _dot = np.core.multiarray.dot > > > > assert_almost_equal(np.dot(a, a), _dot(a, a)) > > assert_almost_equal(np.dot(b, c), _dot(b, c)) > > assert_almost_equal(np.dot(b.T, c.T), _dot(b.T, c.T)) > > > > assert_almost_equal(np.dot(a.T, a), _dot(a.T, a)) > > assert_almost_equal(np.dot(a, a.T), _dot(a, a.T)) > > assert_almost_equal(np.dot(a.T, a.T), _dot(a.T, a.T)) > > You should check that the result is C-contiguous in all cases too. > > for a_order in ("C", "F", "discontig"): > for b_order in ("C", "F", "discontig"): > this_a = with_memory_order(a, a_order) > this_b = with_memory_order(b, b_order) > result = np.dot(this_a, this_b) > assert_almost_equal(result, expected) > assert result.flags.c_contiguous > > You could also wrap the above in yet another loop to try a few > different combinations of a and b matrices (perhaps after sticking the > code into a utility function, like run_dot_tests(a, b, expected), so > the indentation doesn't get out of hand ;-)). 
Then you can easily test > some of the edge cases, like Nx1 matrices. > I agree that tests are needed the for Nx1 and variant cases. I saw blas error being raised with some blas version. You also need to test with the output provided, so there is 3 loops: for a_order in ("C", "F", "discontig", "neg"): for b_order in ("C", "F", "discontig", "neg"): for c_order in ("C", "F", "discontig", "neg"): I also added the stride type "neg", I'm not sure if it is needed, but that is other corner cases. neg => result = buf[::-1, ::-1] I just looked again at our code and there is another constrain: that the strides are multiple of the elemsize. Theano do not support not aligned array, but numpy does, so there is a need for test for this. You can make an unaligned array like this: dtype = "b1,f4" a = numpy.empty(1e4, dtype=dtype)['f1'] I just saw the strides problems that affect only some. I think that the best explaination is our code: /* create appropriate strides for malformed matrices that are row or column * vectors, or empty matrices. * In that case, the value of the stride does not really matter, but * some versions of BLAS insist that: * - they are not smaller than the number of elements in the array, * - they are not 0. */ sx_0 = (Nx[0] > 1) ? Sx[0]/type_size : (Nx[1] + 1); sx_1 = (Nx[1] > 1) ? Sx[1]/type_size : (Nx[0] + 1); sy_0 = (Ny[0] > 1) ? Sy[0]/type_size : (Ny[1] + 1); sy_1 = (Ny[1] > 1) ? Sy[1]/type_size : (Ny[0] + 1); sz_0 = (Nz[0] > 1) ? Sz[0]/type_size : (Nz[1] + 1); sz_1 = (Nz[1] > 1) ? Sz[1]/type_size : (Nz[0] + 1); So this ask for test with empty matrices too. HTH Fred -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Nov 9 10:12:42 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Nov 2012 15:12:42 +0000 Subject: [Numpy-discussion] Scipy dot In-Reply-To: <20121109084826.GB19371@phare.normalesup.org> References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> Message-ID: On 9 Nov 2012 08:48, "Gael Varoquaux" wrote: > > On Thu, Nov 08, 2012 at 11:29:19AM -0600, Anthony Scopatz wrote: > > Indeed, there is no reason not to make this available in NumPy. > > > +1, I agree, this should be a fix in numpy, not scipy. > > I agree. > > My point was a bit of a side issue: given a user's computer, I have no > garantee that numpy will be compiled with an optimized blas. In fact, I > have often seen on computing clusters where people tend to compile stuff > manually numpy using the embedded lapack_lite. On the other hand, scipy > is necessarily compiled using a blas. This is why, as a developer, I > always write code that uses scipy.linalg rather than numpy.linalg. > However, for 'dot', I have no way of ensuring that the blas used to > compile scipy is used. But what if someone compiles numpy against an optimized blas (mkl, say) and then compiles SciPy against the reference blas? What do you do then!? ;-) I mean, this would be dumb. But not really dumber than installing an optimized blas for SciPy and then configuring numpy to *not* use it... As a higher-level point, having a scipy.dot would be a serious violation of There's Only One Way To Do It. 
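For what it's worth, a quick way to check which dot an installation actually ended up with; this is only a sketch and assumes the NumPy layout of this era, where the BLAS-backed dot lives in the optional numpy.core._dotblas extension (the module patched in this thread):

import numpy as np

def dot_uses_blas():
    # _dotblas is only built when a BLAS was found at compile time; without
    # it, np.dot falls back to the slower multiarray implementation.
    try:
        from numpy.core import _dotblas  # noqa: F401
    except ImportError:
        return False
    return True

print(dot_uses_blas())
np.__config__.show()   # build-time BLAS/LAPACK details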
I think that if people who are writing code against numpy/scipy have to care about our difficulties with build systems, then that's a bug in numpy/scipy and we should do a better job rather than pushing the responsibility off on our users to figure out the best combination of functions to use on each end-user's machine. (Honestly I feel that the existence of scipy.linalg in the first place is sort of a low-level embarrassment that we should try to move away from in the long run. At least as far as the public API goes, and for all of the near-duplicate functions that are in both numpy and scipy, like qr.) -n > Cheers, > > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Fri Nov 9 10:20:35 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Nov 2012 15:20:35 +0000 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <1352417083.30611.4.camel@sebastian-laptop> <1352419109.30611.10.camel@sebastian-laptop> Message-ID: On Fri, Nov 9, 2012 at 2:45 PM, Fr?d?ric Bastien wrote: > > > > On Fri, Nov 9, 2012 at 3:32 AM, Nathaniel Smith wrote: >> >> On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER >> wrote: >> > Fred, >> > >> > Thanks for the advice. >> > The code will only affect the part in _dotblas.c where gemm is called. >> > There's tons of check before that make sure both matrices are of ndim 2. >> > We should check though if we can do these tricks in other parts of the >> > function. >> > >> > Otherwise: >> > - I've built against ATLAS 3.10 >> > - I'm happy to add a couple more test for C and F-contiguous. I'm not >> > sure how to get the third type (strided), would you have an example? >> >> def with_memory_order(a, order): >> assert order in ("C", "F", "discontig") >> assert a.ndim == 2 >> if order in ("C", "F"): >> return np.asarray(a, order=order) >> else: >> buf = np.empty((a.shape[0] * 2, a.shape[1] * 2), dtype=a.dtype) >> buf[::2, ::2] = a >> # This returns a view onto every other element of 'buf': >> result = buf[::2, ::2] >> assert not result.flags.c_contiguous and not >> result.flags.f_contiguous >> return result >> >> > The following test for instance checks integrity against >> > multiarray.dot, which I believe is default when not compiled with >> > BLAS. >> > Dot is a hard function to test imho, so if anybody has ideas on what >> > kind of test they'd like to see, please let me know. >> > >> > If that's ok I might now be able to: >> > - Check for more bugs, I need to dig a bit more in the gemm call, make >> > sure everything is ok. >> > - Create an issue on github and link to this discussion >> > - Make a commit in a seperate branch >> > - Move forward like that. 
>> > >> > == >> > import numpy as np >> > from time import time >> > from numpy.testing import assert_almost_equal >> > >> > def test_dot_regression(): >> > """ Test numpy dot by comparing with multiarray dot >> > """ >> > np.random.seed(7) >> > a = np.random.randn(3, 3) >> > b = np.random.randn(3, 2) >> > c = np.random.randn(2, 3) >> > >> > _dot = np.core.multiarray.dot >> > >> > assert_almost_equal(np.dot(a, a), _dot(a, a)) >> > assert_almost_equal(np.dot(b, c), _dot(b, c)) >> > assert_almost_equal(np.dot(b.T, c.T), _dot(b.T, c.T)) >> > >> > assert_almost_equal(np.dot(a.T, a), _dot(a.T, a)) >> > assert_almost_equal(np.dot(a, a.T), _dot(a, a.T)) >> > assert_almost_equal(np.dot(a.T, a.T), _dot(a.T, a.T)) >> >> You should check that the result is C-contiguous in all cases too. >> >> for a_order in ("C", "F", "discontig"): >> for b_order in ("C", "F", "discontig"): >> this_a = with_memory_order(a, a_order) >> this_b = with_memory_order(b, b_order) >> result = np.dot(this_a, this_b) >> assert_almost_equal(result, expected) >> assert result.flags.c_contiguous >> >> You could also wrap the above in yet another loop to try a few >> different combinations of a and b matrices (perhaps after sticking the >> code into a utility function, like run_dot_tests(a, b, expected), so >> the indentation doesn't get out of hand ;-)). Then you can easily test >> some of the edge cases, like Nx1 matrices. > > > I agree that tests are needed the for Nx1 and variant cases. I saw blas > error being raised with some blas version. > > You also need to test with the output provided, so there is 3 loops: > > for a_order in ("C", "F", "discontig", "neg"): > for b_order in ("C", "F", "discontig", "neg"): > for c_order in ("C", "F", "discontig", "neg"): > > I also added the stride type "neg", I'm not sure if it is needed, but that > is other corner cases. neg => result = buf[::-1, ::-1] > > I just looked again at our code and there is another constrain: that the > strides are multiple of the elemsize. Theano do not support not aligned > array, but numpy does, so there is a need for test for this. You can make an > unaligned array like this: > > dtype = "b1,f4" > a = numpy.empty(1e4, dtype=dtype)['f1'] I think you're mixing up two issues here... requiring that the strides and the itemsize match is part of the requirement for C- and F-contiguity, the example code you give here produces a non-contiguous array, so it's already checked for by the function we're talking about. However, there can be contiguous arrays that are not aligned, like: In [25]: a = np.empty(100, dtype="i1")[1:-3].view(np.float32) In [26]: a.flags.c_contiguous Out[26]: True In [27]: a.flags.aligned Out[27]: False I suspect the np.dot code that Nicolas is looking at already checks for the ALIGNED flag and makes a copy if necessary, but it would be good to have an array like this in the tests to be sure. -n > I just saw the strides problems that affect only some. I think that the best > explaination is our code: > > > > /* create appropriate strides for malformed matrices that are row or > column > * vectors, or empty matrices. > > * In that case, the value of the stride does not really matter, but > * some versions of BLAS insist that: > > * - they are not smaller than the number of elements in the array, > * - they are not 0. > */ > > sx_0 = (Nx[0] > 1) ? Sx[0]/type_size : (Nx[1] + 1); > sx_1 = (Nx[1] > 1) ? Sx[1]/type_size : (Nx[0] + 1); > > sy_0 = (Ny[0] > 1) ? Sy[0]/type_size : (Ny[1] + 1); > sy_1 = (Ny[1] > 1) ? 
Sy[1]/type_size : (Ny[0] + 1); > > sz_0 = (Nz[0] > 1) ? Sz[0]/type_size : (Nz[1] + 1); > sz_1 = (Nz[1] > 1) ? Sz[1]/type_size : (Nz[0] + 1); > > > So this ask for test with empty matrices too. > > HTH > > Fred > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From nouiz at nouiz.org Fri Nov 9 11:17:27 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 9 Nov 2012 11:17:27 -0500 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <1352417083.30611.4.camel@sebastian-laptop> <1352419109.30611.10.camel@sebastian-laptop> Message-ID: On Fri, Nov 9, 2012 at 10:20 AM, Nathaniel Smith wrote: > On Fri, Nov 9, 2012 at 2:45 PM, Fr?d?ric Bastien wrote: > > > > > > > > On Fri, Nov 9, 2012 at 3:32 AM, Nathaniel Smith wrote: > >> > >> On Fri, Nov 9, 2012 at 6:18 AM, Nicolas SCHEFFER > >> wrote: > >> > Fred, > >> > > >> > Thanks for the advice. > >> > The code will only affect the part in _dotblas.c where gemm is called. > >> > There's tons of check before that make sure both matrices are of ndim > 2. > >> > We should check though if we can do these tricks in other parts of the > >> > function. > >> > > >> > Otherwise: > >> > - I've built against ATLAS 3.10 > >> > - I'm happy to add a couple more test for C and F-contiguous. I'm not > >> > sure how to get the third type (strided), would you have an example? > >> > >> def with_memory_order(a, order): > >> assert order in ("C", "F", "discontig") > >> assert a.ndim == 2 > >> if order in ("C", "F"): > >> return np.asarray(a, order=order) > >> else: > >> buf = np.empty((a.shape[0] * 2, a.shape[1] * 2), dtype=a.dtype) > >> buf[::2, ::2] = a > >> # This returns a view onto every other element of 'buf': > >> result = buf[::2, ::2] > >> assert not result.flags.c_contiguous and not > >> result.flags.f_contiguous > >> return result > >> > >> > The following test for instance checks integrity against > >> > multiarray.dot, which I believe is default when not compiled with > >> > BLAS. > >> > Dot is a hard function to test imho, so if anybody has ideas on what > >> > kind of test they'd like to see, please let me know. > >> > > >> > If that's ok I might now be able to: > >> > - Check for more bugs, I need to dig a bit more in the gemm call, make > >> > sure everything is ok. > >> > - Create an issue on github and link to this discussion > >> > - Make a commit in a seperate branch > >> > - Move forward like that. > >> > > >> > == > >> > import numpy as np > >> > from time import time > >> > from numpy.testing import assert_almost_equal > >> > > >> > def test_dot_regression(): > >> > """ Test numpy dot by comparing with multiarray dot > >> > """ > >> > np.random.seed(7) > >> > a = np.random.randn(3, 3) > >> > b = np.random.randn(3, 2) > >> > c = np.random.randn(2, 3) > >> > > >> > _dot = np.core.multiarray.dot > >> > > >> > assert_almost_equal(np.dot(a, a), _dot(a, a)) > >> > assert_almost_equal(np.dot(b, c), _dot(b, c)) > >> > assert_almost_equal(np.dot(b.T, c.T), _dot(b.T, c.T)) > >> > > >> > assert_almost_equal(np.dot(a.T, a), _dot(a.T, a)) > >> > assert_almost_equal(np.dot(a, a.T), _dot(a, a.T)) > >> > assert_almost_equal(np.dot(a.T, a.T), _dot(a.T, a.T)) > >> > >> You should check that the result is C-contiguous in all cases too. 
> >> > >> for a_order in ("C", "F", "discontig"): > >> for b_order in ("C", "F", "discontig"): > >> this_a = with_memory_order(a, a_order) > >> this_b = with_memory_order(b, b_order) > >> result = np.dot(this_a, this_b) > >> assert_almost_equal(result, expected) > >> assert result.flags.c_contiguous > >> > >> You could also wrap the above in yet another loop to try a few > >> different combinations of a and b matrices (perhaps after sticking the > >> code into a utility function, like run_dot_tests(a, b, expected), so > >> the indentation doesn't get out of hand ;-)). Then you can easily test > >> some of the edge cases, like Nx1 matrices. > > > > > > I agree that tests are needed the for Nx1 and variant cases. I saw blas > > error being raised with some blas version. > > > > You also need to test with the output provided, so there is 3 loops: > > > > for a_order in ("C", "F", "discontig", "neg"): > > for b_order in ("C", "F", "discontig", "neg"): > > for c_order in ("C", "F", "discontig", "neg"): > > > > I also added the stride type "neg", I'm not sure if it is needed, but > that > > is other corner cases. neg => result = buf[::-1, ::-1] > > > > I just looked again at our code and there is another constrain: that the > > strides are multiple of the elemsize. Theano do not support not aligned > > array, but numpy does, so there is a need for test for this. You can > make an > > unaligned array like this: > > > > dtype = "b1,f4" > > a = numpy.empty(1e4, dtype=dtype)['f1'] > > I think you're mixing up two issues here... requiring that the strides > and the itemsize match is part of the requirement for C- and > F-contiguity, the example code you give here produces a non-contiguous > array, so it's already checked for by the function we're talking > about. > > However, there can be contiguous arrays that are not aligned, like: > > In [25]: a = np.empty(100, dtype="i1")[1:-3].view(np.float32) > > In [26]: a.flags.c_contiguous > Out[26]: True > > In [27]: a.flags.aligned > Out[27]: False > > I suspect the np.dot code that Nicolas is looking at already checks > for the ALIGNED flag and makes a copy if necessary, but it would be > good to have an array like this in the tests to be sure. > You are right, my test wasn't good and your example is good. Nicolas, I hope I don't de-motivate you with all those details, this need to be done. When dealing with blas lib, the devil is in the detail... and the fact that not all lib accept the same inputs... We hit a few of them. I just try to tell you our experience to make the implementation easier and shorter to stabilize. Fred -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Nov 9 11:25:50 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 9 Nov 2012 17:25:50 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> Message-ID: <20121109162550.GQ19465@phare.normalesup.org> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: > But what if someone compiles numpy against an optimized blas (mkl, > say) and then compiles SciPy against the reference blas? What do you > do then!? ;-) This could happen. But the converse happens very often. What happens is that users (eg on shared computing resource) ask for a scientific python environment. 
The administrator than installs the package starting from the most basic one, to the most advanced one, thus starting with numpy that can very well build without any external blas. When he gets to scipy he hits the problem that the build system does not detect properly the blas, and he solves that problem. Also, it used to be that on the major linux distributions, numpy would not be build with an optimize lapack because numpy was in the 'base' set of packages, but not lapack. On the contrary, scipy being in the 'contrib' set, it could depend on lapack. I just checked, and this has been fixed in the major distributions (Fedora, Debian, Ubuntu). Now we can discuss with such problems should not happen, and put the blame on the users/administrators, the fact is that they happen often. I keep seeing environments in which np.linalg is unreasonnably slow. Gael From ralf.gommers at gmail.com Fri Nov 9 11:44:57 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 9 Nov 2012 17:44:57 +0100 Subject: [Numpy-discussion] Help compiling numpy with new gcc In-Reply-To: References: Message-ID: On Wed, Nov 7, 2012 at 7:28 PM, Filipe Pires Alvarenga Fernandes < ocefpaf at gmail.com> wrote: > Hi I am trying to compile numpy with gcc 4.7.1 and I am having the > following issue. > > "RuntimeError: Broken toolchain: cannot link a simple C program" > Do you have Python development headers installed? Usually a separate package called python-dev on linux. If so, can you compile with an older gcc? Ralf > I noticed that I need to pass the flag '-fno-use-linker-plugin' to be > able to compile it. However, even though I did pass it by exporting > the CFLAGS, it does not work. I guess that numpy do not use the > CFLAGS for its internal extensions. How can I pass that option to it? > > > Error below: > [ 11s] compile options: '-Inumpy/core/src/private -Inumpy/core/src > -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include > -I/usr/include/python2.7 -c' > [ 11s] gcc: _configtest.c > [ 11s] gcc -pthread _configtest.o -o _configtest > [ 11s] gcc: fatal error: -fuse-linker-plugin, but liblto_plugin.so not > found > [ 11s] compilation terminated. > [ 11s] gcc: fatal error: -fuse-linker-plugin, but liblto_plugin.so not > found > [ 11s] compilation terminated. > [ 11s] failure. 
> [ 11s] removing: _configtest.c _configtest.o > [ 11s] Traceback (most recent call last): > [ 11s] File "setup.py", line 214, in > [ 11s] setup_package() > [ 11s] File "setup.py", line 207, in setup_package > [ 11s] configuration=configuration ) > [ 11s] File > "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/core.py", line 186, > in setup > [ 11s] return old_setup(**new_attr) > [ 11s] File "/usr/lib64/python2.7/distutils/core.py", line 152, in > setup > [ 11s] dist.run_commands() > [ 11s] File "/usr/lib64/python2.7/distutils/dist.py", line 953, in > run_commands > [ 11s] self.run_command(cmd) > [ 11s] File "/usr/lib64/python2.7/distutils/dist.py", line 972, in > run_command > [ 11s] cmd_obj.run() > [ 11s] File > "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build.py", > line 37, in run > [ 11s] old_build.run(self) > [ 11s] File "/usr/lib64/python2.7/distutils/command/build.py", > line 127, in run > [ 11s] self.run_command(cmd_name) > [ 11s] File "/usr/lib64/python2.7/distutils/cmd.py", line 326, in > run_command > [ 11s] self.distribution.run_command(command) > [ 11s] File "/usr/lib64/python2.7/distutils/dist.py", line 972, in > run_command > [ 11s] cmd_obj.run() > [ 11s] File > "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build_src.py", > line 152, in run > [ 11s] self.build_sources() > [ 11s] File > "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build_src.py", > line 163, in build_sources > [ 11s] self.build_library_sources(*libname_info) > [ 11s] File > "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build_src.py", > line 298, in build_library_sources > [ 11s] sources = self.generate_sources(sources, (lib_name, > build_info)) > [ 11s] File > "/home/abuild/rpmbuild/BUILD/numpy/numpy/distutils/command/build_src.py", > line 385, in generate_sources > [ 11s] source = func(extension, build_dir) > [ 11s] File "numpy/core/setup.py", line 648, in get_mathlib_info > [ 11s] raise RuntimeError("Broken toolchain: cannot link a > simple C program") > [ 11s] RuntimeError: Broken toolchain: cannot link a simple C program > [ 11s] error: Bad exit status from /var/tmp/rpm-tmp.yO2SIE (%build) > [ 11s] > [ 11s] > [ 11s] RPM build errors: > [ 11s] Bad exit status from /var/tmp/rpm-tmp.yO2SIE (%build) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Nov 9 11:47:34 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 9 Nov 2012 09:47:34 -0700 Subject: [Numpy-discussion] Help compiling numpy with new gcc In-Reply-To: References: Message-ID: On Fri, Nov 9, 2012 at 9:44 AM, Ralf Gommers wrote: > > > > On Wed, Nov 7, 2012 at 7:28 PM, Filipe Pires Alvarenga Fernandes < > ocefpaf at gmail.com> wrote: > >> Hi I am trying to compile numpy with gcc 4.7.1 and I am having the >> following issue. >> >> "RuntimeError: Broken toolchain: cannot link a simple C program" >> > > Do you have Python development headers installed? Usually a separate > package called python-dev on linux. If so, can you compile with an older > gcc? > > I haven't had any problems with gcc 4.7.2. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Fri Nov 9 11:56:02 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Nov 2012 16:56:02 +0000 Subject: [Numpy-discussion] Scipy dot In-Reply-To: <20121109162550.GQ19465@phare.normalesup.org> References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> Message-ID: On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux wrote: > On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: >> But what if someone compiles numpy against an optimized blas (mkl, >> say) and then compiles SciPy against the reference blas? What do you >> do then!? ;-) > > This could happen. But the converse happens very often. What happens is > that users (eg on shared computing resource) ask for a scientific python > environment. The administrator than installs the package starting from > the most basic one, to the most advanced one, thus starting with numpy > that can very well build without any external blas. When he gets to scipy > he hits the problem that the build system does not detect properly the > blas, and he solves that problem. > > Also, it used to be that on the major linux distributions, numpy would not > be build with an optimize lapack because numpy was in the 'base' set of > packages, but not lapack. On the contrary, scipy being in the 'contrib' > set, it could depend on lapack. I just checked, and this has been fixed > in the major distributions (Fedora, Debian, Ubuntu). > > Now we can discuss with such problems should not happen, and put the > blame on the users/administrators, the fact is that they happen often. I > keep seeing environments in which np.linalg is unreasonnably slow. If this is something that's been a problem for you, maybe we should start another thread on things we could do to fix it directly? Improve build instructions, advertise build systems that set up the whole environment (and thus do the right thing), make numpy's setup.py scream and yell if blas isn't available...? -n From njs at pobox.com Fri Nov 9 12:21:13 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Nov 2012 17:21:13 +0000 Subject: [Numpy-discussion] random.choice In-Reply-To: <509D108A.1060405@gmail.com> References: <509D108A.1060405@gmail.com> Message-ID: On Fri, Nov 9, 2012 at 2:17 PM, Alan G Isaac wrote: > I just noticed that 1.7 is scheduled to add a random.choice function. > I wonder if the best structure has been chosen. Specifically, it does > not provide for array flattening, and it does not provide for subarray > choice. I think in terms of the function currently in numpy master: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html You write flattening as np.random.choice(a.ravel(), ...) and subarray choice as np.take(a, np.random.choice(a.shape[ax], ...), axis=ax) ? That said, since it (claims to) only work on 1-d arrays right now, we could always add either or both of these features later without breaking compatibility. So I don't think there's any urgent need to fix this before releasing. (If you're worried though then you might want to double-check that the np.random.choice in 1.7 actually *does* give an error if the input array is not 1-d.) 
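To make those two idioms concrete, here is a small illustrative sketch (the array and the sample sizes are invented for the example; it only uses np.random.choice, ravel and np.take as just described):

import numpy as np

np.random.seed(0)
a = np.arange(12).reshape(3, 4)          # toy 2-d array

# "Flattening": sample individual elements from anywhere in a.
flat_sample = np.random.choice(a.ravel(), size=5)

# "Subarray choice": draw row indices along axis 0, then pull the
# corresponding rows out with np.take.
row_idx = np.random.choice(a.shape[0], size=2, replace=False)
row_sample = np.take(a, row_idx, axis=0)

# Per the caveat above, passing the 2-d array directly is supposed to
# raise (choice is documented as 1-d only) rather than silently flatten;
# worth double-checking against the 1.7 release:
# np.random.choice(a, size=5)            # expected: ValueError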
-n From will at thearete.co.uk Fri Nov 9 12:16:46 2012 From: will at thearete.co.uk (Will Furnass) Date: Fri, 9 Nov 2012 17:16:46 +0000 (UTC) Subject: [Numpy-discussion] Double-ended queues References: <5061962F.5000809@molden.no> Message-ID: On Tue, 25 Sep 2012 07:23:54 -0600, Charles R Harris wrote: > On Tue, Sep 25, 2012 at 6:50 AM, Nathaniel Smith wrote: > >> On Tue, Sep 25, 2012 at 12:31 PM, Sturla Molden >> wrote: >> > On 25.09.2012 11:38, Nathaniel Smith wrote: >> > >> >> Implementing a ring buffer on top of ndarray would be pretty >> >> straightforward and probably work better than a linked-list >> >> implementation. >> > >> > Amazingly, many do not know that a ringbuffer is simply an array >> > indexed modulus its length: >> > >> > foo = np.zeros(n) >> > i = 0 while 1: >> > foo[i % n] # access ringbuffer i += 1 >> >> Good trick, but to be reliable I think you need to either be willing >> for i to overflow into a long (arbitrary width) integer, or else make >> sure that i is an unsigned integer and that n is 2**k where k <= >> sizeof(i)? Just doing i %= n on each pass through the loop might be >> less error-prone. >> >> > Also, instead of writing a linked list, consider collections.deque. >> > A deque is by definition a double-ended queue. It is just waste of >> > time to implement a deque (double-ended queue) and hope it will >> > perform better than Python's standard lib collections.deque object. >> >> The original poster is using collections.deque now, but wants a version >> that supports efficient vectorized operations. >> >> > The C++ stdlib has an efficient deque object, but it moves through > memory. > Hmm, it wouldn't be easy to make that work with numpy arrays what with > views and all. > > Efficient circular lists are often implemented using powers of two so > that modulo indexing can be done using a mask. What is meant by 'moves through memory'? Are the reasons for avoiding creating ndarray/ufunc-like interfaces to Cython-wrapped STL objects to do with the complexity of coding such interfaces or to efficiency of execution? Given that I'm simply wanting to go from for t in timestep_arr: for i in len(deque1): deque2[i] += some_stuff[t] * deque1[i] / some_value possibly_push_front(deque1, deque2) possibly_pop_back_(deque1, deque2) to for t in timestep_arr: deque2 += some_stuff[t] * deque1 / some_value possibly_push_front(deque1, deque2) possibly_pop_back_(deque1, deque2) would it make sense to use a cython wrapped STL deque class for deque1 and deque2 but somehow also include in __add__, __div__ and __mul__ methods to provide ufunc-like capabilities? Apologies if I'm missing something fairly obvious - I'm fairly new to Cython. Regards, Will Furnass From scheffer.nicolas at gmail.com Fri Nov 9 17:13:51 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Fri, 9 Nov 2012 14:13:51 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> Message-ID: I too encourage users to use scipy.linalg for speed and robustness (hence calling this scipy.dot), but it just brings so much confusion! When using the scipy + numpy ecosystem, you'd almost want everything be done with scipy so that you get the best implementation in all cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv(). 
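To make the idea concrete, here is a rough sketch of what such a contiguity-aware dot wrapper can look like. This is only an illustration built on scipy.linalg.get_blas_funcs and the f2py gemm wrapper -- it is not the actual pastebin snippet nor the _dotblas.c change, and it deliberately ignores the empty/Nx1 corner cases discussed earlier in the thread:

import numpy as np
from scipy.linalg import get_blas_funcs

def dot_sketch(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    # Only handle plain 2-d, same-dtype float operands; defer everything
    # else to the regular np.dot.
    if (a.ndim != 2 or b.ndim != 2 or a.dtype != b.dtype
            or a.dtype not in (np.float32, np.float64)):
        return np.dot(a, b)

    gemm, = get_blas_funcs(('gemm',), (a, b))

    def fortran_view(x):
        # gemm wants column-major input; a C-contiguous array is just the
        # transpose of a column-major one, so pass x.T and flip the trans
        # flag instead of copying.
        if x.flags.f_contiguous:
            return x, 0
        if x.flags.c_contiguous:
            return x.T, 1
        return np.asfortranarray(x), 0   # truly strided input: one copy

    a2, trans_a = fortran_view(a)
    b2, trans_b = fortran_view(b)
    # Note: gemm hands back a Fortran-ordered result, unlike np.dot's C order.
    return gemm(1.0, a2, b2, trans_a=trans_a, trans_b=trans_b)

Timing something like this against plain np.dot on large C- and F-ordered operands is essentially what the numbers further down in the thread measure.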
Anyway this is indeed for another thread, the confusion we'd like to fix here is that users shouldn't have to understand the C/F contiguous concepts to get the maximum speed for np.dot() To summarize: - The python snippet I posted is still valid and can speed up your code if you can change all your dot() calls. - The change in dotblas.c is a bit more problematic because it's very core. I'm having issues right now to replicate the timings, I've got better timing for a.dot(a.T) than for a.dot(a). There might be a bug. It's a pain to test because I cannot do the test in a single python session. I'm going to try to integrate most of your suggestions, I cannot guarantee I'll have time to do them all though. -nicolas On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith wrote: > On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux > wrote: >> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: >>> But what if someone compiles numpy against an optimized blas (mkl, >>> say) and then compiles SciPy against the reference blas? What do you >>> do then!? ;-) >> >> This could happen. But the converse happens very often. What happens is >> that users (eg on shared computing resource) ask for a scientific python >> environment. The administrator than installs the package starting from >> the most basic one, to the most advanced one, thus starting with numpy >> that can very well build without any external blas. When he gets to scipy >> he hits the problem that the build system does not detect properly the >> blas, and he solves that problem. >> >> Also, it used to be that on the major linux distributions, numpy would not >> be build with an optimize lapack because numpy was in the 'base' set of >> packages, but not lapack. On the contrary, scipy being in the 'contrib' >> set, it could depend on lapack. I just checked, and this has been fixed >> in the major distributions (Fedora, Debian, Ubuntu). >> >> Now we can discuss with such problems should not happen, and put the >> blame on the users/administrators, the fact is that they happen often. I >> keep seeing environments in which np.linalg is unreasonnably slow. > > If this is something that's been a problem for you, maybe we should > start another thread on things we could do to fix it directly? Improve > build instructions, advertise build systems that set up the whole > environment (and thus do the right thing), make numpy's setup.py > scream and yell if blas isn't available...? > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scheffer.nicolas at gmail.com Fri Nov 9 17:52:22 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Fri, 9 Nov 2012 14:52:22 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> Message-ID: Ok: comparing apples to apples. I'm clueless on my observations and would need input from you guys. Using ATLAS 3.10, numpy with and without my changes, I'm getting these timings and comparisons. # #I. Generate matrices using regular dot: # big = np.array(np.random.randn(2000, 2000), 'f'); np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T), left=big.T.dot(big), right=big.dot(big.T))" # #II. 
Timings with regular dot # In [3]: %timeit np.dot(big, big) 10 loops, best of 3: 138 ms per loop In [4]: %timeit np.dot(big, big.T) 10 loops, best of 3: 166 ms per loop In [5]: %timeit np.dot(big.T, big.T) 10 loops, best of 3: 193 ms per loop In [6]: %timeit np.dot(big.T, big) 10 loops, best of 3: 165 ms per loop # #III. I load these arrays and time again with the "fast" dot # In [21]: %timeit np.dot(big, big) 10 loops, best of 3: 138 ms per loop In [22]: %timeit np.dot(big.T, big) 10 loops, best of 3: 104 ms per loop In [23]: %timeit np.dot(big.T, big.T) 10 loops, best of 3: 138 ms per loop In [24]: %timeit np.dot(big, big.T) 10 loops, best of 3: 102 ms per loop 1. A'.A': great! 2. A.A' becomes faster than A.A !?! # #IV. MSE on differences # In [25]: np.sqrt(((arr['none'] - none)**2).sum()) Out[25]: 0.0 In [26]: np.sqrt(((arr['both'] - both)**2).sum()) Out[26]: 0.0 In [27]: np.sqrt(((arr['left'] - left)**2).sum()) Out[27]: 0.015900515 In [28]: np.sqrt(((arr['right'] - right)**2).sum()) Out[28]: 0.015331409 # # CCl # While the MSE are small, I'm wondering whether: - It's a bug: it should be exactly the same - It's a feature: BLAS is taking shortcuts when you have A.A'. The difference is not significant. Quick: PR that asap! I don't have enough expertise to answer that... Thanks much! -nicolas On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER wrote: > I too encourage users to use scipy.linalg for speed and robustness > (hence calling this scipy.dot), but it just brings so much confusion! > When using the scipy + numpy ecosystem, you'd almost want everything > be done with scipy so that you get the best implementation in all > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv(). > > Anyway this is indeed for another thread, the confusion we'd like to > fix here is that users shouldn't have to understand the C/F contiguous > concepts to get the maximum speed for np.dot() > > To summarize: > - The python snippet I posted is still valid and can speed up your > code if you can change all your dot() calls. > - The change in dotblas.c is a bit more problematic because it's very > core. I'm having issues right now to replicate the timings, I've got > better timing for a.dot(a.T) than for a.dot(a). There might be a bug. > > It's a pain to test because I cannot do the test in a single python session. > I'm going to try to integrate most of your suggestions, I cannot > guarantee I'll have time to do them all though. > > -nicolas > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith wrote: >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux >> wrote: >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: >>>> But what if someone compiles numpy against an optimized blas (mkl, >>>> say) and then compiles SciPy against the reference blas? What do you >>>> do then!? ;-) >>> >>> This could happen. But the converse happens very often. What happens is >>> that users (eg on shared computing resource) ask for a scientific python >>> environment. The administrator than installs the package starting from >>> the most basic one, to the most advanced one, thus starting with numpy >>> that can very well build without any external blas. When he gets to scipy >>> he hits the problem that the build system does not detect properly the >>> blas, and he solves that problem. >>> >>> Also, it used to be that on the major linux distributions, numpy would not >>> be build with an optimize lapack because numpy was in the 'base' set of >>> packages, but not lapack. 
On the contrary, scipy being in the 'contrib' >>> set, it could depend on lapack. I just checked, and this has been fixed >>> in the major distributions (Fedora, Debian, Ubuntu). >>> >>> Now we can discuss with such problems should not happen, and put the >>> blame on the users/administrators, the fact is that they happen often. I >>> keep seeing environments in which np.linalg is unreasonnably slow. >> >> If this is something that's been a problem for you, maybe we should >> start another thread on things we could do to fix it directly? Improve >> build instructions, advertise build systems that set up the whole >> environment (and thus do the right thing), make numpy's setup.py >> scream and yell if blas isn't available...? >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthieu.brucher at gmail.com Fri Nov 9 17:57:10 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Fri, 9 Nov 2012 23:57:10 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> Message-ID: Hi, A.A slower than A.A' is not a surprise for me. The latter is far more cache friendly that the former. Everything follows cache lines, so it is faster than something that will use one element from each cache line. In fact it is exactly what "proves" that the new version is correct. Good job (if all the tests were made and still pass ;) ) Cheers, Matthieu 2012/11/9 Nicolas SCHEFFER > Ok: comparing apples to apples. I'm clueless on my observations and > would need input from you guys. > > Using ATLAS 3.10, numpy with and without my changes, I'm getting these > timings and comparisons. > > # > #I. Generate matrices using regular dot: > # > big = np.array(np.random.randn(2000, 2000), 'f'); > np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T), > left=big.T.dot(big), right=big.dot(big.T))" > > # > #II. Timings with regular dot > # > In [3]: %timeit np.dot(big, big) > 10 loops, best of 3: 138 ms per loop > > In [4]: %timeit np.dot(big, big.T) > 10 loops, best of 3: 166 ms per loop > > In [5]: %timeit np.dot(big.T, big.T) > 10 loops, best of 3: 193 ms per loop > > In [6]: %timeit np.dot(big.T, big) > 10 loops, best of 3: 165 ms per loop > > # > #III. I load these arrays and time again with the "fast" dot > # > In [21]: %timeit np.dot(big, big) > 10 loops, best of 3: 138 ms per loop > > In [22]: %timeit np.dot(big.T, big) > 10 loops, best of 3: 104 ms per loop > > In [23]: %timeit np.dot(big.T, big.T) > 10 loops, best of 3: 138 ms per loop > > In [24]: %timeit np.dot(big, big.T) > 10 loops, best of 3: 102 ms per loop > > 1. A'.A': great! > 2. A.A' becomes faster than A.A !?! > > # > #IV. MSE on differences > # > In [25]: np.sqrt(((arr['none'] - none)**2).sum()) > Out[25]: 0.0 > > In [26]: np.sqrt(((arr['both'] - both)**2).sum()) > Out[26]: 0.0 > > In [27]: np.sqrt(((arr['left'] - left)**2).sum()) > Out[27]: 0.015900515 > > In [28]: np.sqrt(((arr['right'] - right)**2).sum()) > Out[28]: 0.015331409 > > # > # CCl > # > While the MSE are small, I'm wondering whether: > - It's a bug: it should be exactly the same > - It's a feature: BLAS is taking shortcuts when you have A.A'. The > difference is not significant. Quick: PR that asap! > > I don't have enough expertise to answer that... 
> > Thanks much! > > -nicolas > On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER > wrote: > > I too encourage users to use scipy.linalg for speed and robustness > > (hence calling this scipy.dot), but it just brings so much confusion! > > When using the scipy + numpy ecosystem, you'd almost want everything > > be done with scipy so that you get the best implementation in all > > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv(). > > > > Anyway this is indeed for another thread, the confusion we'd like to > > fix here is that users shouldn't have to understand the C/F contiguous > > concepts to get the maximum speed for np.dot() > > > > To summarize: > > - The python snippet I posted is still valid and can speed up your > > code if you can change all your dot() calls. > > - The change in dotblas.c is a bit more problematic because it's very > > core. I'm having issues right now to replicate the timings, I've got > > better timing for a.dot(a.T) than for a.dot(a). There might be a bug. > > > > It's a pain to test because I cannot do the test in a single python > session. > > I'm going to try to integrate most of your suggestions, I cannot > > guarantee I'll have time to do them all though. > > > > -nicolas > > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith wrote: > >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux > >> wrote: > >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: > >>>> But what if someone compiles numpy against an optimized blas (mkl, > >>>> say) and then compiles SciPy against the reference blas? What do you > >>>> do then!? ;-) > >>> > >>> This could happen. But the converse happens very often. What happens is > >>> that users (eg on shared computing resource) ask for a scientific > python > >>> environment. The administrator than installs the package starting from > >>> the most basic one, to the most advanced one, thus starting with numpy > >>> that can very well build without any external blas. When he gets to > scipy > >>> he hits the problem that the build system does not detect properly the > >>> blas, and he solves that problem. > >>> > >>> Also, it used to be that on the major linux distributions, numpy would > not > >>> be build with an optimize lapack because numpy was in the 'base' set of > >>> packages, but not lapack. On the contrary, scipy being in the 'contrib' > >>> set, it could depend on lapack. I just checked, and this has been fixed > >>> in the major distributions (Fedora, Debian, Ubuntu). > >>> > >>> Now we can discuss with such problems should not happen, and put the > >>> blame on the users/administrators, the fact is that they happen often. > I > >>> keep seeing environments in which np.linalg is unreasonnably slow. > >> > >> If this is something that's been a problem for you, maybe we should > >> start another thread on things we could do to fix it directly? Improve > >> build instructions, advertise build systems that set up the whole > >> environment (and thus do the right thing), make numpy's setup.py > >> scream and yell if blas isn't available...? > >> > >> -n > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Fri Nov 9 17:58:59 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Fri, 9 Nov 2012 23:58:59 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> Message-ID: Oh, about the differences. If there is something like cache blocking inside ATLAS (which would make sense), the MAD are not in exactly the same order and this would lead to some differences. You need to compare the MSE/sum of values squared to the machine precision. Cheers, 2012/11/9 Matthieu Brucher > Hi, > > A.A slower than A.A' is not a surprise for me. The latter is far more > cache friendly that the former. Everything follows cache lines, so it is > faster than something that will use one element from each cache line. In > fact it is exactly what "proves" that the new version is correct. > Good job (if all the tests were made and still pass ;) ) > > Cheers, > > Matthieu > > > 2012/11/9 Nicolas SCHEFFER > >> Ok: comparing apples to apples. I'm clueless on my observations and >> would need input from you guys. >> >> Using ATLAS 3.10, numpy with and without my changes, I'm getting these >> timings and comparisons. >> >> # >> #I. Generate matrices using regular dot: >> # >> big = np.array(np.random.randn(2000, 2000), 'f'); >> np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T), >> left=big.T.dot(big), right=big.dot(big.T))" >> >> # >> #II. Timings with regular dot >> # >> In [3]: %timeit np.dot(big, big) >> 10 loops, best of 3: 138 ms per loop >> >> In [4]: %timeit np.dot(big, big.T) >> 10 loops, best of 3: 166 ms per loop >> >> In [5]: %timeit np.dot(big.T, big.T) >> 10 loops, best of 3: 193 ms per loop >> >> In [6]: %timeit np.dot(big.T, big) >> 10 loops, best of 3: 165 ms per loop >> >> # >> #III. I load these arrays and time again with the "fast" dot >> # >> In [21]: %timeit np.dot(big, big) >> 10 loops, best of 3: 138 ms per loop >> >> In [22]: %timeit np.dot(big.T, big) >> 10 loops, best of 3: 104 ms per loop >> >> In [23]: %timeit np.dot(big.T, big.T) >> 10 loops, best of 3: 138 ms per loop >> >> In [24]: %timeit np.dot(big, big.T) >> 10 loops, best of 3: 102 ms per loop >> >> 1. A'.A': great! >> 2. A.A' becomes faster than A.A !?! >> >> # >> #IV. MSE on differences >> # >> In [25]: np.sqrt(((arr['none'] - none)**2).sum()) >> Out[25]: 0.0 >> >> In [26]: np.sqrt(((arr['both'] - both)**2).sum()) >> Out[26]: 0.0 >> >> In [27]: np.sqrt(((arr['left'] - left)**2).sum()) >> Out[27]: 0.015900515 >> >> In [28]: np.sqrt(((arr['right'] - right)**2).sum()) >> Out[28]: 0.015331409 >> >> # >> # CCl >> # >> While the MSE are small, I'm wondering whether: >> - It's a bug: it should be exactly the same >> - It's a feature: BLAS is taking shortcuts when you have A.A'. The >> difference is not significant. Quick: PR that asap! >> >> I don't have enough expertise to answer that... >> >> Thanks much! >> >> -nicolas >> On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER >> wrote: >> > I too encourage users to use scipy.linalg for speed and robustness >> > (hence calling this scipy.dot), but it just brings so much confusion! 
>> > When using the scipy + numpy ecosystem, you'd almost want everything >> > be done with scipy so that you get the best implementation in all >> > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv(). >> > >> > Anyway this is indeed for another thread, the confusion we'd like to >> > fix here is that users shouldn't have to understand the C/F contiguous >> > concepts to get the maximum speed for np.dot() >> > >> > To summarize: >> > - The python snippet I posted is still valid and can speed up your >> > code if you can change all your dot() calls. >> > - The change in dotblas.c is a bit more problematic because it's very >> > core. I'm having issues right now to replicate the timings, I've got >> > better timing for a.dot(a.T) than for a.dot(a). There might be a bug. >> > >> > It's a pain to test because I cannot do the test in a single python >> session. >> > I'm going to try to integrate most of your suggestions, I cannot >> > guarantee I'll have time to do them all though. >> > >> > -nicolas >> > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith wrote: >> >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux >> >> wrote: >> >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: >> >>>> But what if someone compiles numpy against an optimized blas (mkl, >> >>>> say) and then compiles SciPy against the reference blas? What do you >> >>>> do then!? ;-) >> >>> >> >>> This could happen. But the converse happens very often. What happens >> is >> >>> that users (eg on shared computing resource) ask for a scientific >> python >> >>> environment. The administrator than installs the package starting from >> >>> the most basic one, to the most advanced one, thus starting with numpy >> >>> that can very well build without any external blas. When he gets to >> scipy >> >>> he hits the problem that the build system does not detect properly the >> >>> blas, and he solves that problem. >> >>> >> >>> Also, it used to be that on the major linux distributions, numpy >> would not >> >>> be build with an optimize lapack because numpy was in the 'base' set >> of >> >>> packages, but not lapack. On the contrary, scipy being in the >> 'contrib' >> >>> set, it could depend on lapack. I just checked, and this has been >> fixed >> >>> in the major distributions (Fedora, Debian, Ubuntu). >> >>> >> >>> Now we can discuss with such problems should not happen, and put the >> >>> blame on the users/administrators, the fact is that they happen >> often. I >> >>> keep seeing environments in which np.linalg is unreasonnably slow. >> >> >> >> If this is something that's been a problem for you, maybe we should >> >> start another thread on things we could do to fix it directly? Improve >> >> build instructions, advertise build systems that set up the whole >> >> environment (and thus do the right thing), make numpy's setup.py >> >> scream and yell if blas isn't available...? >> >> >> >> -n >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > Information System Engineer, Ph.D. > Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > Music band: http://liliejay.com/ > > -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Nov 9 18:02:01 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 10 Nov 2012 00:02:01 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> Message-ID: <1352502121.18462.8.camel@sebastian-laptop> On Fri, 2012-11-09 at 14:52 -0800, Nicolas SCHEFFER wrote: > Ok: comparing apples to apples. I'm clueless on my observations and > would need input from you guys. > > Using ATLAS 3.10, numpy with and without my changes, I'm getting these > timings and comparisons. > > # > #I. Generate matrices using regular dot: > # > big = np.array(np.random.randn(2000, 2000), 'f'); > np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T), > left=big.T.dot(big), right=big.dot(big.T))" > > # > #II. Timings with regular dot > # > In [3]: %timeit np.dot(big, big) > 10 loops, best of 3: 138 ms per loop > > In [4]: %timeit np.dot(big, big.T) > 10 loops, best of 3: 166 ms per loop > > In [5]: %timeit np.dot(big.T, big.T) > 10 loops, best of 3: 193 ms per loop > > In [6]: %timeit np.dot(big.T, big) > 10 loops, best of 3: 165 ms per loop > > # > #III. I load these arrays and time again with the "fast" dot > # > In [21]: %timeit np.dot(big, big) > 10 loops, best of 3: 138 ms per loop > > In [22]: %timeit np.dot(big.T, big) > 10 loops, best of 3: 104 ms per loop > > In [23]: %timeit np.dot(big.T, big.T) > 10 loops, best of 3: 138 ms per loop > > In [24]: %timeit np.dot(big, big.T) > 10 loops, best of 3: 102 ms per loop > > 1. A'.A': great! > 2. A.A' becomes faster than A.A !?! > > # > #IV. MSE on differences > # > In [25]: np.sqrt(((arr['none'] - none)**2).sum()) > Out[25]: 0.0 > > In [26]: np.sqrt(((arr['both'] - both)**2).sum()) > Out[26]: 0.0 > > In [27]: np.sqrt(((arr['left'] - left)**2).sum()) > Out[27]: 0.015900515 > > In [28]: np.sqrt(((arr['right'] - right)**2).sum()) > Out[28]: 0.015331409 > > # > # CCl > # > While the MSE are small, I'm wondering whether: > - It's a bug: it should be exactly the same > - It's a feature: BLAS is taking shortcuts when you have A.A'. The > difference is not significant. Quick: PR that asap! > I think its a feature because memory can be accessed in a more regular fashion in the case of one array being Fortran and the other C contiguous. IE. if its A.B' then fetching the first row is perfect, since its all behind each other in memory (C-order), while it has to be multiplied with the first column from B' which, due to being transposed, is fortran order and the column thus also one chunk in memory. > I don't have enough expertise to answer that... > > Thanks much! > > -nicolas > On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER > wrote: > > I too encourage users to use scipy.linalg for speed and robustness > > (hence calling this scipy.dot), but it just brings so much confusion! > > When using the scipy + numpy ecosystem, you'd almost want everything > > be done with scipy so that you get the best implementation in all > > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv(). 
> > > > Anyway this is indeed for another thread, the confusion we'd like to > > fix here is that users shouldn't have to understand the C/F contiguous > > concepts to get the maximum speed for np.dot() > > > > To summarize: > > - The python snippet I posted is still valid and can speed up your > > code if you can change all your dot() calls. > > - The change in dotblas.c is a bit more problematic because it's very > > core. I'm having issues right now to replicate the timings, I've got > > better timing for a.dot(a.T) than for a.dot(a). There might be a bug. > > > > It's a pain to test because I cannot do the test in a single python session. > > I'm going to try to integrate most of your suggestions, I cannot > > guarantee I'll have time to do them all though. > > > > -nicolas > > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith wrote: > >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux > >> wrote: > >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: > >>>> But what if someone compiles numpy against an optimized blas (mkl, > >>>> say) and then compiles SciPy against the reference blas? What do you > >>>> do then!? ;-) > >>> > >>> This could happen. But the converse happens very often. What happens is > >>> that users (eg on shared computing resource) ask for a scientific python > >>> environment. The administrator than installs the package starting from > >>> the most basic one, to the most advanced one, thus starting with numpy > >>> that can very well build without any external blas. When he gets to scipy > >>> he hits the problem that the build system does not detect properly the > >>> blas, and he solves that problem. > >>> > >>> Also, it used to be that on the major linux distributions, numpy would not > >>> be build with an optimize lapack because numpy was in the 'base' set of > >>> packages, but not lapack. On the contrary, scipy being in the 'contrib' > >>> set, it could depend on lapack. I just checked, and this has been fixed > >>> in the major distributions (Fedora, Debian, Ubuntu). > >>> > >>> Now we can discuss with such problems should not happen, and put the > >>> blame on the users/administrators, the fact is that they happen often. I > >>> keep seeing environments in which np.linalg is unreasonnably slow. > >> > >> If this is something that's been a problem for you, maybe we should > >> start another thread on things we could do to fix it directly? Improve > >> build instructions, advertise build systems that set up the whole > >> environment (and thus do the right thing), make numpy's setup.py > >> scream and yell if blas isn't available...? 
> >> > >> -n > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Fri Nov 9 18:05:34 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 9 Nov 2012 23:05:34 +0000 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> Message-ID: In this case it's even possible the blas is being smart enough to notice that you've passed it the same pointer twice with the transpose switch on one of them, and is just skipping half the multiplies since the output is provably symmetric. Don't know if they actually do that but... -n On 9 Nov 2012 22:57, "Matthieu Brucher" wrote: > Hi, > > A.A slower than A.A' is not a surprise for me. The latter is far more > cache friendly that the former. Everything follows cache lines, so it is > faster than something that will use one element from each cache line. In > fact it is exactly what "proves" that the new version is correct. > Good job (if all the tests were made and still pass ;) ) > > Cheers, > > Matthieu > > > 2012/11/9 Nicolas SCHEFFER > >> Ok: comparing apples to apples. I'm clueless on my observations and >> would need input from you guys. >> >> Using ATLAS 3.10, numpy with and without my changes, I'm getting these >> timings and comparisons. >> >> # >> #I. Generate matrices using regular dot: >> # >> big = np.array(np.random.randn(2000, 2000), 'f'); >> np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T), >> left=big.T.dot(big), right=big.dot(big.T))" >> >> # >> #II. Timings with regular dot >> # >> In [3]: %timeit np.dot(big, big) >> 10 loops, best of 3: 138 ms per loop >> >> In [4]: %timeit np.dot(big, big.T) >> 10 loops, best of 3: 166 ms per loop >> >> In [5]: %timeit np.dot(big.T, big.T) >> 10 loops, best of 3: 193 ms per loop >> >> In [6]: %timeit np.dot(big.T, big) >> 10 loops, best of 3: 165 ms per loop >> >> # >> #III. I load these arrays and time again with the "fast" dot >> # >> In [21]: %timeit np.dot(big, big) >> 10 loops, best of 3: 138 ms per loop >> >> In [22]: %timeit np.dot(big.T, big) >> 10 loops, best of 3: 104 ms per loop >> >> In [23]: %timeit np.dot(big.T, big.T) >> 10 loops, best of 3: 138 ms per loop >> >> In [24]: %timeit np.dot(big, big.T) >> 10 loops, best of 3: 102 ms per loop >> >> 1. A'.A': great! >> 2. A.A' becomes faster than A.A !?! >> >> # >> #IV. MSE on differences >> # >> In [25]: np.sqrt(((arr['none'] - none)**2).sum()) >> Out[25]: 0.0 >> >> In [26]: np.sqrt(((arr['both'] - both)**2).sum()) >> Out[26]: 0.0 >> >> In [27]: np.sqrt(((arr['left'] - left)**2).sum()) >> Out[27]: 0.015900515 >> >> In [28]: np.sqrt(((arr['right'] - right)**2).sum()) >> Out[28]: 0.015331409 >> >> # >> # CCl >> # >> While the MSE are small, I'm wondering whether: >> - It's a bug: it should be exactly the same >> - It's a feature: BLAS is taking shortcuts when you have A.A'. The >> difference is not significant. Quick: PR that asap! >> >> I don't have enough expertise to answer that... >> >> Thanks much! 
>> >> -nicolas >> On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER >> wrote: >> > I too encourage users to use scipy.linalg for speed and robustness >> > (hence calling this scipy.dot), but it just brings so much confusion! >> > When using the scipy + numpy ecosystem, you'd almost want everything >> > be done with scipy so that you get the best implementation in all >> > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv(). >> > >> > Anyway this is indeed for another thread, the confusion we'd like to >> > fix here is that users shouldn't have to understand the C/F contiguous >> > concepts to get the maximum speed for np.dot() >> > >> > To summarize: >> > - The python snippet I posted is still valid and can speed up your >> > code if you can change all your dot() calls. >> > - The change in dotblas.c is a bit more problematic because it's very >> > core. I'm having issues right now to replicate the timings, I've got >> > better timing for a.dot(a.T) than for a.dot(a). There might be a bug. >> > >> > It's a pain to test because I cannot do the test in a single python >> session. >> > I'm going to try to integrate most of your suggestions, I cannot >> > guarantee I'll have time to do them all though. >> > >> > -nicolas >> > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith wrote: >> >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux >> >> wrote: >> >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: >> >>>> But what if someone compiles numpy against an optimized blas (mkl, >> >>>> say) and then compiles SciPy against the reference blas? What do you >> >>>> do then!? ;-) >> >>> >> >>> This could happen. But the converse happens very often. What happens >> is >> >>> that users (eg on shared computing resource) ask for a scientific >> python >> >>> environment. The administrator than installs the package starting from >> >>> the most basic one, to the most advanced one, thus starting with numpy >> >>> that can very well build without any external blas. When he gets to >> scipy >> >>> he hits the problem that the build system does not detect properly the >> >>> blas, and he solves that problem. >> >>> >> >>> Also, it used to be that on the major linux distributions, numpy >> would not >> >>> be build with an optimize lapack because numpy was in the 'base' set >> of >> >>> packages, but not lapack. On the contrary, scipy being in the >> 'contrib' >> >>> set, it could depend on lapack. I just checked, and this has been >> fixed >> >>> in the major distributions (Fedora, Debian, Ubuntu). >> >>> >> >>> Now we can discuss with such problems should not happen, and put the >> >>> blame on the users/administrators, the fact is that they happen >> often. I >> >>> keep seeing environments in which np.linalg is unreasonnably slow. >> >> >> >> If this is something that's been a problem for you, maybe we should >> >> start another thread on things we could do to fix it directly? Improve >> >> build instructions, advertise build systems that set up the whole >> >> environment (and thus do the right thing), make numpy's setup.py >> >> scream and yell if blas isn't available...? 
>> >> >> >> -n >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > Information System Engineer, Ph.D. > Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > Music band: http://liliejay.com/ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scheffer.nicolas at gmail.com Fri Nov 9 19:02:51 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Fri, 9 Nov 2012 16:02:51 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> Message-ID: Well I have proof we're going things right! And it also shows the power of MKL... I checked with EPD and my python snippet (see also my first email), and all 4 timings are the same... However, if this behavior is the right one, then my python snippet should get the same trend. But it's not. So I though that indeed MKL could be tricking me, so I built scipy and time again with the same config (ATLAS 3.10). I call the 'dot' function I put on pastebin as dot and here are the timings: In [8]: %timeit dot(big, big) 10 loops, best of 3: 133 ms per loop In [9]: %timeit dot(big.T, big.T) 10 loops, best of 3: 133 ms per loop In [10]: %timeit dot(big.T, big) 10 loops, best of 3: 97.9 ms per loop In [11]: %timeit dot(big, big.T) 10 loops, best of 3: 96.4 ms per loop So you guys were right, it's cache friendly. We're doing the right thing, and the differences are most likely not significant (or the snippet is wrong too)! In [14]: np.sqrt(((dot(big, big.T) - np.dot(big, big.T))**2).sum()) Out[14]: 0.015331409 In [15]: np.sqrt(((dot(big, big) - np.dot(big, big))**2).sum()) Out[15]: 0.0 All this to say that we are getting a 2x speed improvement on a.dot(a.T), which is awesome. -nicolas Nath: same trend if arrays are different. On Fri, Nov 9, 2012 at 3:05 PM, Nathaniel Smith wrote: > In this case it's even possible the blas is being smart enough to notice > that you've passed it the same pointer twice with the transpose switch on > one of them, and is just skipping half the multiplies since the output is > provably symmetric. Don't know if they actually do that but... > > -n > > On 9 Nov 2012 22:57, "Matthieu Brucher" wrote: >> >> Hi, >> >> A.A slower than A.A' is not a surprise for me. The latter is far more >> cache friendly that the former. Everything follows cache lines, so it is >> faster than something that will use one element from each cache line. In >> fact it is exactly what "proves" that the new version is correct. >> Good job (if all the tests were made and still pass ;) ) >> >> Cheers, >> >> Matthieu >> >> >> 2012/11/9 Nicolas SCHEFFER >>> >>> Ok: comparing apples to apples. I'm clueless on my observations and >>> would need input from you guys. >>> >>> Using ATLAS 3.10, numpy with and without my changes, I'm getting these >>> timings and comparisons. >>> >>> # >>> #I. 
Generate matrices using regular dot: >>> # >>> big = np.array(np.random.randn(2000, 2000), 'f'); >>> np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T), >>> left=big.T.dot(big), right=big.dot(big.T))" >>> >>> # >>> #II. Timings with regular dot >>> # >>> In [3]: %timeit np.dot(big, big) >>> 10 loops, best of 3: 138 ms per loop >>> >>> In [4]: %timeit np.dot(big, big.T) >>> 10 loops, best of 3: 166 ms per loop >>> >>> In [5]: %timeit np.dot(big.T, big.T) >>> 10 loops, best of 3: 193 ms per loop >>> >>> In [6]: %timeit np.dot(big.T, big) >>> 10 loops, best of 3: 165 ms per loop >>> >>> # >>> #III. I load these arrays and time again with the "fast" dot >>> # >>> In [21]: %timeit np.dot(big, big) >>> 10 loops, best of 3: 138 ms per loop >>> >>> In [22]: %timeit np.dot(big.T, big) >>> 10 loops, best of 3: 104 ms per loop >>> >>> In [23]: %timeit np.dot(big.T, big.T) >>> 10 loops, best of 3: 138 ms per loop >>> >>> In [24]: %timeit np.dot(big, big.T) >>> 10 loops, best of 3: 102 ms per loop >>> >>> 1. A'.A': great! >>> 2. A.A' becomes faster than A.A !?! >>> >>> # >>> #IV. MSE on differences >>> # >>> In [25]: np.sqrt(((arr['none'] - none)**2).sum()) >>> Out[25]: 0.0 >>> >>> In [26]: np.sqrt(((arr['both'] - both)**2).sum()) >>> Out[26]: 0.0 >>> >>> In [27]: np.sqrt(((arr['left'] - left)**2).sum()) >>> Out[27]: 0.015900515 >>> >>> In [28]: np.sqrt(((arr['right'] - right)**2).sum()) >>> Out[28]: 0.015331409 >>> >>> # >>> # CCl >>> # >>> While the MSE are small, I'm wondering whether: >>> - It's a bug: it should be exactly the same >>> - It's a feature: BLAS is taking shortcuts when you have A.A'. The >>> difference is not significant. Quick: PR that asap! >>> >>> I don't have enough expertise to answer that... >>> >>> Thanks much! >>> >>> -nicolas >>> On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER >>> wrote: >>> > I too encourage users to use scipy.linalg for speed and robustness >>> > (hence calling this scipy.dot), but it just brings so much confusion! >>> > When using the scipy + numpy ecosystem, you'd almost want everything >>> > be done with scipy so that you get the best implementation in all >>> > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv(). >>> > >>> > Anyway this is indeed for another thread, the confusion we'd like to >>> > fix here is that users shouldn't have to understand the C/F contiguous >>> > concepts to get the maximum speed for np.dot() >>> > >>> > To summarize: >>> > - The python snippet I posted is still valid and can speed up your >>> > code if you can change all your dot() calls. >>> > - The change in dotblas.c is a bit more problematic because it's very >>> > core. I'm having issues right now to replicate the timings, I've got >>> > better timing for a.dot(a.T) than for a.dot(a). There might be a bug. >>> > >>> > It's a pain to test because I cannot do the test in a single python >>> > session. >>> > I'm going to try to integrate most of your suggestions, I cannot >>> > guarantee I'll have time to do them all though. >>> > >>> > -nicolas >>> > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith wrote: >>> >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux >>> >> wrote: >>> >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: >>> >>>> But what if someone compiles numpy against an optimized blas (mkl, >>> >>>> say) and then compiles SciPy against the reference blas? What do you >>> >>>> do then!? ;-) >>> >>> >>> >>> This could happen. But the converse happens very often. 
What happens >>> >>> is >>> >>> that users (eg on shared computing resource) ask for a scientific >>> >>> python >>> >>> environment. The administrator than installs the package starting >>> >>> from >>> >>> the most basic one, to the most advanced one, thus starting with >>> >>> numpy >>> >>> that can very well build without any external blas. When he gets to >>> >>> scipy >>> >>> he hits the problem that the build system does not detect properly >>> >>> the >>> >>> blas, and he solves that problem. >>> >>> >>> >>> Also, it used to be that on the major linux distributions, numpy >>> >>> would not >>> >>> be build with an optimize lapack because numpy was in the 'base' set >>> >>> of >>> >>> packages, but not lapack. On the contrary, scipy being in the >>> >>> 'contrib' >>> >>> set, it could depend on lapack. I just checked, and this has been >>> >>> fixed >>> >>> in the major distributions (Fedora, Debian, Ubuntu). >>> >>> >>> >>> Now we can discuss with such problems should not happen, and put the >>> >>> blame on the users/administrators, the fact is that they happen >>> >>> often. I >>> >>> keep seeing environments in which np.linalg is unreasonnably slow. >>> >> >>> >> If this is something that's been a problem for you, maybe we should >>> >> start another thread on things we could do to fix it directly? Improve >>> >> build instructions, advertise build systems that set up the whole >>> >> environment (and thus do the right thing), make numpy's setup.py >>> >> scream and yell if blas isn't available...? >>> >> >>> >> -n >>> >> _______________________________________________ >>> >> NumPy-Discussion mailing list >>> >> NumPy-Discussion at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> -- >> Information System Engineer, Ph.D. >> Blog: http://matt.eifelle.com >> LinkedIn: http://www.linkedin.com/in/matthieubrucher >> Music band: http://liliejay.com/ >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From d.s.seljebotn at astro.uio.no Fri Nov 9 19:38:31 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 10 Nov 2012 01:38:31 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> Message-ID: <509DA207.1000302@astro.uio.no> On 11/09/2012 11:57 PM, Matthieu Brucher wrote: > Hi, > > A.A slower than A.A' is not a surprise for me. The latter is far more > cache friendly that the former. Everything follows cache lines, so it is > faster than something that will use one element from each cache line. In > fact it is exactly what "proves" that the new version is correct. > Good job (if all the tests were made and still pass ;) ) Cache lines shouldn't matter much with a decent BLAS? 
http://dl.acm.org/citation.cfm?id=1356053 (Googling for "Anatomy of High-Performance Matrix Multiplication" will give you a preprint outside of paywall, but Google appears to not want to give me the URL of a too long search result so I can't paste it). Dag Sverre > > Cheers, > > Matthieu > > > 2012/11/9 Nicolas SCHEFFER > > > Ok: comparing apples to apples. I'm clueless on my observations and > would need input from you guys. > > Using ATLAS 3.10, numpy with and without my changes, I'm getting these > timings and comparisons. > > # > #I. Generate matrices using regular dot: > # > big = np.array(np.random.randn(2000, 2000), 'f'); > np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T), > left=big.T.dot(big), right=big.dot(big.T))" > > # > #II. Timings with regular dot > # > In [3]: %timeit np.dot(big, big) > 10 loops, best of 3: 138 ms per loop > > In [4]: %timeit np.dot(big, big.T) > 10 loops, best of 3: 166 ms per loop > > In [5]: %timeit np.dot(big.T, big.T) > 10 loops, best of 3: 193 ms per loop > > In [6]: %timeit np.dot(big.T, big) > 10 loops, best of 3: 165 ms per loop > > # > #III. I load these arrays and time again with the "fast" dot > # > In [21]: %timeit np.dot(big, big) > 10 loops, best of 3: 138 ms per loop > > In [22]: %timeit np.dot(big.T, big) > 10 loops, best of 3: 104 ms per loop > > In [23]: %timeit np.dot(big.T, big.T) > 10 loops, best of 3: 138 ms per loop > > In [24]: %timeit np.dot(big, big.T) > 10 loops, best of 3: 102 ms per loop > > 1. A'.A': great! > 2. A.A' becomes faster than A.A !?! > > # > #IV. MSE on differences > # > In [25]: np.sqrt(((arr['none'] - none)**2).sum()) > Out[25]: 0.0 > > In [26]: np.sqrt(((arr['both'] - both)**2).sum()) > Out[26]: 0.0 > > In [27]: np.sqrt(((arr['left'] - left)**2).sum()) > Out[27]: 0.015900515 > > In [28]: np.sqrt(((arr['right'] - right)**2).sum()) > Out[28]: 0.015331409 > > # > # CCl > # > While the MSE are small, I'm wondering whether: > - It's a bug: it should be exactly the same > - It's a feature: BLAS is taking shortcuts when you have A.A'. The > difference is not significant. Quick: PR that asap! > > I don't have enough expertise to answer that... > > Thanks much! > > -nicolas > On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER > > wrote: > > I too encourage users to use scipy.linalg for speed and robustness > > (hence calling this scipy.dot), but it just brings so much confusion! > > When using the scipy + numpy ecosystem, you'd almost want everything > > be done with scipy so that you get the best implementation in all > > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv(). > > > > Anyway this is indeed for another thread, the confusion we'd like to > > fix here is that users shouldn't have to understand the C/F > contiguous > > concepts to get the maximum speed for np.dot() > > > > To summarize: > > - The python snippet I posted is still valid and can speed up your > > code if you can change all your dot() calls. > > - The change in dotblas.c is a bit more problematic because it's very > > core. I'm having issues right now to replicate the timings, I've got > > better timing for a.dot(a.T) than for a.dot(a). There might be a bug. > > > > It's a pain to test because I cannot do the test in a single > python session. > > I'm going to try to integrate most of your suggestions, I cannot > > guarantee I'll have time to do them all though. 
> > > > -nicolas > > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith > wrote: > >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux > >> > wrote: > >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: > >>>> But what if someone compiles numpy against an optimized blas (mkl, > >>>> say) and then compiles SciPy against the reference blas? What > do you > >>>> do then!? ;-) > >>> > >>> This could happen. But the converse happens very often. What > happens is > >>> that users (eg on shared computing resource) ask for a > scientific python > >>> environment. The administrator than installs the package > starting from > >>> the most basic one, to the most advanced one, thus starting > with numpy > >>> that can very well build without any external blas. When he > gets to scipy > >>> he hits the problem that the build system does not detect > properly the > >>> blas, and he solves that problem. > >>> > >>> Also, it used to be that on the major linux distributions, > numpy would not > >>> be build with an optimize lapack because numpy was in the > 'base' set of > >>> packages, but not lapack. On the contrary, scipy being in the > 'contrib' > >>> set, it could depend on lapack. I just checked, and this has > been fixed > >>> in the major distributions (Fedora, Debian, Ubuntu). > >>> > >>> Now we can discuss with such problems should not happen, and > put the > >>> blame on the users/administrators, the fact is that they happen > often. I > >>> keep seeing environments in which np.linalg is unreasonnably slow. > >> > >> If this is something that's been a problem for you, maybe we should > >> start another thread on things we could do to fix it directly? > Improve > >> build instructions, advertise build systems that set up the whole > >> environment (and thus do the right thing), make numpy's setup.py > >> scream and yell if blas isn't available...? > >> > >> -n > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > Information System Engineer, Ph.D. > Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > Music band: http://liliejay.com/ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From gnurser at gmail.com Sat Nov 10 07:39:03 2012 From: gnurser at gmail.com (George Nurser) Date: Sat, 10 Nov 2012 12:39:03 +0000 Subject: [Numpy-discussion] Scipy dot In-Reply-To: <509DA207.1000302@astro.uio.no> References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> <509DA207.1000302@astro.uio.no> Message-ID: Note also that OpenBlas claims performance as good as MKL with Sandy Bridge processors. https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf George Nurser. On 10 November 2012 00:38, Dag Sverre Seljebotn wrote: > On 11/09/2012 11:57 PM, Matthieu Brucher wrote: > > Hi, > > > > A.A slower than A.A' is not a surprise for me. The latter is far more > > cache friendly that the former. 
Everything follows cache lines, so it is > > faster than something that will use one element from each cache line. In > > fact it is exactly what "proves" that the new version is correct. > > Good job (if all the tests were made and still pass ;) ) > > Cache lines shouldn't matter much with a decent BLAS? > > http://dl.acm.org/citation.cfm?id=1356053 > > (Googling for "Anatomy of High-Performance Matrix Multiplication" will > give you a preprint outside of paywall, but Google appears to not want > to give me the URL of a too long search result so I can't paste it). > > Dag Sverre > > > > > Cheers, > > > > Matthieu > > > > > > 2012/11/9 Nicolas SCHEFFER > > > > > > Ok: comparing apples to apples. I'm clueless on my observations and > > would need input from you guys. > > > > Using ATLAS 3.10, numpy with and without my changes, I'm getting > these > > timings and comparisons. > > > > # > > #I. Generate matrices using regular dot: > > # > > big = np.array(np.random.randn(2000, 2000), 'f'); > > np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T), > > left=big.T.dot(big), right=big.dot(big.T))" > > > > # > > #II. Timings with regular dot > > # > > In [3]: %timeit np.dot(big, big) > > 10 loops, best of 3: 138 ms per loop > > > > In [4]: %timeit np.dot(big, big.T) > > 10 loops, best of 3: 166 ms per loop > > > > In [5]: %timeit np.dot(big.T, big.T) > > 10 loops, best of 3: 193 ms per loop > > > > In [6]: %timeit np.dot(big.T, big) > > 10 loops, best of 3: 165 ms per loop > > > > # > > #III. I load these arrays and time again with the "fast" dot > > # > > In [21]: %timeit np.dot(big, big) > > 10 loops, best of 3: 138 ms per loop > > > > In [22]: %timeit np.dot(big.T, big) > > 10 loops, best of 3: 104 ms per loop > > > > In [23]: %timeit np.dot(big.T, big.T) > > 10 loops, best of 3: 138 ms per loop > > > > In [24]: %timeit np.dot(big, big.T) > > 10 loops, best of 3: 102 ms per loop > > > > 1. A'.A': great! > > 2. A.A' becomes faster than A.A !?! > > > > # > > #IV. MSE on differences > > # > > In [25]: np.sqrt(((arr['none'] - none)**2).sum()) > > Out[25]: 0.0 > > > > In [26]: np.sqrt(((arr['both'] - both)**2).sum()) > > Out[26]: 0.0 > > > > In [27]: np.sqrt(((arr['left'] - left)**2).sum()) > > Out[27]: 0.015900515 > > > > In [28]: np.sqrt(((arr['right'] - right)**2).sum()) > > Out[28]: 0.015331409 > > > > # > > # CCl > > # > > While the MSE are small, I'm wondering whether: > > - It's a bug: it should be exactly the same > > - It's a feature: BLAS is taking shortcuts when you have A.A'. The > > difference is not significant. Quick: PR that asap! > > > > I don't have enough expertise to answer that... > > > > Thanks much! > > > > -nicolas > > On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER > > > > wrote: > > > I too encourage users to use scipy.linalg for speed and robustness > > > (hence calling this scipy.dot), but it just brings so much > confusion! > > > When using the scipy + numpy ecosystem, you'd almost want > everything > > > be done with scipy so that you get the best implementation in all > > > cases: scipy.zeros(), scipy.array(), scipy.dot(), > scipy.linalg.inv(). > > > > > > Anyway this is indeed for another thread, the confusion we'd like > to > > > fix here is that users shouldn't have to understand the C/F > > contiguous > > > concepts to get the maximum speed for np.dot() > > > > > > To summarize: > > > - The python snippet I posted is still valid and can speed up your > > > code if you can change all your dot() calls. 
> > > - The change in dotblas.c is a bit more problematic because it's > very > > > core. I'm having issues right now to replicate the timings, I've > got > > > better timing for a.dot(a.T) than for a.dot(a). There might be a > bug. > > > > > > It's a pain to test because I cannot do the test in a single > > python session. > > > I'm going to try to integrate most of your suggestions, I cannot > > > guarantee I'll have time to do them all though. > > > > > > -nicolas > > > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith > > wrote: > > >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux > > >> > > wrote: > > >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote: > > >>>> But what if someone compiles numpy against an optimized blas > (mkl, > > >>>> say) and then compiles SciPy against the reference blas? What > > do you > > >>>> do then!? ;-) > > >>> > > >>> This could happen. But the converse happens very often. What > > happens is > > >>> that users (eg on shared computing resource) ask for a > > scientific python > > >>> environment. The administrator than installs the package > > starting from > > >>> the most basic one, to the most advanced one, thus starting > > with numpy > > >>> that can very well build without any external blas. When he > > gets to scipy > > >>> he hits the problem that the build system does not detect > > properly the > > >>> blas, and he solves that problem. > > >>> > > >>> Also, it used to be that on the major linux distributions, > > numpy would not > > >>> be build with an optimize lapack because numpy was in the > > 'base' set of > > >>> packages, but not lapack. On the contrary, scipy being in the > > 'contrib' > > >>> set, it could depend on lapack. I just checked, and this has > > been fixed > > >>> in the major distributions (Fedora, Debian, Ubuntu). > > >>> > > >>> Now we can discuss with such problems should not happen, and > > put the > > >>> blame on the users/administrators, the fact is that they happen > > often. I > > >>> keep seeing environments in which np.linalg is unreasonnably > slow. > > >> > > >> If this is something that's been a problem for you, maybe we > should > > >> start another thread on things we could do to fix it directly? > > Improve > > >> build instructions, advertise build systems that set up the whole > > >> environment (and thus do the right thing), make numpy's setup.py > > >> scream and yell if blas isn't available...? > > >> > > >> -n > > >> _______________________________________________ > > >> NumPy-Discussion mailing list > > >> NumPy-Discussion at scipy.org > > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > -- > > Information System Engineer, Ph.D. > > Blog: http://matt.eifelle.com > > LinkedIn: http://www.linkedin.com/in/matthieubrucher > > Music band: http://liliejay.com/ > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
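For reference, here is a minimal sketch of the contiguity-aware dispatch being discussed in this thread (an illustration only, assuming 2-D float32/float64 inputs of matching dtype; fast_dot and _blas_arg are hypothetical names, not the code in the proposed patch):

    import numpy as np
    from scipy.linalg import get_blas_funcs

    def _blas_arg(x):
        # BLAS gemm expects Fortran order; a C-contiguous array is simply
        # the transpose of an F-contiguous view, so flip the trans flag
        # instead of making a copy.
        if x.flags.f_contiguous:
            return x, 0
        if x.flags.c_contiguous:
            return x.T, 1
        return np.asfortranarray(x), 0

    def fast_dot(a, b):
        # Fall back to np.dot for anything gemm cannot handle directly.
        if (a.ndim != 2 or b.ndim != 2 or a.dtype != b.dtype
                or a.dtype not in (np.float32, np.float64)):
            return np.dot(a, b)
        gemm = get_blas_funcs('gemm', (a, b))
        fa, trans_a = _blas_arg(a)
        fb, trans_b = _blas_arg(b)
        return gemm(1.0, fa, fb, trans_a=trans_a, trans_b=trans_b)

A quick check of the flags this relies on:

    >>> big = np.ones((2000, 2000), dtype=np.float32)
    >>> big.flags.c_contiguous, big.flags.f_contiguous
    (True, False)
    >>> big.T.flags.c_contiguous, big.T.flags.f_contiguous
    (False, True)

The point, as with the snippet discussed above, is to hand BLAS a view plus a transpose flag rather than letting a copy of the transposed operand be made first; for the a.dot(a.T) case specifically, a BLAS syrk call (which computes A*A' exploiting the symmetry of the result) could be faster still where it is exposed.
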
URL: From ralf.gommers at gmail.com Sat Nov 10 12:47:41 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 10 Nov 2012 18:47:41 +0100 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Tue, Nov 6, 2012 at 6:49 PM, Peter Cock wrote: > Dear all, > > Since the NumPy 1.7.0b2 release didn't include a Windows > (32 bit) installer for Python 3.3, I am considering compiling it > myself for local testing. What compiler is recommended? > Either MSVC or MinGW 3.4.5. For the latter see https://github.com/certik/numpy-vendor Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Sat Nov 10 18:04:31 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 10 Nov 2012 23:04:31 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Sat, Nov 10, 2012 at 5:47 PM, Ralf Gommers wrote: > > > > On Tue, Nov 6, 2012 at 6:49 PM, Peter Cock > wrote: >> >> Dear all, >> >> Since the NumPy 1.7.0b2 release didn't include a Windows >> (32 bit) installer for Python 3.3, I am considering compiling it >> myself for local testing. What compiler is recommended? > > > Either MSVC or MinGW 3.4.5. For the latter see > https://github.com/certik/numpy-vendor Thanks Ralf, I was trying with MSVC 9.0 installed, but got this cryptic error: C:\Downloads\numpy-1.7.0b2 > C:\python33\python setup.py build ... error: Unable to find vcvarsall.bat After sprinkling distutils with debug statements, I found it was looking for MSVC v10 (not numpy's fault but the error is most unhelpful). Presumably "Microsoft Visual C++ 2010 Express Edition" is the appropriate thing to download? http://www.microsoft.com/visualstudio/eng/downloads#d-2010-express Thanks, Peter From p.j.a.cock at googlemail.com Sat Nov 10 18:13:56 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 10 Nov 2012 23:13:56 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Sat, Nov 10, 2012 at 5:47 PM, Ralf Gommers wrote: > On Tue, Nov 6, 2012 at 6:49 PM, Peter Cock wrote: >> >> Dear all, >> >> Since the NumPy 1.7.0b2 release didn't include a Windows >> (32 bit) installer for Python 3.3, I am considering compiling it >> myself for local testing. What compiler is recommended? > > > Either MSVC or MinGW 3.4.5. For the latter see > https://github.com/certik/numpy-vendor > > Ralf I was trying with mingw32 via cygwin with gcc 2.4.4, which also failed with a cryptic error: C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\system_info.py:1406: UserW arning: Lapack (http://www.netlib.org/lapack/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [lapack]) or by setting the LAPACK environment variable. warnings.warn(LapackNotFoundError.__doc__) lapack_src_info: NOT AVAILABLE C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\system_info.py:1409: UserW arning: Lapack (http://www.netlib.org/lapack/) sources not found. Directories to search for the sources can be specified in the numpy/distutils/site.cfg file (section [lapack_src]) or by setting the LAPACK_SRC environment variable. 
warnings.warn(LapackSrcNotFoundError.__doc__) NOT AVAILABLE running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler opti ons running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler opt ions running build_src build_src building py_modules sources building library "npymath" sources Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\core.py", line 186 , in setup return old_setup(**new_attr) File "c:\python33\lib\distutils\core.py", line 148, in setup dist.run_commands() File "c:\python33\lib\distutils\dist.py", line 917, in run_commands self.run_command(cmd) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "c:\python33\lib\distutils\command\build.py", line 126, in run self.run_command(cmd_name) File "c:\python33\lib\distutils\cmd.py", line 313, in run_command self.distribution.run_command(command) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. py", line 152, in run self.build_sources() File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 646, in get_mathlib_info st = config_cmd.try_link('int main(void) { return 0;}') File "c:\python33\lib\distutils\command\config.py", line 243, in try_link self._check_compiler() File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\config.py" , line 45, in _check_compiler old_config._check_compiler(self) File "c:\python33\lib\distutils\command\config.py", line 98, in _check_compile r dry_run=self.dry_run, force=1) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\ccompiler.py", lin e 560, in new_compiler compiler = klass(None, dry_run, force) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.p y", line 94, in __init__ msvcr_success = build_msvcr_library() File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.p y", line 336, in build_msvcr_library if int(msvcr_name.lstrip('msvcr')) < 80: AttributeError: 'NoneType' object has no attribute 'lstrip' I am updating cygwin to see if anything changes... Peter From p.j.a.cock at googlemail.com Sat Nov 10 18:24:00 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 10 Nov 2012 23:24:00 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: I meant to click on "save" not "send", anyway: On Sat, Nov 10, 2012 at 11:13 PM, Peter Cock wrote: >> >> Either MSVC or MinGW 3.4.5. For the latter see >> https://github.com/certik/numpy-vendor >> >> Ralf > > I was trying with mingw32 via cygwin with gcc 2.4.4, Typo, gcc 3.4.4 > which also failed with a cryptic error: > > ... 
> File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.p > y", line 94, in __init__ > msvcr_success = build_msvcr_library() > File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.p > y", line 336, in build_msvcr_library > if int(msvcr_name.lstrip('msvcr')) < 80: > AttributeError: 'NoneType' object has no attribute 'lstrip' I think part of the problem could be in numpy/distutils/misc_util.py where there is no code to detect MSCV 10, def msvc_runtime_library(): "Return name of MSVC runtime library if Python was built with MSVC >= 7" msc_pos = sys.version.find('MSC v.') if msc_pos != -1: msc_ver = sys.version[msc_pos+6:msc_pos+10] lib = {'1300' : 'msvcr70', # MSVC 7.0 '1310' : 'msvcr71', # MSVC 7.1 '1400' : 'msvcr80', # MSVC 8 '1500' : 'msvcr90', # MSVC 9 (VS 2008) }.get(msc_ver, None) else: lib = None return lib https://github.com/numpy/numpy/blob/master/numpy/distutils/misc_util.py#L353 Under Python 3.3, we have: Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.version '3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)]' i.e. It looks to me like that dictionary needs another entry for key '1600'. Peter From p.j.a.cock at googlemail.com Sat Nov 10 18:43:54 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 10 Nov 2012 23:43:54 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Sat, Nov 10, 2012 at 11:24 PM, Peter Cock wrote: > > I think part of the problem could be in numpy/distutils/misc_util.py > where there is no code to detect MSCV 10, > > def msvc_runtime_library(): > "Return name of MSVC runtime library if Python was built with MSVC >= 7" > msc_pos = sys.version.find('MSC v.') > if msc_pos != -1: > msc_ver = sys.version[msc_pos+6:msc_pos+10] > lib = {'1300' : 'msvcr70', # MSVC 7.0 > '1310' : 'msvcr71', # MSVC 7.1 > '1400' : 'msvcr80', # MSVC 8 > '1500' : 'msvcr90', # MSVC 9 (VS 2008) > }.get(msc_ver, None) > else: > lib = None > return lib > > https://github.com/numpy/numpy/blob/master/numpy/distutils/misc_util.py#L353 > > Under Python 3.3, we have: > > Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 > 32 bit (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. >>>> import sys >>>> sys.version > '3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)]' > > i.e. It looks to me like that dictionary needs another entry for key '1600'. > Adding this line seems to help, '1600' : 'msvcr100', # MSVC 10 (aka 2010) Now my compile gets further, but runs into another issue: File "c:\python33\lib\distutils\command\config.py", line 246, in try_link libraries, library_dirs, lang) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\config.py", line 146, in _link generate_manifest(self) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.py", line 562, in generate_manifest check_embedded_msvcr_match_linked(msver) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.py", line 541, in check_embedded_msvcr_match_linked "(%d)" % (int(msver), maj)) ValueError: Discrepancy between linked msvcr (10) and the one about to be embedded (1) My hunch was version something about the new three digit version number is breaking things... 
which appears to be breaking here: def check_embedded_msvcr_match_linked(msver): """msver is the ms runtime version used for the MANIFEST.""" # check msvcr major version are the same for linking and # embedding msvcv = msvc_runtime_library() if msvcv: maj = int(msvcv[5:6]) if not maj == int(msver): raise ValueError( "Discrepancy between linked msvcr " \ "(%d) and the one about to be embedded " \ "(%d)" % (int(msver), maj)) https://github.com/numpy/numpy/blob/master/numpy/distutils/mingw32ccompiler.py#L530 As you can see, to get the major version number from the string it looks at the first digit. When the string was something like "81" or "90" that was fine, but now it is "100". Instead it should look at all the digits up to the final one, i.e. use: maj = int(msvcv[5:-1]) Now (finally), I get an understandable (but hopefully wrong) error message from trying to build NumPy 1.7.0b2 under Python 3.3 on Windows XP, File "numpy\core\setup.py", line 646, in get_mathlib_info st = config_cmd.try_link('int main(void) { return 0;}') File "c:\python33\lib\distutils\command\config.py", line 246, in try_link libraries, library_dirs, lang) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\config.py", line 146, in _link generate_manifest(self) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.py", line 568, in generate_manifest manxml = msvc_manifest_xml(ma, mi) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.py", line 484, in msvc_manifest_xml % (maj, min)) ValueError: Version 10,0 of MSVCRT not supported yet Presumably those two changes I have described are worth committing to the trunk anyway? I can prepare a patch or pull request, but currently I've been working on this Windows box remotely and I'd prefer to wait until next week when I can do it directly on the machine concerned. Files affected: numpy/distutils/misc_util.py function msvc_runtime_library numpy/distutils/mingw32ccompiler.py function check_embedded_msvcr_match_linked Peter From ralf.gommers at gmail.com Sun Nov 11 15:48:32 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 11 Nov 2012 21:48:32 +0100 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: On Tue, Nov 6, 2012 at 8:33 AM, Travis Oliphant wrote: > Hey all, > > Ondrej has been tied up finishing his PhD for the past several weeks. He > is defending his work shortly and should be available to continue to help > with the 1.7.0 release around the first of December. He and I have been > in contact during this process, and I've been helping where I can. > Fortunately, other NumPy developers have been active closing tickets and > reviewing pull requests which has helped the process substantially. > > The release has taken us longer than we expected, but I'm really glad that > we've received the bug-reports and issues that we have seen because it will > help the 1.7.0 release be a more stable series. Also, the merging of the > Trac issues with Git has exposed over-looked problems as well and will > hopefully encourage more Git-focused participation by users. > > We are targeting getting the final release of 1.7.0 out by mid December > (based on Ondrej's availability). But, I would like to find out which > issues are seen as blockers by people on this list. I think most of the > issues that I had as blockers have been resolved. If there are no more > remaining blockers, then we may be able to accelerate the final release of > 1.7.0 to just after Thanksgiving. 
> Still the same ones I brought up last time (datetime + MinGW & segfault on SPARC): http://thread.gmane.org/gmane.comp.python.numeric.general/50907/focus=50944 There are some more issues listed at https://github.com/numpy/numpy/issues?labels=&milestone=3&page=1&state=open. Most of the Debian issues have disappeared, but there's some work left to do there. A Python 3.3 Windows binary would also be nice to have. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Nov 11 16:05:32 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 11 Nov 2012 22:05:32 +0100 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Sun, Nov 11, 2012 at 12:43 AM, Peter Cock wrote: > On Sat, Nov 10, 2012 at 11:24 PM, Peter Cock > wrote: > > > > I think part of the problem could be in numpy/distutils/misc_util.py > > where there is no code to detect MSCV 10, > > > > def msvc_runtime_library(): > > "Return name of MSVC runtime library if Python was built with MSVC > >= 7" > > msc_pos = sys.version.find('MSC v.') > > if msc_pos != -1: > > msc_ver = sys.version[msc_pos+6:msc_pos+10] > > lib = {'1300' : 'msvcr70', # MSVC 7.0 > > '1310' : 'msvcr71', # MSVC 7.1 > > '1400' : 'msvcr80', # MSVC 8 > > '1500' : 'msvcr90', # MSVC 9 (VS 2008) > > }.get(msc_ver, None) > > else: > > lib = None > > return lib > > > > > https://github.com/numpy/numpy/blob/master/numpy/distutils/misc_util.py#L353 > > > > Under Python 3.3, we have: > > > > Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 > > 32 bit (Intel)] on win32 > > Type "help", "copyright", "credits" or "license" for more information. > >>>> import sys > >>>> sys.version > > '3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit > (Intel)]' > > > > i.e. It looks to me like that dictionary needs another entry for key > '1600'. > > > > Adding this line seems to help, > > '1600' : 'msvcr100', # MSVC 10 (aka 2010) > > Now my compile gets further, but runs into another issue: > > File "c:\python33\lib\distutils\command\config.py", line 246, in try_link > libraries, library_dirs, lang) > File > "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\config.py", > line 146, in _link > generate_manifest(self) > File > "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.py", > line 562, in generate_manifest > check_embedded_msvcr_match_linked(msver) > File > "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.py", > line 541, in check_embedded_msvcr_match_linked > "(%d)" % (int(msver), maj)) > ValueError: Discrepancy between linked msvcr (10) and the one about to > be embedded (1) > > My hunch was version something about the new three digit version > number is breaking things... which appears to be breaking here: > > def check_embedded_msvcr_match_linked(msver): > """msver is the ms runtime version used for the MANIFEST.""" > # check msvcr major version are the same for linking and > # embedding > msvcv = msvc_runtime_library() > if msvcv: > maj = int(msvcv[5:6]) > if not maj == int(msver): > raise ValueError( > "Discrepancy between linked msvcr " \ > "(%d) and the one about to be embedded " \ > "(%d)" % (int(msver), maj)) > > > https://github.com/numpy/numpy/blob/master/numpy/distutils/mingw32ccompiler.py#L530 > > As you can see, to get the major version number from the > string it looks at the first digit. 
When the string was something > like "81" or "90" that was fine, but now it is "100". Instead it > should look at all the digits up to the final one, i.e. use: > > maj = int(msvcv[5:-1]) > > Now (finally), I get an understandable (but hopefully wrong) > error message from trying to build NumPy 1.7.0b2 under > Python 3.3 on Windows XP, > > File "numpy\core\setup.py", line 646, in get_mathlib_info > st = config_cmd.try_link('int main(void) { return 0;}') > File "c:\python33\lib\distutils\command\config.py", line 246, in try_link > libraries, library_dirs, lang) > File > "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\config.py", > line 146, in _link > generate_manifest(self) > File > "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.py", > line 568, in generate_manifest > manxml = msvc_manifest_xml(ma, mi) > File > "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\mingw32ccompiler.py", > line 484, in msvc_manifest_xml > % (maj, min)) > ValueError: Version 10,0 of MSVCRT not supported yet > > Presumably those two changes I have described are worth > committing to the trunk anyway? I can prepare a patch or > pull request, but currently I've been working on this Windows > box remotely and I'd prefer to wait until next week when I > can do it directly on the machine concerned. > Those changes look correct, a PR would be great. Fixing the next error also seems straightforward; around line 465 of mingw32ccompiler a check is needed for 10.0 (not sure which one) and then a line like _MSVCRVER_TO_FULLVER['10'] = "10.0.xxxxx.x" needs to be added. Ralf > Files affected: > numpy/distutils/misc_util.py function msvc_runtime_library > numpy/distutils/mingw32ccompiler.py function > check_embedded_msvcr_match_linked > > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Sun Nov 11 18:20:48 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 11 Nov 2012 23:20:48 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Sun, Nov 11, 2012 at 9:05 PM, Ralf Gommers wrote: > > Those changes look correct, a PR would be great. > I'll do that later this week - but feel free to do it yourself immediately if more convenient. > Fixing the next error also seems straightforward; around line 465 of > mingw32ccompiler a check is needed for 10.0 (not sure which one) > and then a line like > _MSVCRVER_TO_FULLVER['10'] = "10.0.xxxxx.x" > needs to be added. I'd got that far last night before calling turning in - my question was where do I get the last two parts of the version number. Presumably I can just use the values from the msvcr100.dll on my machine? Thanks, Peter From p.j.a.cock at googlemail.com Mon Nov 12 07:31:05 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 12 Nov 2012 12:31:05 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Sun, Nov 11, 2012 at 11:20 PM, Peter Cock wrote: > On Sun, Nov 11, 2012 at 9:05 PM, Ralf Gommers wrote: >> >> Those changes look correct, a PR would be great. >> > > I'll do that later this week - but feel free to do it yourself immediately > if more convenient. 
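To make the slicing issue behind those two changes concrete (a quick illustration in an interpreter session, not output from the build):

    >>> "msvcr90"[5:6], "msvcr100"[5:6]
    ('9', '1')
    >>> "msvcr90"[5:-1], "msvcr100"[5:-1]
    ('9', '10')

With the two-digit runtime names the single-character slice happened to work; with "msvcr100" only int(msvcv[5:-1]) recovers the major version 10.
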
> Hi again Ralf, OK, new branch from the master here: https://github.com/peterjc/numpy/tree/msvc10 The first two commits are the changes already discussed. https://github.com/peterjc/numpy/commit/d8ead4c83364f9c7d4543690eb12e4543c608372 https://github.com/peterjc/numpy/commit/1d498f54668a9a87286fa31f2779acbb048edd39 >> Fixing the next error also seems straightforward; around line 465 of >> mingw32ccompiler a check is needed for 10.0 (not sure which one) >> and then a line like >> _MSVCRVER_TO_FULLVER['10'] = "10.0.xxxxx.x" >> needs to be added. > > I'd got that far last night before calling turning in - my question was > where do I get the last two parts of the version number. Presumably > I can just use the values from the msvcr100.dll on my machine? The third commit provides a fall back value (based on the DLL on my machine) as done for MSVC v8 and v9, and updates the live lookup via msvcrt.CRT_ASSEMBLY_VERSION to actually check what version that is (the current code wrongly assumes it is going to be for MSCV v9): https://github.com/peterjc/numpy/commit/24523565b5dbb23d6de0591ef2a4c1d014722c5d Do you want a pull request for that work so far? Or can you suggest what I should investigate next since there is at least one more hurdle to clear before I can build it (build output below). Regards, Peter ------ I now get further trying to build with mingw32 (from this branch from master, not the numpy-1.7.0b2 files this time), an excerpt from which this line may be note-worthy: Cannot build msvcr library: "msvcr100d.dll" not found This omits the failure to detect ALTAS etc which seems irrelevant. I would guess the MismatchCAPIWarning is an unrelated issue from working from the master branch this time. C:\repositories\numpy>c:\python33\python setup.py build --compiler=mingw32 Converting to Python3 via 2to3... numpy\core\setup_common.py:86: MismatchCAPIWarning: API mismatch detected, the C API version numbers have to be updated. Current C api version is 8, with checksum f4362353e2d72f889fda0128aa015037, but recorded checksum for C API version 8 in codegen_dir/cversions.txt is 17321775fc884de0b1eda478cd61c74b. If functions were added in the C API, you have to update C_API_VERSION in numpy\core\setup_common.py. 
MismatchCAPIWarning) running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src build_src building py_modules sources building library "npymath" sources Cannot build msvcr library: "msvcr100d.dll" not found Unable to find productdir in registry Checking environ VS100COMNTOOLS customize GnuFCompiler Could not locate executable g77 Could not locate executable f77 customize IntelVisualFCompiler Could not locate executable ifort Could not locate executable ifl customize AbsoftFCompiler Could not locate executable f90 customize CompaqVisualFCompiler Found executable C:\cygwin\bin\DF.exe Found executable C:\cygwin\bin\DF.exe customize IntelItaniumVisualFCompiler Could not locate executable efl customize Gnu95FCompiler Could not locate executable gfortran Could not locate executable f95 customize G95FCompiler Could not locate executable g95 customize IntelEM64VisualFCompiler customize IntelEM64TFCompiler Could not locate executable efort Could not locate executable efc don't know how to compile Fortran code on platform 'nt' C compiler: gcc -mno-cygwin -O2 -Wall -Wstrict-prototypes compile options: '-DNPY_MINGW_USE_CUSTOM_MSVCR -D__MSVCRT_VERSION__=0x1000 -Inumpy\core\src\private -Inumpy\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\src\npysort -Inumpy\core\include -Ic:\python33\include -Ic:\python33\include -c' gcc -mno-cygwin -O2 -Wall -Wstrict-prototypes -DNPY_MINGW_USE_CUSTOM_MSVCR -D__MSVCRT_VERSION__=0x1000 -Inumpy\core\src\private -Inumpy\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\src\npysort -Inumpy\core\include -Ic:\python33\include -Ic:\python33\include -c _configtest.c -o _configtest.o Found executable C:\cygwin\usr\bin\gcc.exe g++ -mno-cygwin _configtest.o -lmsvcr100 -o _configtest.exe Could not locate executable g++ Executable g++ does not exist failure. 
removing: _configtest.exe.manifest _configtest.c _configtest.o Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\repositories\numpy\build\py3k\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "c:\python33\lib\distutils\core.py", line 148, in setup dist.run_commands() File "c:\python33\lib\distutils\dist.py", line 917, in run_commands self.run_command(cmd) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\repositories\numpy\build\py3k\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "c:\python33\lib\distutils\command\build.py", line 126, in run self.run_command(cmd_name) File "c:\python33\lib\distutils\cmd.py", line 313, in run_command self.distribution.run_command(command) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\repositories\numpy\build\py3k\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\repositories\numpy\build\py3k\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\repositories\numpy\build\py3k\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\repositories\numpy\build\py3k\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 648, in get_mathlib_info raise RuntimeError("Broken toolchain: cannot link a simple C program") RuntimeError: Broken toolchain: cannot link a simple C program Peter From alan.isaac at gmail.com Mon Nov 12 08:48:04 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 12 Nov 2012 08:48:04 -0500 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: References: <509D108A.1060405@gmail.com> Message-ID: <50A0FE14.6050605@gmail.com> On 11/9/2012 12:21 PM, Nathaniel Smith wrote: > you might want to double-check that the > np.random.choice in 1.7 actually*does* give an error if the input > array is not 1-d Any idea where I can look at the code? I browsed github after failing to find a productive search string, but failed to find it. Which remind me: it would be nice if the docs linked to the source. Thanks, Alan From sebastian at sipsolutions.net Mon Nov 12 08:59:42 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 12 Nov 2012 14:59:42 +0100 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: <50A0FE14.6050605@gmail.com> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> Message-ID: <1352728782.3411.2.camel@sebastian-laptop> Hey, On Mon, 2012-11-12 at 08:48 -0500, Alan G Isaac wrote: > On 11/9/2012 12:21 PM, Nathaniel Smith wrote: > > you might want to double-check that the > > np.random.choice in 1.7 actually*does* give an error if the input > > array is not 1-d > > > Any idea where I can look at the code? > I browsed github after failing to find > a productive search string, but failed > to find it. > its here: https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L919 Sounds like it should be pretty simple to add axis=None which would change the current behavior very little, it would stop give an error anymore for none 1-d arrays though. Regards, Sebastian > Which remind me: it would be nice if the > docs linked to the source. 
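A minimal sketch of what the proposed axis/size semantics could look like, written as a pure-Python wrapper for illustration (choice_nd is a hypothetical name; this samples with replacement via np.random.randint rather than touching the mtrand implementation):

    import numpy as np

    def choice_nd(a, size=None, axis=None):
        # size=None returns a single draw: a scalar element when axis is
        # None, or one subarray (the given axis removed) otherwise.
        a = np.asarray(a)
        n = a.size if axis is None else a.shape[axis]
        idx = np.random.randint(n, size=size)
        if axis is None:
            # axis=None draws elements from the flattened array.
            return a.flat[idx]
        return np.take(a, idx, axis=axis)

For example, choice_nd([1, 2]) would give a scalar (1 or 2), matching the stdlib random.choice behaviour mentioned above, while choice_nd(np.eye(3), axis=0) would return one row.
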
> > Thanks, > Alan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Mon Nov 12 09:03:56 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 12 Nov 2012 15:03:56 +0100 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: <50A0FE14.6050605@gmail.com> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> Message-ID: On Mon, Nov 12, 2012 at 2:48 PM, Alan G Isaac wrote: > On 11/9/2012 12:21 PM, Nathaniel Smith wrote: >> you might want to double-check that the >> np.random.choice in 1.7 actually*does* give an error if the input >> array is not 1-d > > Any idea where I can look at the code? > I browsed github after failing to find > a productive search string, but failed > to find it. Looks like it's in numpy/random/mtrand/mtrand.pyx (cython code) I was actually thinking you would check by just trying it, since that's an easier and more reliable way to determine what code actually does than reading it :-). (Or even better, writing a test?) > Which remind me: it would be nice if the > docs linked to the source. True, though it's a difficult problem for code like this that goes Cython file -> C -> .so file -> Python. I'm not sure Cython actually preserves any metadata that would let us look at the np.random.choice object in the interpreter and map that back to a line of a source file. -n From alan.isaac at gmail.com Mon Nov 12 09:50:26 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 12 Nov 2012 09:50:26 -0500 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: <1352728782.3411.2.camel@sebastian-laptop> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> Message-ID: <50A10CB2.80209@gmail.com> On 11/12/2012 8:59 AM, Sebastian Berg wrote: > https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L919 > > Sounds like it should be pretty simple to add axis=None which would > change the current behavior very little, it would stop give an error > anymore for none 1-d arrays though. I believe Nathaniel suggested *retaining* this error until two things were supported: flattening ndarrays, and choice of subarrays for ndarrays. If I understand you, you are suggesting the first is easily supported. Or are you also suggesting the 2nd is easily supported? Alan Isaac PS I'll repost the code (or similar) that Robert Kern posted when this function was discussed in 2006. However it did not support multiple samples. def choice(x, axis=None): x = np.asarray(x) if axis is None: length = np.multiply.reduce(x.shape) n = random.randint(length) return x.flat[n] else: n = random.randint(x.shape[axis]) idx = map(slice, x.shape) idx[axis] = n return x[tuple(idx)] From njs at pobox.com Mon Nov 12 10:00:10 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 12 Nov 2012 16:00:10 +0100 Subject: [Numpy-discussion] fix random.choice for 1.7? 
In-Reply-To: <50A10CB2.80209@gmail.com> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A10CB2.80209@gmail.com> Message-ID: On Mon, Nov 12, 2012 at 3:50 PM, Alan G Isaac wrote: > On 11/12/2012 8:59 AM, Sebastian Berg wrote: >> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L919 >> >> Sounds like it should be pretty simple to add axis=None which would >> change the current behavior very little, it would stop give an error >> anymore for none 1-d arrays though. > > I believe Nathaniel suggested *retaining* this error > until two things were supported: flattening ndarrays, > and choice of subarrays for ndarrays. Just to be clear, I don't really have an opinion on whether those things should be supported, or what the right API should be; I haven't really thought about it. Maybe others on the list have opinions. I was just saying that we have plenty of time to decide about these things, it's not an emergency that should hold up the 1.7 release (or risk locking in a bad API). -n From alan.isaac at gmail.com Mon Nov 12 10:09:25 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 12 Nov 2012 10:09:25 -0500 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A10CB2.80209@gmail.com> Message-ID: <50A11125.1020106@gmail.com> On 11/12/2012 10:00 AM, Nathaniel Smith wrote: > I don't really have an opinion on whether those > things should be supported, or what the right API should be; I haven't > really thought about it. Maybe others on the list have opinions. I was > just saying that we have plenty of time to decide about these things OK. For now I've opened an issue for this: https://github.com/numpy/numpy/issues/2724 I assume that's the right place to accumulate comments on the proposal. Alan From alan.isaac at gmail.com Mon Nov 12 11:31:03 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 12 Nov 2012 11:31:03 -0500 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: <1352728782.3411.2.camel@sebastian-laptop> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> Message-ID: <50A12447.6050604@gmail.com> In a comment on the issue https://github.com/numpy/numpy/issues/2724 Sebastian notes: "it could also be reasonable to have size=None as default and have it return a scalar/the given axes removed in that case. That would be a real change in functionality unfortunately, but it would make sense for similarity to import random; random.choice mostly." If this is under serious consider, then perhaps random.choice should not be in 1.7 unless a decision can be quickly made about the default for `size`. (Allowing an axis argument can however be postponed.) I am inclined to think that Sebastian's suggestion is correct. Alan Isaac From njs at pobox.com Mon Nov 12 11:52:24 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 12 Nov 2012 17:52:24 +0100 Subject: [Numpy-discussion] fix random.choice for 1.7? 
In-Reply-To: <50A12447.6050604@gmail.com> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A12447.6050604@gmail.com> Message-ID: On Mon, Nov 12, 2012 at 5:31 PM, Alan G Isaac wrote: > In a comment on the issue https://github.com/numpy/numpy/issues/2724 Sebastian notes: > "it could also be reasonable to have size=None as default and have it return a scalar/the given axes removed in that case. That would be a real change > in functionality unfortunately, but it would make sense for similarity to import random; random.choice mostly." > > If this is under serious consider, then perhaps > random.choice should not be in 1.7 unless a decision can > be quickly made about the default for `size`. > (Allowing an axis argument can however be postponed.) > > I am inclined to think that Sebastian's suggestion > is correct. For anyone else trying to follow, here's the current function: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html I'm afraid I don't understand what Sebastian is suggesting should happen by default. size=None doesn't have any intuitive meaning to me, and I don't know what "a scalar/the given axes removed" means. -n From sebastian at sipsolutions.net Mon Nov 12 12:16:11 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 12 Nov 2012 18:16:11 +0100 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A12447.6050604@gmail.com> Message-ID: <1352740571.3411.19.camel@sebastian-laptop> On Mon, 2012-11-12 at 17:52 +0100, Nathaniel Smith wrote: > On Mon, Nov 12, 2012 at 5:31 PM, Alan G Isaac wrote: > > In a comment on the issue https://github.com/numpy/numpy/issues/2724 Sebastian notes: > > "it could also be reasonable to have size=None as default and have it return a scalar/the given axes removed in that case. That would be a real change > > in functionality unfortunately, but it would make sense for similarity to import random; random.choice mostly." > > > > If this is under serious consider, then perhaps > > random.choice should not be in 1.7 unless a decision can > > be quickly made about the default for `size`. > > (Allowing an axis argument can however be postponed.) > > > > I am inclined to think that Sebastian's suggestion > > is correct. > > For anyone else trying to follow, here's the current function: > http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html > > I'm afraid I don't understand what Sebastian is suggesting should > happen by default. size=None doesn't have any intuitive meaning to me, > and I don't know what "a scalar/the given axes removed" means. > None is a little awkward I agree (but I don't think there is something better), but basically what I meant is this: >>> random.choice([1, 1]) 1 >>> np.random.choice([1, 2]) array([1]) # its 1-D not 0-D. So instead of taking a sequence of length 1, take an element as default. > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Mon Nov 12 14:54:06 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 12 Nov 2012 11:54:06 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? 
Message-ID: Hi, I wanted to check that everyone knows about and is happy with the scalar casting changes from 1.6.0. Specifically, the rules for (array, scalar) casting have changed such that the resulting dtype depends on the _value_ of the scalar. Mark W has documented these changes here: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html Specifically, as of 1.6.0: In [19]: arr = np.array([1.], dtype=np.float32) In [20]: (arr + (2**16-1)).dtype Out[20]: dtype('float32') In [21]: (arr + (2**16)).dtype Out[21]: dtype('float64') In [25]: arr = np.array([1.], dtype=np.int8) In [26]: (arr + 127).dtype Out[26]: dtype('int8') In [27]: (arr + 128).dtype Out[27]: dtype('int16') There's discussion about the changes here: http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html It seems to me that this change is hard to explain, and does what you want only some of the time, making it a false friend. Is it the right behavior for numpy 2.0? Cheers, Matthew From scheffer.nicolas at gmail.com Mon Nov 12 15:08:39 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Mon, 12 Nov 2012 12:08:39 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> <509DA207.1000302@astro.uio.no> Message-ID: I've pushed my code to a branch here https://github.com/leschef/numpy/tree/faster_dot with the commit https://github.com/leschef/numpy/commit/ea037770e03f23aca1a06274a1a8e8bf0e0e2ee4 Let me know if that's enough to create a pull request. Thanks, -nicolas On Sat, Nov 10, 2012 at 4:39 AM, George Nurser wrote: > Note also that OpenBlas claims performance as good as MKL with Sandy Bridge > processors. > > https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf > > George Nurser. > > > On 10 November 2012 00:38, Dag Sverre Seljebotn > wrote: >> >> On 11/09/2012 11:57 PM, Matthieu Brucher wrote: >> > Hi, >> > >> > A.A slower than A.A' is not a surprise for me. The latter is far more >> > cache friendly that the former. Everything follows cache lines, so it is >> > faster than something that will use one element from each cache line. In >> > fact it is exactly what "proves" that the new version is correct. >> > Good job (if all the tests were made and still pass ;) ) >> >> Cache lines shouldn't matter much with a decent BLAS? >> >> http://dl.acm.org/citation.cfm?id=1356053 >> >> (Googling for "Anatomy of High-Performance Matrix Multiplication" will >> give you a preprint outside of paywall, but Google appears to not want >> to give me the URL of a too long search result so I can't paste it). >> >> Dag Sverre >> >> > >> > Cheers, >> > >> > Matthieu >> > >> > >> > 2012/11/9 Nicolas SCHEFFER > > > >> > >> > Ok: comparing apples to apples. I'm clueless on my observations and >> > would need input from you guys. >> > >> > Using ATLAS 3.10, numpy with and without my changes, I'm getting >> > these >> > timings and comparisons. >> > >> > # >> > #I. 
Generate matrices using regular dot: >> > # >> > big = np.array(np.random.randn(2000, 2000), 'f'); >> > np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T), >> > left=big.T.dot(big), right=big.dot(big.T))" >> > >> > # >> > #II. Timings with regular dot >> > # >> > In [3]: %timeit np.dot(big, big) >> > 10 loops, best of 3: 138 ms per loop >> > >> > In [4]: %timeit np.dot(big, big.T) >> > 10 loops, best of 3: 166 ms per loop >> > >> > In [5]: %timeit np.dot(big.T, big.T) >> > 10 loops, best of 3: 193 ms per loop >> > >> > In [6]: %timeit np.dot(big.T, big) >> > 10 loops, best of 3: 165 ms per loop >> > >> > # >> > #III. I load these arrays and time again with the "fast" dot >> > # >> > In [21]: %timeit np.dot(big, big) >> > 10 loops, best of 3: 138 ms per loop >> > >> > In [22]: %timeit np.dot(big.T, big) >> > 10 loops, best of 3: 104 ms per loop >> > >> > In [23]: %timeit np.dot(big.T, big.T) >> > 10 loops, best of 3: 138 ms per loop >> > >> > In [24]: %timeit np.dot(big, big.T) >> > 10 loops, best of 3: 102 ms per loop >> > >> > 1. A'.A': great! >> > 2. A.A' becomes faster than A.A !?! >> > >> > # >> > #IV. MSE on differences >> > # >> > In [25]: np.sqrt(((arr['none'] - none)**2).sum()) >> > Out[25]: 0.0 >> > >> > In [26]: np.sqrt(((arr['both'] - both)**2).sum()) >> > Out[26]: 0.0 >> > >> > In [27]: np.sqrt(((arr['left'] - left)**2).sum()) >> > Out[27]: 0.015900515 >> > >> > In [28]: np.sqrt(((arr['right'] - right)**2).sum()) >> > Out[28]: 0.015331409 >> > >> > # >> > # CCl >> > # >> > While the MSE are small, I'm wondering whether: >> > - It's a bug: it should be exactly the same >> > - It's a feature: BLAS is taking shortcuts when you have A.A'. The >> > difference is not significant. Quick: PR that asap! >> > >> > I don't have enough expertise to answer that... >> > >> > Thanks much! >> > >> > -nicolas >> > On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER >> > > >> > wrote: >> > > I too encourage users to use scipy.linalg for speed and >> > robustness >> > > (hence calling this scipy.dot), but it just brings so much >> > confusion! >> > > When using the scipy + numpy ecosystem, you'd almost want >> > everything >> > > be done with scipy so that you get the best implementation in all >> > > cases: scipy.zeros(), scipy.array(), scipy.dot(), >> > scipy.linalg.inv(). >> > > >> > > Anyway this is indeed for another thread, the confusion we'd like >> > to >> > > fix here is that users shouldn't have to understand the C/F >> > contiguous >> > > concepts to get the maximum speed for np.dot() >> > > >> > > To summarize: >> > > - The python snippet I posted is still valid and can speed up >> > your >> > > code if you can change all your dot() calls. >> > > - The change in dotblas.c is a bit more problematic because it's >> > very >> > > core. I'm having issues right now to replicate the timings, I've >> > got >> > > better timing for a.dot(a.T) than for a.dot(a). There might be a >> > bug. >> > > >> > > It's a pain to test because I cannot do the test in a single >> > python session. >> > > I'm going to try to integrate most of your suggestions, I cannot >> > > guarantee I'll have time to do them all though. 
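For what it's worth, a small nonzero difference on the transposed cases is what you would expect if the modified dot ends up calling BLAS with different transpose flags: float32 addition is not associative, so a different summation order changes the roundoff slightly without anything being wrong. A quick standalone way to look at the worst per-element difference (just a sanity check under that assumption, not part of the patch):

import numpy as np

big = np.array(np.random.randn(2000, 2000), 'f')

ref = np.dot(big, np.ascontiguousarray(big.T))  # force a real copy of the transpose: plain gemm path
new = np.dot(big, big.T)                        # whatever shortcut the modified dot takes

# A tiny but nonzero value here is consistent with a different summation
# order in float32; identical code paths would print exactly 0.0.
print(np.abs(ref - new).max())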
>> > > >> > > -nicolas >> > > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith > > > wrote: >> > >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux >> > >> > > > wrote: >> > >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith >> > wrote: >> > >>>> But what if someone compiles numpy against an optimized blas >> > (mkl, >> > >>>> say) and then compiles SciPy against the reference blas? What >> > do you >> > >>>> do then!? ;-) >> > >>> >> > >>> This could happen. But the converse happens very often. What >> > happens is >> > >>> that users (eg on shared computing resource) ask for a >> > scientific python >> > >>> environment. The administrator than installs the package >> > starting from >> > >>> the most basic one, to the most advanced one, thus starting >> > with numpy >> > >>> that can very well build without any external blas. When he >> > gets to scipy >> > >>> he hits the problem that the build system does not detect >> > properly the >> > >>> blas, and he solves that problem. >> > >>> >> > >>> Also, it used to be that on the major linux distributions, >> > numpy would not >> > >>> be build with an optimize lapack because numpy was in the >> > 'base' set of >> > >>> packages, but not lapack. On the contrary, scipy being in the >> > 'contrib' >> > >>> set, it could depend on lapack. I just checked, and this has >> > been fixed >> > >>> in the major distributions (Fedora, Debian, Ubuntu). >> > >>> >> > >>> Now we can discuss with such problems should not happen, and >> > put the >> > >>> blame on the users/administrators, the fact is that they happen >> > often. I >> > >>> keep seeing environments in which np.linalg is unreasonnably >> > slow. >> > >> >> > >> If this is something that's been a problem for you, maybe we >> > should >> > >> start another thread on things we could do to fix it directly? >> > Improve >> > >> build instructions, advertise build systems that set up the >> > whole >> > >> environment (and thus do the right thing), make numpy's setup.py >> > >> scream and yell if blas isn't available...? >> > >> >> > >> -n >> > >> _______________________________________________ >> > >> NumPy-Discussion mailing list >> > >> NumPy-Discussion at scipy.org >> > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > >> > -- >> > Information System Engineer, Ph.D. 
>> > Blog: http://matt.eifelle.com >> > LinkedIn: http://www.linkedin.com/in/matthieubrucher >> > Music band: http://liliejay.com/ >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Mon Nov 12 15:59:31 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 12 Nov 2012 21:59:31 +0100 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> <509DA207.1000302@astro.uio.no> Message-ID: On Mon, Nov 12, 2012 at 9:08 PM, Nicolas SCHEFFER wrote: > I've pushed my code to a branch here > https://github.com/leschef/numpy/tree/faster_dot > with the commit > https://github.com/leschef/numpy/commit/ea037770e03f23aca1a06274a1a8e8bf0e0e2ee4 > > Let me know if that's enough to create a pull request. "Pull request" basically means "this code is good enough to look at" -- the name is a bit misleading. It just creates a discussion thread where it's easy to look at the change, comment on pieces, etc., and you can continue to update the code after you start the pull request. -n From njs at pobox.com Mon Nov 12 16:11:52 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 12 Nov 2012 22:11:52 +0100 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett wrote: > Hi, > > I wanted to check that everyone knows about and is happy with the > scalar casting changes from 1.6.0. > > Specifically, the rules for (array, scalar) casting have changed such > that the resulting dtype depends on the _value_ of the scalar. > > Mark W has documented these changes here: > > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules > http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html > http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html > > Specifically, as of 1.6.0: > > In [19]: arr = np.array([1.], dtype=np.float32) > > In [20]: (arr + (2**16-1)).dtype > Out[20]: dtype('float32') > > In [21]: (arr + (2**16)).dtype > Out[21]: dtype('float64') > > In [25]: arr = np.array([1.], dtype=np.int8) > > In [26]: (arr + 127).dtype > Out[26]: dtype('int8') > > In [27]: (arr + 128).dtype > Out[27]: dtype('int16') > > There's discussion about the changes here: > > http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html > http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html > > It seems to me that this change is hard to explain, and does what you > want only some of the time, making it a false friend. The old behaviour was that in these cases, the scalar was always cast to the type of the array, right? So np.array([1], dtype=np.int8) + 256 returned 1? Is that the behaviour you prefer? I agree that the 1.6 behaviour is surprising and somewhat inconsistent. 
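To make that concrete: the value-based rule only looks at whether the scalar fits in the array's dtype, not at whether the result does, so the silent wraparound is moved around rather than removed (a small illustration, assuming the 1.6 rules):

>>> import numpy as np
>>> a = np.ones(2, dtype=np.int8)
>>> (a + 127).dtype            # 127 fits in int8, so the dtype is kept ...
dtype('int8')
>>> a + 127                    # ... and the addition itself wraps around silently
array([-128, -128], dtype=int8)
>>> (a + 128).dtype            # one larger and the whole operation is upcast instead
dtype('int16')
>>> np.can_cast(128, np.int8)  # the same value-based check, exposed directly
False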
There are many places where you can get an overflow in numpy, and in all the other cases we just let the overflow happen. And in fact you can still get an overflow with arr + scalar operations, so this doesn't really fix anything. I find the specific handling of unsigned -> signed and float32 -> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly representable as a float32, but it doesn't *overflow*, it just gives you 2.0**16... if I'm using float32 then I presumably don't care that much about exact representability, so it's surprising that numpy is working to enforce it, and definitely a separate decision from what to do about overflow.) None of those threads seem to really get into the question of what the best behaviour here *is*, though. Possibly the most defensible choice is to treat ufunc(arr, scalar) operations as performing an implicit cast of the scalar to arr's dtype, and using the standard implicit casting rules -- which I think means, raising an error if !can_cast(scalar, arr.dtype, casting="safe") -n From ralf.gommers at gmail.com Mon Nov 12 16:20:26 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 12 Nov 2012 22:20:26 +0100 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Mon, Nov 12, 2012 at 1:31 PM, Peter Cock wrote: > On Sun, Nov 11, 2012 at 11:20 PM, Peter Cock > wrote: > > On Sun, Nov 11, 2012 at 9:05 PM, Ralf Gommers > wrote: > >> > >> Those changes look correct, a PR would be great. > >> > > > > I'll do that later this week - but feel free to do it yourself > immediately > > if more convenient. > > > > Hi again Ralf, > > OK, new branch from the master here: > https://github.com/peterjc/numpy/tree/msvc10 > > The first two commits are the changes already discussed. > > https://github.com/peterjc/numpy/commit/d8ead4c83364f9c7d4543690eb12e4543c608372 > > https://github.com/peterjc/numpy/commit/1d498f54668a9a87286fa31f2779acbb048edd39 > > >> Fixing the next error also seems straightforward; around line 465 of > >> mingw32ccompiler a check is needed for 10.0 (not sure which one) > >> and then a line like > >> _MSVCRVER_TO_FULLVER['10'] = "10.0.xxxxx.x" > >> needs to be added. > > > > I'd got that far last night before calling turning in - my question was > > where do I get the last two parts of the version number. Presumably > > I can just use the values from the msvcr100.dll on my machine? > > The third commit provides a fall back value (based on the DLL on > my machine) as done for MSVC v8 and v9, and updates the live > lookup via msvcrt.CRT_ASSEMBLY_VERSION to actually check > what version that is (the current code wrongly assumes it is going > to be for MSCV v9): > > https://github.com/peterjc/numpy/commit/24523565b5dbb23d6de0591ef2a4c1d014722c5d > > Do you want a pull request for that work so far? Or can you > suggest what I should investigate next since there is at least > one more hurdle to clear before I can build it (build output below). > It would be easier to only have to test this once, so let's try to get it all resolved. > > > I now get further trying to build with mingw32 (from this branch > from master, not the numpy-1.7.0b2 files this time), an excerpt > from which this line may be note-worthy: > > Cannot build msvcr library: "msvcr100d.dll" not found > > This omits the failure to detect ALTAS etc which seems irrelevant. > I would guess the MismatchCAPIWarning is an unrelated issue > from working from the master branch this time. > That's indeed unrelated. 
You get that warning each time something is added to the C API without an update to the API version number. That version number update is usually only done right before a release. > > C:\repositories\numpy>c:\python33\python setup.py build --compiler=mingw32 > Converting to Python3 via 2to3... > > numpy\core\setup_common.py:86: MismatchCAPIWarning: API mismatch > detected, the C API version numbers have to be updated. Current C api > version is 8, with checksum f4362353e2d72f889fda0128aa015037, but > recorded checksum for C API version 8 in codegen_dir/cversions.txt is > 17321775fc884de0b1eda478cd61c74b. If functions were added in the C > API, you have to update C_API_VERSION in numpy\core\setup_common.py. > MismatchCAPIWarning) > > running build > running config_cc > unifing config_cc, config, build_clib, build_ext, build commands > --compiler options > running config_fc > unifing config_fc, config, build_clib, build_ext, build commands > --fcompiler options > running build_src > build_src > building py_modules sources > building library "npymath" sources > Cannot build msvcr library: "msvcr100d.dll" not found > Unable to find productdir in registry > Checking environ VS100COMNTOOLS > customize GnuFCompiler > Could not locate executable g77 > Could not locate executable f77 > customize IntelVisualFCompiler > Could not locate executable ifort > Could not locate executable ifl > customize AbsoftFCompiler > Could not locate executable f90 > customize CompaqVisualFCompiler > Found executable C:\cygwin\bin\DF.exe > Found executable C:\cygwin\bin\DF.exe > customize IntelItaniumVisualFCompiler > Could not locate executable efl > customize Gnu95FCompiler > Could not locate executable gfortran > Could not locate executable f95 > customize G95FCompiler > Could not locate executable g95 > customize IntelEM64VisualFCompiler > customize IntelEM64TFCompiler > Could not locate executable efort > Could not locate executable efc > don't know how to compile Fortran code on platform 'nt' > C compiler: gcc -mno-cygwin -O2 -Wall -Wstrict-prototypes > > compile options: '-DNPY_MINGW_USE_CUSTOM_MSVCR > -D__MSVCRT_VERSION__=0x1000 -Inumpy\core\src\private -Inumpy\core\src > -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray > -Inumpy\core\src\umath -Inumpy\core\src\npysort -Inumpy\core\include > -Ic:\python33\include -Ic:\python33\include -c' > gcc -mno-cygwin -O2 -Wall -Wstrict-prototypes > -DNPY_MINGW_USE_CUSTOM_MSVCR -D__MSVCRT_VERSION__=0x1000 > -Inumpy\core\src\private -Inumpy\core\src -Inumpy\core > -Inumpy\core\src\npymath -Inumpy\core\src\multiarray > -Inumpy\core\src\umath -Inumpy\core\src\npysort -Inumpy\core\include > -Ic:\python33\include -Ic:\python33\include -c _configtest.c -o > _configtest.o > Found executable C:\cygwin\usr\bin\gcc.exe > g++ -mno-cygwin _configtest.o -lmsvcr100 -o _configtest.exe > Could not locate executable g++ > Executable g++ does not exist > A C++ compiler shouldn't be needed for numpy, so it shouldn't try to find one. On Linux it doesn't try this. It looks like on Windows (with MinGW at least) it does. I looked at a build log from a compile with MinGW and g++ available ( http://projects.scipy.org/numpy/attachment/ticket/1909/dc6b601-builld.log), and g++ is used exactly once: for the linking stage of numpy.core._dummy.pyd. Looking at core/setup.py that is related to __STDC_FORMAT_MACROS. That led me to https://github.com/numpy/numpy/commit/64423fdb18e8. So the easy way forward is to install g++. 
The hard way is figuring out why that define was necessary and find a way around it. Ralf > failure. > removing: _configtest.exe.manifest _configtest.c _configtest.o > Traceback (most recent call last): > File "setup.py", line 214, in > setup_package() > File "setup.py", line 207, in setup_package > configuration=configuration ) > File "C:\repositories\numpy\build\py3k\numpy\distutils\core.py", line > 186, in > setup > return old_setup(**new_attr) > File "c:\python33\lib\distutils\core.py", line 148, in setup > dist.run_commands() > File "c:\python33\lib\distutils\dist.py", line 917, in run_commands > self.run_command(cmd) > File "c:\python33\lib\distutils\dist.py", line 936, in run_command > cmd_obj.run() > File > "C:\repositories\numpy\build\py3k\numpy\distutils\command\build.py", line > 37, in run > old_build.run(self) > File "c:\python33\lib\distutils\command\build.py", line 126, in run > self.run_command(cmd_name) > File "c:\python33\lib\distutils\cmd.py", line 313, in run_command > self.distribution.run_command(command) > File "c:\python33\lib\distutils\dist.py", line 936, in run_command > cmd_obj.run() > File > "C:\repositories\numpy\build\py3k\numpy\distutils\command\build_src.py", > line 152, in run > self.build_sources() > File > "C:\repositories\numpy\build\py3k\numpy\distutils\command\build_src.py", > line 163, in build_sources > self.build_library_sources(*libname_info) > File > "C:\repositories\numpy\build\py3k\numpy\distutils\command\build_src.py", > line 298, in build_library_sources > sources = self.generate_sources(sources, (lib_name, build_info)) > File > "C:\repositories\numpy\build\py3k\numpy\distutils\command\build_src.py", > line 385, in generate_sources > source = func(extension, build_dir) > File "numpy\core\setup.py", line 648, in get_mathlib_info > raise RuntimeError("Broken toolchain: cannot link a simple C program") > RuntimeError: Broken toolchain: cannot link a simple C program > > > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Nov 12 16:27:27 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 12 Nov 2012 13:27:27 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: Hi, On Mon, Nov 12, 2012 at 1:11 PM, Nathaniel Smith wrote: > On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett wrote: >> Hi, >> >> I wanted to check that everyone knows about and is happy with the >> scalar casting changes from 1.6.0. >> >> Specifically, the rules for (array, scalar) casting have changed such >> that the resulting dtype depends on the _value_ of the scalar. 
>> >> Mark W has documented these changes here: >> >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html >> >> Specifically, as of 1.6.0: >> >> In [19]: arr = np.array([1.], dtype=np.float32) >> >> In [20]: (arr + (2**16-1)).dtype >> Out[20]: dtype('float32') >> >> In [21]: (arr + (2**16)).dtype >> Out[21]: dtype('float64') >> >> In [25]: arr = np.array([1.], dtype=np.int8) >> >> In [26]: (arr + 127).dtype >> Out[26]: dtype('int8') >> >> In [27]: (arr + 128).dtype >> Out[27]: dtype('int16') >> >> There's discussion about the changes here: >> >> http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html >> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >> http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html >> >> It seems to me that this change is hard to explain, and does what you >> want only some of the time, making it a false friend. > > The old behaviour was that in these cases, the scalar was always cast > to the type of the array, right? So > np.array([1], dtype=np.int8) + 256 > returned 1? Is that the behaviour you prefer? Right. In that case of course, I'm getting something a bit nasty. But if you're working with int8, I think you expect to be careful of overflow. And you may well not want an automatic and maybe surprising upcast to int16. > I agree that the 1.6 behaviour is surprising and somewhat > inconsistent. There are many places where you can get an overflow in > numpy, and in all the other cases we just let the overflow happen. And > in fact you can still get an overflow with arr + scalar operations, so > this doesn't really fix anything. Right - it's a half-fix, which seems to me worse than no fix. > I find the specific handling of unsigned -> signed and float32 -> > float64 upcasting confusing as well. (Sure, 2**16 isn't exactly > representable as a float32, but it doesn't *overflow*, it just gives > you 2.0**16... if I'm using float32 then I presumably don't care that > much about exact representability, so it's surprising that numpy is > working to enforce it, and definitely a separate decision from what to > do about overflow.) > > None of those threads seem to really get into the question of what the > best behaviour here *is*, though. > > Possibly the most defensible choice is to treat ufunc(arr, scalar) > operations as performing an implicit cast of the scalar to arr's > dtype, and using the standard implicit casting rules -- which I think > means, raising an error if !can_cast(scalar, arr.dtype, > casting="safe") You mean: In [25]: arr = np.array([1.], dtype=np.int8) In [27]: arr + 128 ValueError - cannot safely cast 128 to array dtype int8? That would be a major change. If I really wanted to do that, would you then suggest I cast to an array? arr + np.array([128]) It would be very good to make a well-argued long-term decision, whatever the chosen outcome. Maybe this is the place for a partly retrospective NEP? Best, Matthew From njs at pobox.com Mon Nov 12 16:34:59 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 12 Nov 2012 22:34:59 +0100 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? 
In-Reply-To: References: Message-ID: On Mon, Nov 12, 2012 at 10:27 PM, Matthew Brett wrote: > Hi, > > On Mon, Nov 12, 2012 at 1:11 PM, Nathaniel Smith wrote: >> On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett wrote: >>> Hi, >>> >>> I wanted to check that everyone knows about and is happy with the >>> scalar casting changes from 1.6.0. >>> >>> Specifically, the rules for (array, scalar) casting have changed such >>> that the resulting dtype depends on the _value_ of the scalar. >>> >>> Mark W has documented these changes here: >>> >>> http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules >>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html >>> >>> Specifically, as of 1.6.0: >>> >>> In [19]: arr = np.array([1.], dtype=np.float32) >>> >>> In [20]: (arr + (2**16-1)).dtype >>> Out[20]: dtype('float32') >>> >>> In [21]: (arr + (2**16)).dtype >>> Out[21]: dtype('float64') >>> >>> In [25]: arr = np.array([1.], dtype=np.int8) >>> >>> In [26]: (arr + 127).dtype >>> Out[26]: dtype('int8') >>> >>> In [27]: (arr + 128).dtype >>> Out[27]: dtype('int16') >>> >>> There's discussion about the changes here: >>> >>> http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html >>> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >>> http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html >>> >>> It seems to me that this change is hard to explain, and does what you >>> want only some of the time, making it a false friend. >> >> The old behaviour was that in these cases, the scalar was always cast >> to the type of the array, right? So >> np.array([1], dtype=np.int8) + 256 >> returned 1? Is that the behaviour you prefer? > > Right. In that case of course, I'm getting something a bit nasty. > But if you're working with int8, I think you expect to be careful of > overflow. And you may well not want an automatic and maybe surprising > upcast to int16. > >> I agree that the 1.6 behaviour is surprising and somewhat >> inconsistent. There are many places where you can get an overflow in >> numpy, and in all the other cases we just let the overflow happen. And >> in fact you can still get an overflow with arr + scalar operations, so >> this doesn't really fix anything. > > Right - it's a half-fix, which seems to me worse than no fix. > >> I find the specific handling of unsigned -> signed and float32 -> >> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly >> representable as a float32, but it doesn't *overflow*, it just gives >> you 2.0**16... if I'm using float32 then I presumably don't care that >> much about exact representability, so it's surprising that numpy is >> working to enforce it, and definitely a separate decision from what to >> do about overflow.) >> >> None of those threads seem to really get into the question of what the >> best behaviour here *is*, though. >> >> Possibly the most defensible choice is to treat ufunc(arr, scalar) >> operations as performing an implicit cast of the scalar to arr's >> dtype, and using the standard implicit casting rules -- which I think >> means, raising an error if !can_cast(scalar, arr.dtype, >> casting="safe") > > You mean: > > In [25]: arr = np.array([1.], dtype=np.int8) > > In [27]: arr + 128 > > ValueError - cannot safely cast 128 to array dtype int8? > > That would be a major change. If I really wanted to do that, would > you then suggest I cast to an array? 
> > arr + np.array([128]) No, that will upcast, I think. (It should, anyway -- scalars are a special case.) Maybe you meant np.array([128], dtype=np.int8)? Anyway, you'd cast to an int8 scalar: arr + np.int8(128) (or a scalar array, np.array(128, dtype=np.int8), I think would also count as a scalar for these purposes?) I don't see how this would be *that* major a change, the change in 1.6 (which silently changed the meaning of people's code) is larger, I would say :-). > It would be very good to make a well-argued long-term decision, > whatever the chosen outcome. Maybe this is the place for a partly > retrospective NEP? Couldn't hurt. -n From ondrej.certik at gmail.com Mon Nov 12 17:27:02 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Mon, 12 Nov 2012 14:27:02 -0800 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: Hi, On Mon, Nov 5, 2012 at 11:33 PM, Travis Oliphant wrote: > Hey all, > > Ondrej has been tied up finishing his PhD for the past several weeks. He is defending his work shortly and should be available to continue to help with the 1.7.0 release around the first of December. He and I have been in contact during this process, and I've been helping where I can. Fortunately, other NumPy developers have been active closing tickets and reviewing pull requests which has helped the process substantially. > > The release has taken us longer than we expected, but I'm really glad that we've received the bug-reports and issues that we have seen because it will help the 1.7.0 release be a more stable series. Also, the merging of the Trac issues with Git has exposed over-looked problems as well and will hopefully encourage more Git-focused participation by users. > > We are targeting getting the final release of 1.7.0 out by mid December (based on Ondrej's availability). But, I would like to find out which issues are seen as blockers by people on this list. I think most of the issues that I had as blockers have been resolved. If there are no more remaining blockers, then we may be able to accelerate the final release of 1.7.0 to just after Thanksgiving. I successfully defended my Ph.D. thesis last Thursday, I just need to do some changes to it and submit it and I am done. So I started to work on the release again (my apologies that it got delayed, I had to devote my full attention to finishing my school first). Here is a list of issues that need to be fixed before the release: https://github.com/numpy/numpy/issues?milestone=3&state=open If anyone wants to help, we just need to get through them and submit a PR for each, or close it if it doesn't apply anymore. This is what I am doing now. Ondrej From alan.isaac at gmail.com Mon Nov 12 17:34:23 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 12 Nov 2012 17:34:23 -0500 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: <1352740571.3411.19.camel@sebastian-laptop> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A12447.6050604@gmail.com> <1352740571.3411.19.camel@sebastian-laptop> Message-ID: <50A1796F.4010006@gmail.com> On 11/12/2012 12:16 PM, Sebastian Berg wrote: > So instead of taking a sequence of length 1, take an element as default. Sebastien has proposed that np.random.choice return a single *element* by default, not a 1d array of length 1. He proposes to associate this with a default value of `size=None`. 
The motivation: it is more natural, and in particular, it would behave more like Python's random.choice by default. This decision should be made before this function is part of a release. Cheers, Alan From njs at pobox.com Mon Nov 12 17:46:10 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 12 Nov 2012 23:46:10 +0100 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: <50A1796F.4010006@gmail.com> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A12447.6050604@gmail.com> <1352740571.3411.19.camel@sebastian-laptop> <50A1796F.4010006@gmail.com> Message-ID: On Mon, Nov 12, 2012 at 11:34 PM, Alan G Isaac wrote: > On 11/12/2012 12:16 PM, Sebastian Berg wrote: >> So instead of taking a sequence of length 1, take an element as default. > > Sebastien has proposed that np.random.choice return > a single *element* by default, not a 1d array of length 1. > He proposes to associate this with a default value of `size=None`. > > The motivation: it is more natural, and in particular, > it would behave more like Python's random.choice by default. > > This decision should be made before this function > is part of a release. I see, so right now we have >>> np.random.choice([1, 2, 3]) array([2]) but you're suggesting >>> np.random.choice([1, 2, 3]) 2 >>> np.random.choice([1, 2, 3], size=1) array([2]) That does seem like an obvious improvement to me, since all the other random functions work that way, e.g.: In [2]: np.random.normal() Out[2]: -0.8752867990041713 In [4]: np.random.normal(size=1) Out[4]: array([ 1.92803487]) Want to make a pull request? -n From cournape at gmail.com Mon Nov 12 17:48:36 2012 From: cournape at gmail.com (David Cournapeau) Date: Mon, 12 Nov 2012 22:48:36 +0000 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: On Mon, Nov 12, 2012 at 10:27 PM, Ond?ej ?ert?k wrote: > Hi, > > On Mon, Nov 5, 2012 at 11:33 PM, Travis Oliphant wrote: >> Hey all, >> >> Ondrej has been tied up finishing his PhD for the past several weeks. He is defending his work shortly and should be available to continue to help with the 1.7.0 release around the first of December. He and I have been in contact during this process, and I've been helping where I can. Fortunately, other NumPy developers have been active closing tickets and reviewing pull requests which has helped the process substantially. >> >> The release has taken us longer than we expected, but I'm really glad that we've received the bug-reports and issues that we have seen because it will help the 1.7.0 release be a more stable series. Also, the merging of the Trac issues with Git has exposed over-looked problems as well and will hopefully encourage more Git-focused participation by users. >> >> We are targeting getting the final release of 1.7.0 out by mid December (based on Ondrej's availability). But, I would like to find out which issues are seen as blockers by people on this list. I think most of the issues that I had as blockers have been resolved. If there are no more remaining blockers, then we may be able to accelerate the final release of 1.7.0 to just after Thanksgiving. > > > > I successfully defended my Ph.D. thesis last Thursday, I just need to > do some changes to it and submit it and I am done. Congrats, (almost) Dr. 
Čertík :) cheers, David From alan.isaac at gmail.com Mon Nov 12 18:36:48 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 12 Nov 2012 18:36:48 -0500 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A12447.6050604@gmail.com> <1352740571.3411.19.camel@sebastian-laptop> <50A1796F.4010006@gmail.com> Message-ID: <50A18810.6090200@gmail.com> On 11/12/2012 5:46 PM, Nathaniel Smith wrote: > Want to make a pull request? Well, I'd be happy to help Sebastien to change the code, but I'm not a git user. And I'd have some questions. E.g., with `size=None`, couldn't we just call Python's random.choice? And for sampling without replacement, wouldn't it be faster to just call Python's random.sample (rather than implement this as currently done)? Alan From charlesr.harris at gmail.com Mon Nov 12 18:47:57 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Nov 2012 16:47:57 -0700 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: On Mon, Nov 12, 2012 at 3:27 PM, Ondřej Čertík wrote: > Hi, > > On Mon, Nov 5, 2012 at 11:33 PM, Travis Oliphant > wrote: > > Hey all, > > > > Ondrej has been tied up finishing his PhD for the past several weeks. > He is defending his work shortly and should be available to continue to > help with the 1.7.0 release around the first of December. He and I have > been in contact during this process, and I've been helping where I can. > Fortunately, other NumPy developers have been active closing tickets and > reviewing pull requests which has helped the process substantially. > > > > The release has taken us longer than we expected, but I'm really glad > that we've received the bug-reports and issues that we have seen because it > will help the 1.7.0 release be a more stable series. Also, the merging of > the Trac issues with Git has exposed over-looked problems as well and will > hopefully encourage more Git-focused participation by users. > > > > We are targeting getting the final release of 1.7.0 out by mid December > (based on Ondrej's availability). But, I would like to find out which > issues are seen as blockers by people on this list. I think most of the > issues that I had as blockers have been resolved. If there are no more > remaining blockers, then we may be able to accelerate the final release of > 1.7.0 to just after Thanksgiving. > > > > I successfully defended my Ph.D. thesis last Thursday, I just need to > do some changes to it and submit it and I am done. > So I started to work on the release again (my apologies that it got > delayed, I had to devote my full attention to finishing my school > first). > > Here is a list of issues that need to be fixed before the release: > > https://github.com/numpy/numpy/issues?milestone=3&state=open > > If anyone wants to help, we just need to get through them and submit a > PR for each, or close it if it doesn't apply anymore. > This is what I am doing now. > > Concrats. There are some commits already due for backports. Do you want to do that, or will it be OK for others to do it? Another possibility would be to open new PR's for the backports and tag them as 1.7.0.
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Mon Nov 12 19:36:22 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 13 Nov 2012 00:36:22 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Mon, Nov 12, 2012 at 9:20 PM, Ralf Gommers wrote: >> ... >> Found executable C:\cygwin\usr\bin\gcc.exe >> g++ -mno-cygwin _configtest.o -lmsvcr100 -o _configtest.exe >> Could not locate executable g++ >> Executable g++ does not exist > > > A C++ compiler shouldn't be needed for numpy, so it shouldn't try to find > one. On Linux it doesn't try this. It looks like on Windows (with MinGW at > least) it does. I looked at a build log from a compile with MinGW and g++ > available > (http://projects.scipy.org/numpy/attachment/ticket/1909/dc6b601-builld.log), > and g++ is used exactly once: for the linking stage of > numpy.core._dummy.pyd. > > Looking at core/setup.py that is related to __STDC_FORMAT_MACROS. That led > me to https://github.com/numpy/numpy/commit/64423fdb18e8. So the easy way > forward is to install g++. The hard way is figuring out why that define was > necessary and find a way around it. > > Ralf Very strange (wanting a C++ compiler). I had some trouble updating my cygwin installation (probably in part due to issues stemming from a change in Windows username) and gave up and reinstalled cygwin. This presented a new issue that gcc and g++ wouldn't work from the Windows cmd prompt (due to being symlinks), so I deleted the symlinks /usr/bin/gcc[.exe] and /usr/bin/g++[.exe] and just replaced them with copies of the binaries /usr/bin/gcc-3.exe and /usr/bin/g++-3.exe - which seems to work. [I don't recall having to do that before, perhaps a relatively recent regression in cygwin?] Now, back to numpy, ... C:\repositories\numpy>c:\python33\python setup.py build --compiler=mingw32 gcc -mno-cygwin -O2 -Wall -Wstrict-prototypes -DNPY_NEEDS_MINGW_TIME_WORKAROUND -DNPY_MINGW_USE_CUSTOM_MSVCR -D__MSVCRT_VERSION__=0x1000 -Inumpy\core\include -Ibuild\src.win32-3.3\numpy\core\include/numpy -Inumpy\core\src\private -Inumpy\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\src\npysort -Inumpy\core\include -Ic:\python33\include -Ic:\python33\include -Ibuild\src.win32-3.3\numpy\core\src\multiarray -Ibuild\src.win32-3.3\numpy\core\src\umath -c numpy\random\mtrand\distributions.c -o build\temp.win32-3.3\Release\numpy\random\mtrand\distributions.o g++ -mno-cygwin -shared build\temp.win32-3.3\Release\numpy\random\mtrand\mtrand.o build\temp.win32-3.3\Release\numpy\random\mtrand\randomkit.o build\temp.win32-3.3\Release\numpy\random\mtrand\initarray.o build\temp.win32-3.3\Release\numpy\random\mtrand\distributions.o -Lc:\python33\libs -Lc:\python33\PCbuild -Lbuild\temp.win32-3.3 -lpython33 -lmsvcr100 -o build\lib.win32-3.3\numpy\random\mtrand.pyd running scons running build_scripts creating build\scripts.win32-3.3 Creating build\scripts.win32-3.3\f2py.py adding 'build\scripts.win32-3.3\f2py.py' to scripts C:\repositories\numpy> Horray! It appears to work. I guess I need nose for the tests now... However, what is the correct way to now install this build? I have a workaround (given below), but the obvious routes failed: ------------------------------------------------------------- C:\repositories\numpy>c:\python33\python setup.py install --compiler=mingw32 Converting to Python3 via 2to3... 
Running from numpy source directory. usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...] or: setup.py --help [cmd1 cmd2 ...] or: setup.py --help-commands or: setup.py cmd --help error: option --compiler not recognized ------------------------------------------------------------- Yet the only other likely choice of leaving out the compiler option also fails: C:\repositories\numpy>c:\python33\python setup.py install Converting to Python3 via 2to3... Running from numpy source directory. non-existing path in 'numpy\\distutils': 'site.cfg' F2PY Version 2 numpy\core\setup_common.py:86: MismatchCAPIWarning: API mismatch detected, the C API version numbers have to be updated. Current C api version is 8, with checks um f4362353e2d72f889fda0128aa015037, but recorded checksum for C API version 8 i n codegen_dir/cversions.txt is 17321775fc884de0b1eda478cd61c74b. If functions we re added in the C API, you have to update C_API_VERSION in numpy\core\setup_com mon.py. MismatchCAPIWarning) blas_opt_info: running install running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src build_src building py_modules sources building library "npymath" sources No module named 'numpy.distutils.msvccompiler' in numpy.distutils; trying from distutils Unable to find productdir in registry Checking environ VS100COMNTOOLS error: Unable to find vcvarsall.bat C:\repositories\numpy> ------------------------------------------------------------- I went with plan (B), creating C:\Python33\Lib\distutils\distutils.cfg containing: [build] compiler=mingw32 Now I don't have to give a compiler switch and distutils will default to using mingw32 with Python 3.3, and build and install seem to work. I've not yet run the numpy tests yet, but I think this means my github branches are worth merging: https://github.com/peterjc/numpy/commits/msvc10 Regards, Peter P.S. I am really looking forward to official Windows installers for NumPy on Python 3.3 on Windows... From ondrej.certik at gmail.com Mon Nov 12 19:38:02 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Mon, 12 Nov 2012 16:38:02 -0800 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: On Mon, Nov 12, 2012 at 3:47 PM, Charles R Harris wrote: > > > On Mon, Nov 12, 2012 at 3:27 PM, Ond?ej ?ert?k > wrote: >> >> Hi, >> >> On Mon, Nov 5, 2012 at 11:33 PM, Travis Oliphant >> wrote: >> > Hey all, >> > >> > Ondrej has been tied up finishing his PhD for the past several weeks. >> > He is defending his work shortly and should be available to continue to help >> > with the 1.7.0 release around the first of December. He and I have been >> > in contact during this process, and I've been helping where I can. >> > Fortunately, other NumPy developers have been active closing tickets and >> > reviewing pull requests which has helped the process substantially. >> > >> > The release has taken us longer than we expected, but I'm really glad >> > that we've received the bug-reports and issues that we have seen because it >> > will help the 1.7.0 release be a more stable series. Also, the merging of >> > the Trac issues with Git has exposed over-looked problems as well and will >> > hopefully encourage more Git-focused participation by users. 
>> > >> > We are targeting getting the final release of 1.7.0 out by mid December >> > (based on Ondrej's availability). But, I would like to find out which >> > issues are seen as blockers by people on this list. I think most of the >> > issues that I had as blockers have been resolved. If there are no more >> > remaining blockers, then we may be able to accelerate the final release of >> > 1.7.0 to just after Thanksgiving. >> >> >> >> I successfully defended my Ph.D. thesis last Thursday, I just need to >> do some changes to it and submit it and I am done. >> So I started to work on the release again (my apologies that it got >> delayed, I had to devote my full attention to finishing my school >> first). >> >> Here is a list of issues that need to be fixed before the release: >> >> https://github.com/numpy/numpy/issues?milestone=3&state=open >> >> If anyone wants to help, we just need to get through them and submit a >> PR for each, or close it if it doesn't apply anymore. >> This is what I am doing now. >> > > Concrats. There are some commits already due for backports. Do you want to > do that, or will it be OK for others to do it? Another possibility would be > to open new PR's for the backports and tag them as 1.7.0. I'll be happy to do that. If you know which commits are those, let me know. I actually like to open a PR just for the backports as well, to see if tests pass and to have more eyes look on it to minimize mistakes. Ondrej From p.j.a.cock at googlemail.com Mon Nov 12 19:46:14 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 13 Nov 2012 00:46:14 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 In-Reply-To: References: Message-ID: On Tue, Nov 13, 2012 at 12:36 AM, Peter Cock wrote: > > I've not yet run the numpy tests yet, but I think this means > my github branches are worth merging: > > https://github.com/peterjc/numpy/commits/msvc10 > Hi Ralf, Pull request filed, assuming this gets applied to the master could you also flag it for backporting to NumPy 1.7 as well? Thanks: https://github.com/numpy/numpy/pull/2726 I hope to run the test suite later this week (tomorrow ideally). Peter From sebastian at sipsolutions.net Mon Nov 12 20:18:41 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 13 Nov 2012 02:18:41 +0100 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: <50A18810.6090200@gmail.com> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A12447.6050604@gmail.com> <1352740571.3411.19.camel@sebastian-laptop> <50A1796F.4010006@gmail.com> <50A18810.6090200@gmail.com> Message-ID: <1352769521.19117.16.camel@sebastian-laptop> On Mon, 2012-11-12 at 18:36 -0500, Alan G Isaac wrote: > On 11/12/2012 5:46 PM, Nathaniel Smith wrote: > > Want to make a pull request? > > > Well, I'd be happy to help Sebastien to change the > code, but I'm not a git user. > I have created a pull request, but tests are still needed... If you like it would be very nice if you can check it and maybe also write some tests. Git is relatively simple (and likely worth to learn) but even otherwise posting code would be nice. > And I'd have some questions. E.g., with `size=None`, > couldn't we just call Python's random.choice? And for > sampling without replacement, wouldn't it be faster to > just call Python's random.sample (rather than > implement this as currently done)? > I don't think the difference should be really noticeable. 
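In the meantime, anyone who wants the element-returning behaviour can already get it on top of the current 1.7 function; a rough sketch (the wrapper name is made up, and this is not the code in the pull request):

import numpy as np

def choice_element(a, size=None, replace=True, p=None):
    # size=None -> hand back a single element, like Python's random.choice;
    # any explicit size keeps the present array-returning behaviour.
    if size is None:
        return np.random.choice(a, size=1, replace=replace, p=p)[0]
    return np.random.choice(a, size=size, replace=replace, p=p)

So choice_element([1, 2, 3]) gives back a single element while choice_element([1, 2, 3], size=1) still gives a length-1 array.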
But even if, I doubt its worth special casing. If someone cares a lot about speed they probably should not use it to get single values anyway. Also for this case of random numbers, it would be a bad idea since you would use a different random number generator and a different seed! Regards, Sebastian > Alan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From scheffer.nicolas at gmail.com Mon Nov 12 20:39:55 2012 From: scheffer.nicolas at gmail.com (Nicolas SCHEFFER) Date: Mon, 12 Nov 2012 17:39:55 -0800 Subject: [Numpy-discussion] Scipy dot In-Reply-To: References: <20121108120725.GL313@phare.normalesup.org> <509BA198.4070301@astro.uio.no> <20121109084826.GB19371@phare.normalesup.org> <20121109162550.GQ19465@phare.normalesup.org> <509DA207.1000302@astro.uio.no> Message-ID: Yep exactly. I just want to make sure that we talked enough on the principle first (ie. goals and technical approach), and that indeed the code is good enough to look at. I get it from your answer that it is, so I went ahead https://github.com/numpy/numpy/pull/2730 Thanks -nicolas On Mon, Nov 12, 2012 at 12:59 PM, Nathaniel Smith wrote: > On Mon, Nov 12, 2012 at 9:08 PM, Nicolas SCHEFFER > wrote: >> I've pushed my code to a branch here >> https://github.com/leschef/numpy/tree/faster_dot >> with the commit >> https://github.com/leschef/numpy/commit/ea037770e03f23aca1a06274a1a8e8bf0e0e2ee4 >> >> Let me know if that's enough to create a pull request. > > "Pull request" basically means "this code is good enough to look at" > -- the name is a bit misleading. It just creates a discussion thread > where it's easy to look at the change, comment on pieces, etc., and > you can continue to update the code after you start the pull request. > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From shish at keba.be Mon Nov 12 21:21:21 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 12 Nov 2012 21:21:21 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: 2012/11/12 Nathaniel Smith > On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett > wrote: > > Hi, > > > > I wanted to check that everyone knows about and is happy with the > > scalar casting changes from 1.6.0. > > > > Specifically, the rules for (array, scalar) casting have changed such > > that the resulting dtype depends on the _value_ of the scalar. 
> > > > Mark W has documented these changes here: > > > > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules > > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html > > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html > > > > Specifically, as of 1.6.0: > > > > In [19]: arr = np.array([1.], dtype=np.float32) > > > > In [20]: (arr + (2**16-1)).dtype > > Out[20]: dtype('float32') > > > > In [21]: (arr + (2**16)).dtype > > Out[21]: dtype('float64') > > > > In [25]: arr = np.array([1.], dtype=np.int8) > > > > In [26]: (arr + 127).dtype > > Out[26]: dtype('int8') > > > > In [27]: (arr + 128).dtype > > Out[27]: dtype('int16') > > > > There's discussion about the changes here: > > > > > http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html > > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html > > > http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html > > > > It seems to me that this change is hard to explain, and does what you > > want only some of the time, making it a false friend. > > The old behaviour was that in these cases, the scalar was always cast > to the type of the array, right? So > np.array([1], dtype=np.int8) + 256 > returned 1? Is that the behaviour you prefer? > > I agree that the 1.6 behaviour is surprising and somewhat > inconsistent. There are many places where you can get an overflow in > numpy, and in all the other cases we just let the overflow happen. And > in fact you can still get an overflow with arr + scalar operations, so > this doesn't really fix anything. > > I find the specific handling of unsigned -> signed and float32 -> > float64 upcasting confusing as well. (Sure, 2**16 isn't exactly > representable as a float32, but it doesn't *overflow*, it just gives > you 2.0**16... if I'm using float32 then I presumably don't care that > much about exact representability, so it's surprising that numpy is > working to enforce it, and definitely a separate decision from what to > do about overflow.) > > None of those threads seem to really get into the question of what the > best behaviour here *is*, though. > > Possibly the most defensible choice is to treat ufunc(arr, scalar) > operations as performing an implicit cast of the scalar to arr's > dtype, and using the standard implicit casting rules -- which I think > means, raising an error if !can_cast(scalar, arr.dtype, > casting="safe") I like this suggestion. It may break some existing code, but I think it'd be for the best. The current behavior can be very confusing. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Lagace at fsaa.ulaval.ca Mon Nov 12 21:32:34 2012 From: Robert.Lagace at fsaa.ulaval.ca (=?utf-8?B?Um9iZXJ0IExhZ2Fjw6k=?=) Date: Mon, 12 Nov 2012 21:32:34 -0500 Subject: [Numpy-discussion] RE : 1.7.0 release In-Reply-To: References: , Message-ID: <53541A1E9E53A4449C6558A1B0FEAE5A106132B5AF@EXCH-MBX-C.ulaval.ca> Congratulation. ________________________________________ De : numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] de la part de Ond?ej ?ert?k [ondrej.certik at gmail.com] Date d'envoi : 12 novembre 2012 17:27 ? : Discussion of Numerical Python Objet : Re: [Numpy-discussion] 1.7.0 release Hi, On Mon, Nov 5, 2012 at 11:33 PM, Travis Oliphant wrote: > Hey all, > > Ondrej has been tied up finishing his PhD for the past several weeks. 
He is defending his work shortly and should be available to continue to help with the 1.7.0 release around the first of December. He and I have been in contact during this process, and I've been helping where I can. Fortunately, other NumPy developers have been active closing tickets and reviewing pull requests which has helped the process substantially. > > The release has taken us longer than we expected, but I'm really glad that we've received the bug-reports and issues that we have seen because it will help the 1.7.0 release be a more stable series. Also, the merging of the Trac issues with Git has exposed over-looked problems as well and will hopefully encourage more Git-focused participation by users. > > We are targeting getting the final release of 1.7.0 out by mid December (based on Ondrej's availability). But, I would like to find out which issues are seen as blockers by people on this list. I think most of the issues that I had as blockers have been resolved. If there are no more remaining blockers, then we may be able to accelerate the final release of 1.7.0 to just after Thanksgiving. I successfully defended my Ph.D. thesis last Thursday, I just need to do some changes to it and submit it and I am done. So I started to work on the release again (my apologies that it got delayed, I had to devote my full attention to finishing my school first). Here is a list of issues that need to be fixed before the release: https://github.com/numpy/numpy/issues?milestone=3&state=open If anyone wants to help, we just need to get through them and submit a PR for each, or close it if it doesn't apply anymore. This is what I am doing now. Ondrej _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan.isaac at gmail.com Mon Nov 12 22:44:58 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 12 Nov 2012 22:44:58 -0500 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: <1352769521.19117.16.camel@sebastian-laptop> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A12447.6050604@gmail.com> <1352740571.3411.19.camel@sebastian-laptop> <50A1796F.4010006@gmail.com> <50A18810.6090200@gmail.com> <1352769521.19117.16.camel@sebastian-laptop> Message-ID: <50A1C23A.5010506@gmail.com> On 11/12/2012 8:18 PM, Sebastian Berg wrote: > I have created a pull request This is still a bit different than I thought you intended. With `size=None` we don't get an element, but rather a 0d array. I thought the idea was to return an element in this case? Alan From ben.root at ou.edu Mon Nov 12 23:15:48 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 12 Nov 2012 23:15:48 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: On Monday, November 12, 2012, Olivier Delalleau wrote: > 2012/11/12 Nathaniel Smith 'njs at pobox.com');>> > >> On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett > >> wrote: >> > Hi, >> > >> > I wanted to check that everyone knows about and is happy with the >> > scalar casting changes from 1.6.0. >> > >> > Specifically, the rules for (array, scalar) casting have changed such >> > that the resulting dtype depends on the _value_ of the scalar. 
>> > >> > Mark W has documented these changes here: >> > >> > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules >> > >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >> > >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html >> > >> > Specifically, as of 1.6.0: >> > >> > In [19]: arr = np.array([1.], dtype=np.float32) >> > >> > In [20]: (arr + (2**16-1)).dtype >> > Out[20]: dtype('float32') >> > >> > In [21]: (arr + (2**16)).dtype >> > Out[21]: dtype('float64') >> > >> > In [25]: arr = np.array([1.], dtype=np.int8) >> > >> > In [26]: (arr + 127).dtype >> > Out[26]: dtype('int8') >> > >> > In [27]: (arr + 128).dtype >> > Out[27]: dtype('int16') >> > >> > There's discussion about the changes here: >> > >> > >> http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html >> > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >> > >> http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html >> > >> > It seems to me that this change is hard to explain, and does what you >> > want only some of the time, making it a false friend. >> >> The old behaviour was that in these cases, the scalar was always cast >> to the type of the array, right? So >> np.array([1], dtype=np.int8) + 256 >> returned 1? Is that the behaviour you prefer? >> >> I agree that the 1.6 behaviour is surprising and somewhat >> inconsistent. There are many places where you can get an overflow in >> numpy, and in all the other cases we just let the overflow happen. And >> in fact you can still get an overflow with arr + scalar operations, so >> this doesn't really fix anything. >> >> I find the specific handling of unsigned -> signed and float32 -> >> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly >> representable as a float32, but it doesn't *overflow*, it just gives >> you 2.0**16... if I'm using float32 then I presumably don't care that >> much about exact representability, so it's surprising that numpy is >> working to enforce it, and definitely a separate decision from what to >> do about overflow.) >> >> None of those threads seem to really get into the question of what the >> best behaviour here *is*, though. >> >> Possibly the most defensible choice is to treat ufunc(arr, scalar) >> operations as performing an implicit cast of the scalar to arr's >> dtype, and using the standard implicit casting rules -- which I think >> means, raising an error if !can_cast(scalar, arr.dtype, >> casting="safe") > > > I like this suggestion. It may break some existing code, but I think it'd > be for the best. The current behavior can be very confusing. > > -=- Olivier > "break some existing code" I really should set up an email filter for this phrase and have it send back an email automatically: "Are you nuts?!" We just resolved an issue where the "safe" casting rule unexpectedly broke existing code with regards to unplaced operations. The solution was to warn about the change in the upcoming release and to throw errors in a later release. Playing around with fundemental things like this need to be done methodically and carefully. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Mon Nov 12 23:17:11 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 12 Nov 2012 23:17:11 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? 
In-Reply-To: References: Message-ID: On Monday, November 12, 2012, Benjamin Root wrote: > > > On Monday, November 12, 2012, Olivier Delalleau wrote: > >> 2012/11/12 Nathaniel Smith >> >>> On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett >>> wrote: >>> > Hi, >>> > >>> > I wanted to check that everyone knows about and is happy with the >>> > scalar casting changes from 1.6.0. >>> > >>> > Specifically, the rules for (array, scalar) casting have changed such >>> > that the resulting dtype depends on the _value_ of the scalar. >>> > >>> > Mark W has documented these changes here: >>> > >>> > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules >>> > >>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >>> > >>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html >>> > >>> > Specifically, as of 1.6.0: >>> > >>> > In [19]: arr = np.array([1.], dtype=np.float32) >>> > >>> > In [20]: (arr + (2**16-1)).dtype >>> > Out[20]: dtype('float32') >>> > >>> > In [21]: (arr + (2**16)).dtype >>> > Out[21]: dtype('float64') >>> > >>> > In [25]: arr = np.array([1.], dtype=np.int8) >>> > >>> > In [26]: (arr + 127).dtype >>> > Out[26]: dtype('int8') >>> > >>> > In [27]: (arr + 128).dtype >>> > Out[27]: dtype('int16') >>> > >>> > There's discussion about the changes here: >>> > >>> > >>> http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html >>> > >>> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >>> > >>> http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html >>> > >>> > It seems to me that this change is hard to explain, and does what you >>> > want only some of the time, making it a false friend. >>> >>> The old behaviour was that in these cases, the scalar was always cast >>> to the type of the array, right? So >>> np.array([1], dtype=np.int8) + 256 >>> returned 1? Is that the behaviour you prefer? >>> >>> I agree that the 1.6 behaviour is surprising and somewhat >>> inconsistent. There are many places where you can get an overflow in >>> numpy, and in all the other cases we just let the overflow happen. And >>> in fact you can still get an overflow with arr + scalar operations, so >>> this doesn't really fix anything. >>> >>> I find the specific handling of unsigned -> signed and float32 -> >>> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly >>> representable as a float32, but it doesn't *overflow*, it just gives >>> you 2.0**16... if I'm using float32 then I presumably don't care that >>> much about exact representability, so it's surprising that numpy is >>> working to enforce it, and definitely a separate decision from what to >>> do about overflow.) >>> >>> None of those threads seem to really get into the question of what the >>> best behaviour here *is*, though. >>> >>> Possibly the most defensible choice is to treat ufunc(arr, scalar) >>> operations as performing an implicit cast of the scalar to arr's >>> dtype, and using the standard implicit casting rules -- which I think >>> means, raising an error if !can_cast(scalar, arr.dtype, >>> casting="safe") >> >> >> I like this suggestion. It may break some existing code, but I think it'd >> be for the best. The current behavior can be very confusing. >> >> -=- Olivier >> > > > "break some existing code" > > I really should set up an email filter for this phrase and have it send > back an email automatically: "Are you nuts?!" 
> > We just resolved an issue where the "safe" casting rule unexpectedly broke > existing code with regards to unplaced operations. The solution was to > warn about the change in the upcoming release and to throw errors in a > later release. Playing around with fundemental things like this need to be > done methodically and carefully. > > Cheers! > Ben Root > Stupid autocorrect: unplaced --> inplace -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Nov 12 23:39:29 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 12 Nov 2012 20:39:29 -0800 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: Hi, On Mon, Nov 12, 2012 at 8:15 PM, Benjamin Root wrote: > > > On Monday, November 12, 2012, Olivier Delalleau wrote: >> >> 2012/11/12 Nathaniel Smith >>> >>> On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett >>> wrote: >>> > Hi, >>> > >>> > I wanted to check that everyone knows about and is happy with the >>> > scalar casting changes from 1.6.0. >>> > >>> > Specifically, the rules for (array, scalar) casting have changed such >>> > that the resulting dtype depends on the _value_ of the scalar. >>> > >>> > Mark W has documented these changes here: >>> > >>> > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules >>> > >>> > http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >>> > >>> > http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html >>> > >>> > Specifically, as of 1.6.0: >>> > >>> > In [19]: arr = np.array([1.], dtype=np.float32) >>> > >>> > In [20]: (arr + (2**16-1)).dtype >>> > Out[20]: dtype('float32') >>> > >>> > In [21]: (arr + (2**16)).dtype >>> > Out[21]: dtype('float64') >>> > >>> > In [25]: arr = np.array([1.], dtype=np.int8) >>> > >>> > In [26]: (arr + 127).dtype >>> > Out[26]: dtype('int8') >>> > >>> > In [27]: (arr + 128).dtype >>> > Out[27]: dtype('int16') >>> > >>> > There's discussion about the changes here: >>> > >>> > >>> > http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html >>> > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >>> > >>> > http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html >>> > >>> > It seems to me that this change is hard to explain, and does what you >>> > want only some of the time, making it a false friend. >>> >>> The old behaviour was that in these cases, the scalar was always cast >>> to the type of the array, right? So >>> np.array([1], dtype=np.int8) + 256 >>> returned 1? Is that the behaviour you prefer? >>> >>> I agree that the 1.6 behaviour is surprising and somewhat >>> inconsistent. There are many places where you can get an overflow in >>> numpy, and in all the other cases we just let the overflow happen. And >>> in fact you can still get an overflow with arr + scalar operations, so >>> this doesn't really fix anything. >>> >>> I find the specific handling of unsigned -> signed and float32 -> >>> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly >>> representable as a float32, but it doesn't *overflow*, it just gives >>> you 2.0**16... if I'm using float32 then I presumably don't care that >>> much about exact representability, so it's surprising that numpy is >>> working to enforce it, and definitely a separate decision from what to >>> do about overflow.) 
>>> >>> None of those threads seem to really get into the question of what the >>> best behaviour here *is*, though. >>> >>> Possibly the most defensible choice is to treat ufunc(arr, scalar) >>> operations as performing an implicit cast of the scalar to arr's >>> dtype, and using the standard implicit casting rules -- which I think >>> means, raising an error if !can_cast(scalar, arr.dtype, >>> casting="safe") >> >> >> I like this suggestion. It may break some existing code, but I think it'd >> be for the best. The current behavior can be very confusing. >> >> -=- Olivier > > > > "break some existing code" > > I really should set up an email filter for this phrase and have it send back > an email automatically: "Are you nuts?!" Well, hold on though, I was asking earlier in the thread what we thought the behavior should be in 2.0 or maybe better put, sometime in the future. If we know what we think the best answer is, and we think the best answer is worth shooting for, then we can try to think of sensible ways of getting there. I guess that's what Nathaniel and Olivier were thinking of but they can correct me if I'm wrong... Cheers, Matthew From ben.root at ou.edu Tue Nov 13 00:13:17 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 13 Nov 2012 00:13:17 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: On Monday, November 12, 2012, Matthew Brett wrote: > Hi, > > On Mon, Nov 12, 2012 at 8:15 PM, Benjamin Root wrote: > > > > > > On Monday, November 12, 2012, Olivier Delalleau wrote: > >> > >> 2012/11/12 Nathaniel Smith > >>> > >>> On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett < > matthew.brett at gmail.com> > >>> wrote: > >>> > Hi, > >>> > > >>> > I wanted to check that everyone knows about and is happy with the > >>> > scalar casting changes from 1.6.0. > >>> > > >>> > Specifically, the rules for (array, scalar) casting have changed such > >>> > that the resulting dtype depends on the _value_ of the scalar. > >>> > > >>> > Mark W has documented these changes here: > >>> > > >>> > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules > >>> > > >>> > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html > >>> > > >>> > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html > >>> > > >>> > Specifically, as of 1.6.0: > >>> > > >>> > In [19]: arr = np.array([1.], dtype=np.float32) > >>> > > >>> > In [20]: (arr + (2**16-1)).dtype > >>> > Out[20]: dtype('float32') > >>> > > >>> > In [21]: (arr + (2**16)).dtype > >>> > Out[21]: dtype('float64') > >>> > > >>> > In [25]: arr = np.array([1.], dtype=np.int8) > >>> > > >>> > In [26]: (arr + 127).dtype > >>> > Out[26]: dtype('int8') > >>> > > >>> > In [27]: (arr + 128).dtype > >>> > Out[27]: dtype('int16') > >>> > > >>> > There's discussion about the changes here: > >>> > > >>> > > >>> > > http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html > >>> > > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html > >>> > > >>> > > http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html > >>> > > >>> > It seems to me that this change is hard to explain, and does what you > >>> > want only some of the time, making it a false friend. > >>> > >>> The old behaviour was that in these cases, the scalar was always cast > >>> to the type of the array, right? So > >>> np.array([1], dtype=np.int8) + 256 > >>> returned 1? Is that the behaviour you prefer? 
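A minimal editorial sketch of the two promotion rules being compared in this thread. The dtypes in the comments are what the 1.6-era value-based rules described above produce; earlier NumPy always cast the scalar to the array's dtype, and later releases changed the rules again, so treat the commented results as illustrative rather than definitive:

import numpy as np

arr = np.array([1], dtype=np.int8)

# Value-based promotion (the 1.6 behaviour under discussion): the result
# dtype depends on the scalar's value.  Under those rules:
#   (arr + 127).dtype  ->  int8   (127 fits in int8)
#   (arr + 128).dtype  ->  int16  (128 does not fit, so the result is upcast)
print((arr + 127).dtype)
print((arr + 128).dtype)

# The proposal quoted above would instead ask whether the scalar can be cast
# "safely" to the array's dtype and raise if not.  At the dtype level the
# safe-cast check looks like this (a signed 128 needs at least int16, and
# int16 -> int8 is not a safe cast):
print(np.can_cast(np.int8, arr.dtype, casting="safe"))    # True
print(np.can_cast(np.int16, arr.dtype, casting="safe"))   # False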
> >>> > >>> I agree that the 1.6 behaviour is surprising and somewhat > >>> inconsistent. There are many places where you can get an overflow in > >>> numpy, and in all the other cases we just let the overflow happen. And > >>> in fact you can still get an overflow with arr + scalar operations, so > >>> this doesn't really fix anything. > >>> > >>> I find the specific handling of unsigned -> signed and float32 -> > >>> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly > >>> representable as a float32, but it doesn't *overflow*, it just gives > >>> you 2.0**16... if I'm using float32 then I presumably don't care that > >>> much about exact representability, so it's surprising that numpy is > >>> working to enforce it, and definitely a separate decision from what to > >>> do about overflow.) > >>> > >>> None of those threads seem to really get into the question of what the > >>> best behaviour here *is*, though. > >>> > >>> Possibly the moWell, hold on though, I was asking earlier in the > thread what we > thought the behavior should be in 2.0 or maybe better put, sometime in > the future. > > If we know what we think the best answer is, and we think the best > answer is worth shooting for, then we can try to think of sensible > ways of getting there. > > I guess that's what Nathaniel and Olivier were thinking of but they > can correct me if I'm wrong... > > Cheers, > > Matthew I am fine with migrating to better solutions (I have yet to decide on this current situation, though), but whatever change is adopted must go through a deprecation process, which was my point. Outright breaking of code as a first step is the wrong choice, and I was merely nipping it in the bud. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From austin.bingham at gmail.com Tue Nov 13 02:26:14 2012 From: austin.bingham at gmail.com (Austin Bingham) Date: Tue, 13 Nov 2012 08:26:14 +0100 Subject: [Numpy-discussion] Numpy's policy for releasing memory Message-ID: I'm trying to understand how numpy decides when to release memory and whether it's possible to exert any control over that. The situation is that I'm profiling memory usage on a system in which a great deal of the overall memory is tied up in ndarrays. Since numpy manages ndarray memory on its own (i.e. without the python gc, or so it seems), I'm finding that I can't do much to convince numpy to release memory when things get tight. For python object, for example, I can explicitly run gc.collect(). So, in an effort to at least understand the system better, can anyone tell me how/when numpy decides to release memory? And is there any way via either the Python or C-API to explicitly request release? Thanks. Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Nov 13 03:41:37 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 13 Nov 2012 09:41:37 +0100 Subject: [Numpy-discussion] Numpy's policy for releasing memory In-Reply-To: References: Message-ID: On Tue, Nov 13, 2012 at 8:26 AM, Austin Bingham wrote: > I'm trying to understand how numpy decides when to release memory and > whether it's possible to exert any control over that. The situation is that > I'm profiling memory usage on a system in which a great deal of the overall > memory is tied up in ndarrays. Since numpy manages ndarray memory on its own > (i.e. 
without the python gc, or so it seems), I'm finding that I can't do > much to convince numpy to release memory when things get tight. For python > object, for example, I can explicitly run gc.collect(). > > So, in an effort to at least understand the system better, can anyone tell > me how/when numpy decides to release memory? And is there any way via either > the Python or C-API to explicitly request release? Thanks. Numpy array memory is released when the corresponding Python objects are deleted, so it exactly follows Python's rules. You can't explicitly request release, because by definition, if memory is not released, then it means that it's still accessible somehow, so releasing it could create segfaults. Perhaps you have stray references sitting around that you have forgotten to clear -- that's a common cause of memory leaks in Python. gc.get_referrers() can be useful to debug such things. Some things to note: - Numpy uses malloc() instead of going through the Python low-level memory allocation layer (which itself is a wrapper around malloc with various optimizations for small objects). This is really only relevant because it might create some artifacts depending on how your memory profiler gathers data. - gc.collect() doesn't do that much in Python... it only matters if you have circular references. Mostly Python releases the memory associated with objects as soon as the object becomes unreferenced. You could try avoiding circular references, and then gc.collect() won't even do anything. - If you have multiple views of the same memory in numpy, then they share the same underlying memory, so that memory won't be released until all of the views objects are released. (The one thing to watch out for is you can do something like 'huge_array = np.zeros((2, 10000000)); tiny_array = a[:, 100]' and now since tiny_array is a view onto huge_array, so long as a reference to tiny_array exists the full big memory allocation will remain.) -n From njs at pobox.com Tue Nov 13 03:42:52 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 13 Nov 2012 09:42:52 +0100 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: On Mon, Nov 12, 2012 at 11:27 PM, Ond?ej ?ert?k wrote: > I successfully defended my Ph.D. thesis last Thursday, I just need to > do some changes to it and submit it and I am done. Congratulations! From austin.bingham at gmail.com Tue Nov 13 04:27:14 2012 From: austin.bingham at gmail.com (Austin Bingham) Date: Tue, 13 Nov 2012 10:27:14 +0100 Subject: [Numpy-discussion] Numpy's policy for releasing memory In-Reply-To: References: Message-ID: OK, if numpy is just subject to Python's behavior then what I'm seeing must be due to the vagaries of Python. I've noticed that things like removing a particular line of code or reordering seemingly unrelated calls (unrelated to the memory issue, that is) can affect when memory is reported as free. I'll just assume that everything is in order and carry on. Thanks! Austin On Tue, Nov 13, 2012 at 9:41 AM, Nathaniel Smith wrote: > On Tue, Nov 13, 2012 at 8:26 AM, Austin Bingham > wrote: > > I'm trying to understand how numpy decides when to release memory and > > whether it's possible to exert any control over that. The situation is > that > > I'm profiling memory usage on a system in which a great deal of the > overall > > memory is tied up in ndarrays. Since numpy manages ndarray memory on its > own > > (i.e. 
without the python gc, or so it seems), I'm finding that I can't do > > much to convince numpy to release memory when things get tight. For > python > > object, for example, I can explicitly run gc.collect(). > > > > So, in an effort to at least understand the system better, can anyone > tell > > me how/when numpy decides to release memory? And is there any way via > either > > the Python or C-API to explicitly request release? Thanks. > > Numpy array memory is released when the corresponding Python objects > are deleted, so it exactly follows Python's rules. You can't > explicitly request release, because by definition, if memory is not > released, then it means that it's still accessible somehow, so > releasing it could create segfaults. Perhaps you have stray references > sitting around that you have forgotten to clear -- that's a common > cause of memory leaks in Python. gc.get_referrers() can be useful to > debug such things. > > Some things to note: > - Numpy uses malloc() instead of going through the Python low-level > memory allocation layer (which itself is a wrapper around malloc with > various optimizations for small objects). This is really only relevant > because it might create some artifacts depending on how your memory > profiler gathers data. > - gc.collect() doesn't do that much in Python... it only matters if > you have circular references. Mostly Python releases the memory > associated with objects as soon as the object becomes unreferenced. > You could try avoiding circular references, and then gc.collect() > won't even do anything. > - If you have multiple views of the same memory in numpy, then they > share the same underlying memory, so that memory won't be released > until all of the views objects are released. (The one thing to watch > out for is you can do something like 'huge_array = np.zeros((2, > 10000000)); tiny_array = a[:, 100]' and now since tiny_array is a view > onto huge_array, so long as a reference to tiny_array exists the full > big memory allocation will remain.) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Nov 13 04:29:57 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 13 Nov 2012 10:29:57 +0100 Subject: [Numpy-discussion] fix random.choice for 1.7? In-Reply-To: <50A1C23A.5010506@gmail.com> References: <509D108A.1060405@gmail.com> <50A0FE14.6050605@gmail.com> <1352728782.3411.2.camel@sebastian-laptop> <50A12447.6050604@gmail.com> <1352740571.3411.19.camel@sebastian-laptop> <50A1796F.4010006@gmail.com> <50A18810.6090200@gmail.com> <1352769521.19117.16.camel@sebastian-laptop> <50A1C23A.5010506@gmail.com> Message-ID: <1352798997.19117.23.camel@sebastian-laptop> On Mon, 2012-11-12 at 22:44 -0500, Alan G Isaac wrote: > On 11/12/2012 8:18 PM, Sebastian Berg wrote: > > I have created a pull request > > > This is still a bit different than I thought you intended. > With `size=None` we don't get an element, > but rather a 0d array. > You are right, it should not be a 0d array. I overlooked that tuple() does not give the same as None at least least for the random functions. > I thought the idea was to return an element in this case? 
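For readers following the `size=None` question above, the difference between a 0d array and "an element" can be shown directly. This is only a sketch of that distinction; the eventual behaviour of np.random.choice was still being settled in the pull request under discussion, and the exact scalar type is platform dependent:

import numpy as np

zero_d = np.array(3)      # a 0d ndarray: shape (), ndim 0
element = zero_d[()]      # indexing with an empty tuple extracts the element

print(zero_d.shape, zero_d.ndim)         # () 0
print(isinstance(zero_d, np.ndarray))    # True
print(isinstance(element, np.ndarray))   # False -- it is a NumPy scalar, e.g. numpy.int64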
> > Alan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From francesc at continuum.io Tue Nov 13 04:33:52 2012 From: francesc at continuum.io (Francesc Alted) Date: Tue, 13 Nov 2012 10:33:52 +0100 Subject: [Numpy-discussion] Numpy's policy for releasing memory In-Reply-To: References: Message-ID: <50A21400.6010105@continuum.io> On 11/13/12 10:27 AM, Austin Bingham wrote: > OK, if numpy is just subject to Python's behavior then what I'm seeing > must be due to the vagaries of Python. I've noticed that things like > removing a particular line of code or reordering seemingly unrelated > calls (unrelated to the memory issue, that is) can affect when memory > is reported as free. I'll just assume that everything is in order and > carry on. Thanks! Profiling memory can be tricky because the operating system may not return memory *immediately* as requested, and it might mislead you in some situations. So do not trust memory profilers to be too exact and rather focus on the big picture (i.e. is my app reclaiming a lot of memory for a large amount of time? if yes, then start worrying, but not before). -- Francesc Alted From njs at pobox.com Tue Nov 13 05:12:40 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 13 Nov 2012 11:12:40 +0100 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: On Tue, Nov 13, 2012 at 6:13 AM, Benjamin Root wrote: > > On Monday, November 12, 2012, Matthew Brett wrote: >> >> Hi, >> >> On Mon, Nov 12, 2012 at 8:15 PM, Benjamin Root wrote: >> > >> > >> > On Monday, November 12, 2012, Olivier Delalleau wrote: >> >> >> >> 2012/11/12 Nathaniel Smith >> >>> >> >>> On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett >> >>> >> >>> wrote: >> >>> > Hi, >> >>> > >> >>> > I wanted to check that everyone knows about and is happy with the >> >>> > scalar casting changes from 1.6.0. >> >>> > >> >>> > Specifically, the rules for (array, scalar) casting have changed >> >>> > such >> >>> > that the resulting dtype depends on the _value_ of the scalar.
>> >>> > >> >>> > Mark W has documented these changes here: >> >>> > >> >>> > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules >> >>> > >> >>> > >> >>> > http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >> >>> > >> >>> > >> >>> > http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html >> >>> > >> >>> > Specifically, as of 1.6.0: >> >>> > >> >>> > In [19]: arr = np.array([1.], dtype=np.float32) >> >>> > >> >>> > In [20]: (arr + (2**16-1)).dtype >> >>> > Out[20]: dtype('float32') >> >>> > >> >>> > In [21]: (arr + (2**16)).dtype >> >>> > Out[21]: dtype('float64') >> >>> > >> >>> > In [25]: arr = np.array([1.], dtype=np.int8) >> >>> > >> >>> > In [26]: (arr + 127).dtype >> >>> > Out[26]: dtype('int8') >> >>> > >> >>> > In [27]: (arr + 128).dtype >> >>> > Out[27]: dtype('int16') >> >>> > >> >>> > There's discussion about the changes here: >> >>> > >> >>> > >> >>> > >> >>> > http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html >> >>> > >> >>> > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >> >>> > >> >>> > >> >>> > http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html >> >>> > >> >>> > It seems to me that this change is hard to explain, and does what >> >>> > you >> >>> > want only some of the time, making it a false friend. >> >>> >> >>> The old behaviour was that in these cases, the scalar was always cast >> >>> to the type of the array, right? So >> >>> np.array([1], dtype=np.int8) + 256 >> >>> returned 1? Is that the behaviour you prefer? >> >>> >> >>> I agree that the 1.6 behaviour is surprising and somewhat >> >>> inconsistent. There are many places where you can get an overflow in >> >>> numpy, and in all the other cases we just let the overflow happen. And >> >>> in fact you can still get an overflow with arr + scalar operations, so >> >>> this doesn't really fix anything. >> >>> >> >>> I find the specific handling of unsigned -> signed and float32 -> >> >>> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly >> >>> representable as a float32, but it doesn't *overflow*, it just gives >> >>> you 2.0**16... if I'm using float32 then I presumably don't care that >> >>> much about exact representability, so it's surprising that numpy is >> >>> working to enforce it, and definitely a separate decision from what to >> >>> do about overflow.) >> >>> >> >>> None of those threads seem to really get into the question of what the >> >>> best behaviour here *is*, though. >> >>> >> >>> Possibly the moWell, hold on though, I was asking earlier in the >> >>> thread what we >> >> thought the behavior should be in 2.0 or maybe better put, sometime in >> the future. >> >> If we know what we think the best answer is, and we think the best >> answer is worth shooting for, then we can try to think of sensible >> ways of getting there. >> >> I guess that's what Nathaniel and Olivier were thinking of but they >> can correct me if I'm wrong... >> >> Cheers, >> >> Matthew > > > I am fine with migrating to better solutions (I have yet to decide on this > current situation, though), but whatever change is adopted must go through a > deprecation process, which was my point. Outright breaking of code as a > first step is the wrong choice, and I was merely nipping it in the bud. Thanks for your vigilance. 
Unfortunately in this case AFAICT 1.6 already silently broke people's code in this rather weird case, so that will presumably affect whatever migration strategy we go with, but yes, the goal is to end up in the right place and to get there in the right way... -n From shish at keba.be Tue Nov 13 06:55:52 2012 From: shish at keba.be (Olivier Delalleau) Date: Tue, 13 Nov 2012 06:55:52 -0500 Subject: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment? In-Reply-To: References: Message-ID: 2012/11/12 Matthew Brett > Hi, > > On Mon, Nov 12, 2012 at 8:15 PM, Benjamin Root wrote: > > > > > > On Monday, November 12, 2012, Olivier Delalleau wrote: > >> > >> 2012/11/12 Nathaniel Smith > >>> > >>> On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett < > matthew.brett at gmail.com> > >>> wrote: > >>> > Hi, > >>> > > >>> > I wanted to check that everyone knows about and is happy with the > >>> > scalar casting changes from 1.6.0. > >>> > > >>> > Specifically, the rules for (array, scalar) casting have changed such > >>> > that the resulting dtype depends on the _value_ of the scalar. > >>> > > >>> > Mark W has documented these changes here: > >>> > > >>> > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules > >>> > > >>> > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html > >>> > > >>> > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html > >>> > > >>> > Specifically, as of 1.6.0: > >>> > > >>> > In [19]: arr = np.array([1.], dtype=np.float32) > >>> > > >>> > In [20]: (arr + (2**16-1)).dtype > >>> > Out[20]: dtype('float32') > >>> > > >>> > In [21]: (arr + (2**16)).dtype > >>> > Out[21]: dtype('float64') > >>> > > >>> > In [25]: arr = np.array([1.], dtype=np.int8) > >>> > > >>> > In [26]: (arr + 127).dtype > >>> > Out[26]: dtype('int8') > >>> > > >>> > In [27]: (arr + 128).dtype > >>> > Out[27]: dtype('int16') > >>> > > >>> > There's discussion about the changes here: > >>> > > >>> > > >>> > > http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html > >>> > > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html > >>> > > >>> > > http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html > >>> > > >>> > It seems to me that this change is hard to explain, and does what you > >>> > want only some of the time, making it a false friend. > >>> > >>> The old behaviour was that in these cases, the scalar was always cast > >>> to the type of the array, right? So > >>> np.array([1], dtype=np.int8) + 256 > >>> returned 1? Is that the behaviour you prefer? > >>> > >>> I agree that the 1.6 behaviour is surprising and somewhat > >>> inconsistent. There are many places where you can get an overflow in > >>> numpy, and in all the other cases we just let the overflow happen. And > >>> in fact you can still get an overflow with arr + scalar operations, so > >>> this doesn't really fix anything. > >>> > >>> I find the specific handling of unsigned -> signed and float32 -> > >>> float64 upcasting confusing as well. (Sure, 2**16 isn't exactly > >>> representable as a float32, but it doesn't *overflow*, it just gives > >>> you 2.0**16... if I'm using float32 then I presumably don't care that > >>> much about exact representability, so it's surprising that numpy is > >>> working to enforce it, and definitely a separate decision from what to > >>> do about overflow.) 
> >>> > >>> None of those threads seem to really get into the question of what the > >>> best behaviour here *is*, though. > >>> > >>> Possibly the most defensible choice is to treat ufunc(arr, scalar) > >>> operations as performing an implicit cast of the scalar to arr's > >>> dtype, and using the standard implicit casting rules -- which I think > >>> means, raising an error if !can_cast(scalar, arr.dtype, > >>> casting="safe") > >> > >> > >> I like this suggestion. It may break some existing code, but I think > it'd > >> be for the best. The current behavior can be very confusing. > >> > >> -=- Olivier > > > > > > > > "break some existing code" > > > > I really should set up an email filter for this phrase and have it send > back > > an email automatically: "Are you nuts?!" > > Well, hold on though, I was asking earlier in the thread what we > thought the behavior should be in 2.0 or maybe better put, sometime in > the future. > > If we know what we think the best answer is, and we think the best > answer is worth shooting for, then we can try to think of sensible > ways of getting there. > > I guess that's what Nathaniel and Olivier were thinking of but they > can correct me if I'm wrong... > > Cheers, > > Matthew > This is indeed what I had in mind, thanks. I definitely agree a (long) period with a deprecation warning would be needed if this is changed. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Tue Nov 13 07:08:32 2012 From: shish at keba.be (Olivier Delalleau) Date: Tue, 13 Nov 2012 07:08:32 -0500 Subject: [Numpy-discussion] Numpy's policy for releasing memory In-Reply-To: References: Message-ID: How are you monitoring memory usage? Personally I've been using psutil and it seems to work well, although I've used it only on Windows and not in applications with large numpy arrays, so I can't tell whether it would work you. Also, keep in mind that: - The "auto-delete object when it goes out of scope" behavior is specific to the CPython implementation and not part of the Python standard, so if you're actually using a different implementation you may see a different behavior. - CPython deals with small objects in a special way, not actually releasing allocated memory. For more info: http://deeplearning.net/software/theano/tutorial/python-memory-management.html#internal-memory-management -=- Olivier 2012/11/13 Austin Bingham > OK, if numpy is just subject to Python's behavior then what I'm seeing > must be due to the vagaries of Python. I've noticed that things like > removing a particular line of code or reordering seemingly unrelated calls > (unrelated to the memory issue, that is) can affect when memory is reported > as free. I'll just assume that everything is in order and carry on. Thanks! > > Austin > > > On Tue, Nov 13, 2012 at 9:41 AM, Nathaniel Smith wrote: > >> On Tue, Nov 13, 2012 at 8:26 AM, Austin Bingham >> wrote: >> > I'm trying to understand how numpy decides when to release memory and >> > whether it's possible to exert any control over that. The situation is >> that >> > I'm profiling memory usage on a system in which a great deal of the >> overall >> > memory is tied up in ndarrays. Since numpy manages ndarray memory on >> its own >> > (i.e. without the python gc, or so it seems), I'm finding that I can't >> do >> > much to convince numpy to release memory when things get tight. For >> python >> > object, for example, I can explicitly run gc.collect(). 
>> > >> > So, in an effort to at least understand the system better, can anyone >> tell >> > me how/when numpy decides to release memory? And is there any way via >> either >> > the Python or C-API to explicitly request release? Thanks. >> >> Numpy array memory is released when the corresponding Python objects >> are deleted, so it exactly follows Python's rules. You can't >> explicitly request release, because by definition, if memory is not >> released, then it means that it's still accessible somehow, so >> releasing it could create segfaults. Perhaps you have stray references >> sitting around that you have forgotten to clear -- that's a common >> cause of memory leaks in Python. gc.get_referrers() can be useful to >> debug such things. >> >> Some things to note: >> - Numpy uses malloc() instead of going through the Python low-level >> memory allocation layer (which itself is a wrapper around malloc with >> various optimizations for small objects). This is really only relevant >> because it might create some artifacts depending on how your memory >> profiler gathers data. >> - gc.collect() doesn't do that much in Python... it only matters if >> you have circular references. Mostly Python releases the memory >> associated with objects as soon as the object becomes unreferenced. >> You could try avoiding circular references, and then gc.collect() >> won't even do anything. >> - If you have multiple views of the same memory in numpy, then they >> share the same underlying memory, so that memory won't be released >> until all of the views objects are released. (The one thing to watch >> out for is you can do something like 'huge_array = np.zeros((2, >> 10000000)); tiny_array = a[:, 100]' and now since tiny_array is a view >> onto huge_array, so long as a reference to tiny_array exists the full >> big memory allocation will remain.) >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Tue Nov 13 07:18:45 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 13 Nov 2012 13:18:45 +0100 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: <20121113121845.GB15981@phare.normalesup.org> On Mon, Nov 12, 2012 at 02:27:02PM -0800, Ond?ej ?ert?k wrote: > I successfully defended my Ph.D. thesis last Thursday, I just need to > do some changes to it and submit it and I am done. Yey! Tag and release (the thesis, I mean)! Congratulation. G PS: It seems to me that it was yesterday that you moved to the state. You didn't loose time with your PhD. Very impressive! From austin.bingham at gmail.com Tue Nov 13 07:31:34 2012 From: austin.bingham at gmail.com (Austin Bingham) Date: Tue, 13 Nov 2012 13:31:34 +0100 Subject: [Numpy-discussion] Numpy's policy for releasing memory In-Reply-To: References: Message-ID: I've been using psutil, pmap (linux command), and resource in various capacities, all on cpython. When I wasn't seeing memory freed when I expected, I got to wondering if maybe numpy was maintaining pools of buffers for reuse or something like that. It sounds like that's not the case, though, so I'm following up other possibilities. 
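One of the possibilities worth checking here is the view behaviour Nathaniel describes earlier in the thread (his quoted snippet writes `tiny_array = a[:, 100]`, where `huge_array[:, 100]` is presumably what was meant). A hedged sketch of that pattern, and of the usual way out, an explicit copy:

import numpy as np

huge_array = np.zeros((2, 10000000))     # roughly 160 MB in one allocation
tiny_view = huge_array[:, 100]           # a view: shares the big buffer
tiny_copy = huge_array[:, 100].copy()    # an independent, tiny array

print(tiny_view.base is huge_array)      # True: the view keeps huge_array alive
print(tiny_copy.base is None)            # True: the copy does not

# Deleting huge_array alone is not enough while tiny_view exists; the big
# buffer can only be freed once every view of it is gone.
del huge_array
del tiny_view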
Austin On Tue, Nov 13, 2012 at 1:08 PM, Olivier Delalleau wrote: > How are you monitoring memory usage? > Personally I've been using psutil and it seems to work well, although I've > used it only on Windows and not in applications with large numpy arrays, so > I can't tell whether it would work you. > > Also, keep in mind that: > - The "auto-delete object when it goes out of scope" behavior is specific > to the CPython implementation and not part of the Python standard, so if > you're actually using a different implementation you may see a different > behavior. > - CPython deals with small objects in a special way, not actually > releasing allocated memory. For more info: > http://deeplearning.net/software/theano/tutorial/python-memory-management.html#internal-memory-management > > -=- Olivier > > 2012/11/13 Austin Bingham > >> OK, if numpy is just subject to Python's behavior then what I'm seeing >> must be due to the vagaries of Python. I've noticed that things like >> removing a particular line of code or reordering seemingly unrelated calls >> (unrelated to the memory issue, that is) can affect when memory is reported >> as free. I'll just assume that everything is in order and carry on. Thanks! >> >> Austin >> >> >> On Tue, Nov 13, 2012 at 9:41 AM, Nathaniel Smith wrote: >> >>> On Tue, Nov 13, 2012 at 8:26 AM, Austin Bingham >>> wrote: >>> > I'm trying to understand how numpy decides when to release memory and >>> > whether it's possible to exert any control over that. The situation is >>> that >>> > I'm profiling memory usage on a system in which a great deal of the >>> overall >>> > memory is tied up in ndarrays. Since numpy manages ndarray memory on >>> its own >>> > (i.e. without the python gc, or so it seems), I'm finding that I can't >>> do >>> > much to convince numpy to release memory when things get tight. For >>> python >>> > object, for example, I can explicitly run gc.collect(). >>> > >>> > So, in an effort to at least understand the system better, can anyone >>> tell >>> > me how/when numpy decides to release memory? And is there any way via >>> either >>> > the Python or C-API to explicitly request release? Thanks. >>> >>> Numpy array memory is released when the corresponding Python objects >>> are deleted, so it exactly follows Python's rules. You can't >>> explicitly request release, because by definition, if memory is not >>> released, then it means that it's still accessible somehow, so >>> releasing it could create segfaults. Perhaps you have stray references >>> sitting around that you have forgotten to clear -- that's a common >>> cause of memory leaks in Python. gc.get_referrers() can be useful to >>> debug such things. >>> >>> Some things to note: >>> - Numpy uses malloc() instead of going through the Python low-level >>> memory allocation layer (which itself is a wrapper around malloc with >>> various optimizations for small objects). This is really only relevant >>> because it might create some artifacts depending on how your memory >>> profiler gathers data. >>> - gc.collect() doesn't do that much in Python... it only matters if >>> you have circular references. Mostly Python releases the memory >>> associated with objects as soon as the object becomes unreferenced. >>> You could try avoiding circular references, and then gc.collect() >>> won't even do anything. >>> - If you have multiple views of the same memory in numpy, then they >>> share the same underlying memory, so that memory won't be released >>> until all of the views objects are released. 
(The one thing to watch >>> out for is you can do something like 'huge_array = np.zeros((2, >>> 10000000)); tiny_array = a[:, 100]' and now since tiny_array is a view >>> onto huge_array, so long as a reference to tiny_array exists the full >>> big memory allocation will remain.) >>> >>> -n >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Nov 13 07:46:53 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 13 Nov 2012 13:46:53 +0100 Subject: [Numpy-discussion] Numpy's policy for releasing memory In-Reply-To: References: Message-ID: On Tue, Nov 13, 2012 at 1:31 PM, Austin Bingham wrote: > I've been using psutil, pmap (linux command), and resource in various > capacities, all on cpython. When I wasn't seeing memory freed when I > expected, I got to wondering if maybe numpy was maintaining pools of buffers > for reuse or something like that. It sounds like that's not the case, > though, so I'm following up other possibilities. Those tools show how much memory the OS has allocated to the process. In general, processes can request memory from the OS, but *they cannot give it back*. At the C level, if you call free(), then what actually happens is that the memory management library in your process makes a note for itself that that memory is not used, and may return it from a future malloc(), but from the OS's point of view it is still "allocated". (And python uses another similar system on top for malloc()/free(), but this doesn't really change anything.) So the OS memory usage you see is generally a "high water mark", the maximum amount of memory that your process ever needed. The exception is that for large single allocations (e.g. if you create a multi-megabyte array), a different mechanism is used. Such large memory allocations *can* be released back to the OS. So it might specifically be the non-numpy parts of your program that are producing the issues you see. -n From charlesr.harris at gmail.com Tue Nov 13 09:52:54 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 13 Nov 2012 07:52:54 -0700 Subject: [Numpy-discussion] Numpy's policy for releasing memory In-Reply-To: References: Message-ID: On Tue, Nov 13, 2012 at 2:27 AM, Austin Bingham wrote: > OK, if numpy is just subject to Python's behavior then what I'm seeing > must be due to the vagaries of Python. I've noticed that things like > removing a particular line of code or reordering seemingly unrelated calls > (unrelated to the memory issue, that is) can affect when memory is reported > as free. I'll just assume that everything is in order and carry on. Thanks! > > If you are running interactively in IPython, references will be kept to return values. That can eventually eat up memory if you are working with a lot of big arrays. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
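A way to watch this from the outside, using psutil as already mentioned in this thread. This is a sketch only: it assumes a psutil version providing Process.memory_info(), and the exact numbers depend on the platform's allocator, which is precisely Nathaniel's point about the high-water mark and large allocations:

import os
import numpy as np
import psutil   # third-party package, mentioned earlier in the thread

proc = psutil.Process(os.getpid())

def rss_mb():
    # Resident set size of this process, in megabytes.
    return proc.memory_info().rss / 1e6

print("at start:    ", rss_mb())
big = np.zeros((2000, 10000))    # ~160 MB as a single large allocation
print("after alloc: ", rss_mb())
del big                          # large blocks are typically returned to the OS
print("after del:   ", rss_mb())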
URL: From ondrej.certik at gmail.com Tue Nov 13 11:01:20 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 13 Nov 2012 08:01:20 -0800 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: <20121113121845.GB15981@phare.normalesup.org> References: <20121113121845.GB15981@phare.normalesup.org> Message-ID: On Tue, Nov 13, 2012 at 4:18 AM, Gael Varoquaux wrote: > On Mon, Nov 12, 2012 at 02:27:02PM -0800, Ond?ej ?ert?k wrote: >> I successfully defended my Ph.D. thesis last Thursday, I just need to >> do some changes to it and submit it and I am done. > > Yey! Tag and release (the thesis, I mean)! > > Congratulation. > > G > > PS: It seems to me that it was yesterday that you moved to the state. You > didn't loose time with your PhD. Very impressive! Thanks everybody. Gael, it feels like yesterday, but it's been 4 years ago. And I had an M.S. degree already from Prague which took 2 years, so the total time is 6, and I had to wait 0.5 years after finishing my M.S. and before coming to the US (paperwork), so this gives 6.5 years since finishing my B.C. degree. The average time for physics Ph.D. seems to be 5.5 years, at least according to [1], so I am 1 year late, but I had to overcome a few bumps along the way --- maybe I'll write a blog post later. Ondrej [1] http://physics.uchicago.edu/prospective/graduate/index.html From charlesr.harris at gmail.com Tue Nov 13 11:43:10 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 13 Nov 2012 09:43:10 -0700 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: <20121113121845.GB15981@phare.normalesup.org> Message-ID: On Tue, Nov 13, 2012 at 9:01 AM, Ond?ej ?ert?k wrote: > On Tue, Nov 13, 2012 at 4:18 AM, Gael Varoquaux > wrote: > > On Mon, Nov 12, 2012 at 02:27:02PM -0800, Ond?ej ?ert?k wrote: > >> I successfully defended my Ph.D. thesis last Thursday, I just need to > >> do some changes to it and submit it and I am done. > > > > Yey! Tag and release (the thesis, I mean)! > > > > Congratulation. > > > > G > > > > PS: It seems to me that it was yesterday that you moved to the state. You > > didn't loose time with your PhD. Very impressive! > > Thanks everybody. Gael, it feels like yesterday, but it's been 4 years ago. > And I had an M.S. degree already from Prague which took 2 years, so the > total > time is 6, and I had to wait 0.5 years after finishing my M.S. and before > coming > to the US (paperwork), so this gives 6.5 years since finishing my B.C. > degree. The average > time for physics Ph.D. seems to be 5.5 years, at least according to [1], so > I am 1 year late, but I had to overcome a few bumps along the way --- > maybe I'll write > a blog post later. > When I was at Columbia I think it was more like 7 for experimental physics. But then there were the theoreticians who got by in a year or two with short dissertations, and in later years picked up their Nobel prize... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Tue Nov 13 23:24:36 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 13 Nov 2012 20:24:36 -0800 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: On Mon, Nov 12, 2012 at 2:27 PM, Ond?ej ?ert?k wrote: [...] 
> Here is a list of issues that need to be fixed before the release: > > https://github.com/numpy/numpy/issues?milestone=3&state=open > > If anyone wants to help, we just need to get through them and submit a > PR for each, or close it if it doesn't apply anymore. > This is what I am doing now. Ok, I went over all the issues, closed fixed issues and sent PRs for about 6 issues. Just go to the link above to see them all (or go to issues and click on "NumPy 1.7" milestone) and left comments on most issues. Most of the minor problems are fixed. There are only 3 big issues that need to be fixed so I set them "priority high": https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&page=1&state=open in particular: https://github.com/numpy/numpy/issues/568 https://github.com/numpy/numpy/issues/2668 https://github.com/numpy/numpy/issues/606 after that I think we should be good to go. If you have some spare cycles, just concentrate on these 3. Ondrej From p.j.a.cock at googlemail.com Wed Nov 14 08:34:36 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 14 Nov 2012 13:34:36 +0000 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: On Wed, Nov 14, 2012 at 4:24 AM, Ond?ej ?ert?k wrote: > On Mon, Nov 12, 2012 at 2:27 PM, Ond?ej ?ert?k > wrote: > [...] > > Here is a list of issues that need to be fixed before the release: > > > > https://github.com/numpy/numpy/issues?milestone=3&state=open > > > > If anyone wants to help, we just need to get through them and submit a > > PR for each, or close it if it doesn't apply anymore. > > This is what I am doing now. > > Ok, I went over all the issues, closed fixed issues and sent PRs for about > 6 issues. Just go to the link above to see them all (or go to issues and > click on "NumPy 1.7" milestone) and left comments on most issues. > > Most of the minor problems are fixed. There are only 3 big issues that > need to be fixed so I set them "priority high": > > > https://github.com/numpy/numpy/issues?labels=priority%3A+high&milestone=3&page=1&state=open > > in particular: > > https://github.com/numpy/numpy/issues/568 > https://github.com/numpy/numpy/issues/2668 > https://github.com/numpy/numpy/issues/606 > > after that I think we should be good to go. If you have some spare > cycles, just concentrate on these 3. > > Ondrej > Hi all, Having looked at the README.txt and INSTALL.txt files on the branch, I see no mention of which Python 3.x versions are supported: https://github.com/numpy/numpy/blob/maintenance/1.7.x/README.txt https://github.com/numpy/numpy/blob/maintenance/1.7.x/INSTALL.txt Is NumPy 1.7 intended to support Python 3.3? My impression from this thread is probably yes: http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063483.html ... http://mail.scipy.org/pipermail/numpy-discussion/2012-August/063597.html If so, then under Windows (32 bit at least) where the Python.org provided Python 3.3 is compiled with MSCV 2010 there are some problems - see: https://github.com/numpy/numpy/pull/2726 Should an issue be filed for this? Thanks, Peter -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ondrej.certik at gmail.com Wed Nov 14 13:02:50 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Wed, 14 Nov 2012 10:02:50 -0800 Subject: [Numpy-discussion] 1.7.0 release In-Reply-To: References: Message-ID: > Hi all, > > Having looked at the README.txt and INSTALL.txt files on the > branch, I see no mention of which Python 3.x versions are supported: > > https://github.com/numpy/numpy/blob/maintenance/1.7.x/README.txt > https://github.com/numpy/numpy/blob/maintenance/1.7.x/INSTALL.txt > > Is NumPy 1.7 intended to support Python 3.3? > > My impression from this thread is probably yes: > http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063483.html > ... > http://mail.scipy.org/pipermail/numpy-discussion/2012-August/063597.html > > If so, then under Windows (32 bit at least) where the Python.org > provided Python 3.3 is compiled with MSCV 2010 there are some > problems - see: https://github.com/numpy/numpy/pull/2726 Thanks Peter for the email, yes I think we should support Python 3.3. > Should an issue be filed for this? I just did: https://github.com/numpy/numpy/issues/2743 Btw, I also bumbed this issue as priority high: https://github.com/numpy/numpy/issues/2738 Because it segfaults numpy. Ondrej From p.j.a.cock at googlemail.com Wed Nov 14 13:10:24 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 14 Nov 2012 18:10:24 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 with MSVC 2010 Message-ID: Changing title to reflect the fact this thread is now about using the Microsoft compiler rather than mingw32 as in the old thread. On Sat, Nov 10, 2012 at 11:04 PM, Peter Cock wrote: > On Sat, Nov 10, 2012 at 5:47 PM, Ralf Gommers wrote: >> >> >> >> On Tue, Nov 6, 2012 at 6:49 PM, Peter Cock >> wrote: >>> >>> Dear all, >>> >>> Since the NumPy 1.7.0b2 release didn't include a Windows >>> (32 bit) installer for Python 3.3, I am considering compiling it >>> myself for local testing. What compiler is recommended? >> >> >> Either MSVC or MinGW 3.4.5. For the latter see >> https://github.com/certik/numpy-vendor > > Thanks Ralf, > > I was trying with MSVC 9.0 installed, but got this cryptic error: > > C:\Downloads\numpy-1.7.0b2 > C:\python33\python setup.py build > ... > error: Unable to find vcvarsall.bat > > After sprinkling distutils with debug statements, I found it was > looking for MSVC v10 (not numpy's fault but the error is most > unhelpful). > > Presumably "Microsoft Visual C++ 2010 Express Edition" is > the appropriate thing to download? > http://www.microsoft.com/visualstudio/eng/downloads#d-2010-express > I would have tried this earlier, but it required Windows XP SP3 and there was something amiss with the permissions on my work Windows machine preventing that update. Solved now, and this file now exists: C:\Program Files\Microsoft Visual Studio 10.0\VC\vcvarsall.bat I've tried my branch (focussed on mingw32 fixes), and numpy-1.7.0b2 but both fail at the same point. I did remove the old build directory first. C:\Downloads\numpy-1.7.0b2>c:\python33\python setup.py build Converting to Python3 via 2to3... 
Could not locate executable efc don't know how to compile Fortran code on platform 'nt' C:\Program Files\Microsoft Visual Studio 10.0\VC\BIN\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Inumpy\core\src\private -Inumpy\core\src -Inumpy\core -Inumpy\ core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\src\npysort -Inumpy\core\include -Ic:\python33\include -Ic:\python33\include /Tc_configtest.c /Fo_configtest.obj Found executable C:\Program Files\Microsoft Visual Studio 10.0\VC\BIN\cl.exe C:\Program Files\Microsoft Visual Studio 10.0\VC\BIN\link.exe /nologo /INCREMENTAL:NO _configtest.obj /OUT:_configtest.exe /MANIFESTFILE:_configtest.exe.manifest Found executable C:\Program Files\Microsoft Visual Studio 10.0\VC\BIN\link.exe mt.exe -nologo -manifest _configtest.exe.manifest -outputresource:_configtest.exe;1 Found executable C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin\mt.exe _configtest.exe.manifest : general error c1010070: Failed to load and parse the manifest. The system cannot find the file specified. failure. removing: _configtest.c _configtest.obj Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\core.py", line 186 , in setup return old_setup(**new_attr) File "c:\python33\lib\distutils\core.py", line 148, in setup dist.run_commands() File "c:\python33\lib\distutils\dist.py", line 917, in run_commands self.run_command(cmd) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "c:\python33\lib\distutils\command\build.py", line 126, in run self.run_command(cmd_name) File "c:\python33\lib\distutils\cmd.py", line 313, in run_command self.distribution.run_command(command) File "c:\python33\lib\distutils\dist.py", line 936, in run_command cmd_obj.run() File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. py", line 152, in run self.build_sources() File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 648, in get_mathlib_info raise RuntimeError("Broken toolchain: cannot link a simple C program") RuntimeError: Broken toolchain: cannot link a simple C program It appears a similar issue was raised before: http://mail.scipy.org/pipermail/numpy-discussion/2012-June/062866.html Any tips? Peter From cgohlke at uci.edu Wed Nov 14 14:35:27 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Wed, 14 Nov 2012 11:35:27 -0800 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 with MSVC 2010 In-Reply-To: References: Message-ID: <50A3F27F.70605@uci.edu> On 11/14/2012 10:10 AM, Peter Cock wrote: > Changing title to reflect the fact this thread is now about using > the Microsoft compiler rather than mingw32 as in the old thread. 
> > On Sat, Nov 10, 2012 at 11:04 PM, Peter Cock wrote: >> On Sat, Nov 10, 2012 at 5:47 PM, Ralf Gommers wrote: >>> >>> >>> >>> On Tue, Nov 6, 2012 at 6:49 PM, Peter Cock >>> wrote: >>>> >>>> Dear all, >>>> >>>> Since the NumPy 1.7.0b2 release didn't include a Windows >>>> (32 bit) installer for Python 3.3, I am considering compiling it >>>> myself for local testing. What compiler is recommended? >>> >>> >>> Either MSVC or MinGW 3.4.5. For the latter see >>> https://github.com/certik/numpy-vendor >> >> Thanks Ralf, >> >> I was trying with MSVC 9.0 installed, but got this cryptic error: >> >> C:\Downloads\numpy-1.7.0b2 > C:\python33\python setup.py build >> ... >> error: Unable to find vcvarsall.bat >> >> After sprinkling distutils with debug statements, I found it was >> looking for MSVC v10 (not numpy's fault but the error is most >> unhelpful). >> >> Presumably "Microsoft Visual C++ 2010 Express Edition" is >> the appropriate thing to download? >> http://www.microsoft.com/visualstudio/eng/downloads#d-2010-express >> > > I would have tried this earlier, but it required Windows XP SP3 > and there was something amiss with the permissions on my > work Windows machine preventing that update. Solved now, > and this file now exists: > > C:\Program Files\Microsoft Visual Studio 10.0\VC\vcvarsall.bat > > I've tried my branch (focussed on mingw32 fixes), and numpy-1.7.0b2 > but both fail at the same point. I did remove the old build directory first. > > C:\Downloads\numpy-1.7.0b2>c:\python33\python setup.py build > Converting to Python3 via 2to3... > > Could not locate executable efc > don't know how to compile Fortran code on platform 'nt' > C:\Program Files\Microsoft Visual Studio 10.0\VC\BIN\cl.exe /c /nologo > /Ox /MD /W3 /GS- /DNDEBUG -Inumpy\core\src\private -Inumpy\core\src > -Inumpy\core -Inumpy\ > core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath > -Inumpy\core\src\npysort -Inumpy\core\include -Ic:\python33\include > -Ic:\python33\include /Tc_configtest.c /Fo_configtest.obj > Found executable C:\Program Files\Microsoft Visual Studio 10.0\VC\BIN\cl.exe > C:\Program Files\Microsoft Visual Studio 10.0\VC\BIN\link.exe /nologo > /INCREMENTAL:NO _configtest.obj /OUT:_configtest.exe > /MANIFESTFILE:_configtest.exe.manifest > Found executable C:\Program Files\Microsoft Visual Studio 10.0\VC\BIN\link.exe > mt.exe -nologo -manifest _configtest.exe.manifest > -outputresource:_configtest.exe;1 > Found executable C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin\mt.exe > > _configtest.exe.manifest : general error c1010070: Failed to load and > parse the manifest. The system cannot find the file specified. > > failure. 
> removing: _configtest.c _configtest.obj > Traceback (most recent call last): > File "setup.py", line 214, in > setup_package() > File "setup.py", line 207, in setup_package > configuration=configuration ) > File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\core.py", line 186 > , in setup > return old_setup(**new_attr) > File "c:\python33\lib\distutils\core.py", line 148, in setup > dist.run_commands() > File "c:\python33\lib\distutils\dist.py", line 917, in run_commands > self.run_command(cmd) > File "c:\python33\lib\distutils\dist.py", line 936, in run_command > cmd_obj.run() > File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build.py", > line 37, in run > old_build.run(self) > File "c:\python33\lib\distutils\command\build.py", line 126, in run > self.run_command(cmd_name) > File "c:\python33\lib\distutils\cmd.py", line 313, in run_command > self.distribution.run_command(command) > File "c:\python33\lib\distutils\dist.py", line 936, in run_command > cmd_obj.run() > File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. > py", line 152, in run > self.build_sources() > File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. > py", line 163, in build_sources > self.build_library_sources(*libname_info) > File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. > py", line 298, in build_library_sources > sources = self.generate_sources(sources, (lib_name, build_info)) > File "C:\Downloads\numpy-1.7.0b2\build\py3k\numpy\distutils\command\build_src. > py", line 385, in generate_sources > source = func(extension, build_dir) > File "numpy\core\setup.py", line 648, in get_mathlib_info > raise RuntimeError("Broken toolchain: cannot link a simple C program") > RuntimeError: Broken toolchain: cannot link a simple C program > > It appears a similar issue was raised before: > http://mail.scipy.org/pipermail/numpy-discussion/2012-June/062866.html > > Any tips? > > Peter Try changing line 648 in Python33\Lib\distutils\msvc9compiler.py to `mfinfo = None`. http://hg.python.org/cpython/file/tip/Lib/distutils/msvc9compiler.py#l648 Christoph From thomas.robitaille at gmail.com Wed Nov 14 14:46:53 2012 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Wed, 14 Nov 2012 20:46:53 +0100 Subject: [Numpy-discussion] PRs for MaskedArray bugs Message-ID: I've recently opened a couple of pull requests that fix bugs with MaskedArray - these are pretty straightforward, so would it be possible to consider them in time for 1.7? https://github.com/numpy/numpy/pull/2703 https://github.com/numpy/numpy/pull/2733 Thanks! Tom From ondrej.certik at gmail.com Wed Nov 14 18:13:48 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Wed, 14 Nov 2012 15:13:48 -0800 Subject: [Numpy-discussion] Should abs([nan]) be supported? In-Reply-To: References: <2F3BC0D2-E111-478C-B733-F414BFFAA76A@continuum.io> Message-ID: On Fri, Sep 7, 2012 at 7:54 AM, Ralf Gommers wrote: > > > On Wed, Sep 5, 2012 at 7:06 AM, Travis Oliphant wrote: >> >> The framework for catching errors relies on hardware flags getting set and >> our C code making the right calls to detect those flags. >> >> This has usually worked correctly in the past --- but it is an area where >> changes in compilers or platforms could create problems. > > > I don't think it ever did, for less common platforms at least. See all the > Debian test issues that were filed by Sandro this week. 
And even between > Windows and Linux, there are some inconsistencies. > >> >> >> We should test to be sure that the correct warnings are issued, I would >> think. Perhaps using a catch_warnings context would be helpful (from >> http://docs.python.org/library/warnings.html) > > > There are some tests for that already, in core/test_numeric.py. For example: > > ====================================================================== > FAIL: test_default (test_numeric.TestSeterr) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/Users/rgommers/Code/numpy/numpy/core/tests/test_numeric.py", line > 231, in test_default > under='ignore', > AssertionError: {'over': 'ignore', 'divide': 'ignore', 'invalid': 'ignore', > 'under': 'ignore'} != {'over': 'warn', 'divide': 'warn', 'invalid': 'warn', > 'under': 'ignore'} > > ---------------------------------------------------------------------- > > They're not exhaustive though. > >> >> >> import warnings >> >> def fxn(): >> warnings.warn("deprecated", DeprecationWarning) >> >> with warnings.catch_warnings(record=True) as w: >> # Cause all warnings to always be triggered. >> warnings.simplefilter("always") >> # Trigger a warning. >> fxn() >> # Verify some things >> assert len(w) == 1 >> assert issubclass(w[-1].category, DeprecationWarning) >> assert "deprecated" in str(w[-1].message) >> >> >> > > Use ``from numpy.testing import WarningManager`` for a 2.4-compatible > version of catch_warnings (with explicitly calling its __enter__ and > __exit__ methods). In the release 1.7.x branch these warnings are suppressed, but in master they frequently make the Travis tests fail. That's extremely annoying to me, so I started working on a fix here: https://github.com/numpy/numpy/pull/2745 using the recommendation from Travis above. Ondrej From p.j.a.cock at googlemail.com Thu Nov 15 09:24:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 15 Nov 2012 14:24:28 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 with MSVC 2010 In-Reply-To: <50A3F27F.70605@uci.edu> References: <50A3F27F.70605@uci.edu> Message-ID: On Wed, Nov 14, 2012 at 7:35 PM, Christoph Gohlke wrote: >> ... >> RuntimeError: Broken toolchain: cannot link a simple C program >> >> It appears a similar issue was raised before: >> http://mail.scipy.org/pipermail/numpy-discussion/2012-June/062866.html >> >> Any tips? >> >> Peter > > Try changing line 648 in Python33\Lib\distutils\msvc9compiler.py to > `mfinfo = None`. > > http://hg.python.org/cpython/file/tip/Lib/distutils/msvc9compiler.py#l648 > > Christoph Hi Christoph, That was very precise advice and seems to solve this. Presumably you've faced something like this before. Is there an open issue for this in Python itself? Line 648 didn't seem sensible (and I guess the tip has changed), but I tried replacing this bit of Python33\Lib\distutils\msvc9compiler.py which was near by: # embed the manifest # XXX - this is somewhat fragile - if mt.exe fails, distutils # will still consider the DLL up-to-date, but it will not have a # manifest. Maybe we should link to a temp file? OTOH, that # implies a build environment error that shouldn't go undetected. mfinfo = self.manifest_get_embed_info(target_desc, ld_args) with your suggestion of 'mfinfo = None', and this did seem enough to get NumPy to compile with Python 3.3 and MSCV v10. I could then build and test a library using NumPy (C and Python APIs), but I've not yet installed nose to run NumPy's own tests. 
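For anyone who would rather not edit the stdlib file in place, the same
effect can probably be had by monkey-patching the method from setup.py
(or a small wrapper script) before the build runs -- an untested sketch,
offered with the same caveats as the in-place edit above:

import distutils.msvc9compiler as msvc9compiler

# Returning None makes MSVCCompiler.link() skip the manifest-embedding
# step entirely, which is exactly what replacing the call with
# "mfinfo = None" does.
msvc9compiler.MSVCCompiler.manifest_get_embed_info = (
    lambda self, target_desc, ld_args: None)
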
Looking at the code for the manifest_get_embed_info method, I don't see any obvious 9 vs 10 issues like the problems I hit before. However there are some regular expressions in the method _remove_visual_c_ref which it calls which look more likely - looking for two digits when perhaps it needs to be three under MSVC 10... As a general point, if MSVC 10 is sufficiently different from 9, does it make sense to introduce distutils/msvc10compiler.py (subclassing/reusing most of distutils/msvc9compiler.py) in Python itself? Or in NumPy's distutils? Thanks, Peter From will at thearete.co.uk Thu Nov 15 11:02:17 2012 From: will at thearete.co.uk (Will Furnass) Date: Thu, 15 Nov 2012 16:02:17 +0000 (UTC) Subject: [Numpy-discussion] numpy.power vs pylab.power Message-ID: On my machine these are rather confusingly different functions, with the latter corresponding to numpy.random.power. I appreciate that pylab imports everything from both the numpy and numpy.random modules but wouldn't it make sense if pylab.power were the frequently used power function rather than a means for sampling from the power distribution? Regards, Will Furnass From robert.kern at gmail.com Thu Nov 15 11:05:48 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 15 Nov 2012 16:05:48 +0000 Subject: [Numpy-discussion] numpy.power vs pylab.power In-Reply-To: References: Message-ID: On Thu, Nov 15, 2012 at 4:02 PM, Will Furnass wrote: > On my machine these are rather confusingly different functions, with the > latter corresponding to numpy.random.power. I appreciate that pylab > imports everything from both the numpy and numpy.random modules but > wouldn't it make sense if pylab.power were the frequently used power > function rather than a means for sampling from the power distribution? Matplotlib may be discussed over here: https://lists.sourceforge.net/lists/listinfo/matplotlib-users -- Robert Kern From ppmime at gmail.com Thu Nov 15 12:37:08 2012 From: ppmime at gmail.com (=?ISO-8859-1?Q?Jose_Miguel_Ib=E1=F1ez?=) Date: Thu, 15 Nov 2012 18:37:08 +0100 Subject: [Numpy-discussion] Numpydoc warnings for methods Message-ID: Hello, Now, I am having the same problem, and although I have tried the Pauili fix (see below) I still have the same problem when using numpydoc extension. Does anyone have more information or suggestions about it ? Thanks, Jose On Sun, Jul 17, 2011 at 7:15 PM, Tony Yu > wrote: >* I'm building documentation using Sphinx, and it seems that numpydoc is*>* raising*>* a lot of warnings. Specifically, the warnings look like "failed to import*>* ", "toctree*>* references unknown document u''", "toctree contains reference*>* to nonexisting document ''---for each method defined. The*>* example below reproduces the issue on my system (Sphinx 1.0.7, numpy HEAD).*>* These warnings appear in my build of the numpy docs, as well.*>**>* Removing numpydoc from the list of Sphinx extensions gets rid of these*>* warnings*>* (but, of course, adds new warnings if headings for 'Parameters', 'Returns',*>* etc. are present).*>**>* Am I doing something wrong here?*>**>* You're not, it's a Sphinx bug that Pauli already has a fix for. See*http://projects.scipy.org/numpy/ticket/1772 Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cgohlke at uci.edu Thu Nov 15 13:15:35 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 15 Nov 2012 10:15:35 -0800 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 with MSVC 2010 In-Reply-To: References: <50A3F27F.70605@uci.edu> Message-ID: <50A53147.6060608@uci.edu> On 11/15/2012 6:24 AM, Peter Cock wrote: > On Wed, Nov 14, 2012 at 7:35 PM, Christoph Gohlke wrote: >>> ... >>> RuntimeError: Broken toolchain: cannot link a simple C program >>> >>> It appears a similar issue was raised before: >>> http://mail.scipy.org/pipermail/numpy-discussion/2012-June/062866.html >>> >>> Any tips? >>> >>> Peter >> >> Try changing line 648 in Python33\Lib\distutils\msvc9compiler.py to >> `mfinfo = None`. >> >> http://hg.python.org/cpython/file/tip/Lib/distutils/msvc9compiler.py#l648 >> >> Christoph > > Hi Christoph, > > That was very precise advice and seems to solve this. Presumably > you've faced something like this before. Is there an open issue for > this in Python itself? > > Line 648 didn't seem sensible (and I guess the tip has changed), but > I tried replacing this bit of Python33\Lib\distutils\msvc9compiler.py > which was near by: > > # embed the manifest > # XXX - this is somewhat fragile - if mt.exe fails, distutils > # will still consider the DLL up-to-date, but it will not have a > # manifest. Maybe we should link to a temp file? OTOH, that > # implies a build environment error that shouldn't go undetected. > mfinfo = self.manifest_get_embed_info(target_desc, ld_args) > > with your suggestion of 'mfinfo = None', and this did seem enough > to get NumPy to compile with Python 3.3 and MSCV v10. I could > then build and test a library using NumPy (C and Python APIs), but > I've not yet installed nose to run NumPy's own tests. > > Looking at the code for the manifest_get_embed_info method, > I don't see any obvious 9 vs 10 issues like the problems I hit > before. However there are some regular expressions in the > method _remove_visual_c_ref which it calls which look more > likely - looking for two digits when perhaps it needs to be three > under MSVC 10... > > As a general point, if MSVC 10 is sufficiently different from 9, > does it make sense to introduce distutils/msvc10compiler.py > (subclassing/reusing most of distutils/msvc9compiler.py) in > Python itself? Or in NumPy's distutils? > > Thanks, > > Peter > Naturally the file would be named msvc10compiler.py but the name may be kept for compatibility reasons. AFAIK msvc10 does not use manifests any longer for the CRT dependencies and all the code handling msvc9 manifests could be removed for Python 3.3. I have been building extensions for Python 3.3 with msvc10 and this distutils patch for some months and did not notice any issues. Christoph From gokhansever at gmail.com Thu Nov 15 22:24:38 2012 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Thu, 15 Nov 2012 20:24:38 -0700 Subject: [Numpy-discussion] float32 to float64 casting Message-ID: Hello, Could someone briefly explain why are these two operations are casting my float32 arrays to float64? I1 (np.arange(5, dtype='float32')).dtype O1 dtype('float32') I2 (100000*np.arange(5, dtype='float32')).dtype O2 dtype('float64') I3 (np.arange(5, dtype='float32')[0]).dtype O3 dtype('float32') I4 (1*np.arange(5, dtype='float32')[0]).dtype O4 dtype('float64') Thanks. -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Nov 16 01:37:03 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 15 Nov 2012 23:37:03 -0700 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: On Thu, Nov 15, 2012 at 8:24 PM, G?khan Sever wrote: > Hello, > > Could someone briefly explain why are these two operations are casting my > float32 arrays to float64? > > I1 (np.arange(5, dtype='float32')).dtype > O1 dtype('float32') > > I2 (100000*np.arange(5, dtype='float32')).dtype > O2 dtype('float64') > This one is depends on the size of the multiplier and is first present in 1.6.0. I suspect it is a side effect of making the type conversion code sensitive to magnitude. > > > > I3 (np.arange(5, dtype='float32')[0]).dtype > O3 dtype('float32') > > I4 (1*np.arange(5, dtype='float32')[0]).dtype > O4 dtype('float64') > This one probably depends on the fact that the element is a scalar, but doesn't look right. Scalars are promoted differently. Also holds in numpy 1.5.0 so is of old provenance. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Nov 16 02:07:08 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 16 Nov 2012 00:07:08 -0700 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: On Thu, Nov 15, 2012 at 11:37 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, Nov 15, 2012 at 8:24 PM, G?khan Sever wrote: > >> Hello, >> >> Could someone briefly explain why are these two operations are casting my >> float32 arrays to float64? >> >> I1 (np.arange(5, dtype='float32')).dtype >> O1 dtype('float32') >> >> I2 (100000*np.arange(5, dtype='float32')).dtype >> O2 dtype('float64') >> > > This one is depends on the size of the multiplier and is first present in > 1.6.0. I suspect it is a side effect of making the type conversion code > sensitive to magnitude. > > >> >> >> >> I3 (np.arange(5, dtype='float32')[0]).dtype >> O3 dtype('float32') >> >> I4 (1*np.arange(5, dtype='float32')[0]).dtype >> O4 dtype('float64') >> > > This one probably depends on the fact that the element is a scalar, but > doesn't look right. Scalars are promoted differently. Also holds in numpy > 1.5.0 so is of old provenance. > > This one has always bothered me: In [3]: (-1*arange(5, dtype=uint64)).dtype Out[3]: dtype('float64') And there is another where the result is an object array, but I can't recall it at the moment. Chuck > Chuck > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Fri Nov 16 04:28:08 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Nov 2012 09:28:08 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 with MSVC 2010 In-Reply-To: <50A53147.6060608@uci.edu> References: <50A3F27F.70605@uci.edu> <50A53147.6060608@uci.edu> Message-ID: On Thu, Nov 15, 2012 at 6:15 PM, Christoph Gohlke wrote: > > Naturally the file would be named msvc10compiler.py but the name may be > kept for compatibility reasons. AFAIK msvc10 does not use manifests any > longer for the CRT dependencies and all the code handling msvc9 > manifests could be removed for Python 3.3. I have been building > extensions for Python 3.3 with msvc10 and this distutils patch for some > months and did not notice any issues. > Sounds Python 3.3 needs a fix then - have you reported this? 
If not, could you report it (since you know far more about the Windows build system than I do)? If it will be fixed in Python itself, then perhaps a manual hack like this will be enough for NumPy in the short term. Otherwise, maybe numpy needs to include its own copy of msvc9compiler.py (or msvc10compiler.py)? Thanks, Peter From cgohlke at uci.edu Fri Nov 16 05:08:09 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Fri, 16 Nov 2012 02:08:09 -0800 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 with MSVC 2010 In-Reply-To: References: <50A3F27F.70605@uci.edu> <50A53147.6060608@uci.edu> Message-ID: <50A61089.2050906@uci.edu> On 11/16/2012 1:28 AM, Peter Cock wrote: > On Thu, Nov 15, 2012 at 6:15 PM, Christoph Gohlke wrote: >> >> Naturally the file would be named msvc10compiler.py but the name may be >> kept for compatibility reasons. AFAIK msvc10 does not use manifests any >> longer for the CRT dependencies and all the code handling msvc9 >> manifests could be removed for Python 3.3. I have been building >> extensions for Python 3.3 with msvc10 and this distutils patch for some >> months and did not notice any issues. >> > > Sounds Python 3.3 needs a fix then - have you reported this? > If not, could you report it (since you know far more about the > Windows build system than I do)? > > If it will be fixed in Python itself, then perhaps a manual hack like > this will be enough for NumPy in the short term. Otherwise, maybe > numpy needs to include its own copy of msvc9compiler.py (or > msvc10compiler.py)? > > Thanks, > > Peter Could be related to . Christoph From shish at keba.be Fri Nov 16 06:47:42 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 16 Nov 2012 06:47:42 -0500 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: 2012/11/16 Charles R Harris > > > On Thu, Nov 15, 2012 at 8:24 PM, G?khan Sever wrote: > >> Hello, >> >> Could someone briefly explain why are these two operations are casting my >> float32 arrays to float64? >> >> I1 (np.arange(5, dtype='float32')).dtype >> O1 dtype('float32') >> >> I2 (100000*np.arange(5, dtype='float32')).dtype >> O2 dtype('float64') >> > > This one is depends on the size of the multiplier and is first present in > 1.6.0. I suspect it is a side effect of making the type conversion code > sensitive to magnitude. > > >> >> >> >> I3 (np.arange(5, dtype='float32')[0]).dtype >> O3 dtype('float32') >> >> I4 (1*np.arange(5, dtype='float32')[0]).dtype >> O4 dtype('float64') >> > > This one probably depends on the fact that the element is a scalar, but > doesn't look right. Scalars are promoted differently. Also holds in numpy > 1.5.0 so is of old provenance. > > Chuck > My understanding is that non-mixed operations (scalar/scalar or array/array) use casting rules that don't depend on magnitude, and the upcast of int{32,64} mixed with float32 has always been float64 (probably because the result has to be a kind of float, and float64 makes it possible to represent exactly a larger integer range than float32). Note that if you cast 1 into int16 the result will be float32 (I guess float32 can represent exactly all int16 integers). -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shish at keba.be Fri Nov 16 06:56:18 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 16 Nov 2012 06:56:18 -0500 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: 2012/11/16 Olivier Delalleau > 2012/11/16 Charles R Harris > >> >> >> On Thu, Nov 15, 2012 at 11:37 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Thu, Nov 15, 2012 at 8:24 PM, G?khan Sever wrote: >>> >>>> Hello, >>>> >>>> Could someone briefly explain why are these two operations are casting >>>> my float32 arrays to float64? >>>> >>>> I1 (np.arange(5, dtype='float32')).dtype >>>> O1 dtype('float32') >>>> >>>> I2 (100000*np.arange(5, dtype='float32')).dtype >>>> O2 dtype('float64') >>>> >>> >>> This one is depends on the size of the multiplier and is first present >>> in 1.6.0. I suspect it is a side effect of making the type conversion code >>> sensitive to magnitude. >>> >>> >>>> >>>> >>>> >>>> I3 (np.arange(5, dtype='float32')[0]).dtype >>>> O3 dtype('float32') >>>> >>>> I4 (1*np.arange(5, dtype='float32')[0]).dtype >>>> O4 dtype('float64') >>>> >>> >>> This one probably depends on the fact that the element is a scalar, but >>> doesn't look right. Scalars are promoted differently. Also holds in numpy >>> 1.5.0 so is of old provenance. >>> >>> >> This one has always bothered me: >> >> In [3]: (-1*arange(5, dtype=uint64)).dtype >> Out[3]: dtype('float64') >> > > My interpretation here is that since the possible results when multiplying > an int64 with an uint64 can be signed, and can go beyond the range of > int64, numpy prefers to cast everything to float64, which can represent > (even if approximately) a larger range of signed values. > Actually, thinking about it a bit more, I suspect the logic is not related to the result of the operation, but to the fact numpy needs to cast both arguments into a common dtype before doing the operation, and it has no integer dtype available that can hold both int64 and uint64 numbers, so it uses float64 instead. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Nov 16 07:51:47 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 16 Nov 2012 12:51:47 +0000 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: On Fri, Nov 16, 2012 at 6:37 AM, Charles R Harris wrote: > On Thu, Nov 15, 2012 at 8:24 PM, G?khan Sever wrote: >> >> Hello, >> >> Could someone briefly explain why are these two operations are casting my >> float32 arrays to float64? >> >> I1 (np.arange(5, dtype='float32')).dtype >> O1 dtype('float32') >> >> I2 (100000*np.arange(5, dtype='float32')).dtype >> O2 dtype('float64') > > > This one is depends on the size of the multiplier and is first present in > 1.6.0. I suspect it is a side effect of making the type conversion code > sensitive to magnitude. Right, this is the problem: In [22]: np.can_cast(10000, np.float32, "safe") Out[22]: True In [23]: np.can_cast(100000, np.float32, "safe") Out[23]: False But... that said... this makes NO SENSE. 100000 is exactly representable as a float32! can_cast is just wrong, yes? 
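A quick round-trip check of that claim (float32 carries a 24-bit
significand, so every integer with magnitude up to 2**24 = 16777216 is
exactly representable):

import numpy as np

np.float32(100000) == 100000      # -> True, nothing lost going in
float(np.float32(100000))         # -> 100000.0, exact coming back out

The bit-level construction says the same thing: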
https://en.wikipedia.org/wiki/Single-precision_floating-point_format bin(100000) == 0b11000011010100000 Sign: 0 Exponent for an integer: 23 to make the fractional part into an integer, -7 to shift the leading 1 bit so it's in position 23 instead of position 16, +127 to correct for the bias bin(23 - 7 + 127) == 0b10001111 Fraction = 100000, shifted so that the top bit lands in position 23 bin(100000 << 7) == 0b110000110101000000000000 Throw away the top bit and concatenate: In [74]: np.uint32(0b01000111110000110101000000000000).view(np.float32) Out[74]: 100000.0 Looks good to me. So... numpy just doesn't know how integer<->float conversion works...? >> I3 (np.arange(5, dtype='float32')[0]).dtype >> O3 dtype('float32') >> >> I4 (1*np.arange(5, dtype='float32')[0]).dtype >> O4 dtype('float64') > > > This one probably depends on the fact that the element is a scalar, but > doesn't look right. Scalars are promoted differently. Also holds in numpy > 1.5.0 so is of old provenance. Yeah, I missed at first that this is scalar * scalar, probably clearer to write: In [85]: (1 * np.float32(1)).dtype Out[85]: dtype('float64') For this numpy just uses ordinary find-a-common-type rules and ignores the values, and neither int32 nor float32 is a superset of the other, so it goes for float64. It's a bit disconcerting that in this case numpy's find-a-common-type rules don't match C's, though... (C's rule is: if you have an expression involving floating point, find the widest floating point type involved, and convert everything to that type. If your expression only involves integers, then... well, then things get kind of bizarre. First you upcast anything smaller than an int to an int. Then, find the widest integer types involved. If one of them is signed and can represent everything that the other can, you use that. Otherwise you cast it to unsigned (!!!) and use that type. So uint32 + int32 -> uint32. This is 6.3.1.8 in C99. I'm not saying we should necessarily follow the weirdo integer rules, but for floats it's a bit surprising.) -n From aldcroft at head.cfa.harvard.edu Fri Nov 16 09:19:53 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Fri, 16 Nov 2012 09:19:53 -0500 Subject: [Numpy-discussion] Segfault for Numpy 1.4.1 on 64-bit CentOS-5 Message-ID: For library compatibility testing I'm trying to use numpy 1.4.1 with Python 2.7.3 on a 64-bit CentOS-5 platform. I installed a clean Python from source (basically "./configure --prefix=$prefix ; make install") and then installed numpy 1.4.1 with "python setup.py install". The crash message begins with [1]: *** glibc detected *** /home/aldcroft/vpy/py27_np141/bin/python: free(): invalid next size (fast): 0x000000001a9fcf30 *** ======= Backtrace: ========= /lib64/libc.so.6[0x31a90711df] /lib64/libc.so.6(cfree+0x4b)[0x31a907163b] /home/aldcroft/vpy/py27_np141/lib/python2.7/site-packages/numpy/core/multiarray.so[0x2aaab025ebbe] A problem which seems related is that fancy indexing is failing: >>> idx = np.array([1]) >>> idx.dtype dtype('int64') >>> np.arange(5)[idx] Traceback (most recent call last): File "", line 1, in IndexError: index 210453397505 out of bounds 0<=index<5 Does anyone have suggestions for compilation flags or workarounds when building numpy or Python that might fix this? Thanks, Tom [1] I don't know the exact code that triggered this crash, it's buried in some units tests. If it was useful I could dig it out. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gokhansever at gmail.com Fri Nov 16 16:53:07 2012 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 16 Nov 2012 14:53:07 -0700 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: Thanks for the explanations. For either case, I was expecting to get float32 as a resulting data type. Since, float32 is large enough to contain the result. I am wondering if changing casting rule this way, requires a lot of modification in the NumPy code. Maybe as an alternative to the current casting mechanism? I like the way that NumPy can convert to float64. As if these data-types are continuation of each other. But just the conversation might happen too early --at least in my opinion, as demonstrated in my example. For instance comparing this example to IDL surprises me: I16 np.float32(5555)*5e38 O16 2.7774999999999998e+42 I17 (np.float32(5555)*5e38).dtype O17 dtype('float64') IDL> help, 5e38*float(5555) FLOAT = Inf In IDL, the expression doesn't get converted to DOUBLE. Perhaps, its a design decision. On Thu, Nov 15, 2012 at 8:24 PM, G?khan Sever wrote: > Hello, > > Could someone briefly explain why are these two operations are casting my > float32 arrays to float64? > > I1 (np.arange(5, dtype='float32')).dtype > O1 dtype('float32') > > I2 (100000*np.arange(5, dtype='float32')).dtype > O2 dtype('float64') > > > > I3 (np.arange(5, dtype='float32')[0]).dtype > O3 dtype('float32') > > I4 (1*np.arange(5, dtype='float32')[0]).dtype > O4 dtype('float64') > > > > Thanks. > > > > > -- > G?khan > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Sat Nov 17 08:28:43 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 17 Nov 2012 14:28:43 +0100 Subject: [Numpy-discussion] the fast way to loop over ndarray elements? Message-ID: Dear all, I need to make a linear contrast of the 2D numpy array "data" from an interval to another, the approach is: I have another two list: "base" & "target", then I check for each ndarray element "data[i,j]", if base[m] =< data[i,j] <= base[m+1], then it will be linearly converted to be in the interval of (target[m], target[m+1]), using another function called "lintrans". #The way I do is to loop each row and column of the 2D array, and finally loop the intervals constituted by base list: for row in range(data.shape[0]): for col in range(data.shape[1]): for i in range(len(base)-1): if data[row,col]>=base[i] and data[row,col]<=base[i+1]: data[row,col]=lintrans(data[row,col],(base[i],base[i+1]),(target[i],target[i+1])) break #use break to jump out of loop as the data have to be ONLY transferred ONCE. Now the profiling result shows that most of the time has been used in this loop over the array ("plot_array_transg"), and less time in calling the linear transformation fuction "lintrans": ncalls tottime percall cumtime percall filename:lineno(function) 18047 0.110 0.000 0.110 0.000 mathex.py:132(lintrans) 1 12.495 12.495 19.061 19.061 mathex.py:196(plot_array_transg) so is there anyway I can speed up this loop? Thanks for any suggestions!! 
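For reference, the whole operation is a piecewise-linear mapping from the
"base" breakpoints onto the "target" breakpoints, so a vectorized sketch
along these lines may replace the triple loop entirely (assuming "lintrans"
is the ordinary linear map between the interval endpoints, and that both
lists are ascending):

import numpy as np

# np.interp evaluates the piecewise-linear map base -> target in one call.
data_trans = np.interp(data.ravel(), base, target).reshape(data.shape)

# Caveat: np.interp clips values lying outside [base[0], base[-1]] to
# target[0] / target[-1], whereas the loop above leaves such values
# untouched.
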
best, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Nov 17 08:40:46 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 17 Nov 2012 08:40:46 -0500 Subject: [Numpy-discussion] the fast way to loop over ndarray elements? In-Reply-To: References: Message-ID: On Saturday, November 17, 2012, Chao YUE wrote: > Dear all, > > I need to make a linear contrast of the 2D numpy array "data" from an > interval to another, the approach is: > I have another two list: "base" & "target", then I check for each ndarray > element "data[i,j]", > if base[m] =< data[i,j] <= base[m+1], then it will be linearly converted > to be in the interval of (target[m], target[m+1]), > using another function called "lintrans". > > > #The way I do is to loop each row and column of the 2D array, and finally > loop the intervals constituted by base list: > > for row in range(data.shape[0]): > for col in range(data.shape[1]): > for i in range(len(base)-1): > if data[row,col]>=base[i] and data[row,col]<=base[i+1]: > > data[row,col]=lintrans(data[row,col],(base[i],base[i+1]),(target[i],target[i+1])) > break #use break to jump out of loop as the data have to > be ONLY transferred ONCE. > > > Now the profiling result shows that most of the time has been used in this > loop over the array ("plot_array_transg"), > and less time in calling the linear transformation fuction "lintrans": > > ncalls tottime percall cumtime percall > filename:lineno(function) > 18047 0.110 0.000 0.110 0.000 > mathex.py:132(lintrans) > 1 12.495 12.495 19.061 19.061 > mathex.py:196(plot_array_transg) > > > so is there anyway I can speed up this loop? Thanks for any suggestions!! > > best, > > Chao > > If the values in base are ascending, you can use searchsorted() to find out where values from data can be placed into base while maintaining order. Don't know if it is faster, but it would certainly be easier to read. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Sat Nov 17 08:43:04 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 17 Nov 2012 14:43:04 +0100 Subject: [Numpy-discussion] the fast way to loop over ndarray elements? In-Reply-To: References: Message-ID: Yes, both the "base" and "target" are ascending. Thanks! Chao On Sat, Nov 17, 2012 at 2:40 PM, Benjamin Root wrote: > > > On Saturday, November 17, 2012, Chao YUE wrote: > >> Dear all, >> >> I need to make a linear contrast of the 2D numpy array "data" from an >> interval to another, the approach is: >> I have another two list: "base" & "target", then I check for each ndarray >> element "data[i,j]", >> if base[m] =< data[i,j] <= base[m+1], then it will be linearly >> converted to be in the interval of (target[m], target[m+1]), >> using another function called "lintrans". 
>> >> >> #The way I do is to loop each row and column of the 2D array, and finally >> loop the intervals constituted by base list: >> >> for row in range(data.shape[0]): >> for col in range(data.shape[1]): >> for i in range(len(base)-1): >> if data[row,col]>=base[i] and data[row,col]<=base[i+1]: >> >> data[row,col]=lintrans(data[row,col],(base[i],base[i+1]),(target[i],target[i+1])) >> break #use break to jump out of loop as the data have to >> be ONLY transferred ONCE. >> >> >> Now the profiling result shows that most of the time has been used in >> this loop over the array ("plot_array_transg"), >> and less time in calling the linear transformation fuction "lintrans": >> >> ncalls tottime percall cumtime percall >> filename:lineno(function) >> 18047 0.110 0.000 0.110 0.000 >> mathex.py:132(lintrans) >> 1 12.495 12.495 19.061 19.061 >> mathex.py:196(plot_array_transg) >> >> >> so is there anyway I can speed up this loop? Thanks for any suggestions!! >> >> best, >> >> Chao >> >> > If the values in base are ascending, you can use searchsorted() to find > out where values from data can be placed into base while maintaining order. > Don't know if it is faster, but it would certainly be easier to read. > > Cheers! > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewm at redtetrahedron.org Sat Nov 17 09:25:48 2012 From: ewm at redtetrahedron.org (Eric Moore) Date: Sat, 17 Nov 2012 09:25:48 -0500 Subject: [Numpy-discussion] Assignment function with a signature similar to take? Message-ID: <50A79E6C.7000709@redtetrahedron.org> Is there a function that operates like 'take' but does assignment? Specifically that takes indices and an axis? As far as I can tell no such function exists. Is there any particular reason? One can fake such a thing by doing (code untested): s = len(a.shape)*[np.s_[:]] s[axis] = np.s_[1::2] a[s] = b.take(np.arange(1,b.shape[axis],2), axis) Or by using np.rollaxis: a = np.rollaxis(a, axis, len(a.shape)) a[..., 1::2] = b[..., 1::2] a = np.rollaxis(a, len(a.shape)-1, axis) But I don't really think that either of these are particularly clear, but probably prefer the rollaxis solution. Also, while I'm here, what about having take also be able to use a slice object in lieu of a collection of indices? -Eric From njs at pobox.com Sat Nov 17 11:47:39 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 17 Nov 2012 16:47:39 +0000 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: On Fri, Nov 16, 2012 at 9:53 PM, G?khan Sever wrote: > Thanks for the explanations. > > For either case, I was expecting to get float32 as a resulting data type. > Since, float32 is large enough to contain the result. I am wondering if > changing casting rule this way, requires a lot of modification in the NumPy > code. Maybe as an alternative to the current casting mechanism? > > I like the way that NumPy can convert to float64. As if these data-types are > continuation of each other. 
But just the conversation might happen too early > --at least in my opinion, as demonstrated in my example. > > For instance comparing this example to IDL surprises me: > > I16 np.float32(5555)*5e38 > O16 2.7774999999999998e+42 > > I17 (np.float32(5555)*5e38).dtype > O17 dtype('float64') In this case, what's going on is that 5e38 is a Python float object, and Python float objects have double-precision, i.e., they're equivalent to np.float64's. So you're multiplying a float32 and a float64. I think most people will agree that in this situation it's better to use float64 for the output? -n From gokhansever at gmail.com Sat Nov 17 14:49:44 2012 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Sat, 17 Nov 2012 12:49:44 -0700 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: On Sat, Nov 17, 2012 at 9:47 AM, Nathaniel Smith wrote: > On Fri, Nov 16, 2012 at 9:53 PM, G?khan Sever > wrote: > > Thanks for the explanations. > > > > For either case, I was expecting to get float32 as a resulting data type. > > Since, float32 is large enough to contain the result. I am wondering if > > changing casting rule this way, requires a lot of modification in the > NumPy > > code. Maybe as an alternative to the current casting mechanism? > > > > I like the way that NumPy can convert to float64. As if these data-types > are > > continuation of each other. But just the conversation might happen too > early > > --at least in my opinion, as demonstrated in my example. > > > > For instance comparing this example to IDL surprises me: > > > > I16 np.float32(5555)*5e38 > > O16 2.7774999999999998e+42 > > > > I17 (np.float32(5555)*5e38).dtype > > O17 dtype('float64') > > In this case, what's going on is that 5e38 is a Python float object, > and Python float objects have double-precision, i.e., they're > equivalent to np.float64's. So you're multiplying a float32 and a > float64. I think most people will agree that in this situation it's > better to use float64 for the output? > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > OK, I see your point. Python numeric data objects and NumPy data objects mixed operations require more attention. The following causes float32 overflow --rather than casting to float64 as in the case for Python float multiplication, and behaves like in IDL. I3 (np.float32(5555)*np.float32(5e38)) O3 inf However, these two still surprises me: I5 (np.float32(5555)*1).dtype O5 dtype('float64') I6 (np.float32(5555)*np.int32(1)).dtype O6 dtype('float64') -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Sat Nov 17 15:00:14 2012 From: shish at keba.be (Olivier Delalleau) Date: Sat, 17 Nov 2012 15:00:14 -0500 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: 2012/11/17 G?khan Sever > > > On Sat, Nov 17, 2012 at 9:47 AM, Nathaniel Smith wrote: > >> On Fri, Nov 16, 2012 at 9:53 PM, G?khan Sever >> wrote: >> > Thanks for the explanations. >> > >> > For either case, I was expecting to get float32 as a resulting data >> type. >> > Since, float32 is large enough to contain the result. I am wondering if >> > changing casting rule this way, requires a lot of modification in the >> NumPy >> > code. Maybe as an alternative to the current casting mechanism? >> > >> > I like the way that NumPy can convert to float64. 
As if these >> data-types are >> > continuation of each other. But just the conversation might happen too >> early >> > --at least in my opinion, as demonstrated in my example. >> > >> > For instance comparing this example to IDL surprises me: >> > >> > I16 np.float32(5555)*5e38 >> > O16 2.7774999999999998e+42 >> > >> > I17 (np.float32(5555)*5e38).dtype >> > O17 dtype('float64') >> >> In this case, what's going on is that 5e38 is a Python float object, >> and Python float objects have double-precision, i.e., they're >> equivalent to np.float64's. So you're multiplying a float32 and a >> float64. I think most people will agree that in this situation it's >> better to use float64 for the output? >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > OK, I see your point. Python numeric data objects and NumPy data objects > mixed operations require more attention. > > The following causes float32 overflow --rather than casting to float64 as > in the case for Python float multiplication, and behaves like in IDL. > > I3 (np.float32(5555)*np.float32(5e38)) > O3 inf > > However, these two still surprises me: > > I5 (np.float32(5555)*1).dtype > O5 dtype('float64') > > I6 (np.float32(5555)*np.int32(1)).dtype > O6 dtype('float64') > That's because the current way of finding out the result's dtype is based on input dtypes only (not on numeric values), and numpy.can_cast('int32', 'float32') is False, while numpy.can_cast('int32', 'float64') is True (and same for int64). Thus it decides to cast to float64. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Nov 17 15:18:56 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 17 Nov 2012 13:18:56 -0700 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: On Sat, Nov 17, 2012 at 1:00 PM, Olivier Delalleau wrote: > 2012/11/17 G?khan Sever > >> >> >> On Sat, Nov 17, 2012 at 9:47 AM, Nathaniel Smith wrote: >> >>> On Fri, Nov 16, 2012 at 9:53 PM, G?khan Sever >>> wrote: >>> > Thanks for the explanations. >>> > >>> > For either case, I was expecting to get float32 as a resulting data >>> type. >>> > Since, float32 is large enough to contain the result. I am wondering if >>> > changing casting rule this way, requires a lot of modification in the >>> NumPy >>> > code. Maybe as an alternative to the current casting mechanism? >>> > >>> > I like the way that NumPy can convert to float64. As if these >>> data-types are >>> > continuation of each other. But just the conversation might happen too >>> early >>> > --at least in my opinion, as demonstrated in my example. >>> > >>> > For instance comparing this example to IDL surprises me: >>> > >>> > I16 np.float32(5555)*5e38 >>> > O16 2.7774999999999998e+42 >>> > >>> > I17 (np.float32(5555)*5e38).dtype >>> > O17 dtype('float64') >>> >>> In this case, what's going on is that 5e38 is a Python float object, >>> and Python float objects have double-precision, i.e., they're >>> equivalent to np.float64's. So you're multiplying a float32 and a >>> float64. I think most people will agree that in this situation it's >>> better to use float64 for the output? 
>>> >>> -n >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> OK, I see your point. Python numeric data objects and NumPy data objects >> mixed operations require more attention. >> >> The following causes float32 overflow --rather than casting to float64 as >> in the case for Python float multiplication, and behaves like in IDL. >> >> I3 (np.float32(5555)*np.float32(5e38)) >> O3 inf >> >> However, these two still surprises me: >> >> I5 (np.float32(5555)*1).dtype >> O5 dtype('float64') >> >> I6 (np.float32(5555)*np.int32(1)).dtype >> O6 dtype('float64') >> > > That's because the current way of finding out the result's dtype is based > on input dtypes only (not on numeric values), and numpy.can_cast('int32', > 'float32') is False, while numpy.can_cast('int32', 'float64') is True (and > same for int64). > Thus it decides to cast to float64. > It might be nice to revisit all the casting rules at some point, but current experience suggests that any changes will lead to cries of pain and outrage ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Nov 17 15:52:11 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 17 Nov 2012 15:52:11 -0500 Subject: [Numpy-discussion] float32 to float64 casting In-Reply-To: References: Message-ID: On Saturday, November 17, 2012, Charles R Harris wrote: > > > On Sat, Nov 17, 2012 at 1:00 PM, Olivier Delalleau > > wrote: > >> 2012/11/17 G?khan Sever > 'cvml', 'gokhansever at gmail.com');>> >> >>> >>> >>> On Sat, Nov 17, 2012 at 9:47 AM, Nathaniel Smith >>> > wrote: >>> >>>> On Fri, Nov 16, 2012 at 9:53 PM, G?khan Sever > >>>> wrote: >>>> > Thanks for the explanations. >>>> > >>>> > For either case, I was expecting to get float32 as a resulting data >>>> type. >>>> > Since, float32 is large enough to contain the result. I am wondering >>>> if >>>> > changing casting rule this way, requires a lot of modification in the >>>> NumPy >>>> > code. Maybe as an alternative to the current casting mechanism? >>>> > >>>> > I like the way that NumPy can convert to float64. As if these >>>> data-types are >>>> > continuation of each other. But just the conversation might happen >>>> too early >>>> > --at least in my opinion, as demonstrated in my example. >>>> > >>>> > For instance comparing this example to IDL surprises me: >>>> > >>>> > I16 np.float32(5555)*5e38 >>>> > O16 2.7774999999999998e+42 >>>> > >>>> > I17 (np.float32(5555)*5e38).dtype >>>> > O17 dtype('float64') >>>> >>>> In this case, what's going on is that 5e38 is a Python float object, >>>> and Python float objects have double-precision, i.e., they're >>>> equivalent to np.float64's. So you're multiplying a float32 and a >>>> float64. I think most people will agree that in this situation it's >>>> better to use float64 for the output? >>>> >>>> -n >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>> 'NumPy-Discussion at scipy.org');> >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> OK, I see your point. Python numeric data objects and NumPy data objects >>> mixed operations require more attention. >>> >>> The following causes float32 overflow --rather than casting to float64 >>> as in the case for Python float multiplication, and behaves like in IDL. 
>>> >>> I3 (np.float32(5555)*np.float32(5e38)) >>> O3 inf >>> >>> However, these two still surprises me: >>> >>> I5 (np.float32(5555)*1).dtype >>> O5 dtype('float64') >>> >>> I6 (np.float32(5555)*np.int32(1)).dtype >>> O6 dtype('float64') >>> >> >> That's because the current way of finding out the result's dtype is based >> on input dtypes only (not on numeric values), and numpy.can_cast('int32', >> 'float32') is False, while numpy.can_cast('int32', 'float64') is True (and >> same for int64). >> Thus it decides to cast to float64. >> > > It might be nice to revisit all the casting rules at some point, but > current experience suggests that any changes will lead to cries of pain and > outrage ;) > > Chuck > > Can we at least put these examples into the tests? Also, I think the bigger issue was that, unlike deprecation of a function, it is much harder to grep for particular operations, especially in a dynamic language like python. What were intended as minor bugfixes ended up becoming much larger. Has the casting table been added to the tests? I think that will bring much more confidence and assurances for future changes going forward. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.warde.farley at gmail.com Sun Nov 18 17:10:17 2012 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Sun, 18 Nov 2012 17:10:17 -0500 Subject: [Numpy-discussion] the fast way to loop over ndarray elements? In-Reply-To: References: Message-ID: This looks like a prime candidate for using Cython: http://docs.cython.org/src/userguide/numpy_tutorial.html On Sat, Nov 17, 2012 at 8:28 AM, Chao YUE wrote: > Dear all, > > I need to make a linear contrast of the 2D numpy array "data" from an > interval to another, the approach is: > I have another two list: "base" & "target", then I check for each ndarray > element "data[i,j]", > if base[m] =< data[i,j] <= base[m+1], then it will be linearly converted > to be in the interval of (target[m], target[m+1]), > using another function called "lintrans". > > > #The way I do is to loop each row and column of the 2D array, and finally > loop the intervals constituted by base list: > > for row in range(data.shape[0]): > for col in range(data.shape[1]): > for i in range(len(base)-1): > if data[row,col]>=base[i] and data[row,col]<=base[i+1]: > > data[row,col]=lintrans(data[row,col],(base[i],base[i+1]),(target[i],target[i+1])) > break #use break to jump out of loop as the data have to > be ONLY transferred ONCE. > > > Now the profiling result shows that most of the time has been used in this > loop over the array ("plot_array_transg"), > and less time in calling the linear transformation fuction "lintrans": > > ncalls tottime percall cumtime percall > filename:lineno(function) > 18047 0.110 0.000 0.110 0.000 > mathex.py:132(lintrans) > 1 12.495 12.495 19.061 19.061 > mathex.py:196(plot_array_transg) > > > so is there anyway I can speed up this loop? Thanks for any suggestions!! 
> > best, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis at gmail.com Mon Nov 19 09:08:38 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Mon, 19 Nov 2012 09:08:38 -0500 Subject: [Numpy-discussion] the fast way to loop over ndarray elements? In-Reply-To: References: Message-ID: On Sat, Nov 17, 2012 at 8:28 AM, Chao YUE wrote: > Dear all, > > I need to make a linear contrast of the 2D numpy array "data" from an > interval to another, the approach is: > I have another two list: "base" & "target", then I check for each ndarray > element "data[i,j]", > if base[m] =< data[i,j] <= base[m+1], then it will be linearly converted > to be in the interval of (target[m], target[m+1]), > using another function called "lintrans". > > > #The way I do is to loop each row and column of the 2D array, and finally > loop the intervals constituted by base list: > > for row in range(data.shape[0]): > for col in range(data.shape[1]): > for i in range(len(base)-1): > if data[row,col]>=base[i] and data[row,col]<=base[i+1]: > > data[row,col]=lintrans(data[row,col],(base[i],base[i+1]),(target[i],target[i+1])) > break #use break to jump out of loop as the data have to be > ONLY transferred ONCE. > > > Now the profiling result shows that most of the time has been used in this > loop over the array ("plot_array_transg"), > and less time in calling the linear transformation fuction "lintrans": > > ncalls tottime percall cumtime percall > filename:lineno(function) > 18047 0.110 0.000 0.110 0.000 > mathex.py:132(lintrans) > 1 12.495 12.495 19.061 19.061 > mathex.py:196(plot_array_transg) > > > so is there anyway I can speed up this loop? Thanks for any suggestions!! > > best, > > Chao If lintrans() is a linear interpolation, could you use interp? http://docs.scipy.org/doc/numpy/reference/generated/numpy.interp.html From sturla at molden.no Mon Nov 19 12:12:23 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 19 Nov 2012 18:12:23 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? Message-ID: <50AA6877.7010806@molden.no> I think NumPy and SciPy should consider to use OpenBLAS (a fork of GotoBLAS2) instead of ATLAS or f2c'd Netlib BLAS for the binary releases. Here are its virtues: * Very easy to build: Just a makefile, no configuration script or special build tools. * Building ATLAS can be a PITA. So why bother? * Faster than ATLAS, sometimes faster than MKL. * Multithreaded BLAS kernels: OpenMP on Unix, Windows threads on Windows. * The quality of its ancestor GotoBLAS is undisputed. I was the BLAS implementation of choice for major HPC projects around the World. * Free as in BSD licensed. * Funded and developed for use in major Chinese HPC projects. Actively maintained. (GotoBLAS2 is abandonware.) * Open source. The C sources are a pleasure to read, and very easy to verify. 
* No OpenMP on Windows means no dependency on pthreads-win32 (an LGPL library) when building with MinGW. * Builds on Windows with MinGW and MSYS, and perhaps even without MSYS. * Cygwin is not needed on Windows (this is just BS from the GotoBLAS documentation). Thus, 64-buit builds are possible (I've built it using TDM-GCC for Win64 and 32-bit MSYS). Sturla From daniele at grinta.net Mon Nov 19 12:20:10 2012 From: daniele at grinta.net (Daniele Nicolodi) Date: Mon, 19 Nov 2012 18:20:10 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50AA6877.7010806@molden.no> References: <50AA6877.7010806@molden.no> Message-ID: <50AA6A4A.1060708@grinta.net> On 19/11/2012 18:12, Sturla Molden wrote: > I think NumPy and SciPy should consider to use OpenBLAS (a fork of > GotoBLAS2) instead of ATLAS or f2c'd Netlib BLAS for the binary releases. ... > * Funded and developed for use in major Chinese HPC projects. Actively > maintained. (GotoBLAS2 is abandonware.) Hello Sturla, do you know why GotoBLAS2 is not worked on anymore? It looked like a project with serious university backing and its web page does not reflect the fact that it has been discontinued by its authors. Thanks. Cheers, Daniele From nouiz at nouiz.org Mon Nov 19 12:30:02 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Mon, 19 Nov 2012 12:30:02 -0500 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50AA6A4A.1060708@grinta.net> References: <50AA6877.7010806@molden.no> <50AA6A4A.1060708@grinta.net> Message-ID: Hi, it was mainly developed by one person and he left for Intel. They relicensed GotoBLAS to BSD 3 clauses and some Chinese people forked it to OpenBLAS. There was another fork, but I didn't got news of it. Maybe I just missed the news. Fred On Mon, Nov 19, 2012 at 12:20 PM, Daniele Nicolodi wrote: > On 19/11/2012 18:12, Sturla Molden wrote: > > I think NumPy and SciPy should consider to use OpenBLAS (a fork of > > GotoBLAS2) instead of ATLAS or f2c'd Netlib BLAS for the binary releases. > ... > > * Funded and developed for use in major Chinese HPC projects. Actively > > maintained. (GotoBLAS2 is abandonware.) > > Hello Sturla, > > do you know why GotoBLAS2 is not worked on anymore? It looked like a > project with serious university backing and its web page does not > reflect the fact that it has been discontinued by its authors. > > Thanks. > > Cheers, > Daniele > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Mon Nov 19 12:42:58 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 19 Nov 2012 18:42:58 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50AA6877.7010806@molden.no> References: <50AA6877.7010806@molden.no> Message-ID: <50AA6FA2.4010101@astro.uio.no> On 11/19/2012 06:12 PM, Sturla Molden wrote: > I think NumPy and SciPy should consider to use OpenBLAS (a fork of > GotoBLAS2) instead of ATLAS or f2c'd Netlib BLAS for the binary releases. > > Here are its virtues: > > * Very easy to build: Just a makefile, no configuration script or > special build tools. > > * Building ATLAS can be a PITA. So why bother? > > * Faster than ATLAS, sometimes faster than MKL. > > * Multithreaded BLAS kernels: OpenMP on Unix, Windows threads on Windows. 
> > * The quality of its ancestor GotoBLAS is undisputed. I was the BLAS > implementation of choice for major HPC projects around the World. > > * Free as in BSD licensed. > > * Funded and developed for use in major Chinese HPC projects. Actively > maintained. (GotoBLAS2 is abandonware.) > > * Open source. The C sources are a pleasure to read, and very easy to > verify. > > * No OpenMP on Windows means no dependency on pthreads-win32 (an LGPL > library) when building with MinGW. > > * Builds on Windows with MinGW and MSYS, and perhaps even without MSYS. > > * Cygwin is not needed on Windows (this is just BS from the GotoBLAS > documentation). Thus, 64-buit builds are possible (I've built it using > TDM-GCC for Win64 and 32-bit MSYS). +1. Even on CPUs that are not directly supported, this is at least better than reference BLAS. (On our AMD CPUs, which are too new to have a separate OpenBLAS implementation, the implementations for older AMD CPUs still outperform at least Intel MKL, because MKL does so poorly on these -- although ACML beats them both by a factor 2. And of course on supported CPUs (everything Intel and older AMD) OpenBLAS is wonderful. Dag Sverre From sturla at molden.no Mon Nov 19 13:27:38 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 19 Nov 2012 19:27:38 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50AA6FA2.4010101@astro.uio.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> Message-ID: <50AA7A1A.4070307@molden.no> On 19.11.2012 18:42, Dag Sverre Seljebotn wrote: > Even on CPUs that are not directly supported, this is at least better > than reference BLAS. > > (On our AMD CPUs, which are too new to have a separate OpenBLAS > implementation, the implementations for older AMD CPUs still outperform > at least Intel MKL, because MKL does so poorly on these -- although ACML > beats them both by a factor 2. And of course on supported CPUs > (everything Intel and older AMD) OpenBLAS is wonderful. Indeed. I have a laptop with a quadcore AMD Phenom II, but ACML consistently segfaults when used with MinGW (gcc or gfortran). With another set of compilers (Microsoft C and Absoft Fortran), ACML usually works, but sometimes decides my CPU is not supported, and just terminates the process. I'm not sure NumPy even builds against ACML, e.g. it has a non-standard C interface for BLAS and LAPACK. Also, it wouldn't work with GNU compilers on Windows. Dependency on Intel or PGI run-time DLLs would also make it useless for the binary superpacks. OTOH, OpenBLAS works wonderfully :) Sturla From ralf.gommers at gmail.com Mon Nov 19 15:54:24 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 19 Nov 2012 21:54:24 +0100 Subject: [Numpy-discussion] PRs for MaskedArray bugs In-Reply-To: References: Message-ID: On Wed, Nov 14, 2012 at 8:46 PM, Thomas Robitaille < thomas.robitaille at gmail.com> wrote: > I've recently opened a couple of pull requests that fix bugs with > MaskedArray - these are pretty straightforward, so would it be > possible to consider them in time for 1.7? > > https://github.com/numpy/numpy/pull/2703 > This one could be considered a change in behavior, but it looks more like a bug to me. Should be OK to backport imho. > https://github.com/numpy/numpy/pull/2733 > This one is definitely OK to backport. Ralf > > Thanks! 
> Tom > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vs at it.uu.se Mon Nov 19 18:17:53 2012 From: vs at it.uu.se (Virgil Stokes) Date: Tue, 20 Nov 2012 00:17:53 +0100 Subject: [Numpy-discussion] Confused with qr decomposition function Message-ID: <50AABE21.3000307@it.uu.se> I am using the latest versions of numpy (from numpy-1.7.0b2-win32-superpack-python2.7.exe) and scipy (from scipy-0.11.0-win32-superpack-python2.7.exe ) on a windows 7 (32-bit) platform. I have used import numpy as np q,r = np.linalg.qr(A) and compared the results obtained from MATLAB (R2010B) [q,r] = qr(A) The q,r returned from numpy are both the negative of theq,r returned from MATLAB for the same matrix A. I believe that the q,r returned from MATLAB are correct. Why am I getting their negative from numpy? Note, I have tried this on several different matrices --- numpy always gives the negative of MATLAB's q,r values. [I mistakenly have already sent a similar email to the scipy list --- please excuse this mistake.] From charlesr.harris at gmail.com Mon Nov 19 18:30:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 19 Nov 2012 16:30:53 -0700 Subject: [Numpy-discussion] Confused with qr decomposition function In-Reply-To: <50AABE21.3000307@it.uu.se> References: <50AABE21.3000307@it.uu.se> Message-ID: On Mon, Nov 19, 2012 at 4:17 PM, Virgil Stokes wrote: > I am using the latest versions of numpy (from > numpy-1.7.0b2-win32-superpack-python2.7.exe) and scipy (from > scipy-0.11.0-win32-superpack-python2.7.exe ) on a windows 7 (32-bit) > platform. > > I have used > > import numpy as np > q,r = np.linalg.qr(A) > > and compared the results obtained from MATLAB (R2010B) > > [q,r] = qr(A) > > The q,r returned from numpy are both the negative of theq,r returned > from MATLAB for the same matrix A. I believe that the q,r returned from > MATLAB are correct. Why am I getting their negative from numpy? > > Note, I have tried this on several different matrices --- numpy always > gives the negative of MATLAB's q,r values. > [I mistakenly have already sent a similar email to the scipy list --- > please excuse this mistake.] > They are both correct, the decomposition isn't unique. In particular, if both algorithms use Housholder reflections there are two possible reflection planes at each step, one of which is more numerically stable than the other, and the two choices lead to different signs at each step. That said, MATLAB may be normalizing the result in some way or using some other algorithm. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From vs at it.uu.se Tue Nov 20 08:24:48 2012 From: vs at it.uu.se (Virgil Stokes) Date: Tue, 20 Nov 2012 14:24:48 +0100 Subject: [Numpy-discussion] Confused with qr decomposition function In-Reply-To: References: <50AABE21.3000307@it.uu.se> Message-ID: <50AB84A0.6030609@it.uu.se> On 20-Nov-2012 00:30, Charles R Harris wrote: > > > On Mon, Nov 19, 2012 at 4:17 PM, Virgil Stokes > wrote: > > I am using the latest versions of numpy (from > numpy-1.7.0b2-win32-superpack-python2.7.exe) and scipy (from > scipy-0.11.0-win32-superpack-python2.7.exe ) on a windows 7 (32-bit) > platform. 
> > I have used > > import numpy as np > q,r = np.linalg.qr(A) > > and compared the results obtained from MATLAB (R2010B) > > [q,r] = qr(A) > > The q,r returned from numpy are both the negative of theq,r returned > from MATLAB for the same matrix A. I believe that the q,r returned from > MATLAB are correct. Why am I getting their negative from numpy? > > Note, I have tried this on several different matrices --- numpy always > gives the negative of MATLAB's q,r values. > [I mistakenly have already sent a similar email to the scipy list --- > please excuse this mistake.] > > > They are both correct, the decomposition isn't unique. In particular, if both > algorithms use Housholder reflections there are two possible reflection planes > at each step, one of which is more numerically stable than the other, and the > two choices lead to different signs at each step. That said, MATLAB may be > normalizing the result in some way or using some other algorithm. > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I am aware that they are both correct; but, if you are doing covariance QR decomposition then you almost surely are interested in the positive results (which is the default for MATLAB and most papers/books on this subject). This problem came up recently when converting MATLAB code to python/numpy, with some Kalman Filter/Smooth code. It took some time to isolate the problem. It would useful if the documentation (numpy) stated explicitly what the default "signs" are for q,r. Thanks for your interest in the problem, Chuck. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdroe at stsci.edu Tue Nov 20 08:32:55 2012 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 20 Nov 2012 08:32:55 -0500 Subject: [Numpy-discussion] PRs for MaskedArray bugs In-Reply-To: References: Message-ID: <50AB8687.5050005@stsci.edu> If I may also point out another simple (but critical for astropy) bugfix related to masked arrays: https://github.com/numpy/numpy/pull/2747 Mike On 11/14/2012 02:46 PM, Thomas Robitaille wrote: > I've recently opened a couple of pull requests that fix bugs with > MaskedArray - these are pretty straightforward, so would it be > possible to consider them in time for 1.7? > > https://github.com/numpy/numpy/pull/2703 > > https://github.com/numpy/numpy/pull/2733 > > Thanks! > Tom > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Tue Nov 20 08:59:08 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 20 Nov 2012 13:59:08 +0000 Subject: [Numpy-discussion] Compiling NumPy on Windows for Python 3.3 with MSVC 2010 In-Reply-To: <50A61089.2050906@uci.edu> References: <50A3F27F.70605@uci.edu> <50A53147.6060608@uci.edu> <50A61089.2050906@uci.edu> Message-ID: On Fri, Nov 16, 2012 at 10:08 AM, Christoph Gohlke wrote: > On 11/16/2012 1:28 AM, Peter Cock wrote: >> On Thu, Nov 15, 2012 at 6:15 PM, Christoph Gohlke wrote: >>> >>> Naturally the file would be named msvc10compiler.py but the name may be >>> kept for compatibility reasons. AFAIK msvc10 does not use manifests any >>> longer for the CRT dependencies and all the code handling msvc9 >>> manifests could be removed for Python 3.3. 
I have been building >>> extensions for Python 3.3 with msvc10 and this distutils patch for some >>> months and did not notice any issues. >>> >> >> Sounds Python 3.3 needs a fix then - have you reported this? >> If not, could you report it (since you know far more about the >> Windows build system than I do)? >> >> If it will be fixed in Python itself, then perhaps a manual hack like >> this will be enough for NumPy in the short term. Otherwise, maybe >> numpy needs to include its own copy of msvc9compiler.py (or >> msvc10compiler.py)? >> >> Thanks, >> >> Peter > > Could be related to . > > Christoph Thanks Christoph, you're probably right this is linked to http://bugs.python.org/issue16296 Note here's an example of the manifest file, obtained from a hack to Python 3.3's distutitls/msvc9compiler.py - looks like there are no MSVC version numbers in here that we would need to worry about: build\temp.win32-3.3\Release\numpy\core\src\_dummy.pyd.manifest I tried the patch from http://bugs.python.org/issue16296 applied by hand (in place of Christoph's one line change), and it also seemed to work. I wanted to double check this, so started by reverting to an unmodified copy of Python 3.3. I just removed Python 3.3, and reinstalled it afresh using python-3.3.0.msi, then updated to the latest commit on the master branch of numpy, as it happens Ralf merging my fixes to get mingw32 to compile numpy Python 3.3: 724da615902b9feb140cb6f7307ff1b1c2596a40 Now a clean numpy build under Python 3.3 with MSVC 10 "just worked", the error "Broken toolchain: cannot link a simple C program" has gone. The comments in msvc9compiler.py did mention this manifest stuff was fragile... but I am puzzled. My hunch right now is that the order of installation of MSVC 2010 and Python 3.3 could be important. Either that, or something else changed on the numpy master which had an impact? Regards, Peter From cournape at gmail.com Tue Nov 20 09:38:27 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 20 Nov 2012 14:38:27 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50AA6FA2.4010101@astro.uio.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> Message-ID: On Mon, Nov 19, 2012 at 5:42 PM, Dag Sverre Seljebotn wrote: > On 11/19/2012 06:12 PM, Sturla Molden wrote: >> I think NumPy and SciPy should consider to use OpenBLAS (a fork of >> GotoBLAS2) instead of ATLAS or f2c'd Netlib BLAS for the binary releases. >> >> Here are its virtues: >> >> * Very easy to build: Just a makefile, no configuration script or >> special build tools. >> >> * Building ATLAS can be a PITA. So why bother? >> >> * Faster than ATLAS, sometimes faster than MKL. >> >> * Multithreaded BLAS kernels: OpenMP on Unix, Windows threads on Windows. >> >> * The quality of its ancestor GotoBLAS is undisputed. I was the BLAS >> implementation of choice for major HPC projects around the World. >> >> * Free as in BSD licensed. >> >> * Funded and developed for use in major Chinese HPC projects. Actively >> maintained. (GotoBLAS2 is abandonware.) >> >> * Open source. The C sources are a pleasure to read, and very easy to >> verify. >> >> * No OpenMP on Windows means no dependency on pthreads-win32 (an LGPL >> library) when building with MinGW. >> >> * Builds on Windows with MinGW and MSYS, and perhaps even without MSYS. >> >> * Cygwin is not needed on Windows (this is just BS from the GotoBLAS >> documentation). 
Thus, 64-buit builds are possible (I've built it using >> TDM-GCC for Win64 and 32-bit MSYS). > > +1. > > Even on CPUs that are not directly supported, this is at least better > than reference BLAS. > > (On our AMD CPUs, which are too new to have a separate OpenBLAS > implementation, the implementations for older AMD CPUs still outperform > at least Intel MKL, because MKL does so poorly on these -- although ACML > beats them both by a factor 2. And of course on supported CPUs > (everything Intel and older AMD) OpenBLAS is wonderful. I support this as well in principle for our binary release: one issue is that we don't have the infrastructure on mac to build an installer with multi-arch support, and we can't assume every mac out there has SSE 3 or 4 available. We would need more testing first, as this is not a change to make lightly. cheers, David From chaoyuejoy at gmail.com Tue Nov 20 11:16:32 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 20 Nov 2012 17:16:32 +0100 Subject: [Numpy-discussion] the fast way to loop over ndarray elements? In-Reply-To: References: Message-ID: Dear Thouis, I take some time to check, before I tried with cython, I tried the np.interp first, and very luckily, it's exeactly what I need. And with the old written by me, it spend 20 seconds, now it's 0.2 seconds! Thanks a lot to all you guys. Chao On Mon, Nov 19, 2012 at 3:08 PM, Thouis (Ray) Jones wrote: > On Sat, Nov 17, 2012 at 8:28 AM, Chao YUE wrote: > > Dear all, > > > > I need to make a linear contrast of the 2D numpy array "data" from an > > interval to another, the approach is: > > I have another two list: "base" & "target", then I check for each ndarray > > element "data[i,j]", > > if base[m] =< data[i,j] <= base[m+1], then it will be linearly > converted > > to be in the interval of (target[m], target[m+1]), > > using another function called "lintrans". > > > > > > #The way I do is to loop each row and column of the 2D array, and finally > > loop the intervals constituted by base list: > > > > for row in range(data.shape[0]): > > for col in range(data.shape[1]): > > for i in range(len(base)-1): > > if data[row,col]>=base[i] and data[row,col]<=base[i+1]: > > > > > data[row,col]=lintrans(data[row,col],(base[i],base[i+1]),(target[i],target[i+1])) > > break #use break to jump out of loop as the data have > to be > > ONLY transferred ONCE. > > > > > > Now the profiling result shows that most of the time has been used in > this > > loop over the array ("plot_array_transg"), > > and less time in calling the linear transformation fuction "lintrans": > > > > ncalls tottime percall cumtime percall > > filename:lineno(function) > > 18047 0.110 0.000 0.110 0.000 > > mathex.py:132(lintrans) > > 1 12.495 12.495 19.061 19.061 > > mathex.py:196(plot_array_transg) > > > > > > so is there anyway I can speed up this loop? Thanks for any > suggestions!! > > > > best, > > > > Chao > > If lintrans() is a linear interpolation, could you use interp? 
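For reference, a minimal sketch of the np.interp approach suggested here. The
"base", "target" and "data" below are illustrative stand-ins for the ones in the
original post; np.interp needs the breakpoints in increasing order, and it clamps
values outside base[0]..base[-1] to the end values of target, whereas the original
loop left such elements untouched:

import numpy as np

base = np.array([0.0, 1.0, 2.0, 4.0])        # breakpoints, must be increasing
target = np.array([0.0, 10.0, 15.0, 20.0])   # values the breakpoints map to
data = np.random.uniform(0.0, 4.0, size=(5, 5))

# piecewise-linear mapping of every element, no Python-level loop
data_trans = np.interp(data, base, target)   # same shape as data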
> > http://docs.scipy.org/doc/numpy/reference/generated/numpy.interp.html > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Tue Nov 20 12:03:03 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 20 Nov 2012 18:03:03 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> Message-ID: <50ABB7C7.2000509@molden.no> On 20.11.2012 15:38, David Cournapeau wrote: > I support this as well in principle for our binary release: one issue > is that we don't have the infrastructure on mac to build an installer > with multi-arch support, and we can't assume every mac out there has > SSE 3 or 4 available. Perhaps we could check the CPU architecture at run-time, and then load (or install) the correct extension module? OpenBLAS does have functions for testing if SSE 3 or 4 are available, which we could adapt: https://github.com/xianyi/OpenBLAS/blob/master/cpuid.h https://github.com/xianyi/OpenBLAS/blob/master/cpuid_x86.c cf. the last function in cpuid_x86.c called "get_sse". > We would need more testing first, as this is not a change to make lightly. I'm not saying it will be easy :) Sturla From pav at iki.fi Tue Nov 20 12:09:16 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 20 Nov 2012 19:09:16 +0200 Subject: [Numpy-discussion] Confused with qr decomposition function In-Reply-To: <50AB84A0.6030609@it.uu.se> References: <50AABE21.3000307@it.uu.se> <50AB84A0.6030609@it.uu.se> Message-ID: 20.11.2012 15:24, Virgil Stokes kirjoitti: [clip] > I am aware that they are both correct; but, if you are doing covariance > QR decomposition then you almost surely are interested in the positive > results (which is the default for MATLAB and most papers/books on this > subject). I get exactly identical results from MATLAB (R2011b), Octave, Numpy, and Scipy. Can you give an example matrix which behaves differently? Note that Numpy and Scipy return exactly what LAPACK's *GEQRF routines give, and Octave seems also to do this. -- Pauli Virtanen From cournape at gmail.com Tue Nov 20 12:22:53 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 20 Nov 2012 17:22:53 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50ABB7C7.2000509@molden.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> Message-ID: On Tue, Nov 20, 2012 at 5:03 PM, Sturla Molden wrote: > On 20.11.2012 15:38, David Cournapeau wrote: > >> I support this as well in principle for our binary release: one issue >> is that we don't have the infrastructure on mac to build an installer >> with multi-arch support, and we can't assume every mac out there has >> SSE 3 or 4 available. > > Perhaps we could check the CPU architecture at run-time, and then load > (or install) the correct extension module? 
OpenBLAS does have functions > for testing if SSE 3 or 4 are available, which we could adapt: Doing at runtime would be really hard. On windows, our installer does it at install time, and openblas should be pretty much the same than atlas there. We would need a solution on mac (where I really want to get away from Accelerate which has a very nasty limitation w.r.t GD vs fork, making multiprocessing with our official numpy binary essentially useless). David From vs at it.uu.se Tue Nov 20 13:03:17 2012 From: vs at it.uu.se (Virgil Stokes) Date: Tue, 20 Nov 2012 19:03:17 +0100 Subject: [Numpy-discussion] Confused with qr decomposition function In-Reply-To: References: <50AABE21.3000307@it.uu.se> <50AB84A0.6030609@it.uu.se> Message-ID: <50ABC5E5.5030002@it.uu.se> On 2012-11-20 18:09, Pauli Virtanen wrote: > 20.11.2012 15:24, Virgil Stokes kirjoitti: > [clip] >> I am aware that they are both correct; but, if you are doing covariance >> QR decomposition then you almost surely are interested in the positive >> results (which is the default for MATLAB and most papers/books on this >> subject). > I get exactly identical results from MATLAB (R2011b), Octave, Numpy, and > Scipy. Can you give an example matrix which behaves differently? > > Note that Numpy and Scipy return exactly what LAPACK's *GEQRF routines > give, and Octave seems also to do this. > Here are two that had opposite signs compared to MATLAB: array([[ 7.07106781e+02, -2.32273270e+04, -2.46173719e+04], [ -3.53553391e+01, -2.31566171e+04, -2.46173719e+04], [ 2.32273276e+04, -3.97555166e+00, -1.39003725e+03], [ 2.25202208e+04, -6.48214647e-04, -1.39004432e+03], [ 2.46527272e+04, 1.31933390e+03, 1.66675481e-19], [ 2.46173719e+04, 1.39401993e+03, -7.07106781e-03], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]]) array([[ 3.66711160e+04, 3.36224799e+04, 7.60569110e+02, -1.19158202e+03], [ 3.24652853e+03, 0.00000000e+00, -2.32192233e+04, -2.46276301e+04], [ -1.71055253e+04, 0.00000000e+00, 0.00000000e+00, -8.47443620e+01], [ 1.15905933e+04, -3.36224799e+04, -7.60569110e+02, 1.19158202e+03], [ -1.72015604e+04, 0.00000000e+00, 2.32192233e+04, 2.46276301e+04], [ -1.72015604e+04, 0.00000000e+00, 0.00000000e+00, 8.47443620e+01], [ 3.00000000e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]]) From pav at iki.fi Tue Nov 20 13:11:42 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 20 Nov 2012 20:11:42 +0200 Subject: [Numpy-discussion] Confused with qr decomposition function In-Reply-To: <50ABC5E5.5030002@it.uu.se> References: <50AABE21.3000307@it.uu.se> <50AB84A0.6030609@it.uu.se> <50ABC5E5.5030002@it.uu.se> Message-ID: 20.11.2012 20:03, Virgil Stokes kirjoitti: [clip] > Here are two that had opposite signs compared to MATLAB: [clip] Interestingly, I get the same signs also for these matrices. This probably means that the issue is specific to the Windows binaries. Probably, the ATLAS+LAPACK combination shipped with it is somehow strange. -- Pauli Virtanen From ndbecker2 at gmail.com Tue Nov 20 14:11:38 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 20 Nov 2012 14:11:38 -0500 Subject: [Numpy-discussion] confused about univaritespline Message-ID: I don't understand why the plot of the spline continues on a negative slope at the end, but the plot of the integral of it flattens. 
----------------------------------------------------- import numpy as np import matplotlib.pyplot as plt ibo = np.array ((12, 14, 16, 18, 20, 22, 24, 26, 28, 29, 29.8, 30.2)) gain_deriv = np.array ((0, 0, 0, 0, 0, 0, .2, .4, .5, .5, 0, -2)) import scipy.interpolate s = scipy.interpolate.UnivariateSpline(ibo, gain_deriv, s=0.1) xs = linspace(12, 31, 100) gain = np.vectorize (lambda x: s.integral (12, x)) (xs) plot (xs, s(xs)) plot (xs, gain) From pav at iki.fi Tue Nov 20 14:17:13 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 20 Nov 2012 21:17:13 +0200 Subject: [Numpy-discussion] confused about univaritespline In-Reply-To: References: Message-ID: 20.11.2012 21:11, Neal Becker kirjoitti: > import numpy as np > import matplotlib.pyplot as plt > > ibo = np.array ((12, 14, 16, 18, 20, 22, 24, 26, 28, 29, 29.8, 30.2)) > gain_deriv = np.array ((0, 0, 0, 0, 0, 0, .2, .4, .5, .5, 0, -2)) > > import scipy.interpolate > s = scipy.interpolate.UnivariateSpline(ibo, gain_deriv, s=0.1) > > xs = linspace(12, 31, 100) > gain = np.vectorize (lambda x: s.integral (12, x)) (xs) > > plot (xs, s(xs)) > plot (xs, gain) >From fitpack/splint.f: c s(x) is considered to be identically zero outside c the interval (t(k+1),t(n-k)). Not documented in the Python wrapper, though. -- Pauli Virtanen From ndbecker2 at gmail.com Tue Nov 20 14:30:04 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 20 Nov 2012 14:30:04 -0500 Subject: [Numpy-discussion] confused about univaritespline References: Message-ID: Pauli Virtanen wrote: > 20.11.2012 21:11, Neal Becker kirjoitti: >> import numpy as np >> import matplotlib.pyplot as plt >> >> ibo = np.array ((12, 14, 16, 18, 20, 22, 24, 26, 28, 29, 29.8, 30.2)) >> gain_deriv = np.array ((0, 0, 0, 0, 0, 0, .2, .4, .5, .5, 0, -2)) >> >> import scipy.interpolate >> s = scipy.interpolate.UnivariateSpline(ibo, gain_deriv, s=0.1) >> >> xs = linspace(12, 31, 100) >> gain = np.vectorize (lambda x: s.integral (12, x)) (xs) >> >> plot (xs, s(xs)) >> plot (xs, gain) > >>From fitpack/splint.f: > > c s(x) is considered to be identically zero outside > c the interval (t(k+1),t(n-k)). > > Not documented in the Python wrapper, though. > Ah, thanks! Bitten one more by the fact that spline is not meant for extrapolation, which here I had done inadvertantly. IMO, if spline isn't intended to be used to extrapolate, I'd prefer it to throw. From d.s.seljebotn at astro.uio.no Tue Nov 20 14:35:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 20 Nov 2012 20:35:51 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> Message-ID: <50ABDB97.4050500@astro.uio.no> On 11/20/2012 06:22 PM, David Cournapeau wrote: > On Tue, Nov 20, 2012 at 5:03 PM, Sturla Molden wrote: >> On 20.11.2012 15:38, David Cournapeau wrote: >> >>> I support this as well in principle for our binary release: one issue >>> is that we don't have the infrastructure on mac to build an installer >>> with multi-arch support, and we can't assume every mac out there has >>> SSE 3 or 4 available. >> >> Perhaps we could check the CPU architecture at run-time, and then load >> (or install) the correct extension module? OpenBLAS does have functions >> for testing if SSE 3 or 4 are available, which we could adapt: > > Doing at runtime would be really hard. On windows, our installer does > it at install time, and openblas should be pretty much the same than > atlas there. 
Is there a specific reason it *has* to happen at compile-time? I'd think one could do something like just shipping a lot of separate Python extensions which are really just the same module linked with different versions of the library, and then if cpu_is_nehalem: import blas_nehalem as blas elif ... I'm asking a question about whether this would work in principle, I realize it would perhaps not fit that well in the current NumPy codebase. One thing is one would want to propagate the BLAS/LAPACK to other extensions through a function pointer table much like the current NumPy C API. Dag Sverre From d.s.seljebotn at astro.uio.no Tue Nov 20 14:36:18 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 20 Nov 2012 20:36:18 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50ABDB97.4050500@astro.uio.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> Message-ID: <50ABDBB2.1050101@astro.uio.no> On 11/20/2012 08:35 PM, Dag Sverre Seljebotn wrote: > On 11/20/2012 06:22 PM, David Cournapeau wrote: >> On Tue, Nov 20, 2012 at 5:03 PM, Sturla Molden wrote: >>> On 20.11.2012 15:38, David Cournapeau wrote: >>> >>>> I support this as well in principle for our binary release: one issue >>>> is that we don't have the infrastructure on mac to build an installer >>>> with multi-arch support, and we can't assume every mac out there has >>>> SSE 3 or 4 available. >>> >>> Perhaps we could check the CPU architecture at run-time, and then load >>> (or install) the correct extension module? OpenBLAS does have functions >>> for testing if SSE 3 or 4 are available, which we could adapt: >> >> Doing at runtime would be really hard. On windows, our installer does >> it at install time, and openblas should be pretty much the same than >> atlas there. > > Is there a specific reason it *has* to happen at compile-time? I'd think Sorry, I meant install-time. DS > one could do something like just shipping a lot of separate Python > extensions which are really just the same module linked with different > versions of the library, and then > > if cpu_is_nehalem: > import blas_nehalem as blas > elif ... > > I'm asking a question about whether this would work in principle, I > realize it would perhaps not fit that well in the current NumPy codebase. > > One thing is one would want to propagate the BLAS/LAPACK to other > extensions through a function pointer table much like the current NumPy > C API. > > Dag Sverre From gnurser at gmail.com Tue Nov 20 14:52:46 2012 From: gnurser at gmail.com (George Nurser) Date: Tue, 20 Nov 2012 19:52:46 +0000 Subject: [Numpy-discussion] Building numpy with OpenBLAS using bento Message-ID: Hi, I've tried to build numpy with OpenBLAS following on from the blog http://cournape.wordpress.com/2012/10/10/notes-on-building-numpyscipy-with-openblas/ I've cloned and installed an up to date version of bento from git. Problem is that the bentomaker command doesn't seem to know about the --with-blas-lapack-libdir option. The only place this option is defined in the code seems to be in the blas_lapack.py routine .... but I can't see how to use this routine -- it isn't imported into any other file. (I've tried sending this smessage to the bento list, but it doesn't seem to be appearing on it as far as I can see.) Best regards, George Nurser -------------- next part -------------- An HTML attachment was scrubbed... 
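Returning to the multi-arch idea sketched above, here is a minimal sketch of what
picking the best prebuilt extension at run time could look like. All of the module
names are hypothetical (nothing like this exists in numpy), the CPU detection is
assumed to happen elsewhere, and it needs Python 2.7 or later for importlib:

import importlib

def load_best_blas(detected_arch):
    # Try the most specific build first, then fall back to progressively
    # more conservative instruction sets.
    candidates = ['_blas_' + detected_arch, '_blas_sse3', '_blas_sse2', '_blas_generic']
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no prebuilt BLAS extension found")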
URL: From heng at cantab.net Tue Nov 20 15:52:54 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 20 Nov 2012 20:52:54 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50ABDB97.4050500@astro.uio.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> Message-ID: <1353444774.3664.1.camel@farnsworth> On Tue, 2012-11-20 at 20:35 +0100, Dag Sverre Seljebotn wrote: > Is there a specific reason it *has* to happen at compile-time? I'd > think > one could do something like just shipping a lot of separate Python > extensions which are really just the same module linked with > different > versions of the library, and then > > if cpu_is_nehalem: > import blas_nehalem as blas > elif ... > > I'm asking a question about whether this would work in principle, I > realize it would perhaps not fit that well in the current NumPy > codebase. I was wondering this in the context of a previous discussion. Could we not have an autotune module, that just runs a load of test scripts and picks the best library to link against? Cheers, Henry From charlesr.harris at gmail.com Tue Nov 20 17:29:27 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 20 Nov 2012 15:29:27 -0700 Subject: [Numpy-discussion] confused about univaritespline In-Reply-To: References: Message-ID: On Tue, Nov 20, 2012 at 12:30 PM, Neal Becker wrote: > Pauli Virtanen wrote: > > > 20.11.2012 21:11, Neal Becker kirjoitti: > >> import numpy as np > >> import matplotlib.pyplot as plt > >> > >> ibo = np.array ((12, 14, 16, 18, 20, 22, 24, 26, 28, 29, 29.8, 30.2)) > >> gain_deriv = np.array ((0, 0, 0, 0, 0, 0, .2, .4, .5, .5, 0, > -2)) > >> > >> import scipy.interpolate > >> s = scipy.interpolate.UnivariateSpline(ibo, gain_deriv, s=0.1) > >> > >> xs = linspace(12, 31, 100) > >> gain = np.vectorize (lambda x: s.integral (12, x)) (xs) > >> > >> plot (xs, s(xs)) > >> plot (xs, gain) > > > >>From fitpack/splint.f: > > > > c s(x) is considered to be identically zero outside > > c the interval (t(k+1),t(n-k)). > > > > Not documented in the Python wrapper, though. > > > > Ah, thanks! Bitten one more by the fact that spline is not meant for > extrapolation, which here I had done inadvertantly. > > IMO, if spline isn't intended to be used to extrapolate, I'd prefer it to > throw. > That can be tricky at the ends due to round off errors. I added a flag to splev for this back when: outine splev evaluates in a number of points x(i),i=1,2,...,m c a spline s(x) of degree k, given in its b-spline representation. c c calling sequence: c call splev(t,n,c,k,x,y,m,e,ier) c c input parameters: c t : array,length n, which contains the position of the knots. c n : integer, giving the total number of knots of s(x). c c : array,length n, which contains the b-spline coefficients. c k : integer, giving the degree of s(x). c x : array,length m, which contains the points where s(x) must c be evaluated. c m : integer, giving the number of points where s(x) must be c evaluated. c e : integer, if 0 the spline is extrapolated from the end c spans for points not in the support, if 1 the spline c evaluates to zero for those points, and if 2 ier is set to c 1 and the subroutine returns. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
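A small script illustrating the end behaviour discussed in this thread, reusing the
numbers from the original post. Evaluating the spline past the last data point
extrapolates the last polynomial piece (the default end behaviour of splev), while
the integral treats the spline as identically zero outside the data range, which is
why the two curves in the original post disagree past 30.2:

import numpy as np
import scipy.interpolate

ibo = np.array((12, 14, 16, 18, 20, 22, 24, 26, 28, 29, 29.8, 30.2))
gain_deriv = np.array((0, 0, 0, 0, 0, 0, .2, .4, .5, .5, 0, -2))
s = scipy.interpolate.UnivariateSpline(ibo, gain_deriv, s=0.1)

print(s(31))                  # extrapolated, generally nonzero
print(s.integral(12, 30.2))   # integral up to the last data point
print(s.integral(12, 31))     # same value: the spline counts as zero beyond 30.2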
URL: From cournape at gmail.com Tue Nov 20 19:38:48 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 21 Nov 2012 00:38:48 +0000 Subject: [Numpy-discussion] Building numpy with OpenBLAS using bento In-Reply-To: References: Message-ID: On Tue, Nov 20, 2012 at 7:52 PM, George Nurser wrote: > Hi, > I've tried to build numpy with OpenBLAS following on from the blog > http://cournape.wordpress.com/2012/10/10/notes-on-building-numpyscipy-with-openblas/ > > I've cloned and installed an up to date version of bento from git. > > Problem is that the bentomaker command doesn't seem to know about the > --with-blas-lapack-libdir option. The only place this option is defined in > the code seems to be in the blas_lapack.py routine .... but I can't see how > to use this routine -- it isn't imported into any other file. Can you describe the exact steps you followed ? For example, just cloning bento won't get you far enough, you need to have waf installed as well, and you get an explicit error message otherwise. > > (I've tried sending this smessage to the bento list, but it doesn't seem to > be appearing on it as far as I can see.) Yeah, it looks like librelist is fundamentally broken, many messages never appear. I need to move to something sane. cheers, David From cournape at gmail.com Tue Nov 20 19:44:37 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 21 Nov 2012 00:44:37 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <1353444774.3664.1.camel@farnsworth> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> Message-ID: On Tue, Nov 20, 2012 at 8:52 PM, Henry Gomersall wrote: > On Tue, 2012-11-20 at 20:35 +0100, Dag Sverre Seljebotn wrote: >> Is there a specific reason it *has* to happen at compile-time? I'd >> think >> one could do something like just shipping a lot of separate Python >> extensions which are really just the same module linked with >> different >> versions of the library, and then >> >> if cpu_is_nehalem: >> import blas_nehalem as blas >> elif ... >> >> I'm asking a question about whether this would work in principle, I >> realize it would perhaps not fit that well in the current NumPy >> codebase. > > I was wondering this in the context of a previous discussion. Could we > not have an autotune module, that just runs a load of test scripts and > picks the best library to link against? You can't easily relink at install time, so what you want is pick up the best library at runtime. It is more or less impossible to do this in a portable way (e.g. there is no solution that I know of on windows < windows 2008, short of requiring to install some dlls with admin privileges). cheers, David From cournape at gmail.com Tue Nov 20 19:43:02 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 21 Nov 2012 00:43:02 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? 
In-Reply-To: <50ABDB97.4050500@astro.uio.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> Message-ID: On Tue, Nov 20, 2012 at 7:35 PM, Dag Sverre Seljebotn wrote: > On 11/20/2012 06:22 PM, David Cournapeau wrote: >> On Tue, Nov 20, 2012 at 5:03 PM, Sturla Molden wrote: >>> On 20.11.2012 15:38, David Cournapeau wrote: >>> >>>> I support this as well in principle for our binary release: one issue >>>> is that we don't have the infrastructure on mac to build an installer >>>> with multi-arch support, and we can't assume every mac out there has >>>> SSE 3 or 4 available. >>> >>> Perhaps we could check the CPU architecture at run-time, and then load >>> (or install) the correct extension module? OpenBLAS does have functions >>> for testing if SSE 3 or 4 are available, which we could adapt: >> >> Doing at runtime would be really hard. On windows, our installer does >> it at install time, and openblas should be pretty much the same than >> atlas there. > > Is there a specific reason it *has* to happen at compile-time? I'd think > one could do something like just shipping a lot of separate Python > extensions which are really just the same module linked with different > versions of the library, and then > > if cpu_is_nehalem: > import blas_nehalem as blas > elif ... For this to work, we would need to to compile not just the library itself for each arch, but also every extension that link to it (e.g. for blas, every extension linking to blas would need to be linked against each arch library). That would be a nightmare to manage. > > One thing is one would want to propagate the BLAS/LAPACK to other > extensions through a function pointer table much like the current NumPy > C API. That would be the better solution, but doing this in such a way that would work on every platform is a significant effort. David From tjligocki at lbl.gov Wed Nov 21 04:12:35 2012 From: tjligocki at lbl.gov (Terry J. Ligocki) Date: Wed, 21 Nov 2012 01:12:35 -0800 Subject: [Numpy-discussion] Crash using "reshape"... Message-ID: <50AC9B03.50504@lbl.gov> I am having a problem with "reshape" crashing: > python Python 2.6.4 (r264:75706, Jan 16 2010, 21:11:47) [GCC 4.3.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> numpy.version.version '1.6.2' >>> npData = numpy.ones([701,701,7899],dtype=numpy.dtype('b')) >>> npDataSubset = npData[[slice(0,700),slice(0,700),slice(0,5000)]] >>> npDataOutput = npDataSubset.reshape([700*700*5000],order='F') Segmentation fault If I change the "5000" to a "4000", everything is fine. I'm not running out of memory - my system had 48 GB of memory and nothing else is using a significant portion of this memory. Note: 700x700x4000 = 1,960,000,000 < 2^31 and 700x700x5000 = 2450000000 > 2^31. I suspect somewhere in the underlying code there is a signed 32-bit integer being used for an index/pointer offset (this is running on a 64-bit machine). I did some searching of the archives and didn't find a match for this problem. Thank you for any and all help! Terry J. (Ligocki, tjligocki at lbl.gov) ------------------------------------------------------------------------ *Wishes * /When you wish upon a falling star, your dreams can come true. Unless it's really a meteorite hurtling to the Earth which will destroy all life. Then you're pretty much hosed no matter what you wish for. Unless it's death by meteor. / (Despair, Inc.) 
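A quick sanity check of the arithmetic above and of the platform's pointer-sized
integer type (run under the same interpreter that shows the crash):

import numpy as np

print(700 * 700 * 4000)           # 1960000000, fits in a signed 32-bit int
print(700 * 700 * 5000)           # 2450000000, larger than 2**31 - 1
print(np.iinfo(np.int32).max)     # 2147483647
print(np.intp)                    # should be the 64-bit integer type on a 64-bit build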
-------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Wed Nov 21 05:14:58 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 21 Nov 2012 11:14:58 +0100 Subject: [Numpy-discussion] Crash using "reshape"... In-Reply-To: <50AC9B03.50504@lbl.gov> References: <50AC9B03.50504@lbl.gov> Message-ID: <50ACA9A2.3070707@continuum.io> On 11/21/12 10:12 AM, Terry J. Ligocki wrote: > I am having a problem with "reshape" crashing: > > > python > Python 2.6.4 (r264:75706, Jan 16 2010, 21:11:47) > [GCC 4.3.2] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy > >>> numpy.version.version > '1.6.2' > >>> npData = numpy.ones([701,701,7899],dtype=numpy.dtype('b')) > >>> npDataSubset = npData[[slice(0,700),slice(0,700),slice(0,5000)]] > >>> npDataOutput = npDataSubset.reshape([700*700*5000],order='F') > Segmentation fault > > If I change the "5000" to a "4000", everything is fine. I'm not > running out of memory - my system had 48 GB of memory and nothing else > is using a significant portion of this memory. > > Note: 700x700x4000 = 1,960,000,000 < 2^31 and 700x700x5000 = > 2450000000 > 2^31. I suspect somewhere in the underlying code there > is a signed 32-bit integer being used for an index/pointer offset > (this is running on a 64-bit machine). Yes, looks like a 32-bit issue. Sometimes you can have 32-bit software installed in 64-bit machines, so that might be your problem. What's the equivalent of numpy.intp in your machine? Mine is: In []: import numpy as np In []: np.intp Out[]: numpy.int64 If you see 'numpy.int32' here then that is the problem. -- Francesc Alted From heng at cantab.net Wed Nov 21 05:16:29 2012 From: heng at cantab.net (Henry Gomersall) Date: Wed, 21 Nov 2012 10:16:29 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> Message-ID: <1353492989.3664.2.camel@farnsworth> On Wed, 2012-11-21 at 00:44 +0000, David Cournapeau wrote: > On Tue, Nov 20, 2012 at 8:52 PM, Henry Gomersall > wrote: > > On Tue, 2012-11-20 at 20:35 +0100, Dag Sverre Seljebotn wrote: > >> Is there a specific reason it *has* to happen at compile-time? I'd > >> think > >> one could do something like just shipping a lot of separate Python > >> extensions which are really just the same module linked with > >> different > >> versions of the library, and then > >> > >> if cpu_is_nehalem: > >> import blas_nehalem as blas > >> elif ... > >> > >> I'm asking a question about whether this would work in principle, I > >> realize it would perhaps not fit that well in the current NumPy > >> codebase. > > > > I was wondering this in the context of a previous discussion. Could > we > > not have an autotune module, that just runs a load of test scripts > and > > picks the best library to link against? > > You can't easily relink at install time, so what you want is pick up > the best library at runtime. It is more or less impossible to do this > in a portable way (e.g. there is no solution that I know of on windows > < windows 2008, short of requiring to install some dlls with admin > privileges). I meant it as Dag did, at install time ;) From tjligocki at lbl.gov Wed Nov 21 05:25:44 2012 From: tjligocki at lbl.gov (Terry J. Ligocki) Date: Wed, 21 Nov 2012 02:25:44 -0800 Subject: [Numpy-discussion] Crash using "reshape"... 
In-Reply-To: <50ACA9A2.3070707@continuum.io>
References: <50AC9B03.50504@lbl.gov> <50ACA9A2.3070707@continuum.io>
Message-ID: <50ACAC28.30401@lbl.gov>

I just checked, "numpy.intp" is "<type 'numpy.int64'>" in my installation of
Python and NumPy. It was a good thing to check but it looks like there still
may be a signed 32-bit integer somewhere in the code (or my build(s))...

Terry J.
> On 11/21/12 10:12 AM, Terry J. Ligocki wrote:
>> I am having a problem with "reshape" crashing:
>>
>> > python
>> Python 2.6.4 (r264:75706, Jan 16 2010, 21:11:47)
>> [GCC 4.3.2] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import numpy
>> >>> numpy.version.version
>> '1.6.2'
>> >>> npData = numpy.ones([701,701,7899],dtype=numpy.dtype('b'))
>> >>> npDataSubset = npData[[slice(0,700),slice(0,700),slice(0,5000)]]
>> >>> npDataOutput = npDataSubset.reshape([700*700*5000],order='F')
>> Segmentation fault
>>
>> If I change the "5000" to a "4000", everything is fine. I'm not
>> running out of memory - my system had 48 GB of memory and nothing else
>> is using a significant portion of this memory.
>>
>> Note: 700x700x4000 = 1,960,000,000 < 2^31 and 700x700x5000 =
>> 2450000000 > 2^31. I suspect somewhere in the underlying code there
>> is a signed 32-bit integer being used for an index/pointer offset
>> (this is running on a 64-bit machine).
> Yes, looks like a 32-bit issue. Sometimes you can have 32-bit software
> installed in 64-bit machines, so that might be your problem. What's the
> equivalent of numpy.intp in your machine? Mine is:
>
> In []: import numpy as np
>
> In []: np.intp
> Out[]: numpy.int64
>
> If you see 'numpy.int32' here then that is the problem.

------------------------------------------------------------------------
/She's got to be this town's best mess
But it ain't nothin' that her face would suggest /
- God Don't Make Lonely Girls (The Wallflowers)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From francesc at continuum.io Wed Nov 21 05:41:06 2012
From: francesc at continuum.io (Francesc Alted)
Date: Wed, 21 Nov 2012 11:41:06 +0100
Subject: [Numpy-discussion] Crash using "reshape"...
In-Reply-To: <50ACAC28.30401@lbl.gov>
References: <50AC9B03.50504@lbl.gov> <50ACA9A2.3070707@continuum.io> <50ACAC28.30401@lbl.gov>
Message-ID: <50ACAFC2.3020701@continuum.io>

On 11/21/12 11:25 AM, Terry J. Ligocki wrote:
> I just checked, "numpy.intp" is "<type 'numpy.int64'>" in my
> installation of Python and NumPy. It was a good thing to check but it
> looks like there still may be a signed 32-bit integer somewhere in
> the code (or my build(s))...

Okay. I can reproduce that too (using 1.6.1). Could you please file a ticket for this? Smells like a bug to me.

-- Francesc Alted

From cournape at gmail.com Wed Nov 21 05:49:35 2012
From: cournape at gmail.com (David Cournapeau)
Date: Wed, 21 Nov 2012 10:49:35 +0000
Subject: [Numpy-discussion] Use OpenBLAS for the binary releases?
In-Reply-To: <1353492989.3664.2.camel@farnsworth>
References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth>
Message-ID:

That's already what we do (on windows anyway). The binary installer contains multiple arch binaries, and we pick the best one. Le 21 nov.
2012 10:16, "Henry Gomersall" a ?crit : > On Wed, 2012-11-21 at 00:44 +0000, David Cournapeau wrote: > > On Tue, Nov 20, 2012 at 8:52 PM, Henry Gomersall > > wrote: > > > On Tue, 2012-11-20 at 20:35 +0100, Dag Sverre Seljebotn wrote: > > >> Is there a specific reason it *has* to happen at compile-time? I'd > > >> think > > >> one could do something like just shipping a lot of separate Python > > >> extensions which are really just the same module linked with > > >> different > > >> versions of the library, and then > > >> > > >> if cpu_is_nehalem: > > >> import blas_nehalem as blas > > >> elif ... > > >> > > >> I'm asking a question about whether this would work in principle, I > > >> realize it would perhaps not fit that well in the current NumPy > > >> codebase. > > > > > > I was wondering this in the context of a previous discussion. Could > > we > > > not have an autotune module, that just runs a load of test scripts > > and > > > picks the best library to link against? > > > > You can't easily relink at install time, so what you want is pick up > > the best library at runtime. It is more or less impossible to do this > > in a portable way (e.g. there is no solution that I know of on windows > > < windows 2008, short of requiring to install some dlls with admin > > privileges). > > I meant it as Dag did, at install time ;) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Wed Nov 21 05:56:54 2012 From: heng at cantab.net (Henry Gomersall) Date: Wed, 21 Nov 2012 10:56:54 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> Message-ID: <1353495414.3664.3.camel@farnsworth> On Wed, 2012-11-21 at 10:49 +0000, David Cournapeau wrote: > That's already what we do (on.windows anyway). The binary installer > contains multiple arch binaries, and we pick the bewt one. Interesting. Does it (or can it) extend to different algorithmic implementations? Henry From cournape at gmail.com Wed Nov 21 06:44:58 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 21 Nov 2012 11:44:58 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <1353495414.3664.3.camel@farnsworth> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> Message-ID: On Wed, Nov 21, 2012 at 10:56 AM, Henry Gomersall wrote: > On Wed, 2012-11-21 at 10:49 +0000, David Cournapeau wrote: >> That's already what we do (on.windows anyway). The binary installer >> contains multiple arch binaries, and we pick the bewt one. > > Interesting. Does it (or can it) extend to different algorithmic > implementations? It is rather simple, in that it expects a different, full installer for each new combination. I would rather not complicate things too much there, as it is working well for its purpose and is not easy to modify. We would need something similar on Mac to make openblas a good replacement on that platform. 
David From sturla at molden.no Wed Nov 21 07:00:41 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Nov 2012 13:00:41 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> Message-ID: <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> But do we need a binary OpenBLAS on Mac? Isn't there an accelerate framework with BLAS and LAPACK on that platform? Sturla Sendt fra min iPad Den 21. nov. 2012 kl. 12:44 skrev David Cournapeau : > On Wed, Nov 21, 2012 at 10:56 AM, Henry Gomersall wrote: >> On Wed, 2012-11-21 at 10:49 +0000, David Cournapeau wrote: >>> That's already what we do (on.windows anyway). The binary installer >>> contains multiple arch binaries, and we pick the bewt one. >> >> Interesting. Does it (or can it) extend to different algorithmic >> implementations? > > It is rather simple, in that it expects a different, full installer > for each new combination. I would rather not complicate things too > much there, as it is working well for its purpose and is not easy to > modify. We would need something similar on Mac to make openblas a good > replacement on that platform. > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Wed Nov 21 09:01:15 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 21 Nov 2012 14:01:15 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> Message-ID: On Wed, Nov 21, 2012 at 12:00 PM, Sturla Molden wrote: > But do we need a binary OpenBLAS on Mac? Isn't there an accelerate framework with BLAS and LAPACK on that platform? Because of this: https://gist.github.com/2027412 Numpy + accelerate is essentially unusable with multiprocessing. David From sturla at molden.no Wed Nov 21 09:31:35 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Nov 2012 15:31:35 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> Message-ID: <50ACE5C7.2030703@molden.no> On 21.11.2012 15:01, David Cournapeau wrote: > On Wed, Nov 21, 2012 at 12:00 PM, Sturla Molden wrote: >> But do we need a binary OpenBLAS on Mac? Isn't there an accelerate framework with BLAS and LAPACK on that platform? > > Because of this: https://gist.github.com/2027412 > > Numpy + accelerate is essentially unusable with multiprocessing. See this too: http://stackoverflow.com/questions/8748790/memory-leak-in-dgemm If they put a memory leak in DGEMM, basically most of LAPACK is useless as well. That is very sloppy of Apple. 
And how could they fail to detect that DGEMM crashes on fork()? Amazing... It doesn't seem the Grand Central Dispatch is very robust then... Sturla From cournape at gmail.com Wed Nov 21 09:37:25 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 21 Nov 2012 14:37:25 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50ACE5C7.2030703@molden.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> <50ACE5C7.2030703@molden.no> Message-ID: On Wed, Nov 21, 2012 at 2:31 PM, Sturla Molden wrote: > On 21.11.2012 15:01, David Cournapeau wrote: >> On Wed, Nov 21, 2012 at 12:00 PM, Sturla Molden wrote: >>> But do we need a binary OpenBLAS on Mac? Isn't there an accelerate framework with BLAS and LAPACK on that platform? >> >> Because of this: https://gist.github.com/2027412 >> >> Numpy + accelerate is essentially unusable with multiprocessing. > > See this too: > > http://stackoverflow.com/questions/8748790/memory-leak-in-dgemm > > If they put a memory leak in DGEMM, basically most of LAPACK is useless > as well. That is very sloppy of Apple. > > And how could they fail to detect that DGEMM crashes on fork()? Amazing... > > It doesn't seem the Grand Central Dispatch is very robust then... I don't think it has anything to do with robustness as much as multiprocessing doing things not supported by GCD. The answer we got from Apple when we looked into this with Olivier Grisel at pycon was: """ For API outside of POSIX, including GCD and technologies like Accelerate, we do not support usage on both sides of a fork(). For this reason among others, use of fork() without exec is discouraged in general in processes that use layers above POSIX. We recommend that you either restrict usage of blas to the parent or the child process but not both, or that you switch to using GCD or pthreads rather than forking to create parallelism. Please also see the following in the fork(2) manpage: CAVEATS There are limits to what you can do in the child process. To be totally safe you should restrict your- self to only executing async-signal safe operations until such time as one of the exec functions is called. All APIs, including global data symbols, in any framework or library should be assumed to be unsafe after a fork() unless explicitly documented to be safe or async-signal safe. If you need to use these frameworks in the child process, you must exec. In this situation it is reasonable to exec your- self. """ cheers, David From sebastian at sipsolutions.net Wed Nov 21 09:42:17 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 21 Nov 2012 15:42:17 +0100 Subject: [Numpy-discussion] Crash using "reshape"... In-Reply-To: <50AC9B03.50504@lbl.gov> References: <50AC9B03.50504@lbl.gov> Message-ID: <1353508937.2529.6.camel@sebastian-laptop> Hey, On Wed, 2012-11-21 at 01:12 -0800, Terry J. Ligocki wrote: > I am having a problem with "reshape" crashing: > > python > Python 2.6.4 (r264:75706, Jan 16 2010, 21:11:47) > [GCC 4.3.2] on linux2 > Type "help", "copyright", "credits" or "license" for more > information. 
> >>> import numpy > >>> numpy.version.version > '1.6.2' > >>> npData = numpy.ones([701,701,7899],dtype=numpy.dtype('b')) > >>> npDataSubset = > npData[[slice(0,700),slice(0,700),slice(0,5000)]] > >>> npDataOutput = > npDataSubset.reshape([700*700*5000],order='F') > Segmentation fault > If I change the "5000" to a "4000", everything is fine. I'm not > running out of memory - my system had 48 GB of memory and nothing else > is using a significant portion of this memory. > > Note: 700x700x4000 = 1,960,000,000 < 2^31 and 700x700x5000 = > 2450000000 > 2^31. I suspect somewhere in the underlying code there > is a signed 32-bit integer being used for an index/pointer offset > (this is running on a 64-bit machine). yes, int is used for npy_intp in one occasione. I have created a PR that fixes the issue: https://github.com/numpy/numpy/pull/2754 Regards, Sebastian > I did some searching of the archives and didn't find a match for this > problem. Thank you for any and all help! > > Terry J. (Ligocki, tjligocki at lbl.gov) > > ______________________________________________________________________ > Wishes > > When you wish upon a falling star, your dreams can come true. > Unless it's really a meteorite hurtling to the Earth which will > destroy all life. > Then you're pretty much hosed no matter what you wish for. > Unless it's death by meteor. > > (Despair, Inc.) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Wed Nov 21 09:45:09 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Nov 2012 15:45:09 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> <50ACE5C7.2030703@molden.no> Message-ID: <50ACE8F5.2040300@molden.no> Ok, so using BLAS on each side of a fork requires an exec. That means multiprocessing on Mac must behave as it does on Windows, with the restrictions there. I.e. no fork, no closures as target kwarg to Process, etc. But that is also why multiprocessing feels crippled on Windows, and why the GIL sucks 10x more on Windows than Mac or Linux. Sturla On 21.11.2012 15:37, David Cournapeau wrote: > On Wed, Nov 21, 2012 at 2:31 PM, Sturla Molden wrote: >> On 21.11.2012 15:01, David Cournapeau wrote: >>> On Wed, Nov 21, 2012 at 12:00 PM, Sturla Molden wrote: >>>> But do we need a binary OpenBLAS on Mac? Isn't there an accelerate framework with BLAS and LAPACK on that platform? >>> >>> Because of this: https://gist.github.com/2027412 >>> >>> Numpy + accelerate is essentially unusable with multiprocessing. >> >> See this too: >> >> http://stackoverflow.com/questions/8748790/memory-leak-in-dgemm >> >> If they put a memory leak in DGEMM, basically most of LAPACK is useless >> as well. That is very sloppy of Apple. >> >> And how could they fail to detect that DGEMM crashes on fork()? Amazing... >> >> It doesn't seem the Grand Central Dispatch is very robust then... > > I don't think it has anything to do with robustness as much as > multiprocessing doing things not supported by GCD. 
The answer we got > from Apple when we looked into this with Olivier Grisel at pycon was: > > """ > For API outside of POSIX, including GCD and technologies like > Accelerate, we do not support usage on both sides of a fork(). For > this reason among others, use of fork() without exec is discouraged in > general in processes that use layers above POSIX. > > We recommend that you either restrict usage of blas to the parent or > the child process but not both, or that you switch to using GCD or > pthreads rather than forking to create parallelism. > > Please also see the following in the fork(2) manpage: > > CAVEATS > There are limits to what you can do in the child process. To be > totally safe you should restrict your- > self to only executing async-signal safe operations until such > time as one of the exec functions is > called. All APIs, including global data symbols, in any > framework or library should be assumed to be > unsafe after a fork() unless explicitly documented to be safe or > async-signal safe. If you need to use > these frameworks in the child process, you must exec. In this > situation it is reasonable to exec your- > self. > """ > > cheers, > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Wed Nov 21 09:52:29 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Nov 2012 15:52:29 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50ACE8F5.2040300@molden.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> <50ACE5C7.2030703@molden.no> <50ACE8F5.2040300@molden.no> Message-ID: <50ACEAAD.4090409@molden.no> But if this issue is in the GCD, it will probably affect other applications that uses the GCD and fork without exec as well. Unless you are certain the GCD is not used, a fork would never be safe without an exec. Sturla On 21.11.2012 15:45, Sturla Molden wrote: > Ok, so using BLAS on each side of a fork requires an exec. That means > multiprocessing on Mac must behave as it does on Windows, with the > restrictions there. I.e. no fork, no closures as target kwarg to > Process, etc. But that is also why multiprocessing feels crippled on > Windows, and why the GIL sucks 10x more on Windows than Mac or Linux. > > Sturla > > > On 21.11.2012 15:37, David Cournapeau wrote: >> On Wed, Nov 21, 2012 at 2:31 PM, Sturla Molden wrote: >>> On 21.11.2012 15:01, David Cournapeau wrote: >>>> On Wed, Nov 21, 2012 at 12:00 PM, Sturla Molden wrote: >>>>> But do we need a binary OpenBLAS on Mac? Isn't there an accelerate framework with BLAS and LAPACK on that platform? >>>> >>>> Because of this: https://gist.github.com/2027412 >>>> >>>> Numpy + accelerate is essentially unusable with multiprocessing. >>> >>> See this too: >>> >>> http://stackoverflow.com/questions/8748790/memory-leak-in-dgemm >>> >>> If they put a memory leak in DGEMM, basically most of LAPACK is useless >>> as well. That is very sloppy of Apple. >>> >>> And how could they fail to detect that DGEMM crashes on fork()? Amazing... >>> >>> It doesn't seem the Grand Central Dispatch is very robust then... 
>> >> I don't think it has anything to do with robustness as much as >> multiprocessing doing things not supported by GCD. The answer we got >> from Apple when we looked into this with Olivier Grisel at pycon was: >> >> """ >> For API outside of POSIX, including GCD and technologies like >> Accelerate, we do not support usage on both sides of a fork(). For >> this reason among others, use of fork() without exec is discouraged in >> general in processes that use layers above POSIX. >> >> We recommend that you either restrict usage of blas to the parent or >> the child process but not both, or that you switch to using GCD or >> pthreads rather than forking to create parallelism. >> >> Please also see the following in the fork(2) manpage: >> >> CAVEATS >> There are limits to what you can do in the child process. To be >> totally safe you should restrict your- >> self to only executing async-signal safe operations until such >> time as one of the exec functions is >> called. All APIs, including global data symbols, in any >> framework or library should be assumed to be >> unsafe after a fork() unless explicitly documented to be safe or >> async-signal safe. If you need to use >> these frameworks in the child process, you must exec. In this >> situation it is reasonable to exec your- >> self. >> """ >> >> cheers, >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Nov 21 09:55:18 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 21 Nov 2012 14:55:18 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50ACEAAD.4090409@molden.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> <50ACE5C7.2030703@molden.no> <50ACE8F5.2040300@molden.no> <50ACEAAD.4090409@molden.no> Message-ID: On Wed, Nov 21, 2012 at 2:52 PM, Sturla Molden wrote: > But if this issue is in the GCD, it will probably affect other > applications that uses the GCD and fork without exec as well. Unless you > are certain the GCD is not used, a fork would never be safe without an exec. I think the point is that it's easy for programmers to decide to avoid GCD if they want to use multiprocessing. But it's not so easy for them to decide to avoid BLAS. -n From sturla at molden.no Wed Nov 21 10:01:48 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 21 Nov 2012 16:01:48 +0100 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> <50ACE5C7.2030703@molden.no> <50ACE8F5.2040300@molden.no> <50ACEAAD.4090409@molden.no> Message-ID: <50ACECDC.80100@molden.no> On 21.11.2012 15:55, Nathaniel Smith wrote: > I think the point is that it's easy for programmers to decide to avoid > GCD if they want to use multiprocessing. 
But it's not so easy for them > to decide to avoid BLAS. Actually the answer from Apple was that no API except POSIX is supported on both sides of a fork without calling exec. That certainly affects more than the GCD. Sturla From njs at pobox.com Wed Nov 21 10:29:16 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 21 Nov 2012 15:29:16 +0000 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: <50ACECDC.80100@molden.no> References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> <3B2D7559-F2DE-473E-8C89-91A19F6AD33D@molden.no> <50ACE5C7.2030703@molden.no> <50ACE8F5.2040300@molden.no> <50ACEAAD.4090409@molden.no> <50ACECDC.80100@molden.no> Message-ID: On Wed, Nov 21, 2012 at 3:01 PM, Sturla Molden wrote: > On 21.11.2012 15:55, Nathaniel Smith wrote: > >> I think the point is that it's easy for programmers to decide to avoid >> GCD if they want to use multiprocessing. But it's not so easy for them >> to decide to avoid BLAS. > > Actually the answer from Apple was that no API except POSIX is supported > on both sides of a fork without calling exec. > > That certainly affects more than the GCD. Yes, but still only the Apple-specific APIs like Cocoa, Core Audio, etc. The bottom line is that a script that runs on Linux will almost certainly run on OS X... unless it uses BLAS and your numpy is linked to Accelerate. numpy linking to Accelerate breaks things. -n From Christos.Siopis at ulb.ac.be Wed Nov 21 10:27:40 2012 From: Christos.Siopis at ulb.ac.be (Christos Siopis) Date: Wed, 21 Nov 2012 16:27:40 +0100 Subject: [Numpy-discussion] FOSS for scientists devroom at FOSDEM 2013 Message-ID: <20121121152740.GA6241@ulb.ac.be> Dear members of the Numerical Python ecosystem (with apologies for cross-postings), A day-long session ("devroom") on Free/Libre and Open Source Software (FLOSS) for scientists will be held during the next FOSDEM conference, Brussels, 2-3 February 2013 (http://fosdem.org/2013). We aim at having a dozen or two short talks introducing projects, advertising brand new features of established tools, discussing issues relevant to the development of software for scientific computing, and touching on the interdependence of FLOSS and open science. You can find more info on the call for talks at: http://slayoo.github.com/fosdem2013/ The deadline for sending talk proposals is December 16th 2012. Please send your submissions or comments to: foss4scientists-devroom at lists.fosdem.org Please do forward this message to anyone potentially interested. Please also let us know if you have any suggestions for what would you like to hear about in the devroom. Looking forward to meeting you in Brussels. Thanks in advance. The conveners, Sylwester Arabas, Juan Antonio A?el, Christos Siopis P.S. There are open calls for main-track talks, lightning talks, and stands at FOSDEM as well, see: http://fosdem.org/2013/ -------------- I would like to add to the above general announcement, that it would be great if a main track talk were to be given at FOSDEM about the importance of scientific open source software in science and engineering today. Main track talks last 50 minutes, and are addressed to all FOSDEM participants, something that would add to the visibility of scientific software. As an extra bonus, main track speakers have their travel and hotel expenses covered by FOSDEM. 
I think that the numerical python "ecosystem" could serve as an excellent "case study" of the data processing and visualisation workflow, while adding an interesting historical dimension, being one of the oldest projects of its sort. If you decide to respond to the call for main track speakers, you should start here: https://fosdem.org/2013/call_for_main_speakers/ Please note the December 1 deadline. I urge you to let us (the science software devroom conveners) know about your proposed talk, so that we may send a word of recommendation to the FOSDEM committee who will make the ultimate selection. We thank you in advance for your expressions of interest and participation! _______________________________________________ foss4scientists-devroom mailing list foss4scientists-devroom at lists.fosdem.org https://lists.fosdem.org/listinfo/foss4scientists-devroom From chris.barker at noaa.gov Wed Nov 21 12:39:48 2012 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 21 Nov 2012 09:39:48 -0800 Subject: [Numpy-discussion] Use OpenBLAS for the binary releases? In-Reply-To: References: <50AA6877.7010806@molden.no> <50AA6FA2.4010101@astro.uio.no> <50ABB7C7.2000509@molden.no> <50ABDB97.4050500@astro.uio.no> <1353444774.3664.1.camel@farnsworth> <1353492989.3664.2.camel@farnsworth> <1353495414.3664.3.camel@farnsworth> Message-ID: On Wed, Nov 21, 2012 at 3:44 AM, David Cournapeau wrote: > It is rather simple, in that it expects a different, full installer > for each new combination. I would rather not complicate things too > much there, as it is working well for its purpose and is not easy to > modify. We would need something similar on Mac to make openblas a good > replacement on that platform. One would think this would be fairly simple on OS-X -- after all, there is a pretty limited set of hardware (officially) supported by OS-X. All Intel, and none of the ancient ones. I'm assuming that the standard *.pkg installers allow some checking of hardware versions... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Wed Nov 21 20:51:50 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Nov 2012 18:51:50 -0700 Subject: [Numpy-discussion] the mean, var, std of empty arrays Message-ID: What should be the value of the mean, var, and std of empty arrays? Currently In [12]: a Out[12]: array([], dtype=int64) In [13]: a.mean() Out[13]: nan In [14]: a.std() Out[14]: nan In [15]: a.var() Out[15]: nan I think the nan comes from 0/0. All of these also raise warnings the first time they are called. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Wed Nov 21 21:22:17 2012 From: shish at keba.be (Olivier Delalleau) Date: Wed, 21 Nov 2012 21:22:17 -0500 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: Message-ID: Current behavior looks sensible to me. I personally would prefer no warning but I think it makes sense to have one as it can be helpful to detect issues faster. -=- Olivier 2012/11/21 Charles R Harris > What should be the value of the mean, var, and std of empty arrays? 
> Currently > > In [12]: a > Out[12]: array([], dtype=int64) > > In [13]: a.mean() > Out[13]: nan > > In [14]: a.std() > Out[14]: nan > > In [15]: a.var() > Out[15]: nan > > I think the nan comes from 0/0. All of these also raise warnings the first > time they are called. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Nov 21 21:33:58 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Nov 2012 19:33:58 -0700 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: Message-ID: Hi Olivier, Please don't top post, it isn't the custom on this list. On Wed, Nov 21, 2012 at 7:22 PM, Olivier Delalleau wrote: > Current behavior looks sensible to me. I personally would prefer no > warning but I think it makes sense to have one as it can be helpful to > detect issues faster. > > -=- Olivier > > 2012/11/21 Charles R Harris > >> What should be the value of the mean, var, and std of empty arrays? >> Currently >> >> In [12]: a >> Out[12]: array([], dtype=int64) >> >> In [13]: a.mean() >> Out[13]: nan >> >> In [14]: a.std() >> Out[14]: nan >> >> In [15]: a.var() >> Out[15]: nan >> >> I think the nan comes from 0/0. All of these also raise warnings the >> first time they are called. >> >> The warnings vary and don't directly give information on the cause, i.e., empty arrays. If we do go with warnings I think they should be more specific. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Wed Nov 21 21:38:13 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 21 Nov 2012 21:38:13 -0500 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: Message-ID: On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau wrote: > Current behavior looks sensible to me. I personally would prefer no > warning but I think it makes sense to have one as it can be helpful to > detect issues faster. > > -=- Olivier > It's configurable. [~/] [1]: np.seterr(all='ignore') [1]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ignore'} [~/] [2]: np.array([]).mean() [2]: nan [~/] [3]: np.seterr(all='warn') [3]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ignore'} [~/] [4]: np.array([]).mean() /usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:57: RuntimeWarning: invalid value encountered in double_scalars ret = ret / float(rcount) [4]: nan Skipper > 2012/11/21 Charles R Harris > >> What should be the value of the mean, var, and std of empty arrays? >> Currently >> >> In [12]: a >> Out[12]: array([], dtype=int64) >> >> In [13]: a.mean() >> Out[13]: nan >> >> In [14]: a.std() >> Out[14]: nan >> >> In [15]: a.var() >> Out[15]: nan >> >> I think the nan comes from 0/0. All of these also raise warnings the >> first time they are called. 
>> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Nov 21 21:45:13 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Nov 2012 21:45:13 -0500 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: Message-ID: On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau wrote: > Current behavior looks sensible to me. I personally would prefer no warning > but I think it makes sense to have one as it can be helpful to detect issues > faster. I agree that nan should be the correct answer. (I gave up trying to define a default for 0/0 in scipy.stats ttests.) some funnier cases >>> np.var([1], ddof=1) 0.0 >>> np.var([1], ddof=5) -0 >>> np.var([1,2], ddof=5) -0.16666666666666666 >>> np.std([1,2], ddof=5) nan But maybe my numpy is too old on my open interpreter >>> np.__version__ '1.5.1' Josef > > -=- Olivier > > 2012/11/21 Charles R Harris >> >> What should be the value of the mean, var, and std of empty arrays? >> Currently >> >> In [12]: a >> Out[12]: array([], dtype=int64) >> >> In [13]: a.mean() >> Out[13]: nan >> >> In [14]: a.std() >> Out[14]: nan >> >> In [15]: a.var() >> Out[15]: nan >> >> I think the nan comes from 0/0. All of these also raise warnings the first >> time they are called. >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Wed Nov 21 22:35:39 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 21 Nov 2012 20:35:39 -0700 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: Message-ID: On Wed, Nov 21, 2012 at 7:45 PM, wrote: > On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau wrote: > > Current behavior looks sensible to me. I personally would prefer no > warning > > but I think it makes sense to have one as it can be helpful to detect > issues > > faster. > > I agree that nan should be the correct answer. > (I gave up trying to define a default for 0/0 in scipy.stats ttests.) > > some funnier cases > > >>> np.var([1], ddof=1) > 0.0 > This one is a nan in development. > >>> np.var([1], ddof=5) > -0 > >>> np.var([1,2], ddof=5) > -0.16666666666666666 > >>> np.std([1,2], ddof=5) > nan > > These still do this. Also In [10]: var([], ddof=1) Out[10]: -0 Which suggests that the nan is pretty much an accidental byproduct of division by zero. I think it might make sense to have a definite policy for these corner cases. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Wed Nov 21 22:58:47 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Nov 2012 22:58:47 -0500 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: Message-ID: On Wed, Nov 21, 2012 at 10:35 PM, Charles R Harris wrote: > > > On Wed, Nov 21, 2012 at 7:45 PM, wrote: >> >> On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau wrote: >> > Current behavior looks sensible to me. I personally would prefer no >> > warning >> > but I think it makes sense to have one as it can be helpful to detect >> > issues >> > faster. >> >> I agree that nan should be the correct answer. >> (I gave up trying to define a default for 0/0 in scipy.stats ttests.) >> >> some funnier cases >> >> >>> np.var([1], ddof=1) >> 0.0 > > > This one is a nan in development. > >> >> >>> np.var([1], ddof=5) >> -0 >> >>> np.var([1,2], ddof=5) >> -0.16666666666666666 >> >>> np.std([1,2], ddof=5) >> nan >> > > These still do this. Also > > In [10]: var([], ddof=1) > Out[10]: -0 > > Which suggests that the nan is pretty much an accidental byproduct of > division by zero. I think it might make sense to have a definite policy for > these corner cases. It would also be consistent with the usual pattern to raise a ValueError on this. ddof too large, size too small. It wouldn't be the case that for some columns or rows we get valid answers in this case, as long as we don't allow for missing values. quick check with np.ma looks correct except when delegating to numpy ? >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=5, axis=0) >>> s masked_array(data = [-- --], mask = [ True True], fill_value = 1e+20) >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=1, axis=0) >>> s masked_array(data = [0.0 --], mask = [False True], fill_value = 1e+20) >>> s = np.ma.std([1,2], ddof=5) >>> s masked >>> type(s) >>> np.ma.var([1,2], ddof=5) -0.16666666666666666 Josef > > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From fperez.net at gmail.com Wed Nov 21 23:17:45 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 21 Nov 2012 20:17:45 -0800 Subject: [Numpy-discussion] In case anybody wants them: py4science.* domains... Message-ID: Hi folks, years ago, John Hunter and I bought the py4science.{com, org, info} domains thinking they might be useful. We never did anything with them, and with his passing I realized I'm not really in the mood to keep renewing them without a clear goal in mind. Does anybody here want to do anything with these? They expire December 3, 2012. I can just let them lapse, but I figured I'd give a heads-up in case anybody has a concrete use for them I'd rather transfer them than let them go to a domain squatter. Basically, if you want them, they're yours to have as of 12/3/12. Cheers, f From josef.pktd at gmail.com Wed Nov 21 23:20:14 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 21 Nov 2012 23:20:14 -0500 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: Message-ID: On Wed, Nov 21, 2012 at 10:58 PM, wrote: > On Wed, Nov 21, 2012 at 10:35 PM, Charles R Harris > wrote: >> >> >> On Wed, Nov 21, 2012 at 7:45 PM, wrote: >>> >>> On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau wrote: >>> > Current behavior looks sensible to me. 
I personally would prefer no >>> > warning >>> > but I think it makes sense to have one as it can be helpful to detect >>> > issues >>> > faster. >>> >>> I agree that nan should be the correct answer. >>> (I gave up trying to define a default for 0/0 in scipy.stats ttests.) >>> >>> some funnier cases >>> >>> >>> np.var([1], ddof=1) >>> 0.0 >> >> >> This one is a nan in development. >> >>> >>> >>> np.var([1], ddof=5) >>> -0 >>> >>> np.var([1,2], ddof=5) >>> -0.16666666666666666 >>> >>> np.std([1,2], ddof=5) >>> nan >>> >> >> These still do this. Also >> >> In [10]: var([], ddof=1) >> Out[10]: -0 >> >> Which suggests that the nan is pretty much an accidental byproduct of >> division by zero. I think it might make sense to have a definite policy for >> these corner cases. > > It would also be consistent with the usual pattern to raise a > ValueError on this. ddof too large, size too small. > It wouldn't be the case that for some columns or rows we get valid > answers in this case, as long as we don't allow for missing values. I think I prefer NaNs to an exception, they propagate nicer to downstream functions. I'm in favor of a policy instead of nans or wrong numbers by accident. > > > quick check with np.ma > > looks correct except when delegating to numpy ? > >>>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=5, axis=0) >>>> s > masked_array(data = [-- --], > mask = [ True True], > fill_value = 1e+20) > >>>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=1, axis=0) >>>> s > masked_array(data = [0.0 --], > mask = [False True], > fill_value = 1e+20) > >>>> s = np.ma.std([1,2], ddof=5) >>>> s > masked >>>> type(s) > > >>>> np.ma.var([1,2], ddof=5) > -0.16666666666666666 and cov: >>> np.cov([1.],[3],bias=True, rowvar=False) #looks fine array([[ 0., 0.], [ 0., 0.]]) >>> np.cov([1.],[3],bias=False, rowvar=False) array([[ nan, nan], [ nan, nan]]) >>> np.cov([[1.],[3]],bias=False, rowvar=True) array([[ nan, nan], [ nan, nan]]) >>> np.cov([],[],bias=False, rowvar=False) #should be nan array([[-0., -0.], [-0., -0.]]) >>> np.cov([],[],bias=True, rowvar=False) array([[ nan, nan], [ nan, nan]]) np.corrcoef seems to have nans in the right places in the examples I tried Josef > > > Josef > >> >> >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From p.j.a.cock at googlemail.com Thu Nov 22 04:30:02 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 22 Nov 2012 09:30:02 +0000 Subject: [Numpy-discussion] In case anybody wants them: py4science.* domains... In-Reply-To: References: Message-ID: On Thu, Nov 22, 2012 at 4:17 AM, Fernando Perez wrote: > Hi folks, > > years ago, John Hunter and I bought the py4science.{com, org, info} > domains thinking they might be useful. We never did anything with > them, and with his passing I realized I'm not really in the mood to > keep renewing them without a clear goal in mind. > > Does anybody here want to do anything with these? They expire > December 3, 2012. I can just let them lapse, but I figured I'd give a > heads-up in case anybody has a concrete use for them I'd rather > transfer them than let them go to a domain squatter. > > Basically, if you want them, they're yours to have as of 12/3/12. > > Cheers, > > f Perhaps http://numfocus.org/ could take them on, or the PSF? 
(even if they don't have a specific use in mind immediately) For the short them I'd just have them redirect to www.scipy.org ;) Peter From sebastian at sipsolutions.net Thu Nov 22 07:14:34 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 22 Nov 2012 13:14:34 +0100 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: Message-ID: <1353586474.21512.19.camel@sebastian-laptop> On Wed, 2012-11-21 at 22:58 -0500, josef.pktd at gmail.com wrote: > On Wed, Nov 21, 2012 at 10:35 PM, Charles R Harris > wrote: > > > > > > On Wed, Nov 21, 2012 at 7:45 PM, wrote: > >> > >> On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau wrote: > >> > Current behavior looks sensible to me. I personally would prefer no > >> > warning > >> > but I think it makes sense to have one as it can be helpful to detect > >> > issues > >> > faster. > >> > >> I agree that nan should be the correct answer. > >> (I gave up trying to define a default for 0/0 in scipy.stats ttests.) > >> > >> some funnier cases > >> > >> >>> np.var([1], ddof=1) > >> 0.0 > > > > > > This one is a nan in development. > > > >> > >> >>> np.var([1], ddof=5) > >> -0 > >> >>> np.var([1,2], ddof=5) > >> -0.16666666666666666 > >> >>> np.std([1,2], ddof=5) > >> nan > >> > > > > These still do this. Also > > > > In [10]: var([], ddof=1) > > Out[10]: -0 > > > > Which suggests that the nan is pretty much an accidental byproduct of > > division by zero. I think it might make sense to have a definite policy for > > these corner cases. > > It would also be consistent with the usual pattern to raise a > ValueError on this. ddof too large, size too small. > It wouldn't be the case that for some columns or rows we get valid > answers in this case, as long as we don't allow for missing values. > It seems to me that nan is the reasonable result for these operations (reduce like operations that do not have an identity). Though actually reduce operations without an identity throw a ValueError (ie. `np.minimum.reduce([])`), but then mean/std/var seem special enough to be different from other reduce operations (for example their result is always floating point). As for usability I think for example when plotting errorbars using std, it would be rather annoying to get a ValueError, so if anything the reduce machinery could give more special results for empty floating point reductions. In any case the warning should be clearer and for too large ddof's I would say it should return nan+Warning as well. Sebastian > > quick check with np.ma > > looks correct except when delegating to numpy ? 
> > >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=5, axis=0) > >>> s > masked_array(data = [-- --], > mask = [ True True], > fill_value = 1e+20) > > >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=1, axis=0) > >>> s > masked_array(data = [0.0 --], > mask = [False True], > fill_value = 1e+20) > > >>> s = np.ma.std([1,2], ddof=5) > >>> s > masked > >>> type(s) > > > >>> np.ma.var([1,2], ddof=5) > -0.16666666666666666 > > > Josef > > > > > > > > > Chuck > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From chaoyuejoy at gmail.com Thu Nov 22 07:41:34 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 22 Nov 2012 13:41:34 +0100 Subject: [Numpy-discussion] the difference between "+" and np.add? Message-ID: Dear all, if I have two ndarray arr1 and arr2 (with the same shape), is there some difference when I do: arr = arr1 + arr2 and arr = np.add(arr1, arr2), and then if I have more than 2 arrays: arr1, arr2, arr3, arr4, arr5, then I cannot use np.add anymore as it only recieves 2 arguments. then what's the best practice to add these arrays? should I do arr = arr1 + arr2 + arr3 + arr4 + arr5 or I do arr = np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)? because I just noticed recently that there are functions like np.add, np.divide, np.substract... before I am using all like directly arr1/arr2, rather than np.divide(arr1,arr2). best regards, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Nov 22 07:51:23 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 22 Nov 2012 12:51:23 +0000 Subject: [Numpy-discussion] the difference between "+" and np.add? In-Reply-To: References: Message-ID: On Thu, Nov 22, 2012 at 12:41 PM, Chao YUE wrote: > Dear all, > > if I have two ndarray arr1 and arr2 (with the same shape), is there some > difference when I do: > > arr = arr1 + arr2 > > and > > arr = np.add(arr1, arr2), > > and then if I have more than 2 arrays: arr1, arr2, arr3, arr4, arr5, then I > cannot use np.add anymore as it only recieves 2 arguments. > then what's the best practice to add these arrays? should I do > > arr = arr1 + arr2 + arr3 + arr4 + arr5 > > or I do > > arr = np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)? > > because I just noticed recently that there are functions like np.add, > np.divide, np.substract... before I am using all like directly arr1/arr2, > rather than np.divide(arr1,arr2). For numpy arrays, a + b just calls np.add(a, b) internally. You can use whichever looks nicer to you. Usually people just use + np.add can be more flexible, though. For instance, you can write np.add(a, b, out=c) but there's no way to pass extra arguments to the "+" operator. In fact np.add is not just a function, it's a "ufunc object" (see the numpy documentation for some more details). 
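A quick sketch of that out= form, with throwaway example arrays:

import numpy as np

a = np.arange(5.0)     # array([ 0., 1., 2., 3., 4.])
b = np.ones(5)
c = np.empty(5)
np.add(a, b, out=c)    # the sum is written into the preallocated c
# c is now array([ 1., 2., 3., 4., 5.]); no new output array is allocated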
So it also provides methods like np.add.reduce(a) # the same as np.sum (except, sadly, with different defaults) np.add.accumulate(a) # like np.cumsum np.add.reduceat(a, indices) # complicated, see docs And there are lots of these ufunc objects, all of which provide these same interfaces. Some of them have associated operators like "+", but others don't. -n From francesc at continuum.io Thu Nov 22 09:20:39 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 22 Nov 2012 15:20:39 +0100 Subject: [Numpy-discussion] the difference between "+" and np.add? In-Reply-To: References: Message-ID: <50AE34B7.3070401@continuum.io> On 11/22/12 1:41 PM, Chao YUE wrote: > Dear all, > > if I have two ndarray arr1 and arr2 (with the same shape), is there > some difference when I do: > > arr = arr1 + arr2 > > and > > arr = np.add(arr1, arr2), > > and then if I have more than 2 arrays: arr1, arr2, arr3, arr4, arr5, > then I cannot use np.add anymore as it only recieves 2 arguments. > then what's the best practice to add these arrays? should I do > > arr = arr1 + arr2 + arr3 + arr4 + arr5 > > or I do > > arr = np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)? > > because I just noticed recently that there are functions like np.add, > np.divide, np.substract... before I am using all like directly > arr1/arr2, rather than np.divide(arr1,arr2). As Nathaniel said, there is not a difference in terms of *what* is computed. However, the methods that you suggested actually differ on *how* they are computed, and that has dramatic effects on the time used. For example: In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)] In []: %time arr1 + arr2 + arr3 + arr4 + arr5 CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s Wall time: 0.15 s Out[]: array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ..., 4.99999850e+07, 4.99999900e+07, 4.99999950e+07]) In []: %time np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0) CPU times: user 2.98 s, sys: 0.15 s, total: 3.13 s Wall time: 3.14 s Out[]: array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ..., 4.99999850e+07, 4.99999900e+07, 4.99999950e+07]) The difference is how memory is used. In the first case, the additional memory was just a temporary with the size of the operands, while for the second case a big temporary has to be created, so the difference in is speed is pretty large. There are also ways to minimize the size of temporaries, and numexpr is one of the simplests: In []: import numexpr as ne In []: %time ne.evaluate('arr1 + arr2 + arr3 + arr4 + arr5') CPU times: user 0.04 s, sys: 0.04 s, total: 0.08 s Wall time: 0.04 s Out[]: array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ..., 4.99999850e+07, 4.99999900e+07, 4.99999950e+07]) Again, the computations are the same, but how you manage memory is critical. -- Francesc Alted From josef.pktd at gmail.com Thu Nov 22 09:54:28 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 22 Nov 2012 09:54:28 -0500 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: <1353586474.21512.19.camel@sebastian-laptop> References: <1353586474.21512.19.camel@sebastian-laptop> Message-ID: On Thu, Nov 22, 2012 at 7:14 AM, Sebastian Berg wrote: > On Wed, 2012-11-21 at 22:58 -0500, josef.pktd at gmail.com wrote: >> On Wed, Nov 21, 2012 at 10:35 PM, Charles R Harris >> wrote: >> > >> > >> > On Wed, Nov 21, 2012 at 7:45 PM, wrote: >> >> >> >> On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau wrote: >> >> > Current behavior looks sensible to me. 
I personally would prefer no >> >> > warning >> >> > but I think it makes sense to have one as it can be helpful to detect >> >> > issues >> >> > faster. >> >> >> >> I agree that nan should be the correct answer. >> >> (I gave up trying to define a default for 0/0 in scipy.stats ttests.) >> >> >> >> some funnier cases >> >> >> >> >>> np.var([1], ddof=1) >> >> 0.0 >> > >> > >> > This one is a nan in development. >> > >> >> >> >> >>> np.var([1], ddof=5) >> >> -0 >> >> >>> np.var([1,2], ddof=5) >> >> -0.16666666666666666 >> >> >>> np.std([1,2], ddof=5) >> >> nan >> >> >> > >> > These still do this. Also >> > >> > In [10]: var([], ddof=1) >> > Out[10]: -0 >> > >> > Which suggests that the nan is pretty much an accidental byproduct of >> > division by zero. I think it might make sense to have a definite policy for >> > these corner cases. >> >> It would also be consistent with the usual pattern to raise a >> ValueError on this. ddof too large, size too small. >> It wouldn't be the case that for some columns or rows we get valid >> answers in this case, as long as we don't allow for missing values. >> > > It seems to me that nan is the reasonable result for these operations > (reduce like operations that do not have an identity). Though actually > reduce operations without an identity throw a ValueError (ie. > `np.minimum.reduce([])`), but then mean/std/var seem special enough to > be different from other reduce operations (for example their result is > always floating point). As for usability I think for example when > plotting errorbars using std, it would be rather annoying to get a > ValueError, so if anything the reduce machinery could give more special > results for empty floating point reductions. > > In any case the warning should be clearer and for too large ddof's I > would say it should return nan+Warning as well. Why don't operations on empty arrays not return empty arrays? but this looks ok >>> (np.array([]) - np.array([]).mean()) / np.array([]).std() array([], dtype=float64) >>> (np.array([]) - np.array([]).mean()) / np.array([]).std(0) array([], dtype=float64) >>> (np.array([]) - np.array([]).mean(0)) / np.array([]).std(0) array([], dtype=float64) >>> (np.array([]) - np.array([]).mean(0)) / np.array([]) array([], dtype=float64) >>> np.array([[]]) - np.expand_dims(np.array([[]]).mean(1),1) array([], shape=(1, 0), dtype=float64) >>> np.array([[]]) - np.expand_dims(np.array([]),1) array([], shape=(0, 0), dtype=float64) >>> np.array([]) - np.expand_dims(np.array([]),0) array([], shape=(1, 0), dtype=float64) (But I doubt I will rely in many cases on correct "calculations" with empty arrays.) Josef > > Sebastian > >> >> quick check with np.ma >> >> looks correct except when delegating to numpy ? 
>> >> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=5, axis=0) >> >>> s >> masked_array(data = [-- --], >> mask = [ True True], >> fill_value = 1e+20) >> >> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=1, axis=0) >> >>> s >> masked_array(data = [0.0 --], >> mask = [False True], >> fill_value = 1e+20) >> >> >>> s = np.ma.std([1,2], ddof=5) >> >>> s >> masked >> >>> type(s) >> >> >> >>> np.ma.var([1,2], ddof=5) >> -0.16666666666666666 >> >> >> Josef >> >> > >> > >> > >> > Chuck >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From davidmenhur at gmail.com Thu Nov 22 10:05:11 2012 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Thu, 22 Nov 2012 16:05:11 +0100 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: <1353586474.21512.19.camel@sebastian-laptop> Message-ID: On Thu, Nov 22, 2012 at 3:54 PM, wrote: > Why don't operations on empty arrays not return empty arrays? Because functions like mean or std are expected to return a scalar. Functions that are piecewiese can (and should) return an empty array, but not the mean. From sebastian at sipsolutions.net Thu Nov 22 10:15:54 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 22 Nov 2012 16:15:54 +0100 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: References: <1353586474.21512.19.camel@sebastian-laptop> Message-ID: <1353597354.21512.27.camel@sebastian-laptop> On Thu, 2012-11-22 at 16:05 +0100, Da?id wrote: > On Thu, Nov 22, 2012 at 3:54 PM, wrote: > > Why don't operations on empty arrays not return empty arrays? > > Because functions like mean or std are expected to return a scalar. > Functions that are piecewiese can (and should) return an empty array, > but not the mean. I agree, this makes sense, note that: In [2]: a = np.empty((5,0)) In [3]: a.std(0) Out[3]: array([], dtype=float64) In [4]: a.std(1) /usr/bin/ipython:1: RuntimeWarning: invalid value encountered in divide #!/usr/bin/env python Out[4]: array([ nan, nan, nan, nan, nan]) However you are reducing, and with reducing you expect exactly 1 scalar result (along that dimension). > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Thu Nov 22 10:31:28 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 22 Nov 2012 10:31:28 -0500 Subject: [Numpy-discussion] the mean, var, std of empty arrays In-Reply-To: <1353597354.21512.27.camel@sebastian-laptop> References: <1353586474.21512.19.camel@sebastian-laptop> <1353597354.21512.27.camel@sebastian-laptop> Message-ID: On Thu, Nov 22, 2012 at 10:15 AM, Sebastian Berg wrote: > On Thu, 2012-11-22 at 16:05 +0100, Da?id wrote: >> On Thu, Nov 22, 2012 at 3:54 PM, wrote: >> > Why don't operations on empty arrays not return empty arrays? >> >> Because functions like mean or std are expected to return a scalar. 
>> Functions that are piecewiese can (and should) return an empty array, >> but not the mean. > > I agree, this makes sense, note that: > > In [2]: a = np.empty((5,0)) > > In [3]: a.std(0) > Out[3]: array([], dtype=float64) > > In [4]: a.std(1) > /usr/bin/ipython:1: RuntimeWarning: invalid value encountered in divide > #!/usr/bin/env python > Out[4]: array([ nan, nan, nan, nan, nan]) > > However you are reducing, and with reducing you expect exactly 1 scalar > result (along that dimension). Ok, I see. we cannot have an empty 1-D array shape (5,) Josef > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chaoyuejoy at gmail.com Thu Nov 22 13:09:02 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 22 Nov 2012 19:09:02 +0100 Subject: [Numpy-discussion] the difference between "+" and np.add? In-Reply-To: <50AE34B7.3070401@continuum.io> References: <50AE34B7.3070401@continuum.io> Message-ID: Thanks for the explanations. Yes, what I am thinking is basically the same but I didn't test the time. I never try numexpr, but it would be nice to try it. Chao On Thu, Nov 22, 2012 at 3:20 PM, Francesc Alted wrote: > On 11/22/12 1:41 PM, Chao YUE wrote: > > Dear all, > > > > if I have two ndarray arr1 and arr2 (with the same shape), is there > > some difference when I do: > > > > arr = arr1 + arr2 > > > > and > > > > arr = np.add(arr1, arr2), > > > > and then if I have more than 2 arrays: arr1, arr2, arr3, arr4, arr5, > > then I cannot use np.add anymore as it only recieves 2 arguments. > > then what's the best practice to add these arrays? should I do > > > > arr = arr1 + arr2 + arr3 + arr4 + arr5 > > > > or I do > > > > arr = np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)? > > > > because I just noticed recently that there are functions like np.add, > > np.divide, np.substract... before I am using all like directly > > arr1/arr2, rather than np.divide(arr1,arr2). > > As Nathaniel said, there is not a difference in terms of *what* is > computed. However, the methods that you suggested actually differ on > *how* they are computed, and that has dramatic effects on the time > used. For example: > > In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)] > > In []: %time arr1 + arr2 + arr3 + arr4 + arr5 > CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s > Wall time: 0.15 s > Out[]: > array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ..., > 4.99999850e+07, 4.99999900e+07, 4.99999950e+07]) > > In []: %time np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0) > CPU times: user 2.98 s, sys: 0.15 s, total: 3.13 s > Wall time: 3.14 s > Out[]: > array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ..., > 4.99999850e+07, 4.99999900e+07, 4.99999950e+07]) > > The difference is how memory is used. In the first case, the additional > memory was just a temporary with the size of the operands, while for the > second case a big temporary has to be created, so the difference in is > speed is pretty large. 
> > There are also ways to minimize the size of temporaries, and numexpr is > one of the simplests: > > In []: import numexpr as ne > > In []: %time ne.evaluate('arr1 + arr2 + arr3 + arr4 + arr5') > CPU times: user 0.04 s, sys: 0.04 s, total: 0.08 s > Wall time: 0.04 s > Out[]: > array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ..., > 4.99999850e+07, 4.99999900e+07, 4.99999950e+07]) > > Again, the computations are the same, but how you manage memory is > critical. > > -- > Francesc Alted > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Thu Nov 22 16:08:50 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 22 Nov 2012 22:08:50 +0100 Subject: [Numpy-discussion] Is apply_over_axes working for np.ma function? Message-ID: Dear all, I tried the np.apply_over_axes and np.ma.apply_over_axes, it seems that they are not working for the masked array? I searched the wiki and there are two tickets (1480,8417) related with this, it seems that it's a solved issue? the example is below: In [67]: a = np.arange(60.).reshape(3,4,5) In [68]: tempm = np.tile(np.array([True,True,False,False,False]),(3,4,1)) In [69]: b = np.ma.masked_array(a,mask=tempm) In [70]: np.apply_over_axes(np.ma.sum,a,[1,2]) Out[70]: array([[[ 190.]], [[ 590.]], [[ 990.]]]) In [71]: np.ma.sum(np.ma.sum(a,axis=1),axis=1) Out[71]: array([ 190., 590., 990.]) In [72]: np.ma.sum(np.ma.sum(b,axis=1),axis=1) Out[72]: masked_array(data = [126.0 366.0 606.0], mask = [False False False], fill_value = 1e+20) In [73]: np.apply_over_axes(np.ma.sum,b,[1,2]) Out[73]: array([[[ 190.]], [[ 590.]], [[ 990.]]]) In [74]: np.apply_over_axes(np.sum,b,[1,2]) Out[74]: array([[[ 190.]], [[ 590.]], [[ 990.]]]) In [75]: np.ma.apply_over_axes(np.sum,b,[1,2]) Out[75]: array([[[ 190.]], [[ 590.]], [[ 990.]]]) In [76]: np.ma.apply_over_axes(np.ma.sum,b,[1,2]) Out[76]: array([[[ 190.]], [[ 590.]], [[ 990.]]]) thanks, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Thu Nov 22 16:13:18 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 22 Nov 2012 22:13:18 +0100 Subject: [Numpy-discussion] Is apply_over_axes working for np.ma function? In-Reply-To: References: Message-ID: I am using version 1.6.2 In [77]: np.__version__ Out[77]: '1.6.2' On Thu, Nov 22, 2012 at 10:08 PM, Chao YUE wrote: > Dear all, > > I tried the np.apply_over_axes and np.ma.apply_over_axes, it seems that > they are not working for the masked array? > I searched the wiki and there are two tickets (1480,8417) related with > this, it seems that it's a solved issue? 
> > the example is below: > > In [67]: a = np.arange(60.).reshape(3,4,5) > > In [68]: tempm = np.tile(np.array([True,True,False,False,False]),(3,4,1)) > > In [69]: b = np.ma.masked_array(a,mask=tempm) > > In [70]: np.apply_over_axes(np.ma.sum,a,[1,2]) > Out[70]: > array([[[ 190.]], > > [[ 590.]], > > [[ 990.]]]) > > In [71]: np.ma.sum(np.ma.sum(a,axis=1),axis=1) > Out[71]: array([ 190., 590., 990.]) > > In [72]: np.ma.sum(np.ma.sum(b,axis=1),axis=1) > Out[72]: > masked_array(data = [126.0 366.0 606.0], > mask = [False False False], > fill_value = 1e+20) > > > In [73]: np.apply_over_axes(np.ma.sum,b,[1,2]) > Out[73]: > array([[[ 190.]], > > [[ 590.]], > > [[ 990.]]]) > > In [74]: np.apply_over_axes(np.sum,b,[1,2]) > Out[74]: > array([[[ 190.]], > > [[ 590.]], > > [[ 990.]]]) > > In [75]: np.ma.apply_over_axes(np.sum,b,[1,2]) > Out[75]: > array([[[ 190.]], > > [[ 590.]], > > [[ 990.]]]) > > In [76]: np.ma.apply_over_axes(np.ma.sum,b,[1,2]) > Out[76]: > array([[[ 190.]], > > [[ 590.]], > > [[ 990.]]]) > > > thanks, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Nov 22 22:34:27 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 22 Nov 2012 20:34:27 -0700 Subject: [Numpy-discussion] When are 0-d arrays writeable? Message-ID: Examples, In [13]: ones(()).flags.writeable Out[13]: True In [14]: (-ones(())).flags.writeable Out[14]: False In [15]: (-1*ones(())).flags.writeable Out[15]: False In [16]: (1 + ones(())).flags.writeable Out[16]: False In [17]: array(1) Out[17]: array(1) In [18]: array(1).shape Out[18]: () In [19]: array(1).flags.writeable Out[19]: True Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Nov 23 05:49:37 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 23 Nov 2012 10:49:37 +0000 Subject: [Numpy-discussion] When are 0-d arrays writeable? In-Reply-To: References: Message-ID: On 23 Nov 2012 03:34, "Charles R Harris" wrote: > > Examples, > > In [13]: ones(()).flags.writeable > Out[13]: True > > In [14]: (-ones(())).flags.writeable > Out[14]: False > > In [15]: (-1*ones(())).flags.writeable > Out[15]: False > > In [16]: (1 + ones(())).flags.writeable > Out[16]: False > > In [17]: array(1) > Out[17]: array(1) > > In [18]: array(1).shape > Out[18]: () > > In [19]: array(1).flags.writeable > Out[19]: True Looks like a bug in the ufunc output value setup code or something? -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rjd4+numpy at cam.ac.uk Fri Nov 23 06:38:25 2012 From: rjd4+numpy at cam.ac.uk (Bob Dowling) Date: Fri, 23 Nov 2012 11:38:25 +0000 Subject: [Numpy-discussion] Unexpected RuntimeWarning Message-ID: <50AF6031.2010503@cam.ac.uk> I have a simple function defined in the following snippet: --- start --- import numpy def chebyshev(x, m): '''Calculates Chebyshev functions of the first kind using the trigonometric identities.''' theta = numpy.where( numpy.abs(x)<=1.0, numpy.arccos(x), numpy.arccosh(numpy.abs(x)) ) y = numpy.where( numpy.abs(x)<=1.0, numpy.cos(m*theta), numpy.cosh(m*theta) * numpy.where( x > 0.0, 1.0, -1.0 )**m ) return y if __name__ == '__main__': x = numpy.linspace(-2.0, 2.0, 21) y = chebyshev(x,3) print(y) --- end --- I'm using the numpy.where() call to extract only legal values for the circular and hyperbolic trigonometric functions. But I still get warnings that I'm passing invalid anrguments: --- start--- $ python3 demo.py demo.py:8: RuntimeWarning: invalid value encountered in arccos numpy.arccos(x), demo.py:9: RuntimeWarning: invalid value encountered in arccosh numpy.arccosh(numpy.abs(x)) [ -2.60000000e+01 -1.79280000e+01 -1.15840000e+01 -6.77600000e+00 -3.31200000e+00 -1.00000000e+00 3.52000000e-01 9.36000000e-01 9.44000000e-01 5.68000000e-01 -1.83697020e-16 -5.68000000e-01 -9.44000000e-01 -9.36000000e-01 -3.52000000e-01 1.00000000e+00 3.31200000e+00 6.77600000e+00 1.15840000e+01 1.79280000e+01 2.60000000e+01] --- end --- (I get the same with Python 2 so don't get excited about that.) I'm guessing that numpy.where() is evaluating the complete arccos and arccosh arrays and therefore getting invalid arguments. Now, I can turn off the warnings with numpy.seterr(invalid='ignore') but that's not what I would regard as good practice. Is there a "numpythonic" way to address the issue? From pav at iki.fi Fri Nov 23 06:42:12 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 23 Nov 2012 11:42:12 +0000 (UTC) Subject: [Numpy-discussion] Unexpected RuntimeWarning References: <50AF6031.2010503@cam.ac.uk> Message-ID: Bob Dowling cam.ac.uk> writes: [clip] > I'm guessing that numpy.where() is evaluating the complete arccos and > arccosh arrays and therefore getting invalid arguments. > > Now, I can turn off the warnings with numpy.seterr(invalid='ignore') but > that's not what I would regard as good practice. > > Is there a "numpythonic" way to address the issue? Correct, the arguments are evaluated first, due to how Python semantics work. You may want to use this: http://docs.scipy.org/doc/numpy/reference/generated/numpy.piecewise.html From rjd4+numpy at cam.ac.uk Fri Nov 23 07:07:51 2012 From: rjd4+numpy at cam.ac.uk (Bob Dowling) Date: Fri, 23 Nov 2012 12:07:51 +0000 Subject: [Numpy-discussion] Unexpected RuntimeWarning In-Reply-To: References: <50AF6031.2010503@cam.ac.uk> Message-ID: <50AF6717.6040605@cam.ac.uk> > You may want to use this: > http://docs.scipy.org/doc/numpy/reference/generated/numpy.piecewise.html Thank you. That's just what I needed. 
Works a treat: --- start --- import numpy def chebyshev(x, m): '''Calculates Chebyshev functions of the first kind using the trigonometric identities.''' zone_a = x < -1.0 zone_b = numpy.logical_and(-1.0 <= x, x <= 1.0) zone_c = 1.0 < x theta = numpy.piecewise( x, [zone_a, zone_b, zone_c], [lambda z: numpy.arccosh(numpy.abs(z)), numpy.arccos, numpy.arccosh] ) y = numpy.piecewise( theta, [zone_a, zone_b, zone_c], [lambda z: (-1.0)**m*numpy.cosh(m*z), lambda z: numpy.cos(m*z), lambda z: numpy.cosh(m*z)] ) return y if __name__ == '__main__': x = numpy.linspace(-2.0, 2.0, 21) y = chebyshev(x,3) print(y) --- end--- From sebastian at sipsolutions.net Fri Nov 23 09:53:19 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 23 Nov 2012 15:53:19 +0100 Subject: [Numpy-discussion] When are 0-d arrays writeable? In-Reply-To: References: Message-ID: <1353682399.15250.11.camel@sebastian-laptop> On Fri, 2012-11-23 at 10:49 +0000, Nathaniel Smith wrote: > On 23 Nov 2012 03:34, "Charles R Harris" > wrote: > > > > Examples, > > > > In [13]: ones(()).flags.writeable > > Out[13]: True > > > > In [14]: (-ones(())).flags.writeable > > Out[14]: False > > > > In [15]: (-1*ones(())).flags.writeable > > Out[15]: False > > > > In [16]: (1 + ones(())).flags.writeable > > Out[16]: False > > > > In [17]: array(1) > > Out[17]: array(1) > > > > In [18]: array(1).shape > > Out[18]: () > > > > In [19]: array(1).flags.writeable > > Out[19]: True > > Looks like a bug in the ufunc output value setup code or something? > It might be possible to rethink when (or if) to convert 0-d array to a scalar. However, at the moment as far as I understand many functions generally do not return 0-d arrays but scalars. Which makes sense because mostly we would rather use scalars then 0-d arrays as they are closer to typical python (hashable and subclasses of python types). Of course the way this is done is not aware of what is put in (scalar vs. 0-d array), since all input is converted to an array normally, which means that most (all?) functions either return 0-d arrays or scalars and are never aware if the original input was a scalar or an array. Maybe there could be a np.asarray_or_scalar or such so that its easier to give the same output type as the original input type? Regards, Sebastian > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From tmp50 at ukr.net Fri Nov 23 11:22:28 2012 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 23 Nov 2012 18:22:28 +0200 Subject: [Numpy-discussion] [ANN] Stochastic programming and optimization addon for FuncDesigner v. 0.421 Message-ID: <31182.1353687748.10255534963624443904@ffe16.ukr.net> hi all, I'm glad to inform you that stochastic programming and optimization addon for FuncDesigner v. 0.421 has been released. Now you can use gradient-based solvers for numerical optimization, such as ALGENCAN, IPOPT, ralg, gsubg etc. Usually they work faster than derivative-free (such as scipy_cobyla, BOBYQA) or global (GLP) solvers, e.g. on this example ALGENCAN time elapsed is less than 1 second while scipy_cobyla spend ~20 sec. However pay attention that having function P() in your problem may bring nonconvexity or some other issues to the solver optimization trajectory, thus sometimes you'll have to use derivative-free or GLP solvers (e.g. de) instead. 
FuncDesigner is free (BSD license) cross-platform Python language written software, while its stochastic programming and optimization addon, written by same authors, is free for small-scaled problems with educational or research purposes only. For more details visit our website http://openopt.org ------------------------------------- Regards, D. http://openopt.org/Dmitrey -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Nov 23 12:55:59 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Nov 2012 10:55:59 -0700 Subject: [Numpy-discussion] When are 0-d arrays writeable? In-Reply-To: <1353682399.15250.11.camel@sebastian-laptop> References: <1353682399.15250.11.camel@sebastian-laptop> Message-ID: On Fri, Nov 23, 2012 at 7:53 AM, Sebastian Berg wrote: > On Fri, 2012-11-23 at 10:49 +0000, Nathaniel Smith wrote: > > On 23 Nov 2012 03:34, "Charles R Harris" > > wrote: > > > > > > Examples, > > > > > > In [13]: ones(()).flags.writeable > > > Out[13]: True > > > > > > In [14]: (-ones(())).flags.writeable > > > Out[14]: False > > > > > > In [15]: (-1*ones(())).flags.writeable > > > Out[15]: False > > > > > > In [16]: (1 + ones(())).flags.writeable > > > Out[16]: False > > > > > > In [17]: array(1) > > > Out[17]: array(1) > > > > > > In [18]: array(1).shape > > > Out[18]: () > > > > > > In [19]: array(1).flags.writeable > > > Out[19]: True > > > > Looks like a bug in the ufunc output value setup code or something? > > > It might be possible to rethink when (or if) to convert 0-d array to a > scalar. However, at the moment as far as I understand many functions > generally do not return 0-d arrays but scalars. Which makes sense > because mostly we would rather use scalars then 0-d arrays as they are > closer to typical python (hashable and subclasses of python types). > > Of course the way this is done is not aware of what is put in (scalar > vs. 0-d array), since all input is converted to an array normally, which > means that most (all?) functions either return 0-d arrays or scalars and > are never aware if the original input was a scalar or an array. Maybe > there could be a np.asarray_or_scalar or such so that its easier to give > the same output type as the original input type? > > Yes, that is what looks to be the case. It breaks the scalar + array -> array rule that applies to higher dimensional arrays. I'm not sure what should be done for this corner case, especially as the code in question is rather involved... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Nov 23 13:01:13 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 23 Nov 2012 11:01:13 -0700 Subject: [Numpy-discussion] Unexpected RuntimeWarning In-Reply-To: <50AF6031.2010503@cam.ac.uk> References: <50AF6031.2010503@cam.ac.uk> Message-ID: On Fri, Nov 23, 2012 at 4:38 AM, Bob Dowling wrote: > I have a simple function defined in the following snippet: > > --- start --- > import numpy > > def chebyshev(x, m): > '''Calculates Chebyshev functions of the first kind using the > trigonometric identities.''' > > theta = numpy.where( > numpy.abs(x)<=1.0, > numpy.arccos(x), > numpy.arccosh(numpy.abs(x)) > ) > > y = numpy.where( > numpy.abs(x)<=1.0, > numpy.cos(m*theta), > numpy.cosh(m*theta) * numpy.where( > x > 0.0, > 1.0, > -1.0 > )**m > ) > > return y > > > if __name__ == '__main__': > x = numpy.linspace(-2.0, 2.0, 21) > y = chebyshev(x,3) > print(y) > > --- end --- > > > I'm using the numpy.where() call to extract only legal values for the > circular and hyperbolic trigonometric functions. But I still get > warnings that I'm passing invalid anrguments: > > --- start--- > > $ python3 demo.py > demo.py:8: RuntimeWarning: invalid value encountered in arccos > numpy.arccos(x), > demo.py:9: RuntimeWarning: invalid value encountered in arccosh > numpy.arccosh(numpy.abs(x)) > [ -2.60000000e+01 -1.79280000e+01 -1.15840000e+01 -6.77600000e+00 > -3.31200000e+00 -1.00000000e+00 3.52000000e-01 9.36000000e-01 > 9.44000000e-01 5.68000000e-01 -1.83697020e-16 -5.68000000e-01 > -9.44000000e-01 -9.36000000e-01 -3.52000000e-01 1.00000000e+00 > 3.31200000e+00 6.77600000e+00 1.15840000e+01 1.79280000e+01 > 2.60000000e+01] > > --- end --- > > (I get the same with Python 2 so don't get excited about that.) > > > I'm guessing that numpy.where() is evaluating the complete arccos and > arccosh arrays and therefore getting invalid arguments. > > Now, I can turn off the warnings with numpy.seterr(invalid='ignore') but > that's not what I would regard as good practice. > > Is there a "numpythonic" way to address the issue? > Are you aware of the Chebyshev series in numpy ? In [1]: from numpy.polynomial import Chebyshev as T In [2]: p = T([1,2,3]) In [3]: p(linspace(-2,2, 10)) Out[3]: array([ 18. , 9.40740741, 3.18518519, -0.66666667, -2.14814815, -1.25925926, 2. , 7.62962963, 15.62962963, 26. ]) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Nov 23 14:00:35 2012 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 23 Nov 2012 11:00:35 -0800 Subject: [Numpy-discussion] the difference between "+" and np.add? In-Reply-To: <50AE34B7.3070401@continuum.io> References: <50AE34B7.3070401@continuum.io> Message-ID: On Thu, Nov 22, 2012 at 6:20 AM, Francesc Alted wrote: > As Nathaniel said, there is not a difference in terms of *what* is > computed. However, the methods that you suggested actually differ on > *how* they are computed, and that has dramatic effects on the time > used. 
For example: > > In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)] > > In []: %time arr1 + arr2 + arr3 + arr4 + arr5 > CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s > Wall time: 0.15 s > There are also ways to minimize the size of temporaries, and numexpr is > one of the simplests: but you can also use np.add (and friends) to reduce the number of temporaries. It can make a difference: In [11]: def add_5_arrays(arr1, arr2, arr3, arr4, arr5): ....: result = arr1 + arr2 ....: np.add(result, arr3, out=result) ....: np.add(result, arr4, out=result) ....: np.add(result, arr5, out=result) In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5 1 loops, best of 3: 528 ms per loop In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5) 1 loops, best of 3: 293 ms per loop (don't have numexpr on this machine for a comparison) NOTE: no point in going through all this unless this operation is really a bottleneck in your code -- profile, profile, profile! -Chris PS: you can put a loop in the function to make it more generic: In [18]: def add_n_arrays(*args): ....: result = args[0] + args[1] ....: for arr in args[2:]: ....: np.add(result, arr, result) ....: return result In [21]: timeit add_n_arrays(arr1, arr2, arr3, arr4, arr5) 1 loops, best of 3: 317 ms per loop > In []: import numexpr as ne > > In []: %time ne.evaluate('arr1 + arr2 + arr3 + arr4 + arr5') > CPU times: user 0.04 s, sys: 0.04 s, total: 0.08 s > Wall time: 0.04 s > Out[]: > array([ 0.00000000e+00, 5.00000000e+00, 1.00000000e+01, ..., > 4.99999850e+07, 4.99999900e+07, 4.99999950e+07]) > > Again, the computations are the same, but how you manage memory is critical. > > -- > Francesc Alted > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gamblin2 at llnl.gov Sat Nov 24 14:34:18 2012 From: gamblin2 at llnl.gov (Gamblin, Todd) Date: Sat, 24 Nov 2012 19:34:18 +0000 Subject: [Numpy-discussion] Hierarchical vs non-hierarchical ndarray.base and __array_interface__ Message-ID: <2B413C1254F1B44F908C9BE85A5DA0AD21B7FA@PRDEXMBX-04.the-lab.llnl.gov> Hi all, I posted on the change in semantics of ndarray.base here: https://github.com/numpy/numpy/commit/6c0ad59#commitcomment-2153047 And some folks asked me to post my question to the numpy mailing list. I've implemented a tool for mapping processes in parallel applications to nodes in cartesian networks. It uses hierarchies of numpy arrays to represent the domain decomposition of the application, as well as corresponding groups of processes on the network. You can "map" an application to the network using assignment of through views. The tool is here if anyone is curious: https://github.com/tgamblin/rubik. I used numpy to implement this because I wanted to be able to do mappings for arbitrary-dimensional networks. Blue Gene/Q, for example, has a 5-D network. The reason I bring this up is because I rely on the ndarray.base pointer and some of the semantics in __array_interface__ to translate indices within my hierarchy of views. e.g., if a value is at (0,0) in a view I want to know that it's actually at (4,4) in its immediate parent array. 
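(To make that concrete, here is a minimal sketch of the kind of translation I mean. It uses only the data pointer and strides exposed by __array_interface__, assumes a C-contiguous parent with positive strides, and the function name is just for illustration -- it is not the actual Rubik code.)

import numpy as np

def view_index_to_parent(view, parent, index):
    # byte offset of the view's first element inside the parent's buffer
    offset = (view.__array_interface__['data'][0]
              - parent.__array_interface__['data'][0])
    # add the byte offset of `index` within the view itself
    offset += sum(i * s for i, s in zip(index, view.strides))
    # unravel the total byte offset using the parent's strides
    out = []
    for s in parent.strides:
        out.append(offset // s)
        offset %= s
    return tuple(out)

a = np.arange(64).reshape(8, 8)
v = a[4:, 4:]
print(view_index_to_parent(v, a, (0, 0)))    # -> (4, 4)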
After looking over the commit I linked to above, I realized I'm actually relying on a lot of stuff that's not guaranteed by numpy. I rely on .base pointing to its closest parent, and I rely on __array_interface__.data containing the address of the array's memory and its strides. None of these is guaranteed by the API docs: http://docs.scipy.org/doc/numpy/reference/arrays.interface.html So I guess I have a few questions: 1. Is translating indices between base arrays and views something that would be useful to other people? 2. Is there some better way to do this than using ndarray.base and __array_interface__? 3. What's the numpy philosophy on this? Should views know about their parents or not? They obviously have to know a little bit about their memory, but whether or not they know how they were derived from their owning array is a different question. There was some discussion on the vagueness of .base here: http://thread.gmane.org/gmane.comp.python.numeric.general/51688/focus=51703 But it doesn't look like you're deprecating .base in 1.7, only changing its behavior, which I tend to agree is worse than deprecating it. After thinking about all this, I'm not sure what I would like to happen. I can see the value of not keeping extra references around within numpy, and my domain is pretty different from the ways that I imagine people use numpy. I wouldn't have to change my code much to make it work without .base, but I do rely on __array_interface__. If that doesn't include the address and strides, t think I'm screwed as far as translating indices go. Any suggestions? Thanks! -Todd ______________________________________________________________________ Todd Gamblin, tgamblin at llnl.gov, http://people.llnl.gov/gamblin2 CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA From gamblin2 at llnl.gov Sat Nov 24 15:03:06 2012 From: gamblin2 at llnl.gov (Gamblin, Todd) Date: Sat, 24 Nov 2012 20:03:06 +0000 Subject: [Numpy-discussion] Z-ordering (Morton ordering) for numpy Message-ID: <2B413C1254F1B44F908C9BE85A5DA0AD21C086@PRDEXMBX-04.the-lab.llnl.gov> Hi all, In the course of developing a network mapping tool I'm working on, I also developed some python code to do arbitrary-dimensional z-order (morton order) for ndarrays. The code is here: https://github.com/tgamblin/rubik/blob/master/rubik/zorder.py There is a function to put the elements of an array in Z order, and another one to enumerate an array's elements in Z order. There is also a ZEncoder class that can generate Z-codes for arbitrary dimensions and bit widths. I figure this is something that would be generally useful. Any interest in having this in numpy? If so, what should the interface look like and can you point me to a good spot in the code to add it? I was thinking it might make sense to have a Z-order iterator for ndarrays, kind of like ndarray.flat. i.e.: arr = np.empty([4,4], dtype=int) arr.flat = range(arr.size) for elt in arr.zorder: print elt, 0 4 1 5 8 12 9 13 2 6 3 7 10 14 11 15 Or an equivalent to ndindex: arr = np.empty(4,4, dtype=int) arr.flat = range(arr.size) for ix in np.zindex(arr.shape): print ix, (0, 0) (1, 0) (0, 1) (1, 1) (2, 0) (3, 0) (2, 1) (3, 1) (0, 2) (1, 2) (0, 3) (1, 3) (2, 2) (3, 2) (2, 3) (3, 3) Thoughts? 
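(For anyone who wants to experiment with the ordering before looking at the Rubik code, here is a rough pure-Python 2-D sketch of the bit interleaving involved. z_value and zindex are hypothetical helper names used only for illustration; they are neither the proposed numpy API nor the ZEncoder implementation.)

import numpy as np

def z_value(i, j, bits=16):
    # Interleave the bits of (i, j) into a single Morton code. The first
    # index goes in the even bit positions so that it varies fastest,
    # matching the orderings shown above.
    z = 0
    for b in range(bits):
        z |= ((i >> b) & 1) << (2 * b)
        z |= ((j >> b) & 1) << (2 * b + 1)
    return z

def zindex(shape):
    # Enumerate the indices of a 2-D shape in Z (Morton) order.
    pairs = [(i, j) for i in range(shape[0]) for j in range(shape[1])]
    return sorted(pairs, key=lambda ij: z_value(*ij))

arr = np.empty([4, 4], dtype=int)
arr.flat = range(arr.size)
print([int(arr[ij]) for ij in zindex(arr.shape)])
# -> [0, 4, 1, 5, 8, 12, 9, 13, 2, 6, 3, 7, 10, 14, 11, 15]

The real thing would of course generate the codes directly rather than sorting all indices, but the output matches the example above.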
-Todd ______________________________________________________________________ Todd Gamblin, tgamblin at llnl.gov, http://people.llnl.gov/gamblin2 CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA From travis at continuum.io Sat Nov 24 15:10:48 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 24 Nov 2012 13:10:48 -0700 Subject: [Numpy-discussion] Z-ordering (Morton ordering) for numpy In-Reply-To: <2B413C1254F1B44F908C9BE85A5DA0AD21C086@PRDEXMBX-04.the-lab.llnl.gov> References: <2B413C1254F1B44F908C9BE85A5DA0AD21C086@PRDEXMBX-04.the-lab.llnl.gov> Message-ID: <85D6E672-825B-4DA7-9A7D-68232015B4B8@continuum.io> This is pretty cool. Something like this would be interesting to play with. There are some algorithms that are faster with z-order arrays. The code is simple enough and small enough that I could see putting it in NumPy. What do others think? -Travis On Nov 24, 2012, at 1:03 PM, Gamblin, Todd wrote: > Hi all, > > In the course of developing a network mapping tool I'm working on, I also developed some python code to do arbitrary-dimensional z-order (morton order) for ndarrays. The code is here: > > https://github.com/tgamblin/rubik/blob/master/rubik/zorder.py > > There is a function to put the elements of an array in Z order, and another one to enumerate an array's elements in Z order. There is also a ZEncoder class that can generate Z-codes for arbitrary dimensions and bit widths. > > I figure this is something that would be generally useful. Any interest in having this in numpy? If so, what should the interface look like and can you point me to a good spot in the code to add it? > > I was thinking it might make sense to have a Z-order iterator for ndarrays, kind of like ndarray.flat. i.e.: > > arr = np.empty([4,4], dtype=int) > arr.flat = range(arr.size) > for elt in arr.zorder: > print elt, > 0 4 1 5 8 12 9 13 2 6 3 7 10 14 11 15 > > Or an equivalent to ndindex: > > arr = np.empty(4,4, dtype=int) > arr.flat = range(arr.size) > for ix in np.zindex(arr.shape): > print ix, > (0, 0) (1, 0) (0, 1) (1, 1) (2, 0) (3, 0) (2, 1) (3, 1) (0, 2) (1, 2) (0, 3) (1, 3) (2, 2) (3, 2) (2, 3) (3, 3) > > Thoughts? > > -Todd > ______________________________________________________________________ > Todd Gamblin, tgamblin at llnl.gov, http://people.llnl.gov/gamblin2 > CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From aron at ahmadia.net Sat Nov 24 15:17:17 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Sat, 24 Nov 2012 20:17:17 +0000 Subject: [Numpy-discussion] Z-ordering (Morton ordering) for numpy In-Reply-To: <85D6E672-825B-4DA7-9A7D-68232015B4B8@continuum.io> References: <2B413C1254F1B44F908C9BE85A5DA0AD21C086@PRDEXMBX-04.the-lab.llnl.gov> <85D6E672-825B-4DA7-9A7D-68232015B4B8@continuum.io> Message-ID: Todd, I am optimistic and I think it would be a good idea to put this in. A couple previous studies [1] haven't found any useful speedups from in-core applications for Morton-order, and if you have results for real scientific applications using numpy this would not only be great, but the resulting paper would have quite a bit of impact. I'm sure you're already connected to the right people at LLNL, but I can think of a couple other projects which might be interested in trying this sort of thing out. 
http://www.cs.utexas.edu/~pingali/CS395T/2012sp/papers/co.pdf Cheers, Aron On Sat, Nov 24, 2012 at 8:10 PM, Travis Oliphant wrote: > This is pretty cool. Something like this would be interesting to play > with. There are some algorithms that are faster with z-order arrays. > The code is simple enough and small enough that I could see putting it in > NumPy. What do others think? > > -Travis > > > > On Nov 24, 2012, at 1:03 PM, Gamblin, Todd wrote: > > > Hi all, > > > > In the course of developing a network mapping tool I'm working on, I > also developed some python code to do arbitrary-dimensional z-order (morton > order) for ndarrays. The code is here: > > > > https://github.com/tgamblin/rubik/blob/master/rubik/zorder.py > > > > There is a function to put the elements of an array in Z order, and > another one to enumerate an array's elements in Z order. There is also a > ZEncoder class that can generate Z-codes for arbitrary dimensions and bit > widths. > > > > I figure this is something that would be generally useful. Any interest > in having this in numpy? If so, what should the interface look like and > can you point me to a good spot in the code to add it? > > > > I was thinking it might make sense to have a Z-order iterator for > ndarrays, kind of like ndarray.flat. i.e.: > > > > arr = np.empty([4,4], dtype=int) > > arr.flat = range(arr.size) > > for elt in arr.zorder: > > print elt, > > 0 4 1 5 8 12 9 13 2 6 3 7 10 14 11 15 > > > > Or an equivalent to ndindex: > > > > arr = np.empty(4,4, dtype=int) > > arr.flat = range(arr.size) > > for ix in np.zindex(arr.shape): > > print ix, > > (0, 0) (1, 0) (0, 1) (1, 1) (2, 0) (3, 0) (2, 1) (3, 1) (0, 2) (1, > 2) (0, 3) (1, 3) (2, 2) (3, 2) (2, 3) (3, 3) > > > > Thoughts? > > > > -Todd > > ______________________________________________________________________ > > Todd Gamblin, tgamblin at llnl.gov, http://people.llnl.gov/gamblin2 > > CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gamblin2 at llnl.gov Sat Nov 24 15:30:54 2012 From: gamblin2 at llnl.gov (Gamblin, Todd) Date: Sat, 24 Nov 2012 20:30:54 +0000 Subject: [Numpy-discussion] Z-ordering (Morton ordering) for numpy In-Reply-To: References: <2B413C1254F1B44F908C9BE85A5DA0AD21C086@PRDEXMBX-04.the-lab.llnl.gov> <85D6E672-825B-4DA7-9A7D-68232015B4B8@continuum.io> Message-ID: <2B413C1254F1B44F908C9BE85A5DA0AD21C9F7@PRDEXMBX-04.the-lab.llnl.gov> So, just FYI, my usage of this is for Rubik, where it's a communication latency optimization for the code being mapped to the network. I haven't tested it as an optimization for particular in-core algorithms. However, there was some work on this at LLNL maybe a couple years ago -- I think it was for the solvers. I'll ask around for an example and/or a paper, or maybe Travis has examples. Just from an ease-of-use point of view, though, if you make it simple to do zordering, you might see more people using it :). That's why I wanted to get this into numpy. This brings up another point. This is pure python, so it won't be super-fast. What's the typical way things are integrated and optimized in numpy? 
Do you contribute something like this in python first then convert to cython/C as necessary? Or would you want it in C to begin with? -Todd On Nov 24, 2012, at 12:17 PM, Aron Ahmadia wrote: > Todd, > > I am optimistic and I think it would be a good idea to put this in. A couple previous studies [1] haven't found any useful speedups from in-core applications for Morton-order, and if you have results for real scientific applications using numpy this would not only be great, but the resulting paper would have quite a bit of impact. I'm sure you're already connected to the right people at LLNL, but I can think of a couple other projects which might be interested in trying this sort of thing out. > > http://www.cs.utexas.edu/~pingali/CS395T/2012sp/papers/co.pdf > > Cheers, > Aron > > > On Sat, Nov 24, 2012 at 8:10 PM, Travis Oliphant wrote: > This is pretty cool. Something like this would be interesting to play with. There are some algorithms that are faster with z-order arrays. The code is simple enough and small enough that I could see putting it in NumPy. What do others think? > > -Travis > > > > On Nov 24, 2012, at 1:03 PM, Gamblin, Todd wrote: > > > Hi all, > > > > In the course of developing a network mapping tool I'm working on, I also developed some python code to do arbitrary-dimensional z-order (morton order) for ndarrays. The code is here: > > > > https://github.com/tgamblin/rubik/blob/master/rubik/zorder.py > > > > There is a function to put the elements of an array in Z order, and another one to enumerate an array's elements in Z order. There is also a ZEncoder class that can generate Z-codes for arbitrary dimensions and bit widths. > > > > I figure this is something that would be generally useful. Any interest in having this in numpy? If so, what should the interface look like and can you point me to a good spot in the code to add it? > > > > I was thinking it might make sense to have a Z-order iterator for ndarrays, kind of like ndarray.flat. i.e.: > > > > arr = np.empty([4,4], dtype=int) > > arr.flat = range(arr.size) > > for elt in arr.zorder: > > print elt, > > 0 4 1 5 8 12 9 13 2 6 3 7 10 14 11 15 > > > > Or an equivalent to ndindex: > > > > arr = np.empty(4,4, dtype=int) > > arr.flat = range(arr.size) > > for ix in np.zindex(arr.shape): > > print ix, > > (0, 0) (1, 0) (0, 1) (1, 1) (2, 0) (3, 0) (2, 1) (3, 1) (0, 2) (1, 2) (0, 3) (1, 3) (2, 2) (3, 2) (2, 3) (3, 3) > > > > Thoughts? 
> > > > -Todd > > ______________________________________________________________________ > > Todd Gamblin, tgamblin at llnl.gov, http://people.llnl.gov/gamblin2 > > CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion ______________________________________________________________________ Todd Gamblin, tgamblin at llnl.gov, http://people.llnl.gov/gamblin2 CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA
From sgonzi at staffmail.ed.ac.uk Sat Nov 24 15:36:45 2012 From: sgonzi at staffmail.ed.ac.uk (Siegfried Gonzi) Date: Sat, 24 Nov 2012 20:36:45 +0000 Subject: [Numpy-discussion] numpy where function on different sized arrays Message-ID: Hi all This must have been answered in the past but my google search capabilities are not the best. Given an array A, say of dimension 40x60, and given another array/vector B of dimension 20 (the values in B occur only once). What I would like to do is the following, which of course does not work (by the way, it doesn't work in IDL either): indx=where(A == B) I understand A and B are both of different dimensions. So my question: what would be the fastest or proper way to accomplish this (I found a solution but think it is rather awkward and not very scipy/numpy-tonic, though). Thanks -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
From sgonzi at staffmail.ed.ac.uk Sat Nov 24 15:51:07 2012 From: sgonzi at staffmail.ed.ac.uk (Siegfried Gonzi) Date: Sat, 24 Nov 2012 20:51:07 +0000 Subject: [Numpy-discussion] numpy where function on different size Message-ID: > Message: 6 > Date: Sat, 24 Nov 2012 20:36:45 +0000 > From: Siegfried Gonzi > Subject: [Numpy-discussion] numpy where function on different size > Hi all >This must have been answered in the past but my google search > capabilities are not the best. >Given an array A, say of dimension 40x60, and given another array/vector B > of dimension 20 (the values in B occur only once). >What I would like to do is the following, which of course does not work > (by the way, it doesn't work in IDL either): >indx=where(A == B) >I understand A and B are both of different dimensions. So my question: > what would be the fastest or proper way to accomplish this (I found a > solution but think it is rather awkward and not very scipy/numpy-tonic, > though). I should clarify: where(A==B, C, A) C is of the same dimension as B. Basically: everywhere A equals e.g. B[0], replace it by C[0]; or where A equals B[4], replace every occurrence in A by C[4]. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
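For that kind of substitution, a vectorized sketch could look like the following (replace_values is just an illustrative name; it assumes the values in B are unique and that elements of A not found in B are left untouched):

import numpy as np

def replace_values(A, B, C):
    # Everywhere A equals B[k], substitute C[k]; leave other elements alone.
    out = A.copy()
    matches = (A[..., np.newaxis] == B)      # shape A.shape + (len(B),)
    hit = matches.any(axis=-1)               # positions equal to some B[k]
    which = matches.argmax(axis=-1)          # index k of the matching B value
    out[hit] = np.asarray(C)[which[hit]]
    return out

A = np.arange(12).reshape(3, 4)
B = np.array([1, 5, 10])
C = np.array([-1, -5, -10])
print(replace_values(A, B, C))

Note that this builds the same A.shape + (len(B),) boolean array as the newaxis trick discussed below, so for large A and B a plain loop over zip(B, C) may use less memory.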
From d.warde.farley at gmail.com Sat Nov 24 16:19:27 2012 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Sat, 24 Nov 2012 16:19:27 -0500 Subject: [Numpy-discussion] numpy where function on different sized arrays In-Reply-To: References: Message-ID: M = A[..., np.newaxis] == B will give you a 40x60x20 boolean 3d-array where M[..., i] gives you a boolean mask for all the occurrences of B[i] in A. If you wanted all the (i, j) pairs for each value in B, you could do something like import numpy as np from itertools import izip, groupby from operator import itemgetter id1, id2, id3 = np.where(A[..., np.newaxis] == B) order = np.argsort(id3) triples_iter = izip(id3[order], id1[order], id2[order]) grouped = groupby(triples_iter, itemgetter(0)) d = dict((b_value, [idx[1:] for idx in indices]) for b_value, indices in grouped) Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, and the keys of d are every value in B. On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi wrote: > Hi all > > This must have been answered in the past but my google search capabilities > are not the best. > > Given an array A say of dimension 40x60 and given another array/vector B > of dimension 20 (the values in B occur only once). > > What I would like to do is the following which of course does not work (by > the way doesn't work in IDL either): > > indx=where(A == B) > > I understand A and B are both of different dimensions. So my question: > what would the fastest or proper way to accomplish this (I found a solution > but think is rather awkward and not very scipy/numpy-tonic tough). > > Thanks > -- > The University of Edinburgh is a charitable body, registered in > Scotland, with registration number SC005336. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Sat Nov 24 17:23:36 2012 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Sat, 24 Nov 2012 23:23:36 +0100 Subject: [Numpy-discussion] numpy where function on different sized arrays In-Reply-To: References: Message-ID: A pure Python approach could be: for i, x in enumerate(a): for j, y in enumerate(x): if y in b: idx.append((i,j)) Of course, it is slow if the arrays are large, but it is very readable, and probably very fast if cythonised. David. On Sat, Nov 24, 2012 at 10:19 PM, David Warde-Farley wrote: > M = A[..., np.newaxis] == B > > will give you a 40x60x20 boolean 3d-array where M[..., i] gives you a > boolean mask for all the occurrences of B[i] in A. > > If you wanted all the (i, j) pairs for each value in B, you could do > something like > > import numpy as np > from itertools import izip, groupby > from operator import itemgetter > > id1, id2, id3 = np.where(A[..., np.newaxis] == B) > order = np.argsort(id3) > triples_iter = izip(id3[order], id1[order], id2[order]) > grouped = groupby(triples_iter, itemgetter(0)) > d = dict((b_value, [idx[1:] for idx in indices]) for b_value, indices in > grouped) > > Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, and > the keys of d are every value in B. > > > > On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi > wrote: >> >> Hi all >> >> This must have been answered in the past but my google search capabilities >> are not the best. 
>> >> Given an array A say of dimension 40x60 and given another array/vector B >> of dimension 20 (the values in B occur only once). >> >> What I would like to do is the following which of course does not work (by >> the way doesn't work in IDL either): >> >> indx=where(A == B) >> >> I understand A and B are both of different dimensions. So my question: >> what would the fastest or proper way to accomplish this (I found a solution >> but think is rather awkward and not very scipy/numpy-tonic tough). >> >> Thanks >> -- >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sat Nov 24 18:04:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 24 Nov 2012 16:04:53 -0700 Subject: [Numpy-discussion] Z-ordering (Morton ordering) for numpy In-Reply-To: <2B413C1254F1B44F908C9BE85A5DA0AD21C9F7@PRDEXMBX-04.the-lab.llnl.gov> References: <2B413C1254F1B44F908C9BE85A5DA0AD21C086@PRDEXMBX-04.the-lab.llnl.gov> <85D6E672-825B-4DA7-9A7D-68232015B4B8@continuum.io> <2B413C1254F1B44F908C9BE85A5DA0AD21C9F7@PRDEXMBX-04.the-lab.llnl.gov> Message-ID: On Sat, Nov 24, 2012 at 1:30 PM, Gamblin, Todd wrote: > So, just FYI, my usage of this is for Rubik, where it's a communication > latency optimization for the code being mapped to the network. I haven't > tested it as an optimization for particular in-core algorithms. However, > there was some work on this at LLNL maybe a couple years ago -- I think it > was for the solvers. I'll ask around for an example and/or a paper, or > maybe Travis has examples. > > Just from an ease-of-use point of view, though, if you make it simple to > do zordering, you might see more people using it :). That's why I wanted > to get this into numpy. > > This brings up another point. This is pure python, so it won't be > super-fast. What's the typical way things are integrated and optimized in > numpy? Do you contribute something like this in python first then convert > to cython/C as necessary? Or would you want it in C to begin with? > > I think Python is a good start as it allows easy modification and can be converted to Cython later on. OTOH, if the new function looks settled from the get go, Cython is an option. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.warde.farley at gmail.com Sat Nov 24 19:08:40 2012 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Sat, 24 Nov 2012 19:08:40 -0500 Subject: [Numpy-discussion] numpy where function on different sized arrays In-Reply-To: References: Message-ID: I think that would lose information as to which value in B was at each position. I think you want: On Sat, Nov 24, 2012 at 5:23 PM, Da?id wrote: > A pure Python approach could be: > > for i, x in enumerate(a): > for j, y in enumerate(x): > if y in b: > idx.append((i,j)) > > Of course, it is slow if the arrays are large, but it is very > readable, and probably very fast if cythonised. > > > David. 
> > On Sat, Nov 24, 2012 at 10:19 PM, David Warde-Farley > wrote: > > M = A[..., np.newaxis] == B > > > > will give you a 40x60x20 boolean 3d-array where M[..., i] gives you a > > boolean mask for all the occurrences of B[i] in A. > > > > If you wanted all the (i, j) pairs for each value in B, you could do > > something like > > > > import numpy as np > > from itertools import izip, groupby > > from operator import itemgetter > > > > id1, id2, id3 = np.where(A[..., np.newaxis] == B) > > order = np.argsort(id3) > > triples_iter = izip(id3[order], id1[order], id2[order]) > > grouped = groupby(triples_iter, itemgetter(0)) > > d = dict((b_value, [idx[1:] for idx in indices]) for b_value, indices in > > grouped) > > > > Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, > and > > the keys of d are every value in B. > > > > > > > > On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi < > sgonzi at staffmail.ed.ac.uk> > > wrote: > >> > >> Hi all > >> > >> This must have been answered in the past but my google search > capabilities > >> are not the best. > >> > >> Given an array A say of dimension 40x60 and given another array/vector B > >> of dimension 20 (the values in B occur only once). > >> > >> What I would like to do is the following which of course does not work > (by > >> the way doesn't work in IDL either): > >> > >> indx=where(A == B) > >> > >> I understand A and B are both of different dimensions. So my question: > >> what would the fastest or proper way to accomplish this (I found a > solution > >> but think is rather awkward and not very scipy/numpy-tonic tough). > >> > >> Thanks > >> -- > >> The University of Edinburgh is a charitable body, registered in > >> Scotland, with registration number SC005336. > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.warde.farley at gmail.com Sat Nov 24 19:34:17 2012 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Sat, 24 Nov 2012 19:34:17 -0500 Subject: [Numpy-discussion] numpy where function on different sized arrays In-Reply-To: References: Message-ID: On Sat, Nov 24, 2012 at 7:08 PM, David Warde-Farley < d.warde.farley at gmail.com> wrote: > I think that would lose information as to which value in B was at each > position. I think you want: > > (premature send, stupid Gmail...) idx = {} for i, x in enumerate(a): for j, y in enumerate(x): if y in B: idx.setdefault(y, []).append((i,j)) On the problem size the OP specified, this is about 4x slower than the NumPy version I posted above. However with a small modification: idx = {} set_b = set(B) # makes 'if y in B' lookups much faster for i, x in enumerate(a): for j, y in enumerate(x): if y in set_b: idx.setdefault(y, []).append((i,j)) It actually beats my solution. 
With inputs: np.random.seed(0); A = np.random.random_integers(40, 59, size=(40, 60)); B = np.arange(40, 60) In [115]: timeit foo_py_orig(A, B) 100 loops, best of 3: 16.5 ms per loop In [116]: timeit foo_py(A, B) 100 loops, best of 3: 2.5 ms per loop In [117]: timeit foo_numpy(A, B) 100 loops, best of 3: 4.15 ms per loop Depending on the specifics of the inputs, a collections.DefaultDict could also help things. > On Sat, Nov 24, 2012 at 5:23 PM, Da?id wrote: > >> A pure Python approach could be: >> >> for i, x in enumerate(a): >> for j, y in enumerate(x): >> if y in b: >> idx.append((i,j)) >> >> Of course, it is slow if the arrays are large, but it is very >> readable, and probably very fast if cythonised. >> >> >> David. >> >> On Sat, Nov 24, 2012 at 10:19 PM, David Warde-Farley >> wrote: >> > M = A[..., np.newaxis] == B >> > >> > will give you a 40x60x20 boolean 3d-array where M[..., i] gives you a >> > boolean mask for all the occurrences of B[i] in A. >> > >> > If you wanted all the (i, j) pairs for each value in B, you could do >> > something like >> > >> > import numpy as np >> > from itertools import izip, groupby >> > from operator import itemgetter >> > >> > id1, id2, id3 = np.where(A[..., np.newaxis] == B) >> > order = np.argsort(id3) >> > triples_iter = izip(id3[order], id1[order], id2[order]) >> > grouped = groupby(triples_iter, itemgetter(0)) >> > d = dict((b_value, [idx[1:] for idx in indices]) for b_value, indices in >> > grouped) >> > >> > Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, >> and >> > the keys of d are every value in B. >> > >> > >> > >> > On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi < >> sgonzi at staffmail.ed.ac.uk> >> > wrote: >> >> >> >> Hi all >> >> >> >> This must have been answered in the past but my google search >> capabilities >> >> are not the best. >> >> >> >> Given an array A say of dimension 40x60 and given another array/vector >> B >> >> of dimension 20 (the values in B occur only once). >> >> >> >> What I would like to do is the following which of course does not work >> (by >> >> the way doesn't work in IDL either): >> >> >> >> indx=where(A == B) >> >> >> >> I understand A and B are both of different dimensions. So my question: >> >> what would the fastest or proper way to accomplish this (I found a >> solution >> >> but think is rather awkward and not very scipy/numpy-tonic tough). >> >> >> >> Thanks >> >> -- >> >> The University of Edinburgh is a charitable body, registered in >> >> Scotland, with registration number SC005336. >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sgonzi at staffmail.ed.ac.uk Sun Nov 25 04:22:58 2012 From: sgonzi at staffmail.ed.ac.uk (Siegfried Gonzi) Date: Sun, 25 Nov 2012 09:22:58 +0000 Subject: [Numpy-discussion] numpy where function on different sized arrays In-Reply-To: References: Message-ID: <9AEC027B-50E2-4B8F-8C8A-7F0BADBA0D59@staffmail.ed.ac.uk> On 25 Nov 2012, at 00:29, numpy-discussion-request at scipy.org wrote: > > Message: 3 > Date: Sat, 24 Nov 2012 23:23:36 +0100 > From: Da?id > Subject: Re: [Numpy-discussion] numpy where function on different > sized arrays > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > A pure Python approach could be: > > for i, x in enumerate(a): > for j, y in enumerate(x): > if y in b: > idx.append((i,j)) > > Of course, it is slow if the arrays are large, but it is very > readable, and probably very fast if cythonised. Thanks for all the answers. In that particular case speed is not important (A is 360x720 and b and c is lower than 10 in terms of dimension). However, I stumbled across similar comparison problems in IDL a couple of times where speed was crucial. My own solution or attempt was this: == def fu(A, b, c): for x, y in zip(b,c): indx = np.where(A == x) A[indx] = y return A == > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ralf.gommers at gmail.com Sun Nov 25 14:04:05 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 25 Nov 2012 20:04:05 +0100 Subject: [Numpy-discussion] adding rfftfreq Message-ID: Hi, https://github.com/numpy/numpy/pull/484 adds an rfftfreq function to numpy.fft. Looks logical to me, but before merging I want to check anyway if no one objects to that. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Nov 25 15:06:25 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 25 Nov 2012 13:06:25 -0700 Subject: [Numpy-discussion] adding rfftfreq In-Reply-To: References: Message-ID: On Sun, Nov 25, 2012 at 12:04 PM, Ralf Gommers wrote: > Hi, > > https://github.com/numpy/numpy/pull/484 adds an rfftfreq function to > numpy.fft. Looks logical to me, but before merging I want to check anyway > if no one objects to that. > > No objection here. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.bennett at mail.zyzhu.net Sun Nov 25 21:24:46 2012 From: tom.bennett at mail.zyzhu.net (Tom Bennett) Date: Sun, 25 Nov 2012 20:24:46 -0600 Subject: [Numpy-discussion] How to Keep An Array Two Dimensional Message-ID: Hi, I am trying to extract n columns from an 2D array and then operate on the extracted columns. Below is the code: A is an MxN 2D array. u = A[:,:n] #extract the first n columns from A B = np.dot(u, u.T) #take outer product. This code works when n>1. However, when n=1, u becomes an 1D array instead of an Mx1 2D array and the code breaks down. I wonder if there is any way to keep u=A[:,:n] an Mxn array no matter what value n takes. I do not want to use matrix because array is more convenient in other places. Thanks, Tom -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From warren.weckesser at gmail.com Sun Nov 25 21:35:06 2012 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 25 Nov 2012 20:35:06 -0600 Subject: [Numpy-discussion] How to Keep An Array Two Dimensional In-Reply-To: References: Message-ID: On Sun, Nov 25, 2012 at 8:24 PM, Tom Bennett wrote: > Hi, > > I am trying to extract n columns from an 2D array and then operate on the > extracted columns. Below is the code: > > A is an MxN 2D array. > > u = A[:,:n] #extract the first n columns from A > > B = np.dot(u, u.T) #take outer product. > > This code works when n>1. However, when n=1, u becomes an 1D array instead > of an Mx1 2D array and the code breaks down. > > I wonder if there is any way to keep u=A[:,:n] an Mxn array no matter what > value n takes. I do not want to use matrix because array is more convenient > in other places. > > Tom, Your example works for me: In [1]: np.__version__ Out[1]: '1.6.2' In [2]: A = arange(15).reshape(3,5) In [3]: A Out[3]: array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]) In [4]: u = A[:,:1] In [5]: u Out[5]: array([[ 0], [ 5], [10]]) In [6]: B = np.dot(u, u.T) In [7]: B Out[7]: array([[ 0, 0, 0], [ 0, 25, 50], [ 0, 50, 100]]) Warren > Thanks, > Tom > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.bennett at mail.zyzhu.net Sun Nov 25 21:47:18 2012 From: tom.bennett at mail.zyzhu.net (Tom Bennett) Date: Sun, 25 Nov 2012 20:47:18 -0600 Subject: [Numpy-discussion] How to Keep An Array Two Dimensional In-Reply-To: References: Message-ID: Thanks for the quick response. Ah, I see. There is a difference between A[:,:1] and A[:,0]. The former returns an Mx1 2D array whereas the latter returns an M element 1D array. I was using A[:,0] in the code but A[:,:1] in the example. On Sun, Nov 25, 2012 at 8:35 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > > On Sun, Nov 25, 2012 at 8:24 PM, Tom Bennett wrote: > >> Hi, >> >> I am trying to extract n columns from an 2D array and then operate on the >> extracted columns. Below is the code: >> >> A is an MxN 2D array. >> >> u = A[:,:n] #extract the first n columns from A >> >> B = np.dot(u, u.T) #take outer product. >> >> This code works when n>1. However, when n=1, u becomes an 1D array >> instead of an Mx1 2D array and the code breaks down. >> >> I wonder if there is any way to keep u=A[:,:n] an Mxn array no matter >> what value n takes. I do not want to use matrix because array is more >> convenient in other places. 
>> >> > Tom, > > Your example works for me: > > In [1]: np.__version__ > Out[1]: '1.6.2' > > In [2]: A = arange(15).reshape(3,5) > > In [3]: A > Out[3]: > array([[ 0, 1, 2, 3, 4], > [ 5, 6, 7, 8, 9], > [10, 11, 12, 13, 14]]) > > In [4]: u = A[:,:1] > > In [5]: u > Out[5]: > array([[ 0], > [ 5], > [10]]) > > In [6]: B = np.dot(u, u.T) > > In [7]: B > Out[7]: > array([[ 0, 0, 0], > [ 0, 25, 50], > [ 0, 50, 100]]) > > > > Warren > > > >> Thanks, >> Tom >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.warde.farley at gmail.com Mon Nov 26 01:07:13 2012 From: d.warde.farley at gmail.com (David Warde-Farley) Date: Mon, 26 Nov 2012 01:07:13 -0500 Subject: [Numpy-discussion] How to Keep An Array Two Dimensional In-Reply-To: References: Message-ID: On Sun, Nov 25, 2012 at 9:47 PM, Tom Bennett wrote: > Thanks for the quick response. > > Ah, I see. There is a difference between A[:,:1] and A[:,0]. The former > returns an Mx1 2D array whereas the latter returns an M element 1D array. I > was using A[:,0] in the code but A[:,:1] in the example. You'll notice that Python lists and tuples work the same way: foo[0] on a list or tuple gives you the first element whereas foo[:1] gives you a list or tuple containing only the first element. To clarify what's going on in the case of NumPy: when you use the [:, 0] syntax, the interpreter is calling A.__getitem__ with the tuple (slice(None), 0) as the argument. When you use [:, :1], the argument is (slice(None), slice(None, 1)). You can try this out with A[(slice(None), slice(None, 1))] -- it does the same thing (creating index tuples explicitly like this can be very handy in certain cases). The rule that NumPy follows for index tuples is (approximately) that scalar indices always squash the corresponding dimension, whereas slices or iterables (in the case of fancy indexing) preserve the dimension with an appropriate size. Notably, A[:, [0]] will also return an (A.shape[0], 1) array. But the semantics here are different: because using a sequence as an "advanced" indexing operation, a copy is made, whereas A[:, :1] will return a view. Hope that makes things less mysterious, David From mogus.mochena at famu.edu Sat Nov 24 16:36:48 2012 From: mogus.mochena at famu.edu (Mogus Mochena) Date: Sat, 24 Nov 2012 16:36:48 -0500 Subject: [Numpy-discussion] unsubscribe Message-ID: From jsseabold at gmail.com Mon Nov 26 13:54:01 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 26 Nov 2012 13:54:01 -0500 Subject: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar Message-ID: I discovered this because scipy.optimize.fmin_powell appears to squeeze 1d argmin to 0d unlike the other optimizers, but that's a different story. I would expect the 0d array to behave like the 1d array not the 2d as it does below. Thoughts? Maybe too big of a pain to change this behavior if indeed it's not desired, but I found it to be unexpected. 
[255]: np.version.full_version # same on 1.5.1 [255]: '1.8.0.dev-8e0a542' [262]: arr = np.random.random((25,1)) [~/] [263]: np.dot(arr, np.array([1.])).shape [263]: (25,) [~/] [264]: np.dot(arr, np.array([[1.]])).shape [264]: (25, 1) [~/] [265]: np.dot(arr, np.array(1.)).shape [265]: (25, 1) [~/] [271]: np.dot(arr.squeeze(), np.array(1.)).shape [271]: (25,) Huh? 0d arrays broadcast with dot? [~] [279]: arr = np.random.random((25,2)) [~/] [280]: np.dot(arr.squeeze(), np.array(2.)).shape [280]: (25, 2) Skipper -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.robitaille at gmail.com Tue Nov 27 07:27:06 2012 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Tue, 27 Nov 2012 13:27:06 +0100 Subject: [Numpy-discussion] Numpy on Travis with Python 3 Message-ID: Hi everyone, I'm currently having issues with installing Numpy 1.6.2 with Python 3.1 and 3.2 using pip in Travis builds - see for example: https://travis-ci.org/astropy/astropy/jobs/3379866 The build aborts with a cryptic message: ValueError: underlying buffer has been detached Has anyone seen this kind of issue before? Thanks for any help, Cheers, Tom From p.j.a.cock at googlemail.com Tue Nov 27 07:35:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 27 Nov 2012 12:35:44 +0000 Subject: [Numpy-discussion] Numpy on Travis with Python 3 In-Reply-To: References: Message-ID: On Tue, Nov 27, 2012 at 12:27 PM, Thomas Robitaille wrote: > Hi everyone, > > I'm currently having issues with installing Numpy 1.6.2 with Python > 3.1 and 3.2 using pip in Travis builds - see for example: > > https://travis-ci.org/astropy/astropy/jobs/3379866 > > The build aborts with a cryptic message: > > ValueError: underlying buffer has been detached > > Has anyone seen this kind of issue before? > > Thanks for any help, > > Cheers, > Tom Hi Tom, Yes, a similar error has been reported with virtualenv 1.8.3, see https://github.com/pypa/virtualenv/issues/359 If you were not aware, the TravisCI team are currently actively trying to get NumPy preinstalled on the virtual machines. As I write this, Python 2.5, 2.6 and 2.7 now have NumPy while Python 3.1 is being dropped (it was only ever an unofficially supported platform), and NumPy under Python 3.2 and 3.3 is still in progress. See: https://github.com/travis-ci/travis-cookbooks/issues/48 https://github.com/travis-ci/travis-cookbooks/issues/89 This "ValueError: underlying buffer has been detached" was one of the issues the TravisCI team had faced. Peter From deil.christoph at googlemail.com Tue Nov 27 07:37:01 2012 From: deil.christoph at googlemail.com (Christoph Deil) Date: Tue, 27 Nov 2012 13:37:01 +0100 Subject: [Numpy-discussion] Numpy on Travis with Python 3 In-Reply-To: References: Message-ID: Hi Tom, you can start reading here: https://github.com/numpy/numpy/issues/2761#issuecomment-10672029 :-) Christoph On Nov 27, 2012, at 1:27 PM, Thomas Robitaille wrote: > Hi everyone, > > I'm currently having issues with installing Numpy 1.6.2 with Python > 3.1 and 3.2 using pip in Travis builds - see for example: > > https://travis-ci.org/astropy/astropy/jobs/3379866 > > The build aborts with a cryptic message: > > ValueError: underlying buffer has been detached > > Has anyone seen this kind of issue before? 
> > Thanks for any help, > > Cheers, > Tom > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Tue Nov 27 11:16:31 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 27 Nov 2012 17:16:31 +0100 Subject: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar In-Reply-To: References: Message-ID: <1354032991.16990.3.camel@sebastian-laptop> On Mon, 2012-11-26 at 13:54 -0500, Skipper Seabold wrote: > I discovered this because scipy.optimize.fmin_powell appears to > squeeze 1d argmin to 0d unlike the other optimizers, but that's a > different story. > > > I would expect the 0d array to behave like the 1d array not the 2d as > it does below. Thoughts? Maybe too big of a pain to change this > behavior if indeed it's not desired, but I found it to be unexpected. I don't quite understand why it is unexpected. A 1-d array is considered a vector, a 0-d array is a scalar. > [255]: np.version.full_version # same on 1.5.1 > [255]: '1.8.0.dev-8e0a542' > > > [262]: arr = np.random.random((25,1)) > > > [~/] > [263]: np.dot(arr, np.array([1.])).shape > [263]: (25,) > Matrix times vector = vector > > [~/] > [264]: np.dot(arr, np.array([[1.]])).shape > [264]: (25, 1) > Matrix times matrix = matrix > > [~/] > [265]: np.dot(arr, np.array(1.)).shape > [265]: (25, 1) > matrix times scalar = matrix (of same shape) > > [~/] > [271]: np.dot(arr.squeeze(), np.array(1.)).shape > [271]: (25,) > vector times scalar = vector (of same shape) > > Huh? 0d arrays broadcast with dot? > Remember a 0-d array is a scalar, there is no actual broadcasting involved here. (except that vectors (1-d arrays) are special) > [~] > [279]: arr = np.random.random((25,2)) > > > [~/] > [280]: np.dot(arr.squeeze(), np.array(2.)).shape > [280]: (25, 2) > > > Skipper > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Tue Nov 27 15:08:08 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Nov 2012 12:08:08 -0800 Subject: [Numpy-discussion] In case anybody wants them: py4science.* domains... In-Reply-To: References: Message-ID: On Thu, Nov 22, 2012 at 1:30 AM, Peter Cock wrote: > Perhaps http://numfocus.org/ could take them on, or the PSF? > (even if they don't have a specific use in mind immediately) > For the short them I'd just have them redirect to www.scipy.org ;) I asked on the numfocus list and nobody was really interested, and I floated the question at a board meeting and folks also agreed that with the limited time/resources numfocus has right now, there were more important things to do. So I'll just let them lapse, if anybody cares, they'll be open for the taking come December 3 :) Cheers, f From scopatz at gmail.com Tue Nov 27 16:25:13 2012 From: scopatz at gmail.com (Anthony Scopatz) Date: Tue, 27 Nov 2012 15:25:13 -0600 Subject: [Numpy-discussion] In case anybody wants them: py4science.* domains... In-Reply-To: References: Message-ID: On Tue, Nov 27, 2012 at 2:08 PM, Fernando Perez wrote: > On Thu, Nov 22, 2012 at 1:30 AM, Peter Cock > wrote: > > > Perhaps http://numfocus.org/ could take them on, or the PSF? 
> > (even if they don't have a specific use in mind immediately) > > For the short them I'd just have them redirect to www.scipy.org ;) > > I asked on the numfocus list and nobody was really interested, and I > floated the question at a board meeting and folks also agreed that > with the limited time/resources numfocus has right now, there were > more important things to do. > > So I'll just let them lapse, if anybody cares, they'll be open for the > taking come December 3 :) > Gah! Sorry for missing this. I actually think that the redirection idea is a really good one. It can't be that expensive to just maintain these indefinitely. I'd rather us have them than someone else. I'll make a motion for this on the NumFOCUS list. Be Well Anthony > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Tue Nov 27 16:51:11 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 27 Nov 2012 21:51:11 +0000 Subject: [Numpy-discussion] In case anybody wants them: py4science.* domains... In-Reply-To: References: Message-ID: On Tue, Nov 27, 2012 at 9:25 PM, Anthony Scopatz wrote: > On Tue, Nov 27, 2012 at 2:08 PM, Fernando Perez > wrote: >> >> On Thu, Nov 22, 2012 at 1:30 AM, Peter Cock >> wrote: >> >> > Perhaps http://numfocus.org/ could take them on, or the PSF? >> > (even if they don't have a specific use in mind immediately) >> > For the short them I'd just have them redirect to www.scipy.org ;) >> >> I asked on the numfocus list and nobody was really interested, and I >> floated the question at a board meeting and folks also agreed that >> with the limited time/resources numfocus has right now, there were >> more important things to do. >> >> So I'll just let them lapse, if anybody cares, they'll be open for the >> taking come December 3 :) > > > Gah! Sorry for missing this. I actually think that the redirection idea is > a really good one. It seems more worthwhile than just letting a domain squatter use it. > It can't be that expensive to just maintain these indefinitely. > I'd rather us have them than someone else. I'll make a motion for this > on the NumFOCUS list. The domain registration cost is minimal, and if NumFOCUS are already looking after existing domains the extra admin should be minimal. Could you keep us informed if the domains still need a home? Might make a good domain name for a blog... hmm. Thanks, Peter From scopatz at gmail.com Tue Nov 27 16:56:19 2012 From: scopatz at gmail.com (Anthony Scopatz) Date: Tue, 27 Nov 2012 15:56:19 -0600 Subject: [Numpy-discussion] In case anybody wants them: py4science.* domains... In-Reply-To: References: Message-ID: On Tue, Nov 27, 2012 at 3:51 PM, Peter Cock wrote: > On Tue, Nov 27, 2012 at 9:25 PM, Anthony Scopatz > wrote: > > On Tue, Nov 27, 2012 at 2:08 PM, Fernando Perez > > wrote: > >> > >> On Thu, Nov 22, 2012 at 1:30 AM, Peter Cock > >> wrote: > >> > >> > Perhaps http://numfocus.org/ could take them on, or the PSF? 
> >> > (even if they don't have a specific use in mind immediately) > >> > For the short them I'd just have them redirect to www.scipy.org ;) > >> > >> I asked on the numfocus list and nobody was really interested, and I > >> floated the question at a board meeting and folks also agreed that > >> with the limited time/resources numfocus has right now, there were > >> more important things to do. > >> > >> So I'll just let them lapse, if anybody cares, they'll be open for the > >> taking come December 3 :) > > > > > > Gah! Sorry for missing this. I actually think that the redirection idea > is > > a really good one. > > It seems more worthwhile than just letting a domain squatter use it. > > > It can't be that expensive to just maintain these indefinitely. > > I'd rather us have them than someone else. I'll make a motion for this > > on the NumFOCUS list. > > The domain registration cost is minimal, and if NumFOCUS are > already looking after existing domains the extra admin should > be minimal. > > Could you keep us informed if the domains still need a home? > Yup, I'll be sure to keep everyone posted. > Might make a good domain name for a blog... hmm. > Thanks, > > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Wed Nov 28 06:34:38 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 28 Nov 2012 12:34:38 +0100 Subject: [Numpy-discussion] the difference between "+" and np.add? In-Reply-To: References: <50AE34B7.3070401@continuum.io> Message-ID: <50B5F6CE.9090708@continuum.io> On 11/23/12 8:00 PM, Chris Barker - NOAA Federal wrote: > On Thu, Nov 22, 2012 at 6:20 AM, Francesc Alted wrote: >> As Nathaniel said, there is not a difference in terms of *what* is >> computed. However, the methods that you suggested actually differ on >> *how* they are computed, and that has dramatic effects on the time >> used. For example: >> >> In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)] >> >> In []: %time arr1 + arr2 + arr3 + arr4 + arr5 >> CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s >> Wall time: 0.15 s >> There are also ways to minimize the size of temporaries, and numexpr is >> one of the simplests: > but you can also use np.add (and friends) to reduce the number of > temporaries. It can make a difference: > > In [11]: def add_5_arrays(arr1, arr2, arr3, arr4, arr5): > ....: result = arr1 + arr2 > ....: np.add(result, arr3, out=result) > ....: np.add(result, arr4, out=result) > ....: np.add(result, arr5, out=result) > > In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5 > 1 loops, best of 3: 528 ms per loop > > In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5) > 1 loops, best of 3: 293 ms per loop > > (don't have numexpr on this machine for a comparison) Yes, you are right. 
However, numexpr still can beat this: In [8]: timeit arr1 + arr2 + arr3 + arr4 + arr5 10 loops, best of 3: 138 ms per loop In [9]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5) 10 loops, best of 3: 74.3 ms per loop In [10]: timeit ne.evaluate("arr1 + arr2 + arr3 + arr4 + arr5") 10 loops, best of 3: 20.8 ms per loop The reason is that numexpr is multithreaded (using 6 cores above), and for memory-bounded problems like this one, fetching data in different threads is more efficient than using a single thread: In [12]: timeit arr1.copy() 10 loops, best of 3: 41 ms per loop In [13]: ne.set_num_threads(1) Out[13]: 6 In [14]: timeit ne.evaluate("arr1") 10 loops, best of 3: 30.7 ms per loop In [15]: ne.set_num_threads(6) Out[15]: 1 In [16]: timeit ne.evaluate("arr1") 100 loops, best of 3: 13.4 ms per loop I.e., the joy of multi-threading is that it not only buys you CPU speed, but can also bring your data from memory faster. So yeah, modern applications *do* need multi-threading for getting good performance. -- Francesc Alted From mail at telenczuk.pl Wed Nov 28 07:47:18 2012 From: mail at telenczuk.pl (Bartosz) Date: Wed, 28 Nov 2012 13:47:18 +0100 Subject: [Numpy-discussion] Conditional update of recarray field Message-ID: <50B607D6.6000303@telenczuk.pl> Hi, I try to update values in a single field of numpy record array based on a condition defined in another array. I found that that the result depends on the order in which I apply the boolean indices/field names. For example: cond = np.zeros(5, dtype=np.bool) cond[2:] = True X = np.rec.fromarrays([np.arange(5)], names='a') X[cond]['a'] = -1 print X returns: [(0,) (1,) (2,) (3,) (4,)] (the values were not updated) X['a'][cond] = -1 print X returns: [(0,) (1,) (-1,) (-1,) (-1,)] (it worked this time). I find this behaviour very confusing. Is it expected? Would it be possible to emit a warning message in the case of "faulty" assignments? Bartosz From francesc at continuum.io Wed Nov 28 09:05:37 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 28 Nov 2012 15:05:37 +0100 Subject: [Numpy-discussion] Conditional update of recarray field In-Reply-To: <50B607D6.6000303@telenczuk.pl> References: <50B607D6.6000303@telenczuk.pl> Message-ID: <50B61A31.3070903@continuum.io> On 11/28/12 1:47 PM, Bartosz wrote: > Hi, > > I try to update values in a single field of numpy record array based on > a condition defined in another array. I found that that the result > depends on the order in which I apply the boolean indices/field names. > > For example: > > cond = np.zeros(5, dtype=np.bool) > cond[2:] = True > X = np.rec.fromarrays([np.arange(5)], names='a') > X[cond]['a'] = -1 > print X > > returns: [(0,) (1,) (2,) (3,) (4,)] (the values were not updated) > > X['a'][cond] = -1 > print X > > returns: [(0,) (1,) (-1,) (-1,) (-1,)] (it worked this time). > > I find this behaviour very confusing. Is it expected? Yes, it is. In the first idiom, X[cond] is a fancy indexing operation and the result is not a view, so what you are doing is basically modifying the temporary object that results from the indexing. In the second idiom, X['a'] is returning a *view* of the original object, so this is why it works. > Would it be > possible to emit a warning message in the case of "faulty" assignments? The only solution that I can see for this is that the fancy indexing would return a view, and not a different object, but NumPy containers are not prepared for this. 
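To make the distinction concrete, here is a minimal sketch using the same `cond` and `X` as in the earlier message (np.may_share_memory is used only to make the copy-versus-view difference visible; the output comments show what should be printed):

import numpy as np

cond = np.zeros(5, dtype=np.bool)
cond[2:] = True
X = np.rec.fromarrays([np.arange(5)], names='a')

# X['a'] is a view into X, so assigning through it updates X in place.
print np.may_share_memory(X, X['a'])   # True
X['a'][cond] = -1
print X                                # [(0,) (1,) (-1,) (-1,) (-1,)]

# X[cond] is fancy indexing and returns a copy, so the assignment only
# touches the temporary object and the original array is left unchanged.
Y = np.rec.fromarrays([np.arange(5)], names='a')
print np.may_share_memory(Y, Y[cond])  # False
Y[cond]['a'] = -1
print Y                                # [(0,) (1,) (2,) (3,) (4,)]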
-- Francesc Alted From mail at telenczuk.pl Wed Nov 28 09:26:48 2012 From: mail at telenczuk.pl (Bartosz) Date: Wed, 28 Nov 2012 15:26:48 +0100 Subject: [Numpy-discussion] Conditional update of recarray field In-Reply-To: <50B61A31.3070903@continuum.io> References: <50B607D6.6000303@telenczuk.pl> <50B61A31.3070903@continuum.io> Message-ID: <50B61F28.1040804@telenczuk.pl> Thanks for answer, Francesc. I understand now that fancy indexing returns a copy of a recarray. Is it also true for standard ndarrays? If so, I do not understand why X['a'][cond]=-1 should work. Cheers, Bartosz On Wed 28 Nov 2012 03:05:37 PM CET, Francesc Alted wrote: > On 11/28/12 1:47 PM, Bartosz wrote: >> Hi, >> >> I try to update values in a single field of numpy record array based on >> a condition defined in another array. I found that that the result >> depends on the order in which I apply the boolean indices/field names. >> >> For example: >> >> cond = np.zeros(5, dtype=np.bool) >> cond[2:] = True >> X = np.rec.fromarrays([np.arange(5)], names='a') >> X[cond]['a'] = -1 >> print X >> >> returns: [(0,) (1,) (2,) (3,) (4,)] (the values were not updated) >> >> X['a'][cond] = -1 >> print X >> >> returns: [(0,) (1,) (-1,) (-1,) (-1,)] (it worked this time). >> >> I find this behaviour very confusing. Is it expected? > > Yes, it is. In the first idiom, X[cond] is a fancy indexing operation > and the result is not a view, so what you are doing is basically > modifying the temporary object that results from the indexing. In the > second idiom, X['a'] is returning a *view* of the original object, so > this is why it works. > >> Would it be >> possible to emit a warning message in the case of "faulty" assignments? > > The only solution that I can see for this is that the fancy indexing > would return a view, and not a different object, but NumPy containers > are not prepared for this. > From francesc at continuum.io Wed Nov 28 09:36:50 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 28 Nov 2012 15:36:50 +0100 Subject: [Numpy-discussion] Conditional update of recarray field In-Reply-To: <50B61F28.1040804@telenczuk.pl> References: <50B607D6.6000303@telenczuk.pl> <50B61A31.3070903@continuum.io> <50B61F28.1040804@telenczuk.pl> Message-ID: <50B62182.3090005@continuum.io> Hey Bartosz, On 11/28/12 3:26 PM, Bartosz wrote: > Thanks for answer, Francesc. > > I understand now that fancy indexing returns a copy of a recarray. Is > it also true for standard ndarrays? If so, I do not understand why > X['a'][cond]=-1 should work. Yes, that's a good question. No, in this case the boolean array `cond` is passed to the __setitem__() of the original view, so this is why this works. The first idiom is concatenating the fancy indexing with another indexing operation, and NumPy needs to create a temporary for executing this, so the second indexing operation acts over a copy, not a view. And yes, fancy indexing returning a copy is standard for all ndarrays. Hope it is clearer now (although admittedly it is a bit strange at first sight), -- Francesc Alted From mail at telenczuk.pl Wed Nov 28 10:15:52 2012 From: mail at telenczuk.pl (Bartosz) Date: Wed, 28 Nov 2012 16:15:52 +0100 Subject: [Numpy-discussion] Conditional update of recarray field In-Reply-To: <50B62182.3090005@continuum.io> References: <50B607D6.6000303@telenczuk.pl> <50B61A31.3070903@continuum.io> <50B61F28.1040804@telenczuk.pl> <50B62182.3090005@continuum.io> Message-ID: <50B62AA8.5020902@telenczuk.pl> I got it. Thanks! Now I see why this is non-trivial to fix it. 
However, it might be also a source of very-hard-to-find bugs. It might be worth discussing this non-intuitive example in the documentation. Cheers, Bartosz >> Thanks for answer, Francesc. >> >> I understand now that fancy indexing returns a copy of a recarray. Is >> it also true for standard ndarrays? If so, I do not understand why >> X['a'][cond]=-1 should work. > > Yes, that's a good question. No, in this case the boolean array `cond` > is passed to the __setitem__() of the original view, so this is why this > works. The first idiom is concatenating the fancy indexing with another > indexing operation, and NumPy needs to create a temporary for executing > this, so the second indexing operation acts over a copy, not a view. > > And yes, fancy indexing returning a copy is standard for all ndarrays. > > Hope it is clearer now (although admittedly it is a bit strange at first > sight), > From jsseabold at gmail.com Wed Nov 28 11:11:02 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 28 Nov 2012 11:11:02 -0500 Subject: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar In-Reply-To: <1354032991.16990.3.camel@sebastian-laptop> References: <1354032991.16990.3.camel@sebastian-laptop> Message-ID: On Tue, Nov 27, 2012 at 11:16 AM, Sebastian Berg wrote: > On Mon, 2012-11-26 at 13:54 -0500, Skipper Seabold wrote: > > I discovered this because scipy.optimize.fmin_powell appears to > > squeeze 1d argmin to 0d unlike the other optimizers, but that's a > > different story. > > > > > > I would expect the 0d array to behave like the 1d array not the 2d as > > it does below. Thoughts? Maybe too big of a pain to change this > > behavior if indeed it's not desired, but I found it to be unexpected. > > I don't quite understand why it is unexpected. A 1-d array is considered > a vector, a 0-d array is a scalar. > > When you put it like this I guess it makes sense. I don't encounter 0d arrays often and never think of a 0d array as truly a scalar like np.array(1.).item(). See below for my intuition. > > [255]: np.version.full_version # same on 1.5.1 > > [255]: '1.8.0.dev-8e0a542' > > > > > > [262]: arr = np.random.random((25,1)) > > > > > > [~/] > > [263]: np.dot(arr, np.array([1.])).shape > > [263]: (25,) > > > Matrix times vector = vector > > > > [~/] > > [264]: np.dot(arr, np.array([[1.]])).shape > > [264]: (25, 1) > > > Matrix times matrix = matrix > > > > [~/] > > [265]: np.dot(arr, np.array(1.)).shape > > [265]: (25, 1) > > > matrix times scalar = matrix (of same shape) > > > > [~/] > > [271]: np.dot(arr.squeeze(), np.array(1.)).shape > > [271]: (25,) > > > vector times scalar = vector (of same shape) > > > > Huh? 0d arrays broadcast with dot? > > > Remember a 0-d array is a scalar, there is no actual broadcasting > involved here. (except that vectors (1-d arrays) are special) > > Maybe I'm misunderstanding. How do you mean there is no broadcasting? They're clearly not conformable. Is vector.scalar specially defined (I have no idea)? I recall arguing once and submitting a patch such that np.linalg.det(5) and np.linalg.inv(5) should be well-defined and work but the counter-argument was that a scalar is not the same as a scalar matrix. This seems to be an exception. Here, I guess, following that counterargument, I'd expected the scalar to fail in dot. I certainly don't expect a (N,2).scalar -> (N,2). Or I'd expect it to follow the rules of matrix notation and be treated like the 1d scalar vector so that (N,1).scalar -> (N,). 
To my mind, this follows more closely to the expectation that (J,K).(M,N) -> (J,N), i.e., the second dimension of the result is the same as the second dimension of whatever is post-multiplying where the first dimension is inferred if necessary (or should fail if non-existent). So my expectations are (were) (N,).() -> (N,) (N,1).() -> (N,) (N,1).(1,) -> (N,) (N,1).(1,1) -> (N,1) (N,2).() -> Error Skipper > [~] > > [279]: arr = np.random.random((25,2)) > > > > > > [~/] > > [280]: np.dot(arr.squeeze(), np.array(2.)).shape > > [280]: (25, 2) > > > > > > Skipper > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Nov 28 12:31:25 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 28 Nov 2012 18:31:25 +0100 Subject: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar In-Reply-To: References: <1354032991.16990.3.camel@sebastian-laptop> Message-ID: <1354123885.13193.42.camel@sebastian-laptop> On Wed, 2012-11-28 at 11:11 -0500, Skipper Seabold wrote: > On Tue, Nov 27, 2012 at 11:16 AM, Sebastian Berg > wrote: > On Mon, 2012-11-26 at 13:54 -0500, Skipper Seabold wrote: > > I discovered this because scipy.optimize.fmin_powell appears > to > > squeeze 1d argmin to 0d unlike the other optimizers, but > that's a > > different story. > > > > > > I would expect the 0d array to behave like the 1d array not > the 2d as > > it does below. Thoughts? Maybe too big of a pain to change > this > > behavior if indeed it's not desired, but I found it to be > unexpected. > > > I don't quite understand why it is unexpected. A 1-d array is > considered > a vector, a 0-d array is a scalar. > > > > > When you put it like this I guess it makes sense. I don't encounter 0d > arrays often and never think of a 0d array as truly a scalar like > np.array(1.).item(). See below for my intuition. > I think you should see them as a scalar though for mathematical operations. The differences are fine in any case, and numpy typically silently converts scalars -> 0d arrays on function calls and back again to return scalars. > > Maybe I'm misunderstanding. How do you mean there is no broadcasting? Broadcasting adds dimensions to the start. To handle a vector like a matrix product in dot, you do not always add the dimension at the start. For matrix.vector the vector (N,) is much like (N,1). Also the result of dot is not necessarily 2-d which it should be in your reasoning and if you think about what happens in broadcasting terms. > They're clearly not conformable. Is vector.scalar specially defined > (I have no idea)? I recall arguing once and submitting a patch such > that np.linalg.det(5) and np.linalg.inv(5) should be well-defined and > work but the counter-argument was that a scalar is not the same as a > scalar matrix. This seems to be an exception. I do not see an exception, in all cases there is no implicit (broadcasting like) adding of extra dimensions (leading to an error in most linear algebra functions if the input is not 2-d) which is good since "explicit is better then implicit". > Here, I guess, following that counterargument, I'd expected the scalar > to fail in dot. 
I certainly don't expect a (N,2).scalar -> (N,2). Or If you say dot is strictly a matrix product yes (though it should also throw errors for vectors then). I think it simply is trying to be more like the dot that I would write down on paper and thus special cases vectors and scalars and this generalization only replaces what should otherwise be an error in a matrix product! Maybe a strict matrix product would make sense too, but the dot function behavior cannot be changed in any case, so its pointless to argue about it. Just make sure your arrays are 2-d (or matrices) if you want a matrix product, which will give the behavior you expect in a much more controlled fashion anyway. > I'd expect it to follow the rules of matrix notation and be treated > like the 1d scalar vector so that (N,1).scalar -> (N,). To my mind, > this follows more closely to the expectation that (J,K).(M,N) -> > (J,N), i.e., the second dimension of the result is the same as the > second dimension of whatever is post-multiplying where the first > dimension is inferred if necessary (or should fail if non-existent). > So my expectations are (were) > > > (N,).() -> (N,) > (N,1).() -> (N,) > (N,1).(1,) -> (N,) > (N,1).(1,1) -> (N,1) > (N,2).() -> Error > > Skipper > > > > [~] > > [279]: arr = np.random.random((25,2)) > > > > > > [~/] > > [280]: np.dot(arr.squeeze(), np.array(2.)).shape > > [280]: (25, 2) > > > > > > Skipper > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jsseabold at gmail.com Wed Nov 28 12:41:57 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 28 Nov 2012 12:41:57 -0500 Subject: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar In-Reply-To: <1354123885.13193.42.camel@sebastian-laptop> References: <1354032991.16990.3.camel@sebastian-laptop> <1354123885.13193.42.camel@sebastian-laptop> Message-ID: On Wed, Nov 28, 2012 at 12:31 PM, Sebastian Berg wrote: > Maybe a strict matrix product would make sense too, but the dot function > behavior cannot be changed in any case, so its pointless to argue about > it. Just make sure your arrays are 2-d (or matrices) if you want a > matrix product, which will give the behavior you expect in a much more > controlled fashion anyway. > > I'm not arguing anything. I was just stating why I was surprised and was looking for guidance to update my expectations, which you've provided. Thanks. Assuring input dimensions is my solution. Skipper -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim at jgssebl.net Wed Nov 28 16:16:10 2012 From: jim at jgssebl.net (Jim O'Brien) Date: Wed, 28 Nov 2012 14:16:10 -0700 Subject: [Numpy-discussion] Windows installation problem Message-ID: I have tried to install the 1.6.2 win32 superpack on my Windows 7 Pro (64 bit) system which has ActiveState ActivePython 2.7.2.5 (64 bit) installed. However, I get an error that Python 2.7 is required and can't be found in the Registry. I only need numpy as it is a pre-requisite for another package and numpy is the only pre-requisite that won't install. 
Other are specific to Python 2.7 and they install. Some are Win64 and some are Win32. Is there a work around for this? I have no facilities available to build numpy. Regards, Jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Nov 28 16:32:11 2012 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 28 Nov 2012 22:32:11 +0100 Subject: [Numpy-discussion] Windows installation problem In-Reply-To: References: Message-ID: On Wed, Nov 28, 2012 at 10:16 PM, Jim O'Brien wrote: > ** > I have tried to install the 1.6.2 win32 superpack on my Windows 7 Pro (64 > bit) system which has ActiveState ActivePython 2.7.2.5 (64 bit) installed. > > However, I get an error that Python 2.7 is required and can't be found in > the Registry. > > I only need numpy as it is a pre-requisite for another package and numpy > is the only pre-requisite that won't install. > > Other are specific to Python 2.7 and they install. Some are Win64 and > some are Win32. > > Is there a work around for this? > You need to use 64-bit numpy if you have 64-bit Python. You can find one at http://www.lfd.uci.edu/~gohlke/pythonlibs/. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim at jgssebl.net Wed Nov 28 17:59:20 2012 From: jim at jgssebl.net (Jim O'Brien) Date: Wed, 28 Nov 2012 15:59:20 -0700 Subject: [Numpy-discussion] Windows installation problem In-Reply-To: References: Message-ID: Ralf, Thanks. I downloaded the 1.6.2 release for win64 and tried to install. I am still being told that it requires 2.7 and that was not found in the registry. I know I have Python 2.7 as other packages find it just fine. Is there a way to get around the check that is done by the installer? Regards, Jim _____ From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Ralf Gommers Sent: Wed, Nov 28, 2012 2:32 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Windows installation problem On Wed, Nov 28, 2012 at 10:16 PM, Jim O'Brien wrote: I have tried to install the 1.6.2 win32 superpack on my Windows 7 Pro (64 bit) system which has ActiveState ActivePython 2.7.2.5 (64 bit) installed. However, I get an error that Python 2.7 is required and can't be found in the Registry. I only need numpy as it is a pre-requisite for another package and numpy is the only pre-requisite that won't install. Other are specific to Python 2.7 and they install. Some are Win64 and some are Win32. Is there a work around for this? You need to use 64-bit numpy if you have 64-bit Python. You can find one at http://www.lfd.uci.edu/~gohlke/pythonlibs/. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim at jgssebl.net Wed Nov 28 18:04:16 2012 From: jim at jgssebl.net (Jim O'Brien) Date: Wed, 28 Nov 2012 16:04:16 -0700 Subject: [Numpy-discussion] Windows installation problem In-Reply-To: References: Message-ID: Forget the last post. I was one the wrong machine! The 64 bit release installed fine. Regards, Jim _____ From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Ralf Gommers Sent: Wed, Nov 28, 2012 2:32 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Windows installation problem On Wed, Nov 28, 2012 at 10:16 PM, Jim O'Brien wrote: I have tried to install the 1.6.2 win32 superpack on my Windows 7 Pro (64 bit) system which has ActiveState ActivePython 2.7.2.5 (64 bit) installed. 
However, I get an error that Python 2.7 is required and can't be found in the Registry. I only need numpy as it is a pre-requisite for another package and numpy is the only pre-requisite that won't install. Other are specific to Python 2.7 and they install. Some are Win64 and some are Win32. Is there a work around for this? You need to use 64-bit numpy if you have 64-bit Python. You can find one at http://www.lfd.uci.edu/~gohlke/pythonlibs/. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From rblove_lists at comcast.net Wed Nov 28 19:21:08 2012 From: rblove_lists at comcast.net (Robert Love) Date: Wed, 28 Nov 2012 18:21:08 -0600 Subject: [Numpy-discussion] Simple Loadtxt question Message-ID: I have a file with thousands of lines like this: Signal was returned in 204 microseconds Signal was returned in 184 microseconds Signal was returned in 199 microseconds Signal was returned in 4274 microseconds Signal was returned in 202 microseconds Signal was returned in 189 microseconds I try to read it like this: data = np.loadtxt('dummy.data', dtype={'names':('label','times','musec'), 'fmts':('|S23','i8','|S13')}) It fails, I think, because it wants a string format and field for each of the words 'Signal' 'was' 'returned' etc. Can I make it treat that whole string before the number as one string, one field? All I really care about is the numbers anyway. Any advice appreciated. -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Wed Nov 28 19:29:27 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Thu, 29 Nov 2012 01:29:27 +0100 Subject: [Numpy-discussion] Simple Loadtxt question In-Reply-To: References: Message-ID: <863083A1-C4A3-4D27-B8CA-D2E92C11BEC5@astro.physik.uni-goettingen.de> On 29.11.2012, at 1:21AM, Robert Love wrote: > I have a file with thousands of lines like this: > > Signal was returned in 204 microseconds > Signal was returned in 184 microseconds > Signal was returned in 199 microseconds > Signal was returned in 4274 microseconds > Signal was returned in 202 microseconds > Signal was returned in 189 microseconds > > I try to read it like this: > > > data = np.loadtxt('dummy.data', dtype={'names':('label','times','musec'), 'fmts':('|S23','i8','|S13')}) > > It fails, I think, because it wants a string format and field for each of the words 'Signal' 'was' 'returned' etc. > > Can I make it treat that whole string before the number as one string, one field? All I really care about is the numbers anyway. > Then how about np.loadtxt('dummy.data', usecols=(4, )) Cheers, Derek From denis-bz-gg at t-online.de Thu Nov 29 06:52:17 2012 From: denis-bz-gg at t-online.de (denis) Date: Thu, 29 Nov 2012 11:52:17 +0000 (UTC) Subject: [Numpy-discussion] install numpy 1.6.2 .dmg on macosx 10.7, check for python 2.7 Message-ID: Trying to install numpy 1.6.2 on a mac osx 10.7.4 from this .dmg 9323135 numpy-1.6.2-py2.7-python.org-macosx10.3.dmg gives "numpy 1.6.2 can't be installed on this disk. numpy requires python.org Python 2.7 to install." But python 2.7.3 *is* installed from python.org 18761950 python-2.7.3-macosx10.6.dmg and /usr/bin/python is linked as described in http://wolfpaulus.com/journal/mac/installing_python_osx python -c 'import sys; print sys.version' 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:52:43) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] How is the code in the .dmg checking for python.org python ? 
Thanks, cheers -- denis From robert.kern at gmail.com Thu Nov 29 07:37:14 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 29 Nov 2012 13:37:14 +0100 Subject: [Numpy-discussion] install numpy 1.6.2 .dmg on macosx 10.7, check for python 2.7 In-Reply-To: References: Message-ID: On Thu, Nov 29, 2012 at 12:52 PM, denis wrote: > Trying to install numpy 1.6.2 on a mac osx 10.7.4 from this .dmg > 9323135 numpy-1.6.2-py2.7-python.org-macosx10.3.dmg > gives > "numpy 1.6.2 can't be installed on this disk. > numpy requires python.org Python 2.7 to install." > > But python 2.7.3 *is* installed from python.org > 18761950 python-2.7.3-macosx10.6.dmg > and /usr/bin/python is linked as described in > http://wolfpaulus.com/journal/mac/installing_python_osx Ouch! I'm sorry, but these are wildly *dangerous* instructions. Please disregard them. The author is reckless. You can't undo the damage without reinstalling OS X, but there is a good chance you will not run into the parts of OS X that depended on the /System Python being unmolested. However, you do need to reinstall the python.org Python framework again. Just don't mv it from where it gets installed. Then the numpy-1.6.2-py2.7-python.org-macosx10.3.dmg will recognize it. -- Robert Kern From denis-bz-gg at t-online.de Thu Nov 29 10:05:10 2012 From: denis-bz-gg at t-online.de (denis) Date: Thu, 29 Nov 2012 15:05:10 +0000 (UTC) Subject: [Numpy-discussion] install numpy 1.6.2 .dmg on macosx 10.7, check for python 2.7 References: Message-ID: Robert Kern gmail.com> writes: ... > > http://wolfpaulus.com/journal/mac/installing_python_osx > > Ouch! I'm sorry, but these are wildly *dangerous* instructions. Please > disregard them. The author is reckless. Thank you Robert, that worked just fine (for scipy and matplotlib .dmg s too) cheers -- denis From jim22k at gmail.com Thu Nov 29 11:24:56 2012 From: jim22k at gmail.com (Jim Kitchen) Date: Thu, 29 Nov 2012 11:24:56 -0500 Subject: [Numpy-discussion] Truth value of ndarray not Pythonic Message-ID: I understand the historical reason for the "Truth value of an array" error, avoiding the pitfalls of: >>> a = np.arange(10) >>> if a==0: # this is ambiguous, use any() or all() I also understand the issues with logical and/or: >>> if (a<10 and a > 5): # this will break due to "and" vs "&" However the main point in this thread from 3 years agois very valid. If I write code that uses lists and then convert that to an array for efficiency or more powerful computation, I have my own pitfall trying to do: >>> if a: # why doesn't this just check for size? My Proposal ------------------ It seems to me that an elegant solution to this dilemma is to separate the behavior of ndarrays of type bool from all other ndarrays. Keep the current behavior for ndarrays of type bool, but let the __nonzero__ for all other ndarrays be based on size. >>> if a==0: # still raises Truth error because it's of dtype bool >>> if (a<10 and a>5): # still raises Truth error because it's of dtype bool >>> if a: # works fine because dtype is int64 This solution avoids all the primary pitfalls of ambiguity where someone would need any() or all() because they're working with bools at the element level. But for cases where a function may return data or None, I really like to use the normal Python truth test for that instead of: >>> if a is not None and len(a) > 0: # that was a chore to find out The only problem I see with this solution is with the case of the single-element array. 
>>> s = np.array([[0]]) >>> if s: # works today, returns False With my proposal, >>> if s: # still works, but returns True because array is not empty It's a wart to be sure, but it would make ndarrays much easier to work with when converting from standard Python containers. Maybe we need something like this (probably not possible): >>> from numpy.__future__ import truthiness I've especially found this Truth error a challenge converting from dictionaries of lists to pandas DataFrames. It raises the same error, tracing back to this ambiguity in ndarrays. If it's too big of a change to make for ndarrays, maybe the same proposal could be implemented in pandas. Jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Nov 29 14:19:08 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 29 Nov 2012 19:19:08 +0000 Subject: [Numpy-discussion] Truth value of ndarray not Pythonic In-Reply-To: References: Message-ID: On 29 Nov 2012 16:27, "Jim Kitchen" wrote: > > I understand the historical reason for the "Truth value of an array" error, avoiding the pitfalls of: > >>> a = np.arange(10) > >>> if a==0: # this is ambiguous, use any() or all() > > I also understand the issues with logical and/or: > >>> if (a<10 and a > 5): # this will break due to "and" vs "&" > > However the main point in this thread from 3 years ago is very valid. If I write code that uses lists and then convert that to an array for efficiency or more powerful computation, I have my own pitfall trying to do: > >>> if a: # why doesn't this just check for size? > > My Proposal > ------------------ > It seems to me that an elegant solution to this dilemma is to separate the behavior of ndarrays of type bool from all other ndarrays. Keep the current behavior for ndarrays of type bool, but let the __nonzero__ for all other ndarrays be based on size. > > >>> if a==0: # still raises Truth error because it's of dtype bool > > >>> if (a<10 and a>5): # still raises Truth error because it's of dtype bool > > >>> if a: # works fine because dtype is int64 I see what you mean, but I think this change would be dangerously confusing. The problem is that an ndarray of ints follows the conventions of both Python lists and Python ints. E.g., it acts like a list for len() and iteration, but like an int for arithmetic (+ does addition, not concatenation). So your suggestion makes sense if you're thinking of the list analogy first, but there are other people out there who are going to think of the int analogy first instead. Python has two very well established conventions for how boolean casting works (emptiness for containers, zero-ness for scalars), and they conflict here. The current behaviour follows the scalar convention when possible, and almost always throws a safe error if people wrote code expecting the container. Your version would be just as confusing to a different set of people, but wouldn't even fail safe by raising an error, it would just silently do the wrong thing. > This solution avoids all the primary pitfalls of ambiguity where someone would need any() or all() because they're working with bools at the element level. But for cases where a function may return data or None, I really like to use the normal Python truth test for that instead of: > >>> if a is not None and len(a) > 0: # that was a chore to find out > > The only problem I see with this solution is with the case of the single-element array. 
> >>> s = np.array([[0]]) > >>> if s: # works today, returns False > > With my proposal, > >>> if s: # still works, but returns True because array is not empty This would break huge amounts of already existing code. So that means that to get this change through, you'd have to not just convince everyone that it was a good idea, but that bit was such an improvement that it'd be worth auditing all that code (and it's not even greppable). It strikes me as unusual, though, that you're testing for both None and emptiness and treating them the same in your if statement. If your function is returning an empty array as a 'special' value to signal that something funny has happened, then perhaps it could just return None in this case instead? If it's returning an empty array as an ordinary value (e.g. when you happen to have zero data points that fall into some category), then usually you don't need to check for this explicitly, since numpy functions like sum() etc. will do the right thing? Of course you might have some situation where everything happens to line up so that what you wrote is the best solution, but you might want to revisit it to check. Or post a longer example here and see if anyone has suggestions for how to make it more "numpythonic". Hope that helps, -n > It's a wart to be sure, but it would make ndarrays much easier to work with when converting from standard Python containers. Maybe we need something like this (probably not possible): > >>> from numpy.__future__ import truthiness > > I've especially found this Truth error a challenge converting from dictionaries of lists to pandas DataFrames. It raises the same error, tracing back to this ambiguity in ndarrays. If it's too big of a change to make for ndarrays, maybe the same proposal could be implemented in pandas. > > Jim > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From rowen at uw.edu Thu Nov 29 14:25:25 2012 From: rowen at uw.edu (Russell E. Owen) Date: Thu, 29 Nov 2012 11:25:25 -0800 Subject: [Numpy-discussion] install numpy 1.6.2 .dmg on macosx 10.7, check for python 2.7 References: Message-ID: In article , denis wrote: > Trying to install numpy 1.6.2 on a mac osx 10.7.4 from this .dmg > 9323135 numpy-1.6.2-py2.7-python.org-macosx10.3.dmg > gives > "numpy 1.6.2 can't be installed on this disk. > numpy requires python.org Python 2.7 to install." > > But python 2.7.3 *is* installed from python.org > 18761950 python-2.7.3-macosx10.6.dmg > and /usr/bin/python is linked as described in > http://wolfpaulus.com/journal/mac/installing_python_osx > python -c 'import sys; print sys.version' > 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:52:43) > [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] These are not compatible. You have installed python.org's macosx10.6 installed (a very reasonable choice). But you must install binary packages labelled 10.6, never packages labelled 10.3 (which are only for the 10.3 version of python.org's python). I'm glad you got an error message (however opaque), since if the install had succeeded the results would not have worked. -- Russell P.S. 
the difference is: - 10.6 (which requires MacOS X 10.6 or later) is 64-bit and requires intel - 10.3 (which requires MacOS X 10.3.9 or later) is 32-bit and includes PPC support From chris.barker at noaa.gov Thu Nov 29 16:54:57 2012 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 29 Nov 2012 13:54:57 -0800 Subject: [Numpy-discussion] install numpy 1.6.2 .dmg on macosx 10.7, check for python 2.7 In-Reply-To: References: Message-ID: <-4379392438098142818@unknownmsgid> On Nov 29, 2012, at 11:25 AM, "Russell E. Owen" wrote: > > > P.S. the difference is: > - 10.6 (which requires MacOS X 10.6 or later) is 64-bit and requires > Slight clarification -- the 10.6 build is 32 and 64 bit Intel. -Chris From martin.raspaud at smhi.se Fri Nov 30 03:07:11 2012 From: martin.raspaud at smhi.se (Martin Raspaud) Date: Fri, 30 Nov 2012 09:07:11 +0100 Subject: [Numpy-discussion] allclose changed behaviour in 1.6.2 ? Message-ID: <50B8692F.4050706@smhi.se> Hi, We noticed that comparing arrays of different shapes with allclose doesn't work anymore in numpy 1.6.2. Is this a feature or a bug ? :) See the output in both 1.6.1 and 1.6.2 at the end of this mail. Best regards, Martin 1.6.1:: In [1]: import numpy as np In [2]: np.__version__ Out[2]: '1.6.1' In [3]: a = np.array([1, 2, 3]) In [4]: b = np.array([1, 2, 3, 4]) In [5]: np.allclose(a, b) Out[5]: False 1.6.2:: In[1]: import numpy as np In[2]: np.__version__ Out[2]: '1.6.2' In [3]: a = np.array([1, 2, 3]) In[4]: b = np.array([1, 2, 3, 4]) In[5]: np.allclose(a, b) Traceback (most recent call last): File "", line 1, in File "/home/maarten/pytroll/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/numeric.py", line 1936, in allclose return all(less_equal(abs(x-y), atol + rtol * abs(y))) ValueError: operands could not be broadcast together with shapes (3) (4) -------------- next part -------------- A non-text attachment was scrubbed... Name: martin_raspaud.vcf Type: text/x-vcard Size: 303 bytes Desc: not available URL: From brad.froehle at gmail.com Fri Nov 30 15:13:58 2012 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Fri, 30 Nov 2012 12:13:58 -0800 Subject: [Numpy-discussion] site.cfg: Custom BLAS / LAPACK configuration Message-ID: I recently installed NumPy 1.6.2 on a new computer and wanted to use ACML as the BLAS/LAPACK library. [I'm aware that ACML doesn't provide CBLAS, but that is easy to work around by compiling it yourself to produce libcblas.a or libcblas.so]. I experienced a great bit of difficulty in getting NumPy to use ACML (-lcblas -lacml), primarily stemming from the fact that there was a working ATLAS installation already in /usr/lib64. As far as I can tell, it's IMPOSSIBLE to create a site.cfg which will link to ACML when a system installed ATLAS is present. The detection routine for blas_opt (and similarly for lapack_opt) seem to operate as: * Is MKL present? If so, use it. * Is ATLAS present? If so, use it. * Use [blas] section from site.cfg. Instead I would have expected the detection routine to be more like: * Is [blas_opt] present in site.cfg? If so, use it. * Is MKL present? ... * Is ATLAS present? ... * Use [blas] section from site.cfg. This is not just a problem with ACML. I've also experienced this when using NumPy on some cray supercomputers where the default C compiler automatically links a preferred BLAS/LAPACK. I created a GitHub issue for this: https://github.com/numpy/numpy/issues/2728. 
In addition, I created a pull request with a "works for me" solution, but which should have needs some wider visibility https://github.com/numpy/numpy/pull/2751. I'd appreciate any reviews, workarounds, or other general feedback. If you want to test our the library detection mechanism you can run the following from within the NumPy source directory:: import __builtin__ as builtins builtins.__NUMPY_SETUP__ = True import numpy.distutils.system_info as si print si.get_info('blas_opt') Thanks, Brad -------------- next part -------------- An HTML attachment was scrubbed... URL:
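As a related sanity check once a build has finished, the BLAS/LAPACK configuration that NumPy actually linked against can be printed from Python; this is a minimal sketch and assumes an already installed numpy:

import numpy as np

# Prints the blas_opt_info / lapack_opt_info sections (libraries,
# library_dirs, ...) that were recorded at build time.
np.show_config()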