[Numpy-discussion] Scipy dot

Mon Nov 12 15:08:39 EST 2012

I've pushed my code to a branch here
https://github.com/leschef/numpy/tree/faster_dot
with the commit
https://github.com/leschef/numpy/commit/ea037770e03f23aca1a06274a1a8e8bf0e0e2ee4

Let me know if that's enough to create a pull request.

Thanks,

-nicolas

On Sat, Nov 10, 2012 at 4:39 AM, George Nurser <gnurser at gmail.com> wrote:
> Note also that OpenBlas claims performance as good as MKL with Sandy Bridge
> processors.
>
> https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf
>
> George Nurser.
>
>
> On 10 November 2012 00:38, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no>
> wrote:
>>
>> On 11/09/2012 11:57 PM, Matthieu Brucher wrote:
>> > Hi,
>> >
>> > A.A slower than A.A' is not a surprise for me. The latter is far more
>> > cache friendly that the former. Everything follows cache lines, so it is
>> > faster than something that will use one element from each cache line. In
>> > fact it is exactly what "proves" that the new version is correct.
>> > Good job (if all the tests were made and still pass ;) )
>>
>> Cache lines shouldn't matter much with a decent BLAS?
>>
>> http://dl.acm.org/citation.cfm?id=1356053
>>
>> (Googling for "Anatomy of High-Performance Matrix Multiplication" will
>> give you a preprint outside of paywall, but Google appears to not want
>> to give me the URL of a too long search result so I can't paste it).
>>
>> Dag Sverre
>>
>> >
>> > Cheers,
>> >
>> > Matthieu
>> >
>> >
>> > 2012/11/9 Nicolas SCHEFFER <scheffer.nicolas at gmail.com
>> > <mailto:scheffer.nicolas at gmail.com>>
>> >
>> >     Ok: comparing apples to apples. I'm clueless on my observations and
>> >     would need input from you guys.
>> >
>> >     Using ATLAS 3.10, numpy with and without my changes, I'm getting
>> > these
>> >     timings and comparisons.
>> >
>> >     #
>> >     #I. Generate matrices using regular dot:
>> >     #
>> >     big = np.array(np.random.randn(2000, 2000), 'f');
>> >     np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T),
>> >     left=big.T.dot(big), right=big.dot(big.T))"
>> >
>> >     #
>> >     #II. Timings with regular dot
>> >     #
>> >     In [3]: %timeit np.dot(big, big)
>> >     10 loops, best of 3: 138 ms per loop
>> >
>> >     In [4]: %timeit np.dot(big, big.T)
>> >     10 loops, best of 3: 166 ms per loop
>> >
>> >     In [5]: %timeit np.dot(big.T, big.T)
>> >     10 loops, best of 3: 193 ms per loop
>> >
>> >     In [6]: %timeit np.dot(big.T, big)
>> >     10 loops, best of 3: 165 ms per loop
>> >
>> >     #
>> >     #III. I load these arrays and time again with the "fast" dot
>> >     #
>> >     In [21]: %timeit np.dot(big, big)
>> >     10 loops, best of 3: 138 ms per loop
>> >
>> >     In [22]: %timeit np.dot(big.T, big)
>> >     10 loops, best of 3: 104 ms per loop
>> >
>> >     In [23]: %timeit np.dot(big.T, big.T)
>> >     10 loops, best of 3: 138 ms per loop
>> >
>> >     In [24]: %timeit np.dot(big, big.T)
>> >     10 loops, best of 3: 102 ms per loop
>> >
>> >     1. A'.A': great!
>> >     2. A.A' becomes faster than A.A !?!
>> >
>> >     #
>> >     #IV. MSE on differences
>> >     #
>> >     In [25]: np.sqrt(((arr['none'] - none)**2).sum())
>> >     Out[25]: 0.0
>> >
>> >     In [26]: np.sqrt(((arr['both'] - both)**2).sum())
>> >     Out[26]: 0.0
>> >
>> >     In [27]: np.sqrt(((arr['left'] - left)**2).sum())
>> >     Out[27]: 0.015900515
>> >
>> >     In [28]: np.sqrt(((arr['right'] - right)**2).sum())
>> >     Out[28]: 0.015331409
>> >
>> >     #
>> >     # CCl
>> >     #
>> >     While the MSE are small, I'm wondering whether:
>> >     - It's a bug: it should be exactly the same
>> >     - It's a feature: BLAS is taking shortcuts when you have A.A'. The
>> >     difference is not significant. Quick: PR that asap!
>> >
>> >     I don't have enough expertise to answer that...
>> >
>> >     Thanks much!
>> >
>> >     -nicolas
>> >     On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER
>> >     <scheffer.nicolas at gmail.com <mailto:scheffer.nicolas at gmail.com>>
>> > wrote:
>> >      > I too encourage users to use scipy.linalg for speed and
>> > robustness
>> >      > (hence calling this scipy.dot), but it just brings so much
>> > confusion!
>> >      > When using the scipy + numpy ecosystem, you'd almost want
>> > everything
>> >      > be done with scipy so that you get the best implementation in all
>> >      > cases: scipy.zeros(), scipy.array(), scipy.dot(),
>> > scipy.linalg.inv().
>> >      >
>> >      > Anyway this is indeed for another thread, the confusion we'd like
>> > to
>> >      > fix here is that users shouldn't have to understand the C/F
>> >     contiguous
>> >      > concepts to get the maximum speed for np.dot()
>> >      >
>> >      > To summarize:
>> >      > - The python snippet I posted is still valid and can speed up
>> > your
>> >      > code if you can change all your dot() calls.
>> >      > - The change in dotblas.c is a bit more problematic because it's
>> > very
>> >      > core. I'm having issues right now to replicate the timings, I've
>> > got
>> >      > better timing for a.dot(a.T) than for a.dot(a). There might be a
>> > bug.
>> >      >
>> >      > It's a pain to test because I cannot do the test in a single
>> >     python session.
>> >      > I'm going to try to integrate most of your suggestions, I cannot
>> >      > guarantee I'll have time to do them all though.
>> >      >
>> >      > -nicolas
>> >      > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith <njs at pobox.com
>> >     <mailto:njs at pobox.com>> wrote:
>> >      >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux
>> >      >> <gael.varoquaux at normalesup.org
>> >     <mailto:gael.varoquaux at normalesup.org>> wrote:
>> >      >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith
>> > wrote:
>> >      >>>> But what if someone compiles numpy against an optimized blas
>> > (mkl,
>> >      >>>> say) and then compiles SciPy against the reference blas? What
>> >     do you
>> >      >>>> do then!? ;-)
>> >      >>>
>> >      >>> This could happen. But the converse happens very often. What
>> >     happens is
>> >      >>> that users (eg on shared computing resource) ask for a
>> >     scientific python
>> >      >>> environment. The administrator than installs the package
>> >     starting from
>> >      >>> the most basic one, to the most advanced one, thus starting
>> >     with numpy
>> >      >>> that can very well build without any external blas. When he
>> >     gets to scipy
>> >      >>> he hits the problem that the build system does not detect
>> >     properly the
>> >      >>> blas, and he solves that problem.
>> >      >>>
>> >      >>> Also, it used to be that on the major linux distributions,
>> >     numpy would not
>> >      >>> be build with an optimize lapack because numpy was in the
>> >     'base' set of
>> >      >>> packages, but not lapack. On the contrary, scipy being in the
>> >     'contrib'
>> >      >>> set, it could depend on lapack. I just checked, and this has
>> >     been fixed
>> >      >>> in the major distributions (Fedora, Debian, Ubuntu).
>> >      >>>
>> >      >>> Now we can discuss with such problems should not happen, and
>> >     put the
>> >      >>> blame on the users/administrators, the fact is that they happen
>> >     often. I
>> >      >>> keep seeing environments in which np.linalg is unreasonnably
>> > slow.
>> >      >>
>> >      >> If this is something that's been a problem for you, maybe we
>> > should
>> >      >> start another thread on things we could do to fix it directly?
>> >     Improve
>> >      >> build instructions, advertise build systems that set up the
>> > whole
>> >      >> environment (and thus do the right thing), make numpy's setup.py
>> >      >> scream and yell if blas isn't available...?
>> >      >>
>> >      >> -n
>> >      >> _______________________________________________
>> >      >> NumPy-Discussion mailing list
>> >      >> NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org>
>> >      >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >     _______________________________________________
>> >     NumPy-Discussion mailing list
>> >     NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org>
>> >     http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> >
>> >
>> > --
>> > Information System Engineer, Ph.D.
>> > Blog: http://matt.eifelle.com
>> > LinkedIn: http://www.linkedin.com/in/matthieubrucher
>> > Music band: http://liliejay.com/
>> >
>> >
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>