[Numpy-discussion] Scipy dot
George Nurser
gnurser at gmail.com
Sat Nov 10 07:39:03 EST 2012
Note also that OpenBlas claims performance as good as MKL with Sandy Bridge
processors.
https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-sandybridge_perf
George Nurser.
On 10 November 2012 00:38, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no>wrote:
> On 11/09/2012 11:57 PM, Matthieu Brucher wrote:
> > Hi,
> >
> > A.A slower than A.A' is not a surprise for me. The latter is far more
> > cache friendly that the former. Everything follows cache lines, so it is
> > faster than something that will use one element from each cache line. In
> > fact it is exactly what "proves" that the new version is correct.
> > Good job (if all the tests were made and still pass ;) )
>
> Cache lines shouldn't matter much with a decent BLAS?
>
> http://dl.acm.org/citation.cfm?id=1356053
>
> (Googling for "Anatomy of High-Performance Matrix Multiplication" will
> give you a preprint outside of paywall, but Google appears to not want
> to give me the URL of a too long search result so I can't paste it).
>
> Dag Sverre
>
> >
> > Cheers,
> >
> > Matthieu
> >
> >
> > 2012/11/9 Nicolas SCHEFFER <scheffer.nicolas at gmail.com
> > <mailto:scheffer.nicolas at gmail.com>>
> >
> > Ok: comparing apples to apples. I'm clueless on my observations and
> > would need input from you guys.
> >
> > Using ATLAS 3.10, numpy with and without my changes, I'm getting
> these
> > timings and comparisons.
> >
> > #
> > #I. Generate matrices using regular dot:
> > #
> > big = np.array(np.random.randn(2000, 2000), 'f');
> > np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T),
> > left=big.T.dot(big), right=big.dot(big.T))"
> >
> > #
> > #II. Timings with regular dot
> > #
> > In [3]: %timeit np.dot(big, big)
> > 10 loops, best of 3: 138 ms per loop
> >
> > In [4]: %timeit np.dot(big, big.T)
> > 10 loops, best of 3: 166 ms per loop
> >
> > In [5]: %timeit np.dot(big.T, big.T)
> > 10 loops, best of 3: 193 ms per loop
> >
> > In [6]: %timeit np.dot(big.T, big)
> > 10 loops, best of 3: 165 ms per loop
> >
> > #
> > #III. I load these arrays and time again with the "fast" dot
> > #
> > In [21]: %timeit np.dot(big, big)
> > 10 loops, best of 3: 138 ms per loop
> >
> > In [22]: %timeit np.dot(big.T, big)
> > 10 loops, best of 3: 104 ms per loop
> >
> > In [23]: %timeit np.dot(big.T, big.T)
> > 10 loops, best of 3: 138 ms per loop
> >
> > In [24]: %timeit np.dot(big, big.T)
> > 10 loops, best of 3: 102 ms per loop
> >
> > 1. A'.A': great!
> > 2. A.A' becomes faster than A.A !?!
> >
> > #
> > #IV. MSE on differences
> > #
> > In [25]: np.sqrt(((arr['none'] - none)**2).sum())
> > Out[25]: 0.0
> >
> > In [26]: np.sqrt(((arr['both'] - both)**2).sum())
> > Out[26]: 0.0
> >
> > In [27]: np.sqrt(((arr['left'] - left)**2).sum())
> > Out[27]: 0.015900515
> >
> > In [28]: np.sqrt(((arr['right'] - right)**2).sum())
> > Out[28]: 0.015331409
> >
> > #
> > # CCl
> > #
> > While the MSE are small, I'm wondering whether:
> > - It's a bug: it should be exactly the same
> > - It's a feature: BLAS is taking shortcuts when you have A.A'. The
> > difference is not significant. Quick: PR that asap!
> >
> > I don't have enough expertise to answer that...
> >
> > Thanks much!
> >
> > -nicolas
> > On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER
> > <scheffer.nicolas at gmail.com <mailto:scheffer.nicolas at gmail.com>>
> wrote:
> > > I too encourage users to use scipy.linalg for speed and robustness
> > > (hence calling this scipy.dot), but it just brings so much
> confusion!
> > > When using the scipy + numpy ecosystem, you'd almost want
> everything
> > > be done with scipy so that you get the best implementation in all
> > > cases: scipy.zeros(), scipy.array(), scipy.dot(),
> scipy.linalg.inv().
> > >
> > > Anyway this is indeed for another thread, the confusion we'd like
> to
> > > fix here is that users shouldn't have to understand the C/F
> > contiguous
> > > concepts to get the maximum speed for np.dot()
> > >
> > > To summarize:
> > > - The python snippet I posted is still valid and can speed up your
> > > code if you can change all your dot() calls.
> > > - The change in dotblas.c is a bit more problematic because it's
> very
> > > core. I'm having issues right now to replicate the timings, I've
> got
> > > better timing for a.dot(a.T) than for a.dot(a). There might be a
> bug.
> > >
> > > It's a pain to test because I cannot do the test in a single
> > python session.
> > > I'm going to try to integrate most of your suggestions, I cannot
> > > guarantee I'll have time to do them all though.
> > >
> > > -nicolas
> > > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith <njs at pobox.com
> > <mailto:njs at pobox.com>> wrote:
> > >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux
> > >> <gael.varoquaux at normalesup.org
> > <mailto:gael.varoquaux at normalesup.org>> wrote:
> > >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote:
> > >>>> But what if someone compiles numpy against an optimized blas
> (mkl,
> > >>>> say) and then compiles SciPy against the reference blas? What
> > do you
> > >>>> do then!? ;-)
> > >>>
> > >>> This could happen. But the converse happens very often. What
> > happens is
> > >>> that users (eg on shared computing resource) ask for a
> > scientific python
> > >>> environment. The administrator than installs the package
> > starting from
> > >>> the most basic one, to the most advanced one, thus starting
> > with numpy
> > >>> that can very well build without any external blas. When he
> > gets to scipy
> > >>> he hits the problem that the build system does not detect
> > properly the
> > >>> blas, and he solves that problem.
> > >>>
> > >>> Also, it used to be that on the major linux distributions,
> > numpy would not
> > >>> be build with an optimize lapack because numpy was in the
> > 'base' set of
> > >>> packages, but not lapack. On the contrary, scipy being in the
> > 'contrib'
> > >>> set, it could depend on lapack. I just checked, and this has
> > been fixed
> > >>> in the major distributions (Fedora, Debian, Ubuntu).
> > >>>
> > >>> Now we can discuss with such problems should not happen, and
> > put the
> > >>> blame on the users/administrators, the fact is that they happen
> > often. I
> > >>> keep seeing environments in which np.linalg is unreasonnably
> slow.
> > >>
> > >> If this is something that's been a problem for you, maybe we
> should
> > >> start another thread on things we could do to fix it directly?
> > Improve
> > >> build instructions, advertise build systems that set up the whole
> > >> environment (and thus do the right thing), make numpy's setup.py
> > >> scream and yell if blas isn't available...?
> > >>
> > >> -n
> > >> _______________________________________________
> > >> NumPy-Discussion mailing list
> > >> NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org>
> > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org>
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> >
> > --
> > Information System Engineer, Ph.D.
> > Blog: http://matt.eifelle.com
> > LinkedIn: http://www.linkedin.com/in/matthieubrucher
> > Music band: http://liliejay.com/
> >
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20121110/23702894/attachment.html>
More information about the NumPy-Discussion
mailing list