How fast can we multiply?

tim_one at email.msn.com
Wed Jul 21 19:06:47 EDT 1999


[Jeffrey Chang <jefftc at leland.Stanford.EDU>]
> Oh, shoot.  I'm 2 days into a week-long calculation of normalizing
> a dense ~6000x6000 matrix.  Do you mean I'm the only person doing
> this kind of thing?  ;)

Yes <wink>.

> I've been toying around with the idea of working on optimized
> matrix functions for NumPy (while waiting for the computations to
> finish!).  Since I often work with large matrices, there could
> be big wins for me.  However, it's a little disheartening to hear
> that few other people would benefit from it.
>
> I know the O(n**log2(7)) algorithm for matrix multiply and am
> aware that there are faster ones.  Does anyone know what they are?
> Can anyone recommend a good general book for optimized matrix
> algorithms?  Is Knuth 2 the best source?

It's probably the best accessible source for a quick overview of the
theory.
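
For anyone following along who doesn't know it:  the O(n**log2(7))
method is Strassen's, which trades 8 half-size multiplications for 7
at the cost of a pile of extra additions.  Here's a minimal sketch
(in current NumPy spelling, which is my addition, not anything from
NumPy itself; the leaf cutoff is illustrative, and it assumes square
matrices whose size is a power of 2):

    import numpy as np

    def strassen(A, B, leaf=64):
        # Strassen multiply for square matrices whose size is a power
        # of 2.  Below `leaf`, fall back to the ordinary O(n**3)
        # multiply -- recursion overhead swamps the savings there.
        n = A.shape[0]
        if n <= leaf:
            return np.dot(A, B)
        h = n // 2
        A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        # Seven half-size products instead of eight:
        M1 = strassen(A11 + A22, B11 + B22, leaf)
        M2 = strassen(A21 + A22, B11, leaf)
        M3 = strassen(A11, B12 - B22, leaf)
        M4 = strassen(A22, B21 - B11, leaf)
        M5 = strassen(A11 + A12, B22, leaf)
        M6 = strassen(A21 - A11, B11 + B12, leaf)
        M7 = strassen(A12 - A22, B21 + B22, leaf)
        C = np.empty_like(A)
        C[:h, :h] = M1 + M4 - M5 + M7
        C[:h, h:] = M3 + M5
        C[h:, :h] = M2 + M4
        C[h:, h:] = M1 - M2 + M3 + M6
        return C

Note all those temporary sums of submatrices:  they're exactly the
"reduced locality of reference" the abstract quoted below complains
about.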

In practice, though, you're pretty much hosed:  I don't know what your
6000x6000 matrices *contain*, but if it's doubles or even floats then
each consumes a significant fraction of a gigabyte (6000**2 doubles is
288 million bytes; floats, half that).  So unless you're running on a
real-memory machine, cache effects are likely at least as important as
operation count; on many platforms, much more important.  When high
performance requires coding to the most obscure details of a
platform's memory hierarchy, portability goes out the window.
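
To make that concrete, the standard trick is a blocked (tiled)
multiply; here's a sketch (again in current NumPy spelling, which is
mine).  The block size `bs` that actually wins is precisely the
platform-specific detail at issue -- it depends on your cache sizes,
so there's no portable right answer:

    import numpy as np

    def blocked_matmul(A, B, bs=256):
        # Multiply in bs-by-bs tiles so each pair of tiles stays in
        # cache while it's being reused, instead of streaming whole
        # rows and columns through memory on every inner product.
        n, k = A.shape
        m = B.shape[1]
        C = np.zeros((n, m), dtype=A.dtype)
        for i in range(0, n, bs):
            for j in range(0, m, bs):
                for p in range(0, k, bs):
                    C[i:i+bs, j:j+bs] += np.dot(A[i:i+bs, p:p+bs],
                                                B[p:p+bs, j:j+bs])
        return C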

There's an interesting paper about how all this relates to Strassen-
Winograd matmult (the simplest of the "advanced" methods) at:

     http://www.cs.duke.edu/~alvy/papers/sc98/index.htm

Here's part of the abstract:

    Strassen's algorithm for matrix multiplication gains its lower
    arithmetic complexity at the expense of reduced locality of
    reference, which makes it challenging to implement the algorithm
    efficiently on a modern machine with a hierarchical memory
    system. We report on an implementation of this algorithm that
    uses several unconventional techniques to make the algorithm
    memory-friendly.
    ...
    Performance comparisons of our implementation with that of
    competing implementations show that our implementation often
    outperforms the alternative techniques (up to 25%). However, we
    also observe wide variability across platforms and across matrix
    sizes, indicating that at this time, no single implementation is
    a clear choice for all platforms or matrix sizes.
    ...

Par for the course.

buy-an-old-cray-while-they're-still-cheap<wink>-ly y'rs  - tim

