Matrix multiplication

Dan Stromberg drsalists at gmail.com
Tue Jan 4 12:08:32 EST 2011


On Tue, Jan 4, 2011 at 4:22 AM, Zdenko <oknedz at gmail.com> wrote:
> On 4.1.2011 11:15, Ulrich Eckhardt wrote:
>>
>> Zdenko wrote:
>>>
>>> Please, can anybody write me a simple recursive matrix multiplication
>>> using multiple threads in Python, or at least show me some guidelines
>>> how to write it myself
>>
>> No problem, I just need your bank account data to withdraw the payment and
>> the address of your teacher whom to send the results. ;^)
>>
>> Seriously, things don't work like this. If you show effort, you will get
>> help, but you don't get working solutions without even a bit of effort on
>> your own.
>>
>> Start with a recursive algorithm, then make it multithreaded. Be prepared
>> to
>> make some experiments on how to distribute tasks to threads and collect
>> them again.
>>
>> Uli
>>
>
> Ok, I expected something like this :)
> Lets try this way:
>
> I wrote these two codes for example:
>
> this one is naive way of matrix multiplication using one thread
>
> from random import randint
> from time import clock
> import threading
>
> t0 = clock()
> A=[]
> B=[]
> C=[]
> m=300  #size of matrix m x m
>
> for i in range(m):
>    A.append([])
>    B.append([])
>    C.append([])
>    for j in range(m):
>        A[i].append(randint(1,9))
>        B[i].append(randint(1,9))
>        C[i].append(0)
>
> class MyThread(threading.Thread):
>
>     def run(self):
>         for i in range(m):
>             for j in range(m):
>                 for k in range(m):
>                     C[i][j] += A[i][k]*B[k][j]
>
> t=MyThread()
> t.start()
> t.join()
> print clock() - t0, "seconds"
>
>
>
> this one is using two threads, each one is multiplying half of matrix
>
> import time
> from threading import Thread
> from random import randint
>
> t0 = time.clock()
> class MyThread(Thread):
>     def __init__(self, poc, kr):
>         Thread.__init__(self)
>         self.poc = poc
>         self.kr = kr
>
>     def run(self):
>         for i in range(self.poc, self.kr):
>             for j in range(m):
>                 for k in range(m):
>                     C[i][j] += A[i][k]*B[k][j]
>
> A=[]
> B=[]
> C=[]
> m=300 #size of matrix m x m
>
> for i in range(m):
>    A.append([])
>    B.append([])
>    C.append([])
>    for j in range(m):
>        A[i].append(randint(1,9))
>        B[i].append(randint(1,9))
>        C[i].append(0)
> thr1=MyThread(0,m/2)
> thr2=MyThread(m/2,m)
> thr1.start()
> thr2.start()
> thr1.join()
> thr2.join()
> print time.clock() - t0, "seconds"
>
> why is second one more than twice slower than first when it should be
> faster. I suspect that problem is in simultaneous access to matrix C but i
> don't know how to solve this.
>
> I used this just for example to see how the threading works so i can work on
> recursive and Strassen algorithm.
> --
> http://mail.python.org/mailman/listinfo/python-list
>

Threads in CPython are decent when you're doing I/O, but not so good
for CPU-bound tasks.  The culprit is the Global Interpreter Lock
(GIL): only one thread executes Python bytecode at a time, so your
two threads take turns on the same matrix work, and the extra locking
and thread switching is pure overhead.  For CPU-bound jobs you may
actually find that your application runs about 1/n times as fast,
where n is the number of CPUs - that is, much slower.  I don't mean
that it just doesn't scale well, I mean that each additional CPU can
make things slower for CPU-bound jobs.
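A quick way to convince yourself: time the same CPU-bound function run
twice serially versus once in each of two threads.  This is a minimal
sketch (the function name `burn` is mine, and the actual numbers vary
by machine, so treat the comparison as illustrative):

```python
import threading
import time

def burn(n):
    # Pure-Python CPU work; a thread running this holds the GIL
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 200000

t0 = time.time()
burn(N)
burn(N)
serial = time.time() - t0

t0 = time.time()
threads = [threading.Thread(target=burn, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - t0

# On a stock (GIL-ful) CPython, `threaded` is rarely faster than
# `serial`, and on a multi-core box it is often slower.
print("serial: %.3fs  threaded: %.3fs" % (serial, threaded))
```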

If you require Java-like threading, you might be better off with
Jython or IronPython for now.

PyPy is likely to have good threading eventually, but last I heard it
wasn't quite there yet.

If you do require CPython, you might check out the greenlet module
and/or PyCSP - though note that greenlets give you cooperative
concurrency, not parallelism across cores.

Oh, and there's Stackless Python, which amounts to a branch of
CPython that's unlikely to be merged.  It's supposed to be able to
thread really well, and ISTR hearing that it's pretty similar to
greenlets.  The main force behind Stackless is working on PyPy now, I
believe.

Finally, with CPython, you can use the multiprocessing module to get
pretty good parallelism via distinct processes.  This way you tend to
end up using queues for interprocess communication; note that these
queues pickle your objects and ship them over a pipe, so passing a
whole 300x300 matrix through one isn't free.  For data that large you
might put your matrix in shared memory instead: multiprocessing lets
you allocate a linear array of a homogeneous, scalar type in shared
memory (multiprocessing.Array); I've not heard of a way of putting an
aggregate type in shared memory.  Naturally, you can fake an
n-dimensional array using a 1-dimensional array with a little
multiplication and addition.
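That last trick - faking a 2-D matrix on top of a flat shared array -
looks roughly like this (a minimal sketch; the get/put helper names
and the tiny m are mine):

```python
from multiprocessing import Array

m = 4  # tiny matrix, just to show the indexing

# A flat block of m*m C doubles in shared memory; child processes
# created after this point can read and write the same buffer.
flat = Array('d', m * m)

def get(a, i, j):
    # Treat the 1-D buffer as an m x m matrix: element (i, j)
    # lives at flat index i*m + j.
    return a[i * m + j]

def put(a, i, j, value):
    a[i * m + j] = value

for i in range(m):
    for j in range(m):
        put(flat, i, j, i * 10 + j)

print(get(flat, 2, 3))  # row 2, column 3 -> 23.0
```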

As an example of using CPython with multiprocessing without having
your matrix in shared memory: you could have one worker subprocess
for each core on the system, divide up the cells of your result
matrix by their coordinates (perhaps using an iterator and izip
across the queues), and send a message with those coordinates to the
appropriate subprocess.  Each subprocess then sends back the
coordinates and the result at those coordinates via a result queue.
I suspect you'd want one queue for all the results, and one queue per
subprocess for initiating computation.  Each subprocess would have
its own COW copy of the input matrices.
