finding most common elements between thousands of multiple arrays.

MRAB python at mrabarnett.plus.com
Sat Jul 4 19:05:50 EDT 2009


mclovin wrote:
> On Jul 4, 3:29 pm, MRAB <pyt... at mrabarnett.plus.com> wrote:
>> mclovin wrote:
>>
>> [snip]
>>
>>> Like I said, I need to do this 480,000 times, so to get this done
>>> realistically I need to analyse about 5 arrays a second. It appears
>>> that the average matrix contains about 15 million elements.
>>> I threaded my program using your code and did about 1,000 in an hour,
>>> so it is still much too slow.
>>> When I selected 1 million random elements to count, 8 of that sample's
>>> top 10 were in the exact top 25, and 18 of its top 25 were in the
>>> exact top 25, so I suppose sampling could be an option.
>> The values are integers, aren't they? What is the range of values?
> 
> There are approx 550k unique values, ranging from 0 to 2 million with
> gaps.

I've done a little experimentation with plain lists (no numpy involved)
and found that I got a 2x speed increase if I did the counting with a
list indexed by value, something like this:

counts = [0] * 2000000
for x in values:
    counts[x] += 1
# Keep only the values that actually occurred.
counts = dict(e for e in enumerate(counts) if e[1] != 0)
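Since the goal in this thread is the top 25 values, here's a rough sketch
(my own addition, so treat the details as assumptions) of how the counting
and the top-N selection could be combined. It uses numpy.bincount, which
only applies because the values are non-negative ints below 2 million, and
heapq.nlargest for the selection; the helper name and the cutoff of 25 are
just illustrative:

import heapq
import numpy as np

def top_n(values, n=25, max_value=2000000):
    # Count occurrences by value; bincount assumes non-negative ints
    # smaller than max_value, which matches the data described above.
    counts = np.bincount(np.asarray(values), minlength=max_value)
    # Keep only the values that actually occur, then take the n largest
    # (count, value) pairs.
    occurring = np.nonzero(counts)[0]
    return heapq.nlargest(n, ((int(counts[v]), int(v)) for v in occurring))

Running that over each of the 480,000 arrays still means 480,000 counting
passes, but each pass is a single C-level loop, which is typically much
faster than counting in pure Python.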


