finding most common elements between thousands of multiple arrays.

Lie Ryan lie.1296 at gmail.com
Sat Jul 4 14:36:38 EDT 2009


mclovin wrote:
> OK then. I will try some of the strategies here, but I guess things
> aren't looking too good. I need to run this over a dataset that someone
> pickled. I need to run this 480,000 times, so you can see my
> frustration. So it doesn't need to be "real time", but it would be nice
> if it was done sorting this month.
> 
> Is there a "best guess" strategy where it is not 100% accurate but much
> faster?

Heuristics?

If you don't need 100% accuracy, you can simply sample 10000 or so
elements and find the most common element in this small sample. It
should be much faster, though you'll probably need to determine the best
sample size (too small and you risk a biased result, too large and it
gets slow again). random.sample() might be useful here.
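
A rough sketch of the sampling idea, under a few assumptions of mine: the
data is an iterable of lists of hashable items, everything fits in memory
once flattened, and collections.Counter is available (Python 2.7 / 3.1 or
later). The function name and its parameters are made up for illustration.

```python
import random
from collections import Counter


def approx_most_common(arrays, sample_size=10000, top_n=1, seed=None):
    """Estimate the most common elements by counting a random sample.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    rng = random.Random(seed)
    # Flatten all arrays into one population (assumes it fits in memory).
    population = [item for array in arrays for item in array]
    # random.sample() raises ValueError if asked for more items than exist,
    # so clamp the sample size to the population size.
    k = min(sample_size, len(population))
    sample = rng.sample(population, k)
    # Count only the sampled items; the counts are estimates, not exact.
    return Counter(sample).most_common(top_n)
```

The counts that come back are counts within the sample, so treat them as
estimates. Running it with a few different sample_size values and checking
whether the winner stays the same is one way to pick the sample size.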


