a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

Antoine Pitrou solipsis at pitrou.net
Fri Dec 11 09:00:46 EST 2009


On Wed, 09 Dec 2009 06:58:11 -0800, Valery wrote:
> 
> I have a huge data structure that takes >50% of RAM. My goal is to have
> many computational threads (or processes) that can have an efficient
> read-access to the huge and complex data structure.
> 
> "Efficient" in particular means "without serialization" and "without
> unneeded lockings on read-only data"

I was going to suggest memcached, but it probably serializes non-atomic 
types. That doesn't necessarily mean it will be slow, though: 
serialization implemented in C may well be faster than any "smart" 
non-serializing scheme implemented in Python.
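
For what it's worth, cPickle makes it easy to put a number on that
serialization cost before dismissing it. A minimal sketch (the dict
below is only a stand-in for your real structure):

import cPickle
import timeit

# Stand-in for the real structure; replace with something representative.
data = dict((i, range(100)) for i in xrange(10000))

def roundtrip():
    # One full serialize/deserialize cycle through the C implementation.
    return cPickle.loads(cPickle.dumps(data, cPickle.HIGHEST_PROTOCOL))

print timeit.timeit(roundtrip, number=10)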

> 2. multi-threading
>  => d. CPython is told to have problems here because of GIL --  any
> comments?

What do you mean by "problems because of the GIL"? That is quite a 
vague claim; the answer depends on your OS, on how many threads you 
intend to run, and on whether you want to extract throughput from 
multiple threads or are only concerned about latency.
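
To make that concrete: the GIL only serializes the execution of Python
bytecode, so its impact depends entirely on the workload. A minimal
sketch of the kind of measurement I mean (toy numbers, CPython):

import time
import threading

def cpu_work(n=10**7):
    # Pure-Python loop: holds the GIL for its whole duration.
    s = 0
    for i in xrange(n):
        s += i
    return s

def elapsed(nthreads):
    threads = [threading.Thread(target=cpu_work) for _ in range(nthreads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

print elapsed(1), elapsed(4)

Expect the 4-thread run to take roughly four times as long as the
1-thread run for this CPU-bound loop; an I/O-bound workload behaves
very differently, because the GIL is released around blocking calls.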

In any case, you will have to do some homework: compare the various 
approaches on your own data, and decide whether the numbers are 
satisfactory to you.
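
One approach worth including in that comparison, assuming a
fork()-based OS: plain multiprocessing with the structure built before
the pool is created, so that the children inherit it copy-on-write and
reads need neither locks nor serialization of the structure itself.
A sketch (illustrative names and sizes):

import multiprocessing

# Built once in the parent, before the workers are forked.
BIG = dict((i, range(100)) for i in xrange(100000))

def lookup(key):
    # Reads the inherited structure directly; only the small key and
    # result are pickled across the process boundary.
    return sum(BIG[key])

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    print pool.map(lookup, range(1000))[:5]

(Measure memory as well as speed: CPython's reference counting touches
object headers, so copy-on-write pages do get copied over time.)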

> I am a big fan of parallel map() approach

I don't see what map() has to do with accessing data. map() is for 
*processing* data. In other words, whether or not you use a map()-like 
primitive says nothing about how the underlying data should be 
accessed.
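
To illustrate: the same worker function can be driven by the builtin
map(), a thread pool, or a process pool without changing how it reads
the data; the access strategy inside it is a separate decision
entirely. A toy sketch (names are illustrative):

import multiprocessing
import multiprocessing.dummy

DATA = range(1000)          # however the data ends up being shared

def work(i):
    # The data *access* happens here, regardless of which map()
    # primitive hands us the index.
    return DATA[i] * 2

if __name__ == '__main__':
    indices = range(len(DATA))
    r_builtin = map(work, indices)
    r_threads = multiprocessing.dummy.Pool(4).map(work, indices)
    r_procs = multiprocessing.Pool(4).map(work, indices)
    assert r_builtin == r_threads == r_procs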

