cPickle alternative?

Bengt Richter bokr at oz.net
Fri Aug 15 18:22:54 EDT 2003


On Fri, 15 Aug 2003 16:27:18 +0200, "Drochom" <pedrosch at gazeta.pl> wrote:

>Hello,
>    I have a huge problem with loading very simple structure into memory
>    it is a list of tuples, it has 6MB and consists of 100000 elements
>
If speed is important, the best approach depends on what is in those tuples,
whether they are all the same length, and so on. For example, if they were all
fixed-length tuples of integers, you could do hugely better than storing the
data as a list of tuples.
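
To make that concrete, here is a minimal sketch for the fixed-length integer
case. It assumes every tuple holds exactly WIDTH integers that fit in a C long;
WIDTH, save_tuples and load_tuples are made-up names for illustration, and I
don't know that your data actually looks like this. The file is written in the
machine's native byte order, so it is not portable between platforms.

```python
import array
import os

WIDTH = 3  # assumption: every tuple holds exactly 3 integers


def save_tuples(path, tuples):
    """Flatten the tuples into one array of C longs and dump the raw bytes."""
    flat = array.array("l")
    for t in tuples:
        flat.extend(t)
    with open(path, "wb") as f:
        flat.tofile(f)


def load_tuples(path):
    """Read the raw bytes back and regroup them into WIDTH-sized tuples."""
    flat = array.array("l")
    count = os.path.getsize(path) // flat.itemsize
    with open(path, "rb") as f:
        flat.fromfile(f, count)
    return [tuple(flat[i:i + WIDTH]) for i in range(0, len(flat), WIDTH)]
```

Loading is then a single bulk read plus some slicing, with no per-object
unpickling. If you can work on the flat array directly, you can skip
rebuilding the list of tuples altogether.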

Secondly, you might want to consider being lazy about extracting the data you
are actually going to use, depending on use patterns. One way to do that would
be to have a compact index to the data, or store it in such a way that you can
compute an index, and then write some simple class to define access methods.
That's not a bad idea anyway, since it will let you change the way you store
and retrieve data later, without changing the code that uses it.
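
A rough sketch of what I mean, again assuming fixed-size records (three
32-bit integers each) so that the "index" is just record number times record
size. TupleFile and write_tuples are hypothetical names, and the record layout
is my assumption, not something I know about your data.

```python
import struct


def write_tuples(path, tuples):
    """Write each tuple as one fixed-size binary record."""
    with open(path, "wb") as f:
        for t in tuples:
            f.write(TupleFile.RECORD.pack(*t))


class TupleFile:
    """Sequence-like access to the records; only the one asked for is read."""

    RECORD = struct.Struct("<3i")  # assumed layout: 3 little-endian int32s

    def __init__(self, path):
        self._f = open(path, "rb")
        self._f.seek(0, 2)  # seek to end of file to learn its size
        self._count = self._f.tell() // self.RECORD.size

    def __len__(self):
        return self._count

    def __getitem__(self, i):
        if i < 0:
            i += self._count
        if not 0 <= i < self._count:
            raise IndexError(i)
        self._f.seek(i * self.RECORD.size)  # computed index, no lookup table
        return self.RECORD.unpack(self._f.read(self.RECORD.size))

    def close(self):
        self._f.close()
```

Code that only says store[i] and len(store) won't care whether the data sits
in a list, a file, or something else you switch to later.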

You could store the whole thing in a mmap image, with a length-prefixed pickle
string in the front representing index info.
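
One possible layout, as a sketch only: a 4-byte length, then a pickled list of
(offset, size) pairs, then each record pickled on its own. The layout and the
names write_image and MappedRecords are assumptions of mine for illustration.
It is written against a current Python, where the C pickler lives in the plain
pickle module.

```python
import mmap
import pickle
import struct


def write_image(path, records):
    """Write: 4-byte index length, pickled index, then the pickled records."""
    blobs = [pickle.dumps(r, 2) for r in records]
    index, pos = [], 0
    for b in blobs:
        index.append((pos, len(b)))  # offset is relative to the data area
        pos += len(b)
    header = pickle.dumps(index, 2)
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))  # the length prefix
        f.write(header)
        for b in blobs:
            f.write(b)


class MappedRecords:
    """Map the file and unpickle only the index up front."""

    def __init__(self, path):
        self._f = open(path, "rb")
        self._m = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)
        (n,) = struct.unpack("<I", self._m[:4])
        self._index = pickle.loads(self._m[4:4 + n])
        self._base = 4 + n  # where the data area starts

    def __len__(self):
        return len(self._index)

    def __getitem__(self, i):
        off, size = self._index[i]
        start = self._base + off
        return pickle.loads(self._m[start:start + size])

    def close(self):
        self._m.close()
        self._f.close()
```

Opening the image costs one small unpickle for the index; each record is only
decoded when somebody actually asks for it.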

There are a lot of different things you could do. But Alex's suggestion (upgrade to
2.3 and use protocol 2 pickle) will probably take care of it ;-)
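
For reference, a minimal sketch of the protocol 2 route. The helper names save
and load are mine, and it is written so it also runs on a current Python,
where cPickle has been folded into pickle. Note the binary file modes: the
binary pickle protocols need "wb"/"rb" rather than "w"/"r".

```python
try:
    import cPickle as pickle  # the C implementation on Python 2
except ImportError:
    import pickle  # newer Pythons use the C accelerator automatically


def save(path, obj):
    """Dump obj with the binary protocol 2 format."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, 2)


def load(path):
    """Load a pickle; the protocol is detected from the data itself."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The file has to be rewritten once with the new protocol to see any benefit;
loading an existing text-protocol pickle will still work, just not faster.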

>>import cPickle
>
>>plik = open("mealy","r")
>>mealy = cPickle.load(plik)
>>plik.close()
>
>    this takes about 30 seconds!
>    How can I accelerate it?

Find a way to avoid doing it? Or doing much of it?
What are your access needs once the data is accessible?

Regards,
Bengt Richter



