cPickle alternative?

Bengt Richter bokr at oz.net
Fri Aug 15 23:03:05 EDT 2003


On Sat, 16 Aug 2003 00:41:42 +0200, "Drochom" <pedrosch at gazeta.pl> wrote:

>Hello,
>
>> If speed is important, you may want to do different things depending on,
>> e.g., what is in those tuples, and whether they are all the same length,
>> etc. E.g., if they were all fixed length tuples of integers, you could do
>> hugely better than store the data as a list of tuples.
>Those tuples do have different lengths.
>
>> You could store the whole thing in a mmap image, with a length-prefixed
>> pickle string in the front representing index info.
>If I only knew how to do it... :-)
>
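A minimal sketch of the mmap idea, assuming the tuples hold only 32-bit integers (the thread does not say what they contain). The file starts with a 4-byte length, then a pickled index of (offset, length) pairs, then the raw tuple data. The names write_image and TupleImage are made up for illustration, and the standard pickle module is used here where Python 2 code would use cPickle.

```python
import mmap
import pickle
import struct

PREFIX = struct.Struct("<I")  # 4-byte little-endian length of the pickled index


def pack_tuple(values):
    """Pack a variable-length tuple of ints as raw 32-bit little-endian values."""
    return struct.pack("<%di" % len(values), *values)


def unpack_tuple(raw):
    """Inverse of pack_tuple: 4 bytes per integer."""
    return struct.unpack("<%di" % (len(raw) // 4), raw)


def write_image(path, tuples):
    """Write: length prefix, pickled index of (offset, length), then the data."""
    blobs = [pack_tuple(t) for t in tuples]
    index, offset = [], 0
    for blob in blobs:
        index.append((offset, len(blob)))
        offset += len(blob)
    index_blob = pickle.dumps(index, protocol=2)
    with open(path, "wb") as f:
        f.write(PREFIX.pack(len(index_blob)))
        f.write(index_blob)
        f.write(b"".join(blobs))


class TupleImage:
    """Read-only view of the file; tuples are decoded only when asked for."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (index_len,) = PREFIX.unpack_from(self._map, 0)
        self._data_start = PREFIX.size + index_len
        # Only the index is unpickled up front, not the tuple data.
        self._index = pickle.loads(self._map[PREFIX.size:self._data_start])

    def __len__(self):
        return len(self._index)

    def __getitem__(self, i):
        offset, length = self._index[i]
        start = self._data_start + offset
        return unpack_tuple(self._map[start:start + length])

    def close(self):
        self._map.close()
        self._file.close()
```

Loading still pays for unpickling the index, so whether this beats a plain pickle of the whole list depends on how large the index is relative to the data; that would need measuring on the real structure.
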
>> Find a way to avoid doing it? Or doing much of it?
>> What are your access needs once the data is accessible?
>My structure stores a finite state automaton for a Polish dictionary (a
>lexicon, to be more precise), and it should be loaded once, but fast!
>
I wonder how much space it would take to store the complete Polish word list
with one entry per word in a Python dictionary. 300k words of 6-7 characters
on average? Say 2MB of character data, plus the dict hash table overhead. I
bet lookups would be fast.
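
A rough way to check that guess, sketched with a four-word stand-in list (not a real Polish word list). The helper names are made up, and sys.getsizeof only approximates the footprint, which also varies between Python versions.

```python
import sys


def build_lexicon(words):
    """Map each word to None; only key membership matters here."""
    return dict.fromkeys(words)


def rough_size_bytes(lexicon):
    """Dict table plus the string objects; a rough figure, not an exact one."""
    return sys.getsizeof(lexicon) + sum(sys.getsizeof(w) for w in lexicon)


words = ["kot", "pies", "dom", "woda"]  # stand-in sample, not a real word list
lexicon = build_lexicon(words)
print("kot" in lexicon, "xyz" in lexicon)
print(rough_size_bytes(lexicon), "bytes (approximate)")
```

Running the same two functions over the real word list would show how far the 2MB estimate is from the actual in-memory size.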

Is that in effect what you are doing, except sort of like a regex state machine
to match words character by character?
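
To make the question concrete, here is a sketch of the simplest character-by-character matcher, a trie built from nested dicts. This is only a guess at the general shape; a real finite state automaton for a lexicon would normally also merge shared suffixes, which this does not do. The END marker and both function names are invented for the example.

```python
END = ""  # marker key meaning "a word ends at this node" (made-up convention)


def build_trie(words):
    """Build nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def accepts(trie, word):
    """Follow one transition per character; accept only at an END node."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return END in node
```

With build_trie(["kot", "kota", "dom"]), accepts() returns True for "kot" and "kota" but False for the prefix "ko" and for "domy".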

Regards,
Bengt Richter



