Map lots of words to lots of integers

thomas at cintra.no
Thu May 4 08:41:21 EDT 2000


Hi,

I need a fast way of mapping words to integers. A single word must be
able to point to many, *many*, integers. I have tried a dict with
words as keys, each pointing to a list of integers. That is all fine and
nice as long as the thing lives in memory, but I want to (or rather
need to) store all of this on disk, and the method must be fast. I
thought I could use a Berkeley DB file with words as keys, but what
should they point to?

The number of words can of course be in the thousands, and the integers
they point to even more. Do Zope's internals, like ZODB, offer anything
I could use?

What I've tried so far is to make a general indexing module, where you
do something like

x = Indexer('data_file.db') 

# extract words from documents etc.

x.add(word2index, id)
etc. etc.
x.index()
print x.locate('python')
[432,6363,326,65464,6544,456465465,65433,76] # of course this would be
# HUGE and may not fit into a list

What I'd really need is to store several integers under one key/id, e.g.
as a tuple, but I'll settle for less if somebody could just give me
some pointers.
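
To make the question concrete, here is a rough sketch of how the
interface above might sit on top of an on-disk key/value file, with all
integers for a word packed into one binary value. Only the names
Indexer, add, index and locate come from the example above; everything
else is an assumption: the stdlib dbm.dumb module stands in for a
Berkeley DB file, the integers are assumed to fit in signed 64 bits, and
close() and the in-memory buffer are hypothetical additions. It is
written for current Python 3, not as a definitive implementation.

```python
import dbm.dumb
from array import array
from collections import defaultdict


class Indexer:
    """Hypothetical on-disk word -> integers index (illustrative only)."""

    def __init__(self, path):
        # Assumption: dbm.dumb, a slow but portable stdlib key/value
        # store, stands in for a Berkeley DB style file.
        self._db = dbm.dumb.open(path, "c")
        self._pending = defaultdict(list)  # ids not yet written to disk

    def add(self, word, doc_id):
        # Buffer in memory; nothing touches the disk until index().
        self._pending[word].append(doc_id)

    def index(self):
        # Append each buffered list to the packed bytes stored for the word.
        for word, ids in self._pending.items():
            key = word.encode("utf-8")
            old = self._db[key] if key in self._db else b""
            # 'q' = signed 64-bit integers, 8 bytes per id.
            self._db[key] = old + array("q", ids).tobytes()
        self._pending.clear()
        self._db.sync()

    def locate(self, word):
        # Return every id stored for the word, or an empty list.
        key = word.encode("utf-8")
        if key not in self._db:
            return []
        ids = array("q")
        ids.frombytes(self._db[key])
        return ids.tolist()

    def close(self):
        self._db.close()
```

With this, x = Indexer('data_file'), a few x.add('python', 432) calls
and x.index() would leave x.locate('python') returning the ids in the
order they were added. Note that each index() call rewrites the whole
value for every touched word, so for truly huge lists the value would
probably have to be split into several chunks per word.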

NOTE! The number of words is as large as there are, eh ... words, and
the integers, well, how far can a human count?

Thanks.

Thomas


