Map lots of words to lots of integers

Bjorn Pettersen bjorn at roguewave.com
Thu May 4 13:58:33 EDT 2000


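The shelve module is probably the most useful tool here. A shelf is a
persistent, dictionary-like object whose keys are strings and whose values
can be any picklable object, such as a list of integers: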

  import shelve

  db = shelve.open('foobar.db')   # persistent, dict-like; keys must be strings
  db['key'] = [1, 2, 3]           # any picklable value can be stored
  for k in db.keys():
      print(k, db[k])
  db.close()
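One caveat when the values are lists: with the default `writeback=False`, a shelf only stores what you assign to it, so appending to `db[word]` in place is silently lost. A minimal sketch of a word-to-integers index, using a hypothetical helper `add_id` (not part of shelve), reads the list, appends, and assigns it back:

```python
import shelve

def add_id(db, word, doc_id):
    """Append doc_id to the list stored under word (hypothetical helper)."""
    ids = db.get(word, [])   # unpickles the whole list
    ids.append(doc_id)
    db[word] = ids           # re-pickles and stores the whole list

def demo(path):
    with shelve.open(path) as db:
        for word, doc_id in [('python', 432), ('python', 6363), ('spam', 7)]:
            add_id(db, word, doc_id)
        db['python'].append(999)   # NOT persisted: mutates a temporary copy
    with shelve.open(path) as db:
        return {k: db[k] for k in db.keys()}
```

Note that every `add_id` call rewrites the entire list for that word, which is what makes very long lists slow.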

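If your lists get _really_ long, add a layer of indirection. Each shelf
value is pickled and unpickled as a whole, so splitting one huge list into
sublists keeps every read and write small. Something along the lines of: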

  db['word'] = ['@word@1', '@word@2']   # list of keys to sublists
  db['@word@1'] = [1, 2, 3]             # first sublist
  db['@word@2'] = [4, 5, 6]             # second sublist
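The indirection can be wrapped in two small functions. This is a sketch under stated assumptions: the helper names `add_id` and `iter_ids` and the sublist size `CHUNK = 1000` are hypothetical, the `'@word@N'` key scheme follows the snippet above, and it assumes no real word starts with `'@'` (otherwise keys could collide).

```python
import shelve

CHUNK = 1000  # assumed maximum number of integers per sublist

def add_id(db, word, doc_id):
    """Append doc_id to word's integers, stored as fixed-size sublists."""
    chunk_keys = db.get(word, [])      # short list of sublist keys
    if chunk_keys:
        last = chunk_keys[-1]
        ids = db[last]
        if len(ids) < CHUNK:
            ids.append(doc_id)
            db[last] = ids             # rewrite only the last sublist
            return
    new_key = '@%s@%d' % (word, len(chunk_keys) + 1)
    db[new_key] = [doc_id]             # start a new sublist
    chunk_keys.append(new_key)
    db[word] = chunk_keys

def iter_ids(db, word):
    """Yield word's integers, loading one sublist at a time."""
    for key in db.get(word, []):
        yield from db[key]
```

Reading with `iter_ids` never holds more than one sublist in memory, which also addresses the concern that a word's full list may not fit into memory.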

-- bjorn

Thomas wrote:
> 
> Hi,
> 
> I need a fast way of mapping words to integers. A single word must be
> able to point to many, *many* integers. I tried a dict with words as
> keys, each pointing to a list of integers. That works fine as long as
> everything is in memory, but I want (or need) to store all of this on
> disk, and the method must be fast. I thought I could use a Berkeley DB
> file with words as keys, but what should the values be?
> 
> The number of words can of course run into the thousands, and the
> integers they point to even more. Do Zope's internals, like ZODB,
> offer anything I could use?
> 
> What I've tried so far is a general indexing module, where you do
> something like
> 
> x = Indexer('data_file.db')
> 
> # extract words from documents etc.
> 
> x.add(word2index, id)
> etc. etc.
> x.index()
> print x.locate('python')
> [432,6363,326,65464,6544,456465465,65433,76] # of course this would be
> # HUGE and may not fit into a list
> 
> What I'd really need is to store several integers under one key, e.g.
> as a tuple, but I'll settle for less if somebody could just give me
> some pointers.
> 
> NOTE! There are as many words as there are, eh ... words, and as for
> the integers, well, how far can a human count?
> 
> Thanks.
> 
> Thomas
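Coming back to the Berkeley DB question of what the values should be, and the wish to store several integers under one key: one option, not taken from the thread, is to pack each word's integers into a single byte string with the `array` module and store that in a DBM-style file. The sketch below uses Python 3's `dbm` module as a stand-in for Berkeley DB and reuses the method names from the quoted `Indexer` example (`add`, `index`, `locate`), but the class itself is hypothetical. It assumes native-endian 8-byte integers (typecode `'q'`), so the file is only portable between machines with the same byte order, and it assumes your dbm backend has no small limit on value size.

```python
import dbm
from array import array
from collections import defaultdict

class Indexer:
    """Hypothetical word -> integers index mirroring the quoted API.

    Each word's integers are stored as raw 8-byte machine integers
    (array typecode 'q') rather than pickled, so a dbm value is just
    a compact byte string.
    """

    def __init__(self, path):
        self.path = path
        # Integers added since the last index() call, kept in memory.
        self.pending = defaultdict(lambda: array('q'))

    def add(self, word, doc_id):
        self.pending[word].append(doc_id)

    def index(self):
        """Append all buffered integers to the on-disk values, then clear."""
        with dbm.open(self.path, 'c') as db:
            for word, ids in self.pending.items():
                key = word.encode('utf-8')
                db[key] = db.get(key, b'') + ids.tobytes()
        self.pending.clear()

    def locate(self, word):
        """Return an array of all integers stored for word."""
        ids = array('q')
        with dbm.open(self.path, 'c') as db:
            ids.frombytes(db.get(word.encode('utf-8'), b''))
        return ids
```

Each `index()` call rewrites a word's whole value, so buffering many `add()` calls per flush keeps that cost down; for words with enormous lists, the sublist indirection from the reply above can be combined with this packing.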



