Efficient posting-list

Shagshag13 shagshag13 at yahoo.fr
Mon Jun 3 06:12:20 EDT 2002


A posting list looks something like this (I hope I'm not too wrong):

[key_0] -> ...
...
[key_i] -> [id_j, informations] -> [id_k, informations] -> ...
...
[key_n] -> ...

where, in information retrieval:
- keys are often words,
- ids are document ids (so key_i occurs in document id_j, in document id_k, and so on),
- informations are often the in-document frequency of the key, but might be more...
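
To make the layout concrete, here is a minimal sketch of the structure described above. It is only an illustration, not my actual code: the names `postings` and `add_posting` are made up, and it assumes that informations is just a frequency count.

```python
from collections import defaultdict

# key (word) -> posting list of (doc_id, frequency) pairs.
# Assumption: "informations" is reduced to a single frequency count.
postings = defaultdict(list)

def add_posting(key, doc_id, frequency):
    """Append one [id, informations] node to the posting list of `key`."""
    postings[key].append((doc_id, frequency))

# "python" occurs 3 times in document 1 and once in document 7.
add_posting("python", 1, 3)
add_posting("python", 7, 1)
add_posting("list", 7, 2)
```

Looking up `postings["python"]` then walks the chain [id_j, informations] -> [id_k, informations] -> ... for that key.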

I'm looking for an efficient way of implementing this in pure Python. For now I use a dict
for the keys and a Python list containing node objects for the [id_j, informations] entries.
But I have two problems: this is very slow to populate, and it uses too much memory. Do you
have any ideas for optimizing this?

(Keep in mind that there are more than 500,000 text keys; ids are numbered from 1 to
150,000; the length of a posting list might be from 1 to 500,000...)
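
With sizes like the ones above, one direction that might be worth trying is to drop the per-node objects and keep two parallel typed arrays per key, using the standard `array` module. This is an untested sketch under assumptions, not a measured solution: the class name `CompactPostings` and its methods are hypothetical, informations is assumed to be a single non-negative integer, and the "I" type code is assumed to be 4 bytes wide (true on common platforms, but not guaranteed).

```python
from array import array

class CompactPostings:
    """Posting lists stored as two parallel typed arrays per key."""

    def __init__(self):
        self._ids = {}    # key -> array of document ids
        self._freqs = {}  # key -> array of frequencies, same order as ids

    def add(self, key, doc_id, frequency):
        """Append one (doc_id, frequency) entry to the list of `key`."""
        ids = self._ids.get(key)
        if ids is None:
            # "I" = unsigned int, typically 4 bytes, enough for ids up to 150,000.
            ids = self._ids[key] = array("I")
            self._freqs[key] = array("I")
        ids.append(doc_id)
        self._freqs[key].append(frequency)

    def get(self, key):
        """Return the posting list of `key` as a list of (doc_id, frequency)."""
        return list(zip(self._ids.get(key, ()), self._freqs.get(key, ())))


index = CompactPostings()
index.add("python", 1, 3)
index.add("python", 7, 1)
```

The idea is that each entry costs a few bytes inside a flat array instead of one full Python object per node. Whether it is fast enough to populate would still have to be measured on the real data.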

s13.

PS: I can send my code by mail to anyone who asks for it...
PS2: sorry again for my bad English...
PSx: and thanks again to the guys who helped me before...




