little question

Van Gale cgale1 at cox.net
Fri May 10 06:55:39 EDT 2002


Alex Martelli wrote:
> shagshag wrote:
>
> > Does anyone know where to find (a) Python module(s) to handle an inverted
> > index and build it from a text/XML file?
>
> I'm not sure what an *inverted* index is.  A dictionary indexed by
> word, giving for each word the set of (filename, linenumber) pairs where
> it occurs, is a tiny example in my Linux Magazine article,
> April issue -- a few lines to build it, a few to use it for queries
> "where is this word found".  But that's what I'd call an index,
> nothing 'inverted' about it, so I don't know what you need.
>
>
> Alex
>
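
A minimal Python sketch of the dictionary Alex describes follows. It is illustrative only, not the code from the article. The helper names build_index and where are hypothetical, and the input is assumed to be a {filename: text} dict.

```python
import re
from collections import defaultdict


def build_index(files):
    # Hypothetical helper: `files` is assumed to be a {filename: text} dict.
    # Maps each lowercased word to the set of (filename, linenumber) pairs
    # where it occurs.
    index = defaultdict(set)
    for filename, text in files.items():
        for linenumber, line in enumerate(text.splitlines(), start=1):
            for word in re.findall(r"\w+", line.lower()):
                index[word].add((filename, linenumber))
    return index


def where(index, word):
    # Hypothetical query helper: "where is this word found?"
    # dict.get does not trigger defaultdict's factory, so unknown words
    # do not add empty entries; results are sorted for stable output.
    return sorted(index.get(word.lower(), ()))


docs = {"a.txt": "tax rates\nnew taxes", "b.txt": "Taxation and tax law"}
idx = build_index(docs)
print(where(idx, "tax"))    # [('a.txt', 1), ('b.txt', 1)]
print(where(idx, "taxes"))  # [('a.txt', 2)]
```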

"Inverted index" is an older term for a file that lists the words in a
document along with their offsets into the document.  That's exactly what you
implemented with a dictionary, but it is normally implemented with a B-tree
index because full-text searching often needs things like stemming and
wildcards (e.g. "tax*" getting hits on tax, taxes, taxation, ...).  The term
"inverted index" has historical roots in bibliographic indexing and hasn't
really had a consistent meaning in the database world.
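
A rough Python sketch follows, assuming an in-memory index is enough. It stores (doc_id, character offset) postings per word. A sorted term list searched with the standard bisect module stands in for the B-tree. Both keep terms in order, so a wildcard like "tax*" becomes a range scan. The class name InvertedIndex and its methods are hypothetical. This shows prefix matching only, not stemming.

```python
import re
from bisect import bisect_left, insort


class InvertedIndex:
    # Hypothetical class: postings map word -> list of (doc_id, char_offset).
    # A sorted term list searched with bisect stands in for a B-tree; both
    # keep terms ordered, so a wildcard like "tax*" becomes a range scan.

    def __init__(self):
        self.postings = {}
        self.terms = []  # distinct terms, kept in sorted order

    def add(self, doc_id, text):
        for m in re.finditer(r"\w+", text.lower()):
            word = m.group()
            if word not in self.postings:
                self.postings[word] = []
                insort(self.terms, word)  # keep self.terms sorted
            self.postings[word].append((doc_id, m.start()))

    def prefix(self, stem):
        # Return {term: postings} for every term starting with `stem`,
        # i.e. the query "stem*".  This is prefix matching, not stemming.
        stem = stem.lower()
        hits = {}
        i = bisect_left(self.terms, stem)
        while i < len(self.terms) and self.terms[i].startswith(stem):
            term = self.terms[i]
            hits[term] = self.postings[term]
            i += 1
        return hits


inv = InvertedIndex()
inv.add("doc1", "Tax rates and taxes")
inv.add("doc2", "Taxation policy")
print(inv.prefix("tax"))
# {'tax': [('doc1', 0)], 'taxation': [('doc2', 0)], 'taxes': [('doc1', 14)]}
```

A real B-tree (or an on-disk database) gives the same ordered range scan without holding the whole term list in memory.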

As for Mr. shagshag, I think this link has what you're looking for :)

http://gnosis.cx/publish/programming/charming_python_15.txt

Van
