Text Search Engine that works with Python

David Mertz, Ph.D. mertz at gnosis.cx
Mon Mar 4 13:50:49 EST 2002


|...text search engine that works with Python?  What I'm looking for
|specifically is something that will compress the text and still allow
|searches and retrievals that can be exact matches or proximity based.
|The text I want to compress and search is huge (70 megs) and should
|compress down to half, not including any index files that might be
|required by the search engine.

My indexer.py module does this (mostly).  I wrote an article
discussing the module at:

    http://gnosis.cx/publish/programming/charming_python_15.txt

I have now incorporated it into a package at:

    http://gnosis.cx/download/Gnosis_XML_Utils-0.9.tar.gz

The indexer is sort of an ugly duckling in there, since it doesn't have
anything to do with XML, per se.  But xml_indexer.py uses indexer.py for
support, so I bundled things this way.

Anyway, indexer does not allow proximity searches, but does allow
searches for multiple words that occur in the same document.  The
indexes are quite reasonably sized, and the indexer will happily operate
on gzip'd files (it wouldn't be difficult to add support for zip, bzip2,
etc).  The module itself doesn't perform compression, but that's what
'gzip' is for.
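
To make the idea concrete, here is a minimal sketch of a word index
built over gzip'd files, with a search that returns the files containing
every requested word.  This is NOT the indexer.py API; the names
build_index() and search() are invented for illustration, and it uses
only the standard library.

```python
import gzip
import re
from collections import defaultdict

# Hypothetical tokenizer: runs of ASCII letters and digits count as words.
WORD_RE = re.compile(r"[a-z0-9]+")

def build_index(paths):
    """Map each lowercased word to the set of gzip'd files containing it."""
    index = defaultdict(set)
    for path in paths:
        # gzip.open() in text mode decompresses on the fly, so the
        # files never need to be unpacked on disk.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                for word in WORD_RE.findall(line.lower()):
                    index[word].add(path)
    return index

def search(index, *words):
    """Return the files that contain every one of the given words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()
```

With an index built this way, search(index, "python", "search") gives
the set of files in which both words occur.  Word positions are not
recorded, so (like indexer) this sketch cannot answer proximity queries.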

--
 mertz@   _/_/_/_/_/_/_/ THIS MESSAGE WAS BROUGHT TO YOU BY:_/_/_/_/ v i
gnosis  _/_/                    Postmodern Enterprises         _/_/  s r
.cx    _/_/  MAKERS OF CHAOS....                              _/_/   i u
      _/_/_/_/_/ LOOK FOR IT IN A NEIGHBORHOOD NEAR YOU_/_/_/_/_/    g s




