Comments: full-text indexing of RDBMS

Michal Wallace sabren at manifestation.com
Wed Jun 28 09:04:56 EDT 2000


On Tue, 27 Jun 2000, Thomas Weholt wrote:

> I get a list of all the tables in a specified database from the
> database module, remove all database-system tables, go through each
> table and run select * from table. The result ends up in a dictionary
> object. I check all the values for plain text; integers and numbers
> in general are ignored. A function takes the plain text and returns a
> list of words found in that record. I put each word in a new table
> and give it a unique id. A record in PostgreSQL has an OID, a unique
> object identifier. The OID is mapped against the ids of the words
> that occurred in that record.
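A minimal sketch of the indexing pass quoted above, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite's rowid plays the role of the OID, but because a rowid is only unique within its own table, each posting stores (table, rowid). The tokenizer, the words/postings table layout, and the names build_index and search are hypothetical illustrations, not Thomas's code or ransacker's.

```python
import re
import sqlite3

# Hypothetical tokenizer: runs of ASCII letters, lower-cased.
WORD_RE = re.compile(r"[A-Za-z]+")


def extract_words(text):
    """Return the words found in one text value."""
    return [w.lower() for w in WORD_RE.findall(text)]


def build_index(conn):
    """Index every text column of every user table into words/postings."""
    cur = conn.cursor()
    # Skip SQLite's own system tables (names starting with "sqlite_"),
    # mirroring "remove all database-system tables". Listed before we
    # create our own index tables so they are not indexed themselves.
    tables = [name for (name,) in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")
        if not name.startswith("sqlite_")]
    cur.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE)")
    cur.execute("CREATE TABLE postings (word_id INTEGER, tbl TEXT, "
                "row_id INTEGER, UNIQUE (word_id, tbl, row_id))")
    for table in tables:
        # rowid stands in for PostgreSQL's OID. Assumes no '"' in table names.
        rows = cur.execute(f'SELECT rowid, * FROM "{table}"')
        cols = [d[0] for d in rows.description]
        for values in rows.fetchall():  # materialize before reusing cur
            row_id = values[0]
            record = dict(zip(cols[1:], values[1:]))  # column -> value
            for value in record.values():
                if not isinstance(value, str):  # ignore numbers, NULLs, blobs
                    continue
                for word in extract_words(value):
                    cur.execute("INSERT OR IGNORE INTO words (word) VALUES (?)",
                                (word,))
                    (word_id,) = cur.execute(
                        "SELECT id FROM words WHERE word = ?", (word,)).fetchone()
                    cur.execute("INSERT OR IGNORE INTO postings VALUES (?, ?, ?)",
                                (word_id, table, row_id))
    conn.commit()


def search(conn, word):
    """Return the (table, rowid) pairs of records containing word."""
    return conn.execute(
        "SELECT p.tbl, p.row_id FROM postings p "
        "JOIN words w ON w.id = p.word_id "
        "WHERE w.word = ? ORDER BY p.tbl, p.row_id", (word.lower(),)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, "
                 "title TEXT, views INTEGER)")
    conn.executemany("INSERT INTO articles (title, views) VALUES (?, ?)",
                     [("Full-text indexing in Python", 10),
                      ("Python and PostgreSQL", 3)])
    build_index(conn)
    print(search(conn, "python"))  # [('articles', 1), ('articles', 2)]
```

Because the words table has a UNIQUE constraint, each word gets a single id however many records contain it, and a lookup becomes one join instead of a scan of every table.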


Hey Thomas,

  You emailed me a while back about ransacker... I've actually
improved the speed quite a bit since then, provided you use the bsd
module for the hash storage, since gdbm's performance seemed to be a
large part of the problem. It's not perfect yet, but you might
want to try it out.

  http://ransacker.sourceforge.net/
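The hash-storage point can be illustrated with Python 3's standard dbm module, which opens whichever on-disk hash backend is installed (for example dbm.gnu, dbm.ndbm, or the pure-Python dbm.dumb fallback); the old bsddb module no longer ships with Python 3. This is a hypothetical sketch of keeping word-to-record-id posting lists in such a store, not ransacker's API; add_posting and lookup are illustrative names.

```python
import dbm
import os
import tempfile


def add_posting(db, word, record_id):
    """Add record_id to the space-separated id list stored under word."""
    key = word.encode("utf-8")
    ids = set(db.get(key, b"").split())          # existing ids, as bytes
    ids.add(str(record_id).encode("ascii"))
    db[key] = b" ".join(sorted(ids, key=int))    # int() accepts ASCII bytes


def lookup(db, word):
    """Return the sorted record ids stored under word (empty if absent)."""
    return [int(x) for x in db.get(word.encode("utf-8"), b"").split()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "postings")
        with dbm.open(path, "c") as db:          # "c": create if missing
            add_posting(db, "python", 7)
            add_posting(db, "python", 3)
            add_posting(db, "postgresql", 7)
        with dbm.open(path, "r") as db:
            print(lookup(db, "python"))          # [3, 7]
        print(dbm.whichdb(path))                 # backend name; varies by system
```

dbm.whichdb(path) reports which backend wrote a given file, which helps when comparing backends for speed.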

  Your algorithm would still be what you have above, but ransacker
does a lot of it for you. If nothing else, you might consider using it
at the prototyping stage.

Cheers,

- Michal
------------------------------------------------------------------------
www.manifestation.com  www.sabren.com  www.linkwatcher.com  www.zike.net
------------------------------------------------------------------------





More information about the Python-list mailing list