Binary trees storing huge amounts of data in nodes

Thomas Weholt thomas at cintra.no
Tue Nov 7 10:22:02 EST 2000


Hi,

I need to build a customized full-text search engine (yes, I've asked about
this before) and I'd like to use something that's well supported in Python,
either gdbm or Berkeley DB. I want to store a word as the key and data about
its occurrences in the value part. My problem is that the amount of data in a
node can be huge. (The data to be scanned is a collection of
programming articles, source code, documents etc.)

How can I best do this? Could I use Berkeley DB for storing the words and
pointers to wherever the data is stored? How should I organize the data
for best response time? The generated index is pretty static, i.e. data is
appended, but not often removed or moved.
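
To make the word-plus-pointer idea concrete, here is a rough sketch of what I
have in mind. It is only an assumption about how this could be laid out, not
something I have tested at scale: the standard `dbm` module stands in for
gdbm or Berkeley DB, and the names `WordIndex`, `postings.dat` and the record
layout are made up for illustration. Each word maps to the file offset of its
most recent occurrence record, and every record points back to the previous
one for the same word, so adding data only ever appends.

```python
import dbm
import os
import struct

# Hypothetical record layout in the postings file:
# offset of the previous record for the same word (-1 if none),
# payload length, then the payload itself.
HEADER = struct.Struct("<qI")
OFFSET = struct.Struct("<q")
HIT = struct.Struct("<II")  # (document id, position in document)


class WordIndex:
    """Small key store (word -> latest offset) plus an append-only postings file."""

    def __init__(self, directory):
        self.keys = dbm.open(os.path.join(directory, "words"), "c")
        self.postings = open(os.path.join(directory, "postings.dat"), "a+b")

    def add(self, word, doc_id, position):
        key = word.lower().encode("utf-8")
        # Find the previous record for this word, if there is one.
        prev = OFFSET.unpack(self.keys[key])[0] if key in self.keys else -1
        payload = HIT.pack(doc_id, position)
        # Append the new record and remember where it starts.
        self.postings.seek(0, os.SEEK_END)
        offset = self.postings.tell()
        self.postings.write(HEADER.pack(prev, len(payload)) + payload)
        self.keys[key] = OFFSET.pack(offset)

    def lookup(self, word):
        key = word.lower().encode("utf-8")
        if key not in self.keys:
            return []
        offset = OFFSET.unpack(self.keys[key])[0]
        hits = []
        # Walk the chain of records backwards, newest first.
        while offset != -1:
            self.postings.seek(offset)
            prev, length = HEADER.unpack(self.postings.read(HEADER.size))
            hits.append(HIT.unpack(self.postings.read(length)))
            offset = prev
        hits.reverse()  # return hits in insertion order
        return hits

    def close(self):
        self.postings.close()
        self.keys.close()
```

With this layout the key store stays small no matter how many occurrences a
word has, but a lookup has to do one seek per record, so a real version
would probably batch many occurrences into each record.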

I've looked at Ransacker, but it doesn't seem suited to the amount of data I
need to scan. I'm using a PostgreSQL database to store my data, so if
anybody knows how I could best build a full-text search engine in Python using
PostgreSQL as the back-end, that would be just great.

Thanks.

Best regards,
Thomas





More information about the Python-list mailing list