ANN: NUCULAR B3 Full text indexing (now on Win32 too)

Paul Rubin
Fri Feb 22 17:31:06 EST 2008


Aaron Watters <aaron.watters at gmail.com> writes:
> [apologies to the list: I would have done this offline,
> but I can't figure out Paul's email address.]
> 
> 1) Paul please forward your email address

Will send it privately.  I don't have a public email address any more
(death to spam!!!).  My general purpose online contact point is
http://paulrubin.com which currently has an expired certificate that
I'll get around to renewing someday.  Meanwhile you have to click
"accept" to connect using the expired cert.

> 3) Since you seem to know about these things: I was thinking
> of adding an optional feature to Nucular which would allow
> a look-up like "given a word find all attributes that contain
> that word anywhere and give a count of the number of times it
> is found in that attribute as well as the entry id for an example
> instance (arbitrarily chosen).  I was thinking about calling
> this "inverted faceting", but you probably know a
> better/standard name, yes?  What is it please?  Thanks!
> Answers from anyone else welcomed also.
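A minimal sketch of the lookup described in the question ("inverted faceting"), assuming a hypothetical in-memory layout where each entry is a dict of attribute name to text. This is not Nucular's API: `build_facet_index` and `inverted_facets` are illustrative names, tokenization is a naive lowercase whitespace split, and the "arbitrary" example id is simply the first entry seen for that word and attribute.

```python
from collections import defaultdict


def build_facet_index(entries):
    """Map word -> attribute -> [occurrence count, example entry id].

    `entries` is a hypothetical dict of entry_id -> {attribute: text};
    it stands in for whatever storage a real index would use.
    """
    index = defaultdict(dict)
    for entry_id, attributes in entries.items():
        for attr, text in attributes.items():
            # Naive tokenizer: lowercase, split on whitespace.
            for word in text.lower().split():
                # The first entry seen for (word, attr) becomes the example.
                slot = index[word].setdefault(attr, [0, entry_id])
                slot[0] += 1
    return index


def inverted_facets(index, word):
    """Return {attribute: (count, example_entry_id)} for a single word."""
    # .get() avoids creating empty entries in the defaultdict.
    per_attr = index.get(word.lower(), {})
    return {attr: (count, example) for attr, (count, example) in per_attr.items()}
```

With `entries = {"e1": {"title": "python indexing", "body": "full text indexing in python"}, "e2": {"title": "search", "body": "python python"}}`, looking up `"python"` reports one hit in `title` and three in `body`, each with `e1` as the example entry. Counting at index time keeps the lookup to a single dictionary access, at the cost of storing a counter per (word, attribute) pair.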

In Solr this is called the DisMax (disjunction maximum) handler, I
think.  I tried it and it doesn't work very well, and ended up using a
script written by a co-worker that expands such queries into more
complex queries that put user-supplied weights on each field.  It is a
somewhat messy problem.  Otis Gospodnetic's book "Lucene in Action"
talks about it some, I believe.  Manning, Raghavan and Schütze are working on a
new book at http://informationretrieval.org that discusses fancier
methods.  I think these are worth looking into, but I haven't had the
bandwidth to spend time on it so far.
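A rough sketch of the two approaches mentioned above, under stated assumptions. `expand_query` is a hypothetical stand-in for the kind of co-worker script described: it rewrites each word into a per-field OR clause with user-supplied boosts, using Lucene's `field:term^boost` query syntax (no escaping of special characters is done). `dismax_score` shows how a disjunction-max combination differs: it takes the best weighted field score plus a small `tie` fraction of the others, following the description of Lucene's `DisjunctionMaxQuery`, rather than summing all fields as a boolean OR would. The field names and weights are made up for illustration.

```python
def expand_query(words, field_weights):
    """Expand plain words into a field-weighted Lucene-syntax query string.

    field_weights: hypothetical {field_name: boost} chosen by the user.
    Each word must match in at least one field (AND across words,
    OR across fields). Special characters are NOT escaped.
    """
    clauses = []
    for word in words:
        per_field = " OR ".join(
            f"{field}:{word}^{weight}" for field, weight in field_weights.items()
        )
        clauses.append(f"({per_field})")
    return " AND ".join(clauses)


def dismax_score(field_scores, field_weights, tie=0.0):
    """Combine per-field match scores in disjunction-max style.

    field_scores: {field_name: raw score of this document in that field}.
    Result: max weighted score + tie * (sum of the other weighted scores).
    With tie=0.0 only the best field counts; with tie=1.0 it is a plain sum.
    """
    weighted = [
        field_weights.get(field, 1.0) * score
        for field, score in field_scores.items()
        if score > 0
    ]
    if not weighted:
        return 0.0
    best = max(weighted)
    return best + tie * (sum(weighted) - best)
```

For example, `expand_query(["python", "index"], {"title": 2.0, "body": 0.5})` yields `(title:python^2.0 OR body:python^0.5) AND (title:index^2.0 OR body:index^0.5)`. The `tie` parameter is the knob that trades off "best single field wins" against "matches in many fields add up", which is roughly where hand-tuned per-field weights come in.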



More information about the Python-list mailing list