Fwd: NUCULAR fielded text searchable indexing

aaron.watters at gmail.com
Thu Oct 11 13:29:46 EDT 2007


regarding http://nucular.sourceforge.net

On Oct 11, 12:32 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>
> How many items did each query return?  When I refer to large result
> sets, I mean you often get queries that return 10k items or more (a
> pretty small number: typing "python" into google gets almost 30
> million hits) and you need to actually examine each item, as opposed
> to displaying ten at a time or something like that (e.g. you want to
> present faceted results).

I can't give a detailed report, but 10k result sets were
not unusual and not noticeably slower.  Of the online demos, looking at

http://www.xfeedme.com/nucular/gut.py/go?FREETEXT=w

(w for "web") we get 6294 entries, which takes about 500ms on
a cold index and about 150ms on a warm one.  This is on a very
active shared hosting machine.
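The cold-vs-warm gap on a real on-disk index mostly comes from the OS file cache, but the general effect -- the first run of a query pays warm-up costs that repeats don't -- is easy to see by timing the same query a few times. Here is a minimal sketch using a toy in-memory inverted index (this is not the nucular API; the corpus, index, and query function are all made up for illustration):

```python
import time
from collections import defaultdict

# Toy stand-in for a searchable archive: 10,000 tiny documents.
docs = {i: "page %d about the web and the quick spider" % i
        for i in range(10000)}

# Build a simple inverted index: word -> set of matching doc ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def query(word):
    """Return the ids of all documents containing word."""
    return index.get(word, set())

# Time the same query a few times; the first run pays any warm-up
# cost, later runs are "warm" -- analogous to (though much smaller
# than) the disk-cache effect on a real on-disk index.
timings = []
for _ in range(3):
    t0 = time.perf_counter()
    hits = query("web")
    timings.append(time.perf_counter() - t0)

print(len(hits), ["%.6fs" % t for t in timings])
```

Every document in the toy corpus contains "web", so the query returns all 10,000 ids; only the timing pattern is the point.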

You are right that a really smart, multi-faceted relevance
ordering might call for more in-process memory, but then you have
to be willing to pay for it in terms of system resources,
config/development time, etcetera.
If you want cheap and easy, nucular might be good enough, afaik.

Regarding the 30 million number -- I bet google does
estimation and culling of some kind rather than actually examining
all 30M hits.
I'm pretty sure of this because in some cases I've paged through
all of the results available and the true count turned out to be a
lot smaller than the estimate on the first page.
In any case, I'm not trying to address "google" sized data sets
at the moment.

>  http://www.newegg.com/Product/Product.aspx?Item=N82E16820147021

holy rusty metal batman! way-cool!

thanks,  -- Aaron Watters

===
less is more



