Fast full-text searching in Python (job for Whoosh?)

Thomas Passin list1 at tompassin.net
Wed Mar 8 16:00:32 EST 2023


On 3/8/2023 3:27 PM, Peter J. Holzer wrote:
> On 2023-03-08 00:12:04 -0500, Thomas Passin wrote:
>> On 3/7/2023 7:33 AM, Dino wrote:
>>> in fact it's a dilemma I am facing now. My back-end returns 10
>>> entries (I am limiting to max 10 matches server side for reasons you
>>> can imagine). As the user keeps typing, should I restrict the
>>> existing result set based on the new information or re-issue an API
>>> call to the server? Things get confusing pretty fast for the user.
>>> You don't want too many cooks in the kitchen, I guess.
>>> Played a little bit with both approaches in my little application.
>>> Re-requesting from the server seems to win hands down in my case.
>>> I am sure that them google engineers reached spectacular levels of UI
>>> finesse with stuff like this.
>>
>> Subject of course to trying this out, I would be inclined to send a much
>> larger list of responses to the client, and let the client reduce the number
>> to be displayed.  The latency for sending a longer list will be smaller than
>> establishing a new connection or even reusing an old one to send a new,
>> short list of responses.
> 
> That depends very much on how long that list can become. If it's 200
> matches - sure, send them all, even if the client will display only 10
> of them. Probably even for 2000. But if you might get 20 million matches
> you surely don't want to send them all to the client.

Yes, of course.  OTOH, if you have 2000+ possibilities it's basically 
pointless to send them to the client.  You can send the first 10, and 
hope that will be worth something (it probably won't).  You can send all 
2000 and let the client show the first, say, 10, but that probably won't
be worth much either.  If you have some way to prioritize them, you can
include the scores, send the top, say, 100 to the client, and let the
client figure out what to do.
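
To make that concrete, here is a rough sketch of the idea, not a
recommendation for any particular back end.  It assumes the server already
has (score, text) pairs for its matches; the names top_matches and narrow
and the sample data are made up for illustration.  The server keeps only
the best-scoring entries, and the client narrows that cached list as the
user keeps typing.

```python
import heapq


def top_matches(scored, limit=100):
    """Server side: return the `limit` highest-scoring (score, text) pairs."""
    # heapq.nlargest returns the results ordered from best to worst score.
    return heapq.nlargest(limit, scored, key=lambda pair: pair[0])


def narrow(matches, typed, shown=10):
    """Client side: keep cached matches that still contain what was typed."""
    needle = typed.lower()
    hits = [text for _score, text in matches if needle in text.lower()]
    return hits[:shown]


# Hypothetical data: (score, text) pairs as a search back end might produce.
scored = [
    (0.9, "python whoosh"),
    (0.4, "python list"),
    (0.7, "whoosh index"),
    (0.1, "perl"),
]

cached = top_matches(scored, limit=3)   # sent to the client once
print(narrow(cached, "whoosh"))         # refined locally, no new request
```

Narrowing locally can only be complete if the cached list was not cut off
by the limit; if it was, a new request is still needed to be sure nothing
is missed.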

If you are going to have that many responses you will need some more 
complex and sophisticated approach anyway, so the whole discussion would 
not be applicable.  And this would be getting miles (kms) away from
the OP's situation.



