Fast full-text searching in Python (job for Whoosh?)

Dino dino at no.spam.ar
Sat Mar 4 22:43:54 EST 2023


I need fast text search on a large (not huge, let's say 30k records 
in total) list of items. Here's a sample of my raw data (a list of US 
cars: make and model):

     $ head all_cars_unique.csv
     Acura,CL
     Acura,ILX
     Acura,Integra
     Acura,Legend
     Acura,MDX
     Acura,MDX Sport Hybrid
     Acura,NSX
     Acura,RDX
     Acura,RL
     Acura,RLX
     $ wc -l all_cars_unique.csv
     1415 all_cars_unique.csv
     $ grep -i v60 all_cars_unique.csv
     Genesis,GV60
     Volvo,V60
     $

Essentially, I want my input field to suggest autofill options with data 
from this file/list. The user types "v60" and a REST endpoint will offer:

     [
      {"model":"GV60", "manufacturer":"Genesis"},
      {"model":"V60", "manufacturer":"Volvo"}
     ]

i.e. a JSON response that I can use to generate the autofill with 
JavaScript. My back end is Python (Flask).
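
To make the goal concrete, here is a minimal sketch of the naive baseline, 
i.e. "grep" in pure Python: a case-insensitive substring scan over an 
in-memory list. The names load_cars, suggest and SAMPLE_CSV are made up 
for illustration, and the inline sample only stands in for the real file.

```python
import csv
import io

# Hypothetical stand-in for all_cars_unique.csv (one "make,model" per line).
SAMPLE_CSV = """Acura,MDX
Genesis,GV60
Volvo,V60
"""


def load_cars(fileobj):
    """Read make,model rows and precompute a lowercase search key for each."""
    cars = []
    for row in csv.reader(fileobj):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        make, model = row
        cars.append((f"{make} {model}".lower(), make, model))
    return cars


def suggest(cars, query, limit=10):
    """Return up to `limit` entries whose make or model contains `query`."""
    query = query.strip().lower()
    if not query:
        return []
    hits = []
    for key, make, model in cars:
        if query in key:
            hits.append({"model": model, "manufacturer": make})
            if len(hits) >= limit:
                break
    return hits


CARS = load_cars(io.StringIO(SAMPLE_CSV))
print(suggest(CARS, "v60"))
```

In a Flask view, the list returned by suggest() could presumably be passed 
to jsonify() to produce the response above. This is a linear scan, so it is 
only the baseline that any faster option would have to beat.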

How can I implement this? A library called Whoosh seems very promising 
(although it's so feature-rich that using it here is almost like shooting 
a fly with a bazooka), but I see two problems:

  1) Whoosh is either abandoned or the project is a mess in terms of 
community and support 
(https://groups.google.com/g/whoosh/c/QM_P8cGi4v4), and

  2) Whoosh seems to be a Python-only thing, which is great for now, but 
I wouldn't want this to become an obstacle should I need to port it to a 
different language at some point.

Are there other fast options out there? Can I "grep" through a 
data structure in Python... but faster?

Thanks

Dino


