Fast full-text searching in Python (job for Whoosh?)

Thomas Passin list1 at tompassin.net
Sun Mar 5 21:05:10 EST 2023


On 3/4/2023 11:12 PM, Dino wrote:
> On 3/4/2023 10:43 PM, Dino wrote:
>>
>> I need fast text-search on a large (not huge, let's say 30k records 
>> totally) list of items. Here's a sample of my raw data (a list of US 
>> cars: model and make)
> 
> I suspect I am really close to answering my own question...
> 
>  >>> import time
>  >>> lis = [str(a**2+a*3+a) for a in range(0,30000)]
>  >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in 
> el]; print(time.process_time_ns() -s);
> 753800
>  >>> s = time.process_time_ns(); res = [el for el in lis if "52356" in 
> el]; print(time.process_time_ns() -s);
> 1068300
>  >>> s = time.process_time_ns(); res = [el for el in lis if "5256" in 
> el]; print(time.process_time_ns() -s);
> 862000
>  >>> s = time.process_time_ns(); res = [el for el in lis if "6" in el]; 
> print(time.process_time_ns() -s);
> 1447300
>  >>> s = time.process_time_ns(); res = [el for el in lis if "1" in el]; 
> print(time.process_time_ns() -s);
> 1511100
>  >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in 
> el]; print(time.process_time_ns() -s); print(len(res), res[:10])
> 926900
> 2 ['134676021', '313467021']
>  >>>
> 
> I can do a substring search in a list of 30k elements in less than 2ms 
> with Python. Is my reasoning sound?

I would probably ingest the data at startup into a dictionary - or 
perhaps several, depending on your access patterns - and then you will 
only need to do a fast lookup in one or more dictionaries.
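As a minimal sketch of that idea - assuming the records are (make, 
model) pairs, which is a guess at your data shape - build the index 
once, then every lookup is a single dict access:

```python
# Hypothetical (make, model) records standing in for the real 30k rows.
cars = [
    ("Ford", "Mustang"),
    ("Ford", "F-150"),
    ("Toyota", "Corolla"),
    ("Toyota", "Camry"),
]

# Pay the indexing cost once at startup...
by_make = {}
for make, model in cars:
    by_make.setdefault(make.lower(), []).append(model)

# ...then each lookup is an average-O(1) dict access, not a scan.
models = by_make.get("ford", [])
```

You can build several such dicts (by make, by model prefix, etc.) if 
you have more than one access pattern.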

If your access pattern would be easier with SQL queries, load the data 
into an SQLite database on startup.
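Something along these lines, again assuming hypothetical (make, model) 
records - an in-memory database keeps it all in RAM, and a substring 
search becomes a LIKE query:

```python
import sqlite3

# Hypothetical sample records; the real data would be loaded from disk.
cars = [("Ford", "Mustang"), ("Toyota", "Corolla"), ("Toyota", "Camry")]

# Load once at startup into an in-memory database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (make TEXT, model TEXT)")
con.executemany("INSERT INTO cars VALUES (?, ?)", cars)

# Substring search via LIKE (case-insensitive for ASCII by default).
rows = con.execute(
    "SELECT make, model FROM cars WHERE model LIKE ?", ("%cam%",)
).fetchall()
```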

IOW, do the bulk of the work once at startup.


More information about the Python-list mailing list