matching strings in a large set of strings

News123 news1234 at free.fr
Sat May 1 07:48:02 EDT 2010


Dennis Lee Bieber wrote:
> On Thu, 29 Apr 2010 11:38:28 +0200, "Karin Lagesen"
> <karin.lagesen at bio.uio.no> declaimed the following in comp.lang.python:
> 
>> Hello.
>>
>> I have approx 83 million strings, all 14 characters long. I need to be
>> able to take another string and find out whether this one is present
>> within the 83 million strings.
>>

>>
> 	So don't load them into memory... First use a file-based (not memory
>
> 
> 	That lets you do a binary search on the file. Much faster than a
> linear search (linear search will average out to 41.5M read operations;
> binary should be around 10000 reads)

Don't you meant 27 reads instead of 41.5 M reads?

>>> from math import log
>>> log(83e6)/log(2)
26.306608000671101
>>>


N



More information about the Python-list mailing list