Fast lookup of bulky "table"

Dino dino at no.spam.ar
Sun Jan 15 22:13:06 EST 2023


On 1/15/2023 2:23 PM, Weatherby,Gerard wrote:
> That’s about what I got using a Python dictionary on random data on a high memory machine.
> 
> https://github.com/Gerardwx/database_testing.git
> 
> It’s not obvious to me how to get it much faster than that.

Gerard, you are a rockstar. This is going to be really useful if I do 
decide to adopt sqlite3 for my PoC. I understand what's going on 
conceptually, but I've never really used SQLite (nor SQL in a long 
time), so this may save me a bunch of time.

I created a 300 MB DB using your script. Then:

$ ./readone.py
testing 2654792 of 4655974
Found somedata0002654713 for 1ed9f9cd-0a9e-47e3-b0a7-3e1fcdabe166 in 
0.239335000020219 seconds

$ ./prefetch.py
Index build 4.420937899994897 seconds
testing 3058568 of 4655974
Found somedata0000202200 for 5dca1455-9cd6-4e4d-8e5a-7e6400de7ca7 in 
4.4000043999403715e-06 seconds

So, if I understand right:

1) Once I built a dict out of the DB (in about 4 seconds), I was able 
to look up an entry/record in about 4 microseconds(!)

2) Looking up a record/entry with a SQLite query took about 0.24 
seconds, i.e. roughly 50,000x slower (a sketch of the two access 
patterns follows below)
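
For my own understanding (and for anyone following along), here is a 
minimal sketch of the two access patterns as I read them; the file, 
table and column names (test.db, lookup, key, value) are just my 
guesses, not necessarily what create.py actually uses:

    import sqlite3
    import time

    conn = sqlite3.connect("test.db")  # guessed filename

    # Pattern 1 (readone.py, as I understand it): one SELECT per lookup.
    # Every call pays the full query overhead (plus a table scan if the
    # key column isn't indexed).
    def lookup_sql(key):
        row = conn.execute(
            "SELECT value FROM lookup WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    # Pattern 2 (prefetch.py, as I understand it): pull every row into a
    # dict once, then each lookup is a plain O(1) hash probe.
    start = time.perf_counter()
    cache = dict(conn.execute("SELECT key, value FROM lookup"))
    print(f"Index build {time.perf_counter() - start} seconds")

    def lookup_dict(key):
        return cache.get(key)

The prefetch pays the ~4 second build cost once, after which every 
lookup avoids SQLite entirely.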

Interesting, and very informative. Thank you; I really appreciate that 
you took the time to write this.

The conclusion, it seems to me, is that I probably don't want to go the 
SQLite route: if I want fast lookups, I would be putting my data into a 
database only to pull it all back out into a dict whenever I need it.

PS: a few minor fixes to the README, as these may be helpful to others.

./venv/... => ./env/..

i.e.
  ./env/bin/pip install -U pip
  ./env/bin/pip install -e .

Also, add the part in []:

Run create.py [size of DB in bytes] prior to running readone.py and/or 
prefetch.py

BTW, can you tell me what is going on here? What's := ?

    while (increase := add_some(conn,adding)) == 0:

https://github.com/Gerardwx/database_testing/blob/main/src/database_testing/create.py#L40
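
From a quick search after writing the above: it looks like := is the 
"walrus" assignment-expression operator added in Python 3.8 (PEP 572), 
which assigns a value and also yields that value within an expression. 
If I've got it right, the loop above is roughly equivalent to:

    while True:
        increase = add_some(conn, adding)  # assign first...
        if increase != 0:                  # ...then test what was assigned
            break

Is that right?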

Dino

