hashing strings to integers for sqlite3 keys
Adam Funk
a24061 at ducksburg.com
Thu May 22 07:47:31 EDT 2014
I'm using Python 3.3 and the sqlite3 module in the standard library.
I'm processing a lot of strings from input files (among other things,
values of headers in e-mail & news messages) and suppressing
duplicates using a table of seen strings in the database.
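For concreteness, here is a minimal sketch of the setup described above. The table name `seen`, the column name `value`, and the helper `seen_before` are hypothetical names chosen for illustration, not taken from the actual program.

```python
import sqlite3

# In-memory database for the sketch; a real run would use a file path.
conn = sqlite3.connect(":memory:")

# Hypothetical table of seen strings, keyed directly on the text.
conn.execute("CREATE TABLE seen (value TEXT PRIMARY KEY)")


def seen_before(conn, value):
    """Return True if value is already recorded; otherwise record it and return False."""
    row = conn.execute(
        "SELECT 1 FROM seen WHERE value = ?", (value,)
    ).fetchone()
    if row is not None:
        return True
    conn.execute("INSERT INTO seen (value) VALUES (?)", (value,))
    return False
```

Calling `seen_before(conn, header_value)` for each incoming string returns False the first time and True for every later duplicate.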
It seems to me --- from past experience with other things, where
testing integers for equality is faster than testing strings, as well
as from reading the SQLite3 documentation about INTEGER PRIMARY KEY
--- that the SELECT tests should be faster if I am looking up an
INTEGER PRIMARY KEY value rather than TEXT PRIMARY KEY. Is that
right?
If so, what sort of hashing function should I use? The "maxint" for
SQLite3 (a signed 64-bit INTEGER, so 2**63 - 1) is a lot smaller than
the range of even MD5 hashes (128 bits). The only thing I've thought
of so far is to use MD5 or SHA-something modulo the
maxint value. (Security isn't an issue --- i.e., I'm not worried
about someone trying to create a hash collision.)
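The hashing idea above could be sketched roughly as follows. Instead of taking the digest modulo the maximum value, this version keeps the first 8 bytes of the MD5 digest and reads them as a signed 64-bit integer, which serves the same purpose of fitting the hash into an SQLite INTEGER. The names `string_key`, `seen_hash`, `h`, and `seen_before_hashed` are hypothetical, and the UTF-8 encoding is an assumption about the input. Two different strings can still map to the same key, in which case the second one would be wrongly treated as a duplicate.

```python
import hashlib
import sqlite3


def string_key(text):
    """Map a string to a signed 64-bit integer using the first 8 bytes of its MD5 digest."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    # signed=True gives a value in the range -2**63 .. 2**63 - 1,
    # which is what an SQLite INTEGER column can hold.
    return int.from_bytes(digest[:8], "big", signed=True)


conn = sqlite3.connect(":memory:")

# Hypothetical table keyed on the integer hash rather than the text itself.
conn.execute("CREATE TABLE seen_hash (h INTEGER PRIMARY KEY)")


def seen_before_hashed(conn, text):
    """Return True if the hash of text is already recorded; otherwise record it and return False."""
    key = string_key(text)
    row = conn.execute(
        "SELECT 1 FROM seen_hash WHERE h = ?", (key,)
    ).fetchone()
    if row is not None:
        return True
    conn.execute("INSERT INTO seen_hash (h) VALUES (?)", (key,))
    return False
```

Whether this is actually faster than the TEXT PRIMARY KEY lookup would need to be measured on real data, since computing the hash has its own cost.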
Thanks,
Adam
--
"It is the role of librarians to keep government running in difficult
times," replied Dramoren. "Librarians are the last line of defence
against chaos." (McMullen 2001)