hashing strings to integers for sqlite3 keys
alister
alister.nospam.ware at ntlworld.com
Thu May 22 10:48:19 EDT 2014
On Thu, 22 May 2014 12:47:31 +0100, Adam Funk wrote:
> I'm using Python 3.3 and the sqlite3 module in the standard library. I'm
> processing a lot of strings from input files (among other things, values
> of headers in e-mail & news messages) and suppressing duplicates using a
> table of seen strings in the database.
>
> It seems to me --- from past experience with other things, where testing
> integers for equality is faster than testing strings, as well as from
> reading the SQLite3 documentation about INTEGER PRIMARY KEY --- that the
> SELECT tests should be faster if I am looking up an INTEGER PRIMARY KEY
> value rather than TEXT PRIMARY KEY. Is that right?
>
> If so, what sort of hashing function should I use? The "maxint" for
> SQLite3 is a lot smaller than the size of even MD5 hashes. The only
> thing I've thought of so far is to use MD5 or SHA-something modulo the
> maxint value. (Security isn't an issue --- i.e., I'm not worried about
> someone trying to create a hash collision.)
>
> Thanks,
> Adam
Why not just set the field in the DB to be unique and then catch the error
when you try to write a duplicate?
Let the DB engine handle the task.
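
Something along these lines, as a minimal sketch only: the table name
"seen", the column name "value" and the helper is_new() are made up for
illustration, and an in-memory database stands in for the real file.

```python
import sqlite3

# Hypothetical schema: one TEXT column with a UNIQUE constraint, so
# SQLite itself rejects any string that has already been stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seen (value TEXT UNIQUE)")


def is_new(conn, value):
    """Try to record value; return True if it was new, False if a duplicate."""
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            conn.execute("INSERT INTO seen (value) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        # Raised when the UNIQUE constraint is violated.
        return False


for header in ["<id1@example.com>", "<id2@example.com>", "<id1@example.com>"]:
    if is_new(conn, header):
        print("new:", header)
    else:
        print("duplicate:", header)
```

The UNIQUE constraint makes SQLite maintain an index on the column, so no
separate SELECT or hand-rolled hash is needed just to detect duplicates.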
--
Your step will soil many countries.