hashing strings to integers for sqlite3 keys

Adam Funk a24061 at ducksburg.com
Thu May 22 09:54:08 EDT 2014


On 2014-05-22, Tim Chase wrote:

> On 2014-05-22 12:47, Adam Funk wrote:
>> I'm using Python 3.3 and the sqlite3 module in the standard library.
>> I'm processing a lot of strings from input files (among other
>> things, values of headers in e-mail & news messages) and suppressing
>> duplicates using a table of seen strings in the database.
>> 
>> It seems to me --- from past experience with other things, where
>> testing integers for equality is faster than testing strings, as
>> well as from reading the SQLite3 documentation about INTEGER
>> PRIMARY KEY --- that the SELECT tests should be faster if I am
>> looking up an INTEGER PRIMARY KEY value rather than TEXT PRIMARY
>> KEY.  Is that right?
>
> If sqlite can handle the absurd length of a Python long, you *can* do
> it as ints:

It can't.  An SQLite3 INTEGER is at most an 8-byte signed value.

https://www.sqlite.org/datatype3.html
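
A minimal sketch of that limit, assuming an in-memory database and a
hypothetical table named "seen" (neither is from my real code).  A full
SHA-1 digest is 160 bits, far outside the signed 64-bit range, and as
far as I can tell the sqlite3 module raises OverflowError when asked to
bind an int that large:

```python
import sqlite3
from hashlib import sha1

SQLITE_MAX_INT = 2**63 - 1  # largest value an 8-byte signed INTEGER can hold

# hashlib needs bytes, not str, in Python 3.
big = int(sha1(b"Hello world").hexdigest(), 16)
fits = -SQLITE_MAX_INT - 1 <= big <= SQLITE_MAX_INT

conn = sqlite3.connect(":memory:")
# "seen" is a hypothetical table used only for this illustration.
conn.execute("CREATE TABLE seen (id INTEGER PRIMARY KEY)")
try:
    conn.execute("INSERT INTO seen (id) VALUES (?)", (big,))
    overflowed = False
except OverflowError:
    # The value cannot be converted to an SQLite INTEGER.
    overflowed = True
conn.close()
```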

But after reading the other replies to my question, I've concluded
that what I was trying to do is pointless.


>  >>> from hashlib import sha1
>  >>> s = "Hello world"
>  >>> h = sha1(s.encode("utf-8"))
>  >>> h.hexdigest()
>   '7b502c3a1f48c8609ae212cdfb639dee39673f5e'
>  >>> int(h.hexdigest(), 16)
>   703993777145756967576188115661016000849227759454

That ties in with a related question I've been wondering about lately
(using MD5s & SHAs for other things).  A hash value is internally
numeric rather than a string, right?  So getting it out as a hex string
& then converting that to an int looks inefficient to me.  Is there any
better way to get an int?  (I haven't seen any other way in the API.)
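
One candidate, as a sketch only: in Python 3.2 and later,
int.from_bytes can convert the raw bytes returned by digest() directly,
which skips the hex string.  The truncation to 8 bytes below is an
added assumption, shown just to fit SQLite's INTEGER range; it throws
away most of the digest, so collisions become more likely.

```python
from hashlib import sha1

h = sha1(b"Hello world")

# digest() returns the raw 20 bytes of the SHA-1 value;
# int.from_bytes turns them into an int without going through hex.
n = int.from_bytes(h.digest(), "big")

# Assumption for illustration: keep only the first 8 bytes and read them
# as a signed value, so the result fits an 8-byte signed INTEGER.
key = int.from_bytes(h.digest()[:8], "big", signed=True)
```

The first value equals int(h.hexdigest(), 16); the second is always in
the range -2**63 to 2**63 - 1.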


-- 
A firm rule must be imposed upon our nation before it destroys
itself. The United States needs some theology and geometry, some taste
and decency. I suspect that we are teetering on the edge of the abyss.
                                                 --- Ignatius J Reilly


