Generating unique row ID ints.

Nick Vatamaniuc vatamane at gmail.com
Sun Oct 1 13:57:33 EDT 2006


The primary key is usually there to uniquely identify each row. There
are a couple of ways to generate them:

  One is to always create  a new integer id (sequentially or random)
when you insert a new row. This means  that if you insert the same data
 again, the row will always be duplicated. Usually you don't want this.


  Then the primary key integer must somehow 'represent' the whole data
row. So you   would want for the key to be a function of the whole data
row, such that when the data is the same the row ID is the same and
when the data is different the row ID is different. The answer to your
problem is to use a message digest (actually a  message authentication
code function).

  For example if your data row is in the tuple 'row' then  you can do:
hex_digest_key=md5.new("|".join(row)).hexdigest(), you would have to
import the md5 module before doing this.

What that will do is your row will be contcatenated into one string
with each field separated by "|" then the md5 hash of that string will
be taken and the result returned to you in hexadecimal form. You'll get
back something like hex_digest_key='5d41402abc4b2a76b9719d911017c592'.
Then you can turn that into an integer by doing int(hex_digest_key,16)
-- and you can use that integer as your primary key.

  But since one of the problems you want to solve is the user's ability
to predict the next key, you cannot just use a simple message digest
function. If the user finally figures out that you are running an MD5
algorithm, the user can also run the same algorithm and generate the
same message digest -- If that is a problem,  then use a MAC (Message
Authentication Code) function. It works almost like a message digest
except you concatenate a secret key to the input so MD5 is run on the
row_as_text+secret_key.

Unless the user knows your secret key, they could not generate a
primary key from a given row even if they know you used MD5 and even if
the know the data content  of  your row.

 NOTE: When using a message digest (and friends) it is important to
realize that there will be some collision between the keys if the
number of all possible digests (as limited by the digest algoritm) is
smaller than the number of the possible messages. In practice if you
have large enough integers (64)  you shouldn't see any collisions
occur, but it is still good to be aware of them...

Hope this helps,
-Nick Vatamaniuc

Simon Wittber wrote:
> I'm building a web application using sqlalchemy in my db layer.
>
> Some of the tables require single integer primary keys which might be
> exposed in some parts of the web interface. If users can guess the next
> key in a sequence, it might be possible for them to 'game' or
> manipulate the system in unexpected ways. I want to avoid this by
> generating a random key for each row ID, and have decided to use the
> same approach for all my single key tables.
>
> Are there any best practices for implementing this?
>
> If the random module is suitable, does anyone have any good ideas on
> how this could be implemented?
>
> Some questions which came to mind are:
>     Would I need to save and restore the random module state when
> generating id's for each table?
>     What would be an appropriate seed?
>     How many random integers can I generate before a repeat becomes
> probable?
>
> I've got my own ideas for implementing this, but am interested to see
> how/if anyone else has tackled the same problem.
> 
> 
> -Sw.




More information about the Python-list mailing list