Generating valid identifiers

Ian Kelly ian.g.kelly at gmail.com
Thu Jul 26 15:28:26 EDT 2012


On Thu, Jul 26, 2012 at 9:30 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> What happens if you get a collision?
>
> That is, you have two different long identifiers:
>
> a.b.c.d...something
> a.b.c.d...anotherthing
>
> which by bad luck both hash to the same value:
>
> a.b.c.d.$AABB99
> a.b.c.d.$AABB99
>
> (or whatever).

The odds of a given pair of identifiers having the same digest to 10
hex digits are 1 in 16^10, or approximately 1 in a trillion.  If you
bought one lottery ticket a day at those odds, you would win
approximately once every 3 billion years.  But a hash collision alone
is not enough: the two names also have to match exactly in the first 21
(or 30, or whatever) characters, and both have to be long enough to
invoke the truncating scheme in the first place.
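
For anyone who wants to check the arithmetic, here is a minimal sketch of
such a truncating scheme together with the odds calculation.  The name
shorten(), the "$" separator, the use of MD5, and the 21/10 split are
assumptions for illustration only, not the exact scheme proposed in this
thread.

```python
import hashlib

# Assumed parameters, chosen to match the numbers discussed above.
PREFIX_LEN = 21                          # characters of the real name kept
DIGEST_LEN = 10                          # hex digits of the hash kept
MAX_LEN = PREFIX_LEN + 1 + DIGEST_LEN    # prefix + "$" + digest


def shorten(name):
    """Return name unchanged if it fits, else prefix + '$' + truncated digest."""
    if len(name) <= MAX_LEN:
        return name
    # MD5 is an assumption here; any well-mixed hash would do for this purpose.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:DIGEST_LEN]
    return name[:PREFIX_LEN] + "$" + digest


# Arithmetic behind the odds: 10 hex digits give 16**10 possible digests.
possible_digests = 16 ** DIGEST_LEN
years_per_win = possible_digests / 365.25    # one ticket per day

print(f"{possible_digests:,} possible digests")
print(f"about {years_per_win / 1e9:.1f} billion years per win")
print(shorten("a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.something"))
```

Names that already fit within MAX_LEN pass through untouched, so only
long names that also share the same 21-character prefix can ever collide.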

The Oracle backend for Django uses this same approach with an MD5 sum
to ensure that identifiers will be no more than 30 characters long (a
hard limit imposed by Oracle).  It actually truncates the hash to 4
hex digits, though, not 10.  This hasn't caused any problems that I'm
aware of.
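
A rough sketch of that kind of truncation, with a 30-character limit and
a 4-digit hash suffix, might look like the following.  The function name
truncate_identifier() and the example column name are hypothetical, and
this is not a copy of Django's actual code.

```python
import hashlib


def truncate_identifier(name, length=30, hash_len=4):
    """Shorten name to at most `length` chars, ending in `hash_len` hex digits."""
    if len(name) <= length:
        return name
    # Hash the full name, then keep only the first few hex digits.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:hash_len]
    return name[:length - hash_len] + digest


# Hypothetical identifier that is too long for a 30-character limit.
long_name = "myapp_verylongmodelname_verylongfieldname_id"
print(truncate_identifier(long_name))    # 30 characters long
print(16 ** 4)                           # number of possible 4-digit suffixes
```

With 4 hex digits there are only 65536 possible suffixes, but two names
still have to agree on the first 26 characters before the suffix matters.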


