MD5 hash for url and utf unicode converting to ascii

Nick Craig-Wood nick at craig-wood.com
Tue Jun 24 04:32:11 EDT 2008


joe shoemaker <joemystery123 at gmail.com> wrote:
>  I would like to convert url into md5 hash. My question is that md5
>  hash will create collision at 2^64. If you do long(value,16), where
>  value is the md5 hash string, would value returned from long(value,
>  16) be unique as long as md5 hashed string is unique? when you move
>  md5 hashed string to long, where will the collision occur, at anything
> >= 2^64?
> 
>        hash = md5.new()
>        hash.update("some_url_")
>        value = hash.digest()
>        value_in_int = long(value, 16) #would this be unique as long as
>  hashed string is unique(i.e < 2^64)

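MD5 sums don't guarantee uniqueness for strings of any length.  The
2^64 figure is the birthday bound, i.e. roughly how many hashes you
need before a collision becomes likely, not a point below which none
can occur.  Converting the digest to a long is one-to-one though, so
it won't add any collisions of its own.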

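If your hash had at least as many bits as the input string then there
are functions which are guaranteed collision-free, but MD5 isn't one
of them.  You could (let's say) AES encrypt the string instead, since
encryption is reversible and so can never map two different inputs to
the same output.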
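
As a rough illustration of the "as many bits as the input" point, here
is a minimal sketch that is deliberately not AES and not a hash at all:
it just reads the URL's bytes as one big integer.  The names url_to_int
and int_to_url are made up for this example.  Because the mapping can
be reversed, two different URLs can never share a value; the price is
that the number grows with the length of the URL.

```python
def url_to_int(url):
    # Prefix a 0x01 marker byte so that leading zero bytes are not lost
    # when the bytes are read as an integer.
    data = b"\x01" + url.encode("utf-8")
    return int.from_bytes(data, "big")


def int_to_url(n):
    # Reverse of url_to_int: recover the bytes and drop the marker byte.
    data = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return data[1:].decode("utf-8")


# Hypothetical example URL, used only to show the round trip.
example = "http://example.com/some/page"
assert int_to_url(url_to_int(example)) == example
```

A fixed-size digest such as MD5 cannot offer this guarantee, because
it squeezes arbitrarily long inputs into 128 bits.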

>  Do I need to also convert the value to base64.encodestring(value)?
>  What is the purpose of base64.encodestring?

To turn the binary digest into printable characters.  You can also do
it with hex encoding, like this...

>>> import md5
>>> hash = md5.new()
>>> hash.update("some_url_")
>>> value = hash.digest()
>>> value
'\xc9\x11}\x8f?64\x83\xf3\xcaPz\x1d!\xddd'
>>> value.encode("hex")
'c9117d8f3f363483f3ca507a1d21dd64'
>>> long(value.encode("hex"), 16)
267265642849753964132104960801656397156L
>>>
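
The session above relies on the Python 2 md5 module and on
str.encode("hex"), neither of which exists in Python 3.  A rough
Python 3 equivalent, assuming the standard hashlib and base64 modules,
might look like this (the variable names are only illustrative):

```python
import base64
import hashlib

h = hashlib.md5(b"some_url_")   # hashlib wants bytes, not str

raw = h.digest()                # 16 raw bytes, not printable
hex_str = h.hexdigest()         # 32 hex characters
as_int = int(hex_str, 16)       # Python 3 has no separate long type

# base64 is another way to get printable characters from the raw bytes;
# b64encode is used here in place of the older encodestring.
b64 = base64.b64encode(raw).decode("ascii")

print(hex_str)
print(as_int)
print(b64)
```

The hex string, the integer and the base64 text are all reversible
encodings of the same 128 bits, so none of them makes collisions any
more or less likely than the digest itself.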

>  For unicode encoding, I can do, md5.update(value.encode('utf-8')) to
>  give me ascii values.

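Yes, that would be fine.  Strictly speaking encode('utf-8') gives you
a byte string rather than ASCII, but bytes are what update() needs.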
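
A small sketch of the encoding step, written for Python 3 with hashlib
(an assumption; the original question used the Python 2 md5 module).
The URL is a made-up example containing one non-ASCII character.

```python
import hashlib

# Hypothetical URL with a non-ASCII character (e with acute accent).
url = "http://example.com/caf\u00e9"

encoded = url.encode("utf-8")   # bytes; the accented letter takes 2 bytes
digest = hashlib.md5(encoded).hexdigest()

# Passing the unencoded text straight to md5 is rejected in Python 3.
try:
    hashlib.md5(url)
    rejected = False
except TypeError:
    rejected = True

print(len(url), len(encoded), digest, rejected)
```

Whatever encoding you pick, use the same one everywhere, since the
same text encoded differently produces a different digest.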

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


