md5.hexdigest() converting unicode string to ascii

Peter Hansen peter at engcorp.com
Fri Apr 16 18:35:16 EDT 2004


uebertester wrote:

> None of the suggestions seem to address the issue.  sValue =
> _winreg.QueryValueEx(y,"") returns a tuple containing the following
> (u'http://', 1).  The string u'http://' is added to the md5 object via
> the update() and then hashed via hexdigest().  How do I keep the
> unicode string from being converted to ascii with the md5 functions? 
> Or can I?

You cannot.  The key fact you missed is that Unicode strings are
sequences of "characters" (roughly, 16-bit values), not sequences
of bytes.  MD5 is defined only on byte sequences, so *you* must
choose the encoding scheme yourself, by converting the string to
bytes before passing it to the hash function.
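
To see why the choice of encoding matters, here is a minimal sketch.
It assumes a newer Python in which the hashlib module provides MD5
(it replaces the old md5 module); the helper name md5_hex is made up
for illustration.  The same characters give different bytes under
different encodings, and therefore different digests.

```python
import hashlib

# The registry value from the question.
text = u'http://'

def md5_hex(s, encoding):
    """Encode a Unicode string explicitly, then return its MD5 hex digest."""
    return hashlib.md5(s.encode(encoding)).hexdigest()

utf8_digest = md5_hex(text, 'utf-8')
utf16_digest = md5_hex(text, 'utf-16-le')

# Same characters, different byte sequences, so the digests differ.
print(utf8_digest)
print(utf16_digest)
```

Whichever encoding you settle on, the digest is only reproducible if
every caller uses the same one.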

If you are trying to match the MD5 values calculated by some other
tool, you must find out which encoding scheme that tool uses
(perhaps by trial and error, probably starting with utf-8).
If this is just for your own purposes, simply pick a convenient
scheme and use it consistently.

  m = md5.new()
  m.update(yourUnicode.encode('utf-8'))   # for example...
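
For the trial-and-error case, something along these lines may help.
It is only a sketch: it assumes the hashlib module of newer Pythons,
the function name guess_encoding and the candidate list are my own
invention, and the "other tool" digest is simulated here rather than
taken from any real program.

```python
import hashlib

def guess_encoding(text, target_hexdigest,
                   candidates=('utf-8', 'utf-16-le', 'utf-16-be',
                               'utf-16', 'latin-1')):
    """Return the first candidate encoding whose MD5 matches, or None."""
    target = target_hexdigest.lower()
    for enc in candidates:
        try:
            data = text.encode(enc)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the text
        if hashlib.md5(data).hexdigest() == target:
            return enc
    return None

# Stand-in for the other tool: pretend it hashed UTF-16-LE bytes.
other_tool_digest = hashlib.md5(u'http://'.encode('utf-16-le')).hexdigest()

found = guess_encoding(u'http://', other_tool_digest)
print(found)
```

Note that a pure-ASCII sample such as u'http://' encodes to the same
bytes in utf-8 and latin-1, so it cannot tell those two apart; test
with a string containing non-ASCII characters if you can.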

-Peter


