md5.hexdigest() converting unicode string to ascii

uebertester mgibson at tripwire.com
Tue Apr 20 19:16:33 EDT 2004


"Fredrik Lundh" <fredrik at pythonware.com> wrote in message news:<mailman.727.1082181832.20120.python-list at python.org>...
> "uebertester" wrote:
> 
> > None of the suggestions seem to address the issue.  sValue =
> > _winreg.QueryValueEx(y,"") returns a tuple containing the following
> > (u'http://', 1).  The string u'http://' is added to the md5 object via
> > the update() and then hashed via hexdigest().  How do I keep the
> > unicode string from being converted to ascii with the md5 functions?
> 
> krzysztof already explained this:
> 
> - MD5 is calculated on bytes, not characters.
> - Unicode strings contain characters, not bytes.
> - if you pass in a Unicode string where Python expects a byte string,
>   Python converts the Unicode string to an 8-bit string using the default
>   rules (which simply creates 8-bit bytes with the same values as the
>   corresponding Unicode characters, as long as the Unicode string only
>   contains characters for which ord(ch) < 128).
> - if you're not happy with that rule, you have to convert the Unicode
>   string to a byte string yourself, using the "encode" method.
> 
>         m.update(u.encode(encoding))
> 
> - if you don't know what encoding you're supposed to use, you have
>   to guess.  if it doesn't matter, as long as you remember what you used,
>   I'd suggest "utf-8" or perhaps "utf-16-le".
> 
> > Or can I?
> 
> given how things work, the "how do I keep the string from being
> converted" doesn't really make sense.
> 
> </F>
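A minimal sketch of the point above. It uses modern Python 3 and its `hashlib` module, not the 2004-era `md5` module, so treat it as an illustration of the principle rather than the code from this thread. In Python 3, `update()` accepts only bytes, so the implicit ASCII conversion described above no longer happens and the encoding must be chosen explicitly. The sample string mirrors the `u'http://'` value returned by `QueryValueEx`, and the helper name `md5_hex` is hypothetical.

```python
import hashlib

def md5_hex(text, encoding):
    """Hypothetical helper: MD5 hex digest of `text` after encoding it to bytes."""
    m = hashlib.md5()
    m.update(text.encode(encoding))  # MD5 operates on bytes, never on characters
    return m.hexdigest()

value = "http://"  # same characters as u'http://' from QueryValueEx

# Different encodings can produce different byte sequences, hence different digests.
for enc in ("ascii", "utf-8", "utf-16-le"):
    print(enc, md5_hex(value, enc))

# Python 3 rejects a str passed directly instead of converting it silently.
try:
    hashlib.md5().update(value)
except TypeError as exc:
    print("TypeError:", exc)
```

For pure-ASCII text such as `http://`, the `ascii` and `utf-8` digests are identical because both encodings produce the same bytes. `utf-16-le` follows each character with a zero byte and therefore yields a different digest.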

Thanks for the clarification.  My confusion stemmed from the Python
Library Reference, which states: "Its use is quite straightforward: use
new() to create an md5 object. You can now feed this object with
arbitrary strings using the update() method, and at any point you can
ask it for the digest...".
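In the Python 2 of that era, "strings" in that passage meant 8-bit byte strings. The feed-then-digest usage it describes can be sketched in Python 3 with `hashlib` as follows (an assumption about the environment, not the original `md5` module). Feeding the bytes in pieces gives the same digest as feeding them all at once, because MD5 hashes the concatenated byte stream. Asking for the digest midway does not stop you from feeding more data.

```python
import hashlib

chunks = [b"http", b"://"]

incremental = hashlib.md5()
for chunk in chunks:
    incremental.update(chunk)  # each call appends more bytes to the running hash

one_shot = hashlib.md5(b"".join(chunks))  # same bytes, fed in a single call

print(incremental.hexdigest() == one_shot.hexdigest())  # True
```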

I've tried the suggested solution with several different encodings,
but the hash value returned does not match what I expect based on
another utility I'm checking against.

Hash value returned by Python with the utf16 encoding:
731f46dd88cb3a67a4ee1392aa84c6f4

Hash value returned by the other utility:
0b0ebc769e2b89cf61a10a72d5a11dda

Note: I've tried other encodings as well.  Because the utility I'm
verifying against is widely used, I'm assuming it returns the correct
value.  I'd appreciate any help resolving this, as I'm trying to
enhance an automated test suite written in Python.
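One way to narrow down a mismatch like this is to hash the same value under several candidate encodings and compare each result with the other utility's digest. Below is a minimal Python 3 sketch. The helper name `find_matching_encoding` and the candidate list are illustrative assumptions. The sketch cannot tell you which bytes the other utility actually hashes. It does surface one known pitfall: Python's `utf-16` codec (alias `utf16`) writes a byte-order mark (BOM) before the encoded characters. As a result, `utf-16` and `utf-16-le` produce different digests even on a little-endian machine.

```python
import hashlib

def find_matching_encoding(text, expected_hex, candidates):
    """Return the first encoding whose MD5 of `text` equals `expected_hex`, else None.

    `candidates` is a hypothetical list of encodings to try; extend it as needed.
    """
    expected = expected_hex.lower()
    for enc in candidates:
        try:
            data = text.encode(enc)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the text at all
        if hashlib.md5(data).hexdigest() == expected:
            return enc
    return None

candidates = ["ascii", "utf-8", "utf-16", "utf-16-le", "utf-16-be", "latin-1"]

# 'utf-16' emits a BOM before the characters; 'utf-16-le' does not.
print("http://".encode("utf-16")[:2])     # b'\xff\xfe' on little-endian builds
print("http://".encode("utf-16-le")[:2])  # b'h\x00'

# Digest reported by the other utility in the post above.
print(find_matching_encoding("http://", "0b0ebc769e2b89cf61a10a72d5a11dda", candidates))
```

If no candidate matches, the other utility is probably hashing different bytes than `text.encode(...)` produces. In that case, compare the raw bytes the two tools read, not just their encodings.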

Thanks
