unicode and hashlib
Bryan Olson
fakeaddress at nowhere.org
Mon Dec 1 08:53:51 EST 2008
Jeff H wrote:
> [...] So once I have character strings transformed
> internally to unicode objects, I should encode them in 'utf-8' before
> attempting to do things that guess at the proper way to encode them
> for further processing.(i.e. hashlib)
It looks like hashlib in Python 3 will not even attempt to digest a
unicode object. Trying to hash 'abcdefg' in Python 3.0rc3 I get:
TypeError: object supporting the buffer API required
I think that's good behavior, except that the error message is likely to
send beginners to look up the obscure buffer interface before they find
they just need mystring.encode('utf8') or bytes(mystring, 'utf8').
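
To make that concrete, here is a minimal sketch for Python 3. The variable names (text, error_message and so on) are only illustrative, and the exact wording of the TypeError differs between Python versions, so the sketch records the message rather than assuming what it says.

```python
import hashlib

text = 'abcdefg'  # a str (unicode) object in Python 3

# Passing a str straight to hashlib is rejected with TypeError.
error_message = None
try:
    hashlib.md5(text)
except TypeError as exc:
    error_message = str(exc)  # wording varies by Python version

# Encode explicitly to bytes first; both spellings produce the same bytes.
digest_from_encode = hashlib.md5(text.encode('utf-8')).hexdigest()
digest_from_bytes = hashlib.md5(bytes(text, 'utf-8')).hexdigest()

print(error_message)
print(digest_from_encode == digest_from_bytes)
```

Choosing the encoding yourself is the point: the digest is computed over bytes, so the same text encoded differently hashes to a different value.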
>>>> a='André'
>>>> b=unicode(a,'cp1252')
>>>> b
> u'Andr\xc3\xa9'
>>>> hashlib.md5(b.encode('utf-8')).hexdigest()
> 'b4e5418a36bc4badfc47deb657a2b50c'
Incidentally, MD5 has fallen and SHA-1 is falling. Python's hashlib also
includes the stronger SHA-2 family.
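
Switching to SHA-2 is a one-word change. A small sketch, assuming Python 3 and the same encode-first rule as above (the names data, sha256_hex and sha512_hex are just for illustration):

```python
import hashlib

# 'Andr\u00e9' is 'André'; encode it to UTF-8 bytes before hashing.
data = 'Andr\u00e9'.encode('utf-8')

# sha256 and sha512 are SHA-2 members that hashlib always provides.
sha256_hex = hashlib.sha256(data).hexdigest()
sha512_hex = hashlib.sha512(data).hexdigest()

print(len(sha256_hex), sha256_hex)
print(len(sha512_hex), sha512_hex)
```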
--
--Bryan