unicode and hashlib

Bryan Olson fakeaddress at nowhere.org
Mon Dec 1 08:53:51 EST 2008


Jeff H wrote:
> [...] So once I have character strings transformed
> internally to unicode objects, I should encode them in 'utf-8' before
> attempting to do things that guess at the proper way to encode them
> for further processing.(i.e. hashlib)

It looks like hashlib in Python 3 will not even attempt to digest a
unicode object. Trying to hash 'abcdefg' in Python 3.0rc3 I get:

   TypeError: object supporting the buffer API required

I think that's good behavior, except that the error message is likely to
send beginners to look up the obscure buffer interface before they find
they just need mystring.encode('utf8') or bytes(mystring, 'utf8').
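
For anyone who wants to try this, here is a minimal sketch for Python 3. The variable names are just for illustration, and the exact wording of the TypeError message differs between Python versions, so the code only prints whatever message it gets.

```python
import hashlib

text = 'abcdefg'  # a str (unicode) object in Python 3

# Passing a str straight to hashlib raises TypeError.
error_seen = False
try:
    hashlib.md5(text)
except TypeError as exc:
    error_seen = True
    print('TypeError:', exc)

# Encoding first gives hashlib the bytes it needs.
data = text.encode('utf-8')
assert data == bytes(text, 'utf-8')  # the two spellings are equivalent

digest = hashlib.md5(data).hexdigest()
print(digest)
```

The hash is computed over bytes, so the choice of encoding is part of the input and has to be made explicitly.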

> >>> a='André'
> >>> b=unicode(a,'cp1252')
> >>> b
> u'Andr\xc3\xa9'
> >>> hashlib.md5(b.encode('utf-8')).hexdigest()
> 'b4e5418a36bc4badfc47deb657a2b50c'
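
The u'Andr\xc3\xa9' in that session is what you get when UTF-8 bytes are decoded as cp1252: the two bytes of the é become two separate characters. A rough Python 3 sketch of the effect, assuming the same name as in the session (the variable names are mine), also shows that the digest depends on which encoding you pick:

```python
import hashlib

name = 'Andr\u00e9'  # 'André', written with an escape to avoid source-encoding issues

utf8_bytes = name.encode('utf-8')     # b'Andr\xc3\xa9' (é takes two bytes)
cp1252_bytes = name.encode('cp1252')  # b'Andr\xe9'     (é takes one byte)

# Decoding the UTF-8 bytes with the wrong codec yields two characters for é.
mojibake = utf8_bytes.decode('cp1252')
print(ascii(mojibake))

# Different byte sequences give different digests for the same text.
digest_utf8 = hashlib.md5(utf8_bytes).hexdigest()
digest_cp1252 = hashlib.md5(cp1252_bytes).hexdigest()
print(digest_utf8 != digest_cp1252)
```

So whichever encoding you choose, use the same one everywhere you compare digests.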

Incidentally, MD5 has fallen (practical collision attacks are known) and
SHA-1 is falling. Python's hashlib also includes the stronger SHA-2
family: sha224, sha256, sha384 and sha512.
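
Switching to SHA-2 is a small change. A short sketch, assuming Python 3 and the same sample name as above; the dictionary is only there to collect results for printing:

```python
import hashlib

data = 'Andr\u00e9'.encode('utf-8')  # encode the text before hashing

# hashlib.new() takes the algorithm name, so the SHA-2 variants can be looped over.
results = {}
for algorithm in ('sha224', 'sha256', 'sha384', 'sha512'):
    h = hashlib.new(algorithm, data)
    results[algorithm] = (h.digest_size * 8, h.hexdigest())

for algorithm, (bits, hexdigest) in results.items():
    print(algorithm, bits, hexdigest)
```

The named constructors such as hashlib.sha256(data) do the same thing when the algorithm is fixed.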


-- 
--Bryan


