unicode and hashlib

MRAB google at mrabarnett.plus.com
Fri Nov 28 14:25:45 EST 2008


Jeff H wrote:
> hashlib.md5 does not appear to like unicode,
>   UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in
> position 1650: ordinal not in range(128)
> 
> After googling, I've found BDFL and others on Py3K talking about the
> problems of hashing non-bytes (i.e. buffers)
> http://www.mail-archive.com/python-3000@python.org/msg09824.html
> 
> So what is the canonical way to hash unicode?
>  * convert unicode to locale
>  * hash in current locale
> ???
> but what if locale has ordinals outside of 128?
> 
> Is this just a problem for md5 hashes that I would not encounter using
> a different method?  i.e. Should I just use the built-in hash function?
>
hashlib.md5 can handle bytestrings, but if you give it a unicode string it 
first performs a default encoding to ASCII, and that fails if there's a 
codepoint >= U+0080. Personally, I'd recommend explicitly encoding the 
unicode string to UTF-8 and hashing the resulting bytestring.
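A minimal sketch of that advice follows. It is written for Python 3, where text is `str` and byte strings are `bytes`, rather than the Python 2 of this thread. The helper name `md5_hexdigest` is hypothetical. UTF-8 can represent every codepoint and does not depend on the machine's locale, so the same text gives the same digest everywhere.

```python
import hashlib


def md5_hexdigest(text: str, encoding: str = "utf-8") -> str:
    """Return the MD5 hex digest of a text string (hypothetical helper).

    The string is encoded to bytes explicitly, so the result does not
    depend on any default codec or on the machine's locale.
    """
    return hashlib.md5(text.encode(encoding)).hexdigest()


sample = "broken bar: \xa6"  # U+00A6, the character from the error above

# The implicit-ASCII route fails for any codepoint >= U+0080.
try:
    sample.encode("ascii")
except UnicodeEncodeError as exc:
    print("ASCII failed:", exc)

# Python 3's hashlib refuses str outright instead of guessing an encoding.
try:
    hashlib.md5(sample)  # type: ignore[arg-type]
except TypeError as exc:
    print("str rejected:", exc)

# Encoding to UTF-8 first always works.
print(md5_hexdigest(sample))
```

Whichever encoding you choose, use the same one wherever digests are compared. The same text encoded two different ways produces different bytes and therefore a different hash.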


