unicode and hashlib

Jeff H dundeemt at gmail.com
Sat Nov 29 09:51:33 EST 2008


On Nov 29, 8:27 am, Jeff H <dunde... at gmail.com> wrote:
> On Nov 28, 2:03 pm, Terry Reedy <tjre... at udel.edu> wrote:
>
>
>
> > Jeff H wrote:
> > > hashlib.md5 does not appear to like unicode,
> > >   UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in
> > > position 1650: ordinal not in range(128)
>
> > It is the (default) ascii encoder that does not like non-ascii chars.
> > I suspect that if you encode to bytes first with an encoder that does
> > work (latin-???), md5 will be happy.
>
> > Reports like this should include Python version.
>
> > > After googling, I've found BDFL and others on Py3K talking about the
> > > problems of hashing non-bytes (i.e. buffers)
> > > http://www.mail-archive.com/python-3...@python.org/msg09824.html
>
> > > So what is the canonical way to hash unicode?
> > >  * convert unicode to locale
> > >  * hash in current locale
> > > ???
> > > but what if the locale has ordinals outside of 128?
>
> > > Is this just a problem for md5 hashes that I would not encounter using
> > > a different method?  i.e. Should I just use the built-in hash function?
>
> Python v2.5.2 -- however, this is not really a bug report because your
> analysis is correct. I am converting cp1252 strings to unicode before
> I persist them in a database.  I am looking for advice/direction/
> wisdom on how to sling these strings<g>
>
> -Jeff

Actually, what surprises me is that hashlib cares about the encoding at
all.  An md5 hash can be produced for an .iso file, which means it can
handle bytes, so why does it care what it is being fed, as long as it is
bytes?  I would have assumed that it would take whatever was fed to it,
view it as a byte array, and hash it.  You can read a binary file and
hash it:
  import md5
  # open in binary mode so the bytes are read unmodified on every platform
  print md5.new(open('foo.iso', 'rb').read()).hexdigest()
What do I need to do to tell hashlib not to try and decode, just treat
the data as binary?
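
For what it's worth, here is a minimal sketch of what Terry's suggestion
(encode to bytes first, then hash) might look like.  The helper name
md5_hex_of_text and the default of UTF-8 are assumptions made for
illustration; nothing in this thread settles which encoding is the right
one.  The point is only that the digest is computed over bytes, so the
encoding has to be chosen explicitly and used consistently.

```python
import hashlib


def md5_hex_of_text(text, encoding="utf-8"):
    """Return the MD5 hex digest of a text string.

    Hypothetical helper: the text is encoded to bytes explicitly,
    because the hash is defined over bytes, not over characters.
    """
    data = text.encode(encoding)
    return hashlib.md5(data).hexdigest()


# U+00A6 is the character named in the traceback above.
sample = u"broken bar: \xa6"

# The same text yields different digests under different encodings,
# because the underlying byte sequences differ.
utf8_digest = md5_hex_of_text(sample)
cp1252_digest = md5_hex_of_text(sample, "cp1252")
```

Since the two digests differ, whichever encoding is picked has to be
applied everywhere the hashes are produced or compared.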



