unicode and hashlib

Fri Nov 28 15:23:11 EST 2008

On 28 Nov, 21:03, Terry Reedy <tjre... at udel.edu> wrote:
>
> It is the (default) ascii encoder that does not like non-ascii chars.
> I suspect that if you encode to bytes first with an encoder that does
> work (latin-???), md5 will be happy.
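A minimal sketch of the suggestion above, written in Python 3 syntax rather than the Python 2.x under discussion. In Python 3, hashlib.md5 rejects a str outright with a TypeError instead of trying an implicit ASCII conversion, so the explicit encode step is always required. The choice of UTF-8 here is an assumption; Latin-1 would also cover this character, but it produces different bytes and therefore a different digest.

```python
import hashlib

text = u"caf\u00e9"  # a character string containing one non-ASCII character

# Encoding with ASCII fails; this is the conversion Python 2 attempts implicitly.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii encoding failed:", exc.reason)

# Encode explicitly to bytes first (UTF-8 is an assumed choice), then hash the bytes.
data = text.encode("utf-8")
digest = hashlib.md5(data).hexdigest()
print(digest)

# A different encoding yields different bytes, and so a different digest.
latin1_digest = hashlib.md5(text.encode("latin-1")).hexdigest()
```

Whichever encoding is chosen, it has to be used consistently by everything that computes or compares these digests.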

I know that the "Python roadmap" answer to such questions might refer
to Python 3.0 and its "strings are Unicode" features. Having seen
this mentioned a lot recently, I'm surprised that no-one has done so
at the time of writing, but I do wonder whether good old Python 2.x
wouldn't benefit from a more explicit error message in these
situations.

Since the introduction of Unicode in Python 1.6/2.0, I've always tried
to make the distinction between what I call "plain strings" or "byte
strings" and "Unicode objects" or "character strings", and perhaps the
UnicodeEncodeError message should be enhanced to say what is actually
going on: that an attempt is being made to convert characters into
byte values and that the chosen way of doing so (which often means
the default ASCII encoding) cannot manage the job.
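As a rough illustration of the kind of message suggested above, the sketch below (Python 3 syntax) uses the attributes a UnicodeEncodeError already carries (encoding, object, start, end, reason) to phrase the failure as a characters-to-bytes conversion. The helper explain_encode_error is hypothetical and not part of Python; it only shows what a more explicit wording could look like.

```python
def explain_encode_error(exc):
    """Hypothetical helper: restate a UnicodeEncodeError as a failed
    conversion of characters into bytes."""
    bad_chars = exc.object[exc.start:exc.end]  # the characters that could not be encoded
    return ("could not convert characters to bytes: the %r encoding cannot "
            "represent %r starting at position %d (%s)"
            % (exc.encoding, bad_chars, exc.start, exc.reason))

try:
    u"na\u00efve".encode("ascii")
except UnicodeEncodeError as exc:
    message = explain_encode_error(exc)
    print(message)
```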

Paul