[Python-Dev] accept string in a2b and base64?

"Martin v. Löwis" martin at v.loewis.de
Wed Feb 22 10:35:01 CET 2012


> It seems to me that part of the point of the byte/string split (and the
> lack of automatic coercion) is to make the programmer be explicit about
> converting between unicode and bytes.  Having these functions, which
> convert between binary formats (ASCII-only representations of binary data
> and back) accept unicode strings is reintroducing automatic coercions,
> and I think it will lead to the same kind of bugs that automatic string
> coercions yielded in Python2: a program works fine until the input
> turns out to have non-ASCII data in it, and then it blows up with an
> unexpected UnicodeError. 

I agree with the change in principle, but I also agree with you that
the choice of error is wrong:

py> binascii.a2b_hex("MURRAY")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: Non-hexadecimal digit found

py> binascii.a2b_hex("VLÖWIS")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: string argument should contain only ASCII characters

I think it should give binascii.Error in both cases: Ö is as much
a non-hexadecimal digit as M.
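
As a rough sketch of the behaviour I mean (not the actual patch: the
wrapper name a2b_hex_strict is hypothetical, and the error message is
simply borrowed from the first traceback above), the str case could be
funnelled into the same exception like this:

```python
import binascii


def a2b_hex_strict(data):
    """Hypothetical wrapper: report non-ASCII str input as binascii.Error."""
    if isinstance(data, str):
        try:
            # Hex digits are all ASCII, so anything that cannot be encoded
            # as ASCII cannot be a hexadecimal digit either.
            data = data.encode("ascii")
        except UnicodeEncodeError:
            raise binascii.Error("Non-hexadecimal digit found") from None
    # bytes input, or ASCII-only str input, goes through the normal path.
    return binascii.a2b_hex(data)


for candidate in ("4d55", "MURRAY", "VLÖWIS"):
    try:
        print(candidate, "->", a2b_hex_strict(candidate))
    except binascii.Error as exc:
        print(candidate, "-> binascii.Error:", exc)
```

With this, both "MURRAY" and "VLÖWIS" fail with the same exception type,
so a caller only needs a single except clause for malformed input.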

With that changed, I'd have no issues with the patch: these functions
are already fairly strict in their input, whether it's bytes or Unicode.
So the chances that non-ASCII characters make them fall over in a way
that never causes problems in pure-ASCII communities are very low.

> If most people agree with Antoine I won't fight it, but it seems to me
> that accepting unicode in the binascii and base64 APIs is a bad idea.

No - it's only the choice of error that is a bad idea.

Regards,
Martin

