[I18n-sig] Strawman Proposal: Binary Strings

Toby Dickenson tdickenson@geminidataloggers.com
Fri, 09 Feb 2001 09:46:12 +0000


On Thu, 08 Feb 2001 12:24:49 -0800, Paul Prescod
<paulp@ActiveState.com> wrote:

>> What if string.encode() returned a binary string.... would we need a
>> 'binary()' builtin at all?
>
>I guess not. But the encode method might already be in use. If we
>combine your restrictive coercion suggestion with this suggestion we
>might break some (admittedly newish) code. How about
>"str.binencode(encoding)".
>
>Also, it isn't entirely unbelievable that someone might want to encode
>from a string to a string. e.g. base64 (do we call that an encoding??)
>So having a binencode() separate from encode() might be a good idea.
>Alternate names are "binary", "asbinary", "tobinary", "getbinary" and
>any underscore-separated variant.

Yes, the type of value returned from string.encode(x) depends on x. I
intended to suggest that string.encode('latin1') would be the best way
to convert from string to binary. However, I now see that won't work
for plain strings: their .encode() method always goes via unicode,
using the default encoding.

So: I'm happy with your .binary() method on strings. Add it to bstrings
too (as a 'return self'), but not to unicode strings.
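
To make that concrete, here is a rough sketch in present-day Python.
The class names PlainString, BString and UnicodeString are stand-ins I
made up for the three types under discussion, and the one-char-per-byte
(latin-1) mapping inside .binary() is my assumption, not part of any
agreed proposal:

```python
class BString(bytes):
    """Hypothetical stand-in for the proposed binary string type."""

    def binary(self):
        # Already binary, so this is just 'return self'.
        return self


class PlainString(str):
    """Hypothetical stand-in for a plain (8-bit) string."""

    def binary(self):
        # Assumption: each character maps to exactly one byte (latin-1),
        # so no default-encoding detour via unicode is involved.
        return BString(self.encode('latin-1'))


class UnicodeString(str):
    """Hypothetical stand-in for a unicode string: no .binary() here."""


plain = PlainString('abc')
blob = plain.binary()          # PlainString -> BString
same_blob = blob.binary()      # BString -> the very same object
```

A unicode string would still have to go through .encode(encoding)
explicitly, which is the point of leaving .binary() off it.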

>> I agree any explicit coercion should follow the same rules as Unicode.
>> I'm not sure we agree on whether that coercion happens automatically
>> and implicitly, as it does with Unicode strings; I feel fairly
>> strongly that it shouldn't. (I'll justify that tomorrow if we do
>> disagree).
>
>If we were inventing something from whole cloth I would agree with you.
>But I want people to quickly port their string-using applications over
>to binary-strings and if we require a bunch more explicit conversions
>then they will move more slowly.
>
>Nevertheless, I'm not willing to fight about the issue. There are two
>votes against coercion already and if the response is similarly
>anti-coercion then I'll agree.

Waaaaaah. There are some backward-compatibility issues that complicate
my comparison proposal.....

Consider some old code that does:

print md5('some stuff').digest() == 'reference'

We want this to do the right thing after:
* changing .digest() to return a bstring
* changing 'reference' to b'reference'
* changing both

Therefore we have to allow string/bstring comparisons. However,
raising an exception on unicode/bstring comparison still makes sense.
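
A sketch of those comparison rules, again in present-day Python with
made-up stand-in classes (PlainString, BString, UnicodeString). The
latin-1 mapping used for the string/bstring comparison and the choice
of TypeError for the unicode case are my assumptions:

```python
import hashlib


class PlainString(str):
    """Hypothetical stand-in for a plain (8-bit) string."""


class UnicodeString(str):
    """Hypothetical stand-in for a unicode string."""


class BString(bytes):
    """Hypothetical stand-in for the proposed binary string type."""

    def __eq__(self, other):
        if isinstance(other, UnicodeString):
            # unicode/bstring comparison is refused outright.
            raise TypeError("cannot compare bstring with unicode string")
        if isinstance(other, PlainString):
            # string/bstring comparison is allowed, byte for byte
            # (assumption: one character per byte, i.e. latin-1).
            return bytes(self) == other.encode('latin-1')
        return bytes.__eq__(self, other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ disables hashing unless we restore it.
    __hash__ = bytes.__hash__


# The old code, after only .digest() has been changed to a bstring:
digest = BString(hashlib.md5(b'some stuff').digest())
print(digest == PlainString('reference'))
```

Because the plain string side returns NotImplemented for a type it does
not know, the same rules apply whichever operand is on the left, so all
three migration orders above give the same answer.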


Toby Dickenson
tdickenson@geminidataloggers.com