[Python-Dev] bytes.from_hex()

Stephen J. Turnbull stephen at xemacs.org
Sat Feb 25 19:05:38 CET 2006


>>>>> "Greg" == Greg Ewing <greg.ewing at canterbury.ac.nz> writes:

    Greg> Stephen J. Turnbull wrote:

    >> the kind of "text" for which Unicode was designed is normally
    >> produced and consumed by people, who wll pt up w/ ll knds f
    >> nnsns.  Base64 decoders will not put up with the same kinds of
    >> nonsense that people will.

    Greg> The Python compiler won't put up with that sort of nonsense
    Greg> either. Would you consider that makes Python source code
    Greg> binary data rather than text, and that it's inappropriate to
    Greg> represent it using a unicode string?

The reason that Python source code is text is that the primary
producers/consumers of Python source code are human beings, not
compilers.

There are no such human producers/consumers of base64.  Unless you
prefer that I expressed that last sentence as "VGhlIHJlYXNvbiB0aG
F0IFB5dGhvbiBzb3VyY2UgY29kZSBpcyB0ZXh0IGlzIGJlY2F1c2UgdGhlIHByaW1
hcnkKcHJvZHVjZXJzL2NvbnN1bWVycyBvZiBQeXRob24gc291cmNlIGNvZGUgYXJl
IGh1bWFuIGJlaW5ncywgbm90CmNvbXBpbGVycy4="?
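To illustrate the point about decoders not tolerating what people tolerate, here is a minimal sketch using Python's standard `base64` module. The helper name `tolerates` and the sample payload are made up for this illustration, and strict validation is assumed; a lenient decoder would silently discard the stray characters rather than reject them, which is arguably no better.

```python
import base64
import binascii


def tolerates(data: str) -> bool:
    """Return True if a strict base64 decoder accepts `data` (hypothetical helper)."""
    try:
        base64.b64decode(data, validate=True)
        return True
    except binascii.Error:
        return False


payload = b"human beings, not compilers"
good = base64.b64encode(payload).decode("ascii")

# Damage the encoded form roughly the way a person abbreviates prose:
# splice in a stray " w/ ".  The space is outside the base64 alphabet.
damaged = good[:5] + " w/ " + good[5:]

print(tolerates(good))     # the intact encoding is accepted
print(tolerates(damaged))  # the damaged one is rejected
```

A human reader copes with "wll pt up w/ ll knds f nnsns"; the strict decoder refuses the moment one character falls outside its alphabet.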

    >> You're basically assuming that the person who implements the
    >> code that processes a Unicode string is the same person who
    >> implemented the code that converts a binary object into base64
    >> and inserts it into a string.

    Greg> No, I'm assuming the user of base64 knows the
    Greg> characteristics of the channel he's using.

Yes, which implies that you assume he has control of the data all the
way to the channel that actually requires base64.

Use case: the Gnus MUA supports the RFC that allows non-ASCII names in
MIME headers that take file names.  The interface was written for
message-at-a-time use, which makes sense for composition.  Somebody
else added "save and strip part" editing capability, but this only
works one MIME part at a time.  So if you have a message with four
MIME parts and you save and strip all of them, the first one gets
encoded four times.

The reason for *this* bug, and scores like it over the years, is that
somebody made it convenient to put wire protocols into a text
document.  Shouldn't Python do better than that?  Shouldn't Python
text be for humans, rather than be whatever had the tag "character"
attached to it for convenience of definition of a protocol for
communication of data humans can't process without mechanical
assistance?
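The failure mode can be sketched in a few lines of Python. This is not the Gnus code (which is Emacs Lisp and deals with MIME parameter encoding); base64 stands in for the wire encoding, and `encode_field`, `decode_field`, and the file name are hypothetical. The point it assumes is only this: when the encoded form has the same type as the raw text, nothing stops a second caller from encoding it again.

```python
import base64


def encode_field(text: str) -> str:
    """Hypothetical helper: wire-encode text, handing the result back as text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_field(text: str) -> str:
    """Hypothetical inverse of encode_field: undo one layer of encoding."""
    return base64.b64decode(text, validate=True).decode("utf-8")


name = "r\u00e9sum\u00e9.pdf"  # a non-ASCII file name

once = encode_field(name)

# Four passes over the same field, as in the four-part message above.
# Every pass sees an ordinary str and happily encodes it again.
four_times = name
for _ in range(4):
    four_times = encode_field(four_times)

print(decode_field(once) == name)        # one layer on, one layer off
print(decode_field(four_times) == name)  # three layers are still left
```

Keeping the wire form in a distinct type (bytes, or a dedicated wrapper) would at least make the repeated call visible at the interface, rather than leaving it to whoever maintains the last caller.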

    >> I don't think it's a good idea to gratuitously introduce wire
    >> protocols as unicode codecs,

    Greg> I am *not* saying that base64 is a unicode codec!  If that's
    Greg> what you thought I was saying, it's no wonder we're
    Greg> confusing each other.

I know you don't think that it's a duck, but it waddles and quacks.
I.e., the question is not what I think you're saying.  It's "what is the
Python compiler/interpreter going to think?"  AFAICS, it's going to
think that base64 is a unicode codec.

    Greg> The only time I need to use something like base64 is when I
    Greg> have something that will only accept text. In Py3k, "accepts
    Greg> text" is going to mean "takes a character string as input",

Characters are inherently abstract; as a class they can't be
instantiated as input or output---only derived (i.e., encoded)
characters can.  I don't believe that "takes a character string as
input" has any intrinsic meaning.
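A small Python sketch of what "only encoded characters can be input or output" means in practice. The sample string is arbitrary, and the two encodings are chosen only because both can represent it.

```python
# One abstract character string...
text = "caf\u00e9"

# ...has no single concrete form on a channel.  Each encoding yields
# a different byte sequence for the same characters.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

print(as_utf8 != as_latin1)           # the bytes on the wire differ
print(len(as_utf8), len(as_latin1))   # and so do their lengths

# Both decode back to the same abstract string, provided the reader
# knows which encoding the writer used.
print(as_utf8.decode("utf-8") == as_latin1.decode("latin-1") == text)
```

Whatever "accepts text" ends up meaning, something at the boundary still has to pick one of these concrete forms.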

    Greg> Does that make it clearer what I'm getting at?

No.<wink>  I already understood what you're getting at.  As I said, I'm
sympathetic in principle.  In practice, I think it's a loaded gun
aimed at my foot.  And yours.


-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

