[I18n-sig] Pre-PEP: Proposed Python Character Model

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Fri, 9 Feb 2001 09:24:08 +0100


> So for example:
> 
> sys_version = "Python/" + string.split(sys.version)[0]
> 
> Nobody would ever expect sys_version to have anything other than Unicode
> characters in it. 

My point is that sys_version is used in

        self.send_header('Server', self.version_string())

That is, it is sent following a specific transfer syntax of the
underlying protocol (HTTP), and that transfer syntax is defined in
terms of byte sequences. There is a constraint in the protocol that
most of the bytes must be restricted to the printable characters of
ASCII, though.

Suppose we raise exceptions at some time if something other than bytes
is written into a byte stream which has no associated encoding. Then,
I suspect, that fragment should be rewritten as

sys_version = b"Python/" + string.split(sys.version)[0].encode("ASCII")

The Server: header that we send will be a byte sequence, not a text
message.
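
As a minimal sketch of that idea in present-day Python 3 syntax (the
helper name server_header is hypothetical, not part of any library),
the header line can be built from the version text and explicitly
encoded to ASCII bytes before it reaches the wire:

```python
import sys

def server_header(version_text):
    """Return the Server: header line as bytes (hypothetical helper)."""
    # Take the first whitespace-separated token, e.g. "3.12.1", and encode it.
    # encode("ascii") raises UnicodeEncodeError if the text is not pure ASCII.
    product = b"Python/" + version_text.split()[0].encode("ascii")
    return b"Server: " + product + b"\r\n"

header = server_header(sys.version)
```

The encoding step is the point: any non-ASCII character is rejected
when the header is built, not when it is written to the stream.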

> According to your definition, an XML document comprising a SOAP message
> is "binary" rather than "text" despite what the XML specification says.
> After all, what could be more "protocol" than SOAP.

It depends. If it goes through an encoding before being transmitted,
then it should be represented as a character string.

If it is written to a socket directly, e.g. with

msg = "<soap:body>some SOAP specific elements I don't know</soap:body>"
s.write(msg)

then certainly, yes, that document is represented in a binary
string. Please note that any given XML document can be represented in
many ways: character strings, binary strings, DOM trees, SAX event
sequences, etc.

The "XML document comprising a SOAP message", in itself, has no
inherent representation; whether a specific representation ought to be
treated as text or binary depends primarily on whether an encoding
step is applied to it or not.
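
The distinction can be sketched in present-day Python 3, assuming
UTF-8 as the encoding and using io.BytesIO as a stand-in for the
socket (both are assumptions made for illustration only):

```python
import io

# Character-string representation: no encoding has been chosen yet.
msg_text = "<soap:body>some SOAP specific elements</soap:body>"

# Binary representation: the same document after an explicit encoding step.
msg_bytes = msg_text.encode("utf-8")

# A byte stream with no associated encoding accepts only bytes.
raw = io.BytesIO()
raw.write(msg_bytes)

# A stream with an associated encoding accepts text and encodes it itself.
buf = io.BytesIO()
wrapped = io.TextIOWrapper(buf, encoding="utf-8")
wrapped.write(msg_text)
wrapped.flush()
```

Both streams end up holding the same bytes; what differs is whether
the program handled the document as text or as binary on the way
there.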

Regards,
Martin