sockets and encryption

Martin v. Loewis martin at v.loewis.de
Mon Nov 25 05:25:01 EST 2002


Paul Nilsson <p.nilsson at xtra.co.nz> writes:

> Since HTML is ascii and XML is unicode I thought this may put
> some limitations on what raw data could be sent. 

This is a misconception. HTML is not ASCII, and XML is not Unicode.
When transmitted over the wire, both are byte strings.

By "XML is Unicode", people usually mean that an XML document is
*conceptually* a sequence of Unicode characters. The same is true for
HTML (formally, at least since HTML 3.0 or so; conceptually, you can
apply this view to all HTML versions).

When represented in a byte-oriented medium, an encoding has to be
applied to the sequence of Unicode characters. Both HTML and XML allow
usage of arbitrary encodings: "ASCII", "iso-8859-1", "utf-8", you name
them, we have them.
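
To make this concrete, here is a minimal Python sketch, assuming a
made-up sample string (it is not from the original question). It shows
the same sequence of Unicode characters turning into different byte
strings depending on the encoding applied:

```python
# Hypothetical sample text; any string with a non-ASCII character works.
text = "caf\u00e9"

# The same characters become different bytes under different encodings.
encoded = {}
for encoding in ("iso-8859-1", "utf-8", "utf-16-le"):
    encoded[encoding] = text.encode(encoding)
    print(encoding, encoded[encoding])
    # Decoding with the same encoding recovers the original characters.
    assert encoded[encoding].decode(encoding) == text

# ASCII cannot represent every Unicode character, so encoding may fail.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("encodable as ASCII:", ascii_ok)
```

The receiver has to know which encoding was used (from the XML
declaration, an HTTP header, or a META tag) to get the characters back.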

> I had suspected that SSL incorporated a unicode layer which could
> cause problems if I wanted to send raw bytes (or I would have to
> convert them to CDATA).

Another misconception: you cannot put arbitrary bytes in a CDATA
section. You must put characters there, encoded according to the
declared document encoding.
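
If you do need to carry raw bytes inside an XML document, one common
approach (my suggestion here, not the only option) is to turn them into
characters first, for example with base64. A rough sketch, assuming a
made-up payload and a made-up element name "payload":

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical raw payload; several of these bytes are not legal XML
# characters in any common encoding.
raw = bytes([0x00, 0x01, 0xFF, 0xFE, 0x80])

# Putting such bytes straight into a CDATA section is expected to be
# rejected by the parser, since a NUL is not an XML character.
try:
    ET.fromstring(b"<payload><![CDATA[" + raw + b"]]></payload>")
    rejected = False
except ET.ParseError:
    rejected = True
print("raw bytes in CDATA rejected:", rejected)

# base64 maps arbitrary bytes onto plain ASCII characters, which are
# valid in every encoding XML documents normally use.
element = ET.Element("payload")  # hypothetical element name
element.text = base64.b64encode(raw).decode("ascii")
document = ET.tostring(element, encoding="utf-8")

# The receiver parses the document and reverses the base64 step.
recovered = base64.b64decode(ET.fromstring(document).text)
print(recovered == raw)
```

The cost is about a third more data on the wire; a plain socket (with
or without SSL) needs no such step, since it carries bytes as they are.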

Regards,
Martin


