How to display unicode with the CGI module?

greg greg at cosc.canterbury.ac.nz
Mon Nov 26 03:21:20 EST 2007


coldpizza wrote:
> I am always confused as to which one to use: encode() or decode();

In unicode land, an "encoding" is a method of representing
unicode data in an external format. So you encode unicode
data in order to send it into the outside world, and you
decode it in order to turn it back into unicode data.
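
As a rough illustration of the direction of each call (the sample
string and the choice of UTF-8 are just assumptions for the example):

```python
# Sample text and the UTF-8 codec are arbitrary choices for illustration.
text = u"caf\u00e9"               # unicode data inside the program

# encode: unicode -> bytes, for sending to the outside world
data = text.encode("utf-8")

# decode: bytes from the outside world -> unicode again
back = data.decode("utf-8")

print(repr(data))
print(back == text)
```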

It'll be easier to get right in py3k, because bytes will only have
a decode() method and str will only have an encode() method.
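
A quick check of that split, assuming you run it under a Python 3
interpreter (the variable names are only for the example):

```python
# Assumes Python 3: bytes and str are separate types there.
bytes_can_encode = hasattr(b"abc", "encode")   # bytes has no encode()
bytes_can_decode = hasattr(b"abc", "decode")   # bytes has decode()
str_can_encode = hasattr("abc", "encode")      # str has encode()
str_can_decode = hasattr("abc", "decode")      # str has no decode()

print(bytes_can_encode, bytes_can_decode, str_can_encode, str_can_decode)
```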

> It is funny that encode() and decode() omit the name of the other
> encoding (Unicode ucs2?),

Unicode objects don't *have* an encoding. UCS2 isn't an encoding
of them that you choose; it's just an internal storage format.
You're not supposed to need to know or care about it, and it
could be different between different Python builds.
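
One way to see this, as a sketch with an arbitrary sample string and
two arbitrarily chosen codecs: the same unicode object gives different
bytes under different encodings, and both decode back to the same
thing, whatever the interpreter stores internally.

```python
# The sample string and the two codecs are assumptions for illustration.
text = u"\u20ac5"                      # euro sign followed by the digit 5

utf8 = text.encode("utf-8")            # one external representation
utf16 = text.encode("utf-16-le")       # another external representation

same_bytes = (utf8 == utf16)
same_text = (utf8.decode("utf-8") == utf16.decode("utf-16-le") == text)

print(same_bytes, same_text)
```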

> Another weird thing is that by default Python converts internal
> Unicode to ascii.

It's the safest assumption. Python is refusing the temptation
to guess the encoding of anything outside the range 0-127 if you
don't tell it.
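
A small sketch of what that refusal looks like. Here the ascii codec
is named explicitly so the example behaves the same regardless of
interpreter defaults; the sample string and the latin-1 choice are
assumptions for the example.

```python
# Sample text; the character U+00E9 is outside the range 0-127.
text = u"caf\u00e9"

try:
    text.encode("ascii")       # ascii cannot represent U+00E9
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True        # Python refuses rather than guessing

# Telling Python which encoding you want removes the ambiguity.
explicit = text.encode("latin-1")

print(ascii_failed, repr(explicit))
```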

--
Greg


