[Python-Dev] Python 1.5.2 modules need porting to 2.0 because of unicode - comments please

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 19 Sep 2000 10:13:16 +0200


> The smtplib problem may be easily explained -- AFAIK, the SMTP
> protocol doesn't support Unicode, and the module isn't
> Unicode-aware, so it is probably writing garbage to the socket.

I've investigated this somewhat, and found the cause of the problem.
The send method of the socket passes the raw memory representation of
the Unicode object to send(2). On i386, this comes out as UTF-16LE.
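
To make the effect concrete, here is a small sketch. The command
string is made up for illustration, and the explicit utf-16-le encode
is only a stand-in for the raw memory representation on a
little-endian build with 2-byte Unicode characters (an assumption,
not what the socket code literally does).

```python
# Hypothetical SMTP command held in a Unicode object.
command = u"HELO example.org\r\n"

# Stand-in for the internal representation on a little-endian build
# with 2-byte characters: each ASCII character is followed by a NUL.
wire_bytes = command.encode("utf-16-le")

# What an SMTP server would expect to receive instead.
expected_bytes = command.encode("ascii")
```

With these assumptions, wire_bytes is twice as long as expected_bytes
and starts with 'H', NUL, 'E', NUL, which a server would not parse as
a valid command.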

It appears that this behaviour is not documented anywhere (where is
the original specification of the Unicode type, anyway?).

I believe this behaviour is a bug, on the grounds of being
confusing. The same holds for writing a Unicode string to a file in
binary mode. Again, it should not write out the internal
representation. Or else, why doesn't file.write(42) work? I would want
it to write the internal representation in binary :-)

So in essence, I suggest that the Unicode object does not implement
the buffer interface. If that has any undesirable consequences (which
ones?), I suggest that 'binary write' operations (sockets, files)
explicitly check for Unicode objects, and either reject them or
encode them with the system default encoding (i.e. ASCII).

In the case of smtplib, this would do the right thing: the protocol
requires ASCII commands, so if anybody passes a Unicode string with
characters outside ASCII, you'd get an error.
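
A rough sketch of the check suggested above, written as a standalone
helper rather than as a change to the socket or file code. The name
prepare_binary_write and its reject flag are hypothetical, chosen only
for illustration; they are not part of any existing module.

```python
# The Unicode string type, obtained without naming it directly.
text_type = type(u"")

def prepare_binary_write(data, reject=False):
    """Hypothetical helper: decide what a binary write should send.

    Unicode objects are either rejected outright or encoded as ASCII;
    anything else is passed through unchanged.
    """
    if isinstance(data, text_type):
        if reject:
            raise TypeError("Unicode object passed to a binary write")
        # Raises UnicodeError if data has characters outside ASCII.
        return data.encode("ascii")
    return data
```

Under this sketch, a Unicode command such as u"MAIL FROM:<a@example.org>"
would go out as plain ASCII bytes, while a string containing a
character like u"\xe9" would raise an error before anything reaches
the socket.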

Regards,
Martin