[Python-Dev] Python 1.5.2 modules need porting to 2.0 because of unicode - comments please

Guido van Rossum guido@beopen.com
Tue, 19 Sep 2000 17:00:34 -0500


> > I doubt that we can fix all Unicode related bugs in the 2.0
> > stdlib before the final release... let's make this a project 
> > for 2.1.
> 
> Exactly my feelings. Since we cannot possibly fix all problems, we may
> need to change the behaviour later.
> 
> If we now silently do the wrong thing, silently changing it to the
> then-right thing in 2.1 may break people's code. So I'm asking that
> cases where it does not clearly do the right thing produce an
> exception now; we can later fix it to accept more cases, should the
> need arise.
> 
> In the specific case, dropping support for Unicode output in binary
> files is the right thing. We don't know what the user expects, so it
> is better to produce an exception than to silently put incorrect bytes
> into the stream - that is a bug that we still can fix.
> 
> The easiest way with the clearest impact is to drop the buffer
> interface in unicode objects. Alternatively, not supporting them
> for s# also appears reasonable. Users experiencing the problem in
> testing will then need to make an explicit decision about how they
> want to encode the Unicode objects.
> 
> If it would help expedite the issue, I can submit a bug report
> and propose a patch.

Sounds reasonable to me (but I haven't thought of all the issues).

For writing binary Unicode strings, one can use

  f.write(u.encode("utf-16"))		# Adds byte order mark
  f.write(u.encode("utf-16-be"))	# Big-endian
  f.write(u.encode("utf-16-le"))	# Little-endian
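
To see how the three codecs differ, here is a minimal sketch. It assumes
a current Python 3 interpreter (where every str is a Unicode string), a
made-up sample string, and an in-memory io.BytesIO standing in for the
binary file f; none of those come from the discussion above.

```python
import codecs
import io

# Hypothetical sample text; any Unicode string would do.
u = "caf\u00e9"

# Explicit byte order, no byte order mark: two bytes per character here.
big = u.encode("utf-16-be")
little = u.encode("utf-16-le")

# Plain "utf-16" prepends a byte order mark in the platform's byte order.
with_bom = u.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# Write the explicitly encoded bytes to a binary stream and read them back.
f = io.BytesIO()
f.write(big)
round_trip = f.getvalue().decode("utf-16-be")
```

A reader has to be told the byte order out of band for the -be and -le
variants, while the byte order mark lets the "utf-16" decoder work it
out by itself.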

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)