[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Tue Feb 14 00:15:23 CET 2006

On 2/13/06, Phillip J. Eby <pje at telecommunity.com> wrote:
> Actually, I thought we were talking about adding bytes() in 2.5.

I was.

> However, now that you've brought this up, it actually makes perfect sense
> to just use latin-1 as the effective encoding for both strings and
> unicode.  In Python 2.x, strings are byte strings by definition, so it's
> only in 3.0 that an encoding would be required.  And again, latin1 is a
> reasonable, roundtrippable default encoding.
>
> So, it sounds like making the encoding default to latin-1 would be a
> reasonably safe approach in both 2.x and 3.x.

I disagree. IMO the same reasons why we don't do this now for the
conversion between str and unicode stands for bytes.

> >While we're at it: I'd suggest that we remove the auto-conversion
> >from bytes to Unicode in Py3k and the default encoding along with
> >it. In Py3k the standard lib will have to be Unicode compatible
> >anyway and string parser markers like "s#" will have to go away
> >as well, so there's not much need for this anymore.

I don't know yet what the C API will look like in 3.0. But it may well
have to support auto-conversion from Unicode to char* using some
system default encoding (e.g. the Windows default code page?) in order
to be able to conveniently wrap OS APIs that use char* instead of some
sort of Unicode (and each OS has its own way of interpreting char* as
Unicode -- I believe Apple uses UTF-8?).

> I thought all this was already in the plan for 3.0, but maybe I assume too
> much.  :)

In Py3k, I can see two reasonable approaches to conversion between
strings (Unicode) and bytes: always require an explicit encoding, or
assume ASCII. Anything else is asking for trouble IMO.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)