[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Mon Feb 13 23:15:05 CET 2006

At 10:55 PM 2/13/2006 +0100, M.-A. Lemburg wrote:
>Guido van Rossum wrote:
> > On 2/13/06, Phillip J. Eby <pje at telecommunity.com> wrote:
> >> At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
> >>> One recommendation: for starters, I'd much rather see the bytes type
> >>> standardized without a literal notation. There should be are lots of
> >>> ways to create bytes objects from string objects, with specific
> >>> explicit encodings, and those should suffice, at least initially.
> >>>
> >>> I also wonder if having a b"..." literal would just add more confusion
> >>> -- bytes are not characters, but b"..." makes it appear as if they
> >>> are.
> >> Why not just have the constructor be:
> >>
> >>      bytes(initializer [,encoding])
> >>
> >> Where initializer must be either an iterable of suitable integers, or a
> >> unicode/string object.  If the latter (i.e., it's a basestring), the
> >> encoding argument would then be required.  Then, there's no need for
> >> special codec support for the bytes type, since you call bytes on the 
> thing
> >> to be encoded.  And of course, no need for a 'b' literal.
> >
> > It'd be cruel and unusual punishment though to have to write
> >
> >   bytes("abc", "Latin-1")
> >
> > I propose that the default encoding (for basestring instances) ought
> > to be "ascii" just like everywhere else. (Meaning, it should really be
> > the system default encoding, which defaults to "ascii" and is
> > intentionally hard to change.)
>
>We're talking about Py3k here: "abc" will be a Unicode string,
>so why restrict the conversion to 7 bits when you can have 8 bits
>without any conversion problems ?

Actually, I thought we were talking about adding bytes() in 2.5.

However, now that you've brought this up, it actually makes perfect sense 
to just use latin-1 as the effective encoding for both strings and 
unicode.  In Python 2.x, strings are byte strings by definition, so it's 
only in 3.0 that an encoding would be required.  And again, latin1 is a 
reasonable, roundtrippable default encoding.

So, it sounds like making the encoding default to latin-1 would be a 
reasonably safe approach in both 2.x and 3.x.

>While we're at it: I'd suggest that we remove the auto-conversion
>from bytes to Unicode in Py3k and the default encoding along with
>it. In Py3k the standard lib will have to be Unicode compatible
>anyway and string parser markers like "s#" will have to go away
>as well, so there's not much need for this anymore.

I thought all this was already in the plan for 3.0, but maybe I assume too 
much.  :)