[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Wed Feb 15 06:11:49 CET 2006

On 2/14/06, Guido van Rossum <guido at python.org> wrote:
> On 2/13/06, Adam Olsen <rhamph at gmail.com> wrote:
> > If I understand correctly there's three main candidates:
> > 1. Direct copying to str in 2.x, pretending it's latin-1 in unicode in 3.x
>
> I'm not sure what you mean, but I'm guessing you're thinking that the
> repr() of a bytes object created from bytes('abc\xf0') would be
>
>   bytes('abc\xf0')
>
> under this rule. What's so bad about that?

See below.

> > 2. Direct copying to str/unicode if it's only ascii values, switching
> > to a list of hex literals if there's any non-ascii values
>
> That works for me too. But why hex literals? As MvL stated, a list of
> decimals would be just as useful.

PEBKAC.  Yeah, decimals are simpler and shorter even.

> > 3. b"foo" literal with ascii for all ascii characters (other than \
> > and "), \xFF for individual characters that aren't ascii
> >
> > Given the choice I prefer the third option, with the second option as
> > my runner up.  The first option just screams "silent errors" to me.
>
> The 3rd is out of the running for many reasons.
>
> I'm not sure I understand your "silent errors" fear; can you elaborate?

I think it's that someone will create a unicode object with real
latin-1 characters and it'll get passed through without errors, the
code assuming it's 8bit-as-latin-1.  If they had put other unicode
characters in they would have gotten an exception instead.

However, at this point all the posts on latin-1 encoding/decoding have
become so muddled in my mind that I don't know what they're
suggesting.  I think I'll wait for the pep to clear that up.

--
Adam Olsen, aka Rhamphoryncus