[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Adam Olsen rhamph at gmail.com
Tue Feb 14 08:04:32 CET 2006


On 2/13/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> M.-A. Lemburg wrote:
> > We're talking about Py3k here: "abc" will be a Unicode string,
> > so why restrict the conversion to 7 bits when you can have 8 bits
> > without any conversion problems ?
>
> YAGNI. If you have a need for byte string in source code, it will
> typically be "random" bytes, which can be nicely used through
>
>   bytes([0x73, 0x9f, 0x44, 0xd2, 0xfb, 0x49, 0xa3, 0x14,  0x8b, 0xee])
>
> For larger blocks, people should use base64.string_to_bytes (which
> can become a synonym for base64.decodestring in Py3k).
>
> If you have bytes that are meaningful text for some application
> (say, a wire protocol), it is typically ASCII-Text. No protocol
> I know of uses non-ASCII characters for protocol information.

What would that imply for repr()?  To support eval(repr(x)) it would
have to produce whatever format the source code includes to begin
with.

If I understand correctly there's three main candidates:
1. Direct copying to str in 2.x, pretending it's latin-1 in unicode in 3.x
2. Direct copying to str/unicode if it's only ascii values, switching
to a list of hex literals if there's any non-ascii values
3. b"foo" literal with ascii for all ascii characters (other than \
and "), \xFF for individual characters that aren't ascii

Given the choice I prefer the third option, with the second option as
my runner up.  The first option just screams "silent errors" to me.


--
Adam Olsen, aka Rhamphoryncus


More information about the Python-Dev mailing list