[Python-3000] Does bytes() need to support bytes(<str>, <encoding>)?

Tue Aug 28 05:20:15 CEST 2007

On 8/27/07, Barry Warsaw <barry at python.org> wrote:
> On Aug 27, 2007, at 4:38 PM, Guido van Rossum wrote:
>
> > I'm still working on stricter enforcement of the "don't mix str and
> > bytes" rule. I'm finding a lot of trivial problems, which are
> > relatively easy to fix but time-consuming.
> >
> > While doing this, I realize there are two idioms for converting a str
> > to bytes: s.encode(e) or bytes(s, e). These have identical results. I
> > think we can't really drop s.encode(), for symmetry with b.decode().
> > So is bytes(s, e) redundant?
>
> I think it might be.  I've hit this several times while working on the
> email package and it's certainly confusing.  I've also run into
> situations where I did not like the default e=utf-8 argument for
> bytes().  Sometimes I was able to work around failures by doing this:
> "bytes(ord(c) for c in s)", until I found "bytes(s, 'raw-unicode-escape')".
>
> I'm probably doing something really dumb to need that, but it does
> get me farther along.  I do intend to go back and look at those
> (there are only a few) when I get the rest of the package working again.
>
> Getting back to the original question, I'd like to see "bytes(s, e)"
> dropped in favor of "s.encode(e)" and maayyybeee (he says bracing for
> the shout down) "bytes(s)" to be defined as
> "bytes(s, 'raw-unicode-escape')".

I see a consensus developing for dropping bytes(s, e). Start avoiding
it like the plague now to help reduce the work needed once it's
actually gone.
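A minimal sketch of the spelling this thread settles on, assuming a UTF-8 target encoding chosen only for illustration: convert with s.encode(e), which mirrors b.decode(e). Note that in Python 3 as eventually released, bytes(s, e) was kept and gives the same result; the sketch only shows that the two are interchangeable, so migrating to s.encode(e) loses nothing.

```python
# str -> bytes using the method spelling, symmetric with bytes.decode().
s = "caf\u00e9"  # "café"; U+00E9 is two bytes in UTF-8

data = s.encode("utf-8")  # preferred spelling
assert data == b"caf\xc3\xa9"

# decode() undoes encode(): the round trip is lossless.
assert data.decode("utf-8") == s

# The constructor spelling under discussion produces identical bytes.
assert bytes(s, "utf-8") == data
```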

But I don't see the point of defaulting to raw-unicode-escape --
what's the use case for that? I think you should just explicitly say
s.encode('raw-unicode-escape') where you need that. Any reason you
can't?
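To make the explicit spelling concrete, here is a small sketch comparing Barry's "bytes(ord(c) for c in s)" workaround with s.encode('raw-unicode-escape'). The helper name to_bytes_by_ord is hypothetical, introduced only for this illustration. The two agree when every code point is below 256, and diverge above that.

```python
def to_bytes_by_ord(s: str) -> bytes:
    """Hypothetical helper naming Barry's workaround: one byte per code point."""
    return bytes(ord(c) for c in s)


# For text whose code points are all below 256, both spellings agree:
# raw-unicode-escape writes such characters as single Latin-1 bytes.
s = "abc\xff"
explicit = s.encode("raw-unicode-escape")
assert explicit == to_bytes_by_ord(s) == b"abc\xff"

# Above U+00FF they diverge: the codec emits a \uXXXX escape sequence,
# while bytes() rejects any integer outside range(256).
wide = "\u20ac"  # EURO SIGN
assert wide.encode("raw-unicode-escape") == b"\\u20ac"

try:
    to_bytes_by_ord(wide)
    ord_failed = False
except ValueError:
    ord_failed = True
assert ord_failed
```

Writing the codec name at the call site, as suggested above, makes that choice visible where it matters instead of hiding it behind a constructor default.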

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)