[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Fri May 27 10:46:48 CEST 2011

Nick Coghlan writes:
 > On Fri, May 27, 2011 at 12:02 PM, INADA Naoki <songofacandy at gmail.com> wrote:

 > > Then, I hope bytes has a fast and efficient "format" method like:

I still don't see a use case for a fast and efficient bytes.format()
method.  The latin-1 codec is O(n) with a very small coefficient.

It seems to me this is "really" all about TOOWTDI: we'd like to be
able to interpolate data received as arguments into a data stream
using the same idiom everywhere, whether the stream consists of text,
bytes, or class Froooble instances.  (I admit I don't offhand know how
you'd spell "{0}" in a Froooble stream.)  OK, so at present only bytes
is a plausible application, but I'm willing to go there.  Then, if it
turns out that the latin-1 codec imposes too much overhead on .format()
in some application, the concerned parties can optimize it.
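
For concreteness, here is a minimal sketch of the latin-1 route that is
available today.  The helper name format_via_latin1 is hypothetical, not
an existing or proposed API; it assumes the template uses only str.format
field syntax and that bytes arguments should round-trip unchanged.

```python
def format_via_latin1(template, *args):
    """Hypothetical helper: interpolate into a bytes template via latin-1.

    latin-1 maps each byte 0..255 to the code point with the same number,
    so decode followed by encode round-trips arbitrary bytes unchanged.
    """
    # Decode bytes arguments so str.format() inserts their content,
    # not their repr (which would look like "b'foo'").
    text_args = [a.decode('latin-1') if isinstance(a, bytes) else a
                 for a in args]
    return template.decode('latin-1').format(*text_args).encode('latin-1')


print(format_via_latin1(b'{0} {1}', 23, b'foo'))  # b'23 foo'
```

Both the decode and the encode are single passes over the data, which is
the O(n) cost referred to above.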

 > >>>> b'{0} {1}'.format(23, b'foo')  # accepts int, float, bytes, bool, None

I don't see a use case for accepting bool or None.  I hadn't thought
about float, but are you really gonna need it?  On-the-fly generation
of CSS "'{0}em'.format(0.5)" or something like that, I guess?

 > > 23 foo
 > >>>> b'{0}'.format('foo')  # raises TypeError for other types.

Phillip Eby has a use case for accepting str as long as the ascii codec
in strict error mode works on the particular instances of str.
I'm not sure he would consider a .format() method efficient enough,
though; ISTR he wanted the compiler to convert literals.

 > > TypeError
 > 
 > What method is invoked to convert the numbers to text? What encoding
 > is used to convert those numbers to text? How does this operation
 > avoid also converting the *bytes* object to text and then reencoding
 > it?

OTOH, Nick, aren't you making this harder than it needs to be?  After
all,

 > Bytes are not text.

Precisely.  So bytes.format() need not handle *all* text-like
manipulations, just protocol magic that puns ASCII-encoded text.

If a bytes object is displayed sorta like text, then it *is* *all*
bytes in the ASCII repertoire (not even the right half of Latin-1 is
allowed).  In bytes.format(), bytes are bytes, they don't get encoded,
they just get interpolated into the bytes object being created.  For
other stuff, especially integers, if there is a conventional
representation for it in ASCII, it *might* be an appropriate conversion
for bytes.format() (but see above for my reservations about several
common Python types).
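
A rough sketch of those semantics, to make them concrete.  The function
bytes_format and its rules are assumptions for illustration only, not a
proposed implementation: it handles only positional "{N}" fields, passes
bytes through untouched, renders int as ASCII decimal digits, and
rejects everything else (including bool, per the reservations above).

```python
import re

# Matches positional fields such as {0} or {12} in a bytes template.
_FIELD = re.compile(rb'\{(\d+)\}')


def bytes_format(template, *args):
    """Hypothetical sketch: interpolate args into a bytes template."""
    def convert(arg):
        if isinstance(arg, bytes):
            return arg                      # bytes are bytes: no codec involved
        if isinstance(arg, bool):
            # bool is a subclass of int; rejecting it is an assumption here.
            raise TypeError('bool is not accepted')
        if isinstance(arg, int):
            return str(arg).encode('ascii')  # conventional ASCII decimal form
        raise TypeError('cannot interpolate %s into bytes'
                        % type(arg).__name__)

    # With a function as the replacement, re.sub inserts the returned
    # bytes literally, so backslashes in the data are not reinterpreted.
    return _FIELD.sub(lambda m: convert(args[int(m.group(1))]), template)


print(bytes_format(b'{0} {1}', 23, b'foo'))  # b'23 foo'
```

Note that the bytes argument is never decoded and re-encoded; it is
copied into the result as-is.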

str (Unicode) might be converted via the ascii codec in strict error
mode, although the purist in me really would rather not go there.
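
For reference, this is what the strict ascii codec does with str today;
the helper name str_to_ascii is hypothetical and only wraps the existing
str.encode() call.

```python
def str_to_ascii(text):
    """Hypothetical helper: convert str to bytes, ASCII only."""
    # 'strict' is the default error handler; it raises UnicodeEncodeError
    # on the first character outside the ASCII repertoire.
    return text.encode('ascii', 'strict')


print(str_to_ascii('Content-Length'))  # b'Content-Length'

try:
    str_to_ascii('caf\u00e9')
except UnicodeEncodeError as exc:
    print('rejected:', exc.reason)
```

So pure-ASCII protocol tokens pass through, and anything else fails
loudly instead of being silently mangled.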

AFAICS, this handles all use cases presented so far.

 > The pedagogic cost of making it even harder than it already is to
 > convince people that bytes are not text would also need to be
 > considered.

This bothers me quite a bit, but my sense is that practicality is
going to beat purity (into a bloody pulp :-P) once again.