[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Fri May 27 12:21:50 EDT 2011

On 27/05/2011 10:27, Nick Coghlan wrote:
> On Fri, May 27, 2011 at 6:46 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>>   >  What method is invoked to convert the numbers to text? What encoding
>>   >  is used to convert those numbers to text? How does this operation
>>   >  avoid also converting the *bytes* object to text and then reencoding
>>   >  it?
>>
>> OTOH, Nick, aren't you making this harder than it needs to be?  After
>> all,
>
> To me, the defining feature of str.format() over str.__mod__() is the
> ability for types to provide their own __format__ methods, rather than
> being limited to a predefined set of types known to the interpreter.
> If bytes were to reuse the same name, then I'd want to see similar
> flexibility.
>
> Now, a *different* bytes method (bytes.interpolate, perhaps?), limited
> to specific types may make sense, but such an alternative *shouldn't*
> be conflated with the text formatting API.
>
> However, proponents of such an addition need to clearly articulate
> their use cases and proposed solution in a PEP to make it clear that
> they aren't merely trying to perpetuate the bytes/text confusion that
> plagues 2.x 8-bit strings.
>
> We can almost certainly do better when it comes to constructing byte
> sequences from component parts, but simply saying "oh, just add a
> format() method to bytes objects" doesn't cut it, since the associated
> magic methods for str.format are all string based, and bytes
> interpolation also needs to address encoding issues for anything that
> isn't already a byte sequence.
>
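For reference, the __format__ hook Nick mentions can be sketched as
below. Celsius is a made-up type used only for illustration; the point
is that str.format() hands the spec to the object and expects a str
back, which is why the same protocol does not carry over to bytes
unchanged.

```python
class Celsius:
    """Hypothetical type, used only to illustrate the __format__ hook."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # str.format() passes the text after ':' as `spec`.
        # The hook has to return a str; there is no bytes counterpart.
        return format(self.degrees, spec or ".1f") + " C"


with_spec = "{0:.2f}".format(Celsius(21.5))
without_spec = "{0}".format(Celsius(21.5))
print(with_spec)
print(without_spec)
```

Any type can opt in this way without the interpreter knowing about it
in advance, which is the flexibility being asked for.
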
I would've thought that a "format" (or equivalent) method for bytes
would work like struct.pack, so b"{0}".format(23) wouldn't return
b'23', but you could have:

 >>> b'{0:b}'.format(23)
b'\x17'
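
For comparison, this is what struct.pack already does today. It is only
a sketch of the behaviour I have in mind, not a proposed implementation;
the second call, with its field layout, is just an assumed example.

```python
import struct

# 'b' is struct's format code for a signed char, i.e. one byte.
packed = struct.pack("b", 23)
print(packed)

# Several fields at once: '<' selects little-endian with no padding,
# 'H' is an unsigned 16-bit integer. The layout is an assumed example.
record = struct.pack("<bH", 23, 513)
print(record)
```

The first call gives the same b'\x17' as the hypothetical
b'{0:b}'.format(23) above, with no text encoding involved.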