[Python-Dev] PEP 460 reboot

Mon Jan 13 10:49:01 CET 2014

On 13/01/14 09:19, Glenn Linderman wrote:
> On 1/13/2014 12:46 AM, Mark Shannon wrote:
>> On 13/01/14 03:47, Guido van Rossum wrote:
>>> On Sun, Jan 12, 2014 at 6:24 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>>>> On 01/12/2014 06:16 PM, Ethan Furman wrote:
>>>>>
>>>>>
>>>>> If you do:
>>>>>
>>>>> --> b'%s' % 'some text'
>>>>
>>>>
>>>> Ignore what I previously said.  With no encoding the result would be:
>>>>
>>>> b"'some text'"
>>>>
>>>> So an encoding should definitely be specified.
>>>
>>> Yes, but the encoding is no business of %s or %. As far as the
>>> formatting operation cares, if the argument is bytes they will be
>>> copied literally, and if the argument is a str (or anything else) it
>>> will call ascii() on it.
>>
>> It seems to me that what people want from '%s' is:
>> Convert to a str then encode as ascii for non-bytes
>> or copy directly for bytes.
>
> Maybe. But it only takes a small tweak to the parameter to get what they want... a tweak that works in both Python 2.7 and Python 3.whatever-version-gets-this.
>
> Instead of
>
> b"%s" % foo
>
> they must use
>
> b"%s"  % foo.encode( explicitEncoding )
>
> which is what they should have been doing in Python 2.7 all along, and if they were, they need make no change.
>
> Oh, foo was a Python 2.7 str? Converted to Python 3.x str, by default conversion rules? Already in ASCII? No harm.
> Oh, foo was a literal? Add b prefix, instead of the .encode("ASCII"), if you prefer.
>
>> So why not replace '%s' with '%a' for the ascii case and
>> with '%b' for directly inserting bytes.
>
> Because %a and %b don't exist in Python 2.7?

I thought this was about 3.5, not 2.7 ;)
'%s' can't work in 3.5, as we must differentiate between
strings which need to be encoded and bytes which don't.
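
A minimal sketch of that distinction, assuming an interpreter where bytes
support % interpolation (Python 3.5 or later). The helper name to_bytes_arg
is hypothetical and is not part of any proposal in this thread; it only
illustrates the three cases being argued about.

```python
def to_bytes_arg(value, encoding="ascii"):
    """Hypothetical helper: make a value safe to interpolate into bytes."""
    if isinstance(value, (bytes, bytearray)):
        # bytes need no encoding; they are copied as-is
        return bytes(value)
    if isinstance(value, str):
        # str must be encoded, and the encoding is chosen explicitly
        return value.encode(encoding)
    # anything else: fall back to its ASCII-only representation
    return ascii(value).encode("ascii")


from_bytes = b"%s" % to_bytes_arg(b"some text")
from_str = b"%s" % to_bytes_arg("some text")
from_int = b"%s" % to_bytes_arg(42)

# What calling ascii() on a str yields instead: the quotes are kept.
as_repr = ascii("some text").encode("ascii")
```

Here from_bytes and from_str hold the same payload, while as_repr carries
the surrounding quotes, which is the b"'some text'" result quoted earlier.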

>
>> That way, the encoding is explicit.
>
> The encoding is already explicit.  If it is bytes encoded from str, that transformation had an explicit encoding.  If it is "%s" % str(...), then there is no encoding, but rather a transformation into
> an ASCII representation of the Unicode code points, using escape sequences. Which isn't likely to be what they want, but see the parameter tweak above.
>
>> I think it is vital that the encoding is explicit in all cases where
>> bytes <-> str conversion occurs.
>
> Since it is explicit, you have no concerns in this area.
>
>
> Regarding the concern about implicit use of ASCII by certain bytes methods and proposed interpolations, I'm curious how many standard encodings exist that do not have an ASCII subset. I can enumerate
> a starting list, but if there are others in actual use, I'm unaware of them.
>
> EBCDIC
> UTF-16 BE & LE
> UTF-32 BE & LE
>
> Wikipedia: The vast majority of code pages in current use are supersets of ASCII <http://en.wikipedia.org/wiki/ASCII>, a 7-bit code representing 128 control codes and printable characters.
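
The list of encodings without an ASCII subset quoted above can be checked
with a rough sketch like the following. It assumes Python's standard codecs,
and it uses cp500 as a stand-in for EBCDIC (one EBCDIC code page among
several); the function name is_ascii_superset is hypothetical.

```python
# All 128 ASCII code points as text, and the bytes an ASCII-compatible
# encoding would have to produce for them.
ASCII_TEXT = "".join(chr(i) for i in range(128))
ASCII_BYTES = bytes(range(128))


def is_ascii_superset(encoding):
    """Return True if every ASCII character encodes to its own byte value."""
    try:
        return ASCII_TEXT.encode(encoding) == ASCII_BYTES
    except UnicodeEncodeError:
        # The codec cannot represent some ASCII character at all.
        return False


CANDIDATES = (
    "utf-8", "latin-1", "cp500",
    "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be",
)
results = {name: is_ascii_superset(name) for name in CANDIDATES}
```

With these candidates, only utf-8 and latin-1 come out as ASCII supersets;
the EBCDIC code page and the UTF-16/UTF-32 variants do not.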