[Python-Dev] PEP 3101 implementation vs. documentation

Ben Wolfson wolfson at gmail.com
Tue Jun 14 03:46:51 CEST 2011


On Mon, Jun 13, 2011 at 5:36 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Ben Wolfson wrote:
>
>> If by "item selector" you mean (using the names from the grammar in
>> the docs) the element_index, I don't see why this should be the case;
>> dictionaries can contain non-identifier keys, after all.
>
> Of course they can, but that's not the point. The point is
> that putting arbitrary strings between [...] in a format
> spec without any form of quoting or requirement for bracket
> matching leads to something that's too confusing for humans
> to read.

But there is a requirement for bracket matching: the "[" that opens
the element_index is matched by the next "]". Arguably (as Terry Reedy
said) this is also a form of quoting, in which the square brackets are
the quotation operators. It seems no more confusing to me than
allowing arbitrary strings between the quotes in '"..."'; those quotation marks
aren't even oriented. (Admittedly, syntax highlighting helps there.)
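
To make the "matched by the next ]" rule concrete, here is a minimal
sketch. The template and the dictionary keys are made up for
illustration; string.Formatter.parse is only used to show where each
field name is taken to end.

```python
import string

# Hypothetical template: both element_index values contain characters
# that could never appear in a Python identifier.
template = "{0[spam eggs]} and {0[x+y]}"

# Formatter.parse yields (literal_text, field_name, format_spec, conversion)
# tuples; keep only the field names.
fields = [name for _, name, _, _ in string.Formatter().parse(template)
          if name is not None]

# Everything between "[" and the next "]" is used as the lookup key.
data = {"spam eggs": 1, "x+y": 2}
rendered = template.format(data)

print(fields)    # ['0[spam eggs]', '0[x+y]']
print(rendered)  # 1 and 2
```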

Compared to this: "{0: ^+#10o}", a string like this: "this is normal
text, but {e.underline[this text is is udnerlined {sic}!]}---and we're
back to normal now" is pretty damn readable to this human, nor do I
see what about the rule "when you see a [, keep going until you see a
]" is supposed to be insuperably confusing. (Compare---not that it
helps my case in regard to readability---grouping in regular
expressions, where you don't usually have the aid of special syntax
highlighting inside the string; you see a '(', you know that you've
encountered a group which continues until the next (unescaped!) ')'.
The stuff that comes inside the parentheses might look like line
noise---and the whole thing might look like line noise---but *that*
rule about the structure of a regexp is pretty straightforward.)
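
For what it's worth, a rough sketch of the kind of object that could
sit behind a string like that. The Underline and Effects classes and
the ANSI escape codes are assumptions for illustration only, and I've
left the "!" and the nested braces out of the text, since how those
are handled inside an element_index is exactly what's in question.

```python
class Underline:
    """Hypothetical helper: indexing returns the key wrapped in ANSI underline codes."""

    def __getitem__(self, text):
        # str.format passes the raw element_index text as the key.
        return "\x1b[4m" + text + "\x1b[0m"


class Effects:
    """Hypothetical namespace reached through attribute lookup in the field name."""

    underline = Underline()


message = (
    "this is normal text, but {e.underline[this text is underlined]}"
    " and we're back to normal now"
).format(e=Effects())

print(repr(message))
```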

> IMO the spec should be designed so that the format string
> can be parsed using the same lexical analysis rules as
> Python code. That means anything that is meant to "hang
> together" as a single unit, such as an item selector,
> needs to look like a single Python token, e.g. an integer
> or identifier.

If that's the rationale, why not change the spec so that instead of this:

"{0[spam]}"

You do this:

"{0['spam']}"

? Hangs together; single Python token. Bonus: it would make it
possible for this to work:

(a) "{0['12']}".format({'12': 4})

whereas currently this:

"{0[12]}".format(...)

passes the integer 12 to __getitem__, and (a) passes the string "'12'".
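
A short sketch of the current behaviour described above; the
dictionaries and their values are just illustrative.

```python
# A non-numeric element_index is passed to __getitem__ as a str.
by_name = "{0[spam]}".format({"spam": "eggs"})

# An all-digit element_index is converted to an int first.
by_int = "{0[12]}".format({12: "int key"})

# So a dict keyed by the *string* "12" cannot be reached this way.
try:
    "{0[12]}".format({"12": "str key"})
    str_key_found = True
except KeyError:
    str_key_found = False

# Adding quotes does not help: the quotes become part of the key.
quoted = "{0['12']}".format({"'12'": "quoted key"})

print(by_name, by_int, str_key_found, quoted)
```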

(Discovery: the "abuse" of the format mechanism I want to perpetrate
via element_index can also be perpetrated with a custom __format__
method:

>>> class foo:
...        def __format__(self, a): return a * 2
...
>>> "hello {0::![} world".format(foo())
'hello :![:![ world'

So any reform to make it impossible to use str.format creatively
will have to be fairly radical. I think that my intended abuse is
actually a perfectly reasonable use, but it would be disallowed if
only integers and identifiers were permitted in the element_index
field.)

-- 
Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks,
which may be sweet, aromatic, fermented or spirit-based. ... Family
and social life also offer numerous other occasions to consume drinks
for pleasure." [Larousse, "Drink" entry]

