[Python-Dev] PEP 3101 implementation vs. documentation

Sat Jun 11 12:32:13 CEST 2011

Nick Coghlan wrote:
[snip]
> The rules for name fields would then become:
> 
> 1. Numeric fields start with a digit and are terminated by any
> non-numeric character.
> 
> 2. An identifier name field is terminated by any one of:
>     '}' (terminates the replacement field)
>     '!' (terminates identifier field, starts conversion specifier)
>     ':' (terminates identifier field, starts format specifier)
>     '.' (terminates identifier field, starts new identifier field for
> subattribute)
>     '[' (terminates identifier field, starts index field)
> 
> 3. An index field is terminated by ']' (subsequent character will
> determine next field)

+1
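
To make sure I read the rules the same way, here is a rough sketch of
a name-field scanner that follows them. The function name scan_name
and the (kind, value) tuples are my own invention, not anything from
the current implementation, and the sketch ignores rule 1 (it does not
treat numeric fields specially).

```python
def scan_name(text, pos=0):
    """Scan a field name starting at text[pos] (just after the '{').

    Returns (parts, end): parts is a list of (kind, value) tuples with
    kind in {'name', 'attr', 'index'}, and end is the offset of the
    terminating '}', '!' or ':'.
    """
    parts = []
    kind = 'name'
    while True:
        if kind == 'index':
            # Rule 3: an index field runs up to the next ']', whatever
            # characters come in between.
            close = text.find(']', pos)
            if close == -1:
                raise ValueError("missing ']' in index field")
            parts.append(('index', text[pos:close]))
            pos = close + 1
        else:
            # Rule 2: an identifier field ends at '}', '!', ':', '.' or '['.
            start = pos
            while pos < len(text) and text[pos] not in '}!:.[':
                pos += 1
            parts.append((kind, text[start:pos]))
        if pos >= len(text):
            raise ValueError("unterminated replacement field")
        ch = text[pos]
        if ch in '}!:':
            return parts, pos
        if ch == '.':
            kind = 'attr'
        elif ch == '[':
            kind = 'index'
        else:
            # Only reachable right after an index field.
            raise ValueError("unexpected %r after index field" % ch)
        pos += 1


print(scan_name('0[{]}'))   # ([('name', '0'), ('index', '{')], 4)
```

With these rules a '{' inside the index field is just an ordinary
character, and no brace counting is needed.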

> That second set of rules is *far* more in line with the behaviour of
> the rest of the language than the status quo, so unless the difficulty
> of making the str.format mini-language parser work that way is truly
> prohibitive, it certainly seems worthwhile to tidy up the semantics.
> 
> The index field behaviour should definitely be fixed, as it poses no
> backwards compatibility concerns. The brace matching behaviour should
> probably be left alone, as changing it would potentially break
> currently valid format strings (e.g. "{a{0}}".format(**{'a{0}':1})
> produces '1' now, but would raise an exception if the brace matching
> rules were changed).

-1 for leaving the brace matching behavior alone, as it's very
unintuitive for *the user*. Counting matching braces may make sense
to the implementor, but definitely not to the user. I don't believe
that "{a{0}}" is a real use case that anyone already relies on, as
it plainly violates what the documentation currently says.

I'd rather disallow braces in the replacement field before the format
specifier altogether, or closing braces at a minimum. Furthermore,
double-escaping sounds reasonable in the format specifier, but not
elsewhere.

My motivation is that the user should be able to glance at the format
string and see where the replacement fields are. This is probably
what the PEP intends to say when disallowing braces inside the
replacement field. In my opinion, the parser can be written to handle
braces in whatever way we choose. Maybe not easily, but it's no harder
than any other way of handling braces.
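
As a rough illustration of the stricter behaviour I have in mind, the
sketch below locates replacement fields while rejecting any '{' before
the format specifier. The name field_spans is made up for this mail,
and the sketch simplifies things: it treats the first ':' as the start
of the format specifier (even inside an index field) and does not
validate the nested fields inside the specifier.

```python
def field_spans(fmt):
    """Return (start, end) offsets of the replacement fields in fmt."""
    spans = []
    i = 0
    n = len(fmt)
    while i < n:
        if fmt.startswith(('{{', '}}'), i):
            i += 2                      # escaped brace in literal text
            continue
        ch = fmt[i]
        if ch == '}':
            raise ValueError("single '}' outside a replacement field")
        if ch != '{':
            i += 1                      # ordinary literal character
            continue
        start = i
        i += 1
        in_spec = False                 # True once ':' has been seen
        depth = 0                       # nesting depth inside the specifier
        while True:
            if i >= n:
                raise ValueError("unterminated replacement field")
            ch = fmt[i]
            if not in_spec:
                if ch == '{':
                    raise ValueError("'{' not allowed in field name")
                if ch == '}':
                    break
                if ch == ':':
                    in_spec = True
            elif ch == '{':
                depth += 1
            elif ch == '}':
                if depth == 0:
                    break
                depth -= 1
            i += 1
        spans.append((start, i + 1))
        i += 1
    return spans


fmt = "x={x:{w}} {{lit}} {y!r}"
print([fmt[a:b] for a, b in field_spans(fmt)])   # ['{x:{w}}', '{y!r}']
```

Under this scheme "{a{0}}" is rejected outright instead of being
looked up as the key 'a{0}'.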

> So +1 on making the str.format parser accept anything other than ']'
> inside an index field and turn the whole thing into an ordinary
> string, -1 on making any other changes to the brace-matching
> behaviour.
> 
> That would leave us with the following set of rules for name fields:
> 
> 1. Numeric fields start with a digit and are terminated by any
> non-numeric character.
> 
> 2. An identifier name field is terminated by any one of:
>     '}' (terminates the replacement field, unless preceded by a
> matching '{' character, in which case it is ignored and included in
> the string)
>     '!' (terminates identifier field, starts conversion specifier)
>     ':' (terminates identifier field, starts format specifier)
>     '.' (terminates identifier field, starts new identifier field for
> subattribute)
>     '[' (terminates identifier field, starts index field)
> 
> 3. An index field is terminated by ']' (subsequent character will
> determine next field)
> 
> Note that brace-escaping currently doesn't work inside name fields, so
> that should also be fixed:
> 
> >>> "{0[{{]}".format({'{':1})
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: unmatched '{' in format
> >>> "{a{{}".format(**{'a{':1})
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: unmatched '{' in format

-1. Why do we need braces inside replacement fields at all (except for
inner replacements in the format specifier)? I strongly believe that
the PEP's use case is the simple one:

    '{foo}'.format(foo=10)

In my opinion, these '{!#%}'.format(**{'!#%': 10}) cases are not real.
The current documentation requires field_name to be a valid
identifier, and this is a sane requirement. The only problem is that
parsing identifiers correctly is very hard, so the parser can be made
simpler by allowing some non-identifiers. But we still don't have to
accept braces.
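
To show how far the accepted syntax is from the documented one, here
is a small sketch that lists the field names in a format string whose
first component is neither an identifier nor a number. It relies on
string.Formatter.parse() and str.isidentifier(); the helper name
non_identifier_fields is hypothetical.

```python
import re
from string import Formatter


def non_identifier_fields(fmt):
    """Return field names whose first component is not an identifier or index."""
    bad = []
    for _literal, field_name, _spec, _conv in Formatter().parse(fmt):
        if field_name is None:
            continue                    # trailing literal text, no field
        # The first component ends at the first '.' or '['.
        first = re.split(r'[.\[]', field_name, maxsplit=1)[0]
        if first and not (first.isdigit() or first.isidentifier()):
            bad.append(field_name)
    return bad


print(non_identifier_fields('{foo} {0} {a-b} {x.y}'))   # ['a-b']
print('{a-b}'.format(**{'a-b': 10}))                     # 10
```

The second call shows that such a name is accepted today, even though
'a-b' is not an identifier.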

---

On a somewhat separate issue, I'm confused about this:

  >>> '{a[1][2]}'.format(a={1:{2:3}})
  '3'

and even more about this:

  >>> '{a[1].foo[2]}'.format(a={1:namedtuple('x', 'foo')({2:3})})
  '3'

Why does this work? It's against the current documentation. The
documented syntax allows at most one attribute name and at most one
element index, in this order. Is it intentional that we allow
arbitrary chains of getattr and __getitem__? If so, this should be
documented, too.
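
The same chaining can be seen through string.Formatter.get_field(),
the public hook that resolves a field name to an object. A minimal
sketch, where the lookup wrapper is only my own helper for this mail:

```python
from collections import namedtuple
from string import Formatter


def lookup(field_name, *args, **kwargs):
    """Resolve a field name the way Formatter does and return the object."""
    obj, _first_key = Formatter().get_field(field_name, args, kwargs)
    return obj


X = namedtuple('x', 'foo')

# Item access, attribute access, item access again, in one field name.
print(lookup('a[1][2]', a={1: {2: 3}}))          # 3
print(lookup('a[1].foo[2]', a={1: X({2: 3})}))   # 3
```

Each '.name' step is a getattr and each '[key]' step is an item
lookup, applied left to right for as long as the field name goes on.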

Petri