[Python-Dev] Parsing f-strings from PEP 498 -- Literal String Interpolation

Nick Coghlan ncoghlan at gmail.com
Sat Nov 5 08:36:06 EDT 2016


On 5 November 2016 at 04:03, Fabio Zadrozny <fabiofz at gmail.com> wrote:
> On Fri, Nov 4, 2016 at 3:15 PM, Eric V. Smith <eric at trueblade.com> wrote:
>> Using PyParser_ASTFromString is the easiest possible way to do this. Given
>> a string, it returns an AST node. What could be simpler?
>
>
> I think that for implementation purposes, given the python infrastructure,
> it's fine, but for specification purposes, probably incorrect... As I don't
> think f-strings should accept:
>
>  f"start {import sys; sys.version_info[0];} end" (i.e.:
> PyParser_ASTFromString doesn't just return an expression, it accepts any
> valid Python code, even code which can't be used in an f-string).

f-strings use the "eval" parsing mode, which starts from the
"eval_input" node in the grammar (which is only a couple of nodes
higher than 'test', allowing tuples via 'testlist' as well as trailing
newlines and EOF):

    >>> ast.parse("import sys; sys.version_info[0];", mode="eval")
    Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib64/python3.5/ast.py", line 35, in parse
       return compile(source, filename, mode, PyCF_ONLY_AST)
     File "<example>", line 1
       import sys; sys.version_info[0];
            ^
    SyntaxError: invalid syntax

You have to use "exec" mode to get the parser to allow statements,
which is why f-strings don't use that mode:

    >>> ast.dump(ast.parse("import sys; sys.version_info[0];", mode="exec"))
    "Module(body=[Import(names=[alias(name='sys', asname=None)]),
    Expr(value=Subscript(value=Attribute(value=Name(id='sys', ctx=Load()),
    attr='version_info', ctx=Load()), slice=Index(value=Num(n=0)),
    ctx=Load()))])"
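
To make the difference between the two modes easy to check on whatever
Python version you have to hand, here is a minimal sketch. The helper
name parses_in_mode is made up for this example; it only wraps
ast.parse and reports whether the source was accepted:

```python
import ast


def parses_in_mode(source, mode):
    """Return True if `source` parses in the given ast.parse mode.

    Hypothetical helper for illustration only; it is not part of the
    f-string implementation.
    """
    try:
        ast.parse(source, mode=mode)
    except SyntaxError:
        return False
    return True


statements = "import sys; sys.version_info[0];"

# "eval" mode only accepts a single expression (optionally a tuple).
print(parses_in_mode(statements, "eval"))
# "exec" mode accepts arbitrary statements.
print(parses_in_mode(statements, "exec"))
# A bare tuple followed by a newline is still fine in "eval" mode.
print(parses_in_mode("a, b\n", "eval"))
```

The names a and b are never evaluated here, only parsed, so they don't
need to exist.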

What is unique to f-strings, and the reason they reject some
otherwise valid Python expressions, is that the compiler first does
an initial pre-tokenisation pass based on:

1. Look for an opening '{'
2. Look for a closing '!', ':' or '}', accounting for balanced string
quotes, parentheses, brackets and braces

Ignoring the surrounding quotes, and using the `atom` node from
Python's grammar to represent the nesting tracking, and TEXT to stand
in for arbitrary text, it's something akin to:

    fstring: (TEXT ['{' maybe_pyexpr ('!' | ':' | '}')])+
    maybe_pyexpr: (atom | TEXT)+

That isn't quite right, since it doesn't properly account for brace
nesting, but it gives the general idea - there's an initial really
simple tokenising pass that picks out the potential Python
expressions, and then those are run through the AST parser's
equivalent of eval().
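
As a rough illustration of that two-pass idea, here is a simplified
sketch in Python. It is not the CPython implementation: the function
names split_fstring_body and parse_fields are invented for this
example, and it deliberately ignores '{{' and '}}' escapes, nested
replacement fields inside format specs, backslash escapes inside
nested strings, and operators such as '!='. It only shows the shape of
"scan for the delimiters while tracking nesting, then hand each
candidate to the eval-mode parser":

```python
import ast


def split_fstring_body(body):
    """Split the text inside an f-string into literal parts and
    candidate expression sources (hypothetical, simplified scanner)."""
    closers = {"(": ")", "[": "]", "{": "}"}
    literals, exprs = [], []
    i, text_start, n = 0, 0, len(body)
    while i < n:
        if body[i] != "{":
            i += 1
            continue
        # Step 1: found an opening '{'; everything before it is literal text.
        literals.append(body[text_start:i])
        i += 1
        expr_start = i
        stack = []    # closing brackets we are still waiting for
        quote = None  # quote character of the string we are inside, if any
        # Step 2: look for '!', ':' or '}' outside any quotes or brackets.
        while i < n:
            ch = body[i]
            if quote:
                if ch == quote:
                    quote = None
            elif ch in "'\"":
                quote = ch
            elif ch in closers:
                stack.append(closers[ch])
            elif stack and ch == stack[-1]:
                stack.pop()
            elif not stack and ch in "!:}":
                break
            i += 1
        else:
            raise SyntaxError("f-string sketch: expecting '}'")
        exprs.append(body[expr_start:i])
        # Skip any conversion or format spec up to the closing '}'.
        while i < n and body[i] != "}":
            i += 1
        if i == n:
            raise SyntaxError("f-string sketch: expecting '}'")
        i += 1
        text_start = i
    literals.append(body[text_start:])
    return literals, exprs


def parse_fields(body):
    """Run each candidate expression through the "eval" mode parser."""
    _, sources = split_fstring_body(body)
    # Stripping whitespace is a simplification made for this sketch.
    return [ast.parse(src.strip(), mode="eval") for src in sources]


literals, exprs = split_fstring_body(
    "start {x + 1} mid {d['a:b']!r} end {m[1:2]:>10}")
print(literals)
print(exprs)
```

With this sketch, parse_fields("start {import sys; sys.version_info[0];} end")
raises SyntaxError, because the candidate text is a statement list
rather than an expression. A field like "{lambda x: x}" also fails,
since the scan stops at the first top-level ':' and only "lambda x"
reaches the parser, which is the kind of otherwise valid expression
that the pre-tokenisation step rules out.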

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

