[Python-Dev] PEP 498 (interpolated f-string) tweak

Nick Coghlan ncoghlan at gmail.com
Mon Sep 21 10:31:24 CEST 2015


On 21 September 2015 at 05:22, Eric V. Smith <eric at trueblade.com> wrote:
>
>> On Sep 20, 2015, at 11:15 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>
>>> On 20.09.15 16:51, Eric V. Smith wrote:
>>>> On 9/20/2015 8:37 AM, Nick Coghlan wrote:
>>>>> On 19 September 2015 at 21:03, Eric V. Smith <eric at trueblade.com> wrote:
>>>>> Instead of calling __format__, I've changed the code generator to call
>>>>> format(expr1, spec1). As an optimization, I might add special opcodes to
>>>>> deal with this and string concatenation, but that's for another day (if
>>>>> ever).
>>>>
>>>> Does this mean overriding format at the module level or in builtins
>>>> will affect the way f-strings are evaluated at runtime? (I don't have
>>>> a strong preference one way or the other, but I think the PEP should
>>>> be explicit as to the expected behaviour rather than leaving it as
>>>> implementation defined).
>>>
>>> Yes, in the current implementation, if you mess with format(), str(),
>>> repr(), or ascii() you can break f-strings. The latter 3 are used to
>>> implement !s, !r, and !a.
>>>
>>> I have a plan to change this, by adding one or more opcodes to implement
>>> the formatting and string joining. I'll defer a decision on updating the
>>> PEP until I can establish the feasibility (and desirability) of that
>>> approach.
>>
>> I propose adding an internal builtin formatter type. Instances should be marshallable and callable. The code generated for f-strings should just load a formatter constant and call it with arguments. The formatter builds the resulting string by concatenating the literal strings with the results of formatting the arguments according to their specifications.
>>
>> Later we could change the compiler (just the peephole optimizer?) to replace literal_string.format(*args) and literal_string % args with calls to a precompiled formatter.
>>
>> Later we could rewrite str.format, str.__mod__ and re.sub to create a temporary formatter object and call it.
>>
>> Later we could expose a public API for creating formatter objects, which third-party template engines could use.
>>
>
> I think this is InterpolationTemplate from PEP 501.

It's certainly a similar idea, although PEP 501 just proposed storing
strings and tuples on the code object, with the interpolation template
itself still being a mutable object constructed at runtime. Serhiy's
suggestion goes a step further to suggest making the template itself
immutable, and passing in all the potentially mutable data as method
arguments.
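
To make the contrast concrete, here is a minimal sketch of an immutable,
callable formatter in the spirit of Serhiy's suggestion. The class name
Formatter and its internal layout are hypothetical, not a proposed API;
the sketch only shows the split between constant parts fixed at creation
time and values passed in at call time.

```python
class Formatter:
    """Hypothetical immutable formatter: constant parts are fixed at creation."""

    __slots__ = ("_literals", "_specs")

    def __init__(self, literals, specs):
        # One more literal than specs: text before, between and after fields.
        assert len(literals) == len(specs) + 1
        object.__setattr__(self, "_literals", tuple(literals))
        object.__setattr__(self, "_specs", tuple(specs))

    def __setattr__(self, name, value):
        raise AttributeError("Formatter instances are immutable")

    def __call__(self, *values):
        # All potentially mutable data arrives here, as call arguments.
        if len(values) != len(self._specs):
            raise TypeError("expected %d values" % len(self._specs))
        segments = [self._literals[0]]
        for value, spec, literal in zip(values, self._specs, self._literals[1:]):
            segments.append(format(value, spec))
            segments.append(literal)
        return "".join(segments)


# Roughly what f"{name} is {age:>3} years old" might precompile to.
fmt = Formatter(("", " is ", " years old"), ("", ">3"))
result = fmt("Nick", 7)
```

Since the instance holds only strings, the same object could in principle
be stored as a constant and reused across calls.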

I think there's a simpler approach available though, which is to go
the way we went in introducing first the __import__ builtin and later
the __build_class__ builtin to encapsulate some of the complexity of
their respective statements without requiring a raft of new opcodes.

The last draft of PEP 501 before I deferred it proposed the following
for interpolation templates, since it was able to rely on having
f-strings available as a primitive and wanted to offer more
flexibility than string formatting needs:

    _raw_template = "Substitute {names} and {expressions()} at runtime"
    _parsed_template = (
        ("Substitute ", "names"),
        (" and ", "expressions()"),
        (" at runtime", None),
    )
    _field_values = (names, expressions())
    _format_specifiers = (f"", f"")
    template = types.InterpolationTemplate(_raw_template,
                                          _parsed_template,
                                          _field_values,
                                          _format_specifiers)

A __format__ builtin (or a dedicated opcode) could use a simpler data
model that consisted of the following constant and variable elements:

Compile time constant: tuple of (<leading_text>,
<tuple_of_leading_specifier_elements>) pairs
Runtime variable: tuple of (<substitution_field_value>,
<tuple_of_specifier_substitution_field_values>) pairs

If the format string didn't end with a substitution field, then the
runtime variable tuple would be 1 element shorter than the constant
tuple.
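
As an illustrative sketch, a format string with one nested specifier
field and some trailing text might be split up as follows. The exact
layout is an assumption made for this example: specifier elements are
plain strings, with one more text element than specifier values, and the
trailing text carries an empty specifier tuple.

```python
amount, width = 42.5, 8

# Compile time constant: (leading_text, leading_specifier_elements) pairs
# for "Total: {amount:>{width}.1f} units".
constant_parts = (
    ("Total: ", (">", ".1f")),  # text before the field, specifier text pieces
    (" units", ()),             # trailing text, no field follows
)

# Runtime variable: (field_value, specifier_field_values) pairs.
variable_parts = (
    (amount, (width,)),
)

# The format string ends with literal text, so there is one fewer runtime pair.
assert len(variable_parts) == len(constant_parts) - 1

# Interleaving the specifier pieces by hand rebuilds the specifier ">8.1f".
pieces, values = constant_parts[0][1], variable_parts[0][1]
specifier = pieces[0] + format(values[0]) + pieces[1]
rendered = constant_parts[0][0] + format(amount, specifier) + constant_parts[1][0]
```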

With that approach, __format__ (or an opcode that popped these two
tuples directly off the stack) could be defined as something like:

    def __format__(constant_parts, variable_parts):
        num_fields = len(variable_parts)
        segments = []
        pairs = enumerate(constant_parts)
        for idx, (leading_text, specifier_constants) in pairs:
            segments.append(leading_text)
            if idx < num_fields:
                field_value, specifier_variables = variable_parts[idx]
                if specifier_variables:
                    specifier = __format__(specifier_constants,
                                           specifier_variables)
                else:
                    assert len(specifier_constants) == 1
                    specifier = specifier_constants[0]
                if specifier.startswith("!"):
                    # Handle "!a", "!r", "!s" by modifying field_value
                    # *and* specifier (assumes a "!r:spec" layout)
                    conv, _, specifier = specifier[1:].partition(":")
                    convert = {"a": ascii, "r": repr, "s": str}[conv]
                    field_value = convert(field_value)
                # An empty specifier still formats the value
                segments.append(format(field_value, specifier))
        return "".join(segments)
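
For anyone who wants to experiment, the following is a self-contained
sketch of the same idea that runs on current Python. It is not the
proposed builtin: format_parts is a hypothetical name, specifier pieces
are assumed to be plain strings (one more piece than specifier values),
and conversions are assumed to be encoded as a leading "!r" or "!r:spec"
in the specifier.

```python
_CONVERSIONS = {"a": ascii, "r": repr, "s": str}


def format_parts(constant_parts, variable_parts):
    """Hypothetical stand-in for the proposed __format__ builtin."""
    segments = []
    for idx, (leading_text, spec_pieces) in enumerate(constant_parts):
        segments.append(leading_text)
        if idx >= len(variable_parts):
            continue  # trailing literal text with no field after it
        field_value, spec_values = variable_parts[idx]
        # Assumption: spec_pieces always has one more entry than spec_values.
        specifier = spec_pieces[0]
        for value, text in zip(spec_values, spec_pieces[1:]):
            specifier += format(value) + text
        if specifier.startswith("!"):
            # Assumption: conversions are encoded as "!r" or "!r:spec".
            conversion, _, specifier = specifier[1:].partition(":")
            field_value = _CONVERSIONS[conversion](field_value)
        segments.append(format(field_value, specifier))
    return "".join(segments)


name, width = "Brisbane", 12
constant_parts = (
    ("City: ", ("!r:>", "")),  # {name!r:>{width}}
    (" (", ("",)),             # {width}
    (" wide)", ()),            # trailing text only
)
variable_parts = ((name, (width,)), (width, ()))
result = format_parts(constant_parts, variable_parts)
```

Comparing result against f"City: {name!r:>{width}} ({width} wide)" is a
quick way to check that the split into constant and runtime parts
preserves the f-string semantics.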

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

