Strange behavior in string interpolation of constants

Ned Batchelder ned at nedbatchelder.com
Mon Oct 16 20:18:22 EDT 2017


On 10/16/17 7:39 PM, מיקי מונין wrote:
> Hello, I am working on an article on Python string formatting. As a part of
> the article I am researching the different forms of Python string
> formatting.
>
> While researching string interpolation (i.e. the % operator) I noticed
> something weird with string lengths.
>
> Given the following two functions:
>
> def simple_interpolation_constant_short_string():
>      return "Hello %s" % "World!"
>
> def simple_interpolation_constant_long_string():
>      return "Hello %s. I am a very long string used for research" % "World!"
>
>
> Let's look at the bytecode generated for them using the dis module.
>
> The first example produces the following bytecode:
>    9           0 LOAD_CONST               3 ('Hello World!')
>                2 RETURN_VALUE
>
> This looks normal: the Python compiler apparently folds the constant
> expression and removes the need for string interpolation at run time.
>
> However the output of the second function caught my eye:
>
>   12           0 LOAD_CONST               1 ('Hello %s. I am a very long string used for research')
>                2 LOAD_CONST               2 ('World!')
>                4 BINARY_MODULO
>                6 RETURN_VALUE
>
> This was not optimized by the compiler! Normal string interpolation was
> used!
>
> Based on some more testing, it appears that no optimization is done when
> the resulting string would be longer than 20 characters, as evidenced by
> these examples:
>
> def expected_result():
>      return "abcdefghijklmnopqrs%s" % "t"
>
> Bytecode:
>   15          0 LOAD_CONST               3 ('abcdefghijklmnopqrst')
>                2 RETURN_VALUE
>
> def abnormal_result():
>      return "abcdefghijklmnopqrst%s" % "u"
>
> Bytecode:
>
>   18           0 LOAD_CONST               1 ('abcdefghijklmnopqrst%s')
>                2 LOAD_CONST               2 ('u')
>                4 BINARY_MODULO
>                6 RETURN_VALUE
>
> I am using Python 3.6.3.
> I am curious as to why this happens. Can anyone shed further light on this
> behaviour?

Optimizers have plenty of heuristics.  This one seems to skip constant
folding when the resulting string would be longer than 20 characters.
The code seems to bear this out
(https://github.com/python/cpython/blob/master/Python/peephole.c#L305):

     } else if (size > 20) {
         Py_DECREF(newconst);
         return -1;
     }

As to why they chose 20?  There's no clue in the code, and I don't know.
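
To see where the cutoff falls on a given interpreter, something like the
following minimal sketch may help.  The helper names is_folded and probe
are made up for illustration.  The sketch assumes that a folded expression
leaves its final value in the code object's co_consts, and the answers it
gives depend on the Python version: 3.6.3 should show the 20-character
limit described above, while other versions may fold differently or not
at all.

```python
def is_folded(expr_source):
    """Report whether the compiler folded a constant expression.

    Hypothetical helper: it compiles the expression, evaluates it, and
    checks whether the final value is already stored in co_consts.
    """
    code = compile(expr_source, "<example>", "eval")
    value = eval(code)
    return any(type(c) is type(value) and c == value for c in code.co_consts)


def probe(lengths=(19, 20, 21, 22)):
    """Map each result length to whether '...%s' % 'z' was folded."""
    results = {}
    for n in lengths:
        template = "a" * (n - 1) + "%s"  # the result has exactly n characters
        source = "%r %% %r" % (template, "z")  # e.g. "'aaa%s' % 'z'"
        results[n] = is_folded(source)
    return results


if __name__ == "__main__":
    for length, folded in probe().items():
        print(length, "folded" if folded else "not folded")
```

Checking co_consts instead of looking for BINARY_MODULO keeps the sketch
independent of opcode names, which differ between Python versions.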

--Ned.


