[Python-ideas] Correct way for writing Python code without causing interpreter crashes due to parser stack overflow

Fiedler Roman Roman.Fiedler at ait.ac.at
Wed Jun 27 11:33:25 EDT 2018


> From: Guido van Rossum [mailto:guido at python.org]
> 
> I consider this a bug -- a violation of Python's (informal) promise to the user
> that when CPython segfaults it is not the user's fault.

Strictly speaking, it is not a segfault, just a parser exception that cannot be caught from within the affected file (at least I failed to catch it in a quick test). It seems that the problematic code is parsed before any surrounding "try" block gets a chance to run, so an "except" in the code itself is useless. Apart from that: even if the exception is caught, what can you do? Part of your program refuses to load; the only benefit is that you can die gracefully.
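
For illustration, here is a minimal sketch of how a separate loader could catch the failure, assuming Python 3 and that the generated source is available as a string. The helper names `try_compile` and `nested_lists` are hypothetical. Which exception is raised depends on the interpreter version (the reproducer in this thread reports MemoryError, while other CPython releases may raise SyntaxError or RecursionError), so the sketch catches all three.

```python
def try_compile(source, filename="<generated>"):
    """Compile generated source; return (code, None) or (None, error).

    The exception type for overly deep nesting varies between CPython
    versions, so all plausible candidates are caught here (assumption).
    """
    try:
        return compile(source, filename, "exec"), None
    except (MemoryError, SyntaxError, RecursionError) as exc:
        return None, exc


def nested_lists(depth):
    """Build source text for a list literal nested `depth` levels deep."""
    return "x = " + "[" * depth + "]" * depth + "\n"


# Shallow nesting compiles; very deep nesting is rejected by the parser,
# but the loader survives and can report the problem.
code, error = try_compile(nested_lists(5000))
if error is not None:
    print("generated code rejected:", type(error).__name__)
```

This only helps because the compilation happens in a file other than the one containing the deeply nested statement.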

> Given typical Python usage patterns, I don't consider this an important bug,
> but maybe someone is interested in trying to fix it.

Acknowledged. I do not know of any software where this is highly relevant, but my knowledge is quite limited, so I asked the PSRT beforehand to be sure.

> As far as your application is concerned, I'm not sure that generating code like
> that is the right approach. Why don't you generate a data structure and a little
> engine that walks the data structure?

That is also what I told the colleague who asked me to help analyse the crash. I guess the "simple generator" was just easier to write and was therefore used as a starting point. Now, by chance, a model was generated that hits the Python limit of about 50 nested instantiations/lists per statement, or whatever the limit actually is. So there is not much "why" to explain; it just happened.
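
A rough sketch of the "data structure plus little engine" approach, under assumptions of my own: the tree layout, the `PREDICATES` registry, and the `evaluate` function are all hypothetical and not taken from the actual generator. Inner nodes are tuples, leaves are dicts, and custom decision elements are looked up by name.

```python
# Hypothetical custom decision elements, looked up by name.
PREDICATES = {
    "gt": lambda value, threshold: value > threshold,
    "contains": lambda value, needle: needle in value,
}

# Inner node: (field, predicate name, argument, subtree if true, subtree if false)
# Leaf: {"label": ...}
TREE = (
    "size", "gt", 100,
    ("path", "contains", "/tmp", {"label": "alert"}, {"label": "log"}),
    {"label": "ignore"},
)


def evaluate(tree, record):
    """Walk the tree with a loop instead of recursion.

    The tree depth is then limited neither by the parser stack nor by
    the interpreter's recursion limit.
    """
    node = tree
    while not isinstance(node, dict):
        field, predicate, argument, if_true, if_false = node
        matched = PREDICATES[predicate](record[field], argument)
        node = if_true if matched else if_false
    return node["label"]


print(evaluate(TREE, {"size": 500, "path": "/tmp/x"}))
```

In practice the tree would be shipped to the sensor nodes in a data format rather than as a Python literal, since a deeply nested literal would run into the same parser limit; deserializers may have depth limits of their own.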

Kind regards,
Roman


> On Wed, Jun 27, 2018 at 12:05 AM Fiedler Roman <Roman.Fiedler at ait.ac.at> wrote:
> 
> 
> 	Hello List,
> 
> 	Context: we are conducting machine learning experiments that
> generate some kind of nested decision trees. As the tree includes specific
> decision elements (which require custom code to evaluate), we decided to
> store the decision tree (result of the analysis) as generated Python code. Thus
> the decision tree can be transferred to sensor nodes (detectors) that will then
> filter data according to the decision tree when executing the given code.
> 
> 	Tracking down a crash when executing that generated code, we came
> to the following simplified reproducer, which causes the interpreter to crash (on
> Python 2/3) while loading the code, before execution has started:
> 
> 	#!/usr/bin/python2 -BEsStt
> 	A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A
> ([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A(None
> )])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])
> 
> 	The error message is:
> 
> 	s_push: parser stack overflow
> 	MemoryError
> 
> 	Despite the machine having 16GB of RAM, the code cannot be loaded.
> Splitting it into two lines using an intermediate variable is the current
> workaround to still get it running after manual adaptation.
> 
> 	As discussed on the Python security list, crashes when loading such
> decision trees or mathematical formulas (see bug report [1]) should not
> be a security problem. Even though it is not directly covered in the Python
> security model documentation [2], this case comes too close to "arbitrary code
> execution", where Python does not attempt to provide any protection. There
> might be only some border cases of affected software, e.g. Python sandbox
> systems like Zope/Plone or maybe even Python-based smart contract
> blockchains like Ethereum (I do not know if or where they use the default
> Python interpreter or work derived from it). But in both cases they
> would also be too close to violating the security model, so no changes to
> Python are required from this side. The Python security list therefore
> suggested that the discussion be continued on this list.
> 
> 
> 	Even when no security problem is involved, the crash is still quite an
> annoyance. Developing code generators can be a tedious task. It is then
> somewhat frustrating when your generated code is not accepted by the
> interpreter, even though you do not feel you are getting close to any
> system-relevant limit, e.g. 50 elements in a line like above on a 16GB machine.
> You may adapt the generator, but as the error does not include any information
> about which limit you really violated (number of brackets, function calls, list
> definitions?), you can only run experiments or look at the Python compiler
> code to figure that out. Even when you fix it, you have no guarantee that you
> will not hit some other obscure limit the next day, or that those limits will
> not change from one Python minor version to the next, causing regressions.
> 
> 	Questions:
> 
> 	* Do you deem it possible/sensible to even attempt to write a Python
> language code generator that will produce non-malicious, syntactically valid
> decision tree code/mathematical formulas and still have a sufficiently high
> probability that the Python interpreter will also run that code, now and in the
> near future (regressions)?
> 
> 	* Assuming yes to the question above, when generating code, what
> is the maximal nesting depth a code generator can always expect to
> be compiled on Python 2.7 and on 3.5 onward? Are there any other similar
> restrictions that need to be considered by the code generator? Or is
> generating code that way not the preferred solution anyway, and the code
> generator should e.g. generate binary Python code directly? Note: in the
> end the exact same logic will run as a Python process; it seems it is only
> about how it is loaded into the Python interpreter.
> 
> 	* If not possible/recommended/sensible, we might generate Java
> bytecode or native x86 code instead, where the likelihood of the (virtual) CPU
> really executing code that is compliant with the language specification (even
> with CPU errata like the FDIV bug et al.) might be orders of magnitude higher
> than with the Python interpreter.
> 
> 	Any feedback appreciated!
> 
> 	Roman
> 
> 	[1] https://bugs.python.org/issue3971
> 	[2] http://python-security.readthedocs.io/security.html#security-model
> 
> --
> 
> --Guido van Rossum (python.org/~guido)