[Python-ideas] Correct way for writing Python code without causing interpreter crashes due to parser stack overflow

Fiedler Roman Roman.Fiedler at ait.ac.at
Wed Jun 27 11:47:15 EDT 2018


> From: Nick Coghlan [mailto:ncoghlan at gmail.com]
>
> On 27 June 2018 at 17:04, Fiedler Roman <Roman.Fiedler at ait.ac.at> wrote:
> > Hello List,
> >
> > Context: we are conducting machine learning experiments that generate
> some kind of nested decision trees. As the tree includes specific decision
> elements (which require custom code to evaluate), we decided to store the
> decision tree (result of the analysis) as generated Python code. Thus the
> decision tree can be transferred to sensor nodes (detectors) that will then
> filter data according to the decision tree when executing the given code.
> >
> > Tracking down a crash when executing that generated code, we arrived at the
> following simplified reproducer, which crashes the interpreter (on
> Python 2/3) while the code is being loaded, before execution starts:
> >
> > #!/usr/bin/python2 -BEsStt
> >
> A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([
> A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A(None)])])])
> ])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])
> >
> > The error message is:
> >
> > s_push: parser stack overflow
> > MemoryError
> >
> > Despite the machine having 16GB of RAM, the code cannot be loaded.
> Splitting it into two lines using an intermediate variable is the current
> workaround to get it running after manual adaptation.
>
> This seems like it may indicate a potential problem in the pgen2
> parser generator, since the compilation is failing at the original
> parse step, but checking the largest version of this that CPython can
> parse on my machine gives a syntax tree of only ~77kB:
>
>     >>> tree =
> parser.expr("A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A(
> [A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A(None)])])]
> )])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])")
>     >>> sys.getsizeof(tree)
>     77965
>
> Attempting to print that hints more closely at the potential problem:
>
>     >>> tree.tolist()
>     Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>     RecursionError: maximum recursion depth exceeded while getting the
> repr of an object
>
> As far as I'm aware, the CPython parser is using the actual C stack
> for recursion, and is hence throwing MemoryError because it ran out of
> stack space to recurse into, not because it ran out of memory in
> general (RecursionError would be a more accurate exception).
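
If the limit is about nesting depth within a single expression, the intermediate-variable workaround can be applied by the code generator itself instead of by hand. A minimal sketch, assuming the generator can emit one nesting level per statement; emit_flat, the _nN variable names and the stand-in A below are hypothetical and only mirror the reproducer:

```python
def emit_flat(depth, ctor="A"):
    """Return source code that builds `depth` nested ctor([...]) calls,
    one level per statement, so no single expression is deeply nested."""
    lines = ["_n0 = {}(None)".format(ctor)]
    for i in range(1, depth):
        # Each statement wraps the previous intermediate variable once.
        lines.append("_n{} = {}([_n{}])".format(i, ctor, i - 1))
    lines.append("result = _n{}".format(depth - 1))
    return "\n".join(lines) + "\n"


def nesting_depth(node):
    """Count nesting levels iteratively, without recursing."""
    count = 0
    while node is not None:
        count += 1
        arg = node[1]
        node = arg[0] if arg is not None else None
    return count


# Stand-in for the real decision element class (assumption for this sketch).
def A(arg):
    return ("A", arg)


namespace = {"A": A}
exec(compile(emit_flat(1000), "<generated>", "exec"), namespace)
print(nesting_depth(namespace["result"]))
```

The resulting object is as deeply nested as before; only the source text is flat, so the parser never has to recurse deeply.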

That seems conclusive. Given the cause and the risk of regressions, perhaps the limits themselves should not be changed (that would open a can of worms), but something like the following might be nice:

* Raise RecursionError('Maximum supported compile time parser recursion depth of [X] exceeded, see [docuref]')
* With the python-warn-all flag, issue a warning if a file reaches half or 75% of the limit during parsing?
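
Until something like this exists in the interpreter, a generator could run a similar check on its own output. A rough sketch, assuming bracket nesting is a usable proxy for parser recursion depth; the function names and the limit passed in are hypothetical, and the real limit depends on the CPython version and the grammar rules involved:

```python
import io
import tokenize
import warnings

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def max_bracket_depth(source):
    """Return the deepest nesting of (), [] and {} found in the source."""
    depth = 0
    deepest = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.OP:
            continue  # brackets inside strings or comments are not OP tokens
        if tok.string in OPENERS:
            depth += 1
            deepest = max(deepest, depth)
        elif tok.string in CLOSERS:
            depth -= 1
    return deepest


def check_nesting(source, limit, warn_fraction=0.75):
    """Raise RecursionError above `limit`, warn when close to it.

    `limit` is an assumed value supplied by the caller, not a number
    taken from CPython.
    """
    depth = max_bracket_depth(source)
    if depth > limit:
        raise RecursionError(
            "Maximum supported nesting depth of %d exceeded (found %d)"
            % (limit, depth))
    if depth >= warn_fraction * limit:
        warnings.warn(
            "Nesting depth %d is close to the limit of %d" % (depth, limit),
            RuntimeWarning)
    return depth
```

Calling check_nesting(generated_source, limit) before shipping a file to a detector would turn the late interpreter crash into an early, explicit error on the generating side.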

> Trying your original example in PyPy (which uses a different parser
> implementation) suggests you may want to try using that as your
> execution target before resorting to switching languages entirely:
>
>     >>>> tree2 =
> parser.expr("A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A(
> [A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([
> A(None)])])])])])])])])])]]))])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])
> ])")
>     >>>> len(tree2.tolist())
>     5
>
> Alternatively, you could explore mimicking the way that scikit-learn
> saves its trained models (which I believe is a variation on "use
> pickle", but I've never actually gone and checked for sure).

Thank you for your very informative post; both solutions/workarounds seem appropriate. Beyond that, the "scikit-learn" approach might also have the advantage of using something more "standardized", thus easing cooperation within the scientific community. I will pass this information on to my colleague.
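
For the record, a minimal sketch of what storing the tree as data rather than as generated code could look like. This is not taken from scikit-learn; the dict layout, the EVALUATORS registry and the field names are assumptions made up for illustration. The custom decision elements stay as ordinary code on the detector and the tree only refers to them by name:

```python
import operator
import pickle

# Hypothetical registry of decision elements installed on the detector.
EVALUATORS = {
    "gt": operator.gt,
    "eq": operator.eq,
}


def evaluate(node, record):
    """Walk the tree with a loop, so tree depth does not consume stack."""
    while isinstance(node, dict):
        test = EVALUATORS[node["op"]]
        branch = "yes" if test(record[node["field"]], node["arg"]) else "no"
        node = node[branch]
    return node  # a leaf is any non-dict value


# Made-up example tree: inner nodes are dicts, leaves are plain strings.
tree = {
    "field": "size", "op": "gt", "arg": 100,
    "yes": {
        "field": "proto", "op": "eq", "arg": "tcp",
        "yes": "alert",
        "no": "pass",
    },
    "no": "pass",
}

blob = pickle.dumps(tree)      # bytes to transfer to the sensor node
restored = pickle.loads(blob)  # only unpickle data from a trusted source
print(evaluate(restored, {"size": 150, "proto": "tcp"}))
```

Note that pickle itself recurses over nested containers, so a very deep tree may still need a flatter representation (for example a list of nodes that refer to each other by index).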

Regards, Roman

