[Python-ideas] Correct way for writing Python code without causing interpreter crashes due to parser stack overflow

Fiedler Roman Roman.Fiedler at ait.ac.at
Wed Jun 27 12:18:54 EDT 2018


> From: Michael Selik [mailto:mike at selik.org]
> 
> On Wed, Jun 27, 2018 at 12:04 AM Fiedler Roman <Roman.Fiedler at ait.ac.at
> <mailto:Roman.Fiedler at ait.ac.at> > wrote:
> 
> 	Context: we are conducting machine learning experiments that
> generate some kind of nested decision trees. As the tree includes specific
> decision elements (which require custom code to evaluate), we decided to
> store the decision tree (result of the analysis) as generated Python code. Thus
> the decision tree can be transferred to sensor nodes (detectors) that will then
> filter data according to the decision tree when executing the given code.
> 
> How do you write tests for the sensor nodes? Do they use code as data for
> test cases?

We have two approaches to test data generation: since we are processing log data, we use adaptive, self-learning log data generators whose output can then be spiked with anomalies. In other tests we ran armored zero-day exploits on production-like test systems to obtain more realistic data.
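The real generators are more elaborate; just to illustrate the spiking idea, a toy sketch (the templates, rates and field names are made up, not our actual generator):

import random

NORMAL = "sshd[{pid}]: Accepted publickey for user{uid} from 10.0.0.{host}"
ANOMALY = "sshd[{pid}]: Failed password for invalid user admin from 10.0.0.{host}"

def generate_log(lines=1000, anomaly_rate=0.01, seed=0):
    # Yield mostly "normal" lines; spike a small fraction with anomalies.
    rng = random.Random(seed)
    for _ in range(lines):
        template = ANOMALY if rng.random() < anomaly_rate else NORMAL
        yield template.format(pid=rng.randint(100, 9999),
                              uid=rng.randint(0, 9),
                              host=rng.randint(1, 254))

for line in generate_log(20, anomaly_rate=0.2):
    print(line)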

The big picture: once everything is working, distributed sensor nodes shall pre-process machine log data streams for security analysis in real time and report findings back to a central instance. Findings also include data that does not make sense to the sensor node (cannot be classified). The central instance updates its internal model, attempting to learn how to classify the new data, and then creates new model-evaluation code (this is the code that caused the crash), which is sent back to the sensors. Each sensor replaces its model with the generated code, thus altering the log data analysis behaviour.
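Just to illustrate the kind of problem (this is not our real code generator; the tree shape and the helper f() are made up): emitting the whole decision tree as one deeply nested expression can exhaust the parser stack when the sensor compiles it.

import sys

def tree_to_expr(depth):
    # Emit a nested conditional expression such as:
    #   (2 if f(2) else (1 if f(1) else -1))
    if depth == 0:
        return "-1"
    return "({0} if f({0}) else {1})".format(depth, tree_to_expr(depth - 1))

src = "result = " + tree_to_expr(200)
try:
    code = compile(src, "<generated>", "exec")
    print("compiled fine on", sys.version.split()[0])
except (MemoryError, RecursionError, SyntaxError) as err:
    # On the CPython versions current today (pgen parser) this can abort
    # with "s_push: parser stack overflow" (a MemoryError) long before
    # depth 200; the exact exception depends on the interpreter version.
    print("compile failed:", repr(err))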

The current implementation uses https://packages.debian.org/search?keywords=logdata-anomaly-miner to run the sensor nodes; the central instance is experimental code that creates the configuration for the nodes. Once the detection methods mature, the model distribution is likely to change to a more robust scheme. We try to apply these mining approaches to various domains, e.g. attack detection on log data without known structure (proprietary systems, no SIEM regexes or rules available yet), detection of vulnerable code before it is exploited (zero-day discovery of LXC container escape vulnerabilities), and detection of the execution of zero-day exploits themselves, which we wrote for demonstration purposes. See https://itsecx.fhstp.ac.at/wp-content/uploads/2016/11/06_RomanFiedler_SyscallAuditLogMining-V1.pdf (sorry, German slides only).

