PEP 578 -- Python Runtime Audit Hooks

PEP: 578
Title: Python Runtime Audit Hooks
Author: Steve Dower <steve.dower at python.org>
Status: Draft
Type: Standards Track
Created: 16-Jun-2018
Python-Version: 3.8
Post-History:

Abstract

This PEP describes additions to the Python API and specific behaviors for the CPython implementation that make actions taken by the Python runtime visible to auditing tools. Visibility into these actions provides opportunities for test frameworks, logging frameworks, and security tools to monitor and optionally limit actions taken by the runtime.

This PEP proposes adding two APIs to provide insights into a running Python application: one for arbitrary events, and another specific to the module import system. The APIs are intended to be available in all Python implementations, though the specific messages and values used are unspecified here to allow implementations the freedom to determine how best to provide information to their users. Some examples likely to be used in CPython are provided for explanatory purposes.

See PEP 551 for discussion and recommendations on enhancing the security of a Python runtime making use of these auditing APIs.

Background

Python provides access to a wide range of low-level functionality on many common operating systems in a consistent manner. While this is incredibly useful for "write-once, run-anywhere" scripting, it also makes monitoring of software written in Python difficult. Because Python uses native system APIs directly, existing monitoring tools either suffer from limited context or auditing bypass.

Limited context occurs when system monitoring can report that an action occurred, but cannot explain the sequence of events leading to it. For example, network monitoring at the OS level may be able to report "listening started on port 5678", but may not be able to provide the process ID, command line or parent process, or the local state in the program at the point that triggered the action. Firewall controls to prevent such an action are similarly limited, typically to a process name or some global state such as the current user, and in any case rarely provide a useful log file correlated with other application messages.

Auditing bypass can occur when the typical system tool used for an action would ordinarily report its use, but accessing the APIs via Python does not trigger this. For example, invoking "curl" to make HTTP requests may be specifically monitored in an audited system, but Python's "urlretrieve" function is not.

Within a long-running Python application, particularly one that processes user-provided information such as a web app, there is a risk of unexpected behavior. This may be due to bugs in the code, or deliberately induced by a malicious user. In both cases, normal application logging may be bypassed resulting in no indication that anything out of the ordinary has occurred.

Additionally, and somewhat unique to Python, it is very easy to affect the code that is run in an application by either manipulating the import system's search path or placing files earlier on the path than intended. This is often seen when developers create a script with the same name as the module they intend to use - for example, a random.py file that attempts to import the standard library random module.

Overview of Changes

The aim of these changes is to enable both application developers and system administrators to integrate Python into their existing monitoring systems without dictating how those systems look or behave.

We propose two API changes to enable this: an Audit Hook and a Verified Open Hook. Both are available from Python and native code, allowing applications and frameworks written in pure Python code to take advantage of the extra messages, while also allowing embedders or system administrators to deploy "always-on" builds of Python.

Only CPython is bound to provide the native APIs as described here. Other implementations should provide the pure Python APIs, and may provide native versions as appropriate for their underlying runtimes.

Audit Hook

In order to observe actions taken by the runtime (on behalf of the caller), an API is required to raise messages from within certain operations. These operations are typically deep within the Python runtime or standard library, such as dynamic code compilation, module imports, DNS resolution, or use of certain modules such as ctypes.

The following new C APIs allow embedders and CPython implementors to send and receive audit hook messages:

/* Add an auditing hook */
typedef int (*hook_func)(const char *event, PyObject *args,
                         void *userData);
int PySys_AddAuditHook(hook_func hook, void *userData);

/* Raise an event with all auditing hooks */
int PySys_Audit(const char *event, PyObject *args);

/* Internal API used during Py_Finalize() - not publicly accessible */
void _Py_ClearAuditHooks(void);

The new Python APIs for receiving and raising audit hooks are:

# Add an auditing hook
sys.addaudithook(hook: Callable[[str, tuple], None])

# Raise an event with all auditing hooks
sys.audit(str, *args)

Hooks are added by calling PySys_AddAuditHook() from C at any time, including before Py_Initialize(), or by calling sys.addaudithook() from Python code. Hooks cannot be removed or replaced.

When events of interest are occurring, code can either call PySys_Audit() from C (while the GIL is held) or sys.audit(). The string argument is the name of the event, and the tuple contains arguments. A given event name should have a fixed schema for arguments, which should be considered a public API (for a given x.y version release), and thus should only change between feature releases with updated documentation.
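
As a minimal sketch of the Python-level API, assuming an interpreter that provides sys.addaudithook() and sys.audit() as described above, the following registers a hook and raises a custom event. The event name myapp.config_loaded and the helper make_recorder are hypothetical and exist only for illustration.

```python
import sys


def make_recorder(prefix):
    """Return (hook, events); the hook records events whose name starts with prefix."""
    events = []

    def hook(event, args):
        # Hooks see every audited event, so filter for the ones of interest.
        if event.startswith(prefix):
            events.append((event, args))

    return hook, events


hook, events = make_recorder("myapp.")
sys.addaudithook(hook)  # once added, the hook cannot be removed or replaced

# The positional arguments after the event name reach the hook as one tuple.
sys.audit("myapp.config_loaded", "settings.ini", 3)
print(events)
```

The hook keeps its state in a closure rather than in module globals, so it stays usable for as long as the interpreter may call it.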

For maximum compatibility, events using the same name as an event in the reference interpreter CPython should make every attempt to use compatible arguments. Including the name or an abbreviation of the implementation in implementation-specific event names will also help prevent collisions. For example, a pypy.jit_invoked event is clearly distinguished from an ipy.jit_invoked event.

When an event is audited, each hook is called in the order it was added with the event name and tuple. If any hook returns with an exception set, later hooks are ignored and in general the Python runtime should terminate. This is intentional to allow hook implementations to decide how to respond to any particular event. The typical responses will be to log the event, abort the operation with an exception, or to immediately terminate the process with an operating system exit call.
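
The ordering and abort behavior can be sketched as follows, assuming the CPython behavior in which an exception raised by a hook propagates to the caller of sys.audit(). The demo.allowed and demo.blocked event names and the install_demo_hooks helper are hypothetical.

```python
import sys


def install_demo_hooks():
    """Add two hooks and return the list they append to."""
    calls = []

    def first(event, args):
        if event.startswith("demo."):
            calls.append(("first", event))
            if event == "demo.blocked":
                # Leaving the hook with an exception aborts the audited operation.
                raise PermissionError(f"{event} is not permitted: {args!r}")

    def second(event, args):
        if event.startswith("demo."):
            calls.append(("second", event))

    sys.addaudithook(first)
    sys.addaudithook(second)
    return calls


calls = install_demo_hooks()

sys.audit("demo.allowed", "x")      # both hooks run, in the order they were added
try:
    sys.audit("demo.blocked", "y")  # 'first' raises, so 'second' is never called
except PermissionError as exc:
    blocked_message = str(exc)
```

A real deployment would more likely log the event before raising, or terminate the process outright, as the paragraph above notes.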

When an event is audited but no hooks have been set, the audit() function should include minimal overhead. Ideally, each argument is a reference to existing data rather than a value calculated just for the auditing call.

As hooks may be Python objects, they need to be freed during Py_Finalize(). To do this, we add an internal API _Py_ClearAuditHooks() that releases any Python hooks and any memory held. This is an internal function with no public export, and we recommend it should raise its own audit event for all current hooks to ensure that unexpected calls are observed.

Below in Suggested Audit Hook Locations, we recommend some important operations that should raise audit events. In PEP 551, more audited operations are recommended with a view to security transparency.

Python implementations should document which operations will raise audit events, along with the event schema. It is intended that sys.addaudithook(print) be a trivial way to display all messages.

Verified Open Hook

Most operating systems have a mechanism to distinguish between files that can be executed and those that cannot. For example, this may be an execute bit in the permissions field, or a verified hash of the file contents to detect potential code tampering. These are an important security mechanism for preventing execution of data or code that is not approved for a given environment. Currently, Python has no way to integrate with these when launching scripts or importing modules.

The new public C API for the verified open hook is:

/* Set the handler */
typedef PyObject *(*hook_func)(PyObject *path, void *userData);
int PyImport_SetOpenForImportHook(hook_func handler, void *userData);

/* Open a file using the handler */
PyObject *PyImport_OpenForImport(const char *path);

The new public Python API for the verified open hook is:

# Open a file using the handler
importlib.util.open_for_import(path : str) -> io.IOBase

The importlib.util.open_for_import() function is a drop-in replacement for open(str(pathlike), 'rb'). Its default behaviour is to open a file for raw, binary access. To change the behaviour a new handler should be set. Handler functions only accept str arguments.

A custom handler may be set by calling PyImport_SetOpenForImportHook() from C at any time, including before Py_Initialize(). However, if a hook has already been set then the call will fail. When open_for_import() is called with a hook set, the hook will be passed the path and its return value will be returned directly. The returned object should be an open file-like object that supports reading raw bytes. This is explicitly intended to allow a BytesIO instance if the open handler has already had to read the file into memory in order to perform whatever verification is necessary to determine whether the content is permitted to be executed.

Note that these hooks can import and call the _io.open() function on CPython without triggering themselves. They can also use _io.BytesIO to return a compatible result using an in-memory buffer.

If the hook determines that the file should not be loaded, it should raise an exception of its choice, as well as performing any other logging.
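
The handler contract described above can be modelled in pure Python. This is only a sketch: the VerifiedOpener class, make_hash_verifier and run_demo are hypothetical names, the class stands in for the C-level hook slot (the proposal offers no Python API for setting the hook), and the SHA-256 allowlist is just one possible verification policy.

```python
import hashlib
import io
import os
import tempfile


class VerifiedOpener:
    """Hypothetical pure-Python model of the open_for_import hook slot."""

    def __init__(self):
        self._hook = None

    def set_hook(self, handler):
        # Mirrors the described rule: the call fails if a hook is already set.
        if self._hook is not None:
            raise RuntimeError("an open-for-import hook is already set")
        self._hook = handler

    def open_for_import(self, path):
        if not isinstance(path, str):
            raise TypeError("handlers only accept str paths")
        if self._hook is None:
            return open(path, "rb")  # default: raw, binary access
        return self._hook(path)      # the hook's result is returned directly


def make_hash_verifier(approved_digests):
    """Build a handler that only admits files whose SHA-256 digest is approved."""

    def handler(path):
        with open(path, "rb") as f:
            data = f.read()
        if hashlib.sha256(data).hexdigest() not in approved_digests:
            raise PermissionError(f"{path} is not approved for import")
        # The content is already in memory, so return it as a BytesIO.
        return io.BytesIO(data)

    return handler


def run_demo():
    """Write two temporary files, approve one, and try to open both."""
    good, bad = b"x = 1\n", b"x = 2\n"
    opener = VerifiedOpener()
    opener.set_hook(make_hash_verifier({hashlib.sha256(good).hexdigest()}))
    with tempfile.TemporaryDirectory() as tmp:
        good_path = os.path.join(tmp, "good.py")
        bad_path = os.path.join(tmp, "bad.py")
        for file_path, content in ((good_path, good), (bad_path, bad)):
            with open(file_path, "wb") as f:
                f.write(content)
        approved = opener.open_for_import(good_path).read()
        try:
            opener.open_for_import(bad_path)
            rejected = False
        except PermissionError:
            rejected = True
    return approved, rejected
```

Calling run_demo() returns the bytes of the approved file together with a flag showing whether the unapproved file was refused.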

All import and execution functionality involving code from a file will be changed to use open_for_import() unconditionally. It is important to note that calls to compile(), exec() and eval() do not go through this function - an audit hook that includes the code from these calls is the best opportunity to validate code that is read from the file. Given the current decoupling between import and execution in Python, most imported code will go through both open_for_import() and the audit hook for compile, and so care should be taken to avoid repeating verification steps.

There is no Python API provided for changing the open hook. To modify import behavior from Python code, use the existing functionality provided by importlib.

API Availability

While all the functions added here are considered public and stable API, the behavior of the functions is implementation specific. Most descriptions here refer to the CPython implementation, and while other implementations should provide the functions, there is no requirement that they behave the same.

For example, sys.addaudithook() and sys.audit() should exist but may do nothing. This allows code to make calls to sys.audit() without having to test for existence, but it should not assume that its call will have any effect. (Including existence tests in security-critical code allows another vector to bypass auditing, so it is preferable that the function always exist.)

importlib.util.open_for_import(path) should at a minimum always return _io.open(path, 'rb'). Code using the function should make no further assumptions about what may occur, and implementations other than CPython are not required to let developers override the behavior of this function with a hook.

Suggested Audit Hook Locations

The locations and parameters in calls to sys.audit() or PySys_Audit() are to be determined by individual Python implementations. This is to allow maximum freedom for implementations to expose the operations that are most relevant to their platform, and to avoid or ignore potentially expensive or noisy events.

Table 1 acts as both suggestions of operations that should trigger audit events on all implementations, and examples of event schemas.

Table 2 provides further examples that are not required, but are likely to be available in CPython.

Refer to the documentation associated with your version of Python to see which operations provide audit events.

Table 1: Suggested Audit Hooks

| API Function | Event Name | Arguments | Rationale |
| --- | --- | --- | --- |
| PySys_AddAuditHook | sys.addaudithook | | Detect when new audit hooks are being added. |
| PyImport_SetOpenForImportHook | setopenforimporthook | | Detects any attempt to set the open_for_import hook. |
| compile, exec, eval, PyAst_CompileString, PyAST_obj2mod | compile | (code, filename_or_none) | Detect dynamic code compilation, where code could be a string or AST. Note that this will be called for regular imports of source code, including those that were opened with open_for_import. |
| exec, eval, run_mod | exec | (code_object,) | Detect dynamic execution of code objects. This only occurs for explicit calls, and is not raised for normal function invocation. |
| import | import | (module, filename, sys.path, sys.meta_path, sys.path_hooks) | Detect when modules are imported. This is raised before the module name is resolved to a file. All arguments other than the module name may be None if they are not used or available. |
| PyEval_SetProfile | sys.setprofile | | Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that threading.setprofile eventually calls this function, so the event will be audited for each thread. |
| PyEval_SetTrace | sys.settrace | | Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that threading.settrace eventually calls this function, so the event will be audited for each thread. |
| _PyObject_GenericSetAttr, check_set_special_type_attr, object_set_class, func_set_code, func_set_[kw]defaults | object.__setattr__ | (object, attr, value) | Detect monkey patching of types and objects. This event is raised for the __class__ attribute and any attribute on type objects. |
| _PyObject_GenericSetAttr | object.__delattr__ | (object, attr) | Detect deletion of object attributes. This event is raised for any attribute on type objects. |
| Unpickler.find_class | pickle.find_class | (module_name, global_name) | Detect imports and global name lookup when unpickling. |

Table 2: Potential CPython Audit Hooks

| API Function | Event Name | Arguments | Rationale |
| --- | --- | --- | --- |
| _PySys_ClearAuditHooks | sys._clearaudithooks | | Notifies hooks they are being cleaned up, mainly in case the event is triggered unexpectedly. This event cannot be aborted. |
| code_new | code.__new__ | (bytecode, filename, name) | Detect dynamic creation of code objects. This only occurs for direct instantiation, and is not raised for normal compilation. |
| func_new_impl | function.__new__ | (code,) | Detect dynamic creation of function objects. This only occurs for direct instantiation, and is not raised for normal compilation. |
| _ctypes.dlopen, _ctypes.LoadLibrary | ctypes.dlopen | (module_or_path,) | Detect when native modules are used. |
| _ctypes._FuncPtr | ctypes.dlsym | (lib_object, name) | Collect information about specific symbols retrieved from native modules. |
| _ctypes._CData | ctypes.cdata | (ptr_as_int,) | Detect when code is accessing arbitrary memory using ctypes. |
| new_mmap_object | mmap.__new__ | (fileno, map_size, access, offset) | Detects creation of mmap objects. On POSIX, access may have been calculated from the prot and flags arguments. |
| sys._getframe | sys._getframe | (frame_object,) | Detect when code is accessing frames directly. |
| sys._current_frames | sys._current_frames | | Detect when code is accessing frames directly. |
| socket.bind, socket.connect, socket.connect_ex, socket.getaddrinfo, socket.getnameinfo, socket.sendmsg, socket.sendto | socket.address | (address,) | Detect access to network resources. The address is unmodified from the original call. |
| member_get, func_get_code, func_get_[kw]defaults | object.__getattr__ | (object, attr) | Detect access to restricted attributes. This event is raised for any built-in members that are marked as restricted, and members that may allow bypassing imports. |
| urllib.urlopen | urllib.Request | (url, data, headers, method) | Detects URL requests. |
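
To illustrate how a hook might consume one of these event schemas, here is a small sketch that watches the compile event from Table 1. It assumes a CPython release that raises this event with the code as the first argument and the filename as the second; other implementations may differ. The make_compile_watcher helper and the "<demo>" filename are hypothetical.

```python
import sys


def make_compile_watcher():
    """Return (hook, seen); the hook records the filename of each 'compile' event."""
    seen = []

    def hook(event, args):
        # Table 1 gives the schema as (code, filename_or_none).
        if event == "compile" and len(args) > 1:
            seen.append(args[1])

    return hook, seen


hook, seen = make_compile_watcher()
sys.addaudithook(hook)

# Explicit compilation raises the event; so do ordinary imports of source files.
compile("1 + 1", "<demo>", "eval")
print("<demo>" in seen)
```

Because the same event also fires for ordinary imports of source code, a hook like this should expect many more entries than the explicit calls in the program.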

Performance Impact

The important performance impact is the case where events are being raised but there are no hooks attached. This is the unavoidable case - once a distributor begins adding audit hooks they have explicitly chosen to trade performance for functionality. The performance impact with hooks added is not of interest here, since this is considered opt-in functionality.

Analysis using the Python Performance Benchmark Suite [1] shows no significant impact, with the vast majority of benchmarks showing between 1.05x faster and 1.05x slower.

In our opinion, the performance impact of the set of auditing points described in this PEP is negligible.

Rejected Ideas

Separate module for audit hooks

The proposal is to add a new module for audit hooks, hypothetically audit. This would separate the API and implementation from the sys module, and allow naming the C functions PyAudit_AddHook and PyAudit_Audit rather than the current variations.

Any such module would need to be a built-in module that is guaranteed to always be present. The nature of these hooks is that they must be callable without condition, as any conditional imports or calls provide opportunities to intercept and suppress or modify events.

Given its nature as one of the most core modules, the sys module is somewhat protected against module shadowing attacks. Replacing sys with a sufficiently functional module that the application can still run is a much more complicated task than replacing a module with only one function of interest. An attacker that has the ability to shadow the sys module is already capable of running arbitrary code from files, whereas an audit module can be replaced with a single line in a .pth file anywhere on the search path:

import sys; sys.modules['audit'] = type('audit', (object,),
    {'audit': lambda *a: None, 'addhook': lambda *a: None})

Multiple layers of protection already exist for monkey patching attacks against either sys or audit, but assignments or insertions to sys.modules are not audited.

This idea is rejected because it makes substituting audit calls throughout all callers trivial.

Flag in sys.flags to indicate "audited" mode

The proposal is to add a value in sys.flags to indicate when Python is running in a "secure" or "audited" mode. This would allow applications to detect when some features are enabled or when hooks have been added and modify their behaviour appropriately.

Currently, we are not aware of any legitimate reasons for a program to behave differently in the presence of audit hooks.

Both application-level APIs sys.audit and importlib.util.open_for_import are always present and functional, regardless of whether the regular python entry point or some alternative entry point is used. Callers cannot determine whether any hooks have been added (except by performing side-channel analysis), nor do they need to. The calls should be fast enough that callers do not need to avoid them, and the program is responsible for ensuring that any added hooks are fast enough to not affect application performance.

The argument that this is "security by obscurity" is valid, but irrelevant. Security by obscurity is only an issue when there are no other protective mechanisms; obscurity as the first step in avoiding attack is strongly recommended (see this article for discussion).

This idea is rejected because there are no appropriate reasons for an application to change its behaviour based on whether these APIs are in use.

Relationship to PEP 551

This API was originally presented as part of PEP 551 Security Transparency in the Python Runtime.

For simpler review purposes, and due to the broader applicability of these APIs beyond security, the API design is now presented separately.

PEP 551 is an informational PEP discussing how to integrate Python into a secure or audited environment.

References

[1] Python Performance Benchmark Suite, https://github.com/python/performance
Source: https://github.com/python/peps/blob/master/pep-0578.rst