[New-bugs-announce] [issue30657] Unsafe arithmetic in PyString_DecodeEscape

Jay Bosamiya report at bugs.python.org
Tue Jun 13 11:35:29 EDT 2017


New submission from Jay Bosamiya:

In Python 2.7, there is a possible integer overflow in the
PyString_DecodeEscape function in stringobject.c, which can be abused
to cause a heap overflow, possibly leading to arbitrary code
execution.

The relevant parts of the code are highlighted below:

    PyObject *PyString_DecodeEscape(const char *s,
                                    Py_ssize_t len,
                                    const char *errors,
                                    Py_ssize_t unicode,
                                    const char *recode_encoding)
    {
        int c;
        char *p, *buf;
        const char *end;
        PyObject *v;
(1)     Py_ssize_t newlen = recode_encoding ? 4*len:len;
(2)     v = PyString_FromStringAndSize((char *)NULL, newlen);
        if (v == NULL)
            return NULL;
(3)     p = buf = PyString_AsString(v);
        end = s + len;
        while (s < end) {
            if (*s != '\\') {
              non_esc:
    #ifdef Py_USING_UNICODE
    [...]
    #else
                *p++ = *s++;
    #endif
                continue;
    [...]
            }
        }
(4)     if (p-buf < newlen)
            _PyString_Resize(&v, p - buf); /* v is cleared on error */
        return v;
      failed:
        Py_DECREF(v);
        return NULL;
    }


(1) If recode_encoding is non-NULL, the multiplication 4*len can
      overflow, leaving newlen with a very small value
(2) This causes a correspondingly small string to be allocated for v
(3) p and buf now point into that small string
(4) The much larger input string is copied into the small string,
      giving a heap buffer overflow
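
The arithmetic in step (1) can be illustrated in isolation. The
following is only a sketch: wrapped_newlen_32 is a hypothetical helper
(not part of CPython) that models 4*len as it would behave if
Py_ssize_t were 32 bits wide. It uses unsigned arithmetic because
signed overflow is undefined behaviour in C, and it assumes a
mainstream two's-complement compiler for the final cast.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not CPython code: models line (1) for a
 * 32-bit Py_ssize_t. The product is computed modulo 2^32 in unsigned
 * arithmetic and then cast back to a signed 32-bit value. */
static int32_t wrapped_newlen_32(int32_t len, int recode)
{
    assert(len >= 0);
    uint32_t product = recode ? 4u * (uint32_t)len : (uint32_t)len;
    return (int32_t)product;
}
```

With len = 0x40000001 (just over 1 GB) and recoding enabled, the true
product 0x100000004 does not fit in 32 bits and the helper returns 4,
so only a few bytes would be requested for an input of about 1 GB.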

In the highly unlikely but definitely possible situation that we pass
it a very large string (on the order of 1 GB on a 32-bit Python
install), one can reliably get heap corruption. This function (and the
condition in line (1)) can be reached through the parsestr function in
ast.c when the file encoding of an input .py file is anything other
than utf-8 or iso-8859-1. This is trivially arranged by placing the
following at the start of the file:
    # -*- coding: us-ascii -*-

The attached file (poc-gen.py) produces a poc.py file which satisfies
these constraints and shows the vulnerability.

Note: To see the vulnerability in action, it is necessary to have an
ASAN build of Python, compiled for 32-bit on a 64-bit machine.
Additionally, the generated poc.py file can take an extremely long
time to load (over a few hours) before it finally crashes. To see
proof of the vulnerability more quickly, it might be better to change
the constant 4 in line (1) to 65536 (just for simplicity's sake) and
to set multiplication_constant in poc-gen.py to the same value
(i.e. 65536).

Proposed fix: Check that the multiplication cannot overflow before
actually performing it and relying on the result.
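
One way such a check could look is sketched below. This is not the
patch itself: checked_scaled_len is a hypothetical helper, and
ptrdiff_t with PTRDIFF_MAX stands in for Py_ssize_t with
PY_SSIZE_T_MAX so that the example compiles without the Python
headers. The idea is to compare len against the maximum divided by the
factor, so the multiplication only runs when it cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not CPython API. Stores len * factor in *out
 * and returns 0 if the product fits; returns -1 without multiplying
 * if it would overflow (or if len is negative). */
static int checked_scaled_len(ptrdiff_t len, ptrdiff_t factor,
                              ptrdiff_t *out)
{
    assert(factor > 0);
    if (len < 0 || len > PTRDIFF_MAX / factor)
        return -1;              /* would overflow: caller must fail */
    *out = len * factor;        /* safe: len <= PTRDIFF_MAX / factor */
    return 0;
}
```

In PyString_DecodeEscape the equivalent test would presumably guard
line (1) with factor 4 and return an error to the caller instead of
allocating v when the check fails.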

----------
components: Interpreter Core
files: poc-gen.py
messages: 295930
nosy: jaybosamiya
priority: normal
severity: normal
status: open
title: Unsafe arithmetic in PyString_DecodeEscape
type: security
versions: Python 2.7
Added file: http://bugs.python.org/file46950/poc-gen.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30657>
_______________________________________

