[New-bugs-announce] [issue27946] issues in elementtree and elsewhere due to PyDict_GetItem

tehybel report at bugs.python.org
Fri Sep 2 17:48:01 EDT 2016


New submission from tehybel:

I would like to describe an issue in the _elementtree module, and then
propose a fix which would prevent this type of bug everywhere in the
codebase.

The issue exists in _elementtree_Element_get_impl in
/Modules/_elementtree.c. Here is the code:

    static PyObject *
    _elementtree_Element_get_impl(ElementObject *self, PyObject *key,
                                  PyObject *default_value)
    {
        PyObject* value;

        if (...)
            value = default_value;
        else {
            value = PyDict_GetItem(self->extra->attrib, key);
            ...
        }
        ...
    }

We look up "key" in the dictionary, that is, in self->extra->attrib. This is 
done with the call to PyDict_GetItem(self->extra->attrib, key).

We need to hash the "key" object in order to find it in the dictionary. This
sometimes requires calling the key's __hash__ function, i.e., it might call on
to python code.

What happens if the python code gets the dictionary (self->extra->attrib) freed?
Then PyDict_GetItem will use it after it has been freed.

I've attached a proof-of-concept script which causes a use-after-free through
_elementtree_Element_get_impl due to this issue.

We could fix this by calling Py_INCREF on self->extra->attrib before calling
PyDict_GetItem. But grepping for "PyDict_GetItem.*\->" shows that there are many
places in the codebase where this situation occurs. Some of these may not have
much impact, but some of them likely will.

All these bugs could be fixed at once by changing PyDict_GetItem to call 
Py_INCREF on the dictionary before it hashes the key.



Here's a PoC for the _elementtree module.

--- begin script ---

import _elementtree as et

state = {
    "tag": "tag",
    "_children": [],
    "attrib": "attr",
    "tail": "tail",
    "text": "text",
}

class X:
    def __hash__(self):
        e.__setstate__(state) # this frees e->extra->attrib
        return 13

e = et.Element("elem", {1: 2})
e.get(X())

--- end script ---

Running it (64-bits Python 3.5.2, --with-pydebug) causes a use-after-free with
control of the program counter:

(gdb) r ./poc13.py
Starting program: /home/xx/Python-3.5.2/python ./poc13.py

Program received signal SIGSEGV, Segmentation fault.
0x00000000004939af in PyDict_GetItem (op=0x7ffff6d5b1a8, key=0x7ffff6d528e0) at Objects/dictobject.c:1079
1079            ep = (mp->ma_keys->dk_lookup)(mp, key, hash, &value_addr);
(gdb) p mp->ma_keys->dk_lookup
$8 = (dict_lookup_func) 0x7b7b7b7b7b7b7b7b

----------
components: Extension Modules
messages: 274276
nosy: tehybel
priority: normal
severity: normal
status: open
title: issues in elementtree and elsewhere due to PyDict_GetItem
versions: Python 3.5, Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue27946>
_______________________________________


More information about the New-bugs-announce mailing list