[Python-ideas] pytaint: taint tracking in python

Nick Coghlan ncoghlan at gmail.com
Tue Oct 15 23:56:45 CEST 2013


On 15 Oct 2013 19:59, "Felix Gröbert" <felix at groebert.org> wrote:
>
> 1. Please correct me if I misunderstand the Python project, but if the
idea is deemed 'good' by this list, a PEP can follow and the feature can be
included in Python 3? It is not necessary to have a Python 3 implementation
beforehand?

Sure. I was just pointing out that the significantly different str and
bytes types and the removal of the implicit conversions between them in 3.x
could complicate the eventual forward porting process. (Although GPS has
indicated it shouldn't be a major problem in this case).

> The existing Python 2.7.5 pytaint implementation is intended to be run by
users who need tainting in Python 2 but can also serve as a reference /
benchmark / proof-of-concept implementation for this discussion.
>
> 2. I haven't had the time to publish benchmarks yet but I plan to. Also,
of course, the cpython tests pass and we added additional taint tracking
tests. We also ran the internal tests of our python codebase with the
pytaint interpreter. This produced a negligible number of failures, mostly
because some C extensions hadn't been recompiled to work with the redefined
string objects.
>
> Regarding taint tracking as a feature for python:
>
> First of all, taint tracking is a general language feature and can be
considered for additional applications besides security. When it comes to
the security community, taint tracking is certainly controversial.
Nevertheless, my pytaint announcement received 50 retweets and 30 favs from
a part of the security community, if that counts for something ;)

If you can provide a way to taint strings with an encoding assumption such
that combining strings with conflicting encoding assumptions fails, that
would be a big point in favour of the system.
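
As a rough illustration of that idea (plain Python 3, not pytaint's API; the
EncodedStr class, its assumed_encoding attribute and EncodingMismatchError
are hypothetical names made up for this sketch), a str subclass could carry
an encoding assumption and refuse to concatenate with a conflicting one:

```python
class EncodingMismatchError(TypeError):
    """Raised when strings with conflicting encoding assumptions are combined."""


class EncodedStr(str):
    """Hypothetical str subclass that remembers an assumed source encoding."""

    def __new__(cls, value, assumed_encoding=None):
        obj = super().__new__(cls, value)
        obj.assumed_encoding = assumed_encoding
        return obj

    def _merged_encoding(self, other):
        # Plain str operands carry no assumption, so they never conflict.
        mine = self.assumed_encoding
        theirs = getattr(other, "assumed_encoding", None)
        if mine and theirs and mine != theirs:
            raise EncodingMismatchError(
                "cannot combine %s data with %s data" % (mine, theirs)
            )
        return mine or theirs

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return EncodedStr(str.__add__(self, other), self._merged_encoding(other))

    def __radd__(self, other):
        # Reached for `plain_str + EncodedStr(...)`.
        if not isinstance(other, str):
            return NotImplemented
        return EncodedStr(str.__add__(other, self), self._merged_encoding(other))


header = EncodedStr("caf\u00e9", "latin-1")
body = EncodedStr("na\u00efve", "utf-8")
combined = header + " au lait"   # keeps the latin-1 assumption
try:
    header + body                # conflicting assumptions
except EncodingMismatchError as exc:
    print("rejected:", exc)
```

Only + is covered here; join(), formatting, slicing and friends would all
need the same treatment, which is exactly the part an interpreter-level
implementation could do uniformly.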

A way to track the origins of tainted objects would also be a big winner.
While I assume tracking that would be too expensive to do by default,
tracing the origin of bad data can be a genuinely hard debugging problem,
so being able to fire up failing unit tests or vulnerability scans in a
taint tracing mode could be very interesting.
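
A minimal sketch of such an opt-in tracing mode, again in plain Python 3
with hypothetical names (TracedStr, taint() and the origins attribute are
not part of pytaint): each tainted value records its source label and,
only when tracing is requested, the call site that introduced it.

```python
import traceback


class TracedStr(str):
    """Hypothetical str subclass that remembers where its data came from."""

    def __new__(cls, value, origins=()):
        obj = super().__new__(cls, value)
        obj.origins = tuple(origins)
        return obj

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        merged = self.origins + getattr(other, "origins", ())
        return TracedStr(str.__add__(self, other), merged)

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        merged = getattr(other, "origins", ()) + self.origins
        return TracedStr(str.__add__(other, self), merged)


def taint(value, source, trace=False):
    """Mark value as coming from source; record the call site if trace is set."""
    where = None
    if trace:
        # Walking the stack is the expensive part, hence off by default.
        caller = traceback.extract_stack(limit=2)[0]
        where = "%s:%s" % (caller.filename, caller.lineno)
    return TracedStr(value, [(source, where)])


name = taint("bob", "http.param:name", trace=True)
query = "SELECT * FROM users WHERE name = '" + name + "'"
for source, where in query.origins:
    print("tainted by", source, "at", where)
```

The origins tuple survives concatenation, so a failing test can report every
source that contributed to the bad string.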

>
> As Andrew and Bruce mention, there are other solutions to XSS and SQLi:
template systems and parameterized queries. Another library solution exists
to shell injection: pipes.quote. However, all these solutions require the
developer to pick the correct library and method. We have empirical
indicators that this works, but maybe only in 70% of cases. The rest of the
developers are introducing new vulnerabilities. Thus, an additional
language-based feature can help to mitigate the remaining 30% of cases. A
web app framework (or a python-developing company) can maintain and ship a
pytaint configuration which will throw a TaintError exception in those 30%
of cases and prevent the vulnerability from being exploited.
>
> This argument follows along the principle of defense-in-depth: why just
have one security feature (e.g. pipes.quote) if we can offer several
security features to the developer? This has previously worked well for
system security: ASLR, DEP, etc.

Yes, the idea sounds interesting to me in principle. If it can be adapted
to help with the "where did the bad string data come from?" problem more
generally, then it becomes genuinely compelling :)

> Regarding the relation to typing:
>
> We are using Merits on purpose to be able to distinguish between
different forms of string cleaning. Today, most HTML template systems don't
even make a distinction between different escaping contexts. However, with
a pytaint Merit configuration for raw HTML, URLs, HTML attribute
contents, CSS attributes and JS strings, you would be able to make sure
that your string is cleaned for the specific context you're using it in.
This can be implemented for each template system individually but it would
be easier to just write a pytaint config.
> If you don't clean strings based on browser context, you will run into
problems: a string is cleaned with HTML-entity encoding but used in a
<iframe src> attribute. An attacker could trigger an XSS by supplying
javascript:alert(document.cookie).

It seems to me that viewing this as a parallel typing system for data
strings is a potentially useful way of looking at things.
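
To make the "parallel typing" view concrete, here is a small sketch in plain
Python 3. It does not use pytaint's real configuration format; MeritStr,
require(), the two cleaner functions and the merit names are all hypothetical.
Each cleaner grants one context-specific merit, and each sink demands the
merit for its own context:

```python
import html
from urllib.parse import urlsplit

HTML_TEXT = "html-text"   # safe as text between tags
URL_ATTR = "url-attr"     # safe inside a src/href attribute


class TaintError(Exception):
    """Raised when a string reaches a sink without the required merit."""


class MeritStr(str):
    """Hypothetical str subclass carrying the contexts it was cleaned for."""

    def __new__(cls, value, merits=()):
        obj = super().__new__(cls, value)
        obj.merits = frozenset(merits)
        return obj


def clean_html_text(value):
    return MeritStr(html.escape(value), {HTML_TEXT})


def clean_url_attr(value):
    # Assumption for this sketch: only absolute http(s) URLs are accepted.
    if urlsplit(value).scheme.lower() not in ("http", "https"):
        raise ValueError("refusing URL scheme in %r" % value)
    return MeritStr(html.escape(value, quote=True), {URL_ATTR})


def require(value, merit):
    # Anything without the merit (including plain str) counts as unclean here.
    if merit not in getattr(value, "merits", frozenset()):
        raise TaintError("string lacks merit %r" % merit)
    return value


def render_iframe(src):
    return '<iframe src="%s"></iframe>' % require(src, URL_ATTR)


payload = "javascript:alert(document.cookie)"
try:
    render_iframe(clean_html_text(payload))   # cleaned for the wrong context
except TaintError as exc:
    print("blocked:", exc)
```

The HTML-entity cleaner leaves the javascript: payload untouched, but because
it only grants the html-text merit, the iframe sink still rejects the string.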

Cheers,
Nick.
