[Python-Dev] Fuzzing the Python standard library

Damian Shaw damian.peter.shaw at gmail.com
Tue Jul 17 13:26:42 EDT 2018


I'm not a core Python dev, but a quick question: why would you expect
fractions.Fraction("1.64E6646466664") not to take hundreds of megabytes and
hours to run?

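Simply evaluating 164 * 10**6646466662, which is the value that string
represents, produces an integer of about 22 billion bits (roughly 2.8 GB), so
it takes hundreds of megabytes and more by definition.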
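The arithmetic can be checked without building the number, because the bit length of a positive integer n is floor(log2(n)) + 1. The sketch below is only an illustration: decimal_literal_bits is a made-up helper name, and it returns a floating-point estimate. It also confirms on a small exponent that Fraction reads "1.64E10" as 164 * 10**8.

```python
import math
from fractions import Fraction


def decimal_literal_bits(mantissa: int, exponent: int) -> int:
    """Estimate the bit length of mantissa * 10**exponent from logarithms,
    without ever building the (possibly huge) integer."""
    return math.floor(math.log2(mantissa) + exponent * math.log2(10)) + 1


# "1.64E6646466664" has two digits after the point, so its value is
# 164 * 10**(6646466664 - 2).
bits = decimal_literal_bits(164, 6646466664 - 2)
print(f"about {bits:,} bits, i.e. {bits / 8 / 1e9:.1f} GB for one integer")

# Sanity check on a small exponent: Fraction parses the same value, and the
# estimate agrees with int.bit_length().
assert Fraction("1.64E10") == 164 * 10**8
assert decimal_literal_bits(164, 8) == (164 * 10**8).bit_length()
```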

Regards
Damian


On Tue, Jul 17, 2018, 12:54 Jussi Judin <jjudin+python at iki.fi> wrote:

> Hi,
>
> I have been fuzzing[1] various parts of the Python standard library for
> Python 3.7 with python-afl[2] to find internal implementation issues in the
> library. I have mainly been looking for the following:
>
> * Exceptions other than the documented ones. These usually indicate an
> internal implementation issue. For example, one would not expect a
> UnicodeDecodeError from the netrc.netrc() function when the documentation[3]
> promises netrc.NetrcParseError and there is no way to pass a properly
> sanitized file object to netrc.netrc().
> * Differences between the values returned by the C and Python
> implementations of some functions. The quopri module may have these.
> * Unexpected performance and memory allocation issues. These can be
> somewhat controversial to fix, if they are fixed at all, but at least in
> some cases it can be really nasty from the end-user perspective if, for
> example, fractions.Fraction("1.64E6646466664") allocates hundreds of
> megabytes of memory and takes a very long time to compute. I gave up
> waiting for that function call to finish after 5 minutes.
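The first check in that list can be automated with a small wrapper that calls a function on fuzzed bytes and reports any exception outside the documented set. This is a minimal sketch, not the harness from the linked repository; parse_config and ConfigParseError are hypothetical stand-ins for something like netrc.netrc() and netrc.NetrcParseError, chosen so the example runs without touching the file system.

```python
from typing import Callable, Optional, Tuple, Type


class ConfigParseError(Exception):
    """The only exception the hypothetical parser documents. It subclasses
    Exception, not ValueError, so a leaked UnicodeDecodeError (a ValueError
    subclass) is not mistaken for a documented failure."""


def parse_config(data: bytes) -> dict:
    """Hypothetical 'key = value' parser. It decodes without a guard, so
    non-ASCII bytes leak a UnicodeDecodeError instead of ConfigParseError."""
    text = data.decode("ascii")
    result = {}
    for line in text.splitlines():
        if "=" not in line:
            raise ConfigParseError(f"bad line: {line!r}")
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def undocumented_exception(
    func: Callable[[bytes], object],
    data: bytes,
    documented: Tuple[Type[BaseException], ...],
) -> Optional[BaseException]:
    """Return the exception raised by func(data) if it is not one of the
    documented types; return None on success or on a documented failure."""
    try:
        func(data)
    except documented:
        return None
    except Exception as exc:  # anything else hints at an implementation issue
        return exc
    return None


for sample in (b"user = alice", b"no separator", b"user = \xff"):
    exc = undocumented_exception(parse_config, sample, (ConfigParseError,))
    print(sample, "->", "ok" if exc is None else type(exc).__name__)
```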
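For the second check, a differential test feeds the same input to both implementations and compares the results. The sketch below does this for quoted-printable decoding. It relies on a CPython implementation detail, which is an assumption rather than documented API: quopri.decode() delegates to binascii.a2b_qp() unless the module-level name quopri.a2b_qp is None, so patching that name to None forces the pure-Python code path. The sample inputs are arbitrary, and no particular mismatch is claimed.

```python
import binascii
import io
import quopri
from unittest import mock


def decode_c(data: bytes) -> bytes:
    """Quoted-printable decoding via the C accelerator in binascii."""
    return binascii.a2b_qp(data)


def decode_python(data: bytes) -> bytes:
    """Force quopri's pure-Python fallback (ASSUMPTION: quopri.decode() only
    uses binascii when quopri.a2b_qp is not None, as in CPython's quopri.py)."""
    out = io.BytesIO()
    with mock.patch.object(quopri, "a2b_qp", None):
        quopri.decode(io.BytesIO(data), out)
    return out.getvalue()


def mismatches(samples):
    """Yield (input, c_result, python_result) wherever the two disagree."""
    for data in samples:
        c_out, py_out = decode_c(data), decode_python(data)
        if c_out != py_out:
            yield data, c_out, py_out


samples = [b"caf=C3=A9", b"hello=20world", b"a=\n", b"bad=ZZescape"]
for data, c_out, py_out in mismatches(samples):
    print(f"{data!r}: C={c_out!r} Python={py_out!r}")
```

In a fuzzing setup, the fuzzer would supply the samples and a non-empty mismatches() result would be treated as a finding.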
>
> As this is going to result in a fair number of bug reports (so far I have
> only filed one[4], although the audio processing area it covers has many
> more issues to file), I would like to ask your opinion on how to file them.
> Should I report all issues concerning a specific module in one bug report,
> or split them further into finer-grained reports that may be related to
> each other? These different types of errors are especially noticeable in
> the zipfile module, which produces many different exception types and
> behaviors on invalid data
> <https://github.com/Barro/python-stdlib-fuzzers/tree/master/zipfile/crashes>.
> In the case of the sndhdr module, several underlying modules with issues
> (aifc, sunau, wave) also show up in sndhdr when it uses them. Alternatively,
> would some of you be willing to go through the crashes that pop up and help
> with filing the reports?
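One way to decide how to split the reports is to bucket crashing inputs by a signature, here the exception type plus the file and line of the innermost Python frame that raised it, so that each bucket is a candidate for one report. This is a sketch of that deduplication idea under those assumptions; toy_parser is a hypothetical target with two distinct failure sites, and a real setup would likely also filter out documented exceptions first.

```python
import traceback
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# (exception type name, file, line) of the innermost Python frame
Signature = Tuple[str, str, int]


def crash_signature(exc: BaseException) -> Signature:
    """Identify a crash by its exception type and innermost Python frame."""
    last = traceback.extract_tb(exc.__traceback__)[-1]
    return type(exc).__name__, last.filename, last.lineno


def bucket_crashes(
    func: Callable[[bytes], object], inputs: Iterable[bytes]
) -> Dict[Signature, List[bytes]]:
    """Run func on each input and group the failing inputs by signature."""
    buckets: Dict[Signature, List[bytes]] = defaultdict(list)
    for data in inputs:
        try:
            func(data)
        except Exception as exc:
            buckets[crash_signature(exc)].append(data)
    return dict(buckets)


def toy_parser(data: bytes) -> int:
    """Hypothetical target with two distinct failure sites."""
    header, _, body = data.partition(b":")
    count = int(header)  # ValueError for a non-numeric header
    return count // len(body)  # ZeroDivisionError for an empty body


inputs = [b"10:ab", b"x:ab", b"10:", b"y:zz"]
for (name, filename, lineno), members in bucket_crashes(toy_parser, inputs).items():
    print(f"{name} at line {lineno}: {len(members)} input(s)")
```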
>
> The code and a more detailed description are available at
> <https://github.com/Barro/python-stdlib-fuzzers>. By default it only works
> on some GNU/Linux systems (I use Debian testing), as it relies on /dev/shm/
> being available and uses shell script wrappers that depend on various tools
> that may not be installed by default on all systems.
>
> As a bonus, because this uses coverage-based fuzzing, it also opens up the
> possibility of automatically creating a regression test suite for each
> fuzzed module. Such a suite would ensure that existing functionality (input
> files under the <fuzz-target>/corpus/ directory) does not suddenly start
> raising additional exceptions, and it would make potential bug fixes easier
> to test (crash-inducing files under the <fuzz-target>/crashes/ directory).
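A regression suite of that kind might replay every file in a <fuzz-target>/corpus/ and <fuzz-target>/crashes/ directory and fail on any exception that is not documented. The sketch below assumes that directory layout and uses zipfile as the example target; make_regression_test and zip_target are hypothetical names, and the zipfile call sequence is just one plausible way to exercise the module.

```python
import io
import unittest
import zipfile
from pathlib import Path
from typing import Callable, Tuple, Type


def zip_target(data: bytes) -> None:
    """Example target: open an archive from raw bytes and read every member."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            zf.read(name)


def make_regression_test(
    target_dir,
    func: Callable[[bytes], object],
    documented: Tuple[Type[BaseException], ...],
) -> type:
    """Build a TestCase that replays <target_dir>/corpus/* and
    <target_dir>/crashes/* through func. Exceptions listed in `documented`
    are tolerated; any other exception fails that file's subtest."""
    base = Path(target_dir)

    class FuzzRegressionTest(unittest.TestCase):
        def replay(self, subdir: str) -> None:
            files = sorted(p for p in (base / subdir).glob("*") if p.is_file())
            for path in files:
                with self.subTest(file=path.name):
                    try:
                        func(path.read_bytes())
                    except documented:
                        pass  # a documented failure mode is acceptable

        def test_corpus(self) -> None:
            # Corpus inputs must not start raising undocumented exceptions.
            self.replay("corpus")

        def test_crashes(self) -> None:
            # Known crash reproducers; these pass once the bugs are fixed.
            self.replay("crashes")

    return FuzzRegressionTest
```

For example, make_regression_test("zipfile", zip_target, (zipfile.BadZipFile,)) returns a TestCase class that unittest can load like any other. With real crash files, test_crashes would be expected to fail until the underlying bugs are fixed, which is what makes it useful for checking a fix.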
>
> As a downside, this uses two quite specific tools (afl, python-afl) that
> have further dependencies of their own (Cython), so I doubt the viability of
> integrating this type of testing into the normal Python verification
> process. Unlike the libFuzzer-based fuzzing that is already integrated into
> Python[5], this instruments the actual Python code (and only that), not what
> the interpreter does in the background. This should result in better fuzzer
> coverage of the Python code being exercised, with the downside that any C
> functions it calls are complete black boxes to the fuzzer.
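A python-afl target script for one module could look roughly like the following. This is a sketch, not the repository's actual harness. It assumes python-afl's afl.init() entry point as described in that project's README, and AFL's @@ placeholder for passing the input file path (for example py-afl-fuzz -i corpus -o findings -- python3 fuzz_zipfile.py @@, where the script name is hypothetical). Only zipfile.BadZipFile is treated as an expected error, so any other exception escapes and is what the fuzzer should record as a crash.

```python
import io
import sys
import zipfile
from typing import List


def fuzz_one(data: bytes) -> None:
    """Feed one fuzzed input to zipfile. Only the documented BadZipFile is
    swallowed; any other exception propagates and counts as a crash."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in zf.namelist():
                zf.read(name)
    except zipfile.BadZipFile:
        pass


def main(paths: List[str]) -> None:
    try:
        import afl  # python-afl; ASSUMED to be installed when fuzzing
    except ImportError:
        afl = None  # without python-afl, just replay the given files once
    if afl is not None:
        afl.init()  # ASSUMED python-afl entry point that starts instrumentation
    for path in paths:  # under AFL, "@@" expands to the current input file
        with open(path, "rb") as f:
            fuzz_one(f.read())


if __name__ == "__main__":
    main(sys.argv[1:])
```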
>
> I have mostly run these fuzzers for at most several hours per module, with
> 4 instances each, and I stopped fuzzing modules without issues once no new
> coverage had been discovered for more than 10 minutes. I also have not
> really created high-quality initial input files, so I wouldn't be surprised
> if there are more issues lurking that could be found by throwing more CPU
> time and higher-quality fuzzers at the problem.
>
> [1]: https://en.wikipedia.org/wiki/Fuzzing
> [2]: https://github.com/jwilk/python-afl
> [3]: https://docs.python.org/3/library/netrc.html
> [4]: https://bugs.python.org/issue34088
> [5]: https://github.com/python/cpython/tree/3.7/Modules/_xxtestfuzz
>
> --
> Jussi Judin
> https://jjudin.iki.fi/