[Python-ideas] add a hash to .pyc to don't mess between .py and .pyc

Wes Turner wes.turner at gmail.com
Sun Aug 14 20:45:23 EDT 2016


You can add a `make clean` build step:

  pyclean:
      find . -name '*.pyc' -delete

You can delete all .pyc files:

- $ find . -name '*.pyc' -delete
- http://manpages.ubuntu.com/manpages/precise/man1/pyclean.1.html # .pyc, .pyo
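
For a build step that cannot rely on find or pyclean being available, the
same cleanup can be sketched in Python. This is only an illustration; the
function name remove_bytecode is made up for this example.

```python
from pathlib import Path


def remove_bytecode(root):
    """Delete every *.pyc and *.pyo file under root; return the removed paths."""
    removed = []
    for pattern in ("*.pyc", "*.pyo"):
        # Materialise the matches first so the tree is not modified mid-walk.
        for path in list(Path(root).rglob(pattern)):
            if path.is_file():
                path.unlink()
                removed.append(path)
    return removed
```

Calling remove_bytecode(".") covers roughly the same ground as the find
command above, including files inside __pycache__ directories.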

You can rebuild all .pyc files (for a given directory):

- $ python -m compileall -h
- https://docs.python.org/2/library/compileall.html
- https://docs.python.org/3/library/compileall.html
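
The same rebuild can be driven from Python rather than the command line. A
minimal sketch, assuming you want to recompile even when the timestamps look
current; the wrapper name rebuild_bytecode is hypothetical.

```python
import compileall


def rebuild_bytecode(root):
    """Recompile every .py file under root, ignoring existing timestamps.

    Returns True if every file compiled without error.
    """
    # force=True recompiles even if the cached file looks up to date;
    # quiet=1 prints only errors.
    return bool(compileall.compile_dir(root, force=True, quiet=1))
```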



Instead of building .pyc files, you can build .pyo files:

- https://docs.python.org/2/using/cmdline.html#envvar-PYTHONOPTIMIZE
- https://docs.python.org/2/using/cmdline.html#cmdoption-O
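
Note that on Python 3.5 and later (PEP 488) there is no separate .pyo
extension; optimized bytecode is cached as .opt-1.pyc or .opt-2.pyc instead.
A small sketch to see which file name the interpreter would use; the helper
name optimized_cache_name is made up for this example.

```python
import importlib.util


def optimized_cache_name(source_path, level=1):
    """Return the cache path used for source_path at the given -O level."""
    # level=1 corresponds to -O, level=2 to -OO.
    return importlib.util.cache_from_source(source_path, optimization=level)
```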

You can avoid writing .pyc or .pyo files with PYTHONDONTWRITEBYTECODE / -B:

- https://docs.python.org/2/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE
- https://docs.python.org/2/using/cmdline.html#cmdoption-B
- If the files already exist, though, they are still used:
  - https://docs.python.org/3/reference/import.html
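
The same switch is exposed at runtime as sys.dont_write_bytecode, so it can
be toggled around a block of imports. A minimal sketch; the context manager
name no_bytecode_writes is hypothetical. It only stops new files from being
written and does nothing about .pyc files that are already on disk.

```python
import sys
from contextlib import contextmanager


@contextmanager
def no_bytecode_writes():
    """Temporarily stop the interpreter from writing bytecode cache files."""
    previous = sys.dont_write_bytecode
    sys.dont_write_bytecode = True
    try:
        yield
    finally:
        # Restore whatever the caller had configured before.
        sys.dont_write_bytecode = previous
```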

You can build a PEX (which rebuilds .pyc files) and test/deploy that:

- https://github.com/pantsbuild/pex#integrating-pex-into-your-workflow
- https://pantsbuild.github.io/python-readme.html#more-about-python-tests

How .pyc files currently work:

- http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html
- https://www.python.org/dev/peps/pep-3147/#flow-chart (*.pyc -> ./__pycache__)
- http://raulcd.com/how-python-caches-compiled-bytecode.html
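
To make the timestamp check concrete, here is a rough sketch that compiles a
file and reads back the header fields the import system compares against the
source. It assumes the 16-byte header layout of Python 3.7 and later (magic,
flags, mtime, size); older versions have no flags field. The function name
read_pyc_header is made up for this example.

```python
import importlib.util
import py_compile
import struct


def read_pyc_header(source_path):
    """Compile source_path and return (magic_ok, flags, mtime, size).

    Assumes the Python 3.7+ header: 4-byte magic, then three little-endian
    32-bit fields (flags, source mtime, source size).
    """
    pyc_path = py_compile.compile(
        source_path,
        doraise=True,
        # Ask for a timestamp-based .pyc explicitly.
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags, mtime, size = struct.unpack("<III", header[4:16])
    return header[:4] == importlib.util.MAGIC_NUMBER, flags, mtime, size
```

If the stored mtime or size no longer matches the .py file, the cached
bytecode is treated as stale and recompiled.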

You could add a hash of the .py source file to the header of the .pyc/.pyo
file (as proposed):

- The overhead of hashing each source file on import would be a significant performance regression
- Instead, today, the build step can just run pyclean or build a .zip/.whl/.PEX, which is expected to be a fresh build
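
For reference, an opt-in form of this idea is available in Python 3.7 and
later (PEP 552, which postdates this thread): py_compile can write a
hash-based .pyc. The sketch below assumes that layout (flags in bytes 4-8,
an 8-byte source hash in bytes 8-16); the function names compile_with_hash
and source_matches_pyc are made up for this example.

```python
import importlib.util
import py_compile


def compile_with_hash(source_path):
    """Write a hash-based .pyc for source_path and return its path."""
    return py_compile.compile(
        source_path,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )


def source_matches_pyc(source_path, pyc_path):
    """Return True if the hash stored in pyc_path matches the source file."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags = int.from_bytes(header[4:8], "little")
    if not flags & 0b01:  # lowest bit set means "hash-based"
        raise ValueError("not a hash-based .pyc")
    with open(source_path, "rb") as f:
        source = f.read()
    return header[8:16] == importlib.util.source_hash(source)
```

The comparison reads the whole source file, which is the per-import cost
the first bullet above is concerned about.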

On Sun, Aug 14, 2016 at 6:23 PM, Chris Angelico <rosuav at gmail.com> wrote:

> On Mon, Aug 15, 2016 at 9:05 AM, Xavier Combelle
> <xavier.combelle at gmail.com> wrote:
> > I know that some use cases rely on shipping only .pyc files and not
> > keeping the .py around, for example to avoid distributing the source.
> > But in my view, this use case should be handled by an opt-in decision
> > and not by default. Several opt-in mechanisms could be envisioned:
> > environment variables, command line switches, special compilation of
> > .pyc which explicitly asks not to check the hash.
>
> Of those, only the last one is truly viable - the application
> developer isn't necessarily the one choosing to make a sourceless
> module (it could be any library module anywhere in the tree, including
> the CPython standard library - sometimes that's distributed without
> .py files, to reduce interpreter on-disk size). So what this would
> mean is that a sourceless distro is not simply "delete the .py files
> and stuff keeps working", but "run this script and it'll recompile the
> .py files to stand-alone .pyc files".
>
> As such, I think the idea has merit; but it won't close the backdoor
> that you mentioned (anyone who wants to make that kind of attack would
> simply make a file that's marked as stand-alone). That said, though -
> anyone who can maliciously write to your file system has already won,
> whether they're writing pyc or py files. The only difference is how
> easily it's detected. Fully loading and hashing the .py file seems
> like a paranoia option, and if you want that, just blow away all .pyc
> files, have your PYTHONPATH point to a read-only file system, and
> force the interpreter to compile everything fresh every time.
>
> How does this interact with the __pycache__ directory?
>
> ChrisA