[Python-ideas] add a hash to .pyc to don't mess between .py and .pyc

Sun Aug 14 22:56:11 EDT 2016

On Sun, Aug 14, 2016 at 9:35 PM, Xavier Combelle <xavier.combelle at gmail.com>
wrote:

> On 15/08/2016 02:45, Wes Turner wrote:
> >
> > You can add a `make clean` build step:
> >
> >   pyclean:
> >       find . -name '*.pyc' -delete
> >
> > You can delete all .pyc files
> >
> > - $ find . -name '*.pyc' -delete
> > - http://manpages.ubuntu.com/manpages/precise/man1/pyclean.1.html
> > #.pyc, .pyo
> >
> > You can rebuild all .pyc files (for a given directory):
> >
> > - $ python -m compileall -h
> > - https://docs.python.org/2/library/compileall.html
> > - https://docs.python.org/3/library/compileall.html
> >
> >
> >
> > You can, instead of building .pyc, build .pyo
> >
> > - https://docs.python.org/2/using/cmdline.html#envvar-PYTHONOPTIMIZE
> > - https://docs.python.org/2/using/cmdline.html#cmdoption-O
> >
> > You can skip writing .pyc or .pyo files with PYTHONDONTWRITEBYTECODE / -B:
> >
> > - https://docs.python.org/2/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE
> > - https://docs.python.org/2/using/cmdline.html#cmdoption-B
> > - If the files already exist, though, they are still read on import:
> >   - https://docs.python.org/3/reference/import.html
> >
> > You can build a PEX (which rebuilds .pyc files) and test/deploy that:
> >
> > - https://github.com/pantsbuild/pex#integrating-pex-into-your-workflow
> > - https://pantsbuild.github.io/python-readme.html#more-about-python-tests
> >
> > How .pyc files currently work:
> >
> > - http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html
> > - https://www.python.org/dev/peps/pep-3147/#flow-chart (*.pyc ->
> > ./__pycache__)
> > - http://raulcd.com/how-python-caches-compiled-bytecode.html
> >
> > You could add a hash of the .py source file in the header of the
> > .pyc/.pyo object (as proposed)
> >
> > - The overhead of this hashing would be a significant performance
> > regression
> > - Instead, today, the build step can just pyclean or build a
> > .zip/.WHL/.PEX which is expected to be a fresh build
> >
> The problem is not a lack of options for preventing it. The simplest one,
> deleting the .pyc file, is easy to do once you have spotted the problem.
> The problem is that it happens randomly in a normal workflow.
>

IIUC, the timestamp in the .pyc header is designed to prevent this
occurrence?

Reasons that the modification timestamp comparison could be off:

- Time change
  - Daylight saving time
  - NTP drift adjustment?
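
To make that failure mode concrete, here is a minimal sketch, not the real
import machinery. It assumes a cache that records only the source mtime
(in whole seconds) and size, and it uses os.utime to stand in for a clock
step or a same-second edit. The names stamp, looks_fresh and demo are
hypothetical.

```python
import os
import tempfile
from zlib import adler32


def stamp(path):
    """Record (mtime, size, checksum) for a source file, as a cache header might."""
    st = os.stat(path)
    with open(path, "rb") as f:
        return int(st.st_mtime), st.st_size, adler32(f.read())


def looks_fresh(path, recorded):
    """Timestamp-style check: compare only mtime (whole seconds) and size."""
    st = os.stat(path)
    return (int(st.st_mtime), st.st_size) == recorded[:2]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        recorded = stamp(src)

        # Edit the file without changing its length, then put the old
        # mtime back (assumption: this mimics a clock adjustment).
        with open(src, "w") as f:
            f.write("x = 2\n")
        os.utime(src, (recorded[0], recorded[0]))

        current = stamp(src)
        return looks_fresh(src, recorded), current[2] == recorded[2]


if __name__ == "__main__":
    fresh_by_mtime, same_checksum = demo()
    print("mtime/size check says fresh:", fresh_by_mtime)
    print("checksum unchanged:", same_checksum)
```

In this sketch the mtime/size comparison accepts the edited file, while the
checksum comparison notices the change.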

> To get an idea of the overhead of the whole hashing procedure, I ran the
> following script:
>
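> import time
> from zlib import adler32
>
> # Time the first import of decimal in this process.
> t0 = time.perf_counter()
> import decimal
> import_time = time.perf_counter() - t0
> print(decimal.__file__)
>
> # Time reading and checksumming the same file.
> t0 = time.perf_counter()
> with open(decimal.__file__, 'rb') as f:
>     checksum = adler32(f.read())
> hash_time = time.perf_counter() - t0
> print(hash_time, import_time, hash_time / import_time)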
>
> decimal was chosen because it was the biggest file in the standard library.
> Over 20 runs, the overhead was always between 1% and 1.5%.
> So yes, the overhead on the import process is measurable, but very small.
> Consequently, I would not call it significant.
> Moreover, the import process is only one part (and not the biggest one) of
> the whole run.
>

I agree that 1 to 1.5% is not significant.
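
For a more repeatable number, one option is to time the checksum alone with
timeit. This is only a sketch: the helper name best_seconds and the roughly
256 KiB synthetic buffer are assumptions, and it measures the checksum call
only, not file I/O or the import itself.

```python
import timeit
from zlib import adler32, crc32


def best_seconds(func, data, number=50, repeat=5):
    """Best-of-`repeat` average time, in seconds, for one call of func(data)."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    # Roughly 256 KiB of synthetic "source"; the size is an arbitrary assumption.
    data = b"x = 1  # padding\n" * (256 * 1024 // 17)
    for name, func in (("adler32", adler32), ("crc32", crc32)):
        print(name, best_seconds(func, data))
```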

> Unlike in my first mail, I now consider only a non-cryptographic
> hash/checksum, as the only aim is to prevent an accidental mismatch
> between the .pyc and .py files.
>
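
A checksum-based check along those lines could be sketched as follows. The
header layout here (a single 32-bit adler32 of the source in front of the
payload) is hypothetical and is not the actual .pyc format; pack_cache and
load_cache are made-up names for illustration.

```python
import struct
from zlib import adler32

# Hypothetical header: one little-endian unsigned 32-bit checksum.
HEADER = struct.Struct("<I")


def pack_cache(source_bytes, payload):
    """Prefix the cached payload with a checksum of the source it came from."""
    return HEADER.pack(adler32(source_bytes) & 0xFFFFFFFF) + payload


def load_cache(source_bytes, cache_bytes):
    """Return the payload if the stored checksum matches the source, else None."""
    (stored,) = HEADER.unpack_from(cache_bytes)
    if stored != adler32(source_bytes) & 0xFFFFFFFF:
        return None  # source changed: the caller should recompile
    return cache_bytes[HEADER.size:]


if __name__ == "__main__":
    cache = pack_cache(b"x = 1\n", b"BYTECODE")
    print(load_cache(b"x = 1\n", cache))
    print(load_cache(b"x = 2\n", cache))
```

The check costs one read and one checksum of the source per import, which is
the overhead measured above.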