[Python-ideas] add a hash to .pyc to don't mess between .py and .pyc

Victor Stinner victor.stinner at gmail.com
Mon Aug 15 04:31:56 EDT 2016


The purpose of .pyc is to optimize Python. With your proposed change, the
number of syscalls is doubled (open, read, close) and you add extra work
(computing the hash) every time a .pyc is used.

If your filesystem works correctly, you should not have to bother.

Victor

On 15 Aug 2016 01:06, "Xavier Combelle" <xavier.combelle at gmail.com>
wrote:

> Several times I have stumbled upon the following problem: I delete a
> module, its .pyc stays around, and as if by "magic" Python still uses
> the .pyc.
> A similar error happens (but less often) when, through some filesystem
> manipulation, the .pyc ends up newer than the .py but corresponds to an
> older version of the .py. It is not a major problem, but it is still a
> real one.
>
> I'm not the first one to have this problem. A Stack Overflow search
> leads to quite a lot of relevant answers
> http://stackoverflow.com/search?q=old+pyc and so does a Google search
> https://www.google.fr/search?q=old+pyc
> Moreover, several of the Google results point to the bug trackers of
> various projects. (These results also show .pyc files being stored in
> VCS repositories, but that is a separate, unrelated problem.)
> I even found a blog post about using .pyc as a backdoor:
> http://secureallthethings.blogspot.fr/2015/11/backdooring-python-via-pyc-pi-wa-si_9.html
>
> My idea to kill two birds with one stone would be to add a hash (most
> likely a cryptographic one) of the .py file to the .pyc file, and to
> read the .py file and check the hash.
> On the first startup, the additional cost would be just the hash
> calculation, which I think is cheap compared to other factors
> (especially input/output).
> On subsequent startups, the main additional cost would be the extra
> read of the .py files, plus the cheap hash calculations.
>
> I believe removing these bugs would be worth the performance cost.
>
> I know that some use cases rely on shipping only the .pyc and not
> keeping the .py around, for example to avoid distributing the source
> files.
> But in my view, this use case should be handled by an opt-in decision
> and not by default. Several opt-in mechanisms could be envisioned:
> environment variables, command-line switches, or a special compilation
> of the .pyc that explicitly asks not to check the hash.
>
> --
> Xavier
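
For illustration only, the check proposed in the quoted message could be
sketched roughly as follows. This is not how CPython's import system
works; the function names are hypothetical, SHA-256 is an assumed choice
of hash (the proposal only says "likely to be cryptographic"), and the
stored digest is assumed to have been saved when the .pyc was written.

```python
import hashlib
from pathlib import Path


def source_digest(source_path: Path) -> bytes:
    """Return the SHA-256 digest of the source file's bytes.

    SHA-256 is an assumption; the proposal does not name a hash.
    """
    return hashlib.sha256(source_path.read_bytes()).digest()


def cache_is_valid(source_path: Path, stored_digest: bytes) -> bool:
    """Hypothetical check: trust the cached bytecode only if the source
    still exists and still hashes to the digest recorded with the cache.
    """
    try:
        current = source_digest(source_path)
    except FileNotFoundError:
        # The module was deleted: a leftover .pyc must not be used.
        return False
    # A changed source yields a different digest, whatever the
    # timestamps on the two files say.
    return current == stored_digest
```

Both cases from the original message (a deleted .py, and a .pyc that
looks newer but was built from older source) fail this check. The price
is the one pointed out in the reply: the .py has to be opened, read and
hashed each time the .pyc is considered.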