[issue29708] support reproducible Python builds

Brett Cannon report at bugs.python.org
Sun Jan 14 13:19:03 EST 2018


Brett Cannon <brett at python.org> added the comment:

As Eli's comments are coming off as negative to/at me, I feel like I have
to defend myself here. If you look at the commit there were actually two
places where the timestamp was checked; one did an equality comparison and
one did a >= comparison. It's quite possible the semantics accidentally
changed as part of the refactoring due to the check being done in different
places and a different one was copied, although no one has even noticed
until now.
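
To make the difference between the two checks concrete, here is a minimal
sketch. The function names are hypothetical and this is not the actual
importlib code; it only illustrates an equality comparison versus a >=
comparison on integer timestamps.

```python
def is_fresh_exact(pyc_mtime: int, source_mtime: int) -> bool:
    """Equality check: the timestamp recorded in the bytecode must match
    the source file's mtime exactly."""
    return pyc_mtime == source_mtime


def is_fresh_at_or_after(pyc_mtime: int, source_mtime: int) -> bool:
    """>= check: bytecode dated at or later than the source is accepted."""
    return pyc_mtime >= source_mtime


# Illustrative values only: bytecode stamped later than its source.
source_mtime = 1000
pyc_mtime = 1500

exact_ok = is_fresh_exact(pyc_mtime, source_mtime)          # rejected
lenient_ok = is_fresh_at_or_after(pyc_mtime, source_mtime)  # accepted
```

With the equality check, any disagreement between the recorded timestamp and
the source mtime invalidates the bytecode; with the >= check, only bytecode
older than its source is rejected.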

If there is a desire to change the semantics of how timestamps are checked
then that should be done in a separate issue as at this point we have lived
with the current semantics for several releases -- all releases of Python 3
still receiving security updates -- so it's past being a bug and is now
the semantics in Python 3.

On Sat, Jan 13, 2018, 16:57 Eli Schwartz, <report at bugs.python.org> wrote:

>
> Eli Schwartz <eschwartz93 at gmail.com> added the comment:
>
> So, a couple of things.
>
> It seems to me that properly supporting SOURCE_DATE_EPOCH means using
> exactly that and nothing else. To that end, I'm not entirely sure why
> things like --clamp-mtime even exist, as the original timestamp of a source
> file doesn't seem to have a lot of utility and it is better to be entirely
> predictable. But I'm not going to argue that, except insomuch as it seems
> IMHO to fit better for python to just keep things simple and override the
> timestamp with the value of SOURCE_DATE_EPOCH.
>
> That being said, I see two problems with python implementing something
> analogous to --clamp-mtime rather than just --mtime.
>
>
> 1) Source files are extracted by some build process, and remain untouched.
> Python generates bytecode pinned to the original time, rather than
> SOURCE_DATE_EPOCH. Later, the build process packages those files and
> implements --mtime, not --clamp-mtime. Because Python and the packaging
> software disagree about which one to use, the bytecode fails.
>
> 2) Source files are extracted, and the build process even tosses all
> timestamps to the side of the road, by explicitly `touch`ing all of them to
> the date of SOURCE_DATE_EPOCH just in case. Then for whatever reason
> (distro patches, 2to3, the use of `cp`) the timestamps get updated to
> $currentime. But SOURCE_DATE_EPOCH is in the future, so the timestamps get
> downdated. Python bytecode is generated by emulating --clamp-mtime. The
> build process then uses --mtime to package the files. Again, because Python
> and the packaging software disagree about which one to use, the bytecode
> fails.
>
> Of course, in both those cases, blindly respecting SOURCE_DATE_EPOCH will
> seemingly break everything for people who use --clamp-mtime instead. I'm
> not happy with reproducible-builds.org for allowing either one.
>
> I don't think python should rely on --mtime users manually overriding the
> filesystem metadata of the source files outside of py_compile, as that is a
> hack that I think we'd like to remove if possible... that being said, Arch
> Linux will, on second thought, not be adversely affected even if py_compile
> tries to be clever and emulate --clamp-mtime to decide on its own whether
> to respect SOURCE_DATE_EPOCH.
>
> Likewise, I don't really expect people to try to reproduce builds using a
> future date for SOURCE_DATE_EPOCH. On the other hand, the reproducible
> builds spec doesn't forbid it AFAICT.
>
> But... neither of those mitigations seems "clean" to me, for the reasons
> stated above.
>
> There is something that would solve all these issues, though. From reading
> the importlib code (I haven't actually tried smoketesting actual imports),
> it appears that Python 2 accepts any bytecode that is dated at or later
> than the timestamp of its source .py, while Python 3 requires the
> timestamps to perfectly match. It seems bizarre for the two to behave differently,
> especially as until @bmwiedemann mentioned it on the GitHub PR I blindly
> assumed that Python would not care if your bytecode is somehow dated later
> than your sources. If the user is playing monkey games with mismatched
> source and byte code, while backdating the source code to *trick* the
> interpreter into loading it... let them? They can break their stuff if they
> want to!
>
> On looking through the commit logs, it seems that Python 3 used to do the
> same, until
> https://github.com/python/cpython/commit/61b14251d3a653548f70350acb250cf23b696372
> refactored the general vicinity and modified this behavior without warning.
> In a commit that seems to be designed to do something else entirely. This
> really should have been two separate commits, and modifying the import code
> to more strictly check the timestamp should have come with an explanatory
> justification. Because I cannot think of a good reason for this behavior,
> and the commit isn't giving me an opportunity to understand either. As it
> is, I am completely confused, and have no idea whether this was even
> supposed to be deliberate.
> In hindsight it is certainly preventing nice solutions to supporting
> SOURCE_DATE_EPOCH.
>
> ----------
> nosy: +eschwartz
>
> _______________________________________
> Python tracker <report at bugs.python.org>
> <https://bugs.python.org/issue29708>
> _______________________________________
>
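
For reference, the mismatch described in the quoted scenarios can be sketched
as follows. This is only an illustration under stated assumptions: the
function names are hypothetical, timestamps are plain integers, and the two
helpers model the tar options mentioned above (--mtime sets every timestamp
to the given value, --clamp-mtime only lowers timestamps that are newer than
it). It is not how py_compile or any packaging tool is actually implemented.

```python
def override_mtime(source_mtime: int, epoch: int) -> int:
    """Model of --mtime: always use the SOURCE_DATE_EPOCH value."""
    return epoch


def clamp_mtime(source_mtime: int, epoch: int) -> int:
    """Model of --clamp-mtime: use the epoch only if the file is newer."""
    return min(source_mtime, epoch)


def pyc_matches(recorded_mtime: int, packaged_mtime: int) -> bool:
    """Strict check: the recorded and packaged timestamps must be equal."""
    return recorded_mtime == packaged_mtime


# Scenario 1 with illustrative numbers: the source is older than the epoch.
source_mtime = 1000
epoch = 2000

recorded = clamp_mtime(source_mtime, epoch)     # bytecode keeps the original time
packaged = override_mtime(source_mtime, epoch)  # packager rewrites to the epoch

still_valid = pyc_matches(recorded, packaged)   # the two tools disagree
```

Under a strict equality check the bytecode is considered stale whenever the
two tools pick different policies; if both used the same policy, the recorded
and packaged timestamps would agree.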

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue29708>
_______________________________________

