[Python-ideas] Re: Enhancing Zipapp

Andrew Barnert abarnert at yahoo.com
Wed Jan 8 12:29:51 EST 2020


On Jan 8, 2020, at 01:09, Abdur-Rahmaan Janhangeer <arj.python at gmail.com> wrote:
> 
> But now, a malicious program might try to modify the info file
> and modify the hash. One way to protect even the metadata is
> to hash the entire content
> 
> folder/
>     file.py # we can add those in a folder if needed
>     __main__.py
>     infofile
> 
> Then after zipping it, we hash the zipfile then append the hash to the zip binary
> 
> [zipfile binary][hash value]

How does this solve the problem? A malicious program that could modify the hash inside the info file could even more easily modify the hash at the end of the zip.
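
To make that concrete, here is a minimal sketch of the appended-hash scheme. SHA-256 is an assumption (the proposal doesn't name an algorithm), and `seal` and `self_check` are hypothetical names used only for illustration. It shows that an attacker who can rewrite the file can reseal it just as easily as the original author:

```python
import hashlib

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes; SHA-256 is an assumption


def seal(payload: bytes) -> bytes:
    """Append the digest of the payload: [zipfile binary][hash value]."""
    return payload + hashlib.sha256(payload).digest()


def self_check(blob: bytes) -> bool:
    """Compare the trailing digest with the bytes that precede it."""
    payload, digest = blob[:-HASH_LEN], blob[-HASH_LEN:]
    return hashlib.sha256(payload).digest() == digest


original = seal(b"print('hello')")
# The attacker drops the old digest, changes the payload, and reseals it.
tampered = seal(b"print('pwned')")

print(self_check(original))  # True
print(self_check(tampered))  # True
```

Both blobs pass, because the check only compares the file against itself.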

Existing systems deal with this by recognizing that you can’t prevent anyone from hashing anything they want, so you either have to store the hashes in a trusted central repo, or (more commonly, because it has multiple advantages) sign them with a trusted key. If a malicious app modified the program and modified the hash, it’s going to be a valid hash; there’s nothing you can do about that. But it won’t be the hash in the repo, or it’ll be signed by the untrusted author of the malicious program rather than the trusted author of the app, and that’s why you don’t let it run. And this works just as well for hashes embedded inside an info file inside the zip as for hashes appended to the zip.
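
A minimal sketch of the trusted-repo variant, under stated assumptions: `TRUSTED_REPO` is a hypothetical stand-in for a central repo that the attacker cannot write to, the app name `myapp.pyz` is made up, and SHA-256 is assumed. Signing would need a public-key library and is not shown here.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Hex SHA-256 of the archive bytes (the algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


genuine = b"print('hello')"
tampered = b"print('pwned')"

# Hypothetical trusted repo: app name -> digest published by the real author.
# The whole point is that this lives somewhere the attacker cannot modify.
TRUSTED_REPO = {"myapp.pyz": digest(genuine)}


def is_trusted(name: str, data: bytes) -> bool:
    """Accept the archive only if its digest is the one the repo lists."""
    expected = TRUSTED_REPO.get(name)
    return expected is not None and hmac.compare_digest(expected, digest(data))


print(is_trusted("myapp.pyz", genuine))   # True
print(is_trusted("myapp.pyz", tampered))  # False
```

The tampered archive still has a perfectly valid hash; it is rejected only because that hash is not the one the repo vouches for.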

And there are advantages to putting the hash inside. For example, if you want to allow downstream packagers or automated systems to add distribution info (this is important if you want to be able to pass a second code signing requirement, e.g., Apple’s, as well as the zipapp one), you just have a list of escape patterns that say which files are allowed to be unhashed. The rules are then simple. Anything that appears in the info file must match its hash, or the archive is invalid. Anything that doesn’t appear in the info file is fine if it matches the escape patterns, and makes the archive invalid if it doesn’t. So downstream distributors can add extra files, as long as those files match the escape patterns.

(The escape patterns can be configurable; you just need them to be specified by something that is itself covered by the hash. But you definitely want a default that works 99% of the time, because if developers and packagers have to think it through in every case instead of only in exceptional cases, they’re going to get it wrong, and nobody will have any idea who to trust to get it right.)
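
The validation rules above can be sketched roughly as follows. Everything about the format is an assumption for illustration: a JSON info file named `infofile` with `hashes` and `escape_patterns` keys, SHA-256, glob-style patterns, and the `META-INF/*` pattern itself. The sketch also treats a listed-but-missing file as invalid, and it leaves out verifying the info file against a signature or trusted repo, which you would still need.

```python
import fnmatch
import hashlib
import io
import json
import zipfile

INFO_NAME = "infofile"  # name from the quoted layout; the JSON format is assumed


def make_info(files: dict, escapes: list) -> bytes:
    """Build the info file: one hash per file, plus the escape patterns."""
    hashes = {n: hashlib.sha256(d).hexdigest() for n, d in files.items()}
    return json.dumps({"escape_patterns": escapes, "hashes": hashes}).encode()


def make_zip(members: dict) -> bytes:
    """Write the given name -> bytes mapping into an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def verify(archive: bytes) -> bool:
    """Apply the rules: listed files must match, unlisted ones must be escaped."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = json.loads(zf.read(INFO_NAME))
        hashes, escapes = info["hashes"], info["escape_patterns"]
        names = set(zf.namelist())
        for name, expected in hashes.items():
            if name not in names:
                return False  # assumption: a listed file may not be missing
            if hashlib.sha256(zf.read(name)).hexdigest() != expected:
                return False
        for name in names - set(hashes) - {INFO_NAME}:
            if not any(fnmatch.fnmatchcase(name, p) for p in escapes):
                return False
    return True


files = {"__main__.py": b"print('hello')", "file.py": b"x = 1"}
info = make_info(files, ["META-INF/*"])

good = make_zip({**files, INFO_NAME: info})
packaged = make_zip({**files, INFO_NAME: info, "META-INF/dist.txt": b"added downstream"})
stray = make_zip({**files, INFO_NAME: info, "evil.py": b"import os"})
tampered = make_zip({**files, "file.py": b"x = 2", INFO_NAME: info})

print(verify(good), verify(packaged), verify(stray), verify(tampered))
# True True False False
```

The downstream packager's extra file is accepted without touching the info file, while the stray file and the modified file are both rejected.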



