[Catalog-sig] [Draft] Package signing and verification process

Wed Feb 6 21:38:59 CET 2013

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06.02.2013 19:05, Giovanni Bajo wrote:

> On 06 Feb 2013, at 18:20, Zygmunt Krynicki
> <zygmunt.krynicki at canonical.com> wrote:

Meta note: I suspect we've covered enough ground here to focus on some
proof-of-concept implementation. Just talking will not help us or
anyone else. I suspect we can agree to disagree on some topics. I
also want to re-affirm that I will gladly take _all_ of the general
security features that were discussed in this thread (like https for
pypi, some keyring / key id managed by pypi so that I can verify stuff
to some degree if I choose to do so).

>>> Can you please describe the UX you are devising? Say I start
>>> blank, I want to install 3-4 packages that globally bring
>>> another 10 packages as dependencies (made by different
>>> developers). What would be your suggested workflow? What should
>>> a user do?
>> 
>> You would first download django (either signed or not) and get 
>> prompted if you want to trust the signer for that project (or if
>> the file was not signed, to trust this particular file for django
>> in the future).
>> 
>> As you go you would see many prompts like that, roughly identical
>> to what you get when SSH-ing to a new system.
>> 
>> Note: At each step you could stop to manually examine the
>> freshly downloaded file. The user interface should be brief but
>> to the point.
> 
> You are not describing what a user should do when asked for this
> confirmation. For instance:
> 
> $ bin/pip install geventconnpool
> Downloading/unpacking geventconnpool
>   Downloading geventconnpool-0.2.tar.gz
> WARNING: the package was signed with the following GPG key
>   Giovanni Bajo <rasky at develer.com> E870D9A369B8348A
> Do you want to continue and trust this key (y/n)?

Sorry, I implicitly thought of the distrust example I wrote elsewhere.
Here's a copy of that:

$ distrust trust person "Bob Developer <bob at example.com>" with project
"pypi:useful-tool"
[distrust] Looking up "Bob Developer <bob at example.com>"
[distrust] Found 1 personality:
[distrust] pub   2048R/FA519244 2013-02-03 [expires: 2015-02-03]
[distrust] uid                  Bob Developer <bob at example.com>
[distrust] sub   2048R/A63BAB03 2013-02-03 [expires: 2015-02-03]
[distrust] Are you sure you want to trust:
[distrust] "Bob Developer <bob at example.com>" with project
"pypi:useful-tool"?
[distrust] [yes or no] > yes
[distrust] Creating signed trust record...
// GPG invoked to sign the record
[distrust] Adding trust record to the database...
[distrust] Done

Distrust (the executable) would be invoked by pip (or any other tool)
to "check" each download.
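
To make that hook concrete, here is a minimal sketch of how an installer
could call out to an external checker. This is not actual pip or distrust
code: the function names, the "check" sub-command and the exit-code
convention (0 means trusted) are all assumptions made for illustration.

```python
import subprocess


def check_download(checker_cmd, project, path):
    """Run an external checker and report whether it accepted the file.

    checker_cmd is an argument list such as ["distrust", "check"].
    Both that sub-command and the convention that exit code 0 means
    "trusted" are hypothetical, not a documented interface.
    """
    result = subprocess.run(list(checker_cmd) + [project, path])
    return result.returncode == 0


def install(checker_cmd, project, path):
    """Refuse to go any further unless the checker accepted the download."""
    if not check_download(checker_cmd, project, path):
        raise RuntimeError("refusing to install untrusted file: %s" % path)
    # A real installer would unpack and install here.
    return "installing %s from %s" % (project, path)
```

The point of keeping the checker a separate executable is that any tool
(pip, distribute/setuptools, something else) can use the same hook without
linking against the trust database code.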

> What should a user do now?

The user may either:

1) Trust the developer and blindly accept. This is not so bad as we at
least know that that particular developer creates the given package
and will keep trusting him with that package until a malware outbreak
forces us to revoke that trust. (this is obviously not true if we are
installing the package for the first time and are being attacked at
the same time via MITM)

2) Inspect the package and make an educated decision or seek qualified
help. Ironically this is the best choice for security.

3) Seek peer knowledge, perhaps among their colleagues or friends,
online etc. This may allow the user to connect their trust chain to a
peer and grow the subset of software that can be installed without prompt.

4) Use some trust chain mandated by their organization (it is an
interesting use case that I just thought of) where the administrator
can centrally manage trusted pip packages that users/developers
frequently install.

> Most users will just tap "yes" to get on with their task and ignore
> this prompt.

I have no solution to that. Ignorance is not something that can be
fixed with technology. Luckily the python community is so far rather
safe in its niche, so attacks are rare (I've yet to hear about one).

>> The most important aspect is what happens the second time you
>> install those packages -- you never get prompted (unless a
>> package was unsigned and got updated in the mean time).
>> 
>> I realize this interface is not perfect but it solves practically
>> all of the current issues. Most importantly it can be applied to
>> all existing software today, so we get the benefits without
>> asking everyone to fix their story.
> 
> I disagree. You are asking users to manually go verifying a GPG
> fingerprint (actually, many more than one, in case of dependencies)
> without giving any indication on how to do it. What will happen is
> that they will just tap "yes" without confirming, making the whole
> security castle invalid.

It is important to point out that I don't require the user to verify
the fingerprint. The system will just remember which fingerprint was
initially trusted by the user for a particular package. This is far
easier to do in practice, as the person can be long gone from this
world while their signature on a package (or the actual exact released
package) is still valid and trustworthy.
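
A rough sketch of that "remember what was trusted first" behaviour, in
Python. It is only an illustration under my own assumptions, not the
distrust implementation: the class and method names are made up, the
signature is assumed to have been verified by GPG elsewhere, and unsigned
files are pinned by their SHA-256 digest.

```python
import hashlib


class TrustStore:
    """Remembers which signer (or exact file) was accepted per project."""

    def __init__(self):
        # project name -> set of accepted identities
        self._records = {}

    @staticmethod
    def identity_of(data, signer_fingerprint=None):
        """Signed files are identified by signer, unsigned ones by digest."""
        if signer_fingerprint is not None:
            return ("key", signer_fingerprint)
        return ("sha256", hashlib.sha256(data).hexdigest())

    def check(self, project, data, signer_fingerprint=None):
        """Return "trusted", "unknown" (never seen) or "mismatch"."""
        accepted = self._records.get(project)
        if accepted is None:
            return "unknown"
        identity = self.identity_of(data, signer_fingerprint)
        return "trusted" if identity in accepted else "mismatch"

    def trust(self, project, data, signer_fingerprint=None):
        """Record the user's decision to accept this signer or file."""
        identity = self.identity_of(data, signer_fingerprint)
        self._records.setdefault(project, set()).add(identity)
```

Only "unknown" and "mismatch" would lead to a prompt. A signed package
whose signer was accepted once stays "trusted" across new releases, while
an unsigned file that got updated shows up as a "mismatch".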

You are right that the "next, next, next" risk is real but it all
depends on wording and some semi-usable interface. We should encourage
users to verify that software is not malware and share that database.

> Compared to SSH, there are many important differences here:
> 
> 1) The number of servers you need to SSH into is far more limited,
> and they are all entities with which you have a direct connection.
> It might be a server of your company, a VM you rented, a friend's
> computer, a website like github. In all of these cases, you have a
> clear, direct communication channel already in place in case you
> want to be paranoid and double-check the SSH host key. When you
> install a package from PyPI you have zero connection with the
> author/maintainers; in fact, that package might bring in
> dependencies you don't even know *what* they are used for, let
> alone who maintains them. There is no clear path here to check a GPG
> fingerprint, and one user might have to check 5-10 of them just to
> install a framework to hack away a simple program.

I suspect the actual use case for pip depends on the audience and that
my particular usage is not representative or may differ widely from
other people's. In my usage I would typically install fewer than 20
"root" packages and would be fine to manually look at their code the
first time I install them (if I don't know the upstream author or the
code is not signed). Given that this is a one-time job, it is
interesting (for me) and I can share the result (by offering my signed
trust records to everyone else, particularly my peers / colleagues / etc).

Still I agree that not everyone may find that acceptable.

> 2) A SSH host key on a server is almost never updated, so
> whitelisting it after the first time is a good compromise because
> it is unlikely that you're being MITM the first time you connect it
> (let's say, far more unlikely than being MITM'd in one of all the
> other times you connect to it). The only real reason to update it
> is in case of a compromise, or when the IP address changes (in that
> case the host key is technically not changed, but SSH will still
> bail). On the contrary, the GPG key used to sign a package might
> change more often (not *daily*, but still more often than a SSH
> hostkey): different maintainers might have different keys (and all
> of them might be valid for that package); moreover, maintainers
> change over time for the same packages. You have absolutely no side
> channel to double-check this.
> 
> 3) SSH has a per-usage storage for such a whitelist
> (~/.ssh/known_hosts). On the contrary, it is common to have a
> different pip installation (even different pip versions!) per each
> virtualenv, and virtualenv is usually thought to be sort-of a
> chroot for pip. That might bring to the point where the known_hosts
> for pip would be per-virtualenv; but that would be a disaster for
> the UX, because it would mean that one would have to re-approve GPG
> fingerprints per each virtualenv (or copy known_hosts between
> different virtualenvs).

I strongly believe that a user should have only one trust database per
system and an easy way to replicate that to other systems. Otherwise
the system would be far too tedious to be practical.
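
As a sketch of what "one database, easy to replicate" could look like:
the JSON layout, the function names and the idea of a single per-user
file are my assumptions for illustration, nothing that exists today.
Records coming from another system should of course have their
signatures checked before merging; that step is left out here.

```python
import json
import os


def load_records(path):
    """Load {project: set(identities)}; a missing file means no records."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as stream:
        raw = json.load(stream)
    return {project: set(ids) for project, ids in raw.items()}


def save_records(path, records):
    """Write records as sorted JSON so the file is stable and diffable."""
    serializable = {project: sorted(ids) for project, ids in records.items()}
    with open(path, "w", encoding="utf-8") as stream:
        json.dump(serializable, stream, indent=2, sort_keys=True)


def merge_records(local, imported):
    """Union of both databases; neither input is modified."""
    merged = {project: set(ids) for project, ids in local.items()}
    for project, ids in imported.items():
        merged.setdefault(project, set()).update(ids)
    return merged
```

Every virtualenv would read the same file, so nothing has to be
re-approved per environment, and copying or merging that one file is all
it takes to bring another machine up to date.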

> NOTE: I would like to stress that also the solution in which you
> trust PyPI "solves practically all of the current issues". It is
> just that you need to trust PyPI. I don't think this classifies as
> an issue per-se. As I said, I think that most people implicitly
> trust PyPI if they "go there" to download a file. I understand that
> you're striving to remove this trust, but I don't think it's an
> issue.

As noted earlier, it only solves the issue where you trust the
software anyway and for that https is probably just as good without
all the hassle. It's not exactly as good but the attack surface is
mostly limited to people that can exploit the pypi archive or
malicious administrators.

Pypi is not a trust source of _any_ kind unless we want to react to
malware by taking things down, thus becoming a gatekeeper.

> The amount of code effort required to automatize the above features
> in a way to make them usable by users that don't even have a GPG
> key in their keychain (as a normal pip user is) is quite big.

I agree this is not easy. I cannot know who the typical pip user is,
barring any stats. Of _all_ the developers that I know only one does
not use GPG or have their private key. Still, my usage is probably not
representative.

>>> I think that what we are suggesting here instead is a good 
>>> compromise between security and usability, just like the
>>> current CAs in SSL have been for many years a good compromise
>>> between the final solution (yet to be devised) and using only
>>> self-signed certificates. In fact, CAs are still a very good
>>> compromise that

If javascript ran as your user on your local account, without any
sandbox, would you be still browsing the web?

If pypi grows out of the niche that it currently is, the security
system, as described here, will instantly fall apart (full of signed
spyware, malware and trojan software). I hope that never happens.

>>> work for 99.9999% of people and websites. I understand that we
>>> will need a final solution for SSL, but I think that for PyPI,
>>> forcing your suggestion is basically an instance of making
>>> perfect the enemy of good.
>> 
>> I agree that the proposed solutions protect against many classes
>> of attacks and I will welcome them with open arms.
>> 
>> In any case, nothing here prevents both approaches from being 
>> developed. When combined (if someone chooses to do so) they would
>> give even stronger protection.
> 
> My suggestion at this point is to have an option to remove trust on
> PyPI. This means that pip will not ask PyPI for a list of
> per-package trusted GPG fingerprints, and will just bail if the GPG
> fingerprint is not mentioned in its configuration file.

I lost track of multiple discussions and don't know which
configuration file you are referring to here.

Thanks for the discussion.

I'll try to devote a few hours to finishing an alpha version of distrust
this week (busy busy week, sorry, I really wanted to do this earlier).

Actually getting feedback on how distrust can be improved and integrated
(optionally) with pip (actually probably distribute/setuptools more
than pip) will allow us to progress.

Best regards
ZK

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJREr9jAAoJECiU6TooxntHrbQP/1Z6C6uVz6TZSWmrzoTL2zzi
du9byjYEFF05DZXlDEY7AmQ6lA8d+CAseKbrU5CzPqyHvDwGbGO5P49Xb8qmMfbO
+gangssMUEOr1V67aaolHS5t67MavUjImXkHVjx7iAThTgBcNpPnQiBrKsGrgPZQ
hwFWB0F3ozzvKPO3zDv0jcX/XSTwd8cBG1eaxd8OXEDxQ885B0t7dIANse6QmyAw
yvmEentLqfvFilWjyH5aKpoGjCskfRPEetUyqZlAwwg9pbcezwyqklxgZeRGmchF
AQ2Yx7f53e5jc8p4W6yVuvGRqyNhXWAyipw0/bNnuuX7SpeQEz5KPiAiSqvbQZ10
09kGnVG0OhtB8yb8EmKqhivA6XWXyMZtk19D1mmWUsbx80K63n96csw7sY2gEmxk
ExT3xWX1UNkTcg1uSVn8GAJ9uRkivjMC1AzIOz+ffYJQp6aeAd7+5zrn4ffyAxas
InshN7QFQbhlwEaBl2Pl+yoB1DVNXBijORL+ClaPkWjz8Iq2eOfi+XS7Ue4Gfs/K
4Ia2Ntk+EGAB3ZeSZugb6lZuJO1S80Gl2jz3ISPSX2Ub88ZEAuU5rdKein0EOmqb
Cd8huwqFvrDDTLujIlF5pmbMVIe9LIsJhAG48PNOOQnLM0DS+CvUzkJoGJf4yLtM
xE3JEjzaklJhH+uRf8pD
=Y8Xs
-----END PGP SIGNATURE-----