[Distutils] PEP 438, pip and --allow-external (was: "pip: cdecimal an externally hosted file and may be unreliable" from python-dev)

Donald Stufft donald at stufft.io
Fri May 9 23:33:59 CEST 2014


On May 9, 2014, at 1:41 PM, Paul Moore <p.f.moore at gmail.com> wrote:

> On 9 May 2014 16:56, Donald Stufft <donald at stufft.io> wrote:
>> Right, but I think a similar win can be had just by folding --allow-external
>> into --allow-unverifiable and make it --allow-off-pypi (needs a better name,
>> maybe just keep it as --allow-external?). This would effectively mean that
>> an end user cannot say "allow safe downloading X externally but disallow
>> downloading it unsafely externally".
> 
> I still find this hard to understand. If I get what you're saying, you
> would rather have a single flag that claims to be to allow externally
> hosted files to be downloaded, regardless of whether they are safe or
> not than have a clean security model that says you need to opt into
> downloading unverifiable files simply to avoid allowing users to
> download argparse (or any of the other 0.x% of files that are safe but
> external) by default?
> 
> Once again, I'm struggling to see why *safe* externally hosted files
> are such a bad thing.

- Introduces the chance of package specific random failures that the Python
  Infra team have no ability to fix (We have someone on call all the time).

- Makes it harder for people to install some packages in restricted
  environments where they need to ask special permission to add individual
  hosts to a firewall.

- Even when it does work, there is a good chance people who use projects that
  are hosted externally are going to experience slower, higher-latency
  downloads, as it's unlikely that they are going to be hosted behind a
  geo-distributed CDN like Fastly.

- Makes it harder for people to host their own mirrors of PyPI. If it's hosted
  on PyPI, people can legally download it and distribute it; if it's hosted
  externally, they may or may not be able to do that. This means that people
  must manually mirror packages that are not hosted on PyPI instead of having
  software like bandersnatch handle it all completely.

- Is surprising behavior to most people.

- Is complicated to explain and implement.

- Is useful to practically nobody.

> 
>> I'm normally someone who advocates towards better decisions on the security
>> side of things, however if most people are going to need to use the
>> --allow-unverifiable flag anyways then I think the benefits of having the
>> two separated isn't very large. There is still a benefit to not installing
>> externally hosted things by default which is why I think that just rolling
>> the two options together is better.
> 
> This is what bothers me about your position. I would expect you to be
> insisting that unverifiable downloads *have* to be opt-in, and that's
> why I've never advocated removing or changing the meaning of the
> --allow-unverifiable flag. I agree with that position, and want things
> to stay as they are for unverifiable links. And yet you seem to be in
> favour of diluting that straightforward, strong security message just
> to make users opt into a tiny minority of files that are completely
> safe to download, but which are not hosted on PyPI.

So I do unequivocally believe that unsafe downloads *must* be opt in by
default.

I also believe that external downloads *should* be opt in by default.

In the current situation we have two knobs that control these independently.
The paranoid security person in me loves this because it means that for some
set of projects I can still opt in to the reliability hit but not the security
hit. However the UX person in me hates this because more knobs is more
confusion, especially in this situation because the line between what is
external+safe and what is external+unsafe isn’t very easy to explain.

So in my mind I've had to reconcile these two viewpoints, and when I
look at the set of projects which are utilizing the external+safe hosting
option I cannot find anything that tells me that many people are ever going
to utilize the external+safe option, because the fact is projects simply are
not using it in any meaningful numbers.

There are currently 0.06% (23 total) of projects on PyPI that have *all* of
their files hosted off of PyPI but done so safely. Looking closer at them, the
number that have files that will actually be installed by pip drops down to
0.04% (15).

Originally I had pointed out that 0.2% of projects host *any* files externally
but safely. Looking again more closely and removing projects which have also
uploaded all files to PyPI, or whose safely hosted external file(s) are not
otherwise suitable, I've determined that only 0.08% (32) of the projects which
I was able to discover any files for would be helped *in any way* by the
external+safe option. And looking even closer at those, only 0.07% (26) of
them would see the outcome of ``pip install whatever`` change (in other words,
the latest version requires external+safe).
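As a rough sanity check on the figures above, each count divided by its
percentage should point at about the same overall number of projects. The
percentages are rounded to two decimals, so the totals below are only
approximate, and the names ``figures`` and ``implied_total`` are made up for
this sketch:

```python
# (count, percentage) pairs quoted in the paragraphs above.
figures = {
    "all files external+safe": (23, 0.06),
    "all files external+safe, installable by pip": (15, 0.04),
    "helped in any way by external+safe": (32, 0.08),
    "latest version requires external+safe": (26, 0.07),
}


def implied_total(count, percent):
    """Approximate number of projects implied by 'count is percent% of total'."""
    return round(count / (percent / 100))


# Each entry is an estimate only, because the percentages are rounded.
totals = {label: implied_total(c, p) for label, (c, p) in figures.items()}

for label, total in totals.items():
    print(f"{label}: ~{total} projects")
```

All four pairs land in roughly the 37,000 to 40,000 range, so the numbers are
at least consistent with each other.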

So when I look at the data, I cannot make a very good claim to the UX side of
me that external+safe deserves its own option when the number of projects
which would ever use it instead of external+unsafe is minuscule, and of those
projects I can only point to argparse and mysql-connector-python as ones which
are likely to affect many people at all.

So my beliefs are (in order of priority/conviction):

1. Unsafe downloads *MUST* be opt in.
2. External downloads *SHOULD* be opt in.
3. There are not enough potential users for separate knobs that allow
   external+safe or external+unsafe individually and they should be collapsed
   into a single option.

So following those beliefs leads me to conclude that the best results for these
options are (in order of preference):

A. Collapse the two options into a single option and have it off by default.
   - Satisfies #1 and #2 because they are opt in.
   - Satisfies #3 because users don't have to futz with two different options.
   - Slightly makes me sad that people can't install external things safely,
     but the UX win makes up for it.
B. Leave the situation as is, two options and one off by default.
   - Satisfies #1 and #2 because they are opt in.
   - Throws away #3.
C. Keep the options separate, but enable --allow-all-external by default and
   re-add the --no-allow-external option.
   - Satisfies #1 because it is opt in.
   - Makes #2 sad because it is opt out.
   - Throws away #3.
D. Remove the --allow-external family of options and enable them by default and
   always.
   - Satisfies #1 because it is opt in.
   - Throws away #2.
   - Throws away #3.

There are other benefits to option (A) of course. I'm something of a public
figure with packaging so people often come to me with questions, concerns,
comments, etc. In that capacity I've had a number of people confused about
the difference between an external file and an unverifiable file. In these
cases I've had some difficulty in explaining what the difference is, especially
since a lot of people have zero idea how the installer API even works. In the
words of the Zen -> "If the implementation is hard to explain, it's a bad idea."
and I can tell you that it is quite difficult to explain it to people. The
rules are:

    If there is a <meta api-version="2"> tag:
        Trust all URLs with rel="internal" on the simple page
        Require --allow-external for any URL on the simple page that links
           directly to something that looks installable to pip and that also
           includes a hash fragment like #<hash_name>=<hash_value> which can
           be either md5, sha1, or in the sha-2 family.
        Require --allow-unverifiable for any URL on that simple page that
           directly links to something that looks installable to pip and that
           does not include a hash fragment with a hash in it. Also any URL
           that can be found by looking for URLs on the simple page that are
           linked with a rel=download or rel=homepage, fetching that page,
           and processing its HTML looking for direct links to files that
           look installable to pip.
    else:
        Trust all URLs that directly link to a file on the simple page
        Trust all URLs that can be found by looking for links with rel=download
        or rel=homepage, fetching that page, and processing its HTML looking
        for direct links that look installable to pip.

That's complex, and quite often when I explain that, the first response is
"what's a simple index?". In practice I mostly only need to explain the first
part; I skip the "else" part because most people don't install from anything
other than PyPI.
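To make the branching in the current rules concrete, here is a minimal sketch
in Python. It is not pip's actual code: ``Link``, ``required_flag`` and
``HASH_FRAGMENT`` are hypothetical names, the sketch assumes every link passed
in already "looks installable to pip", and ``scraped`` stands for "found by
crawling a rel=download or rel=homepage page" rather than listed directly on
the simple page.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hash fragment of the form #<hash_name>=<hash_value>; md5, sha1 or sha-2.
HASH_FRAGMENT = re.compile(
    r"#(md5|sha1|sha224|sha256|sha384|sha512)=[0-9a-fA-F]+$"
)


@dataclass
class Link:
    url: str
    rel: Optional[str] = None  # rel attribute on the simple page, if any
    scraped: bool = False      # found via a rel=download/homepage page


def required_flag(link: Link, api_version_2: bool) -> Optional[str]:
    """Return the flag a user must pass to use this link, or None if trusted."""
    if not api_version_2:
        # No <meta api-version="2"> tag: everything is trusted.
        return None
    if link.rel == "internal" and not link.scraped:
        return None
    if not link.scraped and HASH_FRAGMENT.search(link.url):
        # Direct link on the simple page with a hash: external but safe.
        return "--allow-external"
    # No hash, or only reachable by scraping another page.
    return "--allow-unverifiable"
```

For example, ``required_flag(Link("https://example.com/x.tar.gz"), True)``
gives ``--allow-unverifiable``, while the same URL with ``#sha256=...``
appended gives ``--allow-external``. Even in this stripped-down form there are
four outcomes to explain.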

On the flip side option (A) allows us to make this much simpler overall. We
can simply do:

    If it's hosted on PyPI:
        Trust it.
    else if it's not hosted on PyPI:
        Require a --allow-external-and-unverifiable [*]

This is *much*, *much*, *much* easier to explain, and I think it may be a good
idea à la the Zen.
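As a sketch of how little there is to explain under option (A), the whole
decision could look something like the following. The function name and the
``PYPI_HOSTS`` set are assumptions made for illustration, and the flag name is
just the placeholder used above, not a real pip option:

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: the set of hosts that count as "hosted on PyPI".
PYPI_HOSTS = {"pypi.python.org"}


def required_flag_simplified(url: str) -> Optional[str]:
    """Return the flag needed to install from this URL, or None if trusted."""
    if urlparse(url).hostname in PYPI_HOSTS:
        return None  # hosted on PyPI: trust it
    # Anything else needs the single combined opt-in (placeholder name).
    return "--allow-external-and-unverifiable"
```

There is no hash check, no rel attribute and no api-version to describe; the
only question is where the file lives.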

> 
> I'm genuinely concerned here that I'm missing a glaringly obvious
> reason why off-PyPI safe files are such a bad thing. You (and Nick,
> and the authors of PEP 438) seem to be willing to accept a lot of
> negative feeling and user unhappiness to defend making pip a
> PyPI-only-by-default tool. I'd much rather that PyPI stand on its own
> merits (which are many and compelling) rather than need a "use us or
> pip will make your life inconvenient" crutch, which is what the
> current behaviour feels like.

Actually my opinion is that allowing external+safe files by default is not
going to have any meaningful impact on *any* (or at the very least, 99.9%) of
pip's users.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
