[Distutils] PEP470, backward compat is a ...

holger krekel holger at merlinux.eu
Sun May 18 08:20:21 CEST 2014


On Sat, May 17, 2014 at 20:20 -0400, Donald Stufft wrote:
> On May 17, 2014, at 1:51 PM, holger krekel <holger at merlinux.eu> wrote:
> 
> > On Sat, May 17, 2014 at 11:32 -0400, Donald Stufft wrote:
> >> More conclusions!
> >> 
> >> In that same time period PyPI received a total of ~16463209 hits to a page on
> >> the simple installer API. This means that in total these projects represent
> >> a combined 0.56% of the simple installer traffic on PyPI. However looking at
> >> the numbers you can see that PIL is an obvious outlier with the hits dropping
> >> drastically after that. PIL on its own represents 0.44% of the hits on PyPI
> >> during that time period leaving only 0.12% for anything not PIL.
> > 
> > So the current numbers roughly mean that around 92193 end-user sites per
> > day depend on crawling currently, right?  Do you know if these are also
> > unique IPs (they might indicate duplicates although companies also have NATting
> > firewalls)?
> > 
> > holger
> 
> Here’s the number of IP addresses that accessed each /simple/ page per day.
> 
> https://gist.github.com/dstufft/347112c3bcc91220e4b2
> 
> Unique IPs: 95541
> Unique IPs for Only Hosted off PyPI: 8248 (8.63%)
> Unique IPs for Only Hosted off PyPI w/o PIL: 2478 (2.59%)
> 
> It's important to remember when looking at these numbers that almost all of
> them represent something downloading a package unsafely which will generally
> contain Python code that will then be executed. Breaking the unsafe thing
> is, in my opinion, non-optional and the only thing needed to be discussed about
> it is how to go about doing it exactly. The safe thing I think *should* be
> removed for the various other reasons that have been outlined and it only
> represents a tiny fraction of uses.
> 
> The numbers to be specific are, 8248 of the above 8248 IPs downloaded something
> unsafely, while 214 of them also downloaded something safely. That means that
> 100% of the 8248 addresses could have been attacked through their use of PyPI
> and only 2.59% downloaded anything that was safely hosted off of PyPI.
> 
> Looking at the same numbers for projects which have *any* files hosted off of
> PyPI (the numbers thus far have been projects which have *only* files hosted
> off of PyPI) I see that 35046 IP addresses accessed a project that had any
> files hosted unsafely off of PyPI while only 2852 IP addresses accessed a
> project that had any files hosted safely off of PyPI.
> 
> That means that a minimum of roughly 36% of the users of PyPI were
> vulnerable to a MITM attack on 2014-05-14 unless they were using pip 1.5
> without any --allow-unverified flags or they were using pip 1.4 with
> --no-allow-insecure, and even in that case they could still be vulnerable if
> there is any use of setup_requires. I say that's a minimum because that only
> counts the projects where I happened to find a file hosted unsafely externally.
> It does not count any projects for which I did not find such a file but
> which still have such locations on their simple page. This is especially
> troublesome for projects where they have old domain names in those links that
> point to domains that are no longer registered.
> 
> Also just FYI I've removed pyPDF from both lists as I've contacted the author
> and there are packages now hosted on PyPI for it. I've also contacted PIL and a
> few other authors (of which I've just heard back from cx_Oracle and they appear
> to be willing to upload as well).

Thanks Donald for both the numbers and for contacting some key authors, which
I think is a very good move!  I suggest we now wait a week or so to see
where we stand then, update the numbers and then try to settle on
crawl-deprecation paths.
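
For reference when the numbers get updated, here is a small sketch that
recomputes the shares from the unique-IP counts quoted above.  The counts
are copied from Donald's message; the helper name `share` and the dict
keys are made up for illustration only.

```python
# Unique-IP counts copied from the quoted message.
total_ips = 95541            # all unique IPs hitting /simple/ pages
only_external = 8248         # IPs touching projects hosted only off PyPI
only_external_no_pil = 2478  # same, leaving out PIL
also_safe = 214              # of the 8248, those that also downloaded safely
any_unsafe_external = 35046  # IPs touching projects with any unsafe external file


def share(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


shares = {
    "only_external": share(only_external, total_ips),
    "only_external_no_pil": share(only_external_no_pil, total_ips),
    "also_safe_of_only_external": share(also_safe, only_external),
    "any_unsafe_external": share(any_unsafe_external, total_ips),
}

for name, value in sorted(shares.items()):
    print("%-28s %6.2f%%" % (name, value))
```

Note that the 2.59% figure shows up twice for different reasons: once as
2478 of 95541 and once as 214 of 8248.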

Also, let's please just talk about "checksummed" packages or integrity.
Even pypi-hosted packages are unsafe in the sense that they
might contain bad code from malicious uploaders or http-interceptors
that executes on end-user machines during installation.  Thus the term
"safe" is misleading and should not be used when communicating to
end-users.  Currently, we can only say or improve anything related to
integrity: what people download is what was uploaded by whoever happened
to have the credentials (*) or MITM access on http upload.  Speaking of the
latter, maybe we should also think about moving to https uploads and
certificate-pinning, for installers as well.  And also, as Marius
pointed out, pypi is currently using the relatively weak MD5 hash.

Without resolving these issues we cannot even truthfully declare
integrity as something that the pypi-hosted packages themselves are providing.
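
To make concrete what "checksummed" means here, a minimal sketch of an
integrity check using only Python's hashlib.  This is not how pip or pypi
implement it; the function names `file_digest` and `matches_published_digest`
are hypothetical, and SHA-256 is just one example of a stronger hash than
MD5.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks and return the hex digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_digest(path, expected_hex, algorithm="sha256"):
    """True if the file's digest equals the published one (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A match only tells us that the downloaded bytes are the ones the digest was
computed over.  It says nothing about who uploaded them or whether the
digest itself was fetched over a trustworthy channel.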

best,
holger

(*) did you happen to run some password crackers against
the pypi database?  Might be a larger attack vector than hijacking
DNS entries.


