[Catalog-sig] Deprecate External Links

Aaron Meurer asmeurer at gmail.com
Thu Feb 28 00:16:05 CET 2013


On Wed, Feb 27, 2013 at 2:31 PM, PJ Eby <pje at telecommunity.com> wrote:
> On Wed, Feb 27, 2013 at 4:04 PM, Lennart Regebro <regebro at gmail.com> wrote:
>> On Wed, Feb 27, 2013 at 8:49 PM, Monty Taylor <mordred at inaugust.com> wrote:
>>>> But wouldn't this only be a change in pip/easy_install, not PyPI
>>>> itself? I suppose you could explicitly break the external links by
>>>> having them point to nothing if you are worried about the security or
>>>> if it's some performance issue (that would indeed be a bad
>>>> compatibility break, in case people are using those for other
>>>> purposes).  Otherwise, if it's a problem, then just use the old
>>>> version of pip.
>>>
>>> If we don't remove the feature from pypi itself
>>
>> It isn't a feature of PyPI. PyPI doesn't require you to upload the
>> files to PyPI. For that reason, easy_install and PIP will scrape
>> external sites to be able to download the files.
>>
>> What we should do is agree that this should stop,
>
> So far, I don't think anybody's talking to the right "we" for stopping
> it.  It's the tools that control this, not PyPI.  (PyPI can't actually
> stop the tools from using this information without also making itself
> a lot less useful to *humans* at the same time.)
>
> As far as my personal position on the matter, I think that it's
> reasonable to deprecate the scraping of home page and download links.
> As somebody pointed out, expired domains are a potentially nasty
> problem there.
>
> OTOH, I currently make development snapshots of setuptools and other
> projects available by dumping them in a directory that's used as an
> external download URL.  Replacing that would be a PITA because PyPI
> only lets you upload and register new releases from distutils' command
> line.  Basically, I'd need to use a download link that pointed to a
> "latest" URL that redirected to the final download.
>
> Anyway, I'm not seeing much discussion here about how to help authors
> make changes to their release processes.  Note that many popular and
> long-lived projects (pywin32, PIL, etc.) have similar issues.  (Not to
> mention the newer projects that host directly from revision control.)

As far as I'm concerned, this is all about helping package
maintainers.  The way pip works now, every time I do a release
candidate, pip automatically installs it, even though I only upload it
to Google Code.  I don't want it to do this, but the only way around
it would be to (1) give it some weird name so that pip doesn't think
it is newer, (2) upload it somewhere else, or (3) go into PyPI and
remove all mentions of Google Code from the index.

And by the way, this hasn't been mentioned, but I really mean *all*
mentions of Google Code on PyPI.  pip crawls Google Code not just
because Google Code is listed as an official site for my package or
because the latest release is there, but because a single old release
points there.  So to get pip to not crawl there, I would have to go
through and remove all old mentions of Google Code, even from releases
that were made in 2006.  So you can see why the expired domain
scenario is a very real issue.  Combine that with the fact, discussed
on this list a while back, that everyone runs pip with sudo, and you
have a hacker's dream for installing malicious code on everyone's
computers.
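
To make the "single old release" point concrete, here is a rough
sketch.  It is not pip's actual code; the record layout, the field
names, and the example URL are all made up for illustration.  It only
shows that if a tool unions the links from every release, one stale
entry is enough to keep a host in the crawl set.

```python
from urllib.parse import urlparse

# Made-up release records; the field names are assumptions for this sketch.
releases = [
    {"version": "0.4.0", "download_url": "http://code.google.com/p/example/downloads"},
    {"version": "0.7.1", "download_url": None},
    {"version": "0.7.2", "download_url": None},
]

def hosts_to_crawl(releases):
    """Collect every external host mentioned by any release, old or new."""
    hosts = set()
    for release in releases:
        url = release.get("download_url")
        if url:
            hosts.add(urlparse(url).netloc)
    return hosts

print(hosts_to_crawl(releases))
```

Only the oldest record mentions an external host, yet that host still
ends up in the result.  Dropping that one record is the only thing
that empties the set.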

I also had the issue where pip was trying to install our
documentation, because I named it sympy-0.7.1-doc, which it thought
was newer than sympy-0.7.1.  Again, I only uploaded that file to
Google Code, not PyPI.
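
Something like the following would produce that behavior.  This is a
deliberately naive sketch, not the real parsing logic in pip or
easy_install: guess_version and naive_key are hypothetical helpers,
and the filenames and extensions are invented.  The assumption is that
everything after "<project>-" is treated as the version and that extra
trailing components sort higher.

```python
import re

def guess_version(filename, project):
    """Hypothetical: treat everything after '<project>-' as the version."""
    for ext in (".tar.gz", ".zip"):
        if filename.endswith(ext):
            filename = filename[: -len(ext)]
            break
    prefix = project + "-"
    if not filename.startswith(prefix):
        return None
    return filename[len(prefix):]

def naive_key(version):
    """Hypothetical sort key: numeric parts as ints, other parts as strings."""
    parts = re.split(r"[.-]", version)
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]

# Invented filenames standing in for the files found by scraping.
candidates = ["sympy-0.7.1.tar.gz", "sympy-0.7.1-doc.zip"]
newest = max(candidates, key=lambda f: naive_key(guess_version(f, "sympy")))
print(newest)
```

Under these assumptions the documentation archive wins, because its
guessed version "0.7.1-doc" has one more component than "0.7.1".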

And currently we have the issue where it tries to install the Python 2
tarball in Python 3, which is partially related to all this (it's all
part of the "gathering metadata from the filename instead of the PyPI
classifiers" problem).  If we require that people upload files, we can
additionally gather metadata only from classifiers.  If pip installs
Python 2 code in Python 3, the solution isn't to try to trick it by
some filename mangling (which won't work in easy_install, but oh
well), but rather, just set the classifier for the download like you
were supposed to in the first place, and it will just work.  With this
change if I (the package maintainer) do the right thing, pip does the
right thing.  The way it is now, if I do the right thing, pip does the
wrong thing, and to make pip do the right thing, I have to trick it
into doing so.  So for me at least, the "change to the release process"
is "stop wasting my time figuring out how to trick pip, and just do
things according to the PyPI classifier API (which I'm already doing
anyway, just pip ignores it), and everything will work".
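
As a sketch of what "just use the classifiers" could look like: the
code below assumes each uploaded file carries its own list of
classifiers, which is an assumption made for illustration and not a
description of how PyPI stores them.  The compatible function and the
filenames are hypothetical; the two classifier strings are the
standard ones.

```python
import sys

# Invented download records; per-file classifiers are an assumption here.
downloads = [
    {"filename": "sympy-0.7.2.tar.gz",
     "classifiers": ["Programming Language :: Python :: 2"]},
    {"filename": "sympy-0.7.2-py3.2.tar.gz",
     "classifiers": ["Programming Language :: Python :: 3"]},
]

def compatible(downloads, major):
    """Keep only files whose classifiers declare the given major version."""
    wanted = "Programming Language :: Python :: %d" % major
    return [d["filename"] for d in downloads if wanted in d["classifiers"]]

# Select files for whichever interpreter is running this script.
print(compatible(downloads, sys.version_info[0]))
```

Nothing here looks at the filename to decide compatibility, so no
filename mangling is needed to steer the installer.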

Aaron Meurer

>
> Given that easy_install was deliberately designed so that those guys
> would *not* need to change their hosting strategies to get automated
> downloads, I'd like to see more talk about how we're going to help
> people change their releasing and hosting strategies.

