[Catalog-sig] an immutable mirror of PyPI

Sat Jul 16 12:54:53 CEST 2011

On 07/16/2011 01:08 AM, Ben Finney wrote:
> Martijn Faassen<faassen at startifact.com>  writes:
>
>> I don't work in a vacuum. I share code with others. This code has
>> dependencies on other code. So how do people obtain this other code?
>
> By depending on other code, you have a choice to make: you either take
> the maintenance burden on yourself, or you delegate the maintenance
> burden (usually to the developers of that code).
>
> By delegating the maintenance burden of that code elsewhere, that
> entails delegating the responsibility for future availability of that
> code.

Maintenance burden is one thing; the package remaining available for 
download is another. When I depend on Foo 1.1, I am not delegating maintenance 
burden to the original developer, unless I go and ask questions about 
Foo 1.1. The answer can then be: Foo 1.1 is not maintained, sorry. Only 
when I am interested in upgrades does the original developer come in again.

I don't see why these two should be the same: the future availability of 
an existing release of a package is not identical to continued 
development of that code.

>> PyPI I thought was, among other things, a central place where people can
>> download and install packages from so that they can resolve
>> dependencies, but you seem to be arguing against doing that.
>
> I find it strange that I'm defending PyPI in this instance, since I am
> quite sympathetic to complaints that it has poor policies on package
> availability and many other complaints.
>
> But you seem to expect that PyPI must guarantee that any package version
> ever available will be available forever. That's not reasonable, I
> think.

I am not barging in here with expectations. I'm coming in here with use 
cases and proposals. It seems my use cases are rejected as goals of 
PyPI. In that case I want to get a better understanding of the goals of 
PyPI.

You say that the goal of perpetual availability of packages is an 
unreasonable goal of PyPI or related services. You don't seem to explain 
why.

So I have use cases: I can release code that relies on releases that can 
disappear or can be replaced. I think this is bad for repeatability and 
security. I'd like to see some improvements made. How would we make 
these improvements? I've so far proposed three ideas:

* PyPI not throwing away things after a grace period. This idea was 
almost universally rejected.

* an additional service, a mirror, that offers some repeatability 
guarantees. Removal would need to go through channels, implying some 
kind of custodianship I think people here are wary about.

* better communication channels: a list of what's been removed, a list 
of what's been deprecated. I can then write tools that help me maintain 
my projects. It's not the same as the above ideas: old projects can 
still break at the whim of people whose code I depend on, but it'll at 
least help manage this issue.
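
A minimal sketch of the kind of tool I mean, assuming a removal list were 
published as simple (name, version) pairs. No such list exists today; its 
format and the function names below are hypothetical and only illustrate 
the idea:

```python
# Hypothetical sketch: the "removed releases" list and its format are
# assumptions for illustration, not an existing PyPI feature.

def parse_pins(lines):
    """Parse 'name==version' lines into a set of (name, version) pairs."""
    pins = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blank lines and anything that is not a pin
        name, version = line.split("==", 1)
        pins.add((name.strip().lower(), version.strip()))
    return pins


def find_removed(pins, removed):
    """Return the pinned releases that appear in the removal list, sorted."""
    removed_normalized = {(name.lower(), version) for name, version in removed}
    return sorted(pins & removed_normalized)


# Example data: my project's pins and a made-up removal list.
pins = parse_pins([
    "Foo==1.1",
    "Bar==2.0  # still available",
])
removed = [("foo", "1.1"), ("Baz", "0.3")]

affected = find_removed(pins, removed)
for name, version in affected:
    print("pinned release no longer available: %s %s" % (name, version))
```

Run regularly, something like this would at least warn me before a 
colleague's fresh checkout fails to install.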

But perhaps you have better ideas on how to better help manage this.

I am getting a bit tired of hearing "you can do this yourself", as this 
goes to the heart of collaboration, and PyPI, if anything, at least 
supports collaboration.

> Instead, you need to choose packages considering whether you trust the
> package to remain available, which is a social issue between you and the
> people developing that work.

> If you think there is a significant risk the people responsible for that
> package will remove a version on which you depend from PyPI, you should
> engage in dialogue with those people to resolve that.

And how exactly am I supposed to read people's minds, possibly years 
into the future? I had absolutely no expectation that this would happen 
with the release that disappeared over a month ago. The developer one 
day just decided to clean up old, unsupported releases. Of course I 
contacted the developer after it happened. Several others did too. I 
then started thinking about how to reduce this risk in a broader sense.

> I don't think PyPI has any business requiring package developers to keep
> a version available at PyPI beyond when they want it available there.
> The risks inherent in that need to be addressed as a social issue, not a
> technical limitation.

Yes, this is a social issue. But tools can support social issues. If 
people tell me to keep my own private mirror, that's a tool solution 
too, but not a very social one.

>> At most it's some kind of showcase for packages that people should
>> take into consideration. Taking this point to the extreme, it's
>> *never* something that you can automate downloading from.
>
> There are points that can be made toward that view; but I don't find
> this specific case (wanting guaranteed availability of every version
> forever at PyPI) supports it.
>
>> Instead you should be giving a giant tarball of packages to everybody,
>> always, if they use your code at all.
>
> This is indeed a terrible option, and I lament it whenever I see it.
>
> I prefer supporting the efforts of those who *do* provide reasonable
> guarantees of package selection and availability and integration
> testing. We call them “operating system distributions”.

The requirements for developers concerning library availability are not 
identical to those of users. Operating system distributions focus on a 
stable platform for users. Some developers need to develop 
cross-platform code. Some developers need to develop different versions 
of a project, or different projects that rely on different versions of 
dependencies. Some developers need to depend on libraries or library 
versions not (yet) available in distributions.

These developers effectively create a stable distribution of 
dependencies that they have tested together. It's useful to have tools 
to support this and allow these developers to share their code with others.
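
One small piece of such tooling, sketched under assumptions: record a 
digest for each release file at the time it was tested, and check it again 
before use. That would at least detect a release that was replaced under 
the same version number. The file name and the recorded-digest mapping 
below are made up for illustration:

```python
import hashlib


def sha256_of(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def verify(filename, data, recorded):
    """Return True only if filename has a recorded digest matching data."""
    expected = recorded.get(filename)
    return expected is not None and sha256_of(data) == expected


# Hypothetical example: the digest is recorded when the set of
# dependencies is tested together.
original = b"contents of Foo-1.1.tar.gz as tested"
recorded = {"Foo-1.1.tar.gz": sha256_of(original)}

# Later, before installing, compare what was downloaded to the record.
same_file_ok = verify("Foo-1.1.tar.gz", original, recorded)
replaced_file_ok = verify("Foo-1.1.tar.gz", b"replaced contents", recorded)
unknown_file_ok = verify("Bar-2.0.tar.gz", original, recorded)
```

This does nothing about a release that has disappeared, of course; it only 
makes a silent replacement visible.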

Regards,

Martijn