[Catalog-sig] A 90% Solution

M.-A. Lemburg mal at egenix.com
Tue Mar 12 10:50:23 CET 2013


On 12.03.2013 10:20, Jesse Noller wrote:
> 
> 
> On Mar 12, 2013, at 3:57 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> On 12.03.2013 03:46, PJ Eby wrote:
>>> On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> On 12.03.2013 00:39, Donald Stufft wrote:
>>>>>
>>>>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>>>>>
>>>>>> Just a thought, but...
>>>>>>
>>>>>> If 90% of PyPI projects do not have any external files to download,
>>>>>> then, wouldn't it make sense to:
>>>>>
>>>>> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.
>>>>
>>>> How are you going to verify that disabling the links
>>>> on those projects won't make certain release versions of
>>>> those packages unavailable for pip/easy_install ?
>>>
>>> I'm not sure if you're asking Donald or me here. 
>>
>> I was asking Donald, since he came up with the list. Given that
>> he was using the pip PackageFinder, it is not clear whether this
>> actually covers all easy_install'able packages as well (most likely
>> not, since pip doesn't support e.g. egg files).
>>
>>> My proposal was to
>>> only automatically disable the rel attributes for links to pages that
>>> do *not* contain any easy_install or pip-able download links.  So, by
>>> definition, this would not make any releases unavailable.
>>
>> Ok.
>>
>>> As for what Donald is proposing, I honestly have no idea what he's
>>> talking about, or whether the 90% statistic actually applies for what
>>> I'm proposing.
>>>
>>> So it's possible that it might be a lot less than 90% that my proposal
>>> would be able to affect *instantly*, without contacting any authors.
>>
>> We'd still need to inform authors that we changed a setting
>> in their package, since they may want to use the feature
>> to host packages or releases off-PyPI again in the future.
>>
>>>> How are you planning to inform the package authors of that
>>>> change, so that they can take corrective action ?
>>>>
>>>> Which options would be available for authors ?
>>>
>>> Do see my proposal again, which was simply that there be a switch to
>>> enable or disable the rel attributes, that it default off for new
>>> packages, and be switched to off for exactly that set of packages
>>> which would not result in the loss of access to any download files.
>>
>> Yes, I saw that, but was putting up the questions in the context
>> of Donald's idea to remove the links altogether.
>>
>>> There is, at this point, the question of how to handle projects that
>>> have some of their releases hosted externally, or with some of the
>>> files external and some not.  I would prefer that any automated
>>> changeover apply only to packages where the set of discoverable links
>>> is exactly equal to the links found on the project's /simple page.
>>
>> That would be safer, yes.
>>
>>>> Regarding the links, it's probably better to not
>>>> remove the rel="" attributes but instead change them
>>>> from rel="download" to e.g. rel="external-download";
>>>> or to keep the old index semantics around as /simple-v1/.
>>>> This keeps the valuable semantic relation available for
>>>> tools that want to use it.
>>>
>>> For what?  If you must keep them, rel="disabled-homepage" etc. would
>>> get the message across.  But I really don't see the point, and I
>>> *invented* the bloody things.
>>
>> True, but they are now part of the PyPI API and thus cannot be
>> changed or removed easily.
>>
>> The rel="" attributes provide extra information to tools
>> using the /simple/ index as (static) API and losing such
>> information would break the API.
>>
>> You're only thinking about installers using the /simple/
>> API, but there may very well also be e.g. researchers interested
>> in scanning the index for homepages to find out where Python
>> software lives, how the community is connected, which
>> preferences for hosting and developing Python software
>> there are, etc.
>>
>> That's a different context and in that context, the rel=""
>> attributes play a different role.
>>
>> Removing them would make such research impossible to implement
>> using the /simple/ index and researchers would have to either go
>> with the XML-RPC API (which is slow compared to /simple/, puts a
>> lot of load on the PyPI server and cannot be placed on a CDN)
>> or revert to the old-style scanning of the PyPI package pages.
>>
> 
> So because of hypothetical researchers we can't make the system better.

Of course we can, but just like with Python itself, we have to
pay attention to backwards compatibility.

Not hard to do: we'd just need to keep the old index in place
using a different URL, e.g. /simple-v1/.
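
To illustrate what a tool that reads the rel="" attributes depends on,
here is a minimal sketch in Python. It only assumes that a /simple/
project page is plain HTML with <a> tags, some of which carry
rel="homepage" or rel="download". The sample page, the class name
RelLinkCollector and the helper links_by_rel are made up for
illustration and are not taken from PyPI, pip or setuptools.

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Group the href of every <a> tag by the value of its rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # Links without a rel attribute are filed under the empty string.
        rel = attrs.get("rel") or ""
        self.links.setdefault(rel, []).append(href)


def links_by_rel(html):
    """Return a dict mapping rel value -> list of hrefs found in html."""
    parser = RelLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content; the real index markup may differ.
SAMPLE_PAGE = """
<html><body>
<a href="../../packages/source/e/example/example-1.0.tar.gz">example-1.0.tar.gz</a><br/>
<a href="http://example.com/" rel="homepage">1.0 home_page</a><br/>
<a href="http://example.com/downloads/" rel="download">1.0 download_url</a><br/>
</body></html>
"""

links = links_by_rel(SAMPLE_PAGE)
print(links.get("homepage"))
print(links.get("download"))
```

A scanner written along these lines would find nothing under "homepage"
or "download" once the attributes are removed or renamed, which is why
keeping the old output available under a separate URL matters for it.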

-- 
Marc-Andre Lemburg
eGenix.com


