[Catalog-sig] hash tags

M.-A. Lemburg mal at egenix.com
Fri Mar 8 14:32:20 CET 2013



On 08.03.2013 14:09, Donald Stufft wrote:
> Accidentally sent this to only MAL so resending!
> 
> On Mar 8, 2013, at 7:50 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:
> 
>> On 08.03.2013 13:15, Christian Heimes wrote:
>>> Am 08.03.2013 12:49, schrieb M.-A. Lemburg:
>>>> Together with the added hash tag on the download file URLs (*),
>>>> this would solve the availability and the security aspects.
>>>> Instead of deprecating external links altogether, we could then
>>>> deprecate non-compliant download links and get an overall
>>>> very flexible system for Python package distribution.
>>>>
>>>> (*) Yes, I know, I still have to deliver the updated proposal -
>>>> been working on getting our indexes ready to serve as example :-)
>>>
>>> What does your proposal look like?
>>
>> Here's the first version with the basic idea:
>>
>> http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal
>>
>> After the feedback I got from Holger and Phillip, I'm currently
>> writing a new version, which drops some of the unneeded
>> requirements and spells out a few more things.
>>
>> Here's a very short version...
>>
>> Installers are modified:
>>
>> * to only follow rel="download" links from the /simple/ index page,
>>  which have a hash tag (e.g. #md5=…)
> 
> Sounds like a pretty serious break in backwards compat. Only 29 releases out of 144493 currently have a #md5= in their download_url. PyPI would then be expected to download the URL and compute a hash (a DoS vector that will need to be coded properly), which is error prone and is likely to break in non-obvious ways for maintainers.
> 
> While I'm obviously not against breaking backwards compatibility, I think if we're going to do that we might as well go whole hog and kill external links completely.

This was just the main new download scheme. If the new scheme
doesn't work, installers should fall back to the old scheme,
after showing a BIG warning to the user.

Later on they could switch to requiring users to use an
option to reenable the old scheme.

In any case, I'll have to put all this into proper words and
will then post it for another review cycle.

>> * will only use the fetched download page if its contents match
>>  the hash tag
>> * scan that page for rel="download" links, which again have to
>>  have a hash tag to be taken into account
>> * only install files for which the hash tag matches the
>>  downloaded content
>>
>> This should provide a good way to make sure that the downloaded
>> files are indeed under control of the package maintainer.
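
To make the rules above concrete, here is a rough sketch in Python of
the link filtering and the content check. The class and function names
are made up for illustration and are not taken from any existing
installer; the sketch only handles #md5= tags.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urldefrag


class DownloadLinkParser(HTMLParser):
    """Collect (url, md5) pairs from rel="download" links with a hash tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").split()
        href = attrs.get("href") or ""
        url, fragment = urldefrag(href)
        # Links without rel="download" or without a hash tag are ignored.
        if "download" in rel and fragment.startswith("md5="):
            self.links.append((url, fragment[len("md5="):]))


def matches_hash_tag(content, expected_md5):
    """Return True if the MD5 hex digest of content equals the hash tag."""
    return hashlib.md5(content).hexdigest() == expected_md5.lower()
```

An installer would feed the /simple/ index page to the parser, fetch
each collected URL, and only continue with pages or files for which
matches_hash_tag() returns True.
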
>>
>> So far the only practical problem I've found with the approach
>> is that the download page may not contain dynamic data, e.g.
>> a date or timestamp, since that causes the hash tag not to
>> verify.
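
A small illustration of the timestamp problem. The page renderer and
the digest helper below are hypothetical and only serve to show the
effect:

```python
import hashlib


def render_download_page(generated_at=None):
    """Render a tiny download page, optionally with a generation timestamp."""
    body = '<a rel="download" href="pkg-1.0.tar.gz">pkg-1.0.tar.gz</a>'
    if generated_at is not None:
        # Dynamic data: this part differs on every rendering.
        body += "<p>Generated: %s</p>" % generated_at
    return body.encode("utf-8")


def page_digest(page):
    """Return the MD5 hex digest that would go into the hash tag."""
    return hashlib.md5(page).hexdigest()
```

Two renderings of the static page give the same digest, whereas two
renderings with different timestamps do not, so a hash tag registered
for one rendering no longer verifies for the next.
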
>>
>> The package maintainer will also have to reregister the
>> package whenever changes to the download page are made -
>> but that's actually intended :-)
>>
>>> I'd like to propose query string-like
>>> key/value pairs. key/value pairs are more flexible and allow us to
>>> add/remove new information in the future.
>>
>> Good idea. I'll add that as extension mechanism.
>>
>>> I also propose that we add the file size in octets (bytes with 8 bits in
>>> each byte) to the fragment identifier. File size validation prevents
>>> e.g. length extension attacks. It is also useful for download tools. I know
>>> that HTTP servers usually set a Content-Length header for static files.
>>> But the header is set by the CDN while the information in the fragment
>>> identifier shall come from PyPI's internal database.
>>>
>>> Example:
>>>
>>> defusedxml-0.4.tar.gz#md5=09873c31ce773d48b8a4759571655a2c&sha1=33821e6891e3fc3829f5a238a93490f939533d62&octets=48324
>>
>> Minor nit: s/octets/size
>>
>> We could probably even add GPG sigs to the link.
>>
>> The only problem with the extension mechanism is that the currently
>> available installers only support "#md5=…".
> 
> pip works just fine with any of the algorithms from hashlib. The installers all
> also support #egg=, and there might be some others I can't recall offhand.

Ah, good to know. Thanks.
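
For illustration, a query-style fragment could be handled roughly like
this. It is only a sketch: the helper names, the set of accepted
algorithms and the "size" key (following the s/octets/size nit) are
assumptions, not what pip or setuptools actually do.

```python
import hashlib
from urllib.parse import parse_qsl, urldefrag

# Assumed set of accepted digest names; all of them exist in hashlib.
SUPPORTED_HASHES = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def parse_fragment(link):
    """Split 'file#key=value&key=value' into (url, {key: value})."""
    url, fragment = urldefrag(link)
    return url, dict(parse_qsl(fragment))


def verify(content, metadata):
    """Check content against the size and digest entries in metadata.

    Returns True only if at least one digest was checked and every
    recognised entry matched.
    """
    digest_checked = False
    for key, value in metadata.items():
        if key == "size":
            if len(content) != int(value):
                return False
        elif key in SUPPORTED_HASHES:
            if hashlib.new(key, content).hexdigest() != value.lower():
                return False
            digest_checked = True
        # Unknown keys such as "egg" are skipped, which keeps the
        # format extensible.
    return digest_checked
```

Unknown keys are ignored rather than rejected, so new key/value pairs
can be added later without breaking installers that follow this logic.
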

>>
>> Perhaps there's some way to trick them into still working with
>> the query-style fragment links ?!

-- 
Marc-Andre Lemburg
eGenix.com
