[Distutils] a plea for backward-compatibility / smooth transitions (was: Re: Migrating Hashes from MD5 to SHA256)

Donald Stufft donald at stufft.io
Mon Jul 29 20:30:42 CEST 2013


On Jul 29, 2013, at 1:28 PM, holger krekel <holger at merlinux.eu> wrote:

> On Mon, Jul 29, 2013 at 10:30 -0400, Donald Stufft wrote:
>> On Jul 29, 2013, at 7:58 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>>> 
>>>> Actually, I strongly object to further backward-incompatible changes.
>>>> 
>>>> Please (generally) find a way to introduce improvements without breaking
>>>> existing installation processes at the same time.
>>>> 
>>>> For example, in this case pip/easy_install could indicate to PYPI what
>>>> kind of hashes it accepts (through a header or query param or whatever)
>>>> and PyPI could serve it but we'd default to MD5 for now if nothing else
>>>> was requested.  Please also consider the PEP438 vetted registration of
>>>> externals+hashes in this context.  Once things and tools are working
>>>> nicely we can switch to serving a non-MD5 hash as default after a
>>>> sufficient grace period.
>>> 
>>> Having the improved hashes be opt-in (by the client) strikes me as a
>>> reasonable request.
>>> 
>>> Yes, this means nothing will actually happen until easy_install/pip
>>> are updated to request those improved hashes and those versions see
>>> significant uptake, but as Holger says, we need to ensure we put
>>> sufficient effort into smoothing out the roller coaster ride that has
>>> been the recent experience of packaging system users.
>> 
>> There's basically zero way for this to fail closed in any of the
>> current installers. The failure mode is unverified packages, not
>> uninstallable packages. I am not aware of a single installer that
>> mandates the use of a hash. Crate.io has never used md5 hashes and has
>> always used sha256, and I've never received a single report of an
>> installer being unable to install because of it, which is exactly what
>> I expect.
> 
> So you think the worst case for forcing SHA256 hashes is that installers
> who don't yet support sha256 hashes would just ignore it (and thus wouldn't
> do hash verification)?

Yes. I've been using sha256 on simple.crate.io for over a year and zero people have
ever stated it didn't work for them. This also fits with my knowledge of how
setuptools and pip work. I know zc.buildout less well, but to my knowledge
it simply lets setuptools handle the downloading.
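
To make the fail-open behaviour concrete, here is a minimal sketch. It is not
the actual setuptools or pip code: the function name check_download, the
SUPPORTED set, and the example.invalid URLs are all made up for illustration.
The only thing it assumes is that the hash travels in the URL fragment as
name=hexdigest, the way the simple index links carry their md5 today.

```python
import hashlib
from urllib.parse import urldefrag

# Hypothetical: the hash algorithms an older installer knows about.
SUPPORTED = {"md5"}


def check_download(url, data, supported=SUPPORTED):
    """Return 'verified', 'mismatch', or 'unverified' for downloaded bytes."""
    _, fragment = urldefrag(url)
    name, sep, expected = fragment.partition("=")
    if not sep or name not in supported:
        # Missing or unrecognised hash type: nothing to compare against,
        # so the download is accepted without verification (fail open).
        return "unverified"
    actual = hashlib.new(name, data).hexdigest()
    return "verified" if actual == expected else "mismatch"


if __name__ == "__main__":
    data = b"example package contents"
    sha_url = ("https://example.invalid/pkg-1.0.tar.gz#sha256="
               + hashlib.sha256(data).hexdigest())
    # An md5-only client sees an unknown fragment and skips the check.
    print(check_download(sha_url, data))
```

Under those assumptions, an installer that only understands md5 treats a
sha256 fragment the same as no fragment at all: the install still succeeds,
it is just unverified.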

> 
>> Indicating via header or query param pretty much destroys the
>> effectiveness of the CDN's cache in order to fix a problem with a
>> theoretical (as far as I am aware) installer that requires an md5 hash
>> (and thus has never worked for any of the externally hosted packages).
>> Additionally it doesn't account for external urls which need to be
>> registered *with* a hash.
> 
> Currently there is no hash-type enforcement on registered externals, is there?

Registered externals must register with an md5 hash. Scraped links, download
urls, etc. do not require one because they are added indirectly. There is no
verification by PyPI that the given hash matches the package at the
end of the url.

> 
>> As far as available attacks, *today* an author could upload a package
>> that has been created so as to have a sister package with malicious
>> code that has the same hash allowing them to have a malicious package
>> they can substitute at will without the hashes changing at all. In the
>> future it's possible that a pre-image attack on MD5 will be found and
>> then we'll be dealing with this problem then when we've lost all
>> verification on external urls instead of now when we have time to get
>> external urls to switch.
> 
> So the attack is a malicious author or someone else modifying an external
> release file (either directly on the server or via MITM) while maintaining
> the pre-registered MD5 hash, right?
> 
> I am currently merely trying to understand more exactly what
> you are worried about.

For any hash function there are two major types of attack you worry about. The
first is a collision attack, which is the ability to generate two arbitrary inputs that
hash to the same thing. The second is a pre-image attack (either first or second
pre-image), which essentially means that, given an already existing input, you can generate
another input that hashes to the same thing. So basically the difference between
the two attacks is whether you have a hash you're trying to match or whether you
just need two inputs that hash to the same thing.

MD5 is currently broken for collision resistance. This means that an author can
generate two packages that hash to the same thing. One package might be
benign and one might be malicious. Given those two packages, people using
the md5 hashes will not be able to differentiate between the benign and the
malicious package.

MD5 is currently *not* broken for pre-image resistance. This means that as of
right now someone cannot take an already existing package on PyPI and generate
a second package that hashes to the same thing (other than by brute force).

So right now, collision attacks possible == yes, pre-image attacks possible == no.

However, designing secure systems is a practice of building in safety margins. If
someone, for instance, can break 5 rounds of a function, you use 15 rounds. With
cryptographic hashes, collision attacks are easier than pre-image attacks, so if you
have two functions, one that has a collision attack and one that doesn't, you can
generally assume that the one without a collision attack is stronger and has a
longer shelf life.
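
The gap between the two attacks shows up even with plain brute force. Below is
a toy sketch, not an attack on real MD5: it truncates MD5 to 16 bits (the
toy_hash helper and the pkg-/evil- input names are invented for the
demonstration) so that both searches finish quickly. Finding any two inputs
that collide takes on the order of the square root of the output space,
while matching one fixed target takes on the order of the whole output space.

```python
import hashlib
from itertools import count


def toy_hash(data, nbytes=2):
    """First nbytes of MD5: a deliberately weak hash, for illustration only."""
    return hashlib.md5(data).digest()[:nbytes]


def find_collision():
    """Search for any two distinct inputs with the same toy hash.

    Returns (first_input, second_input, attempts).
    """
    seen = {}
    for i in count():
        msg = b"pkg-%d" % i
        digest = toy_hash(msg)
        if digest in seen:
            return seen[digest], msg, i + 1
        seen[digest] = msg


def find_second_preimage(target):
    """Search for a different input matching the toy hash of a fixed target.

    Returns (matching_input, attempts).
    """
    goal = toy_hash(target)
    for i in count():
        msg = b"evil-%d" % i
        if msg != target and toy_hash(msg) == goal:
            return msg, i + 1


if __name__ == "__main__":
    a, b, tries = find_collision()
    print("collision after", tries, "attempts")
    forged, tries = find_second_preimage(b"real-package")
    print("second pre-image after", tries, "attempts")
```

With a 16-bit output the collision search is guaranteed to finish within
65537 attempts and usually needs only a few hundred, whereas the second
pre-image search is expected to need tens of thousands. The real attacks on
MD5 collisions are far better than brute force, which only widens that gap.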

So the problem with MD5 (ignoring for a second the fact that a collision attack can
be bad on its own) is that there are no more safety nets. If it gets broken for pre-image
resistance, there's not likely to be any warning (we've already *had* the warning). It will
just be broken and we will be scrambling to update things then (and hopefully nobody
gets attacked in the meantime).

And I do say *if* because, as zooko pointed out, it's not a guarantee that MD5 will
ever lose its pre-image resistance (having pre-image resistance just means that brute
forcing is the quickest way to find an input that matches a given hash).

> 
> best,
> holger
> 
> 
>> So by all means I will not migrate us if that's what you want. Old
>> versions of the installation clients stick around far too long for the
>> opt-in mechanism to be much use. The point of switching was to cover
>> the existing clients as well, to narrow the gap until a new API is
>> developed.
>> 
>> Hopefully no one is relying on these hashes to prevent an
>> author from maliciously injecting a sister package, and hopefully the
>> strength of MD5 holds and no new research is found that blows its
>> pre-image resistance to pieces.
>> 
>> As far as not breaking things goes, backwards compatibility has been an
>> important concern; however, progress forward *requires* breakage. It is
>> required because there is a vast array of ways to have your
>> package and/or hosting configured, many of them horrible practices
>> which need to be killed. Killing them requires breaking backwards
>> compatibility. You cite SSL. Yes, SSL has caused a number of errors for
>> people, mostly related to older versions of OpenSSL being unable to use
>> an SSL certificate, but downloading code you're going to execute over
>> plaintext isn't just bad, it's downright negligent on the part of the
>> toolchain. So that was a required breakage.
>> 
>> You also mention pip 1.4 *not* installing pre-releases by default.
>> Yes, that broke a handful of packages, Supervisor and pytz being the
>> major ones that I've seen anyone complain about. It was also known
>> ahead of time that this was a backwards-incompatible change (and it
>> was noted as such in the release notes). It wasn't a surprising
>> outcome. The pip developers "drew a line in the sand", to quote Paul
>> Moore, and I expect pip 1.5, where PEP438 becomes the default, to break even
>> more packages from people who just haven't bothered to change their
>> practices until it's forced on them.
>> 
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>> 
> 
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
