[Python-Dev] Hash randomization for which types?

Maciej Fijalkowski fijall at gmail.com
Wed Feb 17 02:34:29 EST 2016


Note that hashing in Python 2.7 and in 3.x prior to 3.4 is simply broken,
and the randomization does not do nearly enough; see
https://bugs.python.org/issue14621

On Wed, Feb 17, 2016 at 4:45 AM, Shell Xu <shell909090 at gmail.com> wrote:
> I think you are right. Here is the source code in Python 2.7.11:
>
> long
> PyObject_Hash(PyObject *v)
> {
>     PyTypeObject *tp = v->ob_type;
>     if (tp->tp_hash != NULL)
>         return (*tp->tp_hash)(v);
>     /* To keep to the general practice that inheriting
>      * solely from object in C code should work without
>      * an explicit call to PyType_Ready, we implicitly call
>      * PyType_Ready here and then check the tp_hash slot again
>      */
>     if (tp->tp_dict == NULL) {
>         if (PyType_Ready(tp) < 0)
>             return -1;
>         if (tp->tp_hash != NULL)
>             return (*tp->tp_hash)(v);
>     }
>     if (tp->tp_compare == NULL && RICHCOMPARE(tp) == NULL) {
>         return _Py_HashPointer(v); /* Use address as hash value */
>     }
>     /* If there's a cmp but no hash defined, the object can't be hashed */
>     return PyObject_HashNotImplemented(v);
> }
>
> If the type defines a hash function, it is used. If not, _Py_HashPointer
> is used, and that path never touches _Py_HashSecret.
> I also checked the references to _Py_HashSecret: only bufferobject,
> unicodeobject and stringobject use it.
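
The same dispatch can be observed from the Python level. This is only a
sketch and it assumes Python 3, where a class that defines __eq__ without
__hash__ gets __hash__ set to None (the closest analogue of the "cmp but
no hash" branch in the 2.7 source above). The class names and the
is_hashable helper are made up for illustration.

```python
class Plain:
    """Defines neither __eq__ nor __hash__: uses the default object hash."""


class WithHash:
    """Defines its own __hash__, so hash() calls it."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        return isinstance(other, WithHash) and self.key == other.key

    def __hash__(self):
        return hash(self.key)


class EqOnly:
    """Defines __eq__ only; Python 3 then sets __hash__ to None."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

With these definitions, is_hashable(Plain()) and is_hashable(WithHash(7))
are true, while is_hashable(EqOnly()) is false. None of the three paths
involves a randomized secret unless the user-defined __hash__ delegates
to a str or bytes hash.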
>
> On Wed, Feb 17, 2016 at 9:54 AM, Steven D'Aprano <steve at pearwood.info>
> wrote:
>>
>> On Tue, Feb 16, 2016 at 11:56:55AM -0800, Glenn Linderman wrote:
>> > On 2/16/2016 1:48 AM, Christoph Groth wrote:
>> > >Hello,
>> > >
>> > >Recent Python versions randomize the hashes of str, bytes and datetime
>> > >objects.  I suppose that the choice of these three types is the result
>> > >of a compromise.  Has this been discussed somewhere publicly?
>> >
>> > Search archives of this list... it was discussed at length.
>>
>> There's a lot of discussion on the mailing list. I think that this is
>> the very start of it, in Dec 2011:
>>
>> https://mail.python.org/pipermail/python-dev/2011-December/115116.html
>>
>> and continuing into 2012, for example:
>>
>> https://mail.python.org/pipermail/python-dev/2012-January/115577.html
>> https://mail.python.org/pipermail/python-dev/2012-January/115690.html
>>
>> and a LOT more, spread over many different threads and subject lines.
>>
>> You should also read the issue on the bug tracker:
>>
>> http://bugs.python.org/issue13703
>>
>>
>> My recollection is that it was decided that only strings and bytes need
>> to have their hashes randomized, because only strings and bytes can be
>> used directly from user-input without first having a conversion step
>> with likely input range validation. In addition, changing the hash for
>> ints would break too much code for too little benefit: unlike strings,
>> where hash collision attacks on web apps are proven and easy, hash
>> collision attacks based on ints are more difficult and rare.
>>
>> See also the comment here:
>>
>> http://bugs.python.org/issue13703#msg151847
>>
>>
>>
>> > >I'm not a web programmer, but don't web applications also use
>> > >dictionaries that are indexed by, say, tuples of integers?
>> >
>> > Sure, and that is the biggest part of the reason they were randomized.
>>
>> But they aren't, as far as I can see:
>>
>> [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
>> 1071302475
>> [steve at ando 3.6]$ ./python -c "print(hash((23, 42, 99, 100)))"
>> 1071302475
>>
>> Web apps can use dicts indexed by anything that they like, but unless
>> there is an actual attack, what does it matter? Guido makes a good point
>> about security here:
>>
>> https://mail.python.org/pipermail/python-dev/2013-October/129181.html
>>
>>
>>
>> > I think hashes of all types have been randomized, not _just_ the list
>> > you mentioned.
>>
>> I'm pretty sure that's not actually the case. Using 3.6 from the repo
>> (admittedly not fully up to date though), I can see hash randomization
>> working for strings:
>>
>> [steve at ando 3.6]$ ./python -c "print(hash('abc'))"
>> 11601873
>> [steve at ando 3.6]$ ./python -c "print(hash('abc'))"
>> -2009889747
>>
>> but not for ints:
>>
>> [steve at ando 3.6]$ ./python -c "print(hash(42))"
>> 42
>> [steve at ando 3.6]$ ./python -c "print(hash(42))"
>> 42
>>
>>
>> which agrees with my recollection that only strings and bytes would be
>> randomized.
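
The experiment quoted above can be repeated without retyping shell
commands. The following is a minimal sketch, assuming Python 3.7 or later
and the documented PYTHONHASHSEED environment variable; the helper name
hash_in_fresh_interpreter is hypothetical. Each call starts a new
interpreter, because the hash seed is fixed at interpreter startup.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(expr, seed=None):
    """Return hash(<expr>) as computed by a newly started interpreter.

    `seed` is passed as PYTHONHASHSEED. With seed=None the variable is
    left unset, so the child interpreter picks a random seed itself.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    for expr in ("'abc'", "42", "(23, 42, 99, 100)"):
        first = hash_in_fresh_interpreter(expr, seed=1)
        second = hash_in_fresh_interpreter(expr, seed=2)
        print(expr, "differs" if first != second else "same")
```

Run under two different seeds, the str hash is expected to differ while
the int and tuple-of-int hashes stay the same, which matches the output
quoted above.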
>>
>>
>>
>> --
>> Steve
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/shell909090%40gmail.com
>
>
>
>
> --
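> There are gaps between the joints, and the knife's edge has no thickness; slip what has no thickness into the gaps and there is room to spare for the blade to move.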
> blog: http://shell909090.org/blog/
> twitter: @shell909090
> about.me: http://about.me/shell909090

