[Python-Dev] Make str/bytes hash algorithm pluggable?

Guido van Rossum guido at python.org
Thu Oct 3 21:05:15 CEST 2013


Hm. I would like to stick to the philosophy that Python's hash should be as
fast as it possibly can be, and should not be mistaken for a cryptographic
hash. The point is to optimize dict lookups, nothing more, given typical
(or even atypical) key distribution, not to thwart deliberate attacks. We
already have adopted a feature that plugged most viable attacks on web
apps, I think that's enough. I also agree with Antoine's response.


On Thu, Oct 3, 2013 at 11:42 AM, Christian Heimes <christian at python.org>wrote:

> Hi,
>
> some of you may have seen that I'm working on a PEP for a new hash API
> and new algorithms for hashing of bytes and str. The PEP has three major
> aspects. It introduces DJB's SipHash as secure hash algorithm, chances
> the hash API to process blocks of data instead characters and it adds an
> API to make the algorithm pluggable. A draft is already available [1].
>
> Now I got some negative feedback on the 'pluggable' aspect of the new
> PEP on Twitter [2]. I like to get feedback from you before I finalize
> the PEP.
>
> The PEP proposes a pluggable hash API for a couple of reasons. I like to
> give users of Python a chance to replace a secure hash algorithm with a
> faster hash algorithm. SipHash is about as fast as FNV for common cases
> as our implementation of FNV process only 8 to 32 bits per cycle instead
> of 32 or 64. I haven't actually benchmarked how a faster hash algorithm
> affects the a real program, though ...
>
> I also like to make it easier to replace the hash algorithm with a
> different one in case a vulnerability is found. With the new API vendors
> and embedders have an easy and clean way to use their own hash
> implementation or an optimized version that is more suitable for their
> platform, too. For example a mobile phone vendor could provide an
> optimized implementation with ARM NEON intrinsics.
>
>
> On which level should Python support a pluggable hash algorithm?
>
> 1) Compile time option: The hash code is compiled into Python's core.
> Embedders have to recompile Python with different options to replace the
> function.
>
> 2) Library option: A hash algorithm can be added and one avaible hash
> algorithm can be set before Py_Initialize() is called for the first
> time. The approach gives embedders the chance the set their own
> algorithm without recompiling Python.
>
> 3) Startup options: Like 2) plus an additional environment variable and
> command line argument to select an algorithm. With a startup option
> users can select a different algorithm themselves.
>
> Christian
>
> [1] http://www.python.org/dev/peps/pep-0456/
> [2] https://twitter.com/EDEADLK/status/385572395777818624
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20131003/c5b9566f/attachment.html>


More information about the Python-Dev mailing list