[Python-Dev] PEP 456

Christian Heimes christian at python.org
Thu Oct 3 22:49:13 CEST 2013


On 03.10.2013 21:53, Serhiy Storchaka wrote:
>> the first time time with a bit shift of 7
> 
> Double "time".

thx, fixed

>> with a 128bit seed and 64-bit output
> 
> Inconsistency with hyphens. The same issue appears in other places.

I have unified the use of hyphens, thx!

>> bytes_hash provides the tp_hash slot function for unicode.
> 
> Typo. Should be "unicode_hash".

Fixed

> x = _PyHash_Func->hashfunc(PyUnicode_BYTE_DATA(self),
> PyUnicode_GET_LENGTH(self) * PyUnicode_KIND(self));

Oh nice, that's easier to read. The macro should be PyUnicode_DATA(), though.
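To make the length argument concrete, here is a standalone C sketch of the idea, not CPython code. A PEP 393 string stores `kind` bytes (1, 2 or 4) per code point, so the raw buffer holds length * kind bytes, and that is the byte count handed to the hash function. `struct toy_unicode`, `toy_unicode_hash` and `hash_raw` are hypothetical names, and 64-bit FNV-1a merely stands in for whatever `_PyHash_Func->hashfunc` points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for _PyHash_Func->hashfunc: 64-bit FNV-1a over raw bytes. */
static uint64_t hash_raw(const void *data, size_t nbytes) {
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a offset basis */
    for (size_t i = 0; i < nbytes; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;               /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical mini model of a PEP 393 string object. */
struct toy_unicode {
    const void *data;   /* plays the role of PyUnicode_DATA(self) */
    size_t length;      /* code points, like PyUnicode_GET_LENGTH(self) */
    int kind;           /* bytes per code point, like PyUnicode_KIND(self) */
};

static uint64_t toy_unicode_hash(const struct toy_unicode *u) {
    assert(u->kind == 1 || u->kind == 2 || u->kind == 4);
    /* The buffer holds length * kind bytes, so hash exactly that many. */
    return hash_raw(u->data, u->length * (size_t)u->kind);
}
```

For an ASCII string (kind 1) the buffer is byte-for-byte identical to the corresponding bytes object, so in this sketch both hash to the same value; that is the str/bytes hash equality discussed next.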

> I have doubts about this. If one collects bytes and strings in one
> dictionary, this equality will only double the number of collisions
> (a DoS attack needs to increase it thousands or millions of times). So
> it doesn't matter. On the other hand, if one deliberately uses bytes
> and str subclasses with overridden equality, the same hash for ASCII
> bytes and strings may be needed.

It's not a big problem. I merely wanted to point out that there is a
simple possibility for a minor optimization. That's all. :)

>> For very short strings the setup costs of SipHash dominate its speed,
>> but it is still in the same order of magnitude as the current FNV code.
> 
> We could use another algorithm for very short strings if it matters.

I thought of that, too. The threshold is rather small, though. As far as
I remember, an effective hash-collision DoS works with 7 or 8 characters.
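The short-string idea could look roughly like the following C sketch. It is an illustration under stated assumptions, not CPython code: `hash_bytes`, `fnv_like` and `SHORT_THRESHOLD` are hypothetical names, the cutoff value is arbitrary, and the two key halves double as the FNV prefix and suffix. `siphash24` follows the published SipHash-2-4 reference algorithm; `fnv_like` is modelled on the FNV-style loop CPython used before 3.4 (prefix, first byte shifted left by 7, multiplier 1000003, then length and suffix XORed in), minus the -1 to -2 remapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))
#define SIPROUND do { \
    v0 += v1; v1 = ROTL64(v1, 13); v1 ^= v0; v0 = ROTL64(v0, 32); \
    v2 += v3; v3 = ROTL64(v3, 16); v3 ^= v2; \
    v0 += v3; v3 = ROTL64(v3, 21); v3 ^= v0; \
    v2 += v1; v1 = ROTL64(v1, 17); v1 ^= v2; v2 = ROTL64(v2, 32); \
} while (0)

/* Hypothetical cutoff; it has to stay below the ~7-8 chars a practical DoS needs. */
#define SHORT_THRESHOLD 4

static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* SipHash-2-4: even an empty input costs init + 2 + 4 rounds (the setup cost). */
static uint64_t siphash24(uint64_t k0, uint64_t k1, const uint8_t *in, size_t len) {
    uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
    uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
    uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
    uint64_t v3 = 0x7465646279746573ULL ^ k1;
    size_t blocks = len / 8;
    for (size_t i = 0; i < blocks; i++) {
        uint64_t m = load_le64(in + 8 * i);
        v3 ^= m;
        SIPROUND; SIPROUND;
        v0 ^= m;
    }
    /* Final block: low byte of the length in the top byte, tail bytes below. */
    uint64_t b = (uint64_t)len << 56;
    const uint8_t *tail = in + 8 * blocks;
    for (size_t i = 0; i < len % 8; i++)
        b |= (uint64_t)tail[i] << (8 * i);
    v3 ^= b;
    SIPROUND; SIPROUND;
    v0 ^= b;
    v2 ^= 0xff;
    SIPROUND; SIPROUND; SIPROUND; SIPROUND;
    return v0 ^ v1 ^ v2 ^ v3;
}

/* FNV-style loop in the spirit of the pre-3.4 string hash (hypothetical wrapper). */
static uint64_t fnv_like(uint64_t prefix, uint64_t suffix, const uint8_t *p, size_t len) {
    if (len == 0)
        return 0;                       /* empty input hashes to 0 */
    uint64_t x = prefix;
    x ^= (uint64_t)p[0] << 7;           /* first byte, shifted by 7 */
    for (size_t i = 0; i < len; i++)
        x = (1000003 * x) ^ p[i];
    x ^= (uint64_t)len;
    x ^= suffix;
    return x;
}

/* Dispatch: cheap loop for tiny inputs, SipHash-2-4 for everything else. */
static uint64_t hash_bytes(uint64_t k0, uint64_t k1, const uint8_t *in, size_t len) {
    assert(in != NULL || len == 0);
    if (len < SHORT_THRESHOLD)
        return fnv_like(k0, k1, in, len);
    return siphash24(k0, k1, in, len);
}
```

Because the cutoff must stay below the length at which collision attacks become practical, the fast path only ever covers a handful of lengths, which is why the possible gain is small.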

>> The summarized total runtime of the benchmark is within 1% of the
>> runtime of an unmodified Python 3.4 binary.
> 
> What about deviations of individual tests?

Here you go.

http://pastebin.com/dKdnBCgb
http://pastebin.com/wtfUS5Zz

Christian

