[Cython] Hash-based vtables

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Mon Jun 4 21:44:11 CEST 2012


Me and Robert had a long discussion on the NumFOCUS list about this
already, but I figured it was better to continue it and provide more
in-depth benchmark results here.

It's basically a new idea of how to provide a vtable based on perfect
hashing, which should be a lot simpler to implement than what I first
imagined.

I'll write down some context first, if you're familiar with this
skip ahead a bit..

This means that you can do fast dispatches *without* the messy
business of binding vtable slots at compile time. To be concrete, this
might e.g. take the form

def f(obj):
     obj.method(3.4) # try to find a vtable with "void method(double)" in it

or, a more typed approach,

# File A
cdef class MyImpl:
     def double method(double x): return x * x

# File B
# Here we never know about MyImpl, hence "duck-typed"
@cython.interface
class MyIntf:
     def double method(double x): pass

def f(MyIntf obj):
     # obj *can* be MyImpl instance, or whatever else that supports
     # that interface
     obj.method(3.4)


Now, the idea to implement this is:

  a) Both caller and callee pre-hash name/argument string
     "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of
     md5)

  b) Callee (MyImpl) generates a vtable of its methods by *perfect*
     hashing. What you do is define a final hash fh as a function
     of the pre-hash ph, for instance

fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2) ^ (ph >> vtable.r3)) & vtable.m

(Me and Robert are benchmarking different functions to use here.)  By
playing with r1, r2, r3, you have 64**3 choices of hash function, and
will be able to pick a combination which gives *no* (or very few)
collisions.

  c) Caller then combines the pre-hash generated at compile-time, with
     r1, r2, r3, m stored in the vtable header, in order to find the
     final location in the hash-table.

The exciting thing is that in benchmark, the performance penalty is
actually very slight over a C++-style v-table. (Of course you can
cache a proper vtable, but the fact that you get so close without
caring about caching means that this can be done much faster.)

Back to my and Robert's discussion on benchmarks:

I've uploaded benchmarks here:

https://github.com/dagss/hashvtable/tree/master/dispatchbench

I've changed the benchmark taking to give more robust numbers (at
least for me), you want to look at the 'min' column.

I changed the benchmark a bit so that it benchmarks a *callsite*.
So we don't pass 'h' on the stack, but either a) looks it up in a global
variable (default), or b) it's a compile-time constant (immediate in
assembly) (compile with -DIMHASH).

Similarly, the ID is either an "interned" global variable, or an
immediate (-DIMID).

The results are very different on my machine depending on this aspect.
My conclusions:

  - Both three shifts with masking, two shifts with a "fallback slot"
    (allowing for a single collision), three shifts, two shifts with
    two masks allows for constructing good vtables. In the case of only
    two shifts, one colliding method gets the twoshift+fback
    performance and the rest gets the twoshift performance.

  - Performance is really more affected by whether hashes are
    immediates or global variables than the hash function. This is in
    contrast to the interning vs. key benchmarks -- so I think that if
    we looked up the vtable through PyTypeObject, rather than getting
    the vtable directly, the loads of the global variables could
    potentially be masked by that.

  - My conclusion: Just use lower bits of md5 *both* for the hashing
    and the ID-ing (don't bother with any interning), and compile the
    thing as a 64-bit immediate. This can cause crashes/stack smashes
    etc. if there's lower-64bit-of-md5 collisions, but a) the
    probability is incredibly small, b) it would only matter in
    situations that should cause an AttributeError anyway, c) if we
    really care, we can always use an interning-like mechanism to
    validate on module loading that its hashes doesn't collide with
    other hashes (and raise an exception "Congratulations, you've
    discovered a phenomenal md5 collision, get in touch with cython
    devs and we'll work around it right away").

    The RTTI (i.e. the char*) is also put in there, but is not used for
    comparison and is not interned.

At least, that's what I think we should do for duck-style vtables.

Do we then go to all the pain of defining key-encoding, interning
etc. just for SEP 201? Isn't it easier to just mandate a md5 dependency
and be done with it? (After all, md5 usually comes with Python in the
md5 and hashlib modules)

direct: Early-binding
index: Call slot 0 (C++-style vtable/function pointer)
noshift: h & m1
oneshift: (h >> r1) & m1
twoshift: ((h >> r1) ^ (h >> r2)) & m1
twoshift+fback: hash doesn't
threeshift: ((h >> r1) ^ (h >> r2) ^ (h >> r3)) & m1
doublemask: ((h >> r1) & m1) ^ ((h >> r2) & m2)
doublemask2: ((h >> r1) & m1) ^ ((h & m2) >> r2)

Default distutils build (-O2):
------------------------------

Hash globalvar, ids globalvar

          direct: min=5.38e-09  mean=5.45e-09  std=3.79e-11 
val=1600000000.000000
           index: min=5.38e-09  mean=5.44e-09  std=3.09e-11 
val=1200000000.000000
         noshift: min=5.99e-09  mean=6.14e-09  std=6.63e-11 
val=1200000000.000000
        oneshift: min=6.47e-09  mean=6.53e-09  std=3.21e-11 
val=1200000000.000000
        twoshift: min=7.00e-09  mean=7.08e-09  std=3.73e-11 
val=1200000000.000000
  twoshift+fback: min=7.54e-09  mean=7.61e-09  std=4.46e-11 
val=1200000000.000000
      threeshift: min=7.54e-09  mean=7.64e-09  std=3.79e-11 
val=1200000000.000000
      doublemask: min=7.56e-09  mean=7.64e-09  std=3.02e-11 
val=1200000000.000000
     doublemask2: min=7.55e-09  mean=7.62e-09  std=3.24e-11 
val=1200000000.000000

hash immediate, ids globalvar

          direct: min=5.38e-09  mean=5.45e-09  std=3.87e-11 
val=1600000000.000000
           index: min=5.40e-09  mean=5.45e-09  std=2.92e-11 
val=1200000000.000000
         noshift: min=5.38e-09  mean=5.44e-09  std=3.48e-11 
val=1200000000.000000
        oneshift: min=5.90e-09  mean=5.99e-09  std=4.05e-11 
val=1200000000.000000
        twoshift: min=6.09e-09  mean=6.17e-09  std=3.52e-11 
val=1200000000.000000
  twoshift+fback: min=7.00e-09  mean=7.08e-09  std=3.64e-11 
val=1200000000.000000
      threeshift: min=6.47e-09  mean=6.55e-09  std=6.04e-11 
val=1200000000.000000
      doublemask: min=6.46e-09  mean=6.50e-09  std=3.37e-11 
val=1200000000.000000
     doublemask2: min=6.46e-09  mean=6.51e-09  std=3.04e-11 
val=1200000000.000000

all immediate:

          direct: min=5.39e-09  mean=5.50e-09  std=5.22e-11 
val=1600000000.000000
           index: min=5.38e-09  mean=5.51e-09  std=6.25e-11 
val=1200000000.000000
         noshift: min=5.38e-09  mean=5.51e-09  std=6.90e-11 
val=1200000000.000000
        oneshift: min=5.40e-09  mean=5.51e-09  std=5.35e-11 
val=1200000000.000000
        twoshift: min=5.94e-09  mean=6.06e-09  std=5.91e-11 
val=1200000000.000000
  twoshift+fback: min=7.06e-09  mean=7.19e-09  std=5.39e-11 
val=1200000000.000000
      threeshift: min=5.96e-09  mean=6.07e-09  std=5.54e-11 
val=1200000000.000000
      doublemask: min=5.88e-09  mean=6.01e-09  std=6.06e-11 
val=1200000000.000000
     doublemask2: min=5.94e-09  mean=6.05e-09  std=6.16e-11 
val=1200000000.000000

-O3 build
---------

all globalvars:

          direct: min=1.61e-09  mean=1.63e-09  std=1.40e-11 
val=1600000000.000000
           index: min=5.38e-09  mean=5.43e-09  std=2.82e-11 
val=1200000000.000000
         noshift: min=6.04e-09  mean=6.13e-09  std=4.76e-11 
val=1200000000.000000
        oneshift: min=6.46e-09  mean=6.54e-09  std=3.82e-11 
val=1200000000.000000
        twoshift: min=7.01e-09  mean=7.06e-09  std=3.41e-11 
val=1200000000.000000
  twoshift+fback: min=7.57e-09  mean=7.64e-09  std=3.47e-11 
val=1200000000.000000
      threeshift: min=7.54e-09  mean=7.63e-09  std=4.17e-11 
val=1200000000.000000
      doublemask: min=7.54e-09  mean=7.61e-09  std=3.64e-11 
val=1200000000.000000
     doublemask2: min=7.55e-09  mean=7.63e-09  std=3.35e-11 
val=1200000000.000000

hash immediate, ids globalvar:

          direct: min=1.61e-09  mean=1.66e-09  std=3.30e-11 
val=1600000000.000000
           index: min=5.40e-09  mean=5.50e-09  std=4.94e-11 
val=1200000000.000000
         noshift: min=5.38e-09  mean=5.49e-09  std=6.02e-11 
val=1200000000.000000
        oneshift: min=5.95e-09  mean=6.06e-09  std=6.64e-11 
val=1200000000.000000
        twoshift: min=5.96e-09  mean=6.13e-09  std=7.22e-11 
val=1200000000.000000
  twoshift+fback: min=7.02e-09  mean=7.18e-09  std=7.04e-11 
val=1200000000.000000
      threeshift: min=6.52e-09  mean=6.65e-09  std=6.43e-11 
val=1200000000.000000
      doublemask: min=6.50e-09  mean=6.62e-09  std=5.28e-11 
val=1200000000.000000
     doublemask2: min=6.52e-09  mean=6.63e-09  std=5.23e-11 
val=1200000000.000000

all immediate:

          direct: min=1.61e-09  mean=1.62e-09  std=9.77e-12 
val=1600000000.000000
           index: min=5.38e-09  mean=5.39e-09  std=1.71e-11 
val=1200000000.000000
         noshift: min=5.38e-09  mean=5.40e-09  std=2.41e-11 
val=1200000000.000000
        oneshift: min=5.38e-09  mean=5.40e-09  std=1.81e-11 
val=1200000000.000000
        twoshift: min=5.92e-09  mean=5.93e-09  std=1.43e-11 
val=1200000000.000000
  twoshift+fback: min=7.00e-09  mean=7.01e-09  std=2.20e-11 
val=1200000000.000000
      threeshift: min=5.92e-09  mean=5.94e-09  std=1.99e-11 
val=1200000000.000000
      doublemask: min=5.79e-09  mean=5.82e-09  std=2.32e-11 
val=1200000000.000000
     doublemask2: min=5.92e-09  mean=5.94e-09  std=2.25e-11 
val=1200000000.000000



More information about the cython-devel mailing list