Rethinking of inheritance and conversions

Wed Jan 9 00:41:14 CET 2002

My earlier conclusions that we should only search for paths beginning
with upcasts were simply wrong. The counterexample is:

      X   Y
       \ /
        Z

If I have a polymorphic Z held via X*, it should be convertible to
Y. Furthermore, there is not neccessarily a single best path for
getting from X to Y:

   X   Y
   |\ /|
   | / |
   |/ \|
   Z1  Z2

Also, the user may opt not to tell us about parts of the
inheritance hierarchy:

      U
    ..:..
    :   :
    V   W
   / \ / \
  X   Y   Z

It might be that a path from S to T that begins with an upcast and
contains the fewest downcasts can always be used to get from S to T,
but I'm not even confident of that.

Okay, the strategy:

* The convertability graph is represented in a straightforward manner:
  if there is a conversion from A -> B, the graph contains an edge
  from A -> B

* Each edge is labelled with "upcast" or "downcast", and a
  void*(*)(void*) function pointer. The function is an instantiation
  of one of the following:

    template <class T, class U>
    void* upcast(void* x) { return static_cast<U>((T*)x); }

    template <class T, class U>
    void* downcast(void* x) { return dynamic_cast<U>((T*)x); }

* When an inheritance relationship is declared, 1 or 2 edges are added
  to the graph, depending whether the user tells us that the base
  class is polymorphic (an upcast, or an upcast and a downcast).

* Each extension class instance contains one or more "holders". A
  holder can tell us the type of object it holds, and whether or not
  the object is held by-value or by some kind of pointer or
  reference. Multiple holders will occur in case of multiple
  inheritance from wrapped classes.

* To convert an extension class instance to a T*:

  for each holder<U> of object u:

     if u is held by-value:

         breadth-first search the upcast edges of the convertability
         graph beginning at node U.

         At each expansion step record a back-link so that the path
         back to the source node can be retrieved

         if the node for T is reached, follow the back links to
         recover the conversion path, execute the conversion
         corresponding to each edge, and  return the result.

     else:

         dijkstra search the convertability graph beginning at node U.

            * upcast edges cost 0; downcast edges cost 1.

            * each node in the queue is labelled with an address: &u
              converted to the node's target type. These labels are
              produced by calling the conversion function associated
              by each edge as it is explored.

            * for the purposes of the search, edges are considered to be
              "in the graph" if
                  the edge is an upcast 
               OR
                  the edge's conversion function applied to the source
                  node's address label is nonzero.

            if at any point we reach the node for T, we return the
            node's address label.

Optimizing this process looks hard. What we'd really like is some way
to find out if a particular node is reachable through a given edge in
the graph without passing back thorough the edge's source node. If we
knew that, we could explore only the paths that connect the source and
target types.

One possibility is to pick a method that's right most of the time, but
allows some false positives. The better job you do of eliminating
false positives, the more you prune the search. If you do a perfect
job, the search heads directly for the target node (this begins to
sound a lot like A* search, doesn't it?)

We could use a function that produces a hash value for any (edge, target)
pair, and a bitset S with the rule:

      if node N is reachable through edge E, S[hash(E,N)] == 1.

Users could tune the optimization by changing the size of the bitset.

We'd need to turn on new bits whenever new inheritance relationships
were registered. Updating would require that the graph be
bidirectionally traversable.

This approach requires examination of all outgoing edges of the source
node, even when there is no path to the target. We could apply the
same technique to encode reachability relationships between nodes,
further speeding the determination that no conversion is possible.

There may be better approaches to generating this sort of hint
information. If anybody has suggestions, I'd love to hear them.

---- A less important idea with some promise ---

We can always find out the most-derived type to which any held object
can be cast. That's just typeid(x), since downcasts are impossible
anyway if the type isn't polymorphic.

If a type /is/ polymorphic, we can identify the object's address also,
via dynamic_cast<void*>(x).

If that type is registered with the system, we can then convert it to
any type from which it is unambiguously derived via a sequence of
efficient static_casts.

The problem is that the type might not have been registered with the
system. Suppose T is derived from U which is derived from V. If the
user has wrapped U and V, and returns a smart_ptr<V> pointing to a T,
it won't be convertible to U& via this method.

In other words, this approach requires the registration of every
concrete class that the user will return to us. Note that this is not
neccessarily mean the user must have created a class_builder for these
types; a different mechanism for registering inheritance is
possible. Still, each concrete type in a polymorphic inheritance
hierarchy will need to have its inheritance relationship with the
most-derived class which is used as an argument to a wrapped function
declared (whew!). As you can see, this requirement is difficult to
explain, and thus is too much of a burden to place on users.

A hybrid approach is possible; if the class' concrete type is
registered, we can use the above method. Otherwise, we need to resort
to something else. However, that effectively relegates the above to
the realm of optimization, so it should only be implemented later.