[C++-sig] Implementation of proper overload resolution
Dane Springmeyer
blake at hailmail.net
Thu Dec 17 20:31:02 CET 2009
Troy,
Incidentally, do you know the proper way (or is there a proper way?)
to support None type keyword arguments?
Cheers,
Dane
On Dec 17, 2009, at 11:11 AM, Dane Springmeyer wrote:
> Troy,
>
> Really __impressive__ write-up and work on this. I'd definitely be
> interested in build instructions and your work getting into future
> releases.
>
> Cheers,
>
> Dane
>
>
> On Dec 17, 2009, at 8:08 AM, Troy D. Straszheim wrote:
>
>>
>> Here's what I've got on overloading. This turned out to be a lot
>> more
>> work, and this mail much longer, than I'd hoped. Would appreciate a
>> readthrough and comments/questions, I've done about all I can do.
>> First
>> a review of the problems, then a walkthrough of an implementation I
>> have
>> that fixes them.
>>
>> 1.a) Problem: Ambiguous calls from python to overloaded functions
>>
>> Boost.python's 'overloading' is and has always been broken w.r.t. is
>> ambiguity. The library currently has no notion, given some tuple of
>> arguments, of one function being a better match than another. It
>> knows
>> only "this function matches" or not. What the user observes is the
>> following: when a call to a c++ function is made from python e.g.
>>
>>>>> m.f(1)
>>
>> boost.python looks through the overloads for f, in the reverse
>> order of
>> registration, and calls the first one that is callable. For
>> instance if
>> the registrations are
>>
>> void f_int(int);
>> void f_bool(bool);
>>
>> BOOST_PYTHON_MODULE(m)
>> {
>> def("f", f_int);
>> def("f", f_bool);
>> }
>>
>> then f_bool will execute this call of f(1), since the python 'int'
>> converts to c++ bool. If the overloads were registered in the
>> reverse
>> order, f_int would be executed.
>>
>> In this case the c++ overloads in question are potentially
>> distinguishable from one another, as python has distinct types bool
>> and
>> int. So these aren't "necessarily" ambiguous.
>>
>> 1.b) Problem: "Necessarily" ambiguous overloads
>>
>> There are some overload sets that *are* necessarily ambiguous. This
>> set, for instance,
>>
>> void f_double(double);
>> void f_float(float);
>>
>> BOOST_PYTHON_MODULE(m)
>> {
>> def("f", f_double);
>> def("f", f_float);
>> }
>>
>> can never be unambiguous, since python has no 'float' type. I.e.
>>
>>>>> f(x)
>>
>> will call f_float for any value of x where the call succeeds at all.
>>
>> 1.c) Problem: Multiple values for keyword argument
>>
>> At the same place in the boost.python code where this 'first-match'
>> overload resolution is done, the user error 'multiple values for
>> keyword
>> argument' is not checked. Neal Becker recently pointed this out.
>> With
>> boost.python it looks like this:
>>
>> int addem(int x, int y, int z) { return x*100 + y*10 + z; }
>>
>> BOOST_PYTHON_MODULE(M)
>> {
>> def("addem", &addem, (arg("x"), arg("y"), arg("z")));
>> }
>>
>>>>> from M import addem
>>>>> addem(1, 8, 2, x=4)
>> Traceback (most recent call last):
>> ...
>> ArgumentError: Python argument types in
>> M.addem(int, int, int)
>> did not match C++ signature:
>> addem(int x, int y, int z)
>>
>> That error message is very confusing... f(int,int,int) doesn't match
>> f(int,int,int)? The pure python version reports something more
>> sensible:
>>
>>>>> def g(x,y,z, **kwargs):
>> ... print x,y,z,kwargs
>> ...
>>>>> g(1,2,3,x=4)
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> TypeError: g() got multiple values for keyword argument 'x'
>>
>>
>> 2.) An implemented solution
>>
>> I've got something implemented. Here's what it does.
>>
>> 2.a) Solution: multiple values for keyword
>>
>> The easiest case to catch is the last [1.c]. It is also orthogonal
>> to
>> the others. If a registered function has keyword arguments, check
>> for
>> multiple keyword values, and raise an error if so:
>>
>>>>> from M import addem
>>>>> addem(1,2,3,x=1)
>> Boost.Python.TypeError: M.addem() got multiple values for keyword
>> argument 'x'
>>
>> 2.b) Solution: "necessarily" ambiguous overloads
>>
>> The next easiest thing to catch is case [1.b], "necessarily"
>> ambiguous
>> registrations. It proceeds as follows: at import time, as each new
>> overload V for function F is registered, compare V to the existing
>> overloads EO for F. If V has the same signature as something in EO,
>> raise an AmbiguousOverload exception. For instance, if you load the
>> module from [1.b] above, you get:
>>
>>>>> import m
>> Traceback (most recent call last):
>> ...
>> AmbiguousOverload: Boost.Python function m.f
>> has ambiguous overloads. C++ signatures
>> f(float) -> None
>> and
>> f(double) -> None
>> are indistinguishable to python.
>>
>> Again this is because c++ float and c++ double both map to python
>> 'float'. This one is "easy" as it happens only once (not at every
>> function call) and doesn't take up any space.
>>
>> 2.c) Solution: ambiguous calls
>>
>> The hard case is [1.a]:
>>
>> void f_bool(bool);
>> void f_int(int);
>>
>> BOOST_PYTHON_MODULE(m)
>> {
>> def(f, f_bool);
>> def(f, f_int);
>> }
>>
>> For module 'm' above, a call to f(True) or f(1) should succeed and
>> call
>> the corresponding function. Passing a float, however, is ambiguous:
>>
>>>>> f(True) # ok
>>>>> f(1) # ok
>>>>> f(1.0)
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> Boost.Python.AmbiguousCall: Ambiguous call to Boost.Python function
>> m.f
>> C++ signatures:
>> f(int)
>> f(bool)
>>
>> So the implementation has some how 'scored' each possible overload
>> and
>> in each case either used the best one or reported ambiguity if
>> multiple
>> overloads are tied for 'best'.
>>
>> The scoring works as follows.
>>
>> In the guts of boost.python we recieve a tuple of arguments of type
>> PyObject*. We also have a list of overloads, each with an mpl vector
>> representing the signature of the associated C++ function. This is
>> unchanged from the released version.
>>
>> What has been done until now is to attempt to call each overload in
>> order and return the result of the first that succeeds.
>>
>> In this new implmentation, for each c++ argument of type T_n, we
>> ask a
>> class overload_score<T_n> to score the conversion of the type
>> (PyTypeObject*) of this PyObject* to T_n. Example:
>>
>> template <>
>> struct overload_score<int>
>> {
>> boost::optional<unsigned> operator()(PyObject* type)
>> {
>> if(PyInt_CheckExact(type))
>> return boost::optional<unsigned>(0); // int == perfect
>> else if (PyBool_Check(type))
>> return boost::optional<unsigned>(1); // bool == okay
>> else if (PyFloat_CheckExact(type))
>> return boost::optional<unsigned>(1); // float == okay
>> else if(arg_from_python<int>(type).convertible())
>> return boost::optional<unsigned>(1); // fallback
>> else
>> return boost::optional<unsigned>(); // unsuitable
>> }
>> };
>>
>> The "score" type is optional<unsigned>. A valueless,
>> (default-constructed) optional<unsigned> means 'unsuitable'.
>> optional<unsigned>(0) is a perfect match, and
>> optional<unsigned>(value)
>> for value > 0 is a workable but less than perfect match.
>>
>> These per-argument scores are added together for all arguments:
>> this is
>> the overload's total score. If any argument is 'unsuitable', the
>> total
>> score is 'unsuitable'.
>>
>> If there is a tie for the best (lowest) score, the call is
>> ambiguous and
>> a Boost.Python.AmbiguousCall exception is raised. If there is only
>> one
>> function in first place, call it.
>>
>> 3.) Implications
>>
>> This breaks a few corner cases that I've found in the tests.
>>
>> 3.a) implied init<> registration
>>
>> I have found a few instances like this one:
>>
>> struct X
>> {
>> X();
>> X(int);
>> };
>>
>> class_<X>("X") // #1
>> .def(init<>()) // #2
>> .def(init<int>())
>> ;
>>
>> Here, #1 causes a default constructor to be registered, as does #2.
>> This will cause a throw at load time as in [2.b]. It is simple to
>> fix.
>>
>> 3.b) raw_constructor
>>
>> The test suite for raw_constructor depends on our first_match
>> overload
>> resolution. The fix is to move a few things around to get the same
>> effect without relying on this behavior.
>>
>> The "before" code looks like this:
>>
>> class Foo
>> {
>> public:
>> Foo(tuple args, dict kw)
>> : args(args), kw(kw) {}
>>
>> tuple args;
>> dict kw;
>> };
>>
>> object init_foo(tuple args, dict kw)
>> {
>> tuple rest(args.slice(1,_));
>> return args[0].attr("__init__")(rest, kw);
>> }
>>
>> BOOST_PYTHON_MODULE(raw_ctor_ext)
>> {
>> // using no_init postpones defining __init__ function until after
>> // raw_function for proper overload resolution order, since later
>> // defs get higher priority.
>> class_<Foo>("Foo", no_init)
>> .def("__init__", raw_function(&init_foo))
>> .def(init<tuple, dict>())
>> .def_readwrite("args", &Foo::args)
>> .def_readwrite("kw", &Foo::kw)
>> ;
>> }
>>
>> The "after" code does the registration piecemeal: Maybe there is a
>> better way to do this. To my mind this is a little better because it
>> makes explicit what is happening rather than relying on some subtle
>> property of overload resolution:
>>
>> class Foo
>> {
>> public:
>> Foo(tuple args, dict kw)
>> : args(args), kw(kw) {}
>>
>> tuple args;
>> dict kw;
>> };
>>
>> // our custom factory function
>> object init_foo(tuple args, dict kw)
>> {
>> tuple rest(args.slice(1,_));
>> return args[0].attr("__real_init__")(rest, kw);
>> }
>>
>> BOOST_PYTHON_MODULE(raw_ctor_ext)
>> {
>> // to get the desired effect we register
>> // the actual constructor and our 'raw' constructor,
>> // and then rename them
>> class_<Foo> c("Foo", init<tuple, dict>());
>> c
>> .def("__tmp_init__", raw_function(&init_foo))
>> .def_readwrite("args", &Foo::args)
>> .def_readwrite("kw", &Foo::kw)
>> ;
>>
>> //
>> // __init__ => __real_init__
>> // __tmp_init__ => __init__
>> //
>> object real_constructor = getattr(c, "__init__");
>> object raw_constructor = getattr(c, "__tmp_init__");
>>
>> setattr(c, "__init__", raw_constructor);
>> delattr(c, "__tmp_init__");
>> setattr(c, "__real_init__", real_constructor);
>>
>> }
>>
>> And that basically covers it. Looking forward to comments/feedback.
>>
>> The code is here:
>>
>> http://gitorious.org/~straszheim/boost/straszheims-python/trees/
>> master
>>
>> Note that the headers are in subdirectory 'include'... If there is
>> enough response to this mail it I'll send out instructions on how
>> to get
>> a working build going.
>>
>> - t
>> _______________________________________________
>> Cplusplus-sig mailing list
>> Cplusplus-sig at python.org
>> http://mail.python.org/mailman/listinfo/cplusplus-sig
>
> _______________________________________________
> Cplusplus-sig mailing list
> Cplusplus-sig at python.org
> http://mail.python.org/mailman/listinfo/cplusplus-sig
More information about the Cplusplus-sig
mailing list