[C++-sig] Custom PyTypeObjects

Alex Leach beamesleach at gmail.com
Mon Apr 29 18:26:38 CEST 2013


Dear list,

An earlier thread I started was meant to discuss how to use C++ memory
management methods (operator new, delete etc.) with a boost python
instance. Rather than dwelling on that concern, I've been (successfully)
wrapping other code since, but have now arrived at a separate conundrum,
which I think could be addressed by the same conceptual solution. This
time I've found a working attempt at a solution here on this list[1], and
was hoping that more generic, template-ised versions could be introduced
into Boost.

[1] -  
http://mail.python.org/pipermail/cplusplus-sig/2007-August/012438.html


The message has turned into a bit of an essay, so I'll summarise what I've  
written here:-

        * Python objects and protocols - they're not all the same.
        * Python Buffers - An example and attempt at exposing one.
        * C++ IO streams - exposing buffered object interfaces to Python.
        * Customising PyTypeObjects already used in Boost Python.
        * "There should be one-- and preferably only one --obvious way to do it."
        * Summary


What's in an Object?
--------------------

What I think it boils down to is a lack of support for the different type  
objects defined in the Python C-API Abstract[2] and Concrete[3] Object  
Layers.

The problem in [1] was related to PyBufferProcs and PyBufferObjects. How
can an object representing a buffer be properly exposed to Python? The
PyBuffer* structs were designed with this in mind, but are now deprecated
in favour of memory view objects [4]. Either way, a `grep` of the Boost
Python header and source files shows no sign of either API being used.

[2] - http://docs.python.org/2/c-api/abstract.html
[3] - http://docs.python.org/2/c-api/concrete.html
[4] - http://docs.python.org/2/c-api/buffer.html#memoryview-objects


A buffered solution
-------------------

The solution from [1] makes it about as simple as possible for the client  
/ Python registration code to expose a return type that is managed by a  
PyBuffer_Type-like PyTypeObject. A custom to-python converter is  
registered and return_value_policy used.

However, this is still fairly cumbersome compared to current Boost Python
usage, as the C-Python API needs to be used directly and a custom
PyTypeObject defined, for any return-type that should use a different type
protocol. The solution also goes nowhere towards providing the
functionality a Python buffer expects, but instead just demonstrates how
one might use a new PyTypeObject.


A standards-compliant solution
------------------------------

With the C++ standard library in mind, I was wondering what boost python
might be able to do with IO streams. I have a family of C++ classes that
use iostream-like operators to serialise objects into either XML, plain
text, or binary formats. Providing this functionality via a buffered
object seems to be the appropriate solution... Using boost python to
expose such an interface, though, looks non-trivial, to say the least.

A boost-friendly solution might be to recognise boost::asio::buffer
objects, perhaps using boost::mpl statements in the to-python converter
registrations.

I'm still trying to get to grips with standard library templates  
personally, so would prefer if classes derived from ios_base could  
automatically have their '<<' and '>>' operators exposed at compile time,  
depending on whether they are read-only or read-write. An exposed seek  
function would also be useful, when one is available in the C++ type.
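
As a rough illustration of the compile-time selection I have in mind,
here is a minimal sketch that uses only the standard library. The names
`stream_caps` and `exposed_operators` are hypothetical and are not part
of Boost Python; a real implementation would presumably add the matching
`def` calls rather than build a string.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper, not part of Boost Python: reports at compile time
// whether a stream type can be read from and/or written to.
template <class Stream>
struct stream_caps
{
    static_assert(std::is_base_of<std::ios_base, Stream>::value,
                  "Stream must derive from std::ios_base");
    static constexpr bool readable =
        std::is_base_of<std::istream, Stream>::value;
    static constexpr bool writable =
        std::is_base_of<std::ostream, Stream>::value;
};

// Lists the operators a wrapper would expose for Stream.
template <class Stream>
std::string exposed_operators()
{
    std::string names;
    if (stream_caps<Stream>::readable)
        names += ">>";
    if (stream_caps<Stream>::readable && stream_caps<Stream>::writable)
        names += " ";
    if (stream_caps<Stream>::writable)
        names += "<<";
    return names;
}
```

With this, `exposed_operators<std::istringstream>()` reports only the
extraction operator, while a `std::stringstream` reports both.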


Specialised PyTypeObjects
-------------------------

Discussing each of the different object types is too large a subject to  
describe in full here, but would it not be sensible for Boost Python to  
make it easier to expose other PyTypeObjects?

The NumPy C-API exposes 8 public and 4 private type specialisations[5],  
for representing clearly different types of data. These are essentially  
PyTypeObjects conforming to the API defined in the C-Python object layers  
documentation[2,3].

With quite a lot more code, Boost Python could potentially provide  
capability to specialise the type objects for a number of pre-defined base  
types, by providing custom HolderGenerators[6] for each type  
specialisation. These HolderGenerators can be referred to by creating  
corresponding `return_value_policy`s. This is what the solution from [1]  
does, by defining both a new HolderGenerator and a corresponding  
return_value_policy.
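
To show the shape of the idea without pulling in Boost or Python headers,
here is a toy model of a generator as a metafunction class, i.e. a type
with a nested `apply<T>::type`. Every name below (`value_holder_model`,
`buffer_holder_model`, `make_buffer_holder`, `return_policy_model`) is a
stand-in I made up for illustration; none of them is a Boost Python type.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-ins for holder types. Real holders derive from
// boost::python::instance_holder, which is not modelled here.
template <class T>
struct value_holder_model
{
    static std::string protocol() { return "object"; }
};

template <class T>
struct buffer_holder_model
{
    static std::string protocol() { return "buffer"; }
};

// Metafunction classes: apply<T>::type names the holder to use for T.
struct make_value_holder
{
    template <class T> struct apply { typedef value_holder_model<T> type; };
};

struct make_buffer_holder
{
    template <class T> struct apply { typedef buffer_holder_model<T> type; };
};

static_assert(std::is_same<make_buffer_holder::apply<int>::type,
                           buffer_holder_model<int>>::value,
              "the generator selects the buffer holder");

// Stand-in for a policy that is parameterised on a generator.
template <class HolderGenerator>
struct return_policy_model
{
    template <class T>
    static std::string protocol_for()
    {
        typedef typename HolderGenerator::template apply<T>::type holder;
        return holder::protocol();
    }
};
```

Swapping the generator is then the only change needed at the call site:
`return_policy_model<make_buffer_holder>` selects the buffer-style
holder, `return_policy_model<make_value_holder>` the ordinary one.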

This concept is not problem-free, however. In my case, I'd like to tie a
C++ class's streaming interface directly to the PyTypeObject. For Python
2.x this would mean pointing a new PyTypeObject's tp_as_buffer member at a
PyBufferProcs struct. The code from [1] could be modified to do this, but
it would take quite a lot more work. (It has.)

For Python 2.7 and above, there are of course the new buffer and
memoryview APIs, but I haven't really read up on or done anything with
them yet...


[5] -  
http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html
[6] -  
http://www.boost.org/doc/libs/1_53_0/libs/python/doc/v2/HolderGenerator.html


A generalised solution
----------------------

To answer my question from the previous thread I started here, on how to  
use a custom PyTypeObject on an exposed class_<> hierarchy, I think the  
way to do this is to use `pytype_object_manager_traits<PyTypeObject*,  
object>`, as is done in str.hpp, list.hpp, etc. e.g.:-

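namespace boost { namespace python { namespace converter
{
   // Python 3 branch of str.hpp; under Python 2 the first template
   // argument is &PyString_Type instead.
   template <>
   struct object_manager_traits<str>
       : pytype_object_manager_traits<&PyUnicode_Type, str>
   {
   };
}}}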

This seems to be the best way to register a PyTypeObject to a C++ class,
with Boost. But it does require a tremendous amount of work, when wanting
to use PyTypeObjects that should use STL functionality.

C++ IO streams
--------------


Mapping C++ STL functions to PyTypeObject attributes[7] does not appear to
have been done at all in Boost Python, as far as I can tell. Of course
there are the standard objects, bp::str, bp::list, etc., which use core
Python's respective PyTypeObjects as instance managers, like above, but it
doesn't seem like there is a robust way to replace a PyTypeObject's
function pointers with STL-conforming implementations. I suppose it is
possible to edit the PyTypeObject, after getting it with
`object.get_type()`, but that seems a bit of an inefficient, run-time hack.

I was playing around with the code from [1] over the weekend, and have
started to map the C++ iostream template functions to a PyTypeObject's
`tp_as_buffer` member struct, to expose buffered access to C++ formatted
stream methods through a PyBufferProcs struct[8]. Admittedly, this was a
bit of a pointless exercise, as the old buffer protocol has been removed
in Python 3, but I am currently developing with Python 2.7 and wanted to
try out an initial, working implementation where a custom PyTypeObject is
used.
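
For the C++ half of such a mapping, the sketch below shows how an
externally owned block of bytes (for example, the pointer and length that
a Python buffer hands over) could be presented to formatted stream
extraction. It uses only the standard library and makes no Python API
calls; `memory_streambuf` is a hypothetical name, and this is not the
code in [10].

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>

// Hypothetical read-only adapter: presents an externally owned block of
// bytes as the get area of a std::streambuf. The memory is not copied
// and must outlive this object.
class memory_streambuf : public std::streambuf
{
public:
    memory_streambuf(char* data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        setg(data, data, data + size);
    }

protected:
    // Repositions the read pointer relative to beg, cur or end.
    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which = std::ios_base::in) override
    {
        if (!(which & std::ios_base::in))
            return pos_type(off_type(-1));
        const off_type size = egptr() - eback();
        off_type base = 0;
        if (dir == std::ios_base::cur)
            base = gptr() - eback();
        else if (dir == std::ios_base::end)
            base = size;
        const off_type target = base + off;
        if (target < 0 || target > size)
            return pos_type(off_type(-1));
        setg(eback(), eback() + target, egptr());
        return pos_type(target);
    }

    // Absolute seek, expressed in terms of seekoff.
    pos_type seekpos(pos_type pos,
                     std::ios_base::openmode which = std::ios_base::in) override
    {
        return seekoff(off_type(pos), std::ios_base::beg, which);
    }
};
```

A `std::istream` constructed on top of a `memory_streambuf` can then use
`operator>>`, `seekg` and `tellg` on the borrowed memory.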

For std::i/ostream, there is some production code available that can  
perform Python file-like object conversions. In particular, the two  
subsequent replies to this message[9] here on this list, mention  
open-source libraries that can already do this. And from the code listed  
in [1], I've made available yet another (partially complete)  
implementation[10].

[7] - http://docs.python.org/2/c-api/typeobj.html#
[8] - http://docs.python.org/2/c-api/typeobj.html#PyBufferProcs
[9] - http://mail.python.org/pipermail/cplusplus-sig/2010-March/015411.html
[10] - https://github.com/alexleach/bp_helpers


Moving forward
--------------

Assuming Boost Python follows the Zen of Python, there should be one - and
only one - obvious way to achieve what I want. Currently, that is to
expose a future-proof, STL-compliant iostream interface through Boost
Python. I don't think any of the above implementations are compatible with
Python 3, since I don't think any of them use the new Python buffer or
memoryview APIs, but I'd like to make the switch soon, myself.

I'm sure adding buffer support to Boost Python would be valuable for a
number of users. From a backwards-compatibility perspective, it would
probably be good to have both the old and the new buffer APIs included in
Boost Python, to be selected with a Python preprocessor macro. Memoryviews
are a relatively fancy and new feature, but buffers have been around for
ages, so it would be good if they were supported for basically all
versions of Python. Ideally though, we would have memoryview
functionality in v2.7+, too.
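
A minimal sketch of that kind of version switch follows. PY_VERSION_HEX
is the macro that Python.h defines; the predicate names are hypothetical,
and the version thresholds reflect my understanding that the new buffer
API arrived in 2.6, memoryview objects in 2.7, and that the old protocol
went away in 3.0.

```cpp
#include <cassert>

// PY_VERSION_HEX normally comes from Python.h. The fallback below is only
// a stand-in so that this sketch compiles without the Python headers.
#ifndef PY_VERSION_HEX
#  define PY_VERSION_HEX 0x02070000
#endif

// Hypothetical predicates on a PY_VERSION_HEX-style value.

// Old protocol: PyBufferProcs read/write/segment slots (Python 2 only).
constexpr bool has_old_buffer_api(unsigned long hex)
{
    return hex < 0x03000000UL;
}

// New protocol: Py_buffer and PyObject_GetBuffer.
constexpr bool has_new_buffer_api(unsigned long hex)
{
    return hex >= 0x02060000UL;
}

// memoryview objects.
constexpr bool has_memoryview(unsigned long hex)
{
    return hex >= 0x02070000UL;
}

// Compile-time switches for the interpreter being built against.
constexpr bool use_old_buffer_api = has_old_buffer_api(PY_VERSION_HEX);
constexpr bool use_new_buffer_api = has_new_buffer_api(PY_VERSION_HEX);
constexpr bool use_memoryview     = has_memoryview(PY_VERSION_HEX);
```

Under these assumptions Python 2.7 is the one release line where all
three switches are on, which is why it looks like a convenient place to
develop both code paths side by side.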


One way to rule them all
------------------------

Now, I've discovered a number of ways to write to_python converters, and  
am not sure what is the "one obvious way" to define a new PyTypeObject's  
API.

I would be grateful for feedback on which should be the preferred way to  
expose a class with a custom PyTypeObject. Here are the methods I've  
looked into:-


1. indexing_suite

   My favourite way so far of exposing a to_python converter has been
boost python's indexing_suite, as I did for std::list[11] (also
attached to a msg on this list, earlier this month). From the client's
perspective, all that needs to be done is to instantiate a template. For  
examples, see the C++ test code[12]. However, I haven't really looked into  
how the converter is registered internally, as the base classes take care  
of that. Either way, the indexing suite functions are only attached to the  
PyObject, not its respective PyTypeObject.

[11] -  
https://github.com/alexleach/bp_helpers/blob/master/include/boost_helpers/make_list.hpp
[12] -  
https://github.com/alexleach/bp_helpers/blob/master/src/tests/test_make_list.cpp


2. class_<..>

   The code for the class_ template, its bases and typedefs is really quite  
advanced, but it can't be said that it is inflexible. Still, I haven't  
found an "obvious way" to replace a class's object manager. I get a  
runtime warning if a to_python_converter is registered as well as a class.  
bp::init has an operator[] method, which can be used to specify a  
CallPolicy, but I haven't managed to get that to change an instance's base  
type.

The registry is probably the way to do this, but for me at least, the  
registry is very opaque, so I haven't found a good way to edit or replace  
a PyTypeObject, either during or after an exposed class_<> has been  
initialised.


3. to_python_converter<class T, class Conversion, bool  
has_get_pytype=false>

   This is how the solution in [1] enables to-python (PyObject) conversion,  
and is also how I've been doing it in the testing code I modified from  
there[13-15]. A corresponding Conversion class seems necessary to write,  
for each new type of PyTypeObject. e.g. as done in  
return_opaque_pointer.hpp and opaque_pointer_converter.hpp.

[13] -  
https://github.com/alexleach/bp_helpers/blob/master/src/tests/test_buffer_object.cpp
[14] -  
https://github.com/alexleach/bp_helpers/blob/master/include/boost_helpers/return_buffer_object.hpp
[15] -  
https://github.com/alexleach/bp_helpers/blob/master/include/boost_helpers/buffer_pointer_converter.hpp


4. Return value policies and HolderGenerators

   In functions and methods where a CallPolicy can be specified, as I've  
said already, a custom CallPolicy can be used to refer to a custom  
HolderGenerator. These specify which PyTypeObject is used for managing the  
python-converted object, but can a custom return value policy and holder  
be specified with the class_<> template? I sure would like to find a way...

   However, alone, this doesn't seem to put a type converter in the  
registry. I thought that the MakeHolder::execute function should only need  
to be called once, but my code is currently calling it every time I want a  
new Python instance. So, I think I must not be registering the class' type  
id properly[14]...
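
To be clear about what I mean by registering a type id once and then
looking it up per conversion, here is a toy model built on
std::type_index. It is not a description of how Boost Python's registry
actually works; `converter_registry_model` and `int_to_text` are made-up
names, and the "converter" just produces a string.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Toy model only: a table from C++ type id to a conversion function.
class converter_registry_model
{
public:
    typedef std::string (*to_python_fn)(const void*);

    // Registers fn for T. Returns false, leaving the existing entry
    // untouched, if T has already been registered.
    template <class T>
    bool insert(to_python_fn fn)
    {
        assert(fn != nullptr);
        return table_.insert(
            std::make_pair(std::type_index(typeid(T)), fn)).second;
    }

    // Returns the function registered for T, or nullptr if there is none.
    template <class T>
    to_python_fn lookup() const
    {
        auto it = table_.find(std::type_index(typeid(T)));
        return it == table_.end() ? nullptr : it->second;
    }

private:
    std::map<std::type_index, to_python_fn> table_;
};

// Example conversion function for int.
inline std::string int_to_text(const void* p)
{
    return std::to_string(*static_cast<const int*>(p));
}
```

In this model the registration happens once, and every later conversion
is only a lookup followed by a call, which is the behaviour I expected
from my own code.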



Then there are the lvalue and rvalue Python converters, which admittedly I
don't know much about. There are also some other concepts I haven't
mentioned above, like install_holders[16], for example, and whatever is
done when you add shared_ptr<X> to the class_ template's arguments.

[16] -  
http://www.boost.org/doc/libs/1_53_0/libs/python/doc/v2/instance_holder.html#instance_holder-spec


Summary
-------

Should the ability to expose C++ istreams and ostreams be added to Boost
Python? How should this be done? I thought that having a chainable
return_value_policy for both istreams and ostreams would be great. That
way they could both be used in conjunction for an iostream, with the
functionality just incrementally added to a base PyTypeObject. But I don't
see how one could attach additional PyObject methods, as is done by a
class_ template's def methods.

What about memoryviews? If someone were to go ahead and write converters
for Python memoryviews, are there any C++ standards-compliant classes that
could be accommodated? i.e. Are there any classes defined in the C++
standard for multidimensional, buffered memory access? Which, if any,
would be an appropriate match to a Python memoryview? I guess that nested
std::vectors and lists might be good candidates, but I stand to be
corrected.
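
One caveat on nested std::vectors: each inner vector owns a separate
allocation, so the rows are not guaranteed to sit in one block of memory.
A flat std::vector plus a shape might therefore be a closer fit. The
sketch below computes row-major strides of the kind carried by the
`shape`, `strides` and `itemsize` fields of a Py_buffer. The struct and
function names are hypothetical, and no Python API is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of an N-dimensional view over flat memory.
struct buffer_layout
{
    std::ptrdiff_t itemsize;              // bytes per element
    std::vector<std::ptrdiff_t> shape;    // extent of each dimension
    std::vector<std::ptrdiff_t> strides;  // bytes to step per dimension
};

// Builds a C-contiguous (row-major) layout: the last dimension varies
// fastest, so its stride equals itemsize.
buffer_layout c_contiguous_layout(std::ptrdiff_t itemsize,
                                  const std::vector<std::ptrdiff_t>& shape)
{
    buffer_layout layout;
    layout.itemsize = itemsize;
    layout.shape = shape;
    layout.strides.resize(shape.size());
    std::ptrdiff_t stride = itemsize;
    for (std::size_t i = shape.size(); i-- > 0; )
    {
        layout.strides[i] = stride;
        stride *= shape[i];
    }
    return layout;
}

// Byte offset of the element at the given index within the flat block.
std::ptrdiff_t byte_offset(const buffer_layout& layout,
                           const std::vector<std::ptrdiff_t>& index)
{
    assert(index.size() == layout.shape.size());
    std::ptrdiff_t offset = 0;
    for (std::size_t i = 0; i < index.size(); ++i)
    {
        assert(index[i] >= 0 && index[i] < layout.shape[i]);
        offset += index[i] * layout.strides[i];
    }
    return offset;
}
```

For a 2 x 3 array of 8-byte elements this gives strides of 24 and 8
bytes, so element (1, 2) sits 40 bytes from the start of the block.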


Apologies for the length this became and thank you for sticking with me
this far. Any advice, suggestions, pointers to code or documentation I've
probably overlooked or neglected, or even criticism would be appreciated.
Further discussion on how best to improve Boost Python as it is would be
great! I do like to contribute to open source communities when possible,
but I am pressed for time...

Thanks again!
Kind regards,
Alex

