simple(?) Python C module question

Wed Nov 17 19:18:44 EST 2010

Mark Crispin <nospam at panda.com> writes:

> I have a Python module written in C that interfaces with an external C
> library.  Basically, the project is to make it possible to use that
> library from Python scripts.  If you know who I am, you can guess
> which library.  :)

You have your very own Wikipedia page, so others probably needn't guess.

> However, I now need to write a method that creates what the library calls a
> "stream", and I need method calls to work on that stream.
>
> The obvious way to do this in any other OO language is to have an
> object that holds the stream from the library in an instance variable
> (which actually will be constant for that object instance), and has
> various object methods for operating on that stream.

Yeah, that's pretty much the way it works in Python, though it's a bit
more tedious.  (I suppose it's too late to suggest using Cython now.  It
makes this sort of thing much less unpleasant.)

> I assume that the object methods are defined by a PyMethodDef table,
> just as they are for the module.  But how do I:
>  [1] define the object

You need two things here: a C structure to represent the innards of your
custom Python type, and a Python type block (a PyTypeObject) providing
everything the Python interpreter needs to know about it.

The C structure can be pretty much whatever you like, as long as it
begins with the macro PyObject_HEAD:

        typedef struct thingy {
          PyObject_HEAD
          /* your stuff here; maybe: */
          stream *s;
        } thingy;

The PyTypeObject is big and boring to fill in but you can probably leave
most of it null.  The structure is described in the Python API manual,
but I think you'll need to fill in:

  * a name for the type (qualified by your module's name; tp_name);
  * the object size (sizeof(thingy); tp_basicsize);
  * a deallocation function (tp_dealloc);
  * some flags (you probably want Py_TPFLAGS_DEFAULT and maybe
    Py_TPFLAGS_BASETYPE if you don't mind people making subclasses;
    tp_flags);
  * a string to be shown to the user if he asks for help about your type
    (tp_doc);
  * a pointer to a methods table (tp_methods);
  * a pointer to a table of attribute getters and setters (if you want
    to expose stuff as attributes rather than methods; tp_getset);
  * an allocator (probably PyType_GenericAlloc; tp_alloc); and
  * a function to construct a new instance (to be called when Python
    tries to construct an object by calling the type -- leave it null if
    you construct instances in some other way; tp_new).

You might also want to implement the tp_str function to provide useful
information about your object's state (e.g., for debugging scripts).
There are some standard protocols which might be useful to implement; it
doesn't sound like your streams would benefit from behaving like numbers
or sequences, but they might usefully be iterable.

You attach methods to the type by mentioning them in the PyMethodDef
table you set in tp_methods; they get a pointer to the recipient object
(a `thingy' as above) so they can find the underlying stream if they
want.  Getters and setters are pretty similar, but have a simpler
interface because they don't need to mess with argument parsing.

>  [2] create an instance of the object with the stream and methods

If you want Python programs to be able to make streams by saying

        stream = mumble.stream(...)

or whatever then you'll need to implement tp_new: this is pretty much
like a standard method, except it gets a pointer to a PyTypeObject to
tell it which type to instantiate.  (The constructor may be called to
construct a subclass of your type.)  Parse the arguments and check that
they're plausible; then make a new skeleton instance by

        t = (thingy *)ty->tp_alloc(ty, 0);

(where ty is the type pointer you received), fill in your bits, and
return (PyObject *)t.

Otherwise, well, you go through the same procedure with tp_alloc, only
you know which type you want statically.

>  [3] hook the object's destruction to a library stream-close function

This is the tp_dealloc function.  It should free up any of your
resources (here, close the library stream), and then say

        Py_TYPE(obj)->tp_free((PyObject *)obj);

Python will have arranged for this function to exist if you left tp_free
as a null pointer.

> Python does NOT need to look at the stream in any way.  Ideally, the
> object is just a blob that only has method calls.

That's what you get anyway, if you use the C API.  If you want Python
programs to be able to poke about inside your objects, you have to let
them explicitly, and that means writing code.

> This ought to be simple, and not even require me to know much Python
> since basically the task is just this module and a few very basic
> Python scripts to use it.  Other people will be writing the real
> scripts.

It doesn't require you to know any Python at all.  It /does/ require a
certain familiarity with the implementation, though.

> Of course, I could just have the open method return the stream pointer
> as a big int, and have module methods that take the stream pointer as
> their first argument, just as in C code.

I don't think Python programmers would thank you for this kind of
obviously unsafe implementation; so you already have my gratitude for
trying to do the job properly.  I know that it's somewhat tedious.  And
consider giving Cython a look.

-- [mdw]