[C++-sig] Re: C++/Boost vs. Python object memory footprint

David Abrahams dave at boost-consulting.com
Tue Dec 10 11:17:04 CET 2002


Stephen Davies <chalky at ieee.org> writes:

> Hi Dave,
>
> I was wondering what the memory usage of Synopsis would be if I
> converted the in-memory AST to be C++ objects with Boost.Python
> wrappers.
>
> I had 350MB of AST in memory yesterday and my system didn't cope too
> well. I figure each object has at least one dictionary, which can't be
> cheap memory-wise. The question is, will I still have proxy objects
> floating around if I use Boost.Python, or will the Python code directly
> use the C++ objects without creating the instance dictionary et al.?

All the code for implementing C++ object wrappers is in
libs/python/src/object/class.cpp.  Instance dictionaries are created
only "on demand", the first time the instance's __dict__ attribute is
accessed (see instance_get_dict), but I have no idea whether that
tends to happen almost always or almost never.

In general, a wrapped C++ object with a corresponding Python object is
the size of a new-style class (derived from 'object' in Python)
instance plus the extra size required to allow variable-length data in
the instance, plus the size of the C++ object, plus the size of a
vtable pointer, plus a pointer to the C++ object's instanceholder,
plus zero or more bytes of padding required to ensure that the
instanceholder is properly aligned.  

You can see this in boost/python/object/instance.hpp. Most Python
objects are represented by instance<value_holder<T> >, for some C++
class T.

I'm not sure what you mean by your question "will I still have proxy
objects floating around...?" 

If your C++ data structure contains pointers or smart pointers, you
can arrange for Python objects to be created which only embed those
pointers (instance<pointer_holder<Ptr> >). These Python objects will
be in existence only as long as your Python code holds a reference to
them. So, for example, it should be possible for Python code to do a
walk over your C++ AST, with only O(log(N)) Python objects in
existence corresponding to those N C++ objects at any given time.

> A related question is what the speed is like calling the C++ objects vs.
> normal Python objects. I can't imagine it would be slower. 

I haven't done any tests, but it certainly could be slower if used
poorly.  There is some overhead at the Python/C++ boundary associated
with looking for eligible type converters, overloading, etc.  However,
I imagine that ends up being negligible in most cases.  The best use
of a Boost.Python C++ binding puts a large amount of computation on
the C++ side of the language boundary.  One way I could imagine
slowing down a Python program would be to translate a very large
number of trivial functions to C++.  C++ function wrappers will
certainly occupy more memory than the corresponding Python code would,
and this could eventually affect cache locality.

> The pickling/unpickling speed is also of some concern
> currently. Would this be affected?

Probably, one way or the other ;-) I think that if you treat your tree
as one monolithic Python object from the standpoint of pickling, you
could probably achieve much better speed than you currently have by
writing some C++ serialization/deserialization code and pickling it as
one long string.  Whether or not that's practical for you, of course,
I can't say.

> Perhaps you can add these to the Boost.Python FAQ :)

If you'll help me edit them into a suitable form, I'd be most happy
to!

-- 
                       David Abrahams
   dave at boost-consulting.com * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution




