[C++-sig] Pyste: problems with inheritance across headers.

Tue Jul 8 18:48:22 CEST 2003

Hi,

Prabhu Ramachandran wrote:

>Hi,
>
>I found a few more serious bugs with Pyste.  I split up a simple class
>hierarchy over several header files and also made different pyste
>files to wrap these.  Pyste has difficulty with executing the module
>functions in the right order (as I suspected in an earlier post) and
>also has difficulties with the base classes.  The trouble being that
>the exported_names are grouped by headers and only the exported names
>grouped by header are known when generating the code for a class.  So
>if a base class is in another header the bases will not be correctly
>set leading to problems.
>  
>

Yep, a serious bug indeed. 8(

The original algorithm worked without problems, but I received reports 
(I lost the email, so I can't tell from whom 8/) that the memory 
consumption was too high: while trying to export lots of classes, the 
memory usage reached over 600 MB, which made Pyste unusable. The memory 
usage grew so much because all the headers were being parsed and kept in 
memory: I needed to order the exports by number of base classes to 
ensure that base classes are instantiated first.

Then I changed the algorithm to what it is today: group the exporters by 
header file, parse one header at a time, and discard the parsed 
declarations immediately. The rationale is that, given a class C in a 
header file, that header contains all the information needed about its 
base classes. Of course, I overlooked that the exporters would then no 
longer all be available at the same time to be ordered.

We have two problems here:

1. exported_names is not working correctly: it depends on the order in 
which the exporters generate the code, and that order is basically random.

2. The order in which the classes are instantiated is not correct.

>Possible resolution for this problem:
>
> 1. Compute all the exported names in one go and then use that
>    information when generating the code.
>

Yes, that solves problem #1. Since we parse all pyste files at the 
beginning, we have the names of all the classes that are being exported. 
I just tested this and it works fine.
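
A minimal sketch of that idea, not the actual Pyste code. It assumes each 
pyste file boils down to a list of (header, class name) pairs; the names 
collect_exported_names and exported_bases are hypothetical and exist only 
for illustration.

```python
def collect_exported_names(infos_by_pyste_file):
    """Build one exported_names dict from every pyste file up front.

    infos_by_pyste_file is an assumed structure:
    {pyste file name: [(header, class name), ...]}.
    """
    exported_names = {}
    for pyste_file, infos in infos_by_pyste_file.items():
        for header, class_name in infos:
            # Remember which header each exported class comes from.
            exported_names[class_name] = header
    return exported_names


def exported_bases(bases, exported_names):
    """Keep only the bases that some pyste file actually exports."""
    return [base for base in bases if base in exported_names]
```

Because the dict is complete before any code is generated, the bases of a 
class no longer depend on which header happens to be processed first.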

> 2. Additionally, do not use grouping by headers to order the writes.
>    A more sophisticated approach is needed to get the right sequence
>    of module initialization.
>  
>

Yes, and here we have a hairy problem, more below.

> 3. Assume that all bases are exported by default instead of relying
>    on exported_names.  I guess this is not an option. :(
>

Agreed. 8/

>The necessity of the exported_names destroys my plans of incremental
>generation of wrappers without generating an additional file with
>necessary information.  Why?  Well, because when you generate wrappers
>with a single pyste file it has no way of knowing the exported names
>from other pyste files and therefore will not set the bases correctly.
>  
>

Indeed, we have to give Pyste all pyste files to be able to generate the 
correct code.

>So the only way of doing this would be to allow Pyste to generate the
>xml file into an _interface.xml in individual steps and then get Pyste
>to use the generated XML files touching only the wrapper files that
>changed.  My guess is that this will reduce the time taken to generate
>the interface files.  This is what Nicodemus was saying earlier but I
>learnt it from 'The University of Hard Knocks'.
>

Hehehe, I didn't know that expression.

>Essentially the --only-wrap and --only-main options are useless.
>

8(

>Instead we need a --generate-xml and --use-xml option.  --generate-xml
>simply generates the xml file(s) and --use-xml does not call gccxml
>and uses the generated xml file(s).  Its up to the build system to
>determine when and how to generate the xml file when the interface
>file changes and that is quite easy to do.
>

Is it easy for a build system to delete a file when a file that it 
depends on changes (I am talking here about deleting the xml file 
whenever the header changes)? If so, I believe that an option that tells 
Pyste which directory to write the xml files to would be sufficient. If a 
file with the same name as a header is not present there, parse the 
header with gccxml and write the file in that directory. Otherwise, use 
the file already there. If a header changes, the xml is deleted and 
Pyste will regenerate it.
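
The lookup described above could be sketched roughly like this. It is 
only an illustration: cached_xml is a hypothetical name, and run_gccxml 
stands for whatever callable actually invokes gccxml on a header and 
writes the result to the given path.

```python
import os


def cached_xml(header, xml_dir, run_gccxml):
    """Return the path of the xml file for header, parsing only if needed.

    run_gccxml(header, xml_path) is an assumed callable that runs gccxml
    and writes its output to xml_path.
    """
    # The xml file is named after the header. Headers with the same base
    # name in different directories would collide in this simple sketch.
    xml_path = os.path.join(xml_dir, os.path.basename(header) + ".xml")
    if not os.path.exists(xml_path):
        # No cached file: the header is new, or the build system deleted
        # the xml because the header changed.
        run_gccxml(header, xml_path)
    return xml_path
```

With this scheme Pyste itself never compares timestamps; deciding when an 
xml file is stale is left entirely to the build system.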

But first we have to solve "The Big Nasty Bug". ;)

>I've attached a tarball of the files that demonstrate the above
>problem with inheritance across several headers.
>  
>

Thanks! They were a great help in understanding the bug better. I will 
add them to the test suite too.

>Anyway, I'd like to know if the above approach for multiple files is
>possible or I'm missing something again.  If it is, I can probably
>work on a patch sometime, unless Nicodemus has it covered.
>

Let's discuss this and share some work. 8)

To solve the bug, we have to somehow parse all the files first to get 
the ordering for them. The problem is that this consumes a lot of memory 
with large libraries (enough to make Pyste unusable) if we use a naive 
approach. I was thinking along these lines:

1. Gather all the "infos" and generate the exported_names dict. Easy enough.
2. Parse all headers, one at a time:
   a) For each export, we generate its "order", since we have a list of 
its bases and the exported_names.
   b) save the declarations to disk somehow, so we can generate the code 
later (Pickle can be our friend here).
   c) destroy the declarations, to save memory
3. Now we have all the exporters, in the right order.
4. For each exporter, read the parsed information from disk, generate 
the code, and dispose of it.
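
The four steps above can be sketched as follows. This is a rough 
illustration under stated assumptions, not Pyste's implementation: 
parse_header and generate_code are hypothetical callables, the parsed 
declarations are reduced to a dict mapping each class name to the list 
of all its ancestors, and exported_names maps each exported class to the 
header it is exported from. The "order" used here is the number of 
exported ancestors, which is one possible choice that places every base 
before the classes derived from it.

```python
import os
import pickle


def order_and_generate(headers, exported_names, parse_header,
                       generate_code, work_dir):
    exporters = []  # (order, class name, header) for every export
    saved = {}      # header -> path of its pickled declarations
    # Step 2: parse the headers one at a time.
    for index, header in enumerate(headers):
        declarations = parse_header(header)
        for class_name, ancestors in declarations.items():
            # Export a class only from the header its pyste file named.
            if exported_names.get(class_name) != header:
                continue
            # 2a: a derived class always has more exported ancestors
            # than any of its bases, so this count sorts bases first.
            order = sum(1 for name in ancestors if name in exported_names)
            exporters.append((order, class_name, header))
        # 2b: save the declarations to disk for the generation pass.
        saved[header] = os.path.join(work_dir, "decls_%d.pickle" % index)
        with open(saved[header], "wb") as stream:
            pickle.dump(declarations, stream)
        # 2c: drop the reference so the memory can be reclaimed.
        del declarations
    # Step 3: all exporters are known, so put them in the right order.
    exporters.sort()
    # Step 4: reload one header's declarations at a time and generate.
    generated = []
    for order, class_name, header in exporters:
        with open(saved[header], "rb") as stream:
            declarations = pickle.load(stream)
        generated.append(generate_code(class_name, declarations))
    return generated
```

Only one header's declarations are held in memory at any point, at the 
cost of one pickle write per header and one pickle read per exporter.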

We could also just keep the xml file on disk while we order the 
exporters, instead of saving the parsed declarations. But then we have 
to parse it again, and that could take a while. If that's the case, we 
could save a pickle representation of the declarations instead.

What do you think?

Thanks again Prabhu!

Regards,
Nicodemus.