[C++-sig] Patches and complete pyste replacement prototype for pyplusplus

Mon Feb 27 20:11:44 CET 2006

In the following I have tried to answer Allen's and Roman's postings at the 
same time (as they both refer to the same mail), quoting both of their 
mails.

["Module" vs "Pipeline" vs "ModuleBuilder" vs "module_builder_t"]

Allen Bierbaum wrote:
> I debated for quite a while what to call this object.  It corresponds
> roughly to a builder for what pyplusplus calls a module, so I went
> with that.  Another reason I chose this name is that from the user
> perspective what they are trying to build is a python module and this
> is the tool they are using to build it.  I am definitely willing to
> rename this and thinking about it now it may be better to call it
> "ModuleBuilder" or something along those lines.

Roman Yakovenko wrote:
> I would like to call it generator_t, but module_builder_t is also okay.

I'm fine with "Module Builder" as well. As to the exact spelling, I'd 
vote for "ModuleBuilder" for two reasons:

1) It's the same guideline used in the standard Python library (and also 
most other packages I've come across)

2) It separates the high level API classes from the internal lower level 
API classes. So the user will notice when he's about to use stuff that 
requires some "deeper understanding" of the internals of pyplusplus (and 
that is more likely to break his script in future versions as I'd still 
argue that the low level API is more likely to change than the high 
level API from version to version).

3) It's the naming convention I'm used to because I use it for my own 
stuff as well... ;-)

ok, these are three reasons, I'd vote for it for three reasons.

Allen Bierbaum wrote:
>> In Allen's version, the user always explicitly creates an instance of
>> that class himself, in my version this instance is created internally
>> and each method is also available as function which internally calls the
>> corresponding method of the global instance (if desired the user could
>> also create an instance himself).
> 
> This is definitely one area where our efforts diverged.  I really took
> hold of the idea early on to use an object oriented API throughout
> because:
> - I am assuming that people using the tool know python
> - I want the ability to have binding generation scripts instantiate
> multiple separate builders
> - It seemed to be a good idea conceptually to deal with objects throughout
> 
> I could definitely add a similar interface of global methods that
> automatically call through to a single global instance, but it
> wouldn't work for my bindings so I didn't spend too much time on it. 
> If this is a required capability I could easily add it.

Roman Yakovenko wrote:
> I think object oriented interface is fine. I will not let global variables to
> come into pyplusplus without good reason. More over, I think, that within same
> script it should be possible to use more than one module_builder_t( I
> like this name ).

I introduced the global functions simply as an abbreviation so that I 
could be more concise in my script. My point is not to forbid explicit 
usage of the module builder class. Even in my version of the API you 
could have used the builder class explicitly as Allen did in his version.
Well, having global functions and an internal global builder class is a 
rather minor feature that I could also live without. My argument is just 
that introducing the global functions has absolutely no effect on those 
people who instantiate the builder themselves, whereas leaving the 
functions out affects those people who would prefer to use them. So why 
not let the user decide for themselves?
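
As a rough sketch of what I mean (ModuleBuilder, default_builder and the 
placeholder parse() below are made-up names for illustration, not the code 
of either prototype), the global functions are nothing more than thin 
wrappers around a shared instance:

```python
class ModuleBuilder:
    """Hypothetical stand-in for the builder class discussed in this thread."""

    def __init__(self):
        self.headers = []

    def parse(self, *headers):
        # The real method would run the parser; here we only record the names.
        self.headers.extend(headers)
        return self


_default_builder = None  # created lazily, only if the global functions are used


def default_builder():
    """Return the shared builder, creating it on first use."""
    global _default_builder
    if _default_builder is None:
        _default_builder = ModuleBuilder()
    return _default_builder


def parse(*headers):
    """Module-level shortcut that forwards to the shared builder."""
    return default_builder().parse(*headers)


# Users of the functions never have to touch the instance ...
parse("foo.h")

# ... and users who create their own builder are not affected by it.
own = ModuleBuilder()
own.parse("bar.h")
```

Because the shared instance is only created on first use, a script that 
instantiates its own builders never triggers it.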

Roman Yakovenko wrote:
>> In Allen's version there are three main "control methods": parse(),
>> createCreators(), createModule(). In my version, I have the three
>> methods parse(), codeCreators() and writeFiles() which serve the same
>> purpose (as said above, these methods are also available as functions).
>> In both versions, the second step (creating the code creators) is done
>> internally if it wasn't done explicitly by the user (in my version I
>> also applied that rule for the parse() step, but I admit that probably
>> everyone has to do that step manually anyway (but it's a nice feature
>> for a "Hello World" example :) ).
> 
> I can think about an other approach: properties
> For example, class module_builder_t will have 4 properties:
> - parser configuration
>    keeps all data to configure pygccxml parser
> - code creators factory configuration
>    keeps all data to configure module_creator.creator_t class
> - declarations
>    returns declarations
>    within this property, files will be parsed, only once, and
> declaration tree will be returned.
> - module_creator
>    returns module_t code creator that has been created using by creator_t class
> 
> In this way user don't need to think "parsing" and "code creators
> factory", but rather I have a set of declarations, lets do some adaptation.

I have to admit that I caught myself forgetting to call parse() before 
trying to access the declaration tree when I was setting up a simple 
pyplusplus example. But on the other hand, triggering a "big" 
operation such as parsing the headers just by accessing an attribute sounds 
unusual. But then, you didn't say what the attribute access would look 
like. The parse() step could really be done internally once the user 
calls any of the Class(), Method(), etc. methods which is basically what 
you were proposing. I think this is not such a bad idea at all, I'm in 
favor of trying it out. :)
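
One way this could look, as a sketch only (the class, the "declarations" 
property and the fake parse() are assumptions for illustration and do not 
reflect how pyplusplus or pygccxml actually work):

```python
class ModuleBuilder:
    """Hypothetical builder; parse() is a placeholder for the real parser run."""

    def __init__(self, headers):
        self.headers = list(headers)
        self.parse_count = 0
        self._decls = None

    def parse(self):
        # Placeholder: pretend every header declares one class of the same name.
        self.parse_count += 1
        self._decls = [h.rsplit(".", 1)[0] for h in self.headers]
        return self._decls

    @property
    def declarations(self):
        # Parse on first access only; later accesses reuse the cached result.
        if self._decls is None:
            self.parse()
        return self._decls

    def Class(self, name):
        # Query methods go through the property, so they trigger parsing too.
        return [d for d in self.declarations if d == name]


mb = ModuleBuilder(["Foo.h", "Bar.h"])
found = mb.Class("Foo")   # no explicit parse() call needed
again = mb.Class("Bar")   # reuses the cached declarations
```

An explicit parse() call stays possible for anyone who wants to control 
when the expensive step happens.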

Allen Bierbaum wrote:
>> In both versions, there are methods Class, Method, Function, etc. to
>> select one or more particular declarations that can then be decorated to
>> customize the final bindings. In Allen's version, these function either
>> return a DeclWrapper or MultiDeclWrapper object (depending on whether
>> the selection contains one or more declarations). In my version, the
>> return value is an IDecl object (that always acts like a MultiDeclWrapper).
>> Decorating the declarations also looks almost the same in both versions.
> 
> I thought about doing this similar to Matthias, but I decided that I
> wanted an easy ability to detect user errors and give good warnings. 
> What I found was that by splitting this in two I could have a separate
> interface for MultiDeclWrapper (the case where multiple declarations
> are wrapped) and only allow methods that made sense for multiple
> declarations.  Similarly this interface can modify the way the methods
> operate to make them take into account that they are wrapping multiple
> declarations.   If I made everything wrap multiple declarations then I
> would have to add test/handling code in each method to check whether
> the method was valid.
> 
> I am not too hung up on this though as it was more an implementation
> detail than anything else.

I agree that the decision whether there should be two declaration 
wrapper classes or only one is really just an implementation detail.
I suppose the question rather is what interface we would like to have on 
that declaration wrapper(s) and whether the interface should depend on 
a) the number of contained declarations and b) on the type of the 
contained declaration(s).
Our implementations agreed in that they did not base the interface on 
the declaration type (which means there should already be test/handling 
code in each method). I also didn't base the interface on the number of 
contained declarations because I thought whenever I call a method on a 
MultiDecl object I could just as well iterate over the contained 
declarations and call that method on each of them individually. And 
that's basically what I'm doing, relieving the user from having to write 
that loop himself.
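
Reduced to its core, the idea is roughly the following (Decl and this 
stripped-down IDecl are hypothetical stand-ins, not the classes from my 
prototype):

```python
class Decl:
    """Hypothetical single declaration that can be decorated."""

    def __init__(self, name):
        self.name = name
        self.ignored = False


class IDecl:
    """Hypothetical wrapper that always holds a list of declarations."""

    def __init__(self, decls):
        self.decls = list(decls)

    def ignore(self):
        # The loop the user would otherwise have to write by hand.
        for decl in self.decls:
            decl.ignored = True
        return self  # returning self allows further chained calls

    def __len__(self):
        return len(self.decls)


one = IDecl([Decl("class1")])
many = IDecl([Decl("class1"), Decl("class2")])

# Same interface regardless of how many declarations were selected.
one.ignore()
many.ignore()
```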

Roman Yakovenko wrote:
> mb = module_builder_t( ... )
> mb.class_ = mb.class_group
> 
> You replace function that return 1 class with function that returns
> many classes. Your code will work without changes.

If the basic idea behind this can be rephrased as "let the user 
customize the API", then I think I can agree, but I'd do it the other 
way. Instead of replacing methods with new methods, I would just allow 
setting options that alter the semantics of the methods a little bit. For 
example, you could provide new default values for arguments (like the 
recursive flag mentioned somewhere below) or you could enable/disable 
the automatic assertion feature that I've mentioned in an earlier mail. 
If I want to reference several classes at once I could disable automatic 
assertion for class queries, whereas if I want to be sure to get exactly 
the class I have specified I would enable automatic assertion with a count 
of 1.
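
To make that concrete, here is a small sketch. The option name 
class_assert_count and the use of full regular expression matches are 
assumptions made for this example only:

```python
import re


class ModuleBuilder:
    """Hypothetical builder holding only class names, for illustration."""

    def __init__(self, class_names):
        self.class_names = list(class_names)
        # Option: number of matches a class query must return.
        # None switches the automatic assertion off.
        self.class_assert_count = 1

    def Class(self, pattern):
        matches = [n for n in self.class_names if re.fullmatch(pattern, n)]
        expected = self.class_assert_count
        if expected is not None and len(matches) != expected:
            raise AssertionError(
                "Class(%r) matched %d declarations, expected %d"
                % (pattern, len(matches), expected)
            )
        return matches


mb = ModuleBuilder(["class1", "class2"])

exact = mb.Class("class1")        # passes: exactly one match

try:
    mb.Class("class.*")           # two matches, so the assertion fires
    caught = False
except AssertionError:
    caught = True

mb.class_assert_count = None      # disable the check for class queries
several = mb.Class("class.*")     # now both classes are returned
```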

Roman Yakovenko wrote:
> Also we can not join between decl_wrapper and multi_decl_wrapper.
> Every declaration
> has set of unique properties like parent or location. Those properties
> will not be in interface of multi_decl_wrapper.

As mentioned above neither Allen's nor my API bases the declaration 
interface on the *types* of the contained declarations. So currently, you 
don't have that anyway (but this hasn't been a problem for me, and 
obviously neither for Allen as the main purpose of the DeclWrapper class 
is to *decorate* the declarations, the selection has been done earlier).

Allen Bierbaum wrote:
> There is one area here though where I am a little worried.  Namely I
> find the way I query only the children of a declaration to be a little
> more structured.
> 
> For example with my method the user would always go about build up
> their module based on the name hierarchy of the module:
> 
> ns = mod.Namespace("test_ns")
> class1 = ns.Class("class1")
> class1_method1 = class1.Method("method1")
> class2 = ns.Class("class2")
> class2_method1 = class2.Method("method1")
> 
> In Matthias's API I believe you could do something where you could ask
> for all methods named "method1" across the entire decl tree.  

Right, you *could*, but you don't *have to*. The above code would work 
in my version just as well with the same semantics, i.e. class1_method1 
would only contain the method of class1 and not the one from class2 
because you called Method() on a previous selection of exactly one 
class. Only if you called Method() on the main namespace (which by 
default also contains all child nodes) or on a class selection that 
references both classes would you get the "method1" methods from both 
classes.

Suppose I modify the above code and add a line like this (assuming your 
version of the API):

classes = ns.Class("class.*")

This would already address both classes and return a MultiDeclWrapper 
object in the above case. This means I couldn't call Method() on them 
to further refine my query. But if the library only had a class1 class 
but no class2 class, the above call would return a DeclWrapper object 
and I would get a different interface where Method() is available. In my 
version I wanted to prevent such cases as I consider this to be somewhat 
inconsistent (you cannot tell what interface the returned object has 
just by looking at the above line. You can only answer that question by 
knowing the contents of the headers that were parsed).

The bottom line is that my main argument for my approach would be the 
same as above. Together with auto assertions my approach doesn't affect 
the way you use your API whereas limiting the flexibility affects the 
way I was creating my wrappers. So again, why not let the users 
decide for themselves which approach suits them best?

Roman Yakovenko wrote:
> But some time it should be possible to say something
> like this: give me all declarations that their names start with QXml or QDom.

That's already possible in both versions by using a regular expression 
(such as QDom.*) on the name.
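
For Roman's example, a single pattern with an alternation covers both 
prefixes. The snippet below only uses Python's re module on a made-up list 
of names; whether the prototypes match the full name or only a prefix is an 
assumption here:

```python
import re

# Made-up declaration names standing in for a parsed declaration tree.
names = ["QDomNode", "QDomElement", "QXmlReader", "QString"]

# One pattern selecting everything that starts with QXml or QDom.
pattern = re.compile(r"(QXml|QDom).*")
selected = [n for n in names if pattern.fullmatch(n)]
```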

Allen Bierbaum wrote:
>> Then I ignore all ()-operators that return a reference to a float or
>> double by the following line:
>>
>> Method("operator()", retval=["float &", "double &"]).ignore()
>>
>> Again, this addresses several classes and several methods at once. There
>> are four filters (and three filter types) involved in this query:
> 
> This is the one I am not so sure about.  I like the idea of being able
> to do this but I am not convinced that it should be default behavior
> to search across the entire declaration tree.

Note however, that by using the global Method() function I more or less 
explicitly stated that I really wanted to search the entire declaration 
tree. If I had wanted to restrict the query to a particular 
class I would have written:

Class("Foo").Method("operator()", retval=["float &", "double &"]).ignore()

> Maybe something like this instead:
> 
> ns = mod.Namespace("test_ns")
> ns.Method("operator()", retval=["float &", "double &"], recursive=True).ignore()
> 
> (notice the explicit request to recursively search).

Well, I could argue that calling Method() on a namespace and explicitly 
setting recursive to True is sort of redundant. ;) But apart from that 
I'm fine with it. (Could we also agree on making the default value for 
recursive customizable? Then it almost feels like home... :)
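
A customizable default could be as simple as this sketch (Node and the 
class attribute default_recursive are hypothetical names, not part of 
either prototype):

```python
class Node:
    """Hypothetical declaration tree node (namespace, class or method)."""

    default_recursive = False  # class-wide default that a user can override

    def __init__(self, name, kind, children=()):
        self.name = name
        self.kind = kind
        self.children = list(children)

    def Method(self, name, recursive=None):
        # None means "use whatever default is currently configured".
        if recursive is None:
            recursive = Node.default_recursive
        found = []
        for child in self.children:
            if child.kind == "method" and child.name == name:
                found.append(child)
            if recursive:
                found.extend(child.Method(name, recursive=True))
        return found


ns = Node("test_ns", "namespace", [
    Node("class1", "class", [Node("method1", "method")]),
    Node("class2", "class", [Node("method1", "method")]),
])

direct = ns.Method("method1")                  # direct children only
deep = ns.Method("method1", recursive=True)    # searches the whole subtree

Node.default_recursive = True                  # change the default once ...
deep_by_default = ns.Method("method1")         # ... and omit the flag afterwards
```

With the default left at False the namespace query finds nothing, because 
both methods live one level down inside the classes.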

Roman Yakovenko wrote:
> Well, I think that for the first version we will implement Matthias's API.
> Using it, is very easy to mimic what you want:
> 
> mb = module_builder_t(...)
> 
> class1 = mb.class_( name="class1" )
> class1_method1 = mb.member_function( name=class1.decl_string + "method1" )

Ah, it seems there is still some confusion about how my version of the 
API is used. Even though I was mainly using the global query functions 
(that act on the main namespace) you are by no means restricted to them. 
Of course, you can use the respective methods on a previously made 
selection, so the last line in the above example would rather look like 
this:

class1_method1 = class1.member_function( name="method1" )

That is, you would call member_function() on the class and not on the 
namespace.

Allen Bierbaum wrote:
> In my personal opinion (and I am highly biased) I would summarize the
> comparison by saying that the prototype I put together may be further
> ahead on features in general but could definitely be helped out with
> more expressiveness of queries.  

I agree with that summary. :)

> If we could come to some agreement
> about how queries should work across the decl tree I would like to add
> to extend my api proposal with the expressiveness of yours.  I could
> build upon many of the ideas from your implementation and I am already
> thinking of places in my wrapper scripts where doing so would help
> simplify my life quite a bit. :)
> 
> Do you think it would be a good idea for me to refine my prototype
> with your query system or should we start over with a new code base
> merging the best ideas?

As our APIs are close enough I don't think we have to start over from 
scratch again. Feel free to take anything you need from my version and 
post any updates as soon as you have them finished so that I can test it 
and maybe even add some stuff. In the meantime, I'll refrain from making 
further changes to my version.

Personally, I think the following items have to be sorted out as quickly 
as possible:

- Where is the main version of the "experimental" API kept? Ideally, 
this should be a cvs/subversion repository that we can all access. I 
guess the only repository that is already there is the pyplusplus 
repository itself. But this would mean Roman would have to reserve an 
area in his repository and give us write access to it. Alternatively, 
I'm fine with keeping the main sources in Allen's hands and sending him 
patches whenever someone actually makes changes to the code (I'd 
recommend announcing such attempts here so that everyone knows what 
everyone else is up to. Maybe this is really the time to start using the 
wiki).

- What is the internal "decoration" API of pyplusplus? Does the patch 
from Allen already contain everything that is needed? Was this part of 
the patch accepted and applied to cvs? Where is this API documented?

- What are the guidelines for writing doc strings and which tool will be 
used to create reference documentation? (I think pyplusplus itself is 
also in dire need of doc strings, and now that I keep looking at the 
sources I could just as well provide some doc strings myself. But for 
this, I need to know what guidelines I have to follow: should it be 
plain text or is it ok to add some markup for a specific tool? And if 
so, which tool? epydoc? doxygen? etc.)

- Matthias -