[C++-sig] Py++ and caching

Kirill Lapshin kir at lapshin.net
Wed Sep 27 16:48:40 CEST 2006


Roman Yakovenko wrote:

>> Is there a way to not regenerate source code if cache is valid?
> The short answer is - no.
> Before we proceed I would like to know the time you will save, can you post it?
> Actually few other metrics like size of the project, size of gccxml
> generated file
> will be good.

The whole process with parsing and generation takes about 40 sec; if the 
gccxml output is cached, the runtime is reduced to 15 sec. I don't have 
hard numbers on how long it takes to validate the cache, but from my 
debugging sessions it seemed to be almost instantaneous. The cache file 
is 4.6 MB.

I would love to get rid of these remaining 15 secs if possible.


> This is not a new idea. It has been raised few times on pygccxml development
> mailing list. I don't actually understand how it should work

I think it is hard to make bulletproof, but relatively easy to get 99% 
of the way there. The hard part is already done -- you already cache 
gccxml results. If we hit the cache, then chances are the generated code 
already exists and there is no need to update it. There are only two 
reasons why that may not be the case: there are no generated files (the 
user has deleted them), or the Python code describing what has to be 
exported has changed. Arguably the first case is quite rare -- if there 
is a cache but no generated files, then the user must have deleted them 
manually. If we disregard this case, then all we have to check is whether 
the Python code has changed since the last run of Py++. So we basically 
have to know the name(s) of the Python driver script(s) and record 
somewhere when Py++ was last run.

Accounting for the possibility of tampering with generated files is much 
more complicated and arguably cannot be done reliably for the case of 
split files, since without loading the gccxml output we can't tell what 
the names of the generated files are. We can cover the one common case of 
all generated files being deleted by checking whether the main generated 
.cpp file is present. Not a bulletproof solution, but certainly better 
than nothing.
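
A minimal sketch of the check described above, using only the Python 
standard library. Nothing here is Py++ API: needs_regeneration, 
record_run and the stamp file are hypothetical names, and the stamp is 
just an empty file whose modification time records the last run.

```python
import os

def needs_regeneration(driver_scripts, stamp_path, main_cpp):
    """Return True if Py++ should be run again (hypothetical helper)."""
    # Covers the common case of all generated files having been deleted.
    if not os.path.exists(main_cpp):
        return True
    # Without a stamp there is no record of a previous run.
    if not os.path.exists(stamp_path):
        return True
    last_run = os.path.getmtime(stamp_path)
    # Any driver script modified after the last run invalidates the output.
    return any(os.path.getmtime(s) > last_run for s in driver_scripts)

def record_run(stamp_path):
    """Touch the stamp file after a successful generation run."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)  # set the modification time to now
```

The driver script would call needs_regeneration() first, exit early if 
it returns False, and call record_run() only after generation succeeds. 
Changes to the headers themselves are not covered here; that part is 
left to the existing cache validation.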


> and more over in
> my opinion this functionality should be implemented outside of the code
> generation script. Like make utility, you describe the dependencies and
> it decide what actions should be run.

Well, in some sense what I am asking for belongs more to the build system 
domain. In an ideal world the build system would know that there is no 
need to run Py++ at all if nothing has changed. Unfortunately we are not 
living in an ideal world, and not all build systems are flexible enough 
to account for cases like this.

For example, I was struggling with plugging Py++ into an MSVC 
project/solution build. I tried quite a few options, and finally 
settled on a dummy project for code generation that runs Py++ in a 
post-build step. As a result, code generation is started every time I run 
a build, even if I haven't touched anything. Not nice, but the other 
options were even uglier. Now you may see why I would like to save the 
15 sec.

> I understand that it is not possible to implement such functionality without
> some help from Py++. I think you need a function that will check whether
> header files are different from those Py++ has in cache.

That should be really simple, no? After I instantiate the cache, but 
before instantiating module_builder, I can ask the cache whether it is 
stale or not. Right?
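
To illustrate the kind of staleness question meant here, a rough sketch 
that compares content hashes of header files against signatures recorded 
at the last run. This is a stand-in written for illustration, not the 
pygccxml cache interface; file_signature, cache_is_stale and the 
"recorded" dictionary are assumed names.

```python
import hashlib

def file_signature(path):
    """Hash a file's contents so changes are detected regardless of mtime."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def cache_is_stale(recorded):
    """recorded maps header path -> signature stored at the last run."""
    for path, old_sig in recorded.items():
        try:
            if file_signature(path) != old_sig:
                return True
        except OSError:
            # A header that has vanished also invalidates the cache.
            return True
    return False
```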

What is needed is some way to quickly determine the names of all output 
files. Not sure if it is possible without extending the data saved in the 
cache though.
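
One way to picture such an extension: next to the cache, keep a small 
manifest listing every file written by the last run, so their presence 
can be checked without loading the gccxml output. This is only a sketch 
under assumptions; the manifest file, its JSON layout and both function 
names are hypothetical and do not exist in Py++ or pygccxml.

```python
import json
import os

def save_manifest(manifest_path, output_files):
    """Record the names of all generated files after a run (hypothetical)."""
    with open(manifest_path, "w") as f:
        json.dump(sorted(output_files), f)

def outputs_intact(manifest_path):
    """Return True only if a manifest exists and every listed file is present."""
    if not os.path.exists(manifest_path):
        return False
    with open(manifest_path) as f:
        names = json.load(f)
    return all(os.path.exists(name) for name in names)
```

This detects deleted files in the split-files case, but not files that 
were edited by hand; catching those would need stored hashes as well.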

> 
> Py++ has goodies package. This package contains few utilities\convenience
> function that were contributed by other users.  Can you create an
> initial version
> of the script and define missing functionality, than we will work
> together to improve it and to add it to the goodies package?
> 

Sounds like a good idea. Will give it a try. Though I am a bit tied up 
with other stuff, and it is not a high-priority thing on my todo list. 
So don't expect anything soon, but hopefully I will dig into this 
eventually.


Kirill



