[C++-sig] Py++ and caching
Kirill Lapshin
kir at lapshin.net
Wed Sep 27 16:48:40 CEST 2006
Roman Yakovenko wrote:
>> Is there a way to not regenerate source code if cache is valid?
> The short answer is - no.
> Before we proceed I would like to know the time you will save, can you post it?
> Actually few other metrics like size of the project, size of gccxml
> generated file will be good.
The whole process with parsing and generation takes about 40 sec; if
gccxml output is cached, the runtime drops to 15 sec. I don't have
hard numbers on how long it takes to validate the cache, but from my
debugging sessions it seemed to be almost instantaneous. The cache file
is 4.6 MB.
I would love to get rid of these remaining 15 secs if possible.
> This is not a new idea. It has been raised few times on pygccxml development
> mailing list. I don't actually understand how it should work
I think it is hard to make bulletproof, but relatively easy to get 99%
of the way there. The hard part is already done -- you already cache gccxml
results. If we hit the cache, then chances are the generated code
already exists and there is no need to update it. There are only two
reasons why that may not be the case: there are no generated files (the
user has deleted them), or the python code describing what has to be
exported has changed. Arguably the first case is quite rare -- if there is
a cache but no generated files, then the user must have deleted them
manually. If we disregard this case, then all we have to check is whether
the python code has changed since the last run of Py++. So we basically
have to know the name(s) of the python driver script(s) and record
somewhere when Py++ was last run.
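
A minimal sketch of that timestamp check, using only the Python standard library. The names drivers_changed, record_run and the stamp file are hypothetical and not part of Py++; the idea is simply to compare the driver scripts' modification times against a stamp file touched after each successful run.

```python
import os


def drivers_changed(driver_scripts, stamp_path):
    """Return True if any driver script is newer than the stamp file.

    Hypothetical helper, not part of Py++. A missing stamp file means
    we have no record of a previous run, so we report a change.
    """
    if not os.path.exists(stamp_path):
        return True
    last_run = os.path.getmtime(stamp_path)
    return any(os.path.getmtime(p) > last_run for p in driver_scripts)


def record_run(stamp_path):
    """Create or touch the stamp file after a successful generation."""
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, None)  # set mtime to the current time
```

A driver script could call drivers_changed([__file__], "pyplusplus.stamp") right after the cache check and exit early when it returns False, then call record_run at the end of a normal run. The stamp file name here is only an example.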
Accounting for the possibility of tampering with generated files is much
more complicated, and arguably cannot be done reliably for the
case of split files, since without loading gccxml output we can't tell
what the names of the generated files are. We can cover one common case,
all generated files being deleted, by checking whether the main generated
.cpp file is present. Not a bulletproof solution but certainly better
than nothing.
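
The presence check could be as small as the following sketch. The function name and the extra non-empty test are my assumptions; the path to the main .cpp file would have to come from the driver script itself.

```python
import os


def main_output_present(main_cpp_path):
    """Return True if the main generated .cpp exists and is not empty.

    Hypothetical helper: it only detects the "everything was deleted"
    case and says nothing about the other files of a split module.
    """
    return os.path.isfile(main_cpp_path) and os.path.getsize(main_cpp_path) > 0
```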
> and moreover in
> my opinion this functionality should be implemented outside of the code
> generation script. Like make utility, you describe the dependencies and
> it decides what actions should be run.
Well, in some sense what I am asking for is more of a build system concern.
In an ideal world the build system would know that there is no need to run
Py++ at all if nothing has changed. Unfortunately we are not living in an
ideal world, and not all build systems are flexible enough to account for
cases like this.
For example, I was struggling with plugging Py++ into an MSVC
project/solution build. I tried quite a few options, and finally
settled on a dummy project for code generation that runs Py++ in a
post-build step. As a result, code generation starts every time I run a
build, even if I haven't touched anything. Not nice, but the other options
were even uglier. Now you may see why I would like to save the 15 sec.
> I understand that it is not possible to implement such functionality without
> some help from Py++. I think you need a function that will check whether
> header files are different from those Py++ has in cache.
That should be really simple, no? After I instantiate the cache, but before
instantiating module_builder, I can ask the cache whether it is stale or
not. Right?
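
To illustrate what such a staleness query might look like from the outside, here is a standalone sketch that records a hash of each header and compares it on the next run. It does not use the pygccxml cache API at all; headers_stale, save_signatures and the JSON signature file are hypothetical, and the sketch ignores headers that are only included transitively.

```python
import hashlib
import json
import os


def file_signature(path):
    """Return a SHA-1 hex digest of the file contents."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def current_signatures(headers):
    """Map each header's absolute path to its content signature."""
    return {os.path.abspath(h): file_signature(h) for h in headers}


def headers_stale(headers, signature_path):
    """Return True if the recorded signatures are missing or differ."""
    if not os.path.exists(signature_path):
        return True
    with open(signature_path) as f:
        recorded = json.load(f)
    return recorded != current_signatures(headers)


def save_signatures(headers, signature_path):
    """Record the signatures of the headers used for this run."""
    with open(signature_path, "w") as f:
        json.dump(current_signatures(headers), f)
```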
What is needed is some way to quickly determine the names of all output
files. Not sure if that is possible without extending the data saved in the
cache though.
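
One way to get the output names without touching the cache format would be a separate manifest written next to it. This is only a sketch under that assumption: save_manifest and outputs_intact are hypothetical names, and the list of generated files would have to be collected by the driver script after a normal run.

```python
import json
import os


def save_manifest(manifest_path, generated_files):
    """Record the names of all files produced by this run."""
    with open(manifest_path, "w") as f:
        json.dump(sorted(generated_files), f)


def outputs_intact(manifest_path):
    """Return True if every file listed in the manifest still exists.

    A missing or empty manifest counts as "not intact", so the caller
    falls back to a full regeneration.
    """
    if not os.path.exists(manifest_path):
        return False
    with open(manifest_path) as f:
        names = json.load(f)
    return bool(names) and all(os.path.isfile(n) for n in names)
```

With this in place the early-exit test needs only the manifest, so it works for split modules too, at the cost of one more file to keep in sync.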
>
> Py++ has goodies package. This package contains few utility/convenience
> functions that were contributed by other users. Can you create an
> initial version
> of the script and define missing functionality, then we will work
> together to improve it and to add it to the goodies package?
>
Sounds like a good idea. Will give it a try. Though I am a bit tied up
with other stuff, and it is not a high-priority item on my todo list.
So don't expect anything soon, but hopefully I will dig into this
eventually.
Kirill