[Doc-SIG] Building Python Document 30% faster.

Naoki INADA inada-n at klab.jp
Sat Apr 4 18:03:25 CEST 2009


Hi Georg.

>> The attached patches make building the documentation about 30% faster.
>> (In my environment, roughly 330 sec -> 220 sec.)
>>
>> I posted sphinx.patch to Bitbucket, but I don't know where to post docutils.patch.
>> Could anyone review these patches?
>
> I will, when I have a bit more time.

Thank you.

>> But the searchindex.js built with PyStemmer differs from the one built with PorterStemmer.
>
> This could be a problem.  The client-side search implemented in JavaScript
> uses exactly the same stemmer (which is necessary to be able to find all
> words).  In short, if you can find a C implementation of the Porter stemmer
> we could include it in Sphinx as an optional extension.

I see.
The original Porter stemmer is here:
http://tartarus.org/~martin/PorterStemmer/

And it is implemented in C. I'll try to make a Python wrapper with SWIG and
compare the resulting searchindex.js. Please wait a while.


>> 2. Avoid building OptionParser many times.
>> Sphinx calls docutils.core.publish_parts() many times without the
>> `settings` argument.
>> Each call builds a new docutils.frontend.OptionParser, and this consumes
>> 29 seconds.
>>
>> 3. Avoid building NestedStateMachine many times.
>> NestedStateMachine is built and destroyed many times.
>> Recycling that state machine gives a significant performance gain.
>
> I assume that both of these are in the second commit I see on bitbucket?  Both
> look like a worthy optimization.

The former (item 2) is on Bitbucket:
http://bitbucket.org/methane/sphinx-speedup/changeset/72fa0ceefcae/

The latter (item 3) is not on Bitbucket, because NestedStateMachine is part
of docutils, not Sphinx.

-- 
Naoki INADA  <inada-n at klab.jp>
   KLab Inc.  <http://www.klab.jp>

