[Distutils] buildout: several fold performance increases

Ross Patterson me at rpatterson.net
Sat Jan 21 13:59:41 CET 2012


Sorry, I spoke a little too soon.  I generated and normalized diffs of
the test output (ugh!) and found some differences that were specific to
my env caching changes.  I've resolved all those differences on my
branch.  The fix mostly involves clearing the appropriate cached envs
when using easy_install.develop or easy_install.build.

Ross

Ross Patterson <me at rpatterson.net> writes:

> I moved this patch to a branch of zc.buildout:
>
> svn+ssh://svn.zope.org/repos/main/zc.buildout/branches/env-cache
>
> Digging into one of the additional failures, I traced the problem to
> develop eggs.  Since zc.buildout.easy_install.develop() can change what
> eggs and versions a pkg_resources.Environment would find, the cached
> environments were out of date after buildout processed develop eggs.  I
> addressed this by finding any cached environments whose paths include
> the path the develop egg is installed into and having those environments
> rescan that path.  With that fix, the zc.buildout tests show no
> failures beyond those that already occur on my system without the
> patch (with an isolated Python built from source).
>
> I also compared buildout run times (using the 'time' command, not
> cProfile) on a real world buildout with 6 identical parts and a few
> other parts with very similar distribution requirements.  The time
> without the patches was 1m27.513s and with the patches as applied to
> zc.buildout 1.4.4 it was 0m34.674s.  Also, buildout.dumppickedversions
> suffers from the same logging hot spot that I fixed in zc.buildout
> r122980; before I patched it, the buildout run time was 2m13s.  Can
> someone cut a release of buildout.dumppickedversions?
>
> All told, this is roughly a 4-fold real-world improvement (2m13s down
> to about 35s).  With all those hot spots addressed, there are no
> obvious wastes I can find in the profiling data.  I'd like to merge
> this into trunk and the 1.4 branch and see releases cut of both 1.5
> and 1.4.  May I begin the merging?
>
> Ross
>
> Ross Patterson <me at rpatterson.net> writes:
>
>> I've long been perplexed by how long a buildout takes to run with
>> multiple parts whose required distributions are largely similar.  Taking
>> a stab at it, I found two hot spots that yield several fold improvements
>> in performance.
>>
>> First, zc.buildout.easy_install._log_requirements was doing expensive
>> requirements parsing and sorting even when no message would be logged.
>> I committed a fix that, on a 10-part buildout with a large "eggs"
>> option for each part, decreased the update time under cProfile from
>> 93 seconds to 15 seconds:
>>
>> http://svn.zope.org/zc.buildout/trunk/src/zc/buildout/easy_install.py?rev=124059&r1=122980&r2=124059
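
The general shape of such a fix can be sketched as below. This is an
illustration only, not the committed change: the logger name and both
function names are hypothetical, and format_requirements stands in for
the expensive parsing and sorting.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("example.easy_install")


def format_requirements(requirements):
    """Stand-in for the expensive parse-and-sort step."""
    return ", ".join(sorted(str(req) for req in requirements))


def log_requirements(prefix, requirements):
    """Do the expensive formatting only if a DEBUG record would be emitted.

    Returns True when a message was logged, False when it was skipped.
    """
    if not logger.isEnabledFor(logging.DEBUG):
        return False
    logger.debug("%s: %s", prefix, format_requirements(requirements))
    return True
```

The isEnabledFor check is cheap, so a run at the default log level skips
the sorting and string building entirely.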
>>
>> Secondly, instantiating pkg_resources.Environment, including the
>> setuptools.package_index.PackageIndex subclass, is very expensive and
>> was being done multiple times for any given part, and was being done for
>> parts whose environments were identical.  There was some existing global
>> caching for package indexes that I've duplicated for environments in the
>> attached patch.
>>
>> Unfortunately, I haven't been able to get a clean test environment for
>> the life of me.  I'm using a clean Python 2.7 build from source, turning
>> everything in ~/.buildout/default.cfg off, and running tests in a clean
>> checkout of the zc.buildout/trunk buildout.  Even under those conditions
>> I get 17 failing tests before any changes.  With this environment
>> cache, I see 41 failures, but I can't make sense of them.  This patch
>> yields another 2-3 fold decrease to 6 seconds for the same buildout and
>> is driven by profiling data, not guessing.  Can someone help me get this
>> patch in?
>>
>> Finally, it would be great to see releases of zc.buildout with these
>> performance improvements get out in the world.  I've been hearing more
>> and more complaints about buildout run times and these are easy fixes.
>> If we can get the second, attached patch in quickly, then I'd say we
>> should release with both.  If not, then it's still worth it to cut a
>> release for the first, already committed patch, which yields the
>> greatest improvement.
>>
>> Thanks!
>> Ross