[Distutils] namespace packages

Tarek Ziadé ziade.tarek at gmail.com
Fri Apr 23 12:16:11 CEST 2010


On Fri, Apr 23, 2010 at 12:01 PM, David Cournapeau <cournape at gmail.com> wrote:
> On Fri, Apr 23, 2010 at 5:54 PM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
>
>> I am not sure what you are defining as "complicated". While pkg_resources
>> is hard to read and is a project of its own with many other features,
>> the use case we are talking about here is dead simple:
>>
>>  scan all sys.path entries to look for .egg and .egg-info files/directories.
>
> My knowledge may be lacking here, but doesn't pkg_resources need to
> scan things beyond egg-info (to get namespace_packages.txt, presumably)?

That's a file located in the egg-info directory, which it reads.

>
> Scanning egg/egg-info is easy, but that does not explain most
> additional syscalls caused by pkg_resources import.

Well, it scans directories and opens files, so you have roughly (N * F) + P calls,
where N is the number of packages, F the average number of files opened per package,
and P the number of entries in sys.path.
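
To make the (N * F) + P estimate concrete, here is a minimal sketch of such a scan. It is not the pkg_resources or pkgutil.py code; the function name scan_metadata and the call counter are made up for illustration, and only listdir and open calls are counted (stat-style checks are left out of the tally).

```python
import os
import sys


def scan_metadata(paths=None):
    """Return (found, calls): metadata entries plus a rough filesystem call count.

    Hypothetical helper for illustration only.
    """
    if paths is None:
        paths = sys.path
    found = []
    calls = 0
    for entry in paths:
        if not os.path.isdir(entry):
            continue  # zipped eggs and missing entries are ignored in this sketch
        calls += 1  # one listdir per sys.path entry (the P term)
        for name in sorted(os.listdir(entry)):
            if not name.endswith((".egg", ".egg-info")):
                continue
            base = os.path.join(entry, name)
            # An unpacked .egg directory keeps its metadata under EGG-INFO.
            meta = os.path.join(base, "EGG-INFO") if name.endswith(".egg") else base
            namespaces = []
            ns_file = os.path.join(meta, "namespace_packages.txt")
            if os.path.isfile(ns_file):
                calls += 1  # one open per metadata file read (part of N * F)
                with open(ns_file) as f:
                    namespaces = f.read().split()
            found.append((name, namespaces))
    return found, calls
```

Calling scan_metadata() with no argument walks the current sys.path; here F is at most 1 because only one metadata file is read per project, whereas a real scanner reads more.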

>
>> We have an implementation
>> (http://bitbucket.org/tarek/pep376/src/tip/pkgutil.py) that is being
>> moved to distutils2, and that will probably land in the stdlib if
>> PEP 376 is accepted.
>
> pkgutil.py implements everything needed for setuptools namespace
> packages? From reading the discussions around PEP 382, it seemed much
> more complicated.

No, sorry, I am not talking about namespace packages in particular anymore;
I am talking about the low-level feature of scanning the metadata of
projects found in sys.path.

That's the basis for everything else.

>
>> Hey, I barely did any C since college, I am going to give it a shot
>> and see if it goes faster :)
>
> If pkgutil.py is indeed all that is needed to support setuptools
> namespace packages, I am willing to look at the code to see if it can
> be sped up.

For any feature that needs to scan the metadata of installed packages,
unless there's a central database, we will have to loop over directories.

Now if we consider that everything loaded in sys.path has to be scanned,
you can't have a central database, so you need to read the directories.

> I really doubt that C should be used - if it takes so much
> time for a couple of dozen packages, it is much more likely a design
> problem than an implementation problem.

What's "so much time"? That's pretty vague. Again, knowing the code,
all it does is I/O and string comparisons, so my guess is that C would
help here.

But guessing is not the right way to approach optimization; we need facts.
So if you come back with some profiling information on the use case
where it seems so slow, we can see what is causing this and have a much
more efficient discussion, I guess :)
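
One way to collect that kind of profiling information is with the standard cProfile and pstats modules. This is only a sketch; the helper name profile_call is made up for illustration, and the 15-line limit on the report is an arbitrary choice.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    Hypothetical helper for illustration only.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(15)
    return result, out.getvalue()
```

For the case discussed here, something like profile_call(__import__, "pkg_resources") in a fresh interpreter would show where the import time goes, assuming pkg_resources is installed; in an interpreter that has already imported it, the cached module is returned and the numbers are meaningless.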

Regards
Tarek

-- 
Tarek Ziadé | http://ziade.org

