Recursively listing the contents of a package?

Steven D'Aprano steve at pearwood.info
Fri Dec 25 19:14:43 EST 2015


On Sat, 26 Dec 2015 10:01 am, jeanbigboute at gmail.com wrote:

> As an occasional Python user, I'd like to be able to get for myself a
> high-level overview of a package's capabilities.

Best way to do that is to read the package's documentation on the web.


[...]
> Is there a way to determine if a method/function/correct term has items
> underneath it?
> 
> If such a thing exists, I think I could write the code to descend through
> a package's functions/methods, make a list of nodes and edges, send it to
> networkx, and create a graph of a package's capabilities.

Let's say you want to analyse package aardvark.

First thing is to determine whether it is an actual package, or a single
module, or a "namespace package". Namespace packages are new, and I don't
know anything about them, so you're on your own with them. But for the
others:


import aardvark
if hasattr(aardvark, '__path__'):
    print("package")  # only packages have a __path__ attribute
else:
    print("regular module")
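Why test for __path__ rather than __package__? A plain module *inside* a package has its __package__ set to the parent package's name, so __package__ alone can't tell a package apart from one of its submodules. The language reference defines a package as a module with a __path__ attribute. A quick check, assuming Python 3, with the standard library's json package standing in for aardvark (is_package is just an illustrative name):

    import json
    import json.decoder  # a plain module inside the json package

    def is_package(module):
        """Return True if module is a package (i.e. it has a __path__ attribute)."""
        return hasattr(module, "__path__")

    print(is_package(json))          # True
    print(is_package(json.decoder))  # False
    print(json.decoder.__package__)  # json  (non-empty, yet not a package)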


If it is a regular module, you can use dir() to get a list of names in the
module:


dir(aardvark)
=> returns ['spam', 'eggs', 'Cheese', ...]

For each name, get the actual attribute and inspect what it is:


import inspect
obj = getattr(aardvark, 'spam')
if inspect.isfunction(obj):
    print("it is a function")
elif inspect.isclass(obj):
    print("it is a class")
else:
    print("it is a %s" % type(obj).__name__)
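Putting dir(), getattr() and inspect together, here is a small sketch that classifies every public name in a module. It assumes Python 3. json from the standard library stands in for aardvark, and classify and summarize are hypothetical helper names. Skipping names that start with an underscore is a choice, not a rule, so drop that filter if you want private names too:

    import inspect
    import json  # stand-in for the hypothetical "aardvark" package
    import math

    def classify(obj):
        """Return a short description of what kind of object obj is."""
        if inspect.isclass(obj):
            return "class"
        if inspect.isfunction(obj):        # functions written in Python
            return "function"
        if inspect.isbuiltin(obj):         # functions written in C
            return "built-in function"
        if inspect.ismodule(obj):
            return "module"
        return type(obj).__name__

    def summarize(module):
        """Map each public name in module (per dir()) to a description of its kind."""
        return {
            name: classify(getattr(module, name))
            for name in dir(module)
            if not name.startswith("_")
        }

    for name, kind in sorted(summarize(json).items()):
        print(name, "->", kind)

    print(classify(math.sqrt))  # built-in function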


Or, instead of dir() and getattr(), you can use vars() and get a dict of
{name: object}.


vars(aardvark)
=> returns {'spam': <built-in function spam>,
            'eggs': <function eggs at 0xb7df89c4>,
            'Cheese': <class 'aardvark.Cheese'>,
            ...}
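One wrinkle with both dir() and vars() is that they include names the module merely imported from somewhere else. If you only want what the module itself defines, one heuristic (my assumption, not something vars() does for you) is to compare each object's __module__ with the module's __name__. Sketched with json as the stand-in: json.loads is defined in json itself, while json.JSONDecoder is re-exported from json.decoder. defined_here is a hypothetical name:

    import inspect
    import json  # stand-in for the hypothetical "aardvark" package

    def defined_here(module):
        """Return {name: object} for functions and classes defined in module itself.

        Functions and classes re-exported from other modules are skipped,
        because their __module__ names the module they came from.
        """
        return {
            name: obj
            for name, obj in vars(module).items()
            if (inspect.isfunction(obj) or inspect.isclass(obj))
            and obj.__module__ == module.__name__
        }

    print(sorted(defined_here(json)))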


Do the same for classes:


vars(aardvark.Cheese)
=> returns a dict of methods etc.
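For a class, the plain functions in its vars() are the methods defined in that class body. Inherited methods live in the base classes' dicts instead, which you can reach via __mro__. A sketch, using json.JSONDecoder as a stand-in for aardvark.Cheese, with own_methods as an illustrative name:

    import inspect
    import json

    def own_methods(cls):
        """Names of plain functions defined directly in cls's own body.

        staticmethod and classmethod objects are not plain functions,
        so they are not counted here.
        """
        return sorted(
            name for name, obj in vars(cls).items()
            if inspect.isfunction(obj)
        )

    print(own_methods(json.JSONDecoder))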


You can build a tree showing how classes are related to each other by
looking at the class.__mro__, but I believe that Python already has a
class-browser:

import pyclbr

if you can work out how to get useful results out of it. (Frankly, I suspect
it is rubbish, but maybe I just don't understand what it is for or how to
use it.)
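Since the goal is nodes and edges for networkx, here is a rough sketch of both ideas, assuming Python 3. class_edges (a hypothetical helper) uses each class's __bases__, its direct parents. __mro__ gives the full linearised ancestry, which is more than a tree needs. pyclbr.readmodule works differently: it reads the module's source without importing it, and returns descriptions of each class with super and methods attributes. Using json as the example is only for illustration:

    import inspect
    import json
    import pyclbr

    def class_edges(module):
        """Return (subclass, base) name pairs for classes in module's namespace."""
        edges = []
        for cls in vars(module).values():
            if inspect.isclass(cls):
                for base in cls.__bases__:
                    edges.append((cls.__name__, base.__name__))
        return edges

    print(class_edges(json))
    # If networkx is installed (it isn't in the standard library), an edge
    # list like this can be passed straight to networkx.DiGraph(...).

    # pyclbr parses the source file instead of importing the module.
    classes = pyclbr.readmodule("json.decoder")
    for name, info in classes.items():
        print(name, "methods:", sorted(info.methods))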


If aardvark is a package, it probably contains more than one module, but
its submodules aren't automatically imported when the package is imported
(unless the package's __init__.py imports them itself). That means you
can't reliably introspect them from the package object. Instead, look in
the directory where the package lives to get a list of file names:


import os
assert hasattr(aardvark, '__path__')  # only packages have __path__
directory = aardvark.__path__[0]
os.listdir(directory)
=> returns ['__init__.py', 'spam.py', 'spam.pyc', 'foo.txt', 'subdir', ...]


Or use os.walk to enter into the subdirectories as well.

That gives you a list of modules in the package:

aardvark  # corresponds to directory/__init__.py
aardvark.spam  # directory/spam.py or spam.pyc
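A sketch that walks a package directory with os.walk and turns file names into dotted module names, as above. It is simplified on purpose. It only looks at .py and .pyc files, and it only descends into subdirectories that contain an __init__.py, so namespace packages and Python 3's __pycache__ directories are skipped. json stands in for aardvark, and module_names is a hypothetical helper:

    import os
    import json  # stand-in for the hypothetical "aardvark" package

    MODULE_SUFFIXES = (".py", ".pyc")  # simplified; the real list is longer

    def module_names(package):
        """Return sorted dotted names of the modules found under package's directory."""
        root = package.__path__[0]
        names = set()
        for dirpath, dirnames, filenames in os.walk(root):
            # Only descend into regular subpackages (directories with __init__.py);
            # this also skips __pycache__.
            dirnames[:] = [d for d in dirnames
                           if os.path.exists(os.path.join(dirpath, d, "__init__.py"))]
            rel = os.path.relpath(dirpath, root)
            parts = [package.__name__]
            if rel != os.curdir:
                parts += rel.split(os.sep)
            for filename in filenames:
                stem, ext = os.path.splitext(filename)
                if ext not in MODULE_SUFFIXES:
                    continue
                if stem == "__init__":
                    names.add(".".join(parts))           # the package itself
                else:
                    names.add(".".join(parts + [stem]))  # a module inside it
        return sorted(names)

    print(module_names(json))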


Valid module extensions depend on the version of Python and the operating
system used, but you can expect to see some or all of the following:

.py   # Python source code
.pyc  # pre-compiled byte-code
.pyo  # pre-compiled optimized byte-code (Python 2, and Python 3 before 3.5)
.pyw  # Python source code, Windows only
.so   # extension module (Unix-like systems)
.pyd  # extension module (Windows)
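Rather than hard-coding the list, on Python 3.3 and later you can ask the running interpreter which suffixes it will import, using importlib.machinery.all_suffixes(). It returns the source suffixes, then the bytecode suffixes, then the extension-module suffixes. is_module_file is an illustrative helper:

    import importlib.machinery

    def is_module_file(filename):
        """True if filename ends with a suffix this interpreter can import."""
        return filename.endswith(tuple(importlib.machinery.all_suffixes()))

    print(importlib.machinery.all_suffixes())
    print([f for f in ["__init__.py", "spam.py", "spam.pyc", "foo.txt"]
           if is_module_file(f)])
    # ['__init__.py', 'spam.py', 'spam.pyc'] on Python 3.5+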


Any other file extensions may be data files and can probably be ignored.


One complication: a package may be embedded in a zip file, rather than a
directory. I don't know all the details on that -- it isn't well
documented, but I think it is as simple as taking the usual package tree
and zipping it into a .zip file.
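A different route from os.listdir/os.walk, and one that also copes with zipped packages, is the standard library's pkgutil module. pkgutil.walk_packages asks the import system's finders for the modules under a package's __path__, and pkgutil documents built-in support for both importlib.machinery.FileFinder (ordinary directories) and zipimport.zipimporter (zip archives). One caveat: walk_packages imports each subpackage it finds so it can read that subpackage's __path__, which means package __init__ code gets run. The sketch below assumes Python 3. It builds a throwaway zip holding a hypothetical package zipdemo and lists its contents (list_modules is also a made-up name):

    import os
    import pkgutil
    import sys
    import tempfile
    import zipfile

    def list_modules(package):
        """Return sorted dotted names of all modules and subpackages below package."""
        prefix = package.__name__ + "."
        return sorted(info.name
                      for info in pkgutil.walk_packages(package.__path__, prefix))

    # Build a throwaway zip archive holding a hypothetical package "zipdemo".
    tmpdir = tempfile.mkdtemp()
    archive = os.path.join(tmpdir, "zipdemo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipdemo/__init__.py", "")
        zf.writestr("zipdemo/spam.py", "def eggs(): pass\n")
        zf.writestr("zipdemo/sub/__init__.py", "")
        zf.writestr("zipdemo/sub/ham.py", "")

    sys.path.insert(0, archive)  # zip archives on sys.path are handled by zipimport
    import zipdemo

    print(zipdemo.__path__)        # a path *inside* the .zip file
    print(list_modules(zipdemo))
    # ['zipdemo.spam', 'zipdemo.sub', 'zipdemo.sub.ham']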


I think this more or less gives you the details you need to know to
programmatically analyse a package.



-- 
Steven