[Distutils] Indexing modules in Python distributions

Thomas Kluyver thomas at kluyver.me.uk
Tue Feb 7 06:29:30 EST 2017


For a variety of reasons, I would like to build an index of what
modules/packages are contained in which distributions ('packages') on
PyPI. For instance:

- Identifying requirements by static analysis of code: 'import zmq' ->
requires pyzmq
- Finding corresponding packages from different packaging systems: pyzmq
on PyPI corresponds to pyzmq in conda, and python[3]-zmq in Debian
repositories. This is an oversimplification, but importable module names
provide a common basis to compare packages. I'd like a tool that could
pick between different ways of installing a given module.
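
As a rough illustration of the first use case, here is a minimal sketch of
mapping the imports in a piece of source code to distribution names. It uses
the standard library `ast` module. The `MODULE_TO_DISTRIBUTION` dict and both
function names are hypothetical; the dict is a hand-written stand-in for the
index described in this message, not real data from PyPI.

```python
import ast

# Hypothetical, hand-written stand-in for the proposed index:
# importable top-level module name -> distribution name on PyPI.
MODULE_TO_DISTRIBUTION = {"zmq": "pyzmq"}


def imported_top_level_names(source):
    """Return the set of top-level module names imported by Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import, which stays inside the package.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def guess_requirements(source):
    """Return distributions for the imports that appear in the index."""
    return {
        MODULE_TO_DISTRIBUTION[name]
        for name in imported_top_level_names(source)
        if name in MODULE_TO_DISTRIBUTION
    }
```

Names missing from the index (standard library modules, local modules) are
simply ignored here; a real tool would need to decide how to report them.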

People often assume that the import name is the same as the name on
PyPI. This is true in the vast majority of cases, but there's no
requirement that they are the same, and there are cases where they're
not - pyzmq is one example.

The metadata field 'Provides' is, according to PEP 314, intended for
this purpose, but the standard packaging tools don't make it easy to
use, and consequently very few packages specify it.

I have started putting together a tool to index wheels. It reads a .whl
file, finds modules inside it, and tries to identify namespace packages.
It's still quite rough, but it worked with the wheels I tried.
https://github.com/takluyver/wheeldex
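
To give an idea of the approach, here is a simplified sketch of listing the
modules in a wheel. It is not the code from wheeldex: the function names and
the `MODULE_SUFFIXES` tuple are assumptions made for illustration. It skips
namespace package detection and ignores anything under a `.data` directory.

```python
import zipfile
from pathlib import PurePosixPath

# Suffixes treated as importable modules in this sketch. Real extension
# module suffixes are platform-specific, so this is a simplification.
MODULE_SUFFIXES = (".py", ".so", ".pyd")


def modules_from_names(names):
    """Map archive member paths to dotted module and package names."""
    found = set()
    for name in names:
        path = PurePosixPath(name)
        top = path.parts[0] if path.parts else ""
        # Skip metadata and data directories such as foo-1.0.dist-info/
        if top.endswith((".dist-info", ".data")):
            continue
        if not name.endswith(MODULE_SUFFIXES):
            continue
        # Drop everything after the first dot, so both 'mod.py' and
        # 'mod.cpython-36m-x86_64-linux-gnu.so' give 'mod'.
        stem = path.name.split(".")[0]
        parts = list(path.parts[:-1])
        if stem != "__init__":
            parts.append(stem)
        if parts:
            found.add(".".join(parts))
    return found


def modules_in_wheel(wheel):
    """List modules in a .whl file (a path or a binary file object)."""
    with zipfile.ZipFile(wheel) as zf:
        return modules_from_names(zf.namelist())
```

A wheel is a zip archive, so `zipfile` is enough to read the file list
without installing or unpacking anything.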

Is this something that other people are interested in?

One thing I'm trying to work out at the moment is how the data would be
accessed: as a web service that tools can query online, or more like
Linux packaging, where tools download and cache a list to do lookups
locally. Or both? There's also, of course, the question of how the index
would be built and updated.
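
For the download-and-cache option, a lookup could be as small as the sketch
below. Everything in it is an assumption for illustration: the index is
imagined as a JSON object mapping module names to lists of distribution
names, and `fetch` is a caller-supplied function that returns that JSON as
text (for example by downloading it from wherever the index is published).

```python
import json
import time
from pathlib import Path


def load_index(cache_path, fetch, max_age=86400):
    """Return the module index, refreshing the cached copy if it is stale.

    fetch() is only called when the cache is missing or older than
    max_age seconds.
    """
    cache_path = Path(cache_path)
    fresh = (
        cache_path.exists()
        and time.time() - cache_path.stat().st_mtime < max_age
    )
    if not fresh:
        cache_path.write_text(fetch())
    return json.loads(cache_path.read_text())
```

A web service would avoid the download, at the cost of needing a network
connection for every lookup.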

Thanks,
Thomas

