[Distutils] distutils data_files and setuptools.pkg_resources are driving me crazy

Robin Bryce robinbryce at gmail.com
Thu Jul 13 21:52:03 CEST 2006


Hi,

[using setuptools 0.6b4]

I'm a setuptools user and greatly appreciative of it as well. I'd like
to understand how to use it more appropriately with respect to
bundling miscellaneous data files. "just put them in python packages"
is really not what I want but perhaps it's time to refactor my tastes
and accept this is the most appropriate thing to do.

I think the root of my confusion lies in how I perceive the 'pkg_'
prefix of the setuptools pkg_resources api. I think of it in the
general sense of the word package - data I have 'packaged' along with
my library or application. Experience suggests it would be better read
in the python specific sense of 'package': data that is contained in a
file that is a descendant of a python package (the file must be in or
under a directory that contains an __init__.py).

Which of these two views most accurately reflects the usage for which
pkg_resources was designed ?

Is it possible to have a separate 'zip_safe' decision for data files
versus python packages? I.e., could a deployed egg with data files and
non zip safe packages appear in site-packages (or wherever) as both
a zip archive for the zip safe data AND a directory tree containing
the 'eager' resources ?

As context for the rest of this plea for help: A trivial layout that
bundles a .conf file::

setuptools-test
    lib/foopackage/__init__.py
    lib/foopackage/foo.py
    conf/foo.conf
    setup.py

versions etc:
Ubuntu(dapper), Python 2.4.3, setuptools-0.6b4. My python is installed
with the prefix:  /home/robin/devel/0root. It is a 'proper' install
rather than virtual-python.py setup.

Is there a specific reason why there isn't a find_data_files to
complement find_packages in setuptools ?

eg., ``data_files=[('/', find_data_files('*.conf'))]`` spells:
recursively find all .conf files, starting in the directory containing
my setup.py, and bundle them in my egg root. I would then expect
``unzip -l foopackage-VER-py2.4.egg`` to produce a tree like::

   foopackage/*.py, *.pyc
   conf/foo.conf
   EGG-INFO/ # usual suspects
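
A minimal sketch of what such a helper might look like. ``find_data_files`` is hypothetical (setuptools does not provide it) and the skipped directory names are assumptions. It returns (directory, files) pairs rather than a flat list, because distutils copies each listed file straight into its target directory and a flat list would lose the ``conf/`` level. It is written for a current Python 3, not the 2.4 used elsewhere in this post.

```python
import fnmatch
import os
import tempfile


def find_data_files(pattern, root="."):
    """Hypothetical helper: group files matching `pattern` by directory.

    Returns a list of (relative_dir, [relative_paths]) tuples, the shape
    that the distutils `data_files` keyword expects.
    """
    skip = ("build", "dist", ".svn")  # assumed names of directories to ignore
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        matches = sorted(fnmatch.filter(filenames, pattern))
        if matches:
            rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")
            found.append((rel_dir, [rel_dir + "/" + name for name in matches]))
    return found


# Demonstration on a throwaway copy of the layout described above.
demo_root = tempfile.mkdtemp()
for rel in ("lib/foopackage/__init__.py", "lib/foopackage/foo.py", "conf/foo.conf"):
    path = os.path.join(demo_root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

demo_result = find_data_files("*.conf", demo_root)
print(demo_result)
```

With this shape the call in setup.py would read ``data_files=find_data_files('*.conf')`` rather than the ``('/', ...)`` spelling above.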

Does/could setuptools overload the distutils keyword 'data_files'
and change its meaning so that it can work with pkg_resources rather
than being --prefix relative ? (package_data, while a useful 2.4
addition, is not what I want here)

In foopackage/foo.py, why are all of::

    pkg_resources(Requirement(__name__), '/conf/foo.conf')
    pkg_resources(Requirement(__name__), 'conf/foo.conf')
    pkg_resources(Requirement('foopackage'), '/conf/foo.conf')

interpreted as relative to the foopackage directory ?

And why does the resource_name '/' not refer to the top of the egg ?

Irrespective of whether I specify a relative or absolute path above
pkg_resources always looks under the top most package directory. Is
this by design ?
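
The behaviour described above is consistent with a lookup that joins the resource name onto the directory of the named module, with a leading '/' contributing only an empty path component. The sketch below models that observation. It is not the pkg_resources implementation, and ``resolve_like_observed`` is a name made up for illustration.

```python
import posixpath


def resolve_like_observed(module_file, resource_name):
    """Model of the observed lookup: resource names hang off the module's directory."""
    base = posixpath.dirname(module_file)
    # Splitting '/conf/foo.conf' yields a leading '' component, which
    # posixpath.join effectively skips, so the leading slash is ignored.
    return posixpath.join(base, *resource_name.split("/"))


module_file = "setuptools-test/lib/foopackage/foo.py"
absolute_style = resolve_like_observed(module_file, "/conf/foo.conf")
relative_style = resolve_like_observed(module_file, "conf/foo.conf")
print(absolute_style)
print(relative_style)
```

Under this model the "absolute" and "relative" spellings cannot differ, which matches what I see.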

Non packaged data files are packaged as siblings of 'foopackage'. What
is the most convenient way to package and access these files such that
the references work for egg installs, normal 'setup.py install'
installations and for ``python setup.py develop`` pseudo installs ?
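
For comparison, the "just put them in python packages" route: when the data lives under the package directory, a package-relative lookup behaves the same whether the code sits in a directory or inside a zip archive. The sketch uses ``pkgutil.get_data`` from the standard library of later Python versions (it is not available in 2.4) instead of pkg_resources. The archive name and file contents are made up.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway zip that mimics an egg with the data *inside* the package.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "foopackage-demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("foopackage/__init__.py", "")
    zf.writestr("foopackage/conf/foo.conf", "[foo]\nanswer = 42\n")

# Make the archive importable, then read the data relative to the package.
sys.path.insert(0, archive)
config_bytes = pkgutil.get_data("foopackage", "conf/foo.conf")
print(config_bytes.decode("ascii"))
```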


Extending my example with the following changes I explore
pkg_resources.resource_string and friends::

file:setup.cfg
[egg_info]
egg_base=./ # because I guessed (incorrectly) that this would help.

file:setup.py::

    from setuptools import setup, find_packages
    setup(
        name='foopackage',
        packages=find_packages('lib'),
        package_dir={'': 'lib'},
        data_files=[('conf', ['conf/foo.conf'])],
        entry_points=dict(console_scripts=[
            'fooconf = foopackage.foo:run']))

file:foopackage/foo.py::

    from pkg_resources import resource_string
    def run():
        print __name__,__file__
        try:
            foo_config=resource_string(__name__,'/conf/foo.conf')
        except IOError, e:
            print str(e)
        else:
            print foo_config
    if __name__=='__main__':
        run()


running in place::

    $python setuptools-test/lib/foopackage/foo.py
    __main__ setuptools-test/lib/foopackage/foo.py
    [Errno 20] Not no such file or directory:
'setuptools-test/lib/foopackage/conf/foo.conf'

This is (almost) what I'd expect, I have not run setup.py yet so
setuptools/pkg_resources has no way of knowing anything about my
weirdo preferences. Given that setuptools has not had a chance to see
my egg_base setting, I would expect '/' to mean the directory
*containing* the top most package inferable from __file__. So I would
have expected the path in the error to be
'setuptools-test/lib/conf/foo.conf'. But I don't care so much about
the 'pre setup.py' scenario.

Make an egg::

    $python setup.py bdist_egg --keep-temp
    <snip>
    copying conf/foo.conf -> build/bdist.linux-i686/egg/conf
    creating 'dist/foopackage-0.0.0-py2.4.egg' and adding
    'build/bdist.linux-i686/egg' to it
    $ls build/bdist.linux-i686/egg
    conf EGG-INFO foopackage
    $unzip -l dist/foopackage-0.0.0-py2.4.egg
    # paraphrasing the output
    foopackage/ *.py *.pyc
    conf/foo.conf
    EGG-INFO/

Woot! Exactly what I had hoped for.

Install the package using develop mode (note the explicit egg_base
option above)::

    First, manually clean up site-packages just to be sure
    (rm easy-install.pth; rm foopackage*).

    $cd setuptools-test
    $python setup.py develop
    <snip>
    Installing fooconf script to /home/robin/devel/0root/bin
    $cd ..
    $fooconf
    Traceback (most recent call last):
    <snip>
    ImportError: No module named foopackage.foo
    $cat $PYSITE/foopackage.egg-link
    /home/robin/devel/setuptools-test
    $cat $PYSITE/easy-install.pth
    import sys; sys.__plen = len(sys.path)
    /home/robin/devel/setuptools-test
    <snip - it's a fresh easy-install.pth file>

Rats. It seems like egg_base is taken both as the place to put my
.egg-info directory AND as the means of deciding what should be placed
on sys.path in order for my package to be importable.
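
One way to check this reading is to ask which path entries actually contain the package. The helper below is a hypothetical diagnostic (not part of setuptools), run against a throwaway copy of the layout: the source root, which is what ended up in easy-install.pth, does not provide foopackage, while its lib/ subdirectory does.

```python
import os
import tempfile


def entries_providing(package, path_entries):
    """Return the path entries that directly contain `package`/__init__.py."""
    return [
        entry for entry in path_entries
        if os.path.isfile(os.path.join(entry, package, "__init__.py"))
    ]


# Recreate the source layout: <root>/lib/foopackage/__init__.py
source_root = tempfile.mkdtemp()
lib_dir = os.path.join(source_root, "lib")
os.makedirs(os.path.join(lib_dir, "foopackage"))
open(os.path.join(lib_dir, "foopackage", "__init__.py"), "w").close()

from_root = entries_providing("foopackage", [source_root])
from_lib = entries_providing("foopackage", [source_root, lib_dir])
print(from_root, from_lib)
```

On a real installation, ``entries_providing('foopackage', sys.path)`` would run the same check against the live path.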

Remembering the package_dir option I reach for the distutils docs. But
a variety of packages=[...], package_dir={...} combinations have no
effect on the easy-install.pth.

Double rats.

Is there a way to have easy-install.pth, in the develop case, get
entries of the form:

'/path/to/source-root/my/python/packages/live/here'
 ^------------------^ this is the bit we already have,
                      assuming egg_base=./ (which it is not by default)

AND an independent way of directing pkg_resources to where my data
files are rooted ??

I look at the egg install case::

    delete dist & build & do python setup.py bdist_egg, delete
    easy-install.pth and *.link files because I've been fiddling.

    $easy_install dist/foopackage-0.0.0-py2.4.egg
    Processing foopackage-0.0.0-py2.4.egg
    Copying foopackage-0.0.0-py2.4.egg to
/home/robin/devel/0root/lib/python2.4/site-packages
    Adding foopackage 0.0.0 to easy-install.pth file

    Installed /home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg
    Processing dependencies for foopackage==0.0.0

Erm what happened to my script ? (using setuptools 0.6b4). Quoting the
easy_install docs:

"Whenever you install, upgrade, or change versions of a package,
EasyInstall automatically installs the scripts for the selected
package version, unless you tell it not to"

I fire up python::

    $pwd
    /home/robin/devel/setuptools-test
    $cd ..
    $which python
    $/home/robin/devel/0root/bin/python
    $python
    >>>from foopackage.foo import run
    >>>run()
    foopackage.foo
/home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg/foopackage/foo.pyc
    [Errno 2] No such file or directory:
'/home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg/foopackage/conf/foo.conf'
    Ctrl-D
    $ls 0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg/conf/
    foo.conf
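
As a stopgap for unzipped eggs like this one, the egg root can be derived by walking up from the package directory, one level per component of the dotted package name. This is only a sketch: ``egg_root_resource`` is a made-up name, and the approach cannot work when the egg is installed as a zip file.

```python
import posixpath


def egg_root_resource(package_dir, package_name, resource_name):
    """Resolve `resource_name` against the directory that contains the top-level package."""
    root = package_dir
    # 'foopackage' is one level below the egg root, 'a.b' would be two, and so on.
    for _ in package_name.split("."):
        root = posixpath.dirname(root)
    return posixpath.join(root, *resource_name.strip("/").split("/"))


package_dir = (
    "/home/robin/devel/0root/lib/python2.4/site-packages/"
    "foopackage-0.0.0-py2.4.egg/foopackage"
)
conf_path = egg_root_resource(package_dir, "foopackage", "/conf/foo.conf")
print(conf_path)
```

In foo.py the package directory would come from ``os.path.dirname(__file__)``, and real code would use ``os.path`` rather than ``posixpath``, which is used here only to keep the demonstration platform independent.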

(extreme gnashing of teeth, followed by a visit to the refrigerator so
I can throw some eggs at the wall)

I give up on egg_base, delete my setup.cfg, manually clean up my
site-packages directory and delete my dist & build trees.

I create a new egg, this time without the egg_base option::

    $cd setuptools-test
    $python setup.py bdist_egg
    $unzip -l dist/foopackage-0.0.0-py2.4.egg
    # paraphrasing the output
    foopackage/ *.py *.pyc
    conf/foo.conf
    EGG-INFO/

Again, exactly what I want, and it shows that egg_base has *no* effect on
the internal layout of the egg. Let's install it::

    $easy_install dist/foopackage-0.0.0-py2.4.egg
    Processing foopackage-0.0.0-py2.4.egg
    creating /home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg
    Extracting foopackage-0.0.0-py2.4.egg to
/home/robin/devel/0root/lib/python2.4/site-packages
    Adding foopackage 0.0.0 to easy-install.pth file
    Installing fooconf script to /home/robin/devel/0root/bin

    Installed /home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg
    Processing dependencies for foopackage==0.0.0

What the frak, this time I get my script. What on earth does egg_base
have to do with script generation ?

I try it out::

   $fooconf
   foopackage.foo PYSITE/foopackage-0.0.0-py2.4.egg/foopackage/foo.pyc
    [Errno 2] No such file or directory:
'PYSITE/foopackage-0.0.0-py2.4.egg/foopackage/conf/foo.conf'

Curses.

<gripe>The fact that distutils does not include files specified in
MANIFEST.in for anything other than the sdist command (source
distributions) is really tedious (and horribly confusing when first
encountered).</gripe>

(minor nit) the include_package_data option is well named but poorly described:
"Accept all data files and directories matched by MANIFEST.in or found
in source control".

This is a lie. Only those files that are descendants of a python
package directory (a directory that has __init__.py) are considered.

"Accept all python package data files" would reduce confusion and
pointless optimism. At least for those that know just enough to get
themselves into trouble (like me).

I would very much prefer the machinery for including data files
in a package to be orthogonal to the source building/packaging
machinery.

I think there is a compelling argument that says complex data should
be explicitly packaged separately. I.e., if foopackage had non trivial
data then I, as the package author, should create and distribute
foopackage.egg and foopackage.data.egg as separate things.
foopackage.egg would require foopackage.data and would, as an
additional benefit, be free to use existing setuptools machinery to
separate data versions from package versions. In fact, to argue
completely against the thrust of this mail I'm now thinking along the
lines of:

- *never* package data in the same egg as the application or library
- *always* create a separate foopackage-data package, even if it has
no python source in it beyond setup.py and even if the data is
trivial.
- use the optional dependencies mechanism to pull data in as needed.

Anyhow. That's got that lot off my chest. I have no intention of giving
up on setuptools, it is *far* too useful for that. I do want to hear
from distutils folks that could help straighten me out :-)

Cheers,

Robin

