[Python-Dev] PEP 376 and PEP 302 - allowing import hooks to provide distribution metadata

Paul Moore p.f.moore at gmail.com
Sat Jul 4 16:31:40 CEST 2009


2009/7/4 Paul Moore <p.f.moore at gmail.com>:
> 2009/7/3 Tarek Ziadé <ziade.tarek at gmail.com>:
>> You can give me a bitbucket account so I can give you write access to the repo,
>> There are tests as long as you install Nose.
>
> How do I get the tests to work? Just running nosetests gives an error
> (probably because pkgutil is being imported from the stdlib, rather
> than from this directory).
>
> If I set PYTHONPATH=. then I get errors. I suspect path normalisation
> (for backslashes) in the zipfile handling.

Actually, the test

    assert_equals(list(dist.get_egginfo_files(local=True)),
                  [os.path.join(SITE_PKG, 'mercurial-1.0.1.egg-info/PKG_INFO'),
                   os.path.join(SITE_PKG, 'mercurial-1.0.1.egg-info/RECORD')])

is broken, because the expected value uses slashes, which are *not*
the local separator on win32.

I've attached a patch.

But there's 2 comments I'd make (one minor, one major)

Minor one: The tests often seem to be exercising the internal classes,
not so much the public API, so many of them will probably not be of
much use to me :-(

Major one: I'm not entirely sure what the purpose of the "local" flag
is. With local=True, you are supposed to get a filesystem path in the
local form. But that makes no sense for non-filesystem based loaders
(e.g. zipfiles). So how is a user expected to use the value? With
local=false, you get slash-separated files (which is good, because
it's consistent) which are either absolute (see my earlier comments
about how broken this is in practice) or relative to the "directory"
containing the egg-info file (where this is of course not necessarily
a real filesystem location). But these aren't useful as filesystem
names, because they aren't local format.

So what is the point of the get_installed_files, uses, and
get_egginfo_files functions? Who will use the returned values, and
what for? Where will a user get a value to pass to uses()? Given that
get_egginfo_file takes names relative to the egginfo dir, shouldn't
get_egginfo_files *return* values relative to the same location, so
that they can be used there?

I guess the same type of argument applies with get_inetalled_files()
and uses() - the values returned by get_installed_files can be seen as
a set of opaque tokens (OK, realistically they will be "filenames" in
some sense) which can be passed to uses() - but if they aren't
absolute filesystem paths, you can't compare them. Example:

    distribution a contains file p.py
    distribution b contains file p.py

pkgutil.get_distribution(a).get_installed_files contains 'p.py'
pkgutil.get_distribution(b).uses('p.py') is True

What does this mean? If the 2 distributions are in the same sys.path
element, there's a clash. If not, is there a clash? What if they are
in 2 different directories? What if they are in the same directory,
but it's repeated under different names (e.g. symlinks) on sys.path?
What if one is in a directory and the other in a zip file?

This isn't even related to non-filesystem loaders. It's a fundamental
issue with the whole API. What you seem to want is actually a
canonical "name" for a file object - for filesystem files,
os.path.normcase(os.path.normpath(filename)) is probably good enough,
although it doesn't deal with symlinks. For other types of loader, you
have to rely on the loader itself - loader.get_fullname() is probably
OK. But even then, it's not clear how user code would actually *use*
these APIs.

I think you need some real-world use cases, with actual sample
(pseudo-)code, to validate the design here. As things stand, it's both
confusing and (I suspect) unusable in practice. Sorry, I know that
sounds negative, but if this isn't to be a source of subtle bugs for
years to come, it needs to be clarified now. PEP 302 is still hitting
this type of issue - runpy and importlib have brought out errors and
holes in the protocol quite recently - even though Just and I went to
great lengths to try to tease out hidden assumptions up front.

Concrete proposal:

get_metadata_files() - returns slash-separated names, relative to the
egginfo dir
get_metadata_file(path) - path must be slash-separated, relative to
the egginfo dir

get_installed_files - returns the contents of RECORD unaltered
uses(path) - checks if path is in RECORD

The latter 2 are not very useful in practice - you can't say anything
about entries in different RECORD files, which is likely the real use
case you want. Maybe RECORD could have an extra "Location" entry,
which determines where it exists globally (this would be the directory
to which the filenames were relative, in the case of filesystem-based
distributions) and RECORD entries are comparable if the Location
values in the 2 RECORD files match. That's a lot more complex - but
depending on what use people expect to make of these 2 APIs, it may be
justified.

Paul.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-win32.patch
Type: text/x-patch
Size: 666 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090704/b82f38ce/attachment-0001.bin>


More information about the Python-Dev mailing list