From brett at python.org Sat Feb 1 19:44:12 2014 From: brett at python.org (Brett Cannon) Date: Sat, 1 Feb 2014 13:44:12 -0500 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files Message-ID:
Over on distutils-sig it came up that getting people to not simply assume that __file__ points to an actual file and thus avoid using open() directly to read intra-package files is an issue. In order to make using a loader's get_data reasonable (let alone set_data), there needs to be a clear specification of how things are expected to work and make sure that everything that people need is available.
The docs for importlib.abc.ResourceLoader.get_data (http://docs.python.org/3.4/library/importlib.html#importlib.abc.ResourceLoader.get_data) say that things are expected to be based off of __file__, and with Python 3.4 using only absolute paths (except for __main__) that means all paths would be absolute by default. As long as people stick to pathlib/os.path and don't use non-standard path separators then this should just work. But what if people don't do that? I honestly say that it should either be explicitly undefined or that it's an IOError. IOW either we say "use absolute paths or else you're on your own" or "use absolute paths, period". That prevents having to make a decision as to whether a relative path is relative to the module the loader is attached to or relative to the package (e.g. easier for per-module loaders or per-package loaders, respectively). The former is more backwards-compatible so I say the docs get updated to say that relative paths are undefined behaviour.
The second issue is whether get_data/set_data are enough or if something else is needed, e.g. a listdir-like method. Since this is meant for handling intra-package data my assumption is that it isn't really necessary as chances are you know what files you included in your distribution (or at least what the possible names are).
I know some have asked for a listdir-like API to help discover what modules are available so as to provide a plugin API, but I view that as a separate thing and potentially more appropriate on finders. Remember, the smaller the API surface for the common case the better for the stdlib.
From brett at python.org Sat Feb 1 20:36:52 2014 From: brett at python.org (Brett Cannon) Date: Sat, 1 Feb 2014 14:36:52 -0500 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID:
On Sat, Feb 1, 2014 at 2:04 PM, Paul Moore wrote: > On 1 February 2014 18:44, Brett Cannon wrote: > > The second issue is whether get_data/set_data are enough or if something > > else is needed, e.g. a listdir-like method. Since this is meant for handling > > intra-package data my assumption is that it isn't really necessary as > > chances are you know what files you included in your distribution (or at > > least what the possible names are). I know some have asked for a > > listdir-like API to help discover what modules are available so as to > > provide a plugin API, but I view that as a separate thing and potentially > > more appropriate on finders. Remember, the smaller the API surface for the > > common case the better for the stdlib. > > One immediate use case for a listdir-type function is virtualenv. > Admittedly, listdir is only half of the problem (and the other half - > the receiving code needs a real file, not a resource - is harder to > address) but it is an example of where the API might be helpful. > > Basically, virtualenv uses a virtualenv_support directory (package - > in that it has an __init__.py) to hold the wheels for pip and > setuptools that are to be loaded into the virtualenv being created.
> While in theory we know the names of those wheels, the problem is that > wheel names encode the version of the package, so rather than have to > update the code every time we look for setuptools*.whl and pip*.whl. > Also, we support users replacing the supplied wheels with newer > versions, so the actual filenames aren't in our control anyway. > > As I say, to actually use the wheels we need to have them in the > filesystem, so there are other issues that would prevent us from > removing the filesystem assumption in the short term. But we couldn't > even start without a listdir API. > > BTW, an unrelated issue is that if we did go down this route with > virtualenv, we'd be looking at having a resource that is the content > of a zipfile that we'd want to put on sys.path. There's no support in > Python for putting in-memory zipfiles on sys.path. We could, and > probably would, dump the data to a temporary file in the first > instance and put that on sys.path, but in the light of this thread, is > putting in-memory zipfiles onto sys.path something that we should be > supporting?

Not quite sure what you are suggesting as an in-memory zipfile vs. one that isn't. Any zipfile on sys.path has to be in-memory to read from to do a load so I don't know where you are drawing the distinction.
From brett at python.org Sat Feb 1 21:37:04 2014 From: brett at python.org (Brett Cannon) Date: Sat, 1 Feb 2014 15:37:04 -0500 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID:
On Feb 1, 2014 3:15 PM, "Paul Moore" wrote: > > On 1 February 2014 19:36, Brett Cannon wrote: > > Not quite sure what you are suggesting as an in-memory zipfile vs. one that > > isn't. Any zipfile on sys.path has to be in-memory to read from to do a load > > so I don't know where you are drawing the distinction.
> > Normally you put a filesystem path to the zipfile onto sys.path. If I > load the zipfile via get_data() it's not got a filesystem path, I have > the raw zip data in memory. To import from it I have to write that > data to a file and put the filename onto sys.path.

Yes, if you wanted to try to import from a zip file from within your package (which pex stopped doing for sanity reasons) then you are on your own =)
From p.f.moore at gmail.com Sat Feb 1 20:04:05 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 1 Feb 2014 19:04:05 +0000 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID:
On 1 February 2014 18:44, Brett Cannon wrote: > The second issue is whether get_data/set_data are enough or if something > else is needed, e.g. a listdir-like method. Since this is meant for handling > intra-package data my assumption is that it isn't really necessary as > chances are you know what files you included in your distribution (or at > least what the possible names are). I know some have asked for a > listdir-like API to help discover what modules are available so as to > provide a plugin API, but I view that as a separate thing and potentially > more appropriate on finders. Remember, the smaller the API surface for the > common case the better for the stdlib.

One immediate use case for a listdir-type function is virtualenv. Admittedly, listdir is only half of the problem (and the other half - the receiving code needs a real file, not a resource - is harder to address) but it is an example of where the API might be helpful.

Basically, virtualenv uses a virtualenv_support directory (package - in that it has an __init__.py) to hold the wheels for pip and setuptools that are to be loaded into the virtualenv being created.
While in theory we know the names of those wheels, the problem is that wheel names encode the version of the package, so rather than have to update the code every time we look for setuptools*.whl and pip*.whl. Also, we support users replacing the supplied wheels with newer versions, so the actual filenames aren't in our control anyway. As I say, to actually use the wheels we need to have them in the filesystem, so there are other issues that would prevent us from removing the filesystem assumption in the short term. But we couldn't even start without a listdir API. BTW, an unrelated issue is that if we did go down this route with virtualenv, we'd be looking at having a resource that is the content of a zipfile that we'd want to put on sys.path. There's no support in Python for putting in-memory zipfiles on sys.path. We could, and probably would, dump the data to a temporary file in the first instance and put that on sys.path, but in the light of this thread, is putting in-memory zipfiles onto sys.path something that we should be supporting? Paul From p.f.moore at gmail.com Sat Feb 1 21:14:54 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 1 Feb 2014 20:14:54 +0000 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID: On 1 February 2014 19:36, Brett Cannon wrote: > Not quite sure what you are suggesting as an in-memory zipfile vs. one that > isn't. Any zipfile on sys.path has to be in-memory to read from to do a load > so I don't know where you are drawing the distinction. Normally you put a filesystem path to the zipfile onto sys.path. If I load the zipfile via get_data() it's not got a filesystem path, I have the raw zip data in memory. To import from it I have to write that data to a file and put the filename onto sys.path. Paul. 
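The workaround Paul describes (raw zip bytes in memory, written out to a real file whose name then goes onto sys.path) can be sketched roughly as follows. This is only an illustration under stated assumptions, not virtualenv's code: `make_zip_bytes()` is a hypothetical stand-in for whatever `loader.get_data()` would return, and the module name `inmem_demo_mod` is made up.

```python
import importlib
import io
import os
import sys
import tempfile
import zipfile


def make_zip_bytes():
    """Hypothetical stand-in for loader.get_data(): build a tiny zip in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("inmem_demo_mod.py", "VALUE = 42\n")
    return buffer.getvalue()


def import_from_zip_bytes(data, module_name):
    """Dump raw zip bytes to a temporary file, put it on sys.path, import from it."""
    handle, zip_path = tempfile.mkstemp(suffix=".zip")
    with os.fdopen(handle, "wb") as stream:
        stream.write(data)
    # The import system only knows how to treat a *filesystem path* to a
    # zip archive as a sys.path entry, hence the temporary file.
    sys.path.insert(0, zip_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name), zip_path


module, zip_path = import_from_zip_bytes(make_zip_bytes(), "inmem_demo_mod")
```

The caller remains responsible for removing the temporary file and the sys.path entry afterwards, which is part of why this approach is unattractive.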
From p.f.moore at gmail.com Sun Feb 2 14:50:24 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 2 Feb 2014 13:50:24 +0000 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID:
On 1 February 2014 18:44, Brett Cannon wrote: > Over on distutils-sig it came up that getting people to not simply assume > that __file__ points to an actual file and thus avoid using open() directly > to read intra-package files is an issue. In order to make using a loader's > get_data reasonable (let alone set_data), there needs to be a clear > specification of how things are expected to work and make sure that > everything that people need is available.

An alternative suggestion - now that Python 3.4 has pathlib, how practical would it be to create a ZipFilePath subclass that acted as a concrete path for files in a zipfile? Getting people to use pathlib for dealing with __file__ rather than the old os.path functions might be an easier sell, and if pathlib handled zipfiles "behind the scenes" we could avoid the whole issue. Paul.
From p.f.moore at gmail.com Sun Feb 2 14:54:41 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 2 Feb 2014 13:54:41 +0000 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID:
BTW, I keep getting moderated on this list. I just checked and I seem to be subscribed. Is there a problem somewhere? Or have I been blocked for some reason?

On 2 February 2014 13:50, Paul Moore wrote: > On 1 February 2014 18:44, Brett Cannon wrote: >> Over on distutils-sig it came up that getting people to not simply assume >> that __file__ points to an actual file and thus avoid using open() directly >> to read intra-package files is an issue.
In order to make using a loader's >> get_data reasonable (let alone set_data), there needs to be a clear >> specification of how things are expected to work and make sure that >> everything that people need is available. > > An alternative suggestion - now that Python 3.4 has pathlib, how > practical would it be to create a ZipFilePath subclass that acted as a > concrete path for files in a zipfile? Getting people to use pathlib > for dealing with __file__ rather than the old os.path functions might > be an easier sell, and if pathlib handled zipfiles "behind the scenes" > we could avoid the whole issue. > > Paul.
From ncoghlan at gmail.com Mon Feb 3 09:47:29 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 3 Feb 2014 18:47:29 +1000 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID:
On 3 Feb 2014 00:05, "Paul Moore" wrote: > > BTW, I keep getting moderated on this list. I just checked and I seem > to be subscribed. Is there a problem somewhere? Or have I been blocked > for some reason?

Maybe someone tried to make you a moderator and set your moderation flag instead? We accidentally did that to Victor on the core mentorship list :P Cheers, Nick.

> > On 2 February 2014 13:50, Paul Moore wrote: > > On 1 February 2014 18:44, Brett Cannon wrote: > >> Over on distutils-sig it came up that getting people to not simply assume > >> that __file__ points to an actual file and thus avoid using open() directly > >> to read intra-package files is an issue. In order to make using a loader's > >> get_data reasonable (let alone set_data), there needs to be a clear > >> specification of how things are expected to work and make sure that > >> everything that people need is available. > > > > An alternative suggestion - now that Python 3.4 has pathlib, how > > practical would it be to create a ZipFilePath subclass that acted as a > > concrete path for files in a zipfile?
Getting people to use pathlib > > for dealing with __file__ rather than the old os.path functions might > > be an easier sell, and if pathlib handled zipfiles "behind the scenes" > > we could avoid the whole issue. > > > > Paul.
From brett at python.org Mon Feb 3 17:54:50 2014 From: brett at python.org (Brett Cannon) Date: Mon, 3 Feb 2014 11:54:50 -0500 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID:
On Sun, Feb 2, 2014 at 8:50 AM, Paul Moore wrote: > On 1 February 2014 18:44, Brett Cannon wrote: > > Over on distutils-sig it came up that getting people to not simply assume > > that __file__ points to an actual file and thus avoid using open() directly > > to read intra-package files is an issue. In order to make using a loader's > > get_data reasonable (let alone set_data), there needs to be a clear > > specification of how things are expected to work and make sure that > > everything that people need is available. > > An alternative suggestion - now that Python 3.4 has pathlib, how > practical would it be to create a ZipFilePath subclass that acted as a > concrete path for files in a zipfile? Getting people to use pathlib > for dealing with __file__ rather than the old os.path functions might > be an easier sell, and if pathlib handled zipfiles "behind the scenes" > we could avoid the whole issue.

Interesting idea, but I don't know how backwards-compatible it would be. But since pathlib itself has several subclasses I don't see why there couldn't be some way to construct something that understood what part of a path was a zipfile and what was a path within the zipfile.
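For readers on newer Pythons, a pathlib-like view of zip members along these lines can be sketched with zipfile.Path, which the standard library gained in Python 3.8, well after this thread. It is not the ZipFilePath subclass proposed above, only an approximation of the idea; the archive contents and names below are made up for illustration.

```python
import io
import zipfile

# Build a small archive in memory so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("pkg/data/config.txt", "answer=42\n")

# zipfile.Path separates "which archive" (root) from "where inside it" (at).
root = zipfile.Path(zipfile.ZipFile(buffer))
resource = root / "pkg" / "data" / "config.txt"

text = resource.read_text()
names = sorted(child.name for child in (root / "pkg" / "data").iterdir())
```

The split between the archive and the member path is the same distinction Brett describes: one part of the path names the zipfile, the rest names a location within it.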
From p.f.moore at gmail.com Mon Feb 3 18:47:16 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 3 Feb 2014 17:47:16 +0000 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID:
On 3 February 2014 16:54, Brett Cannon wrote: > Interesting idea, but I don't know how backwards-compatible it would be. But > since pathlib itself has several subclasses I don't see why there couldn't > be some way to construct something that understood what part of a path was a > zipfile and what was a path within the zipfile.

Yeah, I had a play with it last night and it certainly looks plausible. There's a backport of pathlib on PyPI that supports 2.7, so backwards compatibility is reasonable. Pathlib's internals are pretty impressive - easy to hook into and extend. I have a sort-of proof of concept, but I need to write some tests to find out if the code actually does what I think it does (note to self - start actually doing test-driven coding :-)) Paul
From pje at telecommunity.com Mon Feb 3 21:13:00 2014 From: pje at telecommunity.com (PJ Eby) Date: Mon, 3 Feb 2014 15:13:00 -0500 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID:
On Sat, Feb 1, 2014 at 1:44 PM, Brett Cannon wrote: > Over on distutils-sig it came up that getting people to not simply assume > that __file__ points to an actual file and thus avoid using open() directly > to read intra-package files is an issue. In order to make using a loader's > get_data reasonable (let alone set_data), there needs to be a clear > specification of how things are expected to work and make sure that > everything that people need is available.
> > The docs for importlib.ResourceLoader.get_data > (http://docs.python.org/3.4/library/importlib.html#importlib.abc.ResourceLoader.get_data) > say that things are expected to be based off of __file__, and with Python > 3.4 using only absolute paths (except for __main__) that means all paths > would be absolute by default. As long as people stick to pathlib/os.path and > don't use non-standard path separators then this should just work. Wait, what? How can you define an "absolute path" when __file__ might not be a filesystem path? ISTM this *must* be loader-defined. pkg_resources' loader-to-resources adapter framework specifically abstracts a "generate an appropriate get_data() path" operation (the '_fn()' method of ResourceProvider objects) specifically to handle the possibility that a particular loader class handles this differently. I don't see how this can work properly without a higher-level API for resource management, ala pkg_resources or distlib. This seems to me like a place where an API should be provided, rather than have every program have to keep track of available loader implementations. From barry at python.org Mon Feb 3 22:00:36 2014 From: barry at python.org (Barry Warsaw) Date: Mon, 3 Feb 2014 16:00:36 -0500 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: Message-ID: <20140203160036.34f52855@anarchist.wooz.org> On Feb 01, 2014, at 01:44 PM, Brett Cannon wrote: >Over on distutils-sig it came up that getting people to not simply assume >that __file__ points to an actual file and thus avoid using open() directly >to read intra-package files is an issue. I've always recommended that people use the Resource Manager APIs of pkg_resources to get at in-package data[*]. Those have always been the most reliable APIs AFAICT, but it's a shame that they're not available in the stdlib in any kind of backward compatible way. 
Maybe the breadth or implementation of pkg_resources prevents it from being adopted wholesale into stdlib (and of course, it's too late for 3.4), but I really think we need something like that which we can promote loud and far. And then there's PEP 365. pkgutil.get_data() is as close as the stdlib comes I think, but it's not enough since sometimes you actually need a file name, or some of the other pkg_resources APIs. -Barry [*] Specifically: resource_exists(), resource_stream(), resource_string(), resource_isdir(), resource_listdir(). From ncoghlan at gmail.com Tue Feb 4 16:55:04 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 5 Feb 2014 01:55:04 +1000 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: <20140203160036.34f52855@anarchist.wooz.org> References: <20140203160036.34f52855@anarchist.wooz.org> Message-ID: On 4 February 2014 07:00, Barry Warsaw wrote: > On Feb 01, 2014, at 01:44 PM, Brett Cannon wrote: > >>Over on distutils-sig it came up that getting people to not simply assume >>that __file__ points to an actual file and thus avoid using open() directly >>to read intra-package files is an issue. > > I've always recommended that people use the Resource Manager APIs of > pkg_resources to get at in-package data[*]. Those have always been the most > reliable APIs AFAICT, but it's a shame that they're not available in the > stdlib in any kind of backward compatible way. Maybe the breadth or > implementation of pkg_resources prevents it from being adopted wholesale into > stdlib (and of course, it's too late for 3.4), but I really think we need > something like that which we can promote loud and far. And then there's PEP > 365. The problem with trying to use pkg_resources is that it conflates multiple concepts in a hard to disentangle way, and its import time side effects on sys.path are brutally confusing if you're trying to use it to depend on non-default versions of a package on Fedora. 
You have to get __main__.__requires__ set before pkg_resources is imported, which means you're in a world of pain if you're trying to run inside something like sphinx, gunicorn or nosetests that uses a pkg_resources dependent wrapper script - instead of using the normal CLI for those tools, you instead have to bypass that script to avoid importing pkg_resources too early, and thus you end up with invocation gems like these ones from Beaker: ==== args=[sys.executable, '-c', '__requires__ = ["CherryPy < 3.0"]; import pkg_resources; ' \ 'from gunicorn.app.wsgiapp import run; run()' ... === python -c '__requires__ = ["CherryPy < 3.0"]; import pkg_resources; from nose.core import main; main()' === python -c '__requires__ = [$(SPHINXREQUIRES)]; import pkg_resources; \ ... === We have to do that so we can get our multi-version support requirements into place without the underlying utility choosing the wrong version of key dependencies by default as a side effect of importing pkg_resources to look for the project's entry point. The two core problems from my point of view are that pkg_resources is difficult to comprehend (because so much of it relies on implicit side effects as triggers react to data changes and it has non-trivial side effects on the process global state at import time that may cause failures later) and difficult to refactor (because it's hard to tell what is a guaranteed API and what can be safely changed). There are also a couple of thorny usability bugs that confused even me for a while, and I have a pretty good idea how the import system works: https://bitbucket.org/pypa/setuptools/issue/6/pkg_resources-merrily-adds-site-packages and https://bitbucket.org/pypa/setuptools/issue/2/emit-less-cryptic-error-message-for-a However, once you figure out those arcane workarounds and usability traps (or if you're always using virtual environments and hence never run into them), pkg_resources *works well*. 
It's only if you're trying to use it in a shared distro environment with multi-level constructs that it can cause trouble. I have some ideas on how to fix those issues (see https://bitbucket.org/pypa/import_resources/overview), but it hasn't made it to the top of my todo list in a very long time (and doesn't appear likely to get there any time soon, either).

> pkgutil.get_data() is as close as the stdlib comes I think, but it's not > enough since sometimes you actually need a file name, or some of the other > pkg_resources APIs. > > -Barry > > [*] Specifically: resource_exists(), resource_stream(), resource_string(), > resource_isdir(), resource_listdir().

But unfortunately, you can't even import pkg_resources to get at those without it version locking your entire sys.path. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
From barry at python.org Tue Feb 4 17:28:25 2014 From: barry at python.org (Barry Warsaw) Date: Tue, 4 Feb 2014 11:28:25 -0500 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: <20140203160036.34f52855@anarchist.wooz.org> Message-ID: <20140204112825.67d7e6ac@anarchist.wooz.org>
On Feb 05, 2014, at 01:55 AM, Nick Coghlan wrote: >But unfortunately, you can't even import pkg_resources to get at those >without it version locking your entire sys.path.

Which supports my point, i.e. that the stdlib should provide reasonable implementations of these APIs that we can promote far and wide. But FWIW, I've never run into the pkg_resources problems you're describing. -Barry
From ncoghlan at gmail.com Wed Feb 5 06:13:57 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 5 Feb 2014 15:13:57 +1000 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: <20140204112825.67d7e6ac@anarchist.wooz.org> References: <20140203160036.34f52855@anarchist.wooz.org> <20140204112825.67d7e6ac@anarchist.wooz.org> Message-ID:
On 5 Feb 2014 02:28, "Barry Warsaw" wrote: > > On Feb 05, 2014, at 01:55 AM, Nick Coghlan wrote: > > >But unfortunately, you can't even import pkg_resources to get at those > >without it version locking your entire sys.path. > > Which supports my point, i.e. that the stdlib should provide reasonable > implementations of these APIs that we can promote far and wide. But FWIW, > I've never run into the pkg_resources problems you're describing.

If I hadn't started working on a production RHEL application in a Fedora dev environment, I doubt I would have either :)

Fedora hits it because we use pkg_resources dependent layouts to ship potentially API incompatible versions of Python packages (CherryPy2 v 3, modern Sphinx in EPEL, etc) that target a common system Python install.

The problem is that pkg_resources assumes that either *all* packages are on sys.path by default or none of them are, and doesn't allow requirements to be supplied incrementally, so while this model *does* work, it isn't always pretty and can generate some rather confusing error messages.

The key advantages of a new replacement package for the tasks that pkg_resources handles are being able to improve the handling of this scenario, break up the interface to better handle less-Chandler-like use cases in general, simplify the implementation and decouple it from setuptools.
However, finding the roundtuits to work on it is a serious challenge, especially when pkg_resources isn't generally *broken*, just user-unfriendly in some cases. It also takes a fairly deep knowledge of both packaging and the import system to even attempt to tackle it, so the intersection between "has the required expertise" and "is interested and available" is currently the null set :P Cheers, Nick.
From pje at telecommunity.com Wed Feb 5 20:20:25 2014 From: pje at telecommunity.com (PJ Eby) Date: Wed, 5 Feb 2014 14:20:25 -0500 Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files In-Reply-To: References: <20140203160036.34f52855@anarchist.wooz.org> <20140204112825.67d7e6ac@anarchist.wooz.org> Message-ID:
On Wed, Feb 5, 2014 at 12:13 AM, Nick Coghlan wrote: > > On 5 Feb 2014 02:28, "Barry Warsaw" wrote: >> >> On Feb 05, 2014, at 01:55 AM, Nick Coghlan wrote: >> >> >But unfortunately, you can't even import pkg_resources to get at those >> >without it version locking your entire sys.path. >> >> Which supports my point, i.e. that the stdlib should provide reasonable >> implementations of these APIs that we can promote far and wide. But FWIW, >> I've never run into the pkg_resources problems you're describing. > > If I hadn't started working on a production RHEL application in a Fedora dev > environment, I doubt I would have either :) > > Fedora hits it because we use pkg_resources dependent layouts to ship > potentially API incompatible versions of Python packages (CherryPy2 v 3, > modern Sphinx in EPEL, etc) that target a common system Python install.
> > The problem is that pkg_resources assumes that either *all* packages are on > sys.path by default or none of them are, and doesn't allow requirements to > be supplied incrementally, so while this model *does* work, it isn't always > pretty and can generate some rather confusing error messages. > > The key advantages of a new replacement package for the tasks that > pkg_resources handles are being able to improve the handling of this > scenario, break up the interface to better handle less-Chandler-like use > cases in general, simplify the implementation and decouple it from > setuptools. However, finding the roundtuits to work on it is a serious > challenge, especially when pkg_resources isn't generally *broken*, just > user-unfriendly in some cases. It also takes a fairly deep knowledge of both > packaging and the import system to even attempt to tackle it, so the > intersection between "has the required expertise" and "is interested and > available" is currently the null set :P

I don't think Barry was advocating pkg_resources, but rather, having *some* equally-powerful resource API available in the stdlib.

But the "resources" part of pkg_resources isn't actually that big, nor is it strongly connected to the rest of pkg_resources. The core of it is just:

ResourceManager -- The main API, implements methods for resource_string(), resource_stream(), etc., by delegating to "provider" objects
IResourceProvider -- abstract class that just documents what operations a resource provider has to implement
get_provider() -- a way to find a __loader__ and look up an IResourceProvider implementation for it
get_default_cache() -- a function to return the default cache base directory
ExtractionError -- base exception class for resource extraction problems

Most of the above is ridiculously straightforward code -- a stdlib implementation would mainly rewrite get_provider().
pkg_resources also contains some provider classes, specifically:

NullProvider -- an abstract base that implements IResourceProvider by delegation to some "virtual file system" abstract methods
EggProvider -- handle doing paths relative to parent ".egg" container (could be changed to do wheels)
DefaultProvider -- standard filesystem implementation of virtual file system methods
EmptyProvider -- empty virtual filesystem (no resources)
ZipProvider -- zipfile virtual filesystem, with some egg-specific (could be wheel-specific) extraction features

This is the sum total of the bits of pkg_resources relevant to such an API. The bits that are .egg specific (one method in EggProvider, a few in ZipProvider) could be readily translated to wheels, for the most part. The get_provider() function is the *only* piece of all this that calls into the rest of pkg_resources, and that could be replaced with distlib calls, if anything. (If the stdlib only supported module-relative resources, even that wouldn't be necessary: the API could run directly off of module names instead of project/distribution names.)

It's possible after reviewing these classes and functions, somebody would basically say, "screw this, I'll write my own". Which would actually be reasonable, because there's hardly anything to these classes: they weigh in at maybe 600 lines (including extra blank lines between them) in my last worked-on version, out of nearly 3000 lines in pkg_resources. Many of these classes are 20-40 lines -- ResourceManager and ZipProvider are the only ones that run into hundreds of lines, and in ResourceManager's case it's because of its extensive docstrings. IResourceProvider is pure documentation, since it just documents what methods ResourceManager expects to find.

pkg_resources' resource API is basically just the methods of a ResourceManager: resource_listdir(), resource_string(), etc. It creates a default ResourceManager instance, and then exports its methods as API functions.
It does it this way because it allows an app to create its own manager
with its own cache policies, cleanup, etc., but in the default case the
direct API is fine. Few (maybe no) apps actually make their own
ResourceManager, but it gives them the option of doing so. (One would
simply create a ResourceManager instance (or subclass instance), and
then call its .resource_*() methods instead of the module-level APIs.)

There: now you know almost as much about the pkg_resources resource
management architecture as I do. ;-)

Most of what one would do to port this code to a stdlib module would
be to delete the unused bits, and replace .egg path/name/metadata
conventions with .wheel-appropriate ones. If somebody wants to take a
whack at it, I'll be happy to answer questions.

Really, this stuff is some of the *simplest* code in pkg_resources that
isn't just string parsing code. And it's really old, stable code, in
the sense that it was among the first parts of pkg_resources written,
and least changed since then: nearly all of it has last-change dates in
2005, with most changes since then being minor feature additions
post-Distribute-merge for better error handling, switching away from
using zipimport's file cache for zip directory information, Python
3-support tweaks, .dist-info support, etc.

(Which also means that there are other people who understood it well
enough to make those additions, including Jason, MvL, and Vinay.
There's also a "philip_thiem" who apparently did the
zipimport->ZipFile changeover about a year ago, and who at first
glance appears -- along with Jason -- to have pretty deeply grokked the
hairiest part of the whole thing, i.e. the zipfile extraction code.)

From ncoghlan at gmail.com Wed Feb 5 23:58:23 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 6 Feb 2014 08:58:23 +1000
Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files
In-Reply-To:
References: <20140203160036.34f52855@anarchist.wooz.org> <20140204112825.67d7e6ac@anarchist.wooz.org>
Message-ID:

That's a very fair point - when I dove into pkg_resources I was
interested in the WorkingSet issues affecting Beaker, so my comments
about complexity are better read as referring specifically to
pkg_resources.WorkingSet and the other components related to
multi-version support, rather than the resource access API.

(And in the context of Chandler as an integrated application and there
being no "default" version of packages already on sys.path,
pkg_resources.WorkingSet works fine - problems only arise because
Fedora *does* bless one version as default, puts it directly on
sys.path, and then the first pkg_resources import in an application
locks all of those default versions in as the expected versions if you
don't arrange to set __main__.__requires__ first, which then doesn't
play well with entry point based script wrappers)

So extracting just the resource API to add to pkgutil sounds like a
good idea to me, and should be a lot simpler than trying to tackle
WorkingSet.

Cheers,
Nick.

From waterbug at pangalactic.us Thu Feb 6 00:44:39 2014
From: waterbug at pangalactic.us (Stephen Waterbury)
Date: Wed, 5 Feb 2014 18:44:39 -0500 (EST)
Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files
Message-ID: <20140205234439.85E1BC404EB@pangalactic.us>

On 02/05/2014 05:58 PM, Nick Coghlan wrote:
> So extracting just the resource API to add to pkgutil sounds like a good
> idea ...

API's, yes let's have more of those! (Especially that one ...
;)

Cheers,
Steve

From pje at telecommunity.com Thu Feb 6 18:15:10 2014
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 6 Feb 2014 12:15:10 -0500
Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files
In-Reply-To:
References: <20140203160036.34f52855@anarchist.wooz.org> <20140204112825.67d7e6ac@anarchist.wooz.org>
Message-ID:

On Wed, Feb 5, 2014 at 5:58 PM, Nick Coghlan wrote:
> And in the context of Chandler as an integrated
> application and there being no "default" version of packages already on
> sys.path, pkg_resources.WorkingSet works fine - problems only arise because
> Fedora *does* bless one version as default, puts it directly on sys.path,
> and then the first pkg_resources import in an application locks all of those
> default versions in as the expected versions if you don't arrange to set
> __main__.__requires__ first, which then doesn't play well with entry point
> based script wrappers

Huh? Entry point script wrappers *set* __requires__ as the very first
thing they do, followed by importing pkg_resources. (Alternatively,
if you build the scripts using buildout, they have paths hardcoded.
Either way, no problems with default versions.)

The only way I can see problems is if you *aren't* using entry point
wrappers for your scripts, *and* you want non-default versions.

> So extracting just the resource API to add to pkgutil sounds like a good
> idea to me, and should be a lot simpler than trying to tackle WorkingSet.

Indeed.
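
For context on the pkgutil suggestion: the stdlib already has one
loader-based entry point there, pkgutil.get_data(), which delegates to
the get_data() method of the package's loader. A minimal sketch, assuming
a throwaway zip archive and a hypothetical package name zipped_demo_pkg,
of reading intra-package data when __file__ does not name a real file on
disk:

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway zip holding a package plus one data file.
zip_path = os.path.join(tempfile.mkdtemp(), "bundle.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("zipped_demo_pkg/__init__.py", '"""Demo package."""\n')
    zf.writestr("zipped_demo_pkg/data/greeting.txt", "hello from the zip")

sys.path.insert(0, zip_path)
try:
    # get_data() imports the package and asks its loader for the bytes;
    # the resource path is relative to the package and uses "/".
    payload = pkgutil.get_data("zipped_demo_pkg", "data/greeting.txt")

    import zipped_demo_pkg
    module_file = zipped_demo_pkg.__file__
finally:
    sys.path.remove(zip_path)

print(payload)
# __file__ points inside the archive, so open(__file__-relative path)
# would fail here even though the loader can serve the data.
print(os.path.exists(module_file))
```

Note that get_data() covers only the "read bytes" case; it offers no
listdir-like or extract-to-filesystem operation, which is the part of the
pkg_resources resource API that has no pkgutil counterpart in this sketch.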

From ncoghlan at gmail.com Fri Feb 7 15:59:44 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 8 Feb 2014 00:59:44 +1000
Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files
In-Reply-To:
References: <20140203160036.34f52855@anarchist.wooz.org> <20140204112825.67d7e6ac@anarchist.wooz.org>
Message-ID:

On 7 February 2014 03:15, PJ Eby wrote:
> On Wed, Feb 5, 2014 at 5:58 PM, Nick Coghlan wrote:
>> And in the context of Chandler as an integrated
>> application and there being no "default" version of packages already on
>> sys.path, pkg_resources.WorkingSet works fine - problems only arise because
>> Fedora *does* bless one version as default, puts it directly on sys.path,
>> and then the first pkg_resources import in an application locks all of those
>> default versions in as the expected versions if you don't arrange to set
>> __main__.__requires__ first, which then doesn't play well with entry point
>> based script wrappers
>
> Huh? Entry point script wrappers *set* __requires__ as the very first
> thing they do, followed by importing pkg_resources. (Alternatively,
> if you build the scripts using buildout, they have paths hardcoded.
> Either way, no problems with default versions.)

If only a single layer of software is involved, you're correct, but
try it with something like nosetests, gunicorn or Sphinx: *they* use
entry point wrapper scripts themselves, so by the time execution gets
to the application that wants to set requirements for its
dependencies, it's far too late - the dependencies have all been
locked to their default versions simply by starting
nose/gunicorn/sphinx, and the actual application code doesn't get a
chance to change that.

> The only way I can see problems is if you *aren't* using entry point
> wrappers for your scripts, *and* you want non-default versions.

Correct - the problem is specifically with command line applications
and daemons that are themselves written to use setuptools wrapper
scripts, but then subsequently import application code into the same
process. If that application code needs non-default versions of
various dependencies, then it's necessary to find a way to bypass the
early import of pkg_resources so that you can set
__main__.__requires__ appropriately.

We currently hack around the limitation by using "python -c" to bypass
the normal script wrappers for the affected tools, but it's
tremendously ugly. Most people would probably just give up on the idea
and run inside a virtual environment instead, but we actually want to
get Beaker accepted into Fedora itself eventually, so that means it
needs to play nice with the system Python environment.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From pje at telecommunity.com Fri Feb 7 17:27:48 2014
From: pje at telecommunity.com (PJ Eby)
Date: Fri, 7 Feb 2014 11:27:48 -0500
Subject: [Import-SIG] making it feasible to rely on loaders for reading intra-package data files
In-Reply-To:
References: <20140203160036.34f52855@anarchist.wooz.org> <20140204112825.67d7e6ac@anarchist.wooz.org>
Message-ID:

On Fri, Feb 7, 2014 at 9:59 AM, Nick Coghlan wrote:
> We currently hack around the limitation by using "python -c" to bypass
> the normal script wrappers for the affected tools, but it's
> tremendously ugly.

Can you explain the use cases in more detail? It sort of sounds like
there ought to be a way for pkg_resources to support this better, at
least in conjunction with the code that generates script wrappers.
(Of course, that should maybe be in a different thread.)