From ncoghlan at gmail.com  Wed Nov  2 01:41:31 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Nov 2011 10:41:31 +1000
Subject: [Import-SIG] PEP 382 update
In-Reply-To: <20111019221650.2f54058d@resist.wooz.org>
References: <4E9F34C8.6040308@v.loewis.de> <20111019221650.2f54058d@resist.wooz.org>
Message-ID:

On Thu, Oct 20, 2011 at 12:16 PM, Barry Warsaw wrote:
> I vaguely recall that something similar has been discussed on the mailing list
> before, but that there were problems with directory name markers. I could be
> misremembering, and will try to find details in my archives.

My recollection is similar, but I think the latest version may address
those objections by allowing the existing "package/__init__.py"
convention to still indicate a package directory *as well as* the new
"package.pyp" convention.

> Eric did remark last night that while PEP 402 is broader in scope, and *seems*
> useful, we really don't know what it will break. Still, we need to get this
> feature moving again.

However, what if PEP 402 was *also* rewritten to only look at
directories named "package.pyp" rather than "package" when building
the virtual package paths? It still has a more coherent story for
handling namespace package initialisation than PEP 382, *without*
slowing down existing module and package imports.

In the latest PEP 382 update, the PEP proposes that finding a
package/__init__.py file *not stop the sys.path scan* (it's actually
inconsistent currently, but that behaviour is what the latest
additions describe). That means all imports of packages get slower,
since the whole of sys.path is always scanned in order to populate
__path__.

I believe the PEP 402 approach is much cleaner: both foo.py and
foo/__init__.py would stop the sys.path scan *immediately* (thus
eliminating any performance impact on existing imports), but
subpackage imports ("import foo.bar" or "from foo import bar") will
attempt to either convert an existing "foo" module into a package, or
else create a new package, by scanning the whole of sys.path for
"foo.pyp" directories. The "foo/__init__.py" approach would create
self-contained packages that are explicitly closed to extension.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
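As a rough illustration of the lookup order described above (a sketch
only -- the helper name find_pyp_portions is hypothetical and belongs
to neither PEP's reference code):

    import os
    import sys

    def find_pyp_portions(name):
        # Scan every sys.path entry for a "<name>.pyp" directory;
        # under the hybrid scheme sketched above, these portions
        # together would form the virtual package's __path__.
        portions = []
        for entry in sys.path:
            candidate = os.path.join(entry, name + '.pyp')
            if os.path.isdir(candidate):
                portions.append(candidate)
        return portions

    # A plain module (foo.py) or self-contained package
    # (foo/__init__.py) would still win immediately; only a failed
    # top-level lookup, or a subpackage import, would fall back to
    # this full scan.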
From martin at v.loewis.de  Wed Nov  2 05:42:41 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 02 Nov 2011 05:42:41 +0100
Subject: [Import-SIG] PEP 382 update
In-Reply-To:
References: <4E9F34C8.6040308@v.loewis.de> <20111019221650.2f54058d@resist.wooz.org>
Message-ID: <4EB0CA41.5060009@v.loewis.de>

> I believe the PEP 402 approach is much cleaner: both foo.py and
> foo/__init__.py would stop the sys.path scan *immediately* (thus
> eliminating any performance impact on existing imports), but
> subpackage imports ("import foo.bar" or "from foo import bar") will
> attempt to either convert an existing "foo" module into a package, or
> else create a new package, by scanning the whole of sys.path for
> "foo.pyp" directories. The "foo/__init__.py" approach would create
> self-contained packages that are explicitly closed to extension.

I think that's under-specified in PEP 402. It doesn't classify "from
foo import bar" as a subpackage import, since it states that this
magic only applies to imports involving dotted names. So exactly which
imports trigger this path scanning remains to be specified.

Performance-wise, I would expect that PEP 382 is more efficient if the
package has code in it, and not worse for "pure" namespace packages.
If there is code in the package, then with PEP 402 you would have to
provide a P.py file plus multiple P.pyp directories. On importing P,
the scan searches the path and finds P.py. On importing a sub-package,
it searches the path *again* to establish the package's __path__. With
PEP 382, there is only a single run over the path.

PEP 402 might be more efficient for P/__init__.py packages. I'm
skeptical that it matters much: the majority of stat calls comes from
the many forms of module files, which neither PEP 382 nor PEP 402
would stat after the first hit. PEP 382 might be slightly more
efficient here, since a .pyp directory early on the path would already
cancel stat calls for module files (.py, .pyc, .pyd, module.pyd, ...).

For a namespace package, PEP 402 would scan the entire path for all
kinds of modules, even if it eventually turns out that it is only
going to use the .pyp directories it has already seen.

Regards,
Martin
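For concreteness, one plausible on-disk reading of the two layouts
Martin is comparing, for a package P that carries code of its own
(illustrative paths only):

    # Under PEP 402 (code lives in P.py, portions in P.pyp dirs):
    #
    #   .../site-packages/P.py
    #   .../site-packages/P.pyp/a.py
    #   .../other-dir/P.pyp/b.py
    #
    # Under PEP 382 (the code can live inside a portion directory):
    #
    #   .../site-packages/P.pyp/__init__.py
    #   .../site-packages/P.pyp/a.py
    #   .../other-dir/P.pyp/b.py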
L?wis" > wrote: > > In comparison with PEP 402, after my PyCon DE presentation, people > discussed that they prefer if Python packages require some kind of > explicit declaration - even though Java seems to have done well with > packages being just directories with the package name. In particular, > a Jython guy observed that they would likely have issues with an > approach where a directory P would already be part of a package P, > since they often have directories in Jython that have the name of > Python packages, but are not meant as such. > > > Unless those directories contain things which are importable, and > someone actually imports them, PEP 402 does not treat them as a package. > So, I suspect some confusion may have occurred, especially if this was > a first exposure to the idea, rather than people actually reading the PEP. To the people, it doesn't really matter whether the directory would be considered as belonging to the package or not. It's more the feeling of properness that gets violated by not having to declare a package directory. In the specific case of Jython, it may be that Jython is willing to treat Java class files as Python modules ("extension" modules); if PEP 402 is accepted and Jython implements it, they might indeed have an issue with directories unexpectedly containing things which are importable. I'm not sure whether that actually is the issue, since I didn't talk to the Jython guy further. Regards, Martin From martin at v.loewis.de Wed Nov 9 10:35:42 2011 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Nov 2011 10:35:42 +0100 Subject: [Import-SIG] PEP 402: specification questions Message-ID: <4EBA496E.2090702@v.loewis.de> I'm trying to understand PEP 402, and have difficulties figuring out what exactly it says. I presume that the section "Specification" is intended to give the complete syntax and semantics of the proposed change to Python. A. In "Virtual Paths", it talks about obtaining importer objects for each path item, and then calling get_subpath on it. In the current implementation, not all sys.path entries correspond to an importer object: so what's the impact (if any) on old-style sys.path entries (i.e. regular directories)? Or, if some "builtin" importer is implied: what is its semantics of get_subpath for the builtin importer? B. "Specification" starts with "importing names containing at least one .". That seems clear enough, however, I wonder whether from zope import interface is supported by the PEP (i.e. where zope.interface is a nested package, yet the names in the import don't contain a dot at all). I presume that the case is meant to be supported, but then I wonder how precisely the mechanism described in the PEP is triggered. Regards, Martin From pje at telecommunity.com Wed Nov 9 18:24:26 2011 From: pje at telecommunity.com (PJ Eby) Date: Wed, 9 Nov 2011 12:24:26 -0500 Subject: [Import-SIG] PEP 402: specification questions In-Reply-To: <4EBA496E.2090702@v.loewis.de> References: <4EBA496E.2090702@v.loewis.de> Message-ID: On Wed, Nov 9, 2011 at 4:35 AM, "Martin v. L?wis" wrote: > I'm trying to understand PEP 402, and have difficulties figuring > out what exactly it says. I presume that the section "Specification" > is intended to give the complete syntax and semantics of the > proposed change to Python. > > A. In "Virtual Paths", it talks about obtaining importer objects > for each path item, and then calling get_subpath on it. 
> In the current implementation, not all sys.path entries correspond
> to an importer object: so what's the impact (if any) on old-style
> sys.path entries (i.e. regular directories)?

Sorry - that's meant to be the importer returned by
pkgutil.get_importer(); it should probably be made clearer. (IIUC,
under the importlib version, there is *always* an importer object,
whether you obtain it via pkgutil or some other means.)

> Or, if some "builtin" importer is implied: what is its semantics
> of get_subpath for the builtin importer?

Those described in the rest of the PEP: i.e., if the subpath exists,
return it. For a directory, that's
os.path.isdir(os.path.join(base_path, name_suffix)).

> B. "Specification" starts with "importing names containing at least
> one .". That seems clear enough, however, I wonder whether
>
>     from zope import interface
>
> is supported by the PEP (i.e. where zope.interface is a nested
> package, yet the names in the import don't contain a dot at all).
>
> I presume that the case is meant to be supported, but then I
> wonder how precisely the mechanism described in the PEP is
> triggered.

IIRC, "from zope import interface" does an import zope.interface
internally. It does occur to me, however, that if it first imports
zope and tries to get an attribute of it, then that import would fail.
That case should probably be addressed explicitly in the PEP.

I was under the impression, though, that you were wanting to do a
revised PEP 382 instead?
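A sketch of what get_subpath amounts to for the plain filesystem case,
per PJ's description above (hypothetical code, not taken from the PEP
402 reference implementation):

    import os

    class FileSystemImporter:
        def __init__(self, base_path):
            self.base_path = base_path

        def get_subpath(self, fullname):
            # Return the directory that would extend a virtual
            # package's __path__, or None if this sys.path entry has
            # nothing to contribute for it.
            name_suffix = fullname.rsplit('.', 1)[-1]
            candidate = os.path.join(self.base_path, name_suffix)
            if os.path.isdir(candidate):
                return candidate
            return None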
From eric at trueblade.com  Wed Nov  9 19:12:44 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Wed, 09 Nov 2011 13:12:44 -0500
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To:
References: <4EBA496E.2090702@v.loewis.de>
Message-ID: <4EBAC29C.4000502@trueblade.com>

On 11/09/2011 12:24 PM, PJ Eby wrote:
> I was under the impression, though, that you were wanting to do a
> revised PEP 382 instead?

I think the point is to understand PEP 402 well enough so that we can
choose between them.

Eric.

From martin at v.loewis.de  Wed Nov  9 21:49:27 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 09 Nov 2011 21:49:27 +0100
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To:
References: <4EBA496E.2090702@v.loewis.de>
Message-ID: <4EBAE757.8030907@v.loewis.de>

> I was under the impression, though, that you were wanting to do a
> revised PEP 382 instead?

Responding to this first (I need to study your technical answers in
detail later): I'm not quite sure how to proceed from here. I could
well imagine merging the two PEPs somehow (and would prefer if they
merge into PEP 382, as I'm more familiar with that). If that sounds
reasonable to you, feel free to propose any changes that you think
should be made to PEP 382.

ISTM that the two PEPs give opposing answers to some questions, which
ultimately requires somebody to make a decision. I'm not sure which of
these differences you consider fundamental, and which arbitrary. To
give some examples:
- what constitutes a package on disk?
- what's the impact of this new feature on existing P/__init__.py
  packages?
- what's the impact on existing modules P.py?
- when exactly is the path scan performed?

It might also be that you worked on PEP 402 only because PEP 382
appeared stalled (which it was for some time). If you are happy with
PEP 382 in its current form, you might want to withdraw PEP 402.

Regards,
Martin

From pje at telecommunity.com  Wed Nov  9 23:13:32 2011
From: pje at telecommunity.com (PJ Eby)
Date: Wed, 9 Nov 2011 17:13:32 -0500
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To: <4EBAE757.8030907@v.loewis.de>
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de>
Message-ID:

On Wed, Nov 9, 2011 at 3:49 PM, "Martin v. Löwis" wrote:
> ISTM that the two PEPs give opposing answers to some questions,
> which ultimately requires somebody to make a decision. I'm not sure
> which of these differences you consider fundamental, and which
> arbitrary. To give some examples:
> - what constitutes a package on disk?
> - what's the impact of this new feature on existing P/__init__.py
>   packages?
> - what's the impact on existing modules P.py?
> - when exactly is the path scan performed?
>
> It might also be that you worked on PEP 402 only because PEP 382
> appeared stalled (which it was for some time).

If you review the Import-SIG traffic from that time period, you'll
notice that I first attempted to revise PEP 382 to address various
issues -- mostly having to do with clarity and ease of backported
implementations for 2.x.

As the work went on, it eventually became clear that the reason the
terminology was complicated and the spec difficult to clarify (not
just for me but for other import-sig participants) was because
Python's fundamental notion of packages was flawed, and that what
Guido previously tried to do with getting rid of the need for
__init__.py (see the references in PEP 402) was a more Pythonic
approach (as well as being more familiar to users of other languages).

So, the goal for 402 was to make __init__.py ("self-contained"
packages) the special case, rather than namespace packages, and
achieve a more natural fit and ease overall.

The use of .pyp extensions doesn't really fit well with that approach,
though. It means, for example, that you have to use ugly paths (e.g.
zope.pyp/interface.pyp/foo.py), and you have a less orthogonal path
for switching between package types. That is, under 402, you can make
a module a package just by adding a directory. And you can make a
self-contained package into an open package (or vice versa) by adding
or deleting packagename/__init__.py or moving it to packagename.py.

In other words, the intention of PEP 402 is to have a uniform and
simple way to evolve packages that as a side-effect allows both
traditional and "namespace" packages to work. It implements namespace
packages by *removing* something (i.e., getting rid of __init__.py)
rather than by adding something new (e.g. .pyp extensions). For that
reason, I think it's better for the future of the language.

> If you are happy
> with PEP 382 in its current form, you might want to withdraw PEP 402.

Not really. I think that PEP 402 is approximately how Python packages
should have worked all along, and that this is a good opportunity to
rectify the current situation. While some projects may run into issues
with files becoming importable that previously weren't, any code that
was trying to import those modules is already broken.
From barry at python.org  Wed Nov  9 23:57:55 2011
From: barry at python.org (Barry Warsaw)
Date: Wed, 9 Nov 2011 17:57:55 -0500
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To:
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de>
Message-ID: <20111109175755.639f811e@resist.wooz.org>

On Nov 09, 2011, at 05:13 PM, PJ Eby wrote:

> In other words, the intention of PEP 402 is to have a uniform and
> simple way to evolve packages that as a side-effect allows both
> traditional and "namespace" packages to work. It implements namespace
> packages by *removing* something (i.e., getting rid of __init__.py)
> rather than by adding something new (e.g. .pyp extensions). For that
> reason, I think it's better for the future of the language.

That's one thing that appeals to me as a distro packager about PEP
402. Under PEP 402, it seems like it would be less work to modify a
set of upstream packages to eliminate the collisions on __init__.py.

-Barry

From ncoghlan at gmail.com  Thu Nov 10 00:36:26 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Nov 2011 09:36:26 +1000
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To: <20111109175755.639f811e@resist.wooz.org>
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de> <20111109175755.639f811e@resist.wooz.org>
Message-ID:

On Thu, Nov 10, 2011 at 8:57 AM, Barry Warsaw wrote:
> On Nov 09, 2011, at 05:13 PM, PJ Eby wrote:
>
>> In other words, the intention of PEP 402 is to have a uniform and
>> simple way to evolve packages that as a side-effect allows both
>> traditional and "namespace" packages to work. It implements namespace
>> packages by *removing* something (i.e., getting rid of __init__.py)
>> rather than by adding something new (e.g. .pyp extensions). For that
>> reason, I think it's better for the future of the language.
>
> That's one thing that appeals to me as a distro packager about PEP
> 402. Under PEP 402, it seems like it would be less work to modify a
> set of upstream packages to eliminate the collisions on __init__.py.

Indeed, I don't see PEP 382 reducing the number of "Why doesn't my
'foo' package work?" questions from beginners either, since it just
replaces "add an __init__.py" with "change your directory name to
'foo.pyp'". PEP 402, by contrast, should *just work* in the most
natural way possible. Similarly, "fixing" packaging conflicts just
becomes a matter of making sure that *none* of the distro packages
involved install an __init__.py file. By contrast, PEP 382 requires
that *all* of the distro packages be updated to install to "foo.pyp"
directories instead of "foo" directories.

On the other hand, the Zen does say "Explicit is better than
implicit", and if we don't allow arbitrary files without an extension
as modules, why should we allow arbitrary directories as packages*?
From that point of view, PEP 382 is actually just bringing packages
into the same extension-based regime that we already use for
distinguishing other module types.

*This is a deliberate mischaracterisation of PEP 402, but it seems to
be a common misperception that is distorting people's reactions to the
proposal - 'marker files' actually still exist in that PEP, it's just
that their definition is "any valid Python module file or a relevant
subdirectory containing such files".
If this causes problems for Jython, then they should be able to fix it
the same way CPython fixed the DLL naming conflict problem on Windows:
by *not* accepting standard Java extensions like ".jar" and ".java" as
Jython modules, and instead requiring a Jython-specific extension
(e.g. ".pyj", similar to the ".pyd" CPython uses for Windows DLLs).

While there's no reference implementation for PEP 402 that updates the
standard import machinery as yet, it's worth taking a look at Greg
Slodkowic's importlib-based implementation that came out of GSoC this
year: https://bitbucket.org/jergosh/pep-402

So yeah, I still think PEP 402 is the right answer and am -1 on PEP
382 as a result - while I think PEP 382 *is* an improvement over the
status quo, I also think it represents an unnecessary detour relative
to where I'd like to see the import system going.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From martin at v.loewis.de  Thu Nov 10 05:54:57 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 10 Nov 2011 05:54:57 +0100
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To: <20111109175755.639f811e@resist.wooz.org>
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de> <20111109175755.639f811e@resist.wooz.org>
Message-ID: <4EBB5921.3020208@v.loewis.de>

On 09.11.2011 23:57, Barry Warsaw wrote:
> On Nov 09, 2011, at 05:13 PM, PJ Eby wrote:
>
>> In other words, the intention of PEP 402 is to have a uniform and
>> simple way to evolve packages that as a side-effect allows both
>> traditional and "namespace" packages to work. [...]
>
> That's one thing that appeals to me as a distro packager about PEP
> 402. Under PEP 402, it seems like it would be less work to modify a
> set of upstream packages to eliminate the collisions on __init__.py.

I think this impression is incorrect. Assuming we are talking about
existing packages here that use the existing setuptools namespace
mechanism and which have been ported to Python 3 already, then no
change to the package should be necessary at all to support PEP 382.

Instead, setuptools/distribute should implement the namespace_packages
parameter of setup.py in such a way that it
a) drops the __init__.py from the sources, as that should contain
   something like
   __import__('pkg_resources').declare_namespace(__name__)
b) copies the files into a P.pyp folder rather than a P folder on
   build_py.

Such a change would be necessary/possible with either PEP 382 or PEP
402, so it seems to make no difference.

Regards,
Martin
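For reference, the declaring side of the existing setuptools namespace
mechanism Martin refers to looks roughly like this (the project and
namespace names here are placeholders):

    # In each portion's __init__.py (the file both new PEPs would
    # allow distributions to drop):
    __import__('pkg_resources').declare_namespace(__name__)

    # In each portion's setup.py:
    from setuptools import setup, find_packages

    setup(
        name='zope.thirdparty',   # hypothetical portion of 'zope'
        packages=find_packages(),
        namespace_packages=['zope'],
    )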
From martin at v.loewis.de  Thu Nov 10 06:32:59 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 10 Nov 2011 06:32:59 +0100
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To:
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de> <20111109175755.639f811e@resist.wooz.org>
Message-ID: <4EBB620B.4030301@v.loewis.de>

> Indeed, I don't see PEP 382 reducing the number of "Why doesn't my
> 'foo' package work?" questions from beginners either

Do beginners really have that question (i.e. can you kindly point me
to archived examples of that question)? I'd expect beginners to do
whatever tutorials and examples tell them to do. If these refer to
P.pyp folders, beginners will just take that as given, and copy it.

> since it just replaces "add an __init__.py" with "change your
> directory name to 'foo.pyp'".

Why should users have to *change* the directory name? When they create
a package, they would create the .pyp directory to begin with, so no
need to change it.

> PEP 402, by contrast, should *just work* in the most
> natural way possible. Similarly, "fixing" packaging conflicts just
> becomes a matter of making sure that *none* of the distro packages
> involved install an __init__.py file. By contrast, PEP 382 requires
> that *all* of the distro packages be updated to install to "foo.pyp"
> directories instead of "foo" directories.

See my message to Barry: setuptools/distribute could do that, with no
change to the source tree.

When people are ready to give up pre-3.3 support, they would likely
have to modify *all* distro packages either way: with PEP 402, they
would need to drop the __init__.py from the sources, and with PEP 382,
they would additionally need to rename the directory.

However, with PEP 382, they don't *have* to do that: some portions of
a namespace package may keep the __init__.py, others may drop it, and
it would still form a single namespace. OTOH, with PEP 402, *all*
portions of the namespace would have to agree simultaneously to use
the PEP 402 mechanism, since that mechanism will be ineffective if
there is an __init__.py.

> While there's no reference implementation for PEP 402 that updates the
> standard import machinery as yet, it's worth taking a look at Greg
> Slodkowic's importlib-based implementation that came out of GSoC this
> year: https://bitbucket.org/jergosh/pep-402

Not sure whether that's the right way to use it, but I tried setting
builtins.__import__ = importlib.__import__. Then, with "foo/bar.py" on
disk, "import foo.bar" works fine. Interestingly, "import foo" fails
afterwards, even though "foo" is in sys.path. Consequently, "from foo
import bar" also fails, just as Phillip predicted. I presume that will
have to be fixed in the PEP and the implementation.

Regards,
Martin
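Martin's experiment, restated as a script (the outcomes shown in the
comments are exactly as reported above, and assume the GSoC
implementation's importlib is the one on the path):

    import builtins
    import importlib

    builtins.__import__ = importlib.__import__

    # With only foo/bar.py on disk (no foo/__init__.py):
    import foo.bar        # works: foo becomes a virtual package
    import foo            # fails in the implementation as tested
    from foo import bar   # fails too, as Phillip predicted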
From ncoghlan at gmail.com  Thu Nov 10 07:31:08 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Nov 2011 16:31:08 +1000
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To: <4EBB620B.4030301@v.loewis.de>
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de> <20111109175755.639f811e@resist.wooz.org> <4EBB620B.4030301@v.loewis.de>
Message-ID:

On Thu, Nov 10, 2011 at 3:32 PM, "Martin v. Löwis" wrote:
>> Indeed, I don't see PEP 382 reducing the number of "Why doesn't my
>> 'foo' package work?" questions from beginners either
>
> Do beginners really have that question (i.e. can you kindly point
> me to archived examples of that question)? I'd expect beginners to
> do whatever tutorials and examples tell them to do. If these refer
> to P.pyp folders, beginners will just take that as given, and copy
> it.

Yep, beginners do ask that question, particularly if they have
experience with other languages that don't require explicit markers
for package directories (hence the tone of PEP 402). I mostly saw this
when I was following the Stack Overflow python RSS feed - here's one
such example:
http://stackoverflow.com/questions/456481/cant-get-python-to-import-from-a-different-folder

>> since it just replaces "add an __init__.py" with "change your
>> directory name to 'foo.pyp'".
>
> Why should users have to *change* the directory name? When they
> create a package, they would create the .pyp directory to begin
> with, so no need to change it.

Because they start by doing the wrong thing, and then go to places
like SO to ask why it doesn't work and are told how to fix it. With
PEP 402, they wouldn't have to be told how to fix it because it would
just work the way they expected.

>> PEP 402, by contrast, should *just work* in the most
>> natural way possible. Similarly, "fixing" packaging conflicts just
>> becomes a matter of making sure that *none* of the distro packages
>> involved install an __init__.py file. By contrast, PEP 382 requires
>> that *all* of the distro packages be updated to install to "foo.pyp"
>> directories instead of "foo" directories.
>
> See my message to Barry: setuptools/distribute could do that, with
> no change to the source tree.

It still rings alarm bells for me - there's a non-trivial transform
going on between what's in the source tree and what's expected on
deployment with that approach, and that's going to break a lot of
things. (e.g. symlinking source checkouts into place in order to
pretend that plugins are installed)

> However, with PEP 382, they don't *have* to do that: some portions
> of a namespace package may keep the __init__.py, others may drop it,
> and it would still form a single namespace. OTOH, with PEP 402,
> *all* portions of the namespace would have to agree simultaneously
> to use the PEP 402 mechanism, since that mechanism will be
> ineffective if there is an __init__.py.

That's a pretty good argument in favour of dropping the
"self-contained package" concept from PEP 402, but aside from that
aspect, it doesn't help decide between the two.

>> While there's no reference implementation for PEP 402 that updates the
>> standard import machinery as yet, it's worth taking a look at Greg
>> Slodkowic's importlib-based implementation that came out of GSoC this
>> year: https://bitbucket.org/jergosh/pep-402
>
> Not sure whether that's the right way to use it, but I tried
> setting builtins.__import__ = importlib.__import__. Then, with
> "foo/bar.py" on disk, "import foo.bar" works fine. Interestingly,
> "import foo" fails afterwards, even though "foo" is in sys.path.
> Consequently, "from foo import bar" also fails, just as Phillip
> predicted. I presume that will have to be fixed in the PEP and
> the implementation.

Yeah, I haven't actually had a chance to try it out yet. It sounds
like it's just an implementation bug in the "import foo" part, though,
since PJE is correct in his recollection that the "from x import y"
algorithm is along the lines of:

    import x
    if hasattr(x, 'y'):
        return x.y
    else:
        import x.y
        return x.y

(It doesn't *quite* work that way, but that's the gist of it:
http://hg.python.org/cpython/file/default/Python/import.c#l3171)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com  Thu Nov 10 07:53:21 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Nov 2011 16:53:21 +1000
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To:
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de> <20111109175755.639f811e@resist.wooz.org> <4EBB620B.4030301@v.loewis.de>
Message-ID:

On Thu, Nov 10, 2011 at 4:31 PM, Nick Coghlan wrote:
> On Thu, Nov 10, 2011 at 3:32 PM, "Martin v. Löwis" wrote:
>> However, with PEP 382, they don't *have* to do that: some portions
>> of a namespace package may keep the __init__.py, others may drop it,
>> and it would still form a single namespace. OTOH, with PEP 402,
>> *all* portions of the namespace would have to agree simultaneously
>> to use the PEP 402 mechanism, since that mechanism will be
>> ineffective if there is an __init__.py.
>
> That's a pretty good argument in favour of dropping the
> "self-contained package" concept from PEP 402, but aside from that
> aspect, it doesn't help decide between the two.
Actually, scratch that part of my response. *Existing* namespace
packages that work properly already have a single owner - the one that
creates the __init__.py file and sets up the namespace extension
mechanisms. They're forced to work that way due to the file collision
problem.

With PEP 402, those owning packages are the only ones that would have
to change. With PEP 382, all the *other* distro packages have to
change as well (either directly, or via the packaging utilities
modifying path names on installation - in which case, good luck
running any affected code from an uninstalled source tree).

It seems I now remember at least some of the reasons why we didn't
like the "directory extension" idea the first time it was suggested :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From martin at v.loewis.de  Thu Nov 10 19:03:13 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 10 Nov 2011 19:03:13 +0100
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To:
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de> <20111109175755.639f811e@resist.wooz.org> <4EBB620B.4030301@v.loewis.de>
Message-ID: <4EBC11E1.80409@v.loewis.de>

> Actually, scratch that part of my response. *Existing* namespace
> packages that work properly already have a single owner

How so? The zope package certainly doesn't have a single owner.
Instead, it's spread over a large number of subpackages.

> With PEP 402, those owning packages are the only ones that would have
> to change.

No. In setuptools namespace packages, each portion of the namespace
(i.e. each distribution) will have its own __init__.py; which of them
actually gets used is arbitrary, but also irrelevant, since they all
look the same.

So "only those" is actually "all of them".

> With PEP 382, all the *other* distro packages have to
> change as well

What's a "distro package"? Which are the other ones? They don't need
to change at all. The existing setuptools namespace mechanism will
continue to work, and you can add PEP 382 package portions freely.

> It seems I now remember at least some of the reasons why we didn't
> like the "directory extension" idea the first time it was suggested :)

Please elaborate - I missed your point.

Regards,
Martin

From martin at v.loewis.de  Thu Nov 10 19:14:51 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 10 Nov 2011 19:14:51 +0100
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To:
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de> <20111109175755.639f811e@resist.wooz.org> <4EBB620B.4030301@v.loewis.de>
Message-ID: <4EBC149B.8090405@v.loewis.de>

> Yeah, I haven't actually had a chance to try it out yet. It sounds
> like it's just an implementation bug in the "import foo" part, though,
> since PJE is correct in his recollection that the "from x import y"
> algorithm is along the lines of:
>
>     import x

That already fails with PEP 402: you can't import the virtual package,
only the subpackages. It then just stops with that import failed, and
doesn't even try to look for subpackages.
So specification and implementation really match here - it's not just
an implementation bug.

Regards,
Martin

From pje at telecommunity.com  Thu Nov 10 19:25:39 2011
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 10 Nov 2011 13:25:39 -0500
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To: <4EBC11E1.80409@v.loewis.de>
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de> <20111109175755.639f811e@resist.wooz.org> <4EBB620B.4030301@v.loewis.de> <4EBC11E1.80409@v.loewis.de>
Message-ID:

On Thu, Nov 10, 2011 at 1:03 PM, "Martin v. Löwis" wrote:
>> Actually, scratch that part of my response. *Existing* namespace
>> packages that work properly already have a single owner
>
> How so? The zope package certainly doesn't have a single owner.
> Instead, it's spread over a large number of subpackages.

In distro packages (i.e. "system packages") there may be a
namespace-defining package that provides an __init__.py. For example,
I believe Debian (system) packages peak.util this way, even though
there are many separately distributed peak.util.* (python) packages.

>> With PEP 402, those owning packages are the only ones that would
>> have to change.
>
> No. In setuptools namespace packages, each portion of the namespace
> (i.e. each distribution) will have its own __init__.py; which of them
> actually gets used is arbitrary, but also irrelevant, since they all
> look the same.
>
> So "only those" is actually "all of them".

Nick is speaking again about system packages released by OS
distributors. A naive system package built with setuptools of a
namespace package will not contain an __init__.py, but only a
.nspkg.pth file used to make the __init__.py unnecessary. (In this
sense, the existing setuptools namespace package implementation for
system-installed packages is actually a primitive partial
implementation of PEP 402.)

In summary: some system packages are built with an owning package,
some aren't. Those with an owning package will need to drop the
__init__.py (from that one package), and the others do not, because
they don't have an __init__.py. In either case, PEP 402 leaves the
directory layout alone.

A version of setuptools intended for PEP 402 support would drop the
nspkg.pth inclusion, and a version of "packaging" intended for PEP 402
would simply not add one.

From pje at telecommunity.com  Thu Nov 10 19:28:22 2011
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 10 Nov 2011 13:28:22 -0500
Subject: [Import-SIG] PEP 402: specification questions
In-Reply-To: <4EBC149B.8090405@v.loewis.de>
References: <4EBA496E.2090702@v.loewis.de> <4EBAE757.8030907@v.loewis.de> <20111109175755.639f811e@resist.wooz.org> <4EBB620B.4030301@v.loewis.de> <4EBC149B.8090405@v.loewis.de>
Message-ID:

On Thu, Nov 10, 2011 at 1:14 PM, "Martin v. Löwis" wrote:
> That already fails with PEP 402: you can't import the virtual package,
> only the subpackages. It then just stops with that import failed, and
> doesn't even try to look for subpackages. So specification and
> implementation really match here - it's not just an implementation
> bug.

Right - you found a bug in the spec, in that it should either
explicitly amend the from-import algorithm to treat the first import
the same as an AttributeError, or else say, "use 'import
zope.interface as interface' instead".
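The amendment PJ describes would make the from-import sequence behave
roughly like this (a sketch only; x and y are placeholders for the
package and submodule names):

    # Treat a failed "import x" the same as x lacking attribute y:
    try:
        import x
        value = x.y
    except (ImportError, AttributeError):
        import x.y        # triggers the virtual package machinery,
        value = x.y       # which also binds the parent name x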
From ncoghlan at gmail.com  Sun Nov 13 06:12:31 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Nov 2011 15:12:31 +1000
Subject: [Import-SIG] Import engine PEP up on python.org
Message-ID:

I finally got around to updating the import engine draft PEP and
publishing it on python.org: http://www.python.org/dev/peps/pep-0406/

I think this is a direction we want to move eventually, but I'm not in
any great hurry (in particular, I don't believe any effort should be
expended on an import.c based version, so bootstrapping importlib as
the standard import mechanism is a blocking dependency*). If it
doesn't make 3.3 (and there's a fair chance it won't, since I have a
few other things to work on that I think will benefit more people in
the near term), then 3.4 isn't that far away.

*Brett: do you have a public Hg repo for working on the importlib
bootstrapping effort?

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
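The draft PEP's core idea, as discussed in the replies below, is to
bundle the import state into one object. Something along these lines
(a sketch of the shape only; the actual API should be taken from PEP
406 itself):

    import sys

    class ImportEngine:
        # One object owning the state that currently lives as five
        # sys attributes (plus the global import lock in imp).
        def __init__(self):
            self.modules = {}
            self.path = []
            self.path_hooks = []
            self.path_importer_cache = {}
            self.meta_path = []

        @classmethod
        def from_sys(cls):
            # Seed an engine from the current process-global state.
            engine = cls()
            engine.modules = sys.modules
            engine.path = list(sys.path)
            engine.path_hooks = list(sys.path_hooks)
            engine.path_importer_cache = dict(sys.path_importer_cache)
            engine.meta_path = list(sys.meta_path)
            return engine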
From martin at v.loewis.de  Sun Nov 13 09:28:22 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 13 Nov 2011 09:28:22 +0100
Subject: [Import-SIG] Import engine PEP up on python.org
In-Reply-To:
References:
Message-ID: <4EBF7FA6.3000007@v.loewis.de>

On 13.11.2011 06:12, Nick Coghlan wrote:
> I finally got around to updating the import engine draft PEP and
> publishing it on python.org: http://www.python.org/dev/peps/pep-0406/

I think the rationale section needs to be improved. In fact, I still
don't understand what the objective of this API is (I do understand
what it achieves, but it's unclear why having that is desirable, and
for what applications).

I notice that there is overlap both with multiple subinterpreters, and
with restricted execution. It hints at providing both, but actually
provides neither. I think the long-term solution really should be
proper support for subinterpreters, where there is no global state in
C at all. Extension modules can already achieve this through the PEP
3121 API (even though few modules actually do).

If the objective is to have more of the import machinery implemented
in Python, then making importlib the import machinery might be best.
If the objective is to allow hooks into the import procedure, it would
be best to just provide the hooks. OTOH, PEP 302 already defined
hooks, and it seems that people are happy with these.

Regards,
Martin

From ncoghlan at gmail.com  Sun Nov 13 11:21:09 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Nov 2011 20:21:09 +1000
Subject: [Import-SIG] Import engine PEP up on python.org
In-Reply-To: <4EBF7FA6.3000007@v.loewis.de>
References: <4EBF7FA6.3000007@v.loewis.de>
Message-ID:

On Sun, Nov 13, 2011 at 6:28 PM, "Martin v. Löwis" wrote:
> On 13.11.2011 06:12, Nick Coghlan wrote:
>> I finally got around to updating the import engine draft PEP and
>> publishing it on python.org: http://www.python.org/dev/peps/pep-0406/
>
> I think the rationale section needs to be improved. In fact, I still
> don't understand what the objective of this API is (I do understand
> what it achieves, but it's unclear why having that is desirable, and
> for what applications).

It's desirable for the same reason *any* form of object-oriented
encapsulation is desirable: because it makes it easier to *think*
about the problem and manage interdependencies between state updates.
I didn't realise the merits of OO designs needed to be justified - I
figured the list of 6 pieces of interdependent process global state
spoke for itself.

> I notice that there is overlap both with multiple subinterpreters,
> and with restricted execution. It hints at providing both, but
> actually provides neither.

It doesn't claim to provide either - its sole aim is to provide a
relatively lightweight mechanism to selectively adjust elements of the
import system (e.g. adding to sys.path when importing plugins, but
leaving it alone otherwise).

But having the import state better encapsulated would make it easier
to improve the isolation of subinterpreters so that they aren't
sharing Python modules, even if they still share extension modules
(you could put a single pointer to the import engine on the
interpreter state rather than storing it in sys the way we do now).

> I think the long-term solution really should be proper support for
> subinterpreters, where there is no global state in C at all.
> Extension modules can already achieve this through the PEP 3121 API
> (even though few modules actually do).
>
> If the objective is to have more of the import machinery implemented
> in Python, then making importlib the import machinery might be best.

Guido already approved (in principle) that change - this PEP would
actually depend on that being done first (because I think trying to
build this API directly on top of import.c would be a complete waste
of time).

> If the objective is to allow hooks into the import procedure, it
> would be best to just provide the hooks. OTOH, PEP 302 already
> defined hooks, and it seems that people are happy with these.

No, they're not. Yes, the hooks are *usable*, but they're damn hard to
comprehend. When even the *experts* hate messing with a subsystem,
it's a hint that there's something wrong with the way it is set up. In
this case, I firmly believe a big part of the problem is that the
import system is a complex, interdependent mechanism, but there's no
coherence to the architecture. It's as if the whole thing was written
in C from an architectural point of view, but without even bothering
to create a dedicated structure to hold the state.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From martin at v.loewis.de  Sun Nov 13 12:36:52 2011
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 13 Nov 2011 12:36:52 +0100
Subject: [Import-SIG] Import engine PEP up on python.org
In-Reply-To:
References: <4EBF7FA6.3000007@v.loewis.de>
Message-ID: <4EBFABD4.1030207@v.loewis.de>

> It's desirable for the same reason *any* form of object-oriented
> encapsulation is desirable: because it makes it easier to *think*
> about the problem and manage interdependencies between state updates.

I guess I'm -1 on that PEP then. If it introduces complications just
for the sake of some presumed simplification, it's not worth it.

> I didn't realise the merits of OO designs needed to be justified - I
> figured the list of 6 pieces of interdependent process global state
> spoke for itself.

Perhaps I'm challenging the specific choice of classes then: I would
find it completely reasonable to move all of this into the interpreter
state, as I think it's fine that this "global" state is unique to the
interpreter. There is only a single __import__ builtin, and the
objective of the import statement is to make a change to the "global"
state (scoped with the interpreter).

>> I notice that there is overlap both with multiple subinterpreters,
>> and with restricted execution. It hints at providing both, but
>> actually provides neither.
> It doesn't claim to provide either - its sole aim is to provide a
> relatively lightweight mechanism to selectively adjust elements of
> the import system (e.g. adding to sys.path when importing plugins,
> but leaving it alone otherwise).

Ok - that might be a use case. However, I'm skeptical that this PEP is
good at achieving that objective - as you notice, there is the
challenge of recursive imports.

I would rather prefer to make such variables per-thread, or, rather,
"per context". Something like

    with sys.extended_path(directory):
        load_plugin()

This would extend the path for all code run within the context of the
with statement, but not elsewhere. As an implementation strategy, the
thread state would be able to override the global variables, in a
stacked (nested) manner. The exact list of variables that can be
overridden needs to be carefully considered - for example, I would
still view a single sys.modules as important in that use case.
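A minimal sketch of the hypothetical sys.extended_path() helper Martin
proposes (the name and the thread-state strategy are his; this naive
version just mutates sys.path and ignores the per-thread stacking):

    import sys
    from contextlib import contextmanager

    @contextmanager
    def extended_path(directory):
        # Naively extend sys.path for the duration of the block; a
        # real implementation would stack the override on the thread
        # state instead of mutating the global list.
        sys.path.insert(0, directory)
        try:
            yield
        finally:
            try:
                sys.path.remove(directory)
            except ValueError:
                pass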
> But having the import state better encapsulated would make it easier
> to improve the isolation of subinterpreters so that they aren't
> sharing Python modules, even if they still share extension modules

That already works, no? Subinterpreters don't share Python modules
(that's about the only feature of subinterpreters that actually
works).

> No, they're not. Yes, the hooks are *usable*, but they're damn hard
> to comprehend. When even the *experts* hate messing with a subsystem,
> it's a hint that there's something wrong with the way it is set up.
> In this case, I firmly believe a big part of the problem is that the
> import system is a complex, interdependent mechanism, but there's no
> coherence to the architecture. It's as if the whole thing was written
> in C from an architectural point of view, but without even bothering
> to create a dedicated structure to hold the state.

I agree that the import system is difficult to understand, and the PEP
302 hooks in particular. I mightily disagree that the cause of these
difficulties is the global state used in the implementation. It's
rather the order in which things are called, and how they interact,
which makes it difficult to understand. Adding an optional keyword
argument to some of the functions is surely no simplification.

Regards,
Martin

From ericsnowcurrently at gmail.com  Sun Nov 13 19:44:51 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sun, 13 Nov 2011 11:44:51 -0700
Subject: [Import-SIG] Import engine PEP up on python.org
In-Reply-To:
References: <4EBF7FA6.3000007@v.loewis.de>
Message-ID:

On Nov 13, 2011 3:21 AM, "Nick Coghlan" wrote:
> On Sun, Nov 13, 2011 at 6:28 PM, "Martin v. Löwis" wrote:
> [...]
>> If the objective is to allow hooks into the import procedure, it
>> would be best to just provide the hooks. OTOH, PEP 302 already
>> defined hooks, and it seems that people are happy with these.
>
> No, they're not. Yes, the hooks are *usable*, but they're damn hard
> to comprehend. When even the *experts* hate messing with a subsystem,
> it's a hint that there's something wrong with the way it is set up.

This is the big motivator for my talk proposal at the next PyCon, on
getting the most out of Python imports. They're woefully
under-utilized IMHO, exactly because the import machinery is generally
poorly understood.

-eric

From ncoghlan at gmail.com  Mon Nov 14 02:27:11 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 14 Nov 2011 11:27:11 +1000
Subject: [Import-SIG] Import engine PEP up on python.org
In-Reply-To: <4EBFABD4.1030207@v.loewis.de>
References: <4EBF7FA6.3000007@v.loewis.de> <4EBFABD4.1030207@v.loewis.de>
Message-ID:

On Sun, Nov 13, 2011 at 9:36 PM, "Martin v. Löwis" wrote:
>> It's desirable for the same reason *any* form of object-oriented
>> encapsulation is desirable: because it makes it easier to *think*
>> about the problem and manage interdependencies between state updates.
>
> I guess I'm -1 on that PEP then. If it introduces complications just
> for the sake of some presumed simplification, it's not worth it.

I think you're right on the PEP *as it stands* - I don't think it's an
improvement on the status quo yet.
However, I think it's a useful starting point for using the tools we
have available (i.e. classes and context managers) to make further
progress in bringing the complexity under control and making the
import system as a whole less intimidating and magical.

>> I didn't realise the merits of OO designs needed to be justified - I
>> figured the list of 6 pieces of interdependent process global state
>> spoke for itself.
>
> Perhaps I'm challenging the specific choice of classes then: I would
> find it completely reasonable to move all of this into the
> interpreter state, as I think it's fine that this "global" state is
> unique to the interpreter. There is only a single __import__ builtin,
> and the objective of the import statement is to make a change to the
> "global" state (scoped with the interpreter).

Yes, I think that's a reasonable way of looking at it. I believe
there's merit in partitioning off the import state from everything
else, but the basic idea would also be served just by moving
everything to the interpreter state without adding a new class level
to the mix (in fact, as it turns out, 'sys' as a whole is effectively
part of the interpreter state, so this is already true to some
degree).

The analogy that occurred to me after reading your reply was the past
migration from the separate "last_traceback", "last_type",
"last_value" attributes in the sys module to the consolidated triple
returned by sys.exc_info().

>>> I notice that there is overlap both with multiple subinterpreters,
>>> and with restricted execution. It hints at providing both, but
>>> actually provides neither.
>>
>> It doesn't claim to provide either - its sole aim is to provide a
>> relatively lightweight mechanism to selectively adjust elements of
>> the import system (e.g. adding to sys.path when importing plugins,
>> but leaving it alone otherwise).
>
> Ok - that might be a use case. However, I'm skeptical that this PEP
> is good at achieving that objective - as you notice, there is the
> challenge of recursive imports.
>
> I would rather prefer to make such variables per-thread, or, rather,
> "per context". Something like
>
>     with sys.extended_path(directory):
>         load_plugin()
>
> This would extend the path for all code run within the context of the
> with statement, but not elsewhere. As an implementation strategy, the
> thread state would be able to override the global variables, in a
> stacked (nested) manner. The exact list of variables that can be
> overridden needs to be carefully considered - for example, I would
> still view a single sys.modules as important in that use case.

One advantage of the OO model is that it allows such decisions to be
made on a case-by-case basis - as the PEP shows, you can use
properties to control which attributes are isolated from the process
global state and which still perform global modifications.

>> But having the import state better encapsulated would make it easier
>> to improve the isolation of subinterpreters so that they aren't
>> sharing Python modules, even if they still share extension modules
>
> That already works, no? Subinterpreters don't share Python modules
> (that's about the only feature of subinterpreters that actually
> works).

You're quite right - I forgot that the subinterpreter initialisation
already takes care to ensure that the subinterpreter gets a new copy
of the sys module, and then reinitialises the import state for that
new copy.
So I guess this proposal can be seen as an intermediate level of
isolation that is accessible from Python code, without requiring
actually swapping out the interpreter state the way
Py_NewInterpreter() does.

>> No, they're not. Yes, the hooks are *usable*, but they're damn hard
>> to comprehend. When even the *experts* hate messing with a
>> subsystem, it's a hint that there's something wrong with the way it
>> is set up. In this case, I firmly believe a big part of the problem
>> is that the import system is a complex, interdependent mechanism,
>> but there's no coherence to the architecture. It's as if the whole
>> thing was written in C from an architectural point of view, but
>> without even bothering to create a dedicated structure to hold the
>> state.
>
> I agree that the import system is difficult to understand, and the
> PEP 302 hooks in particular. I mightily disagree that the cause of
> these difficulties is the global state used in the implementation.
> It's rather the order in which things are called, and how they
> interact, which makes it difficult to understand.

I don't think there's any one thing that makes it so hard to
understand - I think it's a lot of smaller things stacking on top of
each other. Off the top of my head:
- 6 pieces of interdependent global state in 'sys' and 'imp'
- distributed state in package '__path__' attributes
- lack of builtin PEP 302 support for the standard filesystem import
  mechanism (hence the undocumented emulation inside pkgutil)
- difficulty of tweaking pieces of the import algorithm while
  preserving the rest without copying a lot of code
- scattered APIs (across imp, importlib and pkgutil) for correctly
  handling import state updates and data driven imports

importlib is a big step forward on several of those fronts - if Brett
wants/needs help bootstrapping it in as the main import
implementation, then that's a far more important task than adding a
top-level object-oriented API. However, a top level OO API will still
be beneficial in at least a couple of respects:
- it becomes significantly easier to replace *elements* of the import
  mechanism, beyond the hooks provided by PEP 302. Specifically, you
  can *subclass* the engine implementation and only replace the parts
  you want to change.
- you can provide convenience functions that handle multi-stage
  updates to the import state in a consistent fashion (cf. many of the
  details in PEP 402 regarding correctly updating package paths as
  sys.path changes).

> Adding an optional keyword
> argument to some of the functions is surely no simplification.

Yeah, that's by far the weakest part of the idea so far - figuring out
how to integrate it with the existing PEP 302 mechanisms.

As an initial step, I'm now thinking we may be able to do something
based on context management and the import lock that is even simpler
than going to thread-local storage: offer a context manager as part of
the engine API that acquires the import lock, swaps out all of the
state in the sys module for the engine's own state, then reverses the
process at the end. Something like:

    IMPORT_STATE_ATTRS = ("path", "modules", "path_importer_cache",
                          "meta_path", "path_hooks")

    @contextmanager
    def import_context(self):
        imp.acquire_lock()
        try:
            orig_state = tuple(getattr(sys, attr)
                               for attr in IMPORT_STATE_ATTRS)
            new_state = tuple(getattr(self, attr)
                              for attr in IMPORT_STATE_ATTRS)
            restore_state = []
            try:
                for attr, new_value, orig_value in zip(
                        IMPORT_STATE_ATTRS, new_state, orig_state):
                    setattr(sys, attr, new_value)
                    restore_state.append((attr, orig_value))
                yield self
            finally:
                for attr, orig_value in restore_state:
                    setattr(sys, attr, orig_value)
        finally:
            imp.release_lock()
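Hypothetical usage, assuming the method above lives on an engine
object of the kind sketched after the PEP 406 announcement (the plugin
path and module name are placeholders):

    engine = ImportEngine.from_sys()        # hypothetical engine class
    engine.path.append('/path/to/plugins')  # hypothetical plugin dir
    with engine.import_context():
        plugin = __import__('some_plugin')  # resolved via engine.path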
Something like:

import imp
import sys
from contextlib import contextmanager

IMPORT_STATE_ATTRS = ("path", "modules", "path_importer_cache",
                      "meta_path", "path_hooks")

@contextmanager
def import_context(self):
    imp.acquire_lock()
    try:
        orig_state = tuple(getattr(sys, attr) for attr in IMPORT_STATE_ATTRS)
        new_state = tuple(getattr(self, attr) for attr in IMPORT_STATE_ATTRS)
        restore_state = []
        try:
            for attr, new_value, orig_value in zip(IMPORT_STATE_ATTRS,
                                                   new_state, orig_state):
                setattr(sys, attr, new_value)
                restore_state.append((attr, orig_value))
            yield self
        finally:
            for attr, orig_value in restore_state:
                setattr(sys, attr, orig_value)
    finally:
        imp.release_lock()

We would have to go through the interpreter and eliminate all of the current locations where 'sys' gets bypassed to make that work, though (e.g. most direct access to interp->modules from C code would need to be updated to go through 'sys' instead). The bar for the PEP really needs to be set at "existing importers and loaders work without modification" (so long as they're not caching sys attributes when they really shouldn't be).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Mon Nov 14 02:46:14 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 14 Nov 2011 11:46:14 +1000
Subject: [Import-SIG] Import engine PEP up on python.org
In-Reply-To: 
References: <4EBF7FA6.3000007@v.loewis.de> <4EBFABD4.1030207@v.loewis.de>
Message-ID: 

On Mon, Nov 14, 2011 at 11:27 AM, Nick Coghlan wrote:
> We would have to go through the interpreter and eliminate all of the
> current locations where 'sys' gets bypassed to make that work, though
> (e.g. most direct access to interp->modules from C code would need to
> be updated to go through 'sys' instead).

Alternatively, we could decide to skip supporting isolation of sys.modules altogether in the initial incarnation - that would also deal with the builtins and extension module problem.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From brett at python.org Mon Nov 14 17:25:17 2011
From: brett at python.org (Brett Cannon)
Date: Mon, 14 Nov 2011 11:25:17 -0500
Subject: [Import-SIG] Import engine PEP up on python.org
In-Reply-To: 
References: 
Message-ID: 

On Sun, Nov 13, 2011 at 00:12, Nick Coghlan wrote:

[SNIP]

>
> *Brett: do you have a public Hg repo for working on the importlib
> bootstrapping effort?
>

Yep. If you look at http://hg.python.org/sandbox/bcannon/ you will find a bootstrap_importlib branch. It's been about two months since I had a chance to work on it, so it needs to be merged with default. In the branch you will find a FAILING file which contains the test names of tests that are failing and a comment as to why (one I have not fully dived into, and the test_pydoc failure can be fixed once ImportError has an attribute specifying what module it couldn't import). You can pass the file to regrtest to easily test the known failures. Otherwise comments on what is left to be done can be found in Python/pythonrun.c (although the comment about zipimport is out-of-date; I fixed that in rev 72162:b4edd0d9fce6).

At this point all that is left (I think) is dealing with:

1. _io wanting to import os at module initialization time (I suspect it can just be a post-importlib call in pythonrun.c to set up)
2. exposing the APIs that are added in importlib.__init__ (case_ok from import.c and reading/writing longs from marshal; need to add a comment to pythonrun.c about this)
3. adding a build rule to freeze importlib for importation (thinking it might not be best to do it automatically, to make it easier to fix bugs using a known, good version of importlib, but that's still up in the air)

IOW nothing crazy or insurmountable. I'm still hoping to be damn close by PyCon, but who knows. I have just moved to Toronto so at least my life should be stabilizing, but I am starting on a new team so I don't know what kind of ramp-up that will entail. I have absolutely no issues with receiving help on this importlib work if people want to help (and the C code sans build stuff to get the freezing working shouldn't be nuts, so people like Eric can help if they want =).

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig

From ericsnowcurrently at gmail.com Mon Nov 14 18:51:56 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 14 Nov 2011 10:51:56 -0700
Subject: [Import-SIG] Import engine PEP up on python.org
In-Reply-To: 
References: 
Message-ID: 

On Mon, Nov 14, 2011 at 9:25 AM, Brett Cannon wrote:
> IOW nothing crazy or insurmountable. I'm still hoping to be damn close
> by PyCon, but who knows. I have just moved to Toronto so at least my
> life should be stabilizing, but I am starting on a new team so I don't
> know what kind of ramp-up that will entail. I have absolutely no
> issues with receiving help on this importlib work if people want to help
> (and the C code sans build stuff to get the freezing working shouldn't
> be nuts, so people like Eric can help if they want =).

I'd be glad to. :) I've cloned the repo[1] so I can work on it. Let me know if that's a problem. Any work I do I'll track on the existing ticket[2].

Between the PyCon program committee and the talk proposals I have in, I won't have time to work on this for a little while; but I agree that it'd be good to get this done sooner, rather than later.

-eric

[1] https://bitbucket.org/ericsnowcurrently/bcannon_sandbox
[2] http://bugs.python.org/issue2377

From brett at python.org Mon Nov 14 19:02:48 2011
From: brett at python.org (Brett Cannon)
Date: Mon, 14 Nov 2011 13:02:48 -0500
Subject: [Import-SIG] Import engine PEP up on python.org
In-Reply-To: 
References: 
Message-ID: 

On Mon, Nov 14, 2011 at 12:51, Eric Snow wrote:
> On Mon, Nov 14, 2011 at 9:25 AM, Brett Cannon wrote:
> > IOW nothing crazy or insurmountable. I'm still hoping to be damn close
> > by PyCon, but who knows. I have just moved to Toronto so at least my
> > life should be stabilizing, but I am starting on a new team so I don't
> > know what kind of ramp-up that will entail. I have absolutely no
> > issues with receiving help on this importlib work if people want to help
> > (and the C code sans build stuff to get the freezing working shouldn't
> > be nuts, so people like Eric can help if they want =).
>
> I'd be glad to. :) I've cloned the repo[1] so I can work on it. Let
> me know if that's a problem.

Not at all! Reason we moved to a DVCS was so people could do what you are doing easily.

> Any work I do I'll track on the existing
> ticket[2].

Wow, that ticket will be 4 years old come PyCon 2012. =P

> Between the PyCon program committee and the talk proposals I have in,
> I won't have time to work on this for a little while; but I agree that
> it'd be good to get this done sooner, rather than later.

No rush.
=) I have had no time to do a single review for PyCon and the PC comes first since that has a hard deadline. > > -eric > > [1] https://bitbucket.org/ericsnowcurrently/bcannon_sandbox > [2] http://bugs.python.org/issue2377 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Nov 16 07:29:52 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Nov 2011 16:29:52 +1000 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs Message-ID: One of the fixes PEP 395 (module aliasing) proposes is to make running modules inside packages by filename work correctly (i.e. without breaking relative imports and without getting the directory where the module lives directly on sys.path which can lead to unexpected name clashes). The PEP currently states [1] that this can be made to work with both PEP 382 and PEP 402 In current Python, fixing this just involves checking for a colocated __init__.py file. If we find one, then we work our way up the directory hierarchy until we find a directory without an __init__.py file, put *that* on sys.path, then (effectively) rewrite the command line as if the -m switch had been used. The extension to the current version of PEP 382 is clear - we just accept both an __init__.py file and a .pyp extension as indicating "this is part of a Python package", but otherwise the walk back up the filesystem hierarchy to decide which directory to add to sys.path remains unchanged. However, I'm no longer convinced that this concept can actually be made to work in the context of PEP 402: 1. We can't use sys.path, since we're trying to figure out which directory we want to *add* to sys.path 2. We can't use "contains a Python module", since PEP 402 allows directories inside packages that only contain subpackages (only the leaf directories are required to contain valid Python modules), so we don't know the significance of an empty directory without already knowing what is on sys.path! So, without a clear answer to the question of "from module X, inside package (or package portion) Y, find the nearest parent directory that should be placed on sys.path" in a PEP 402 based world, I'm switching to supporting PEP 382 as my preferred approach to namespace packages. In this case, I think "explicit is better than implicit" means, "given only a filesystem hierarchy, you should be able to figure out the Python package hierarchy it contains". Only explicit markers (either files or extensions) let you do that - with PEP 402, the filesystem doesn't contain enough information to figure it out, you need to also know the contents of sys.path. Regards, Nick. [1] http://www.python.org/dev/peps/pep-0395/#fixing-direct-execution-inside-packages -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ericsnowcurrently at gmail.com Wed Nov 16 09:15:03 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 16 Nov 2011 01:15:03 -0700 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Tue, Nov 15, 2011 at 11:29 PM, Nick Coghlan wrote: > One of the fixes PEP 395 (module aliasing) proposes is to make running > modules inside packages by filename work correctly (i.e. without > breaking relative imports and without getting the directory where the > module lives directly on sys.path which can lead to unexpected name > clashes). 
The PEP currently states [1] that this can be made to work
> with both PEP 382 and PEP 402
>
> In current Python, fixing this just involves checking for a colocated
> __init__.py file. If we find one, then we work our way up the
> directory hierarchy until we find a directory without an __init__.py
> file, put *that* on sys.path, then (effectively) rewrite the command
> line as if the -m switch had been used.
>
> The extension to the current version of PEP 382 is clear - we just
> accept both an __init__.py file and a .pyp extension as indicating
> "this is part of a Python package", but otherwise the walk back up the
> filesystem hierarchy to decide which directory to add to sys.path
> remains unchanged.
>
> However, I'm no longer convinced that this concept can actually be
> made to work in the context of PEP 402:
>
> 1. We can't use sys.path, since we're trying to figure out which
> directory we want to *add* to sys.path
> 2. We can't use "contains a Python module", since PEP 402 allows
> directories inside packages that only contain subpackages (only the
> leaf directories are required to contain valid Python modules), so we
> don't know the significance of an empty directory without already
> knowing what is on sys.path!
>
> So, without a clear answer to the question of "from module X, inside
> package (or package portion) Y, find the nearest parent directory that
> should be placed on sys.path" in a PEP 402 based world, I'm switching
> to supporting PEP 382 as my preferred approach to namespace packages.
> In this case, I think "explicit is better than implicit" means, "given
> only a filesystem hierarchy, you should be able to figure out the
> Python package hierarchy it contains". Only explicit markers (either
> files or extensions) let you do that - with PEP 402, the filesystem
> doesn't contain enough information to figure it out, you need to also
> know the contents of sys.path.

Ouch. What about the following options?

Indicator for the top-level package? No.
Leverage __pycache__? No.
Merge in the idea from PEP 382 of special directory names?

To borrow an example from PEP 3147:

alpha.pyp/
    one.py
    two.py
beta.py
beta.pyp/
    three.py
    four.py

So package directories are explicitly marked but PEP 402 otherwise continues as-is. I'll have to double-check, but I don't think we tried this angle already.

-eric

>
> Regards,
> Nick.
>
> [1] http://www.python.org/dev/peps/pep-0395/#fixing-direct-execution-inside-packages
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig
>

From pje at telecommunity.com Wed Nov 16 16:08:56 2011
From: pje at telecommunity.com (PJ Eby)
Date: Wed, 16 Nov 2011 10:08:56 -0500
Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs
In-Reply-To: 
References: 
Message-ID: 

On Wed, Nov 16, 2011 at 1:29 AM, Nick Coghlan wrote:
> So, without a clear answer to the question of "from module X, inside
> package (or package portion) Y, find the nearest parent directory that
> should be placed on sys.path" in a PEP 402 based world, I'm switching
> to supporting PEP 382 as my preferred approach to namespace packages.
> In this case, I think "explicit is better than implicit" means, "given
> only a filesystem hierarchy, you should be able to figure out the
> Python package hierarchy it contains".
Only explicit markers (either > files or extensions) let you do that - with PEP 402, the filesystem > doesn't contain enough information to figure it out, you need to also > know the contents of sys.path. > After spending an hour or so reading through PEP 395 and trying to grok what it's doing, I actually come to the opposite conclusion: that PEP 395 is violating the ZofP by both guessing, and not encouraging One Obvious Way of invoking scripts-as-modules. For example, if somebody adds an __init__.py to their project directory, suddenly scripts that worked before will behave differently under PEP 395, creating a strange bit of "spooky action at a distance". (And yes, people add __init__.py files to their projects in odd places -- being setuptools maintainer, you get to see a LOT of weird looking project layouts.) While I think the __qname__ idea is fine, and it'd be good to have a way to avoid aliasing main (suggestion for how included below), I think that relative imports failing from inside a main module should offer an error message suggesting you use "-m" if you're running a script that's within a package, since that's the One Obvious Way of running a script that's also a module. (Albeit not obvious unless you're Dutch. ;-) ) For the import aliasing case, AFAICT it's only about cases where __name__ == '__main__', no? Why not just save the file/importer used for __main__, and then have the import machinery check whether a module being imported is about to alias __main__? For that, you don't need to know in *advance* what the qualified name of __main__ is - you just spot it the first time somebody re-imports it. I think removing qname-quessing from PEP 395 (and replacing it with instructive/google-able error messages) would be an unqualified improvement, independent of what happens to PEPs 382 and 402. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Wed Nov 16 19:21:22 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 16 Nov 2011 11:21:22 -0700 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Wed, Nov 16, 2011 at 8:08 AM, PJ Eby wrote: > On Wed, Nov 16, 2011 at 1:29 AM, Nick Coghlan wrote: >> >> So, without a clear answer to the question of "from module X, inside >> package (or package portion) Y, find the nearest parent directory that >> should be placed on sys.path" in a PEP 402 based world, I'm switching >> to supporting PEP 382 as my preferred approach to namespace packages. >> In this case, I think "explicit is better than implicit" means, "given >> only a filesystem hierarchy, you should be able to figure out the >> Python package hierarchy it contains". Only explicit markers (either >> files or extensions) let you do that - with PEP 402, the filesystem >> doesn't contain enough information to figure it out, you need to also >> know the contents of sys.path. > > After spending an hour or so reading through PEP 395 and trying to grok what > it's doing, I actually come to the opposite conclusion: that PEP 395 is > violating the ZofP by both guessing, and not encouraging One Obvious Way of > invoking scripts-as-modules. > For example, if somebody adds an __init__.py to their project directory, > suddenly scripts that worked before will behave differently under PEP 395, > creating a strange bit of "spooky action at a distance". 
?(And yes, people > add __init__.py files to their projects in odd places -- being setuptools > maintainer, you get to see a LOT of weird looking project layouts.) > While I think the __qname__ idea is fine, and it'd be good to have a way to > avoid aliasing main (suggestion for how included below), I think that > relative imports failing from inside a main module should offer an error > message suggesting you use "-m" if you're running a script that's within a > package, since that's the One Obvious Way of running a script that's also a > module. ?(Albeit not obvious unless you're Dutch. ?;-) ) > For the import aliasing case,?AFAICT it's only about cases where __name__ == > '__main__', no? ?Why not just save the file/importer used for __main__, and > then have the import machinery check whether a module being imported is > about to alias __main__? ?For that, you don't need to know in *advance* what > the qualified name of __main__ is - you just spot it the first time somebody > re-imports it. > I think removing qname-quessing from PEP 395 (and replacing it with > instructive/google-able error messages) would be an unqualified improvement, > independent of what happens to PEPs 382 and 402. But which is more astonishing (POLA and all that): running your module in Python, it behaves differently than when you import it (especially __name__); or you add an __init__.py to a directory and your *scripts* there start to behave differently? When I was learning Python, it took quite a while before I realized that modules are imported and scripts are passed at the commandline; and to understand the whole __main__ thing. It has always been a pain, particularly when I wanted to just check a module really quickly for errors. However, lately I've actually taken to the idea that it's better to write a test script that imports the module and running that, rather than running the module itself. But that came with the understanding that the module doesn't go through the import machinery when you *run* it, which I don't think is obvious, particularly to beginners. So Nick's solution, to me, is an appropriate concession to the reality that most folks will expect Python to treat their modules like modules and their scripts like scripts. Still, this actually got me wishing there were a way to customize script-running the same way you can customize import with __import__ and import hooks. -eric > > _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > http://mail.python.org/mailman/listinfo/import-sig > > From pje at telecommunity.com Wed Nov 16 21:06:51 2011 From: pje at telecommunity.com (PJ Eby) Date: Wed, 16 Nov 2011 15:06:51 -0500 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Wed, Nov 16, 2011 at 1:21 PM, Eric Snow wrote: > But which is more astonishing (POLA and all that): running your module > in Python, it behaves differently than when you import it (especially > __name__); or you add an __init__.py to a directory and your *scripts* > there start to behave differently? > To me it seems that the latter is more astonishing because there's less connection between your action and the result. If you're running something differently, it makes more sense that it acts differently, because you've changed what you're *doing*. In the scripts case, you haven't changed how you run the scripts, and you haven't changed the scripts, so the change in behavior seems to appear out of nowhere. 
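For concreteness, the difference in question is visible with a one line module (a hypothetical demo.py):

# demo.py - prints whatever name the interpreter bound this module to
print(__name__)

Running "python demo.py" prints "__main__", while "import demo" prints "demo" - which is exactly why code guarded by "if __name__ == '__main__':" runs in the first case but not the second.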
When I was learning Python, it took quite a while before I realized > that modules are imported and scripts are passed at the commandline; > and to understand the whole __main__ thing. It doesn't seem to me that PEP 395 fixes this problem. In order to *actually* fix it, we'd need to have some sort of "package" statement like in other languages - then you'd declare right there in the code what package it's supposed to be part of. > It has always been a pain, particularly when I wanted to > just check a module really quickly for errors. > What, specifically, was a pain? That information might be of more use in determining a solution. If you mean that you had other modules importing the module that was also __main__, then I agree that having a solution for __main__-aliasing is a good idea. I just think it might be more cleanly fixed by checking whether the __file__ of a to-be-imported module is going to end up matching __main__.__file__, and if so, alias __main__ instead. > However, lately I've actually taken to the idea that it's better to > write a test script that imports the module and running that, rather > than running the module itself. But that came with the understanding > that the module doesn't go through the import machinery when you *run* > it, which I don't think is obvious, particularly to beginners. So > Nick's solution, to me, is an appropriate concession to the reality > that most folks will expect Python to treat their modules like modules > and their scripts like scripts. > You lost me there: if most people don't understand the difference, then why are they expecting a difference? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Nov 16 23:41:08 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Nov 2011 08:41:08 +1000 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 1:08 AM, PJ Eby wrote: > On Wed, Nov 16, 2011 at 1:29 AM, Nick Coghlan wrote: >> >> So, without a clear answer to the question of "from module X, inside >> package (or package portion) Y, find the nearest parent directory that >> should be placed on sys.path" in a PEP 402 based world, I'm switching >> to supporting PEP 382 as my preferred approach to namespace packages. >> In this case, I think "explicit is better than implicit" means, "given >> only a filesystem hierarchy, you should be able to figure out the >> Python package hierarchy it contains". Only explicit markers (either >> files or extensions) let you do that - with PEP 402, the filesystem >> doesn't contain enough information to figure it out, you need to also >> know the contents of sys.path. > > After spending an hour or so reading through PEP 395 and trying to grok what > it's doing, I actually come to the opposite conclusion: that PEP 395 is > violating the ZofP by both guessing, and not encouraging One Obvious Way of > invoking scripts-as-modules. > For example, if somebody adds an __init__.py to their project directory, > suddenly scripts that worked before will behave differently under PEP 395, > creating a strange bit of "spooky action at a distance". ?(And yes, people > add __init__.py files to their projects in odd places -- being setuptools > maintainer, you get to see a LOT of weird looking project layouts.) 
>
> While I think the __qname__ idea is fine, and it'd be good to have a way to
> avoid aliasing main (suggestion for how included below), I think that
> relative imports failing from inside a main module should offer an error
> message suggesting you use "-m" if you're running a script that's within a
> package, since that's the One Obvious Way of running a script that's also a
> module. (Albeit not obvious unless you're Dutch. ;-) )

The -m switch is not always an adequate replacement for direct execution, because it relies on the current working directory being set correctly (or else the module to be executed being accessible via sys.path, and there being nothing in the current directory that will shadow modules that you want to import). Direct execution will always have the advantage of allowing you more explicit control over all of sys.path[0], sys.argv[0] and __main__.__file__. The -m switch, on the other hand, will always set sys.path[0] to the empty string, which may not be what you really want.

If the package directory markers are explicit (as they are now and as they are in PEP 382), then PEP 395 isn't guessing - the mapping from the filesystem layout to the Python module namespace is completely unambiguous, since the directory added as sys.path[0] will always be the first parent directory that isn't marked as a package directory:

# Current rule
sys.path[0] = os.path.abspath(os.path.dirname(__main__.__file__))

# PEP 395 rule
path0 = os.path.abspath(os.path.dirname(__main__.__file__))
while is_package_dir(path0):
    path0 = os.path.dirname(path0)
sys.path[0] = path0

In fact, both today and under PEP 382, we could fairly easily provide a "runpy.split_path_module()" function that converts an arbitrary filesystem path to the corresponding Python module name and sys.path entry:

import os

def _splitmodname(fspath):
    path_entry, fname = os.path.split(fspath)
    modname = os.path.splitext(fname)[0]
    return path_entry, modname

# Given appropriate definitions for "is_module_or_package" and "has_init_file"...
def split_path_module(fspath):
    if not is_module_or_package(fspath):
        raise ValueError("{!r} is not recognized as a Python module".format(fspath))
    path_entry, modname = _splitmodname(fspath)
    while path_entry.endswith(".pyp") or has_init_file(path_entry):
        path_entry, pkg_name = _splitmodname(path_entry)
        modname = pkg_name + '.' + modname
    return modname, path_entry

As far as the "one obvious way" criticism goes, I think the obvious way (given PEP 395) is clear:

1. Do you have a filename? Just run it and Python will figure out where it lives in the module namespace
2. Do you have a module name? Run it with the -m switch and Python will figure out where it lives on the filesystem

runpy.run_path() corresponds directly to 1, runpy.run_module() corresponds directly to 2.

Currently, if you have a filename, just running it is sometimes the *wrong thing to do*, because it may point inside a package directory. But you have no easy way to tell if that is the case. Under PEP 402, you simply *can't* tell, as the filesystem no longer contains enough information to provide an unambiguous mapping to the Python module namespace - instead, the intended mapping depends not only on the filesystem contents, but also on the runtime configuration of sys.path.

> For the import aliasing case, AFAICT it's only about cases where __name__ ==
> '__main__', no? Why not just save the file/importer used for __main__, and
> then have the import machinery check whether a module being imported is
> about to alias __main__?
> For that, you don't need to know in *advance* what
> the qualified name of __main__ is - you just spot it the first time somebody
> re-imports it.

Oh, I like that idea - once __main__.__qname__ is available, you could just have a metapath hook along the lines of the following:

import sys

class MainImporter:

    def __init__(self):
        main = sys.modules.get("__main__", None)
        self.main_qname = getattr(main, "__qname__", None)

    def find_module(self, fullname, path=None):
        if fullname == self.main_qname:
            return self
        return None

    def load_module(self, fullname):
        return sys.modules["__main__"]

> I think removing qname-guessing from PEP 395 (and replacing it with
> instructive/google-able error messages) would be an unqualified improvement,
> independent of what happens to PEPs 382 and 402.

Even if the "just do what I mean" part of the proposal in PEP 395 is replaced by a "Did you mean?" error message, PEP 382 still offers the superior user experience, since we could use runpy.split_path_module() to state the *exact* argument to -m that should have been used. Of course, that still wouldn't get sys.path[0] set correctly, so it isn't a given that it would really help.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ericsnowcurrently at gmail.com Wed Nov 16 23:41:34 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 16 Nov 2011 15:41:34 -0700
Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs
In-Reply-To: 
References: 
Message-ID: 

On Wed, Nov 16, 2011 at 1:06 PM, PJ Eby wrote:
> On Wed, Nov 16, 2011 at 1:21 PM, Eric Snow
> wrote:
>>
>> But which is more astonishing (POLA and all that): running your module
>> in Python, it behaves differently than when you import it (especially
>> __name__); or you add an __init__.py to a directory and your *scripts*
>> there start to behave differently?
>
> To me it seems that the latter is more astonishing because there's less
> connection between your action and the result. If you're running something
> differently, it makes more sense that it acts differently, because you've
> changed what you're *doing*. In the scripts case, you haven't changed how
> you run the scripts, and you haven't changed the scripts, so the change in
> behavior seems to appear out of nowhere.

Well, then I suppose both are astonishing and, for me at least, the module-as-script side of it has bitten me more. Regardless, both are a consequence of the script vs. module situation.

>
>>
>> When I was learning Python, it took quite a while before I realized
>> that modules are imported and scripts are passed at the commandline;
>> and to understand the whole __main__ thing.
>
> It doesn't seem to me that PEP 395 fixes this problem. In order to
> *actually* fix it, we'd need to have some sort of "package" statement like
> in other languages - then you'd declare right there in the code what package
> it's supposed to be part of.

Certainly an effective indicator that a file's a module and not a script. Still, I'd rather we find a way to maintain the filesystem-based package approach we have now. It's nice not having to look in each file to figure out the package it belongs to or if it's a script or not. The consequence is that a package that's spread across multiple directories is likewise addressed through the filesystem, hence PEPs 382 and 402. However, the namespace package issue is a separate one from script-vs-module.

>
>>
>> It has always been a pain, particularly when I wanted to
>> just check a module really quickly for errors.
> > What, specifically, was a pain? ?That information might be of more use in > determining a solution. > > If you mean that you had other modules importing the module that was also > __main__, then I agree that having a solution for __main__-aliasing is a > good idea. PEP 395 spells out several pretty well. Additionally, running a module as a script can cause trouble if your module otherwise relies on the value of __name__. Finally, sometimes I rely on a module triggering an import hook, though that is likely a problem just for me. > ?I just think it might be more cleanly fixed by checking whether > the __file__ of a to-be-imported module is going to end up matching > __main__.__file__, and if so, alias __main__ instead. Currently the only promise regarding __file__ is that it will be set on module object once the module has been loaded but before the implicit binding for the import statement. So, unless I'm mistaken, that would have to change to allow for import hooks. Otherwise, sure. > >> >> However, lately I've actually taken to the idea that it's better to >> write a test script that imports the module and running that, rather >> than running the module itself. ?But that came with the understanding >> that the module doesn't go through the import machinery when you *run* >> it, which I don't think is obvious, particularly to beginners. ?So >> Nick's solution, to me, is an appropriate concession to the reality >> that most folks will expect Python to treat their modules like modules >> and their scripts like scripts. > > You lost me there: if most people don't understand the difference, then why > are they expecting a difference? > Yeah, that wasn't clear. :) When someone learns Python, they probably are not going to recognize the difference between running their module and importing it. They'll expect their module to work identically if run as a script or imported. They won't even think about the distinction. Or maybe I'm really out of touch (quite possible :). It'll finally bite them when they implicitly or explicitly rely on the module state set by the import machinery (__name__, __file__, etc.), or on customization of that machinery (a la import hooks). Educating developers on the distinction between scripts and modules is good, but it seems like PEP 395 is trying to bring the behavior more in line with the intuitive behavior, which sounds good to me. Regarding the PEP 402 conflict, if using .pyp on directory names addresses Nick's concern, would you be opposed to that solution? -eric p.s. where should I bring up general discussion on PEP 395? From ncoghlan at gmail.com Wed Nov 16 23:44:07 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Nov 2011 08:44:07 +1000 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 8:41 AM, Eric Snow wrote: > p.s. where should I bring up general discussion on PEP 395? import-sig for now - it needs more thought before I take it back to python-dev. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From pje at telecommunity.com Thu Nov 17 01:10:06 2011 From: pje at telecommunity.com (PJ Eby) Date: Wed, 16 Nov 2011 19:10:06 -0500 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Wed, Nov 16, 2011 at 5:41 PM, Nick Coghlan wrote: > If the package directory markers are explicit (as they are now and as > they are in PEP 382), then PEP 395 isn't guessing - the mapping from > the filesystem layout to the Python module namespace is completely > unambiguous, since the directory added as sys.path[0] will always be > the first parent directory that isn't marked as a package directory: > Sorry, but that's *still guessing*. Random extraneous __init__.py and subdirectories on sys.path can screw you over. For example, if I have a stray __init__.py in site-packages, does that mean that every module there is a submodule of a package called 'site-packages'? Sure, you could fix that problem by ignoring names with a '-', but that's just an illustration. The __init__.py idea was a very good attempt at solving the problem, but even in today's Python, it's still ambiguous and we should refuse to guess. (Because it will result in weird behavior that's *much* harder to debug.) Import aliasing detection and relative import errors, on the other hand, don't rely on guessing. Even if the "just do what I mean" part of the proposal in PEP 395 is > replaced by a "Did you mean?" error message, PEP 382 still offers the > superior user experience, since we could use runpy.split_path_module() > to state the *exact* argument to -m that should have been used. No, what you get is just a *guess* as to the correct directory. (And you can make similar guesses under PEP 402, if a parent directory of the script is already on sys.path.) > Of > course, that still wouldn't get sys.path[0] set correctly, so it isn't > a given that it would really help. > Right; and if you already *have* a correct sys.path, then you can make just as good a guess under PEP 402. Don't get me wrong - I'm all in favor of further confusion-reduction (which is what PEP 402's about, after all). I'm just concerned that PEP 395 isn't really clear about the tradeoffs, in the same way that PEP 382 was unclear back when I started doing all those proposed revisions leading up to PEP 402. That is, like early PEP 382, ISTM that it's an initial implementation attempt to solve a problem by patching over it, rather than an attempt to think through "how things are" and "how they ought to be". I think some of that sort of thinking ought to be done, to see if perhaps there's a better tradeoff to be had. For one thing, I wonder about the whole scripts-as-modules thing. In other scripting languages AFAICT it's not very common to have a script as a module; there's a pretty clear delineation between the two, because Python's about the only language with the name==main paradigm. In languages that have some sort of "main" paradigm, it's usually a specially named function or class method (Java) or whatever. So, I'm wondering a bit about the detailed use cases people have about using modules as scripts and vice versa. Are they writing scripts, then turning them into modules? Trying to run somebody else's modules? Copying example code from somewhere? (The part that confuses me is, if you *know* there's a difference between a script and a module, then presumably you either know about __name__, OR you wouldn't have any reason to run your module as a script. 
Conversely, if you don't know about __name__, then how would you conceive of making your script into a module? ISTM that in order to even have this problem you have to at least be knowledgeable enough to realize there's *some* difference between moduleness and scriptness.) Anyway, understanding the *details* of this process (of how people end up making the sort of errors PEP 395 aims to address) seems important to me for pinning down precisely what problem to solve and how. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Nov 17 02:47:46 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Nov 2011 11:47:46 +1000 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 10:10 AM, PJ Eby wrote: > So, I'm wondering a bit about the detailed use cases people have about using > modules as scripts and vice versa. ?Are they writing scripts, then turning > them into modules? ?Trying to run somebody else's modules? ?Copying example > code from somewhere? > (The part that confuses me is, if you *know* there's a difference between a > script and a module, then presumably you either know about __name__, OR you > wouldn't have any reason to run your module as a script. ?Conversely, if you > don't know about __name__, then how would you conceive of making your script > into a module? ?ISTM that in order to even have this problem you have to at > least be knowledgeable enough to realize there's *some* difference between > moduleness and scriptness.) > Anyway, understanding the *details* of this process (of how people end up > making the sort of errors PEP 395 aims to address) seems important to me for > pinning down precisely what problem to solve and how. The module->script process comes from wanting to expose useful command line functionality from a Python module in a cross-platform way without any additional packaging effort (as exposing system-native scripts is a decidedly *non* trivial task, and also doesn't work from a source checkout). The genesis was actually the timeit module - "python -m timeit" is now the easiest way to run short benchmarking snippets. A variety of other standard library modules also offer useful "-m" functionality - "-m site" will dump diagnostic info regarding your path setup, "-m smptd" will run up a local SMTP server, "-m unittest" and "-m doctest" can be used to run tests, "-m pdb" can be used to invoke the debugger, "-m pydoc" will run pydoc as usual. (A more comprehensive list is below, but it's also worth caveating this list with Raymond's comments on http://bugs.python.org/issue11260) Third party wise, I've mostly seen "-m" support used for "scripts that run scripts" - tools like pychecker, coverage and so forth are naturally Python version specific, and running them via -m rather than directly automatically deals with those scoping issues. It's also fairly common for test definition modules to support execution via "-m" (by invoking unittest.main() from an "if __name__" guarded suite). Cheers, Nick. 
====================
Top level stdlib modules with meaningful "if __name__ == '__main__':" blocks:

base64.py - CLI for base64 encoding/decoding
calendar.py - CLI to display text calendars
cgi.py - displays some example CGI output
code.py - code-based interactive interpreter
compileall.py - CLI for bytecode file generation
cProfile.py - profile a script with cProfile
dis.py - CLI for file disassembly
doctest.py - CLI for doctest execution
filecmp.py - CLI to compare directory contents
fileinput.py - line numbered file display
formatter.py - reformats text and prints to stdout
ftplib.py - very basic CLI for FTP
gzip.py - basic CLI for creation of gzip files
imaplib.py - basic IMAP client (localhost only)
imghdr.py - scan a directory looking for valid image headers
mailcap.py - display system mailcap config info
mimetypes.py - CLI for querying mimetypes (but appears broken)
modulefinder.py - dump list of all modules referenced (directly or indirectly) from a Python file
netrc.py - dump netrc config (I think)
nntplib.py - basic CLI for nntp
pdb.py - debug a script
pickle.py - dumps the content of a pickle file
pickletools.py - prettier dump of pickle file contents
platform.py - display platform info (e.g. Linux-3.1.1-1.fc16.x86_64-x86_64-with-fedora-16-Verne)
profile.py - profile a script with profile
pstats.py - CLI to browse profile stats
pydoc.py - same as the installed pydoc script
quopri.py - CLI for quoted printable encoding/decoding
runpy.py - Essentially an indirect way to do what -m itself already does
shlex.py - runs the lexer over the specified file
site.py - dumps path config information
smtpd.py - local SMTP server
sndhdr.py - scan a directory looking for valid audio headers
sysconfig.py - dumps system configuration details
tabnanny.py - CLI to scan files
telnetlib.py - very basic telnet CLI
timeit.py - CLI to time snippets of code
tokenize.py - CLI to tokenize files
turtle.py - runs turtle demo (appears to be broken in trunk, though)
uu.py - CLI for UUencode encoding/decoding
webbrowser.py - CLI to launch a web browser
zipfile.py - basic CLI for zipfile creation and inspection

Not sure (no help text, no clear purpose without looking at the code):

aifc.py - dump info about AIFF files?
codecs.py
decimal.py
difflib.py
getopt.py - manual sanity check?
heapq.py
inspect.py
keyword.py - only valid in source checkout
macurl2path.py - manual sanity check?
poplib.py - simple POP3 client?
pprint.py
pyclbr.py - dump classes defined in files?
py_compile.py
random.py - manual sanity check?
smtplib.py
sre_constants.py - broken on Py3k!
symbol.py - only valid in source checkout, broken on Py3k
symtable.py - manual sanity check?
textwrap.py - manual sanity check?
token.py - only valid in source checkout

--
Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From ncoghlan at gmail.com Thu Nov 17 02:52:32 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Nov 2011 11:52:32 +1000 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 10:10 AM, PJ Eby wrote: > On Wed, Nov 16, 2011 at 5:41 PM, Nick Coghlan wrote: >> >> If the package directory markers are explicit (as they are now and as >> they are in PEP 382), then PEP 395 isn't guessing - the mapping from >> the filesystem layout to the Python module namespace is completely >> unambiguous, since the directory added as sys.path[0] will always be >> the first parent directory that isn't marked as a package directory: > > Sorry, but that's *still guessing*. ?Random extraneous __init__.py and > subdirectories on sys.path can screw you over. ?For example, if I have a > stray __init__.py in site-packages, does that mean that every module there > is a submodule of a package called 'site-packages'? Yes. (although in that case, you'd error out, since the package name isn't valid). Errors should never pass silently - ignoring such a screw-up in their filesystem layout is letting an error pass silently and will most likely cause obscure problems further down the road. > Sure, you could fix that problem by ignoring names with a '-', but that's > just an illustration. ?The __init__.py idea was a very good attempt at > solving the problem, but even in today's Python, it's still ambiguous and we > should refuse to guess. ?(Because it will result in weird behavior that's > *much* harder to debug.) > Import aliasing detection and relative import errors, on the other hand, > don't rely on guessing. Umm, if people screw up their filesystem layouts and *lie* to the interpreter about whether or not something is a package, how is that our fault? "Oh, they told me something, but they might not mean it, so I'll choose to ignore the information they've given me" is the part that sounds like guessing to me. If we error *immediately*, telling them what's wrong with their filesystem, that's the *opposite* of guessing. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From pje at telecommunity.com Thu Nov 17 03:00:20 2011 From: pje at telecommunity.com (PJ Eby) Date: Wed, 16 Nov 2011 21:00:20 -0500 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Wed, Nov 16, 2011 at 8:47 PM, Nick Coghlan wrote: > On Thu, Nov 17, 2011 at 10:10 AM, PJ Eby wrote: > > So, I'm wondering a bit about the detailed use cases people have about > using > > modules as scripts and vice versa. Are they writing scripts, then > turning > > them into modules? Trying to run somebody else's modules? Copying > example > > code from somewhere? > > (The part that confuses me is, if you *know* there's a difference > between a > > script and a module, then presumably you either know about __name__, OR > you > > wouldn't have any reason to run your module as a script. Conversely, if > you > > don't know about __name__, then how would you conceive of making your > script > > into a module? ISTM that in order to even have this problem you have to > at > > least be knowledgeable enough to realize there's *some* difference > between > > moduleness and scriptness.) 
> > Anyway, understanding the *details* of this process (of how people end up > > making the sort of errors PEP 395 aims to address) seems important to me > for > > pinning down precisely what problem to solve and how. > > The module->script process comes from wanting to expose useful command > line functionality from a Python module in a cross-platform way > without any additional packaging effort (as exposing system-native > scripts is a decidedly *non* trivial task, and also doesn't work from > a source checkout). > No, I mean how do the people who PEP 395 is supposed to be helping, find out that they even want to run a script as a module? Or are you saying that the central use case the PEP is aimed at is running stdlib modules? ;-) > It's also fairly common for test definition modules to support > execution via "-m" (by invoking unittest.main() from an "if __name__" > guarded suite). > Right... so are these modules not *documented* as being run by -m? Are people running them as scripts by mistake? I'm still not seeing how people end up making their own scripts into modules or vice versa, *without* some explicit documentation about the process. I mean, how do you even know that a file can be both, without realizing that there's a difference between the two? The most common confusion I've seen among newbies is the ones who don't grok that module != file. That is, they don't understand why you replace directory separators with '.' (which is how they think of it) or they want to use exec/runfile instead of import, or they expect import to run the code, or similar confusions of "file" and "module". However, I don't grok how people with *that* confusion would end up writing code that has a problem when run as a combination script/module, because they already think scripts and modules are the same thing and are rather unlikely to create a package in the first place. So who *is* PEP 395's target audience, and what is their mental model? That's the question I'd like to come to grips with before proposing a full solution. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Nov 17 04:50:34 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Nov 2011 13:50:34 +1000 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Thu, Nov 17, 2011 at 12:00 PM, PJ Eby wrote: > So who *is* PEP 395's target audience, and what is their mental model? > ?That's the question I'd like to come to grips with before proposing a full > solution. OK, I realised that the problem I want to solve with this part of the PEP isn't limited to direct execution of scripts - It's a general problem with figuring out an appropriate value for sys.path[0] that also affects the interactive interpreter and the -m switch. The "mission statement" for this part of PEP 395 is then clearly stated as: the Python interpreter should *never* automatically place a Python package directory on sys.path. Adding package directories to sys.path creates undesirable aliasing that may lead to multiple imports of the same module under different names, unexpected shadowing of standard library (and other) modules and packages, and frequently confusing errors where a module works when imported but not when executed directly and vice-versa. Letting the import system get into that state without even a warning is letting an error pass silently and we shouldn't do it. 
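To make the aliasing concrete, here's a quick sketch (with a purely hypothetical layout - a "project" directory containing a "pkg" package with a "helper" submodule - and with *both* directories ending up on sys.path):

import sys

# Hypothetical layout:
#   project/
#       pkg/
#           __init__.py
#           helper.py
sys.path.insert(0, "project/pkg")  # the package directory itself
sys.path.insert(0, "project")      # the directory *containing* the package

import helper      # found via "project/pkg", loaded as a top level module
import pkg.helper  # the same file, loaded *again* as a submodule of pkg

print(helper is pkg.helper)  # False - two independent copies of one file

Everything appears to work, so nothing complains, but module level state is now silently duplicated and isinstance() checks can fail depending on which copy a given piece of code imported.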
However, it's also true that, in many cases, this slight error in the import state is actually harmless, so *always* failing in this situation would be an unacceptable breach of backwards compatibility. While we could issue a warning and demand that the user fix it themselves (by invoking Python differently), there's no succinct way to explain what has gone wrong - it depends on a fairly detailed understanding of how the import system gets initialised. And, as noted, there isn't actually an easy mechanism for users to currently fix it themselves in the general case - using the -m switch also means you have to get the current working directory right, losing out on one of the main benefits of direct execution. And such a warning is assuredly useless if you actually ran the script by double-clicking it in a file browser...

Accordingly, PEP 395 proposes that, when such a situation is encountered, Python should just use the nearest containing *non*-package directory as sys.path[0] rather than naively blundering ahead and corrupting the import system state, regardless of how the proposed value for sys.path[0] was determined (i.e. the current working directory or the location of a specific Python file). Any module that currently worked correctly in this situation should continue to work, and many others that previously failed (because they were inside packages) will start to work. The only new failures will be early detection of invalid filesystem layouts, such as "__init__.py" files in directories that are not valid Python package names, and scripts stored inside package directories that *only* work as scripts (effectively relying on the implicit relative imports that occur due to __name__ being set to "__main__").

This problem most often arises during development (*not* after deployment), when developers either start python to perform some experiments, or place quick tests or sanity checks in "if __name__ == '__main__':" blocks at the end of their modules (this is a common practice, described in a number of Python tutorials. Our own docs also recommend this practice for test modules: http://docs.python.org/library/unittest#basic-example). The classic example from Stack Overflow looked like this:

project/
    package/
        __init__.py
        foo.py
        tests/
            __init__.py
            test_foo.py

Currently, the *only* correct way to invoke test_foo is with "project" as the current working directory and the command "python -m package.tests.test_foo". Anything else (such as "python package/tests/test_foo.py", "./package/tests/test_foo.py", clicking the file in a file browser or, while in the tests directory, invoking "python test_foo.py", "./test_foo.py" or "python -m test_foo") will still *try* to run test_foo, but fail in a completely confusing manner. If test_foo uses absolute imports, then the error will generally be "ImportError: No module named package"; if it uses explicit relative imports, then the error will be "ValueError: Attempted relative import in non-package". Neither of these is going to make any sense to a novice Python developer, but there isn't any obvious way to make those messages self-explanatory (they're completely accurate, they just involve a lot of assumed knowledge regarding how the import system works and sys.path gets initialised).

If foo.py is set up to invoke its own test suite:

if __name__ == "__main__":
    import unittest
    from .tests import test_foo
    unittest.main(test_foo.__name__)

Then you can get similarly confusing errors when attempting to run foo itself.
However, those errors are comparatively obvious compared to the AttributeErrors (and ImportErrors) that can arise if you get unexpected name shadowing. For example, suppose you have a helper module called "package.json" for dealing with JSON serialisation in your library, and you start an interactive session while in the package directory, or attempting to invoke 'foo.py' directly in order to run its test suite (as described above). Now "import json" is giving you the version from your package, even though that version is *supposed* to be safely hidden away inside your package namespace. By silently allowing a package directory onto sys.path, we're doing our users a grave disservice. So my perspective is this: we're currently doing something by default that's almost guaranteed to be the wrong thing to do. There's a reasonably simple alternative that's almost always the *right* thing to do. So let's switch the default behaviour to get the common case right, and leave the confusing errors for the situations where something is actually *broken* (i.e. misplaced __init__.py files and scripts in package directories that are relying on implicit relative imports). And if that means requiring that package directories always be marked explicitly (either by an __init__.py file or by a ".pyp" extension) and forever abandoning the concepts in PEP 402, so be it. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From pje at telecommunity.com Thu Nov 17 06:48:52 2011 From: pje at telecommunity.com (PJ Eby) Date: Thu, 17 Nov 2011 00:48:52 -0500 Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs In-Reply-To: References: Message-ID: On Wed, Nov 16, 2011 at 8:52 PM, Nick Coghlan wrote: > Umm, if people screw up their filesystem layouts and *lie* to the > interpreter about whether or not something is a package, how is that > our fault? "Oh, they told me something, but they might not mean it, so > I'll choose to ignore the information they've given me" is the part > that sounds like guessing to me. > Er, what? They're not lying, they just made a mistake -- a mistake that could've occurred at any point during a project's development, which would then only surface later. As I said, I've seen projects where people had unnecessary __init__.py files floating around -- mainly because at some point they were trying any and everything to get package imports to work correctly, and somewhere along the line decided to just put __init__.py files everywhere just to be "sure" that things would work. (i.e. the sort of behavior PEP 402 is supposed to make unnecessary.) If we error *immediately*, telling them what's wrong with their > filesystem, that's the *opposite* of guessing. > I'm all in favor of warning or erroring out on aliasing __main__ or relative imports from __main__. It's silently *succeeding* in doing something that might not have been intended on the basis of coincidental __init__.py placement that I have an issue with. There exist projects that *intentionally* alias their modules as both a package and non-package (*cough* PIL *cough*), to name just *one* kind of *intentionally* weird sys.path setups, not counting unintentional ones like I mentioned. The simple fact is that you cannot unambiguously determine the intended meaning of a given script, and you certainly can't do it *before* the script executes (because it might already be doing some sys.path munging of its own. 
Saying that people who made one kind of mistake or intentional change are lying, while a different set of people making mistakes deserves to have their mistake silently corrected, doesn't seem to make much sense to me. But even if I granted that people with extra __init__.py's floating around should be punished for this (and I don't), this *still* wouldn't magically remove the existing ambiguity-of-intention in today's Python projects.

Without some way for people to explicitly declare their intention (e.g. explicitly setting __qname__), you really have no way to definitively establish what the user's *intention* is. (Especially since the user who wrote the code and the user trying to use it might be different people... and sys.path might've been set up by yet another party.)

IOW, it's ambiguous already, today, with or without 382, 402, or any other new PEP. (Heck, it was ambiguous before PEP 302 came around!)

From ncoghlan at gmail.com Thu Nov 17 08:00:00 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 17 Nov 2011 17:00:00 +1000
Subject: [Import-SIG] PEP 395 (Module aliasing) and the namespace PEPs
In-Reply-To:
References:
Message-ID:

On Thu, Nov 17, 2011 at 3:48 PM, PJ Eby wrote:
> I'm all in favor of warning or erroring out on aliasing __main__ or relative imports from __main__. It's silently *succeeding* in doing something that might not have been intended on the basis of coincidental __init__.py placement that I have an issue with.

This is the part I don't get - you say potentially unintentional success is bad, but you're OK with silently succeeding by *ignoring* the presence of an __init__.py file and hence performing implicit relative imports, exactly the behaviour that PEP 328 set out to eliminate.

Currently, by default, a *correct* package layout breaks under direct execution. I am proposing that we make it work by preventing implicit relative imports from __main__, just as we do from any other module. As a consequence, scripts that already support direct execution from inside a package would need to be updated to use explicit relative imports in Python 3.3+, since their implicit relative imports will break, just as they already do when you attempt to import such a module.

I'm happy to fix things for novices and put the burden of a workaround on the people that know what they're doing. The workaround:

    if __name__ == "__main__" and sys.version_info < (3, 3):
        import peer  # Implicit relative import
    else:
        from . import peer  # Explicit relative import

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sat Nov 19 13:59:24 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Nov 2011 22:59:24 +1000
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
Message-ID:

The updated version is included below and has also been updated on python.org if you prefer a nicely formatted version: http://www.python.org/dev/peps/pep-0395/

The recent discussion regarding imports from main really crystallised for me what I think is currently wrong with imports from main modules - I was cheering when the Django folks updated their default site template to avoid putting a package directory on sys.path (due to all the problems it causes), but that thread made me realise how easy we make it for beginners to do that by accident, with no real payoff of any kind to justify it.
So the PEP now spends a lot of time talking about the fact that our current system for initialising sys.path[0] is almost always just plain wrong as soon as packages are involved, but the explicit markers on package directories make it possible for us to do the right thing instead of being dumb about it.

Cheers,
Nick.

-----------------------------------------------

PEP: 395
Title: Qualified Names for Modules
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-Mar-2011
Python-Version: 3.3
Post-History: 5-Mar-2011, 19-Nov-2011

Abstract
========

This PEP proposes new mechanisms that eliminate some longstanding traps for the unwary when dealing with Python's import system, as well as serialisation and introspection of functions and classes. It builds on the "Qualified Name" concept defined in PEP 3155.

Relationship with Other PEPs
----------------------------

This PEP builds on the "qualified name" concept introduced by PEP 3155, and also shares in that PEP's aim of fixing some ugly corner cases when dealing with serialisation of arbitrary functions and classes.

It also builds on PEP 366, which took initial tentative steps towards making explicit relative imports from the main module work correctly in at least *some* circumstances.

This PEP is also affected by the two competing "namespace package" PEPs (PEP 382 and PEP 402). This PEP would require some minor adjustments to accommodate PEP 382, but has some critical incompatibilities with respect to the implicit namespace package mechanism proposed in PEP 402.

Finally, PEP 328 eliminated implicit relative imports from imported modules. This PEP proposes that implicit relative imports from main modules also be eliminated.

What's in a ``__name__``?
=========================

Over time, a module's ``__name__`` attribute has come to be used to handle a number of different tasks. The key use cases identified for this module attribute are:

1. Flagging the main module in a program, using the ``if __name__ == "__main__":`` convention.
2. As the starting point for relative imports
3. To identify the location of function and class definitions within the running application
4. To identify the location of classes for serialisation into pickle objects which may be shared with other interpreter instances

Traps for the Unwary
====================

The overloading of the semantics of ``__name__`` has resulted in several traps for the unwary. These traps can be quite annoying in practice, as they are highly unobvious and can cause quite confusing behaviour. A lot of the time, you won't even notice them, which just makes them all the more surprising when they do come up.

Why are my imports broken?
--------------------------

There's a general principle that applies when modifying ``sys.path``: *never* put a package directory directly on ``sys.path``. The reason this is problematic is that every module in that directory is now potentially accessible under two different names: as a top level module (since the package directory is on ``sys.path``) and as a submodule of the package (if the higher level directory containing the package itself is also on ``sys.path``).

As an example, Django (up to and including version 1.3) is guilty of setting up exactly this situation for site-specific applications - the application ends up being accessible as both ``app`` and ``site.app`` in the module namespace, and these are actually two *different* copies of the module.
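For instance (a sketch, with ``mysite`` and ``app`` as stand-ins for a Django-style project and application, and with both ``mysite/`` and its parent directory on ``sys.path``)::

    >>> import app
    >>> import mysite.app
    >>> app is mysite.app
    False
    >>> app.counter = 1                 # state set on one copy...
    >>> hasattr(mysite.app, "counter")  # ...is invisible on the other
    False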
This is a recipe for confusion if there is any meaningful mutable module level state, so this behaviour is being eliminated from the default site set up in version 1.4 (site-specific apps will always be fully qualified with the site name).

However, it's hard to blame Django for this, when the same part of Python responsible for setting ``__name__ = "__main__"`` in the main module commits the exact same error when determining the value for ``sys.path[0]``.

The impact of this can be seen relatively frequently if you follow the "python" and "import" tags on Stack Overflow. When I had the time to follow it myself, I regularly encountered people struggling to understand the behaviour of straightforward package layouts like the following::

    project/
        setup.py
        package/
            __init__.py
            foo.py
            tests/
                __init__.py
                test_foo.py

I would actually often see it without the ``__init__.py`` files first, but that's a trivial fix to explain. What's hard to explain is that all of the following ways to invoke ``test_foo.py`` *probably won't work* due to broken imports (either failing to find ``package`` for absolute imports, complaining about relative imports in a non-package for explicit relative imports, or issuing even more obscure errors if some other submodule happens to shadow the name of a top-level module, such as a ``package.json`` module that handled serialisation or a ``package.tests.unittest`` test runner)::

    # working directory: project/package/tests
    ./test_foo.py
    python test_foo.py
    python -m test_foo
    python -c "from test_foo import main; main()"

    # working directory: project/package
    tests/test_foo.py
    python tests/test_foo.py
    python -m tests.test_foo
    python -c "from tests.test_foo import main; main()"

    # working directory: project
    package/tests/test_foo.py
    python package/tests/test_foo.py

    # working directory: project/..
    project/package/tests/test_foo.py
    python project/package/tests/test_foo.py
    # The -m and -c approaches don't work from here either, but the failure
    # to find 'package' correctly is pretty easy to explain in this case

That's right, that long list shows all the methods of invocation that will almost certainly *break* if you try them, and the error messages won't make any sense if you're not already intimately familiar not only with the way Python's import system works, but also with how it gets initialised.

For a long time, the only way to get ``sys.path`` right with that kind of setup was to either set it manually in ``test_foo.py`` itself (hardly something a novice, or even many veteran, Python programmers are going to know how to do) or else to make sure to import the module instead of executing it directly::

    # working directory: project
    python -c "from package.tests.test_foo import main; main()"

Since the implementation of PEP 366 (which defined a mechanism that allows relative imports to work correctly when a module inside a package is executed via the ``-m`` switch), the following also works properly::

    # working directory: project
    python -m package.tests.test_foo

The fact that most methods of invoking Python code from the command line break when that code is inside a package, and that the two that do work are highly sensitive to the current working directory, is all thoroughly confusing for a beginner. I personally believe it is one of the key factors leading to the perception that Python packages are complicated and hard to get right.
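As an aside, the "set it manually" fix mentioned above typically means boilerplate like the following at the top of ``test_foo.py`` (a sketch, and hardly beginner material)::

    import os.path
    import sys

    # Climb from .../project/package/tests/test_foo.py up to .../project
    _project_dir = os.path.dirname(os.path.dirname(os.path.dirname(
        os.path.abspath(__file__))))
    sys.path.insert(0, _project_dir)

    from package import foo  # absolute imports now resolve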
This problem isn't even limited to the command line - if ``test_foo.py`` is open in Idle and you attempt to run it by pressing F5, then it will fail in just the same way it would if run directly from the command line.

There's a reason the general ``sys.path`` guideline mentioned above exists, and the fact that the interpreter itself doesn't follow it when determining ``sys.path[0]`` is the root cause of all sorts of grief.

Importing the main module twice
-------------------------------

Another venerable trap is the issue of importing ``__main__`` twice. This occurs when the main module is also imported under its real name, effectively creating two instances of the same module under different names. If the state stored in ``__main__`` is significant to the correct operation of the program, or if there is top-level code in the main module that has non-idempotent side effects, then this duplication can cause obscure and surprising errors.

In a bit of a pickle
--------------------

Something many users may not realise is that the ``pickle`` module sometimes relies on the ``__module__`` attribute when serialising instances of arbitrary classes. So instances of classes defined in ``__main__`` are pickled that way, and won't be unpickled correctly by another Python instance that only imported that module instead of running it directly. This behaviour is the underlying reason for the advice from many Python veterans to do as little as possible in the ``__main__`` module in any application that involves any form of object serialisation and persistence.

Similarly, when creating a pseudo-module (see next paragraph), pickles rely on the name of the module where a class is actually defined, rather than the officially documented location for that class in the module hierarchy.

For the purposes of this PEP, a "pseudo-module" is a package designed like the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These packages are documented as if they were single modules, but are in fact internally implemented as a package. This is *supposed* to be an implementation detail that users and other implementations don't need to worry about, but, thanks to ``pickle`` (and serialisation in general), the details are often exposed and can effectively become part of the public API.

While this PEP focuses specifically on ``pickle`` as the principal serialisation scheme in the standard library, this issue may also affect other mechanisms that support serialisation of arbitrary class instances and rely on ``__module__`` attributes to determine how to handle deserialisation.

Where's the source?
-------------------

Some sophisticated users of the pseudo-module technique described above recognise the problem with implementation details leaking out via the ``pickle`` module, and choose to address it by altering ``__name__`` to refer to the public location for the module before defining any functions or classes (or else by modifying the ``__module__`` attributes of those objects after they have been defined).

This approach is effective at eliminating the leakage of information via pickling, but comes at the cost of breaking introspection for functions and classes (as their ``__module__`` attribute now points to the wrong place).
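As an illustration, that workaround looks something like this (a sketch; the module names are hypothetical)::

    # mypkg/_impl.py - pseudo-module trick, pre-PEP-395 style
    __name__ = "mypkg"  # adjust the apparent location before any definitions

    class Thing:
        pass

    # Thing.__module__ is now "mypkg", so pickles no longer leak the
    # "_impl" detail - but inspect.getsource(Thing) and similar tools
    # will now go looking in mypkg/__init__.py instead of mypkg/_impl.py.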
Forkless Windows
----------------

To get around the lack of ``os.fork`` on Windows, the ``multiprocessing`` module attempts to re-execute Python with the same main module, but skipping over any code guarded by ``if __name__ == "__main__":`` checks. It does the best it can with the information it has, but is forced to make assumptions that simply aren't valid whenever the main module isn't an ordinary directly executed script or top-level module. Packages and non-top-level modules executed via the ``-m`` switch, as well as directly executed zipfiles or directories, are likely to make multiprocessing on Windows do the wrong thing (either quietly or noisily, depending on application details) when spawning a new process.

While this issue currently only affects Windows directly, it also impacts any proposals to provide Windows-style "clean process" invocation via the multiprocessing module on other platforms.

Qualified Names for Modules
===========================

To make it feasible to fix these problems once and for all, it is proposed to add a new module level attribute: ``__qualname__``. This abbreviation of "qualified name" is taken from PEP 3155, where it is used to store the naming path to a nested class or function definition relative to the top level module.

For modules, ``__qualname__`` will normally be the same as ``__name__``, just as it is for top-level functions and classes in PEP 3155. However, it will differ in some situations so that the above problems can be addressed. Specifically, whenever ``__name__`` is modified for some other purpose (such as to denote the main module), then ``__qualname__`` will remain unchanged, allowing code that needs it to access the original unmodified value.

If a module loader does not initialise ``__qualname__`` itself, then the import system will add it automatically (setting it to the same value as ``__name__``).

Eliminating the Traps
=====================

The following changes are interrelated and make the most sense when considered together. They collectively either completely eliminate the traps for the unwary noted above, or else provide straightforward mechanisms for dealing with them.

A rough draft of some of the concepts presented here was first posted on the python-ideas list [1]_, but they have evolved considerably since first being discussed in that thread. Further discussion has subsequently taken place on the import-sig mailing list [2]_.

Fixing main module imports inside packages
------------------------------------------

To eliminate this trap, it is proposed that an additional filesystem check be performed when determining a suitable value for ``sys.path[0]``. This check will look for Python's explicit package directory markers and use them to find the appropriate directory to add to ``sys.path``.

The current algorithm for setting ``sys.path[0]`` in relevant cases is roughly as follows::

    # Interactive prompt, -m switch, -c switch
    sys.path.insert(0, '')

::
    # Valid sys.path entry execution (i.e. directory and zip execution)
    sys.path.insert(0, sys.argv[0])

::

    # Direct script execution
    sys.path.insert(0, os.path.dirname(sys.argv[0]))

It is proposed that this initialisation process be modified to take package details stored on the filesystem into account::

    # Interactive prompt, -c switch
    in_package, path_entry, modname = split_path_module(os.getcwd(), '')
    if in_package:
        sys.path.insert(0, path_entry)
    else:
        sys.path.insert(0, '')
    # Start interactive prompt or run -c command as usual
    # __main__.__qualname__ is set to "__main__"

::

    # -m switch
    modname = <argument to the -m switch>
    in_package, path_entry, modname = split_path_module(os.getcwd(), modname)
    if in_package:
        sys.path.insert(0, path_entry)
    else:
        sys.path.insert(0, '')
    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
    # __main__.__qualname__ is set to modname

::

    # Valid sys.path entry execution (i.e. directory and zip execution)
    modname = "__main__"
    in_package, path_entry, modname = split_path_module(sys.argv[0], modname)
    sys.path.insert(0, path_entry)
    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
    # __main__.__qualname__ is set to modname

::

    # Direct script execution
    in_package, path_entry, modname = split_path_module(sys.argv[0])
    sys.path.insert(0, path_entry)
    if in_package:
        # Pass modname to ``runpy._run_module_as_main()``
    else:
        # Run script directly
    # __main__.__qualname__ is set to modname

The ``split_path_module()`` supporting function used in the above pseudo-code would have the following semantics::

    def _splitmodname(fspath):
        path_entry, fname = os.path.split(fspath)
        modname = os.path.splitext(fname)[0]
        return path_entry, modname

    def _is_package_dir(fspath):
        # Does this directory contain an __init__ file for any known suffix?
        return any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
                       for info in imp.get_suffixes())

    def split_path_module(fspath, modname=None):
        """Given a filesystem path and a relative module name, determine an
        appropriate sys.path entry and a fully qualified module name.

        Returns a 3-tuple of (package_depth, fspath, modname). A reported
        package depth of 0 indicates that this would be a top level import.

        If no relative module name is given, it is derived from the final
        component in the supplied path with the extension stripped.
        """
        if modname is None:
            fspath, modname = _splitmodname(fspath)
        package_depth = 0
        while _is_package_dir(fspath):
            fspath, pkg = _splitmodname(fspath)
            modname = pkg + '.' + modname
            package_depth += 1
        return package_depth, fspath, modname

This PEP also proposes that the ``split_path_module()`` functionality be exposed directly to Python users via the ``runpy`` module.
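For the Stack Overflow layout shown earlier (with both ``__init__.py`` files present), the intended behaviour of the helper would be along these lines (a sketch based on the pseudo-code above; ``standalone.py`` is a hypothetical script sitting outside any package)::

    >>> split_path_module("project/package/tests/test_foo.py")
    (2, 'project', 'package.tests.test_foo')
    >>> split_path_module("project/standalone.py")
    (0, 'project', 'standalone')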
Compatibility with PEP 382
~~~~~~~~~~~~~~~~~~~~~~~~~~

Making this proposal compatible with the PEP 382 namespace packaging PEP is trivial. The semantics of ``_is_package_dir()`` are merely changed to be::

    def _is_package_dir(fspath):
        return (fspath.endswith(".pyp") or
                    any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
                            for info in imp.get_suffixes()))

Incompatibility with PEP 402
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PEP 402 proposes the elimination of explicit markers in the file system for Python packages. This fundamentally breaks the proposed concept of being able to take a filesystem path and a Python module name and work out an unambiguous mapping to the Python module namespace. Instead, the appropriate mapping would depend on the current values in ``sys.path``, rendering it impossible to ever fix the problems described above with the calculation of ``sys.path[0]`` when the interpreter is initialised.

While some aspects of this PEP could probably be salvaged if PEP 402 were adopted, the core concept of making import semantics from main and other modules more consistent would no longer be feasible.

This incompatibility is discussed in more detail in the relevant import-sig thread [2]_.

Potential incompatibilities with scripts stored in packages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The proposed change to ``sys.path[0]`` initialisation *may* break some existing code. Specifically, it will break scripts stored in package directories that rely on the implicit relative imports from ``__main__`` in order to run correctly under Python 3.

While such scripts could be imported in Python 2 (due to implicit relative imports) it is already the case that they cannot be imported in Python 3, as implicit relative imports are no longer permitted when a module is imported. By disallowing implicit relative imports from the main module as well, such modules won't even work as scripts with this PEP. Switching them over to explicit relative imports will then get them working again as both executable scripts *and* as importable modules.

To support earlier versions of Python, a script could be written to use different forms of import based on the Python version::

    if __name__ == "__main__" and sys.version_info < (3, 3):
        import peer  # Implicit relative import
    else:
        from . import peer  # Explicit relative import

Fixing dual imports of the main module
--------------------------------------

Given the above proposal to get ``__qualname__`` consistently set correctly in the main module, one simple change is proposed to eliminate the problem of dual imports of the main module: the addition of a ``sys.meta_path`` hook that detects attempts to import ``__main__`` under its real name and returns the original main module instead::

    class AliasImporter:
        def __init__(self, module, alias):
            self.module = module
            self.alias = alias

        def __repr__(self):
            fmt = "{0.__class__.__name__}({0.module.__name__}, {0.alias})"
            return fmt.format(self)

        def find_module(self, fullname, path=None):
            if path is None and fullname == self.alias:
                return self
            return None

        def load_module(self, fullname):
            if fullname != self.alias:
                raise ImportError("{!r} cannot load {!r}".format(self, fullname))
            return self.module

This meta path hook would be added automatically during import system initialisation based on the following logic::

    main = sys.modules["__main__"]
    if main.__name__ != main.__qualname__:
        sys.meta_path.append(AliasImporter(main, main.__qualname__))

This is probably the least important proposal in the PEP - it just closes off the last mechanism that is likely to lead to module duplication after the configuration of ``sys.path[0]`` at interpreter startup is addressed.

Fixing pickling without breaking introspection
----------------------------------------------

To fix this problem, it is proposed to make use of the new module level ``__qualname__`` attributes to determine the real module location when ``__name__`` has been modified for any reason.

In the main module, ``__qualname__`` will automatically be set to the main module's "real" name (as described above) by the interpreter.

Pseudo-modules that adjust ``__name__`` to point to the public namespace will leave ``__qualname__`` untouched, so the implementation location remains readily accessible for introspection.
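In sketch form (module names hypothetical, as before)::

    # mypkg/_impl.py, under this proposal. The import system has already
    # set __qualname__ to the real name ("mypkg._impl"); only the public
    # name is adjusted by the module author.
    __name__ = "mypkg"

    class Thing:
        pass

    # Thing.__module__ is "mypkg" (the public name, safe for pickling),
    # while the module's own __qualname__ still records "mypkg._impl",
    # so the implementation location is not lost.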
If ``__name__`` is adjusted at the top of a module, then this will automatically adjust the ``__module__`` attribute for all functions and classes subsequently defined in that module.

Since multiple submodules may be set to use the same "public" namespace, functions and classes will be given a new ``__qualmodule__`` attribute that refers to the ``__qualname__`` of their module.

This isn't strictly necessary for functions (you could find out their module's qualified name by looking in their globals dictionary), but it is needed for classes, since they don't hold a reference to the globals of their defining module. Once a new attribute is added to classes, it is more convenient to keep the API consistent and add a new attribute to functions as well.

These changes mean that adjusting ``__name__`` (and, either directly or indirectly, the corresponding function and class ``__module__`` attributes) becomes the officially sanctioned way to implement a namespace as a package, while exposing the API as if it were still a single module.

All serialisation code that currently uses ``__name__`` and ``__module__`` attributes will then avoid exposing implementation details by default.

To correctly handle serialisation of items from the main module, the class and function definition logic will be updated to also use ``__qualname__`` for the ``__module__`` attribute in the case where ``__name__ == "__main__"``.

With ``__name__`` and ``__module__`` being officially blessed as being used for the *public* names of things, the introspection tools in the standard library will be updated to use ``__qualname__`` and ``__qualmodule__`` where appropriate. For example:

- ``pydoc`` will report both public and qualified names for modules
- ``inspect.getsource()`` (and similar tools) will use the qualified names that point to the implementation of the code
- additional ``pydoc`` and/or ``inspect`` APIs may be provided that report all modules with a given public ``__name__``.

Fixing multiprocessing on Windows
---------------------------------

With ``__qualname__`` now available to tell ``multiprocessing`` the real name of the main module, it will be able to simply include it in the serialised information passed to the child process, eliminating the need for the current dubious introspection of the ``__file__`` attribute.

For older Python versions, ``multiprocessing`` could be improved by applying the ``split_path_module()`` algorithm described above when attempting to work out how to execute the main module based on its ``__file__`` attribute.

Explicit relative imports
=========================

This PEP proposes that ``__package__`` be unconditionally defined in the main module as ``__qualname__.rpartition('.')[0]``.

Aside from that, it proposes that the behaviour of explicit relative imports be left alone.

In particular, if ``__package__`` is not set in a module when an explicit relative import occurs, the automatically cached value will continue to be derived from ``__name__`` rather than ``__qualname__``. This minimises any backwards incompatibilities with existing code that deliberately manipulates relative imports by adjusting ``__name__`` rather than setting ``__package__`` directly.

Reference Implementation
========================

None as yet.

References
==========

.. [1] Module aliases and/or "real names"
   (http://mail.python.org/pipermail/python-ideas/2011-January/008983.html)
.. [2] PEP 395 (Module aliasing) and the namespace PEPs
   (http://mail.python.org/pipermail/import-sig/2011-November/000382.html)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Thu Nov 24 00:05:53 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Nov 2011 09:05:53 +1000
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
In-Reply-To:
References:
Message-ID:

On Sat, Nov 19, 2011 at 10:59 PM, Nick Coghlan wrote:
> The updated version is included below and has also been updated on python.org if you prefer a nicely formatted version: http://www.python.org/dev/peps/pep-0395/
>
> The recent discussion regarding imports from main really crystallised for me what I think is currently wrong with imports from main modules - I was cheering when the Django folks updated their default site template to avoid putting a package directory on sys.path (due to all the problems it causes), but that thread made me realise how easy we make it for beginners to do that by accident, with no real payoff of any kind to justify it.
>
> So the PEP now spends a lot of time talking about the fact that our current system for initialising sys.path[0] is almost always just plain wrong as soon as packages are involved, but the explicit markers on package directories make it possible for us to do the right thing instead of being dumb about it.

*crickets*

No feedback at all on the prospect of changing the way we initialise sys.path[0] to respect the package information available on the filesystem?

Also, Éric Araujo raised an interesting point [1]: automatically initialising sys.path[0] *at all* can be a problem in some circumstances, especially when symlinks are involved. PEP 395 won't really help with that (it may change some of the symptoms, but it won't fix the general problem), but it does make me wonder if the interpreter should have a flag to switch off sys.path[0] initialisation (similar to the existing flags to disable site processing, user site processing and processing of the PYTHONHOME and PYTHONPATH environment variables).

[1] http://bugs.python.org/issue10318 (last couple of comments)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From pje at telecommunity.com Thu Nov 24 02:18:44 2011
From: pje at telecommunity.com (PJ Eby)
Date: Wed, 23 Nov 2011 20:18:44 -0500
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
In-Reply-To:
References:
Message-ID:

On Wed, Nov 23, 2011 at 6:05 PM, Nick Coghlan wrote:
> No feedback at all on the prospect of changing the way we initialise sys.path[0] to respect the package information available on the filesystem?

I gave you feedback previously: I think guessing based on __init__ files introduces new breakage potential at a distance for things that didn't break before. It'll also guess the wrong location when somebody bundles a dependency inside their package, and you try to run a script from the embedded package.

I've tried coming up with other ways to guess the right thing to do, but fundamentally, they're all just guessing.
What if we instead had something like this:

    import sys
    sys.set_script_module('foo.bar', __name__)

And what it did was, if __name__ is '__main__', and sys.path[0] is pointing to the parent directory of the script file, then it fixes sys.path[0] to point to the right parent directory level. (Sanity checking whether you can then find the __main__ module using the given module name and the resulting sys.path[0].)
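For concreteness, such a hook might behave roughly like this (a hedged sketch only - sys.set_script_module() is a hypothetical API, and the sanity check here is a simplification of what's described above):

    import os
    import sys

    def set_script_module(qualified_name, name):
        if name != "__main__":
            return  # only direct script execution needs fixing up
        # sys.path[0] points at the script's own directory; strip one
        # level per package component in the qualified module name.
        path_entry = sys.path[0]
        for _ in range(qualified_name.count(".")):
            path_entry = os.path.dirname(path_entry)
        # Sanity check: the top level package should be findable from here
        top_level = qualified_name.partition(".")[0]
        if not os.path.isdir(os.path.join(path_entry, top_level)):
            raise ImportError("%r not found under %r" % (top_level, path_entry))
        sys.path[0] = path_entry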
Is it ugly? Yes. But it's *explicit*, and provides One Obvious Way to make a script that's also a module and will work correctly even if it's part of a package that's been embedded inside another package. I think that this or some other form of explicit declaration is needed to get around __init__ ambiguities that exist in the field today.

From ncoghlan at gmail.com Thu Nov 24 05:24:52 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Nov 2011 14:24:52 +1000
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
In-Reply-To:
References:
Message-ID:

On Thu, Nov 24, 2011 at 11:18 AM, PJ Eby wrote:
> On Wed, Nov 23, 2011 at 6:05 PM, Nick Coghlan wrote:
>> No feedback at all on the prospect of changing the way we initialise sys.path[0] to respect the package information available on the filesystem?
>
> I gave you feedback previously: I think guessing based on __init__ files introduces new breakage potential at a distance for things that didn't break before. It'll also guess the wrong location when somebody bundles a dependency inside their package, and you try to run a script from the embedded package.

And you have yet to explain how that is in any way inferior to the status quo where we are consistently doing something that we *know* is wrong (i.e. putting a package directory directly on sys.path).

Assuming people have their package layouts correct is *not* guessing, no matter how many times you try to claim it is. Calling that guessing is like saying that module name shadowing on sys.path (or any form of name shadowing) is guessing. It may not be what the user intended, but that doesn't mean the interpreter is wrong to believe the information the user is providing.

The status quo sucks - as soon as you put a Python file inside a package, almost *every* method we offer to invoke it breaks. Direct command line invocation breaks, double-clicking in a file browser breaks, running from Idle breaks, even importing it or using the -m switch only work if you're in the right working directory. All it takes is one perfectly reasonable assumption (that the filesystem package layout is correct), and we can *fix* all that just by being a bit smarter about the way we figure out sys.path[0].

Hypothetical "oh, but this bizarre situation with a clearly broken package layout that only worked by accident might now start breaking when it worked before" scenarios are a lousy argument for not fixing the behaviour of the interpreter for the vast majority of people that are doing the right thing.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Thu Nov 24 05:37:30 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Nov 2011 14:37:30 +1000
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
In-Reply-To:
References:
Message-ID:

On Thu, Nov 24, 2011 at 11:18 AM, PJ Eby wrote:
> It'll also guess the wrong location when somebody bundles a dependency inside their package, and you try to run a script from the embedded package.

Oops, meant to reply to this part specifically.

There are two legitimate ways of bundling a dependency inside a package: either by embedding it in your module namespace, or by shipping a private directory (no __init__.py file) that you place on sys.path. In both cases, PEP 395 will do the right thing.

In the first case, the parent directory of the embedding package ends up in sys.path[0] and the embedded copy is accessible as "package.embedded" (for example). The embedded copy *should* be using explicit relative imports (if it isn't, it's not safe to embed a copy as part of your module namespace in the first place). The explicit relative imports will all refer to the embedded copy as they should, and everything is fine.

In the second case, the private directory will get placed in sys.path[0], the embedded copy is accessible at the top level as "embedded" and everything is, once again, fine.

You have yet to identify any case where a script will break for a reason other than reliance on implicit relative imports inside a package (which are *supposed* to be dead in 3.x, but linger in __main__ solely due to the way we initialise sys.path[0]). If a script is going to be legitimately shipped inside a package directory, it *must* be importable as part of that package namespace, and any script in Py3k that relies on implicit relative imports fails to qualify.

This is in contrast to 2.x, where the implicit relative import support in all package submodules let you get away with that kind of approach.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From pje at telecommunity.com Thu Nov 24 06:32:52 2011
From: pje at telecommunity.com (PJ Eby)
Date: Thu, 24 Nov 2011 00:32:52 -0500
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
In-Reply-To:
References:
Message-ID:

On Wed, Nov 23, 2011 at 11:37 PM, Nick Coghlan wrote:
> You have yet to identify any case where a script will break for a reason other than reliance on implicit relative imports inside a package

You're right; I didn't think of this because I haven't moved past Python 2.5 for production coding as yet. ;-)

I still think extraneous __init__.py files exist in the field, but I'll admit that both of these things are infrequent cases.

However, if we're going on the basis of how many newbie errors can be solved by Just Working, PEP 402 will help more newbies than PEP 395, since you must first *have* a package in order for 395 to be meaningful. ;-)

> (which are *supposed* to be dead in 3.x, but linger in __main__ solely due to the way we initialise sys.path[0]). If a script is going to be legitimately shipped inside a package directory, it *must* be importable as part of that package namespace, and any script in Py3k that relies on implicit relative imports fails to qualify.

Wait a minute... What would happen if there were no implicit relative imports allowed in __main__?
Or are you just saying that you get the *appearance* of implicit relative importing, due to aliasing?

From ncoghlan at gmail.com Thu Nov 24 09:12:43 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Nov 2011 18:12:43 +1000
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
In-Reply-To:
References:
Message-ID:

On Thu, Nov 24, 2011 at 3:32 PM, PJ Eby wrote:
> You're right; I didn't think of this because I haven't moved past Python 2.5 for production coding as yet. ;-)

Yeah, there's absolutely no way we could have changed this in 2.x - with implicit relative imports in packages still allowed, there's too much code such a change in semantics could have broken.

In Py3k though, most of that code is already going to break one way or another: if they don't change it, attempting to import it will fail (since implicit relative imports are gone), while if they *do* switch to explicit relative imports to make importing as a module work, then they're probably going to break direct invocation (since __name__ and __package__ will be wrong unless you use '-m' from the correct working directory). The idea behind PEP 395 is to make converting to explicit relative imports the right thing to do, *without* breaking dual-role modules for either use case.

> However, if we're going on the basis of how many newbie errors can be solved by Just Working, PEP 402 will help more newbies than PEP 395, since you must first *have* a package in order for 395 to be meaningful. ;-)

Nope, PEP 402 makes it worse, because it permanently entrenches the current broken sys.path[0] initialisation with no apparent way out. That first list in the current PEP of "these invocations currently break for modules inside packages"? They all *stay* broken forever under PEP 402, because the filesystem no longer fully specifies the package structure - you need an *already* initialised sys.path to figure out how to translate a given filesystem layout into the Python namespace. With the package structure underspecified, there's no way to reverse engineer what sys.path[0] *should* be and it becomes necessary to put the burden back on the developer.

Consider this PEP 382 layout (based on the example __init__.py based layout I use in PEP 395):

    project/
        setup.py
        example.pyp/
            foo.py
            tests.pyp/
                test_foo.py

There's no ambiguity there: we have a top level project directory containing an "example" package fragment and an "example.tests" subpackage fragment. Given the full path to any of "setup.py", "foo.py" and "test_foo.py", we can figure out that the correct thing to place in sys.path[0] is the "project" directory.
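In terms of the split_path_module() pseudo-code from the PEP, that determination would look like this (a sketch, with a ".pyp" suffix treated as a package marker):

    >>> split_path_module("project/example.pyp/tests.pyp/test_foo.py")
    (2, 'project', 'example.tests.test_foo')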
Under PEP 402, it would look like this:

    project/
        setup.py
        example/
            foo.py
            tests/
                test_foo.py

Depending on what you put on sys.path, that layout could be defining a "project" package, an "example" package or a "tests" package. The interpreter has no way of knowing, so it can't do anything sensible with sys.path[0] when the only information it has is the filename for "foo.py" or "test_foo.py". Your best bet would be the status quo: just use the directory containing that file, breaking any explicit relative imports in the process (since __name__ is correspondingly inaccurate).

People already have to understand that Python modules have to be explicitly marked - while "foo" can be executed as a Python script, it cannot be imported as a Python module. Instead, the name needs to be "foo.py" so that the import process will recognise it as a source module. Explaining that importable package directories are similarly marked with either an "__init__.py" file or a ".pyp" extension is a fairly painless task - people can accept it and move on, even if they don't necessarily understand *why* it's useful to be explicit about package layouts. (Drawing that parallel is even more apt these days, given the ability to explicitly execute any directory containing a __main__.py file regardless of the directory name or other contents.)

The mismatch between __main__ imports and imports from everywhere else, though? That's hard to explain to *experienced* Python programmers, let alone beginners. My theory is that if we can get package layouts to stop breaking most invocation methods for modules inside those packages, then beginners should be significantly less confused about how imports work because the question simply won't arise. Once the behaviour of imports from __main__ is made consistent with imports from other modules, then the time when people need to *care* about details like how sys.path[0] gets initialised can be postponed until much later in their development as a Python programmer.

>> (which are *supposed* to be dead in 3.x, but linger in __main__ solely due to the way we initialise sys.path[0]). If a script is going to be legitimately shipped inside a package directory, it *must* be importable as part of that package namespace, and any script in Py3k that relies on implicit relative imports fails to qualify.
>
> Wait a minute... What would happen if there were no implicit relative imports allowed in __main__?
> Or are you just saying that you get the *appearance* of implicit relative importing, due to aliasing?

The latter - because the initialisation of sys.path[0] ignores package structure information in the filesystem, it's easy to get the interpreter to commit the cardinal aliasing sin of putting a package directory on sys.path. In a lot of cases, that kinda sorta works in 2.x because of implicit relative imports, but it's always going to cause problems in 3.x.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ericsnowcurrently at gmail.com Thu Nov 24 11:38:45 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 24 Nov 2011 03:38:45 -0700
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
In-Reply-To:
References:
Message-ID:

On Wed, Nov 23, 2011 at 4:05 PM, Nick Coghlan wrote:
> On Sat, Nov 19, 2011 at 10:59 PM, Nick Coghlan wrote:
>> The updated version is included below and has also been updated on python.org if you prefer a nicely formatted version: http://www.python.org/dev/peps/pep-0395/
>>
>> The recent discussion regarding imports from main really crystallised for me what I think is currently wrong with imports from main modules - I was cheering when the Django folks updated their default site template to avoid putting a package directory on sys.path (due to all the problems it causes), but that thread made me realise how easy we make it for beginners to do that by accident, with no real payoff of any kind to justify it.
>> So the PEP now spends a lot of time talking about the fact that our current system for initialising sys.path[0] is almost always just plain wrong as soon as packages are involved, but the explicit markers on package directories make it possible for us to do the right thing instead of being dumb about it.
>
> *crickets*
>
> No feedback at all on the prospect of changing the way we initialise sys.path[0] to respect the package information available on the filesystem?
>
> Also, Éric Araujo raised an interesting point [1]: automatically initialising sys.path[0] *at all* can be a problem in some circumstances, especially when symlinks are involved. PEP 395 won't really help with that (it may change some of the symptoms, but it won't fix the general problem), but it does make me wonder if the interpreter should have a flag to switch off sys.path[0] initialisation (similar to the existing flags to disable site processing, user site processing and processing of the PYTHONHOME and PYTHONPATH environment variables).

That sounds good to me.

Relatedly, and this will reflect my relative inexperience here, I still don't have a clear picture of why we do the sys.path[0] initialization in the first place. I'm sure there's a good reason, but I've never (knowingly) met it. :)

Is it so that modules on which the script relies don't have to be put in a package under some sys.path directory (to be explicitly imported), saving a little time during development or simplifying packaging a little? Seems to me like that sort of thing is addressable in better, more obvious ways, but especially that the benefits of the implicit initialization don't outweigh the problems it causes (hence PEP 395).

Unless you think to read the sys.path doc entry, you won't know about the sys.path[0] initialization for scripts, and will wonder why you keep getting implicit relative imports when you aren't expecting them. I would think that would be more impactful across the spectrum of Python experience than (what seem like minor) benefits gained from the implicit initialization.

So, at this point, is it just an artifact of early Python, when better solutions weren't around yet? What am I missing here?

-eric

> [1] http://bugs.python.org/issue10318 (last couple of comments)

From ncoghlan at gmail.com Thu Nov 24 12:33:15 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Nov 2011 21:33:15 +1000
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
In-Reply-To:
References:
Message-ID:

On Thu, Nov 24, 2011 at 8:38 PM, Eric Snow wrote:
> So, at this point, is it just an artifact of early Python, when better solutions weren't around yet? What am I missing here?

sys.path[0] initialisation is essential for making the interactive interpreter useful - when you're developing new Python code, you want to be able to import whatever you're working on into an interactive session without having to mess about with sys.path[0] manually.

The rest pretty much then follows from a desire to maintain some level of consistency with the interactive prompt behaviour. (I don't know Guido's original reasoning though - the current behaviour was well established long before I started using the language.
As far as I know, it's been this way since the earliest public releases.)

Regardless, your question did just make me realise that my current proposal for new -m semantics in PEP 395 is broken. It assumes that the module is going to be found in a subdirectory of the current directory, and that's just plain wrong (e.g. for cases like "python -m timeit"). However, fixing it is pretty easy, and addresses a slight concern I had with what it allowed (i.e. the "python -m tests.test_foo" and "python -m test_foo" cases).

The solution is simply to *not* adjust modname in the "-m" case (i.e. keep the initialisation completely consistent with the interactive prompt and -c, as it is now). Then the effect of PEP 395 will just be to allow "python -m package.tests.test_foo" to work from anywhere within the package hierarchy, *without* allowing the abbreviated forms. That's a much better, more consistent outcome.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ericsnowcurrently at gmail.com Thu Nov 24 21:32:30 2011
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 24 Nov 2011 13:32:30 -0700
Subject: [Import-SIG] Updated PEP 395 ("Qualified Names for Modules" aka "Implicit Relative Imports Must Die!")
In-Reply-To:
References:
Message-ID:

On Thu, Nov 24, 2011 at 4:33 AM, Nick Coghlan wrote:
> On Thu, Nov 24, 2011 at 8:38 PM, Eric Snow wrote:
>> So, at this point, is it just an artifact of early Python, when better solutions weren't around yet? What am I missing here?
>
> sys.path[0] initialisation is essential for making the interactive interpreter useful - when you're developing new Python code, you want to be able to import whatever you're working on into an interactive session without having to mess about with sys.path[0] manually.
>
> The rest pretty much then follows from a desire to maintain some level of consistency with the interactive prompt behaviour.

So the behavior of Python execution can be grouped thusly:

- scripts (