[Import-SIG] [Python-ideas] My objections to implicit package directories

Eric Snow ericsnowcurrently at gmail.com
Tue Mar 13 03:50:56 CET 2012


On Mon, Mar 12, 2012 at 7:43 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Mon, Mar 12, 2012 at 5:03 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> It seems the consensus at the PyCon US sprints is that implicit
>> package directories are a wonderful idea and we should have more of
>> those. I still disagree (emphatically), but am prepared to go along
>> with it so long as my documented objections are clearly and explicitly
>> addressed in the new combined PEP, and the benefits ascribed to
>> implicit package directories in the new PEP are more compelling than
>> "other languages do it that way, so we should too".
>>
>> To save people having to trawl around various mailing list threads and
>> reading through PEP 395, I'm providing those objections in a
>> consolidated form here. If reading these objections in one place
>> causes people to have second thoughts about the wisdom of implicit
>> package directories, even better.
>>
>> 1. Implicit package directories go against the Zen of Python
>>
>> Getting this one out of the way first. As I see it, implicit package
>> directories violate at least 4 of the design principles in the Zen:
>> - Explicit is better than implicit (my calling them implicit package
>> directories is a deliberate rhetorical ploy to harp on this point,
>> although it's also an accurate name)
>> - If the implementation is hard to explain, it's a bad idea (see the
>> section about backwards compatibility challenges)
>> - Readability counts (see the section about introducing ambiguity into
>> filesystem layouts)
>> - Errors should never pass silently (see the section about implicit
>> relative imports from main)
>>
>> 2. Implicit package directories pose awkward backwards compatibility challenges
>>
>> It concerns me gravely that the consensus proposal MvL posted is
>> *backwards incompatible with Python 3.2*, as it deliberately omits one
>> of the PEP 402 features that provided that backwards compatibility.
>> Specifically, under the consensus, a subdirectory "foo" of a directory
>> on sys.path will shadow a "foo.py" or "foo/__init__.py" that appears
>> later on sys.path. As Python 3.2 would have found that latter
>> module/package correctly, this is an unacceptable breach of the
>> backwards compatibility requirements. PEP 402 at least got this right
>> by always executing the first "foo.py" or "foo/__init__.py" it found,
>> even if another "foo" directory was found earlier in sys.path.
>>
>> We can't just wave that additional complexity away if an implicit
>> package directory proposal is going to remain backwards compatible
>> with current layouts (e.g. if an application's starting directory
>> included a "json" subfolder containing json files rather than Python
>> code, the consensus approach as posted by MvL would render the
>> standard library's json module inaccessible)
>>
>> 3. Implicit package directories introduce ambiguity into filesystem layouts
>>
>> With the current Python package design, there is a clear 1:1 mapping
>> between the filesystem layout and the module hierarchy. For example:
>>
>>    parent/  # This directory goes on sys.path
>>        project/  # The "project" package
>>            __init__.py  # Explicit package marker
>>            code.py  # The "project.code" module
>>            tests/  # The "project.tests" package
>>                __init__.py  # Explicit package marker
>>                test_code.py  # The "project.tests.test_code" module
>>
>> Any explicit package directory approach will preserve this 1:1
>> mapping. For example, under PEP 382:
>>
>>    parent/  # This directory goes on sys.path
>>        project.pyp/  # The "project" package
>>            code.py  # The "project.code" module
>>            tests.pyp/  # The "project.tests" package
>>                test_code.py  # The "project.tests.test_code" module
>>
>> With implicit package directories, you can no longer tell purely from
>> the code structure which directory is meant to be added to sys.path,
>> as there are at least two valid mappings to the Python module
>> hierarchy:
>>
>>    parent/  # This directory goes on sys.path
>>        project/  # The "project" package
>>            code.py  # The "project.code" module
>>            tests/  # The "project.tests" package
>>                test_code.py  # The "project.tests.test_code" module
>>
>>    parent/
>>        project/  # This directory goes on sys.path
>>            code.py  # The "code" module
>>            tests/  # The "tests" package
>>                test_code.py  # The "tests.test_code" module
>>
>> What are implicit package directories buying us in exchange for this
>> inevitable ambiguity? What can we do with them that can't be done with
>> explicit package directories? And no, "Java does it that way" is not a
>> valid argument.
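
The two mappings in section 3 can be reproduced on an interpreter that accepts directories without __init__.py as packages (Python 3.3 or later is assumed here). This is only a minimal sketch: "demo_project" and "demo_widget" are hypothetical names, with "demo_widget" standing in for "code" so the demo does not collide with the standard library's own "code" module. The same file imports under two different names depending on which directory is on sys.path.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: parent/demo_project/demo_widget.py, no __init__.py anywhere.
parent = tempfile.mkdtemp()
project = os.path.join(parent, "demo_project")
os.mkdir(project)
with open(os.path.join(project, "demo_widget.py"), "w") as f:
    f.write("WHO_AM_I = __name__\n")


def import_with_root(root, name):
    """Import `name` with `root` temporarily placed at the front of sys.path."""
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(root)


# Mapping 1: "parent" is the sys.path entry, so the file is a submodule.
as_submodule = import_with_root(parent, "demo_project.demo_widget")
# Mapping 2: "parent/demo_project" is the sys.path entry, so it is top level.
as_top_level = import_with_root(project, "demo_widget")

print(as_submodule.WHO_AM_I)
print(as_top_level.WHO_AM_I)
same_file = os.path.samefile(as_submodule.__file__, as_top_level.__file__)
print(same_file)
```

Nothing in the directory tree itself says which of the two names is the intended one; only the choice of sys.path entry decides it.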
>>
>> 4. Implicit package directories will permanently entrench current
>> newbie-hostile behaviour in __main__
>>
>> It's a fact of life that Python beginners learn that they can do a
>> quick sanity check on modules they're writing by including an "if
>> __name__ == '__main__':" section at the end and doing one of 4 things:
>> - run "python mymodule.py"
>> - hit F5 (or the relevant hot key) in their IDE
>> - double click the module in their filesystem browser
>> - start the Python REPL and do "import mymodule"
>>
>> However, there are some serious caveats to that as soon as you move
>> the module inside a package:
>> - if you use explicit relative imports, you can import it, but not run
>> it directly using any of the above methods
>> - if you rely on implicit relative imports, the above direct execution
>> methods should work most of the time, but you won't be able to import
>> it
>> - if you use absolute imports for your own package, nothing will work
>> (unless the parent directory for your package is already on sys.path)
>> - if you only use absolute imports for *other* packages, everything
>> should be fine
>>
>> The errors you get in these cases are *horrible*. The interpreter
>> doesn't really know what is going on, so it gives the user bad error
>> messages.
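
The first caveat in the list above (explicit relative imports work on import but not on direct execution) can be checked with a small script. This is a sketch under assumptions: "demo_pkg", "helper" and "mod" are hypothetical names, and the exact exception text differs between Python versions, so only the exit status is inspected.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical package: root/demo_pkg/{__init__,helper,mod}.py
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_pkg")
os.mkdir(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "helper.py"), "w") as f:
    f.write("GREETING = 'ok'\n")
with open(os.path.join(pkg, "mod.py"), "w") as f:
    f.write(
        "from . import helper\n"  # explicit relative import
        "if __name__ == '__main__':\n"
        "    print(helper.GREETING)\n"
    )


def run(args, cwd):
    """Run the current interpreter with `args` from directory `cwd`."""
    return subprocess.run(
        [sys.executable] + args, cwd=cwd, capture_output=True, text=True
    )


# Direct execution: the interpreter does not know mod.py lives in a package.
direct = run([os.path.join(pkg, "mod.py")], cwd=pkg)
# "python -m" from the parent directory of the package: the supported way.
as_module = run(["-m", "demo_pkg.mod"], cwd=root)

print("direct exit status:", direct.returncode)
print("-m exit status:", as_module.returncode)
```

The direct run fails on the relative import, while the "-m" run from the package's parent directory succeeds.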
>>
>> In large part, the "Why are my imports broken?" section in PEP 395
>> exists because I sat down to try to document what does and doesn't
>> work when you attempt to directly execute a module from inside a
>> package directory. In building the list of what would work properly
>> ("python -m" from the parent directory of the package) and what would
>> sometimes break (everything else), I realised that instead of
>> documenting the entire hairy mess, the 1:1 mapping from the filesystem
>> layout to the Python module hierarchy meant we could *just fix it* to
>> not do the wrong thing by default. If implicit package directories are
>> blessed for inclusion in Python 3.3, that opportunity is lost forever
>> - with the loss of the unambiguous 1:1 mapping from the filesystem
>> layout to the module hierarchy, it's no longer possible for the
>> interpreter to figure out the right thing to do without guessing.
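
To illustrate why explicit markers make this inference possible, here is a rough sketch of walking upward from a file while __init__.py is present. It is not the PEP 395 algorithm itself; the function name infer_module_name is hypothetical, and the layout is the "project" example from section 3.

```python
import os
import tempfile


def infer_module_name(path):
    """Return (sys_path_entry, dotted_name) for a .py file.

    Walks up the directory tree for as long as each directory holds an
    __init__.py, treating every such directory as a package.
    """
    directory, filename = os.path.split(os.path.abspath(path))
    parts = [os.path.splitext(filename)[0]]
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        directory, package = os.path.split(directory)
        if not package:  # reached the filesystem root
            break
        parts.append(package)
    return directory, ".".join(reversed(parts))


# Rebuild the explicit-marker layout: parent/project/tests/test_code.py
parent = tempfile.mkdtemp()
tests = os.path.join(parent, "project", "tests")
os.makedirs(tests)
for directory in (os.path.join(parent, "project"), tests):
    open(os.path.join(directory, "__init__.py"), "w").close()
target = os.path.join(tests, "test_code.py")
open(target, "w").close()

entry, name = infer_module_name(target)
print(entry == os.path.abspath(parent))
print(name)
```

Without the __init__.py markers the walk has no stopping condition it can trust, which is the guessing problem described above.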
>>
>> PJE proposed that newbies be instructed to add the following
>> boilerplate to their modules if they want to use "if __name__ ==
>> '__main__':" for sanity checking:
>>
>>    import pkgutil
>>    pkgutil.script_module(__name__, 'project.code.test_code')
>>
>> This completely defeats the purpose of having explicit relative
>> imports in the language, as it embeds the absolute name of the module
>> inside the module itself. If a package subtree is ever moved or
>> renamed, you will have to manually fix every script_module()
>> invocation in that subtree. Double-keying data like this is just plain
>> bad design. The package structure should be recorded explicitly in
>> exactly one place: the filesystem.
>>
>> PJE has other objections to the PEP 395 proposal, specifically
>> relating to its behaviour on package layouts where the directories
>> added to sys.path contain __init__.py files, such that the developer's
>> intent is not accurately reflected in their filesystem layout. Such
>> layouts are *broken*, and the misbehaviour under PEP 395 won't be any
>> worse than the misbehaviour with the status quo (sys.path[0] is set
>> incorrectly in either case, it will just be fixable under PEP 395 by
>> removing the extraneous __init__.py files). A similar argument applies
>> to cases where a parent package __init__ plays games with sys.path
>> (although the PEP 395 algorithm could likely be refined to better
>> handle that situation). Regardless, if implicit package directories
>> are accepted into Python 3.3 in any form, I *will* be immediately
>> marking PEP 395 as Rejected due to incompatibility with an accepted
>> PEP. I'll then (eventually, once I'm less annoyed about the need to do
>> so) write a new PEP to address a subset of the issues previously
>> covered by PEP 395 that omits any proposals that rely on explicit
>> package directories.
>>
>> Also, I consider it a requirement that any implicit packages PEP
>> include an update to the tutorial to explain to beginners what will
>> and won't work when they attempt to directly execute a module from
>> inside a Python package. After all, such a PEP is closing off any
>> possibility of ever fixing the problem: it should have to deal with
>> the consequences.
>
> Hi Nick,
>
> The write-up was a little unclear on a main point and I think that's
> contributed to some confusion here.  The path search will continue to
> work in exactly the same way as it does now, with one difference.
> Instead of the current ImportError when nothing matches, the mechanism
> for namespace packages would be used.
>
> The mechanism would create a namespace package with a __path__
> matching the paths corresponding to all namespace package "portions".
> The likely implementation will simply track the namespace package
> __path__ during the initial (normal) path search and use it only when
> there are no matching modules or regular packages.
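
A minimal sketch of the search order described above, assuming an interpreter that implements this fallback (Python 3.3 or later). The names "demo_shadow" and "demo_ns" are hypothetical. A bare directory earlier on sys.path does not hide a real module found later, and a namespace package is only created when nothing else matches.

```python
import importlib
import os
import sys
import tempfile

first = tempfile.mkdtemp()
second = tempfile.mkdtemp()

# "first" holds a bare directory; "second" holds a real module of that name.
os.mkdir(os.path.join(first, "demo_shadow"))
with open(os.path.join(second, "demo_shadow.py"), "w") as f:
    f.write("KIND = 'module'\n")

# Two portions of one namespace package, neither with an __init__.py.
for entry, leaf in ((first, "part_a"), (second, "part_b")):
    os.mkdir(os.path.join(entry, "demo_ns"))
    with open(os.path.join(entry, "demo_ns", leaf + ".py"), "w") as f:
        f.write("NAME = %r\n" % leaf)

sys.path[:0] = [first, second]
importlib.invalidate_caches()

demo_shadow = importlib.import_module("demo_shadow")
demo_ns = importlib.import_module("demo_ns")
part_a = importlib.import_module("demo_ns.part_a")
part_b = importlib.import_module("demo_ns.part_b")

# The module in "second" wins over the bare directory in "first".
print(demo_shadow.KIND)
# The namespace package's __path__ collects both portions.
print(list(demo_ns.__path__))
```

This is the same situation as the "json" subfolder example in section 2: the regular module is still the one that gets imported.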
>
> Packages without __init__.py would only be allowed for namespace
> packages.  So effectively namespace packages would be problematic for
> PEP 395, but not normal packages.
>
> Ultimately this is a form of PEP 402 without so much complexity.  The
> trade-off is it requires a new kind of package.  As far as I
> understand them, most of your concerns are based on the idea that
> namespace packages would be included in the initial traversal of
> sys.path, which is not the case.  It sounds like there are a couple
> points you made that may still need attention, but hopefully this at
> least helps clarify what we talked about.
>
> -eric

Sorry (reply-all failed me here :)). Reposting to import-sig.

-eric
