[Python-ideas] Packages and Import

Thu Feb 8 07:18:21 CET 2007

On 2/7/07, Ron Adam <rrr at ronadam.com> wrote:
> Brett Cannon wrote:
> > On 2/4/07, Ron Adam <rrr at ronadam.com> wrote:
> >>
> >> After exploring this a bit further on comp.lang.python, I was able to
> >> organize these ideas better.  The more I thought about it, the more
> >> '+'s I found, and about the only '-' I can think of is the work
> >> required to actually make a patch to do it.
> >>
> >> It's also good to keep in mind that since most people still rely on
> >> the old relative import behavior, most people have not run into some
> >> of the issues I mention here.  But they will at some point.
> >>
> >> I did mean to keep this short, but clarity won out. (At least it's
> >> clear to me,
> >> but that's an entirely subjective opinion on my part.)
> >>
> >> Maybe someone will adopt this and make a real PEP out of it.  :-)
> >>
> >> Cheers,
> >>    Ron
> >>
> >>
> >>
> >> PROPOSAL
> >> ========
> >>
> >> Make Python's concept of a package (currently an informal type) be
> >> stronger than that of the underlying file system search path and
> >> directory structure.
> >>
> >
> > So you mean make packages more of an official thing than just having a
> > __path__ attribute on a module, right?
>
> Currently in Python 2.5, __path__ attributes are only in the imported package
> name spaces.  Running a module doesn't set a __path__ attribute, just the
> __file__ attribute.
>

True.

> It would be nice if __path__ were set on all modules in packages no matter how
> they are started.

There is a slight issue with that as the __path__ attribute represents
the top of a package and thus that it has an __init__ module.  It has
some significance in terms of how stuff works at the moment.
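
As a rough illustration of that distinction, here is a minimal sketch,
assuming a current Python 3 interpreter rather than the 2.5 discussed in
this thread.  The names demo_pkg and mod are made up for the example; it
only shows that an imported package carries __path__ while a module
inside it does not.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: demo_pkg/__init__.py and demo_pkg/mod.py.
# (demo_pkg and mod are hypothetical names used only for this illustration.)
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
for filename in ("__init__.py", "mod.py"):
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write("")

sys.path.insert(0, root)
importlib.invalidate_caches()
package = importlib.import_module("demo_pkg")
module = importlib.import_module("demo_pkg.mod")

package_has_path = hasattr(package, "__path__")  # set on the package itself
module_has_path = hasattr(module, "__path__")    # not set on a plain module
print(package_has_path, module_has_path)
```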

>  The real name could be worked out by comparing __path__ and
> __file__ if someone needs that.  But I think it would be better to just go ahead
> and add a __realname__ attribute for when __name__ is "__main__".
>
> __name__ == "__main__" can stay the same and still serve its purpose to tell
> whether a script was started directly or imported.
>

I think the whole __main__ thing is the wrong thing to be trying to
keep alive for this.  I know it would break things, but it is probably
better to come up with a better way for a module to know when it is
being executed or to denote what code should only be run when it is
executed.

>
>
> >> Where the following hold true in python 3.X, or when absolute_import
> >> behavior is
> >> imported from __future__ in python 2.X:
> >>
> >>
> >> (1) Python first determines if a module or package is part of a
> >> package and then runs that module or package in the context of the
> >> package it belongs to. (see items below)
> >>
> >
> > Don't quite follow this statement.  What do you mean by "runs" here?
> > You mean when using runpy or something and having the name set to
> > '__main__'?
>
> Yes
>
>
> >> (2)  import this_package.module
> >>       import this_package.sub_package
> >>
> >> If this_package is the same name as the current package, then do not
> >> look on
> >> sys.path. Use the location of this_package.
> >>
> >
> > Already does this (at least in my pure Python implementation).
> > Searches are done on __path__ when you are within a package.
>
> Cool! I don't think it's like that for the non-pure version, but it may do it
> that way if
> "from __future__ import absolute_import" is used.
>

It does do it both ways, there is just a fallback on the classic
import semantics in terms of trying it both as a relative and absolute
import.  But I got the semantics from the current implementation so it
is not some great inspiration of mine.  =)

> Are you setting __path__ for each module imported in a package too?
>

No.  As I said above, having __path__ set has some special meaning in
how imports work at the moment.  It stays on packages and not modules
within packages.

>
> >> (3)  import other_package.module
> >>       import other_package.sub_package
> >>
> >> If other_package is a different name from the current package
> >> (this_package),
> >> then do not look in this_package and exclude searches in sys.path
> >> locations that
> >> are inside this_package including the current directory.
> >
> >
> > This change would require importers to do more.  Since the absolute
> > import semantics automatically make this kind of import start at the
> > top-level (i.e., sys.path), each import for an entry on sys.path would
> > need to be told what package it is currently in, check if it handles
> > that package, and then skip it if it does have it.
>
> I don't think it will be as hard as this.  See below.
>
>
> > That seems like a lot of work that I know I don't want to have to
> > implement for every importer I ever write.
>
> Only getting the correct package location for the first module executed in the
> package will be a bit of work. (But not that much.) After that, it can be passed
> around.
>
> Here's something I used recently to get the full dotted name without importing.
> It could also return the base package path as well.  You probably don't need the
> cache.  These could be combined and shortened further for just finding a root
> package location.
>
>
> import os
>
> def path_type(path):
>      """ Determine what kind of thing path is.
>
>          Returns  ->  'module'|'package'|'dir'| None
>      """
>      if os.path.isfile(path) \
>          and  (path[-3:] == '.py' or \
>                path[-4:] in ('.pyw', '.pyc', '.pyd', '.pyo')):
>          return 'module'
>      if os.path.isdir(path):
>          for end in ['', 'w', 'c', 'o']:
>              if os.path.isfile(os.path.join(path, '__init__.py' + end)):
>                  return 'package'
>          return 'dir'
>
> def dotted_name(path, cache={}):
>      """ Get a full dotted module or package name from a path name.
>
>          Returns  ->  fully qualified (dotted) name | None
>      """
>      if path in cache:
>          return cache[path]
>      if path_type(path) in ('package', 'module'):
>          parent, name = os.path.split(path)
>          name, _ = os.path.splitext(name)
>          while 1:
>              if path_type(parent) != 'package':
>                  break
>              parent, nextname = os.path.split(parent)
>              name = '.'.join([nextname, name])
>          cache[path] = name
>          return name
>
>
>
> Let's see...  (untested)
>
> def package_path(path):
>      """ Get the package location of a module.
>      """
>      package = None
>      if path_type(path) in ('package', 'module'):
>          parent, name = os.path.split(path)
>          while 1:
>              if path_type(parent) != 'package':
>                  break
>              # parent is a package directory; remember it and keep climbing
>              package = parent
>              parent, name = os.path.split(parent)
>      return package
>

Or you could have copied the code I wrote for the filesystem
importer's find_module method that already does this classification.
=)

Part of the problem of working backwards from path to dotted name is
that it might not import that way.  __path__ can be tweaked, importers
and loaders can be written to interpret the directory structure or
file names differently, etc.  Plus what about different file types
like .ptl files from Quixote?
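
To make the __path__ point concrete, here is a minimal sketch, assuming
Python 3 and made-up names (tweaked_pkg, elsewhere, plugin).  The package
appends an unrelated directory to its own __path__, so a module imports
under a dotted name that walking up from its file would never produce.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "tweaked_pkg")
extra_dir = os.path.join(root, "elsewhere")  # not a package: no __init__.py
os.mkdir(pkg_dir)
os.mkdir(extra_dir)

# The package's __init__ appends an unrelated directory to its __path__.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("__path__.append(%r)\n" % extra_dir)
with open(os.path.join(extra_dir, "plugin.py"), "w") as f:
    f.write("")

sys.path.insert(0, root)
importlib.invalidate_caches()
plugin = importlib.import_module("tweaked_pkg.plugin")

# There is no __init__.py next to plugin.py, so a path-based guess stops at
# the bare file name, yet the module was imported under a dotted name.
guessed_from_path = os.path.splitext(os.path.basename(plugin.__file__))[0]
print(guessed_from_path, plugin.__name__)
```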

>
>
> >> (4)  import module
> >>       import package
> >>
> >> Module and package are not in a package, so don't look in any
> >> packages, even
> >> this one or sys.path locations inside of packages.
> >>
> >
> > This is already done.  Absolute imports would cause this to do a
> > shallow check on sys.path for the module or package name.
>
> Great! 2 down.  Almost half way there.  :-)
>
> But will it check the current directory if you run a module directly?
> Currently it doesn't know if it's part of a package.  Is that correct?
>

Absolute import semantics go straight to sys.path, period.
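
A minimal sketch of what that means in practice, assuming Python 3 (where
absolute imports are the default) and a made-up package called shadow_pkg.
The package contains a sibling named string.py, but a plain "import string"
inside the package still resolves to the top-level module found via
sys.path.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shadow_pkg")  # hypothetical package name
os.mkdir(pkg_dir)

files = {
    "__init__.py": "",
    "string.py": "ORIGIN = 'sibling inside shadow_pkg'\n",
    # A plain `import string` is absolute, so the sibling above is ignored.
    "user.py": "import string\n",
}
for filename, source in files.items():
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()
user = importlib.import_module("shadow_pkg.user")

# The sibling defines ORIGIN; the standard library module does not.
found_sibling = hasattr(user.string, "ORIGIN")
print(user.string.__name__, found_sibling)
```

Reaching the sibling would take an explicit relative import such as
"from . import string".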

>
> >> (5) For behaviors other than these, like when you do actually want to
> >> run a
> >> module belonging to a package in a different context, a mechanism such
> >> as a
> >> command line switch, or a settable import attribute should be used.
> >>
> >>
> >> MOTIVATION
> >> ==========
> >>
> >> (A) Added reliability.
> >>
> >> There will be much less chance of errors (silent or otherwise) due to
> >> path/import conflicts which are sometimes difficult to diagnose.
> >>
> >
> > Probably, but I don't know if the implementation complexity warrants
> > worrying about this.  But then again how many people have actually
> > needed to implement the import machinery.  =)  I could be labeled as
> > jaded.
>
> Well, I know it's not an easy thing to do.  But it's not finding the paths
> and/or determining whether files are modules, etc., that is hard.  From what
> I understand the hard part is making it work so it can be extended and
> customized.
>
> Is that correct?
>

Yes.  I really think ditching this whole __main__ name thing is going
to be the only solid solution.  Defining a __main__() method for
modules that gets executed makes the most sense to me.  Just import
the module and then execute the function if it exists.  That allows
runpy to have the name be set properly and does away with import
problems without mucking with import semantics.  Still have the name
problem if you specify a file directly on the command line, though.
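
A minimal sketch of that idea, assuming Python 3.  The run_module helper
and the demo_script module are hypothetical; Python has no built-in
__main__() protocol, so this only shows the shape of "import it under its
real name, then call the function if it exists".

```python
import importlib
import sys
import types


def run_module(name):
    """Import `name` under its real name, then call its __main__() if present.

    Hypothetical helper: this is not an existing Python feature.
    """
    module = importlib.import_module(name)
    entry_point = getattr(module, "__main__", None)
    if callable(entry_point):
        return entry_point()
    return None


# Stand-in module registered by hand so the sketch runs without files on disk.
demo = types.ModuleType("demo_script")
exec(
    "def __main__():\n"
    "    return 'running as ' + __name__\n",
    demo.__dict__,
)
sys.modules["demo_script"] = demo

result = run_module("demo_script")
print(result)
```

Because the module is imported normally, __name__ keeps its real value
inside __main__(), which is the property being asked for here.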

>
> >> There may also be some added security benefits because it would be
> >> much harder for someone to create a same-named module or package and
> >> insert it by putting it on the path, or by altering sys.path to do the
> >> same. [*]
> >>
> >> [* - If this can happen there are probably more serious security
> >> issues, but not
> >> everyone has the most secure setup, so this point is still probably a
> >> good
> >> point. General reliable execution of modules is the first concern,
> >> this may be a
> >> side benefit of that.]
> >>
> >>
> >> (B) Reduce the need for special checks and editing sys.path.
> >>
> >> Currently some authors have to edit sys.path or do special if
> >> os.path.exists() checks to ensure proper operation in some
> >> situations, such as running tests.  These suggestions would reduce
> >> the need for such special testing and modifications.
> >>
> >
> > This might minimize some sys.path hacks in some instances, but it also
> > complicates imports overall in terms of implementation and semantics.
>
> I'm not sure why it would make it so much more complicated.  The contexts for
> which the imports are done will need to be done for cases of package imports,
> relative package imports, and modules in any case.  It's just a matter of
> determining which one to use from the start.  I guess I need to look into how
> Python's imports work in a little more detail.
>
> > Where is point C?
>
> Whoops... I could make one up if you really want one.  ;-)
>

No, that's okay.  =)

>
> (It was moved elsewhere and I forgot to reletter.)
>
>
> >> (D) Easier editing and testing.
> >>
> >> While you are editing modules in a package, you could then run the module
> >> directly (as you can with old style relative imports) and still get
> >> the correct
> >> package-relative behavior instead of something else. (like an
> >> exception or wrong
> >> output). Many editors support running the file being edited,
> >> including IDLE.  It can also be difficult to write scripts for the
> >> editors to determine the correct context to run a module in.
> >>
> >
> > How is this directly solved, though?  You mentioned "running" a module
> > as if it is in a package, but there is no direct explanation of how
> > you would want to change the import machinery to pull this off.
> > Basically you need a way to have either modules with the name __main__
> > be able to get the canonical name for import purposes.  Or you need to
> > leave __name__ alone and set some other global or something to flag
> > that it is the __main__ module.
>
> Leave __name__ alone, yes.  Add a __path__ attribute for all modules that is set
> to the base package location. Add a __realname__ attribute only to modules whose
> __name__ is set to '__main__'.
>

I don't like this idea of having one attribute have the same meaning
as another attribute.  I don't think a good backwards-compatible
solution is going to crop up.

> The import machinery could then use those to determine how to handle imports in
> that module.
>
> Is that clearer?
>

It is, but I don't like it.  =)

> If __path__ exists, then it's a module in a package.
> If __realname__ exists, then it was run as a script, but here's the actual name
> anyway.
>
> If __name__ is '__main__' then do what scripts do when __name__ == '__main__'.
>
>
>
> > Regardless, I am not seeing how you are proposing to go about solving
> > this problem.
>
> Discussing it is a good start to doing that,  isn't it?   ;-)
>

Yep.

-Brett