[Python-ideas] Packages and Import

Thu Feb 8 21:41:11 CET 2007

Brett Cannon wrote:
> On 2/7/07, Ron Adam <rrr at ronadam.com> wrote:
>> Brett Cannon wrote:
>> > On 2/4/07, Ron Adam <rrr at ronadam.com> wrote:

>> It would be nice if __path__ were set on all modules in packages no matter how
>> they are started.
> 
> There is a slight issue with that as the __path__ attribute represents
> the top of a package and thus that it has an __init__ module.  It has
> some significance in terms of how stuff works at the moment.

Yes, and after some reading I found __path__ isn't exactly what I was thinking.

It could be that it's only a matter of getting that first import right.  An
example of this approach is this recipe by Nick:

     http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/307772

>> The real name could be worked out by comparing __path__ and __file__ if
>> someone needs that.  But I think it would be better to just go ahead and add
>> a __realname__ attribute for when __name__ is "__main__".
>>
>> __name__ == "__main__" can stay the same and still serve its purpose: to tell
>> whether a script was started directly or imported.
> 
> I think the whole __main__ thing is the wrong thing to be trying to
> keep alive for this.  I know it would break things, but it is probably
> better to come up with a better way for a module to know when it is
> being executed or to denote what code should only be run when it is
> executed.

I was trying to suggest things that would do the least harm as far as changing
things in the eyes of the users.  If dropping the "__main__" name in Python 3k
is a real option, then there may be more options.  Is it a real option?  Or is
Guido set on keeping it?

If you remove the "__main__" name, then you will still need some attribute for
Python to determine the same thing.  What you would end up doing is just moving
the [if __name__=="__main__": __main__()] line off the end of the program so
that all programs have it automatically.  We just won't see it.  And instead of
checking __name__, the interpreter would check some other attribute.

So what and where would that other attribute be?

Would it be exposed so we could add [if __ismain__: <body>] to our programs for
initialization purposes?

Or you could just replace it with an __ismain__ attribute; then we can name our
main functions anything we want... like test().

if __ismain__:
    test()

That is shorter and maybe less confusing than the __name__ check.
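The __ismain__ idea can be approximated today with nothing beyond the standard
library, which may help judge how it would feel.  This is a minimal sketch,
assuming a launcher is responsible for running the file: runpy.run_path()
accepts init_globals, so the launcher can pre-seed a hypothetical __ismain__
flag while keeping the module's real dotted name in __name__.  Both the flag
and the launcher name run_as_main are illustrative, not existing Python
features.

```python
import runpy


def run_as_main(path, real_name):
    """Run the file at *path* as the main program but keep its real name.

    __ismain__ is a hypothetical flag: nothing in Python sets it, so this
    launcher injects it through runpy's init_globals before the code runs.
    """
    # run_path() copies init_globals into the fresh module namespace first,
    # then sets __name__ to run_name (and __package__ to its parent part),
    # so the executed code sees both __ismain__ and its real dotted name.
    return runpy.run_path(path, init_globals={"__ismain__": True},
                          run_name=real_name)
```

A module that should stay importable the normal way would test
globals().get("__ismain__", False), since an ordinary import never defines
the flag.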

>> >> (2)  import this_package.module
>> >>       import this_package.sub_package
>> >>
>> >> If this_package has the same name as the current package, then do not
>> >> look on sys.path. Use the location of this_package.
>> >>
>> >
>> > Already does this (at least in my pure Python implementation).
>> > Searches are done on __path__ when you are within a package.
>>
>> Cool! I don't think it's like that for the non-pure version, but it may do
>> it that way if "from __future__ import absolute_import" is used.
> 
> It does do it both ways, there is just a fallback on the classic
> import semantics in terms of trying it both as a relative and absolute
> import.  But I got the semantics from the current implementation so it
> is not some great inspiration of mine.  =)

I think there shouldn't be a fallback; that will just confuse things.  Raise an
exception here, because most likely falling back is not what you want.

If someone wants to import a module from outside the package that has the same
name as the package (or a module in some other package with the same name),
then there needs to be an explicit way to do that.  But I really don't think
this will come up that often.
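A rough sketch of the no-fallback rule for case (2), assuming the package's
directory is already known: resolve each component of this_package.x.y only
under that directory, using importlib.machinery.PathFinder with an explicit
search path, and raise ImportError instead of retrying on sys.path.  The
helper name find_in_own_package is hypothetical, and this is not how the
import system behaves today.

```python
from importlib.machinery import PathFinder


def find_in_own_package(dotted_name, package_name, package_dir):
    """Locate 'package_name.sub.mod' strictly inside package_dir.

    Hypothetical helper: there is deliberately no fallback to sys.path, so
    a miss raises ImportError instead of picking up a same-named module
    from somewhere else.  Returns the file the name resolves to.
    """
    head, _, rest = dotted_name.partition(".")
    if head != package_name or not rest:
        raise ValueError(f"{dotted_name!r} is not inside {package_name!r}")
    search = [package_dir]
    spec = None
    for part in rest.split("."):
        if search is None:
            # The previous component was a plain module, so it has no children.
            raise ImportError(f"{dotted_name!r}: {spec.origin!r} is not a package")
        # With an explicit path list, PathFinder searches only those entries.
        spec = PathFinder.find_spec(part, search)
        if spec is None:
            raise ImportError(f"{dotted_name!r} not found under {package_dir!r}")
        locations = spec.submodule_search_locations
        search = list(locations) if locations is not None else None
    return spec.origin
```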

<clipped general examples>

> Or you could have copied the code I wrote for the filesystem
> importer's find_module method that already does this classification.
> =)
> 
> Part of the problem of working backwards from path to dotted name is
> that it might not import that way.  

Maybe it should work that way?  If someone wants other than that behavior, then 
maybe there can be other ways to get it?

Here's an example of a situation where you might think it would be a problem,
but it isn't:

     pkg1:
       __init__.py
       m1.py
       spkg1:
          __init__.py
          m3.py
       dirA:
          m4.py
          pkg2:
             __init__.py
             m5.py

You might think it wouldn't work for pkg2.m5, but that's actually OK.  pkg2 is
a package that is just stored in dirA, which happens to be located inside
another package.

Running m5.py directly will run it as a submodule of pkg2, which is what you 
want.  It's not in a sub-package of pkg1.  And m4.py is just a regular module.
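Working backwards from a file path to a dotted name under this rule (a
directory counts as a package only if it contains __init__.py) can be
sketched in a few lines.  The helper name dotted_name_from_path is
hypothetical, and the sketch deliberately ignores __path__ tweaks, custom
importers and loaders, and non-.py file types.

```python
import os


def dotted_name_from_path(path):
    """Return (dotted_name, search_dir) for a .py file.

    Hypothetical helper: climbs upward while each enclosing directory holds
    an __init__.py and stops at the first directory that does not.  That
    directory is the one that would need to be on sys.path.
    """
    directory, filename = os.path.split(os.path.abspath(path))
    name = os.path.splitext(filename)[0]
    # A package's __init__.py is named after its directory, not "__init__".
    parts = [] if name == "__init__" else [name]
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        parent, package = os.path.split(directory)
        if not package:  # reached the filesystem root
            break
        parts.insert(0, package)
        directory = parent
    return ".".join(parts), directory
```

For the tree above, pkg1/dirA/pkg2/m5.py comes out as pkg2.m5 with pkg1/dirA
as the directory to search, pkg1/dirA/m4.py as plain m4, and
pkg1/spkg1/m3.py as pkg1.spkg1.m3.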

Or are you thinking of other relationships?

> __path__ can be tweaked, importers
> and loaders can be written to interpret the directory structure or
> file names differently, etc.

Yes, and they will need a basic set of well defined default behaviors to build 
on.  After that, it's up to them to be sure their interpretation does what they 
want.

> Plus what about different file types
> like .ptl files from Quixote?

This is really a matter of using a corresponding file reader to get at its
contents or its real (relative to Python) type.  I.e., is it really a module, a
package, a module in a package, or some other thing... living inside of a zip,
or some other device- (or file-) like container?

>> >> (4)  import module
>> >>       import package
>> >>
>> >> Module and package are not in a package, so don't look in any packages,
>> >> even this one or sys.path locations inside of packages.
>> >>
>> >
>> > This is already done.  Absolute imports would cause this to do a
>> > shallow check on sys.path for the module or package name.
>>
>> Great! 2 down.  Almost half way there.  :-)
>>
>> But will it check the current directory if you run a module directly,
>> because currently it doesn't know if it's part of a package?  Is that correct?
> 
> Absolute import semantics go straight to sys.path, period.

Which includes the current directory.  So in effect it will fall back to a
relative kind of behavior if a module with the name being imported exists in
the current directory (inside this package), *if* you execute the module
directly.

I think this should also give an error; it is the inverse of the situation
above (#2).  In most cases (if not all) it's not what you want.

You wanted a module that is not part of this module's package, and got one that is.
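The inverse case can be checked mechanically too.  A sketch, assuming the
script's own directory is what ends up first on sys.path when it is run
directly: look the name up with PathFinder over a stand-in for sys.path and
raise ImportError if it resolves to a sibling inside the script's package.
check_absolute_import is a hypothetical name, not an existing import hook.

```python
import os
from importlib.machinery import PathFinder


def check_absolute_import(name, script_path, search_path):
    """Raise ImportError if absolute import *name* would pick up a sibling
    of *script_path* inside the script's own package directory.

    Hypothetical check: *search_path* stands in for sys.path, whose first
    entry is the script's directory when the script is run directly.
    Returns the module spec that was found, or None.
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    if not os.path.isfile(os.path.join(script_dir, "__init__.py")):
        return None  # the script is not inside a package; nothing to shadow
    spec = PathFinder.find_spec(name, search_path)
    if spec is None or spec.origin is None:
        return spec  # not found, or a namespace package with no single file
    location = spec.origin
    if spec.submodule_search_locations is not None:
        location = os.path.dirname(location)  # a package: use its directory
    if os.path.dirname(os.path.abspath(location)) == script_dir:
        raise ImportError(f"absolute import {name!r} resolved to {location!r}, "
                          f"inside the script's own package {script_dir!r}")
    return spec
```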

>> >> MOTIVATION
>> >> ==========
>> >>
>> >> (A) Added reliability.
>> >>
>> >> There will be much less chance of errors (silent or otherwise) due to
>> >> path/import conflicts which are sometimes difficult to diagnose.
>> >>
>> >
>> > Probably, but I don't know if the implementation complexity warrants
>> > worrying about this.  But then again how many people have actually
>> > needed to implement the import machinery.  =)  I could be labeled as
>> > jaded.
>>
>> Well, I know it's not an easy thing to do.  But it's not finding the paths
>> or working out whether files are modules, etc., that is hard.  From what I
>> understand, the hard part is making it work so it can be extended and
>> customized.
>>
>> Is that correct?
> 
> Yes.  I really think ditching this whole __main__ name thing is going
> to be the only solid solution.  Defining a __main__() method for
> modules that gets executed makes the most sense to me.  Just import
> the module and then execute the function if it exists.  That allows
> runpy to have the name be set properly and does away with import
> problems without mucking with import semantics.  Still have the name
> problem if you specify a file directly on the command line, though.

I'll have to see more details of how this would work, I think.  Part of me says
that sounds good.  And another part says: isn't this just moving stuff around?
And what exactly does that solve?
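To make Brett's proposal concrete, here is one possible reading of it as a
small launcher, assuming a module opts in by defining a function named
__main__: import the module under its real dotted name with
importlib.import_module(), then call __main__() if it exists.  The launcher
name run_module_main is hypothetical, the __main__() hook is only a proposal
here, and the command-line file case Brett mentions is not handled.

```python
import importlib


def run_module_main(dotted_name, *args):
    """Import *dotted_name* normally, then call its __main__() if defined.

    Because the module is imported under its real name, __name__ and
    __package__ are set as for any other import, so relative imports
    inside it keep working.
    """
    module = importlib.import_module(dotted_name)
    entry = getattr(module, "__main__", None)
    if not callable(entry):
        raise SystemExit(f"{dotted_name} defines no __main__() function")
    return entry(*args)
```

The trade-off is the one raised above: the "am I main?" check moves out of the
module and into the launcher, rather than disappearing.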

>> The import machinery could then use those to determine how to handle imports
>> in that module.
>>
>> Is that clearer?
> 
> It is, but I don't like it.  =)

It doesn't exactly have to work that way.  ;-)

It's the "does it do what I designed it to do" behavioral stuff of packages and 
modules that I want.  If the module, however it is run, gives an error or does
something other than what I intended, then that's a problem.

Ron