__main__ vs official module name: distinct module instances

Sun Aug 2 20:57:44 EDT 2015

On 02Aug2015 17:41, Steven D'Aprano <steve at pearwood.info> wrote:
>On Sun, 2 Aug 2015 01:53 pm, Cameron Simpson wrote:
>> Maybe this should be over in python-ideas, since there is a proposal down
>> the bottom of this message. But first the background...
>>
>> I've just wasted a silly amount of time debugging an issue that really I
>> know about, but had forgotten.
>
>:-)
>
>
>> I have a number of modules which include a main() function, and down the
>> bottom this code:
>>
>>   if __name__ == '__main__':
>>     sys.exit(main(sys.argv))
>>
>> so that I have a convenient command line tool if I invoke the module
>> directly. I typically have tiny shell wrappers like this:
>>
>>   #!/bin/sh
>>   exec python -m cs.app.maildb -- ${1+"$@"}

TL;DR: pertinent discussion around my proposal is lower down. First I digress 
into Steven's shell query.

>I know this isn't really relevant to your problem, but why use "exec python"
>instead of just "python"?

Saves a process. Who needs a shell process just hanging around waiting? Think 
of this as tail recursion optimisation.

>And can you explain the -- ${1+"$@"} bit for somebody who knows just enough
>sh to know that it looks useful but not enough to know exactly what it
>does?

Ah.

In a modern shell one can just write "$@". I prefer portable code.

The more complicated version, which I use everywhere because it is portable, 
has to do with the behaviour of the $@ special variable. As you know, $* is the 
command line arguments as a single string, which is useless if you need to 
preserve them intact. "$@" is the command line arguments correctly quoted.

Unlike every other "$foo" variable, which produces a single string, "$@" 
produces all the command line arguments as separate strings. Critical for 
passing them correctly to other commands. HOWEVER, in some old shells, if there 
are no arguments then "$@" produces a single empty string. Not desired. It is 
either a very old bug or a deliberate decision that no "$foo" shall utterly 
vanish.

Thus this:

  ${1+"$@"}

Consulting your nearest "man sh" in the PARAMETER SUBSTITUTION section you will 
see that this only inserts "$@" if there is at least one argument, avoiding the 
"$@" => "" with no arguments. It does this by only inserting "$@" if $1 is 
defined. Sneaky and reliable.

I believe in a modern shell "$@" with no arguments correctly expands to nothing 
(a _bare_ $@ is still subject to word splitting), but I always use the 
incantation above for portability.
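
A minimal sketch of what the incantation does, driven from Python so it can be 
checked mechanically. It assumes a POSIX "sh" is on the PATH; the helper name 
show_args and the bracketed output format are illustrative only.

```python
import subprocess

# Shell snippet: loop over ${1+"$@"} and print each argument in brackets,
# so that argument boundaries are visible in the output.
SCRIPT = 'for arg in ${1+"$@"}; do printf "[%s]\\n" "$arg"; done'


def show_args(*args):
    """Run SCRIPT under sh with the given arguments; return its output lines."""
    result = subprocess.run(
        ["sh", "-c", SCRIPT, "sh", *args],  # the second "sh" becomes $0
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(show_args("a b", "c"))  # arguments arrive intact, spaces preserved
    print(show_args())            # no arguments: the expansion vanishes
```

With arguments, each one is passed through as a separate string; with none, 
the loop body never runs because $1 is unset and nothing is inserted.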

>> In short, invoke this module as a main program, passing in the command
>> line arguments. Very useful.
>>
>> My problem?
>>
>> When invoked this way, the module cs.app.maildb that is being executed is
>> actually the module named "__main__".
>
>Yep. Now, what you could do in cs.app.maildb is this:
>
># untested, but should work
>if __name__ == '__main__':
>    import sys
>    sys.modules['cs.app.maildb'] = sys.modules[__name__]
>    sys.exit(main(sys.argv))

Yes, but that is ghastly and complicated. And it also relies on the boilerplate 
at the bottom knowing the module name.

>*** but that's the wrong solution ***

It is suboptimal. "Wrong" seems a stretch.

>The problem here is that by the time cs.app.maildb runs, some other part of
>cs or cs.app may have already imported it. The trick of setting the module
>object under both names can only work if you can guarantee to run this
>before importing anything that does a circular import of cs.app.maildb.

That can be done if it takes place in the python interpreter. But there are 
side effects which need to be considered.

My initial objective is that:

  python -m cs.app.maildb

should import cs.app.maildb under the supplied name instead of "__main__" so 
that a recursive import did not instantiate a second module instance. That is, 
I think, a natural thing for users to expect from the above command line: 
"import cs.app.maildb, run its main program".
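
The current behaviour can be reproduced with a small sketch. The module name 
"dupdemo" and the helper run_demo are hypothetical, made up for illustration; 
the sketch writes a one-file module to a temporary directory and runs it with 
"python -m" in a child process.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical single-file module; "dupdemo" is an illustrative name.
MODULE_SOURCE = '''\
import sys

def main(argv):
    import dupdemo                      # import ourselves by the official name
    running = sys.modules["__main__"]   # the instance executing right now
    print(dupdemo is running)
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv))
'''


def run_demo():
    """Run "python -m dupdemo" in a scratch directory; return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "dupdemo.py"), "w") as f:
            f.write(MODULE_SOURCE)
        env = dict(os.environ, PYTHONPATH=tmp)  # make the module importable
        result = subprocess.run(
            [sys.executable, "-m", "dupdemo"],
            cwd=tmp,
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_demo())
```

The child prints False: the module running as "__main__" and the one obtained 
by "import dupdemo" are two distinct module objects built from the same file.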

On further thought last night I devised the logic below to implement python's 
"-m" option:

  # pseudocode, with values hardwired for clarity
  import sys
  M = new_empty_module(name='__main__', qualname='cs.app.maildb')
  sys.modules['cs.app.maildb'] = M
  M.execfile('/path/to/cs/app/maildb.py')   # you know what I mean...

The "qualname" above is an idea I thought of last night to allow introspection 
to cope with '__main__' and 'cs.app.maildb' at the same time, somewhat like the 
.__qualname__ attribute of a function as recently added to the language; under 
this scheme a module would get a __name__ and a __qualname__, normally the 
same, but __name__ set to '__main__' for the "main program" module situation.

This should sidestep any issues with recursive imports by having the module in 
place in sys.modules ahead of the running of its code.
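
A runnable approximation of that pseudocode, as a sketch only: 
run_module_as_main is a hypothetical helper, not an existing interpreter 
facility, the module __qualname__ attribute is the proposed one rather than 
anything Python currently sets, and "propdemo" is an invented module name.

```python
import os
import sys
import tempfile
import types

# Hypothetical module that imports itself by its official name when run.
MODULE_SOURCE = '''\
import sys

def main(argv):
    import propdemo                       # recursive import, official name
    return propdemo.__dict__ is globals() # same namespace => same instance

if __name__ == "__main__":
    SAME_INSTANCE = main(sys.argv)
'''


def run_module_as_main(qualname, path):
    """Sketch of the proposed "-m" logic (hypothetical helper)."""
    module = types.ModuleType("__main__")
    module.__qualname__ = qualname    # proposed attribute, assumption only
    module.__file__ = path
    sys.modules[qualname] = module    # registered before any code runs
    with open(path) as f:
        code = compile(f.read(), path, "exec")
    try:
        exec(code, module.__dict__)
    finally:
        sys.modules.pop(qualname, None)  # do not leak the demo module
    return module


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "propdemo.py")
        with open(path, "w") as f:
            f.write(MODULE_SOURCE)
        module = run_module_as_main("propdemo", path)
        return module.__name__, module.SAME_INSTANCE


if __name__ == "__main__":
    print(demo())
```

Here the recursive import finds the already registered module, so no second 
instance is created, while the usual __name__ == "__main__" test still fires.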

>The right existing solution is to avoid having the same module do
>double-duty as both runnable script and importable module.

I disagree. Supporting this double duty is, to me, a highly desirable feature.  
This is, in fact, a primary purpose of the present standard boilerplate.

I _like_ that: a single file, short and succinct.

>In a package,
>that's easy. Here's your package structure:
>
>cs
>+-- __init__.py
>+-- app
>    +-- __init__.py
>    +-- maildb.py
>
>and possibly others. Every module that you want to be a runnable script
>becomes a submodule with a __main__.py file:
>
>cs
>+-- __init__.py
>+-- __main__.py
>+-- app
>    +-- __init__.py
>    +-- __main__.py
[...]

Yes, nicely separated, but massive structural overkill for simple things like 
single file modules.

>and now you can call:
>
>python -m cs
>python -m cs.app
>python -m cs.app.maildb
>
>as needed. The __main__.py files look like this:
>
>if __name__ == '__main__':
>    import sys
>    import cs.app.maildb
>    sys.exit(cs.app.maildb.main(sys.argv))
>
>or as appropriate.
>
>Yes, it's a bit more work. If your package has 30 modules, and every one is
>runnable, that's a lot more work. But if your package is that, um,
>intricate, then perhaps it needs a redesign?

  [hg/css]fleet*> grep '__name__ == .__main__' cs/**/*.py|wc -l
        96

No, it is simply my personal kit. The design is ok for what it is. Pieces of it 
are slowly being published on PyPI as they become publishable (beta or better 
quality, proper distinfo metadata applied, checked to not import unpublished 
modules, not import gratuitous tissue paper modules, free of most debugging or 
off topic cruft, etc).

To be honest, the majority of those __main__ calls actually run the unit tests 
for that module, not a proper "main program". A better grep:

  [hg/css-nodedb]fleet*> grep 'main(sys.argv)' cs/**/*.py|wc -l
        14

says just 14. Far saner; those are modules/packages for which there really is 
an associated command line tool.

>The major use-case for this feature is where you have a package, and you
>want it to have a single entry point when running it as a script. (That
>would be "python -m cs" in the example above.) But it can be used when you
>have multiple entry points too.
>
>For a single .py file, you can usually assume that when you are running it
>as a stand alone script, there are no circular imports of itself:
>
># spam.py
>import eggs
>if __name__ == '__main__':
>    main()
>
># eggs.py
>import spam  # circular import
>
>If that expectation is violated, then you can run into the trouble you
>already did.

As described, that expectation was violated. In the normal course of affairs 
one rarely trips over it.

>So...
>* you can safely combine importable module and runnable script in
>  the one file, provided the runnable script functionality doesn't
>  depend on importing itself under the original name (either
>  directly or indirectly);
>
>* if you must violate that expectation, the safest solution is to
>  make the module a package with a __main__.py file that contains
>  the runnable script portion;

My proposal above is to solve this issue without requiring the breaking of a 
module into a multifile package just to address a counterintuitive edge case, 
and to avoid cognitive dissonance for Python users when they do traverse that 
edge case.
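
For contrast, the multifile workaround can be sketched as below. The package 
name "pkgdemo", its "core" submodule and the helper run_package_demo are all 
hypothetical; the sketch builds the package in a temporary directory and runs 
it with "python -m" in a child process.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical package layout: the real code lives in core.py and
# __main__.py is only a thin launcher.
FILES = {
    "pkgdemo/__init__.py": "",
    "pkgdemo/core.py": (
        "import sys\n"
        "def main(argv):\n"
        "    import pkgdemo.core as again\n"
        "    print(__name__, again is sys.modules[__name__])\n"
        "    return 0\n"
    ),
    "pkgdemo/__main__.py": (
        "import sys\n"
        "import pkgdemo.core\n"
        "sys.exit(pkgdemo.core.main(sys.argv))\n"
    ),
}


def run_package_demo():
    """Write the package to a scratch directory and run "python -m pkgdemo"."""
    with tempfile.TemporaryDirectory() as tmp:
        for relpath, source in FILES.items():
            path = os.path.join(tmp, *relpath.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write(source)
        env = dict(os.environ, PYTHONPATH=tmp)  # make the package importable
        result = subprocess.run(
            [sys.executable, "-m", "pkgdemo"],
            cwd=tmp,
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_package_demo())
```

Only the launcher runs as "__main__"; the code in core.py keeps its official 
name and a single instance, at the cost of three files instead of one.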

I want "python -m foo" to accomplish more closely what the naive user expects.

>* if you don't wish to do that, you're screwed, and I think that the
>  best you can do is program defensively by detecting the problem
>  after the event and bailing out:
>
>  # untested
>  import os
>  import __main__
>  import myactualfilename
>  if os.path.samefile(__main__.__file__, myactualfilename.__file__):
>      raise RuntimeError

Nasty and defeatist! I rail against this mode of thought! :-)

Anyway, I'm about to raise my proposed implementation change higher up over on 
python-ideas with a plan to write a PEP if I don't get fundamental objections 
(i.e. "this breaks everything" versus your "you can work around it in these 
[cumbersome] ways").

Cheers,
Cameron Simpson <cs at zip.com.au>

"My manner of thinking, so you say, cannot be approved. Do you suppose I
care? A poor fool indeed is he who adopts a manner of thinking for others!
My manner of thinking stems straight from my considered  reflections; it
holds with my existence, with the way I am made. It is not in my power to
alter it; and were it, I'd not do so." Donatien Alphonse Francois de Sade
