[Python-ideas] start, test, init

Sun Dec 1 13:45:52 CET 2013

Hello,

This is a proposal and opinion I have been thinking of sending for a long time 
already, but did not because it is not specific to Python (there are a few more 
in this category ;-). What finally decided me is the thread about the idiom "if 
__name__ == '__main__':", which drew a lot of interest! Below is a short summary 
of my views, followed by a longer series of comments.

Summary:
There should be 3 top-level functions predefined by the language:
* 'start' : start func for a module run as an app (or 'main')
* 'test'  : main func of a module's test suite
* 'init'  : module init code, run when the module is imported
In my view, all 3 correspond to clearly defined and distinct functionalities. 
Each of them provides clarity, and together they make it possible to get rid of 
"lost code" roaming around at the top level of modules, which I dislike, even 
for scripts; however, the proposal does not force anyone to follow this style. 
Every module using such functions is then a set of definitions: assignments, 
def's, classes, plus imports and such. Up to 3 of them may be the 
language-defined main functions. Very clean...

The execution logic would be:
* if imported module:
     ~ run init if any
     ~ else, run nothing
* if executed module:
     ~ run test, if any (see below)
     ~ else, run start, if any
     ~ else, run nothing (error?)
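
Until the language does this itself, the logic above can be approximated in 
today's Python. The following is only a sketch, assuming a hypothetical helper 
`run_module` that a module would call on its own globals as its very last line; 
neither the helper nor this behaviour exists in Python.

```python
def run_module(namespace):
    """Hypothetical emulation of the proposed start/test/init dispatch.

    `namespace` is the calling module's globals(); returns the result of
    whichever entry point was run, or None if none applies.
    """
    if namespace.get("__name__") != "__main__":
        # Imported module: run init if any, else nothing.
        init = namespace.get("init")
        return init() if callable(init) else None
    # Executed module: test has precedence over start.
    for name in ("test", "start"):
        func = namespace.get(name)
        if callable(func):
            return func()
    return None  # nothing to run (or should this be an error?)
```

A module would then end with `run_module(globals())` instead of the usual 
`if __name__ == '__main__':` stanza.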

What do you think?

=== why start? ===

It looks nice to have hello world reduced to:

     print("Hello, world!")

However, this is a very superficial nicety. I prefer the (very big!) 
complication of:

     def start():
         print("Hello, world!")

This code is self-explanatory even for a novice programmer, and it nicely 
introduces a structural feature of the language. [0]

This function 'start' would have an optional argument for command-line args; it 
may indeed also return an exit code.
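
With today's idiom, the same shape can be written by hand. A sketch, assuming 
`start` receives the arguments after the program name and returns an integer 
exit code; the greeting and the default name are illustrative only.

```python
import sys

def start(args=None):
    """Entry point for the module run as an app; returns an exit code."""
    if args is None:
        args = sys.argv[1:]          # command-line args, minus the program name
    name = args[0] if args else "world"
    print(f"Hello, {name}!")
    return 0                         # 0 conventionally means success
```

The module would end with `if __name__ == '__main__': sys.exit(start())`, so 
that the return value becomes the process exit status.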

About the name 'start', well, it's a question of meaning; to avoid confusion 
I'd rather reserve 'main' for a package's main module. Also, each of these 3 
funcs is a 'main' one, depending on the actual case. But I would *not* fight on 
this point; call it 'main' if you like. (In assembly, it is commonly called 
start as well: both ends of the expressivity scale meet here ;-).

When compiling for start, the language could by default also automagically 
strip out development or control instructions, like assertions. (There could 
also be a separate debug print command... but that's another point. The same 
goes for different error messages for end-users.)
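
Part of this already exists: running Python with `-O` removes `assert` 
statements and any code conditional on `__debug__`, which is then False. A 
separate debug print can be sketched on top of that; `debug_print` is a 
hypothetical name, not a builtin.

```python
def debug_print(*args, **kwargs):
    """Print only in development runs; silent under `python -O`."""
    if __debug__:
        print("[debug]", *args, **kwargs)

def average(values):
    # Development check, removed entirely when running with -O.
    assert values, "average() needs at least one value"
    result = sum(values) / len(values)
    debug_print("average of", len(values), "values:", result)
    return result
```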

=== why test? ===

If you practice testing (in code), you know why. Nearly all of my modules end 
with:

# ===   t e s t   =======================

def test_abc ():
     ...
def test_def ():
     ...
def test_ghi ():
     ...
...
def test ():
     test_abc()
     test_def()
     test_ghi()
     ...
if __name__ == '__main__':
     test()

Then, I comment out the last 2 lines when all runs fine; or else, if it's the 
main module of an app, I replace them with a call to start.

This is a second way of running a module as a stand-alone program (from the 
command line). Actually, I guess anyone practicing a minimum of testing presses 
the 'test' button, so to speak, constantly, far more often than they launch an 
app in usage or trial mode (if it is an app at all). We switch to normal usage 
execution only once per development phase, when all is fine and we prepare a 
user release. This is why, by default, when both exist, 'test' has precedence 
over 'start'. There may be a builtin config var (eg __main__) to switch to start 
(eg __main__ = start), or some other mechanism, but preferably one set from code 
(see also [1]).
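
A sketch of how such a switch could behave if emulated today, assuming a 
module-level name `__main__` that, when bound to a function, overrides the 
default test-before-start precedence. Both that meaning of `__main__` and the 
`choose_entry` helper are hypothetical.

```python
def choose_entry(namespace):
    """Pick the function to run for an executed module (hypothetical).

    By default 'test' wins over 'start'; a module-level assignment such as
    `__main__ = start` overrides that choice from the code itself.
    """
    override = namespace.get("__main__")
    if callable(override):
        return override
    for name in ("test", "start"):
        func = namespace.get(name)
        if callable(func):
            return func
    return None
```

Switching a module to normal usage is then a one-line change in the module 
itself, rather than a change to the command line or the environment.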

This function 'test' _may_ have an optional argument for command-line args; it 
may also return an exit code, here meaning test success / failure (why not 
number of failures?). Args may be handy to drive testing differently: exact 
funcs to run, form of output, depth of testing..., in a purely user-defined way 
(no language-defined meaning, else we'd never stop arguing on the topic; a 
typical bikeshed issue). [1]
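
One possible `test` along these lines, sketched under the assumptions that 
`args` names the test functions to run (all of them when empty) and that the 
return value is the number of failures. The argument meaning is deliberately 
user-defined; this is just one choice, and the test bodies are placeholders.

```python
def test_abc():
    assert "abc".upper() == "ABC"

def test_def():
    assert sorted("fed") == ["d", "e", "f"]

def test_ghi():
    assert "ghi".startswith("g")

# Registry of the module's tests, by name.
TESTS = {f.__name__: f for f in (test_abc, test_def, test_ghi)}

def test(args=()):
    """Run the tests named in `args` (all if empty); return the failure count."""
    failures = 0
    for name in args or list(TESTS):
        func = TESTS.get(name)
        if func is None:
            print(f"FAIL {name}: no such test")
            failures += 1
            continue
        try:
            func()
        except AssertionError as exc:
            print(f"FAIL {name}: {exc!r}")
            failures += 1
    return failures
```

Used as an exit code, 0 then means success; a large failure count would need 
capping, since POSIX exit statuses keep only the low 8 bits.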

=== why init? ===

In my personal practice there is much less use for such an init function, 
typically run when a module is imported. (I know it from the D language.) But I 
guess that whenever we need it, having it is very nice. I simulate it (1) to 
init program elements from external data, (2) to import, scan and process big 
data files like Unicode tables, avoiding huge code-data files and/or security 
issues, and (3) for uses similar to compile-time computation in static 
languages that provide it. Anyway, it is a clearly defined functionality, just 
like start & test.
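
As an illustration of case (2), here is a sketch that builds a lookup table 
once, with the standard `unicodedata` module standing in for an external 
Unicode data file. The table, its range and the explicit call at the bottom are 
assumptions: today the call has to be written by hand, where the proposal would 
make it implicit on import.

```python
import unicodedata

# Filled by init(); maps each ASCII character to its Unicode general category.
CATEGORY = {}

def init():
    """Module init code: (re)build the lookup table."""
    CATEGORY.clear()
    for code_point in range(128):
        char = chr(code_point)
        CATEGORY[char] = unicodedata.category(char)

# Today's stand-in for "run init when imported".
if __name__ != "__main__":
    init()
```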

=== flexibility ===

An init func may be run by test or start, conditionally or not.
A test func may be run by init or start; for instance, an app or package might 
test itself just once, on its first launch after installation.
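
A sketch of that last case, assuming the "first launch" is detected with a 
marker file; the marker's name and location are hypothetical, as are the 
placeholder bodies of `init` and `test`.

```python
import os
import tempfile

# Hypothetical marker recording that the post-install self-test passed.
MARKER = os.path.join(tempfile.gettempdir(), "example_app.selftest_ok")

def init():
    """Module init code (placeholder)."""

def test():
    """Module test suite (placeholder); returns the number of failures."""
    failures = 0
    if sum([1, 2, 3]) != 6:
        failures += 1
    return failures

def start(args=None):
    init()                              # init run by start, unconditionally
    if not os.path.exists(MARKER):      # test run by start, on first launch only
        if test():
            print("self-test failed; aborting")
            return 1
        with open(MARKER, "w"):
            pass                        # create the marker file
    # ... normal processing of args ...
    return 0
```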

=== alternative ===

Apart from clarity and practicality (see below), such flexibility is also why I 
prefer such predefined function names to a proposal by Steven D'Aprano, in the 
thread mentioned above, of builtin conditionals like:
if __main__:
     ...
if is_main():
     ...
if is_main:
     ...

We could however trivially extend this proposal with eg is_tested & is_imported 
builtin conditionals. But why not choose the direct simplicity of builtin func 
names? This is a common Python usage (it would also introduce novices to the 
idea of special func names, so that they are then ready for "magic methods").
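
For comparison, such conditionals can be emulated today with frame inspection. 
This is a sketch only: `is_main` and `is_imported` are hypothetical functions, 
and `sys._getframe` is a CPython implementation detail that other interpreters 
need not provide.

```python
import sys

def _caller_module_name():
    # Frame 0 is this helper, frame 1 the is_main/is_imported wrapper,
    # frame 2 the code that called the wrapper.
    return sys._getframe(2).f_globals.get("__name__")

def is_main():
    """True if the calling module is being run as the main program."""
    return _caller_module_name() == "__main__"

def is_imported():
    """True if the calling module was imported rather than executed."""
    return _caller_module_name() != "__main__"
```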

Another issue is practical: writing such a function or not, or choosing between 
test and start, is something we control from the code itself. To launch tests, 
for instance, there is thus no need to run a module with a special command-line 
option, or to change our favorite editor's settings, or to change environment 
variables or Python config files... all pretty annoying things to do (and I 
never remember which one is to be done, where, and how exactly...). By 
contrast, predefined conditionals depend on such settings or data external to 
the code. [2]

=== why simple names? ===

Again, I would not fight for simple names; it may in fact well be preferable to 
have weird names with underscores. In my view, the need in Python for names like 
__all__ or __str__ is due to the fact that the language makes no difference 
between defining and redefining a symbol. If Python used eg ':=' to redefine 
symbols, then:

foo = ...	# error, symbol 'foo' already exists
foo := ...	# ok, intentional redefinition of builtin 'foo'

Until then, it may indeed be preferable to have weird names.

=== trial mode ===

There is one mode not covered by this: what I call trial, as opposed to in-code 
tests. This corresponds more or less to a beta version: a simulation of normal 
usage by developers or (power-)users, to find further bugs or, more generally, 
to check the software as it is. I don't know of cases where this would require a 
dedicated main func: that would rather run counter to the idea of checking the 
program exactly as it is. But there may be code parts interpreted differently or 
conditionally compiled, unlike in an actual user release: assertions, debug 
prints, stats, benchmarking, profiling... all kinds of "meta-programming" 
issues.

=== all together ===

''' d o c
'''

import foo          # (placeholder imports)
__all__ = ...

# code...
# code...
# code...

def test_abc ():
     ...
def test_def ():
     ...
def test_ghi ():
     ...
...
def test ():
     test_abc()
     test_def()
     test_ghi()
     ...

def init ():
     ...

# __main__ = start
def start (args):
     init()
     # process args
     ...

What do you think? (bis)

Denis

[0] More generally, Python may lack this kind of higher-level code structure; 
this is an open question.

[1] This feature is for command-line args fans, but I prefer to control testing 
from the code itself. For information, I have 2 standard test config params, 
which I define in code, controlling whether to:
* write successful checks too, for comparison
* continue on check failures, or else stop
(They affect a custom check func, similar to an assert, used for testing only.) 
Turning them off turns a module's diagnostic test suite into (part of) a 
regression test suite.
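
Such a check func might look like the sketch below; `check`, `CheckFailure`, 
`WRITE_SUCCESSES` and `CONTINUE_ON_FAILURE` are hypothetical names for what is 
described above, and the output format is an assumption.

```python
# Test config params, defined in code rather than on the command line.
WRITE_SUCCESSES = True       # also report passing checks, for comparison
CONTINUE_ON_FAILURE = True   # keep going after a failed check, else stop

class CheckFailure(Exception):
    """Raised by check() when CONTINUE_ON_FAILURE is off."""

def check(label, actual, expected):
    """Assert-like check used only in tests; returns True if it passed."""
    if actual == expected:
        if WRITE_SUCCESSES:
            print(f"ok   {label}: {actual!r}")
        return True
    message = f"FAIL {label}: got {actual!r}, expected {expected!r}"
    if not CONTINUE_ON_FAILURE:
        raise CheckFailure(message)
    print(message)
    return False
```

With both params off, passing checks are silent and the first failure stops the 
run, which is the regression-suite behaviour.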

[2] More generally, I want to control everything that concerns code and its 
development from the code itself. Command-line args and such (preferably 
command-line args only) are good only for very special needs like producing 
docs, a parse tree, compile-only, profiling, etc. I definitely dislike languages 
that force people to constantly change the command line to compile/run, or 
worse, to change env vars or edit config files (and I won't even mention 
makefiles and such...).