[Tutor] inter-module global variable

Sun Mar 28 15:15:47 CEST 2010

On Sun, 28 Mar 2010 21:50:46 +1100
Steven D'Aprano <steve at pearwood.info> wrote:

> On Sun, 28 Mar 2010 08:31:57 pm spir ☣ wrote:
> > Hello,
> >
> > I have a main module importing other modules and defining a top-level
> > variable, call it 'w' [1]. I naively thought that the code from an
> > imported module, when called from main, would know about w, 
> 
> Why would it?
> 
> If you write a module M, you can't control what names exist in the 
> calling module, and you shouldn't have to. Imagine if you wrote a 
> module containing a function f, and it was imported by another module 
> also containing f, and then suddenly all your module's functions 
> stopped working! This would be a disaster:

Right, this makes sense, indeed, thank you for showing it clearly.

> > but I 
> > have name errors. The initial trial looks as follows (this is just a
> > sketch, the original is too big and complicated):
> >
> > # imported "code" module
> > __all__ = ["NameLookup", "Literal", "Assignment", ...]
> >
> > # main module
> > from parser import parser
> 
> By the way, have you looked at PyParsing? This is considered by many to 
> be the gold standard in Python parsing libraries.

Yes, I used to know pyparsing very well... (Including most of its internal arcana. In fact, I wrote several matching/parsing/processing libraries after having worked with it. This is because, on one hand, I like pyparsing's basic approach of a "live" code grammar, but on the other hand, its way of doing things does not fit my brain.)

> > from code import *
> 
> This is discouraged strongly. What happens if the code module has 
> something called parser? Or len?

Yes, I know. But as you can read above, the exported names are explicitly listed in __all__ (and are all classes). Also, the importing module only does one thing, which is precisely to use those imported names. Similarly, a grammar/parser definition imports all pattern classes (from my own matching library, as it would from pyparsing) into a module dedicated to the definition of the language. I consider this good practice as long as it is done consciously.
There is no difference, I guess, between

# "matchBox" matching module
__all__ = [all pattern class names + "Parser"]
# parser module
from matchBox import *

and

# parser module
from matchBox import <all pattern classes + Parser>
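
To check that guess, here is a minimal Python 3 sketch. The module name matchbox_demo and everything inside it are stand-ins invented for the illustration; the module is built in memory only so that the example fits in one file.

```python
import sys
import types

# Hypothetical stand-in for the "matchBox" module, built in memory.
mod = types.ModuleType("matchbox_demo")
exec(
    "__all__ = ['Pattern', 'Parser']\n"
    "class Pattern: pass\n"
    "class Parser: pass\n"
    "def unlisted(): pass\n",      # public-looking, but not in __all__
    mod.__dict__,
)
sys.modules["matchbox_demo"] = mod   # make it importable by name

# Form 1: star import, filtered by __all__.
star_ns = {}
exec("from matchbox_demo import *", star_ns)

# Form 2: the same names spelled out.
explicit_ns = {}
exec("from matchbox_demo import Pattern, Parser", explicit_ns)

star_names = sorted(n for n in star_ns if n != "__builtins__")
explicit_names = sorted(n for n in explicit_ns if n != "__builtins__")
print(star_names)       # ['Parser', 'Pattern']
print(explicit_names)   # ['Parser', 'Pattern']
```

The two forms bind the same names here. The remaining difference is one of maintenance: the star form follows whatever __all__ lists at import time, so a later addition to __all__ silently widens what the importing module receives.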

> > from scope import Scope, World
> > w = World()
> >
> > This pattern failed as said above. 
> 
> What do you mean "failed"? Nothing you show is obviously broken.

I mean the name error about 'w'.

> > So, I tried to "export" w: 
> >
> > # imported "code" module
> > __all__ = ["NameLookup", "Literal", "Assignment", ...]
> >
> > # main module
> > from parser import parser
> > from scope import Scope, World
> > w = World()
> > import code		#    new
> > code.w = w		### "export"
> > from code import *
> >
> > And this works. I had the impression that the alteration of the
> > "code" module object would not propagate to objects imported from
> > "code". But it works. 
> 
> It sounds like you are trying to write PHP code in Python.

I know next to nothing about PHP.

> > But I find this terribly unclear, fragile, and 
> > dangerous, for any reason. (I find this "dark", in fact ;-) Would
> > someone try to explain what actually happens in such case? 
> 
> Yep, sounds like PHP code :)
> 
> Every function and class in a module stores a reference to their 
> enclosing globals, so that when you do this:
> 
> # module A.py
> x = "Hello world"
> 
> def f():
>     print x
> 
> 
> # module B.py
> from A import f
> f()
> => prints "Hello world" as expected.
> 
> 
> You don't have to do anything to make this work: every class and 
> function knows what namespace it belongs to.

All right.
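
A minimal Python 3 sketch of the point above, with module A built in memory (via types.ModuleType, purely so that the example fits in one file). It shows that an imported function keeps looking up globals in the module where it was defined, not in the module that calls it.

```python
import types

# Stand-in for A.py, built in memory.
A = types.ModuleType("A")
exec(
    "x = 'Hello world'\n"
    "def f():\n"
    "    return x\n",
    A.__dict__,
)

# Equivalent of "from A import f" in module B: only the name f is bound here.
f = A.f
x = "Goodbye cruel world!"   # a global of *this* module, invisible to f

result = f()
same_namespace = f.__globals__ is A.__dict__
print(result)           # Hello world
print(same_namespace)   # True
```

The function object carries a reference to A's namespace in f.__globals__, which is why the caller's own x is never consulted.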

> I can only imagine you're trying to do this:
> 
> # module A.py
> x = "Hello world"
> 
> def f():
>     print x
> 
> 
> # module B.py
> x = "Goodbye cruel world!"
> from A import f
> f()
> => prints "Goodbye cruel world!"

Hem, more or less. But the main difference is that x is not and cannot be defined in A (except for a fake initialisation to None, as for an instance variable). Its actual value can only come from the module that imports A, here B.
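
A minimal Python 3 sketch of that situation, again with A built in memory as a stand-in for a real file. It illustrates why the "code.w = w" trick works: assigning an attribute on the module object rebinds the very global that A's functions look up. It also shows the pitfall of copying the name with a from-import.

```python
import types

# Stand-in for A.py: x is only a placeholder there.
A = types.ModuleType("A")
exec(
    "x = None\n"
    "def f():\n"
    "    return x\n",
    A.__dict__,
)

before = A.f()           # placeholder value
A.x = "set by B"         # what "code.w = w" does from the main module
after = A.f()            # f sees the new binding

# Equivalent of "from A import x": a copy of the binding, not a live link.
x = A.x
A.x = "rebound later"
copied = x
live = A.f()

print(before)   # None
print(after)    # set by B
print(copied)   # set by B
print(live)     # rebound later
```

So the assignment must go through the module object (A.x = ...), and it must happen before any code in A that relies on x is run.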

> This is bad design. You might think you need it, but in the long run you 
> will regret it. You are mixing up arguments and globals. If you want 
> the result of f() to depend on the local value of x, then make it take 
> an argument:
> 
> def f(x):
>     print x
> 
> and call it:
> 
> f(x)
> 
> 
> http://c2.com/cgi/wiki?GlobalVariablesAreBad
> http://discuss.joelonsoftware.com/default.asp?design.4.249182.18

Gonna follow the links as soon as I have time.

> > Also, why 
> > is a global variable not actually global, but in fact only "locally"
> > global (at the module level)? It's the first time I meet such an
> > issue. What's wrong in my design to raise such a problem, if any?
> 
> In Python, that is a deliberate choice. All globals are deliberately 
> global to the module. The closest thing to "globally global" is the 
> builtins namespace, which is where builtins like len, map, str, etc. 
> are found.
> 
> Any design which relies on modifying global variables is flawed. Global 
> variables are a poor design:
> 
> http://weblogs.asp.net/wallen/archive/2003/05/08/6750.aspx

In the general case, for sure. I 100% agree, and don't do it. But what if what I model is precisely something that must have a global, like a language? Python has globals, in which every name not defined in locals (or other enclosing scopes) is silently looked up. Mine instead has a thing called 'world' that explicitly holds references to top-level things. It is this world that I need to pass to code objects (via their classes).
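
A minimal Python 3 sketch of the lookup order mentioned above: a name that is not found in the module's globals is looked up in the builtins namespace. The module name A and the function size are invented for the illustration (in Python 2 the builtins module is called __builtin__).

```python
import builtins
import types

# Stand-in module that uses len without defining it.
A = types.ModuleType("A")
exec(
    "def size(seq):\n"
    "    return len(seq)\n",
    A.__dict__,
)

in_module_globals = "len" in vars(A)       # not a global of A
in_builtins = hasattr(builtins, "len")     # found one level further out
n = A.size([1, 2, 3])

print(in_module_globals)   # False
print(in_builtins)         # True
print(n)                   # 3
```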

> Slightly better than global variables is a design where you use a module 
> or class as a namespace, put all your globals in that namespace, then 
> pass it to your other classes as an argument:
> 
> class SettingsNamespace:
>     pass
> 
> settings = SettingsNamespace()
> settings.x = 42
> settings.y = 23
> settings.z = "magic"
> 
> instance = MyOtherClass(a, b, c, settings)
> 
> This is still problematic. For example, if I change settings.x, will the 
> result of MyOtherClass be different? Maybe, maybe not... you have to 
> dig deep into the code to know which settings are used and which are 
> not, and you never know if an innocent-looking call to a function or 
> class will modify your settings and break things.

***
This is exactly analogous to my model. And I cannot imagine a better way to do it (both in the design of the language and in its implementation). The only difference, which causes my issue, is that (*) the chunks of Python code that use world are not in the same module.
So, to make the parallel more accurate, how would you do it if MyOtherClass were defined in a separate module?
And what about pointers to 'settings' in every instance of MyOtherClass (wasting both time and space)?
***
(*) Not only for better code organisation. It could be a shared module, even an external library.
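
One possible answer, as a minimal Python 3 sketch. It assumes a hypothetical separate module named other (built in memory here) and uses types.SimpleNamespace in place of the SettingsNamespace class. The point is that passing the namespace as an argument works the same across modules, and that each instance stores only a reference to the one shared object, not a copy of it.

```python
import types

# Hypothetical separate module defining MyOtherClass.
other = types.ModuleType("other")
exec(
    "class MyOtherClass:\n"
    "    def __init__(self, a, settings):\n"
    "        self.a = a\n"
    "        self.settings = settings   # a reference, not a copy\n"
    "    def scaled(self):\n"
    "        return self.a * self.settings.x\n",
    other.__dict__,
)

# Main module: create the namespace once and hand it over.
settings = types.SimpleNamespace(x=42, y=23, z="magic")
first = other.MyOtherClass(1, settings)
second = other.MyOtherClass(2, settings)

shared = first.settings is second.settings
first_scaled = first.scaled()
settings.x = 10                      # visible through every instance
second_scaled = second.scaled()

print(shared)          # True
print(first_scaled)    # 42
print(second_scaled)   # 20
```

The cost per instance is one attribute slot holding a reference. The last two lines also show the hazard described in the quoted text: changing settings.x changes the behaviour of every object that holds the namespace.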

> > My view is as follows: From the transparency point of view (like for
> > function transparency), the classes in "code" should _receive_ as
> > general parameter a pointer to 'w', before they do anything.
> 
> Yes, this is better than "really global" globals, but not a lot better.

Right :-)

> > In other 
> > words, the whole "code" module is like a python code chunk
> > parameterized with w. If it would be a program, it would get w as
> > command-line parameter, or from the user, or from a config file.
> > Then, all instantiations should be done using this pointer to w.
> > Meaning, as a consequence, all code objects should hold a reference
> > to 'w'. This could be made as follows:
> 
> If every code object has a reference to the same object w, that defeats 
> the purpose of passing it as an argument. It might be local in name, 
> but in practice it is "really global", which is dangerous.

???
I think we must speak more accurately.
World (what I called 'w') represents "what is defined" by a program written in the language my code interprets. It could be, e.g., the game of the secret number, and then world would hold (references to) only 2 things, namely the player and the master. Right? (The same as in Python, except that (1) there are no "magic" globals, instead there is an explicit world, and (2) one doesn't need a class to create a given thing.)
The "code" module holds types of code objects (e.g. an assignment). When such an object is created and "run", it will most often read and/or write inside world. For this module itself, world is and can only be a variable, more precisely an input; it cannot be locally defined (created) in the module itself, but it needs to be known there.
For every run, a world is passed to the module; it's always a variable, never defined locally. But during a given run, not only is there a single, unique world shared by everybody, but it also never changes (its reference is constant).

So, what do you mean by "that defeats the purpose of passing it as an argument"? If it is indeed "dangerous", then how could it be otherwise?
The only alternative to having refs to it in the top class, in every class, or in every code instance object is, I guess, to have it as a global of the module. (This is what I do in the meantime, only because it avoids having refs everywhere.)

> > # main module
> > import code
> > code.Code.w = w
> 
> Why not just this?
> 
> code.w = w

Yo, this is what I do as of now.

> And where does w come from in the first place? Shouldn't it be defined 
> in code.py, not the calling module?

Hem, I don't think it's possible, except by merging the code module with one or more other ones (including the main module of the parser or interpreter). I guess "code" is not the only module that needs a reference to 'w'. But I may be wrong; I must examine this point more closely. Not sure.
Maybe see also the PS if you have time.

> > # "code" module
> > class Code(object):
> >     w = None	### to be exported from importing module
> 
> That sets up a circular dependency that should be avoided: Code objects 
> are broken unless the caller initialises the class first, but you can't 
> initialise the class unless you import it. Trust me, you WILL forget to 
> initialise it before using it, and then spend hours trying to debug the 
> errors.

Right. But how else can I do it? And how can I check? Analogy: a class requires every instance to have an x. This can be done with a required parameter in __init__. Then, how can I require a module or a class itself to be properly initialised?
[The module itself should be an instance of something requiring an init! Or the class should be an instance of a metaclass requiring it? For now, I prefer to avoid playing with metaclasses.]
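
One way to get such a check without metaclasses, as a minimal Python 3 sketch. The names init, get_world and _world are invented for the illustration, and world is assumed to behave like a dict. The module keeps a private slot and every access goes through a function that fails loudly when the slot is still empty.

```python
# Imagine this is the "code" module.
_world = None   # private slot, to be filled in by the importing module


def init(world):
    """Must be called once by the importing module before any code object runs."""
    global _world
    _world = world


def get_world():
    """Return the current world, or fail with a clear message if init() was skipped."""
    if _world is None:
        raise RuntimeError("code module not initialised: call init(world) first")
    return _world


class NameLookup:
    def __init__(self, name):
        self.name = name

    def execute(self):
        return get_world()[self.name]


# Imagine this is the main module.
lookup = NameLookup("player")
try:
    lookup.execute()             # init() has not been called yet
except RuntimeError as exc:
    early_error = str(exc)

init({"player": "Denis"})
value = lookup.execute()

print(early_error)   # code module not initialised: call init(world) first
print(value)         # Denis
```

This does not remove the dependency on initialisation order, but a forgotten init() then shows up as one explicit error at the first use instead of as a puzzling failure on None much later.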

> >     def __init__(self, w=Code.w):
> >         # the param allows having a different w eg for testing
> >         self.w = w
> 
> This needlessly gives each instance a reference to the same w that the 
> class already has. Inheritance makes this unnecessary. You should do 
> this instead:
> 
> class Code(object):
>     w = None  # Better to define default settings here.
>     def __init__(self, w=None):
>         if w is not None:
>             self.w = w
> 
> If no w is provided, then lookups for instance.w will find the shared 
> class attribute w.

Right. But maybe it's better to have it at the module level if/when I don't need testing variants?
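
A minimal Python 3 sketch of the quoted pattern, with strings standing in for real World objects. It shows the fallback from instance to class attribute, and also that the default in the earlier version (w=Code.w) cannot work as written: the default is evaluated while the class body is still being executed, before the name Code exists.

```python
class Code(object):
    w = None  # shared default, set once by the importing module

    def __init__(self, w=None):
        if w is not None:
            self.w = w   # only instances with their own w store one


Code.w = "shared world"              # done by the main module
plain = Code()
testing = Code(w="test world")

plain_w = plain.w                    # found on the class
plain_has_own = "w" in vars(plain)   # nothing stored on the instance
testing_w = testing.w                # the instance attribute wins

# The earlier signature fails as soon as the class statement is executed.
try:
    class Broken(object):
        w = None

        def __init__(self, w=Broken.w):
            self.w = w
except NameError:
    default_failed = True
else:
    default_failed = False

print(plain_w)          # shared world
print(plain_has_own)    # False
print(testing_w)        # test world
print(default_failed)   # True
```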

> [...]
> > But the '###' line looks like  an ugly trick to me. (Not the fact
> > that it's a class attribute; on the contrary, I often use them eg for
> > config, and find them a nice tool for clarity.) The issue is that
> > Code.w has to be exported. 
> 
> It is ugly, and fragile. It means any caller is *expected* to modify the 
> w used everywhere else, in strange and hard-to-predict ways.

Then, how to do it?

Anyway, thank you very much again, Steven (I really appreciate your replies, they help me think :-).

Denis

PS: Actually the parse tree builds a fully abstract representation of the source, whose relevant nodes are code objects. Each code type has both "execute" and "code" methods.

When the parser itself runs in mode ON, execute() methods run (e.g. a code object representing an assignment will create and put a new symbol in world, by running the execute methods of its target and expression child nodes). This is like an interpreter.
Only in this case do I need a world:
   if parser.MODE is ON: world = World()
   # --> and tell this to all modules that need it

When in OFF mode, nothing runs, but the representation exists all the same. Printing the top node's code() should (it's still in progress) print the object code via a cascade of calls to child nodes' code(). This is like a compiler ;-) (But the object language is Python!)
    objectFile.write(parser.parse(source).code()) # ==> translation into python

Comments welcome.
I know this may look strange, but using my lib it is easy. I just need match actions that pass all relevant info when creating code objects. So, why not? (Except for performance, sure, but as of now I don't mind; I'm learning the domain.)
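
A minimal Python 3 sketch of the two modes described in this PS. Only the class names and the method names execute and code come from the message; the method bodies are assumptions, world is assumed to be a plain dict, and the assignment target is simplified to a name string instead of a child node. The sketch also passes world to execute() as an argument, which is one possible alternative to a module global or a reference stored in every object.

```python
class Literal:
    def __init__(self, value):
        self.value = value

    def execute(self, world):
        return self.value

    def code(self):
        return repr(self.value)


class NameLookup:
    def __init__(self, name):
        self.name = name

    def execute(self, world):
        return world[self.name]

    def code(self):
        return self.name


class Assignment:
    def __init__(self, name, expression):
        self.name = name
        self.expression = expression

    def execute(self, world):
        # Interpreter mode: evaluate the child node, then bind the symbol in world.
        world[self.name] = self.expression.execute(world)

    def code(self):
        # Compiler mode: cascade to the child node and emit Python source.
        return "%s = %s" % (self.name, self.expression.code())


program = [
    Assignment("secret", Literal(7)),
    Assignment("guess", NameLookup("secret")),
]

# Mode ON: run the nodes against a world.
world = {}
for node in program:
    node.execute(world)

# Mode OFF: nothing runs, only source text is produced.
source = "\n".join(node.code() for node in program)

# Sanity check: the generated text is valid Python with the same effect.
generated = {}
exec(source, generated)

print(world)                # {'secret': 7, 'guess': 7}
print(source.split("\n"))   # ['secret = 7', 'guess = secret']
print(generated["guess"])   # 7
```

With this shape, the node classes need no world at import time at all; only the caller that runs in ON mode has to create one.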
________________________________

vit esse estrany ☣

spir.wikidot.com