too simple a question : forward declaration? (also, how to make Python segfault!)

Roy Smith roy at panix.com
Wed May 14 08:39:53 EDT 2003


"Helmut Jarausch" <jarausch at igpm.rwth-aachen.de> wrote:
> Yes, I thought about that, but in my C++ courses I always
> mention the example of two (recursive) functions calling
> each other.

Let me take a shot at this.  In C++, if you have two classes that mention 
each other, you have problems.  When I try to compile this:

class foo {
public:
    bar *barPointer;   // this is line 3
};

class bar {
public:
    foo *fooPointer;
};

I get "circular.cc:3: syntax error before `*' token".  Why do you get a 
syntax error (different compilers may give different messages)?  Because 
at the point where the compiler read "bar", it didn't know what bar 
meant.

I can fix the problem by adding "class bar;" to the top of the file.  
This is the forward declaration you're talking about, and it tells the 
compiler that bar's the name of a class.

The compiler still doesn't know everything there is to know about bar.  
It doesn't know what methods or attributes it has, and thus knows 
nothing about how it's laid out in memory, but for the purposes of 
processing "bar *barPointer", it knows all it needs to know.  It knows 
it's a class, and it knows how to construct a pointer to a class 
regardless of the details of the class's contents.

There's an implicit promise that if the compiler will consent to be 
happy with partial knowledge (disclosure on a "need to know" basis?) for 
the time being, you'll fill in the rest of the story later.

Now, let's look at the corresponding Python situation.  Try the 
following:

class foo:
    name = "My name is foo"
    barReference = bar()

    def showBar (self):
        print self.barReference.name

class bar:
    name = "My name is bar"
    fooReference = foo()

    def showFoo (self):
        print self.fooReference.name

f = foo()
f.showBar()

If you run that, you'll get:

bash-2.05a$ ./circular1.py 
Traceback (most recent call last):
  File "./circular.py", line 3, in ?
    class foo:
  File "./circular.py", line 5, in foo
    barReference = bar()
NameError: name 'bar' is not defined

This looks a lot like the problem we had in C++.  The Python interpreter 
tries to call bar() during the execution of the "class foo" statement, 
and bar() hasn't been defined yet.  It's a little bit different because 
the problem didn't happen while parsing the code, it happened while 
*executing* the class statement.  It's a run-time error.
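
To make the "run-time, not parse-time" point concrete, here's a rough 
sketch in present-day Python 3 syntax (the names Foo, Bar and caught are 
made up for this illustration, they aren't from the scripts above).  The 
file compiles without complaint; the NameError only shows up when the 
class body actually runs, so it can be caught like any other exception:

```python
# Hypothetical illustration: a class body is ordinary code that runs
# top to bottom when the class statement is executed.
caught = None

try:
    class Foo:
        name = "My name is Foo"
        bar_reference = Bar()   # Bar is not bound yet at this point
except NameError as exc:
    caught = exc                # the failure is a normal run-time exception

class Bar:
    name = "My name is Bar"

print("caught:", repr(caught))
```

Since the class statement for Foo never finished, the name Foo never 
gets bound either; only Bar exists afterwards.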

Anyway, the way to fix it up is to delay the execution of the call to 
create the cross-class references until after both classes have been 
defined.  My first attempt at this was:

class foo:
    def __init__ (self):
        self.name = "My name is foo"
        self.barReference = bar()

    def showBar (self):
        print self.barReference.name

class bar:
    def __init__ (self):
        self.name = "My name is bar"
        self.fooReference = foo()

    def showFoo (self):
        print self.fooReference.name

f = foo()
f.showBar()

which does something I've never managed to do in 6 years of writing 
Python code -- it crashes the interpreter with a segfault!

bash-2.05a$ ./circular2.py
Segmentation fault

I've been writing C++ code a bunch for the past couple of months.  I get 
segfaults all the time in C++, but never in Python.  Maybe some of that 
static typing compiled language type-bondage bad juju rubbed off on me?  
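Nah, a few minutes of head scratching shows that it's nothing more than 
a plain old infinite recursion: creating a foo creates a bar, whose 
__init__ creates another foo, and so on, until the interpreter 
eventually runs out of memory (stack space, most likely).  I'm 
surprised I didn't get a MemoryError, but that's a minor point.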
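
For what it's worth, here is the same mutual construction as a sketch in 
Python 3 syntax (class names capitalized and the variable outcome added 
by me for illustration).  My assumption is that a current CPython hits 
its recursion limit and raises RecursionError here instead of crashing, 
which is why the sketch catches that exception:

```python
class Foo:
    def __init__(self):
        self.name = "My name is Foo"
        self.bar_reference = Bar()   # building a Foo builds a Bar...

class Bar:
    def __init__(self):
        self.name = "My name is Bar"
        self.foo_reference = Foo()   # ...which builds another Foo, and so on

try:
    f = Foo()
    outcome = "finished"             # only reached if the recursion terminates
except RecursionError:
    outcome = "RecursionError"       # the interpreter gave up on the recursion

print(outcome)
```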

The next step in the evolution was to further delay the execution of the 
call to create the other class instance, to break the recursion loop:

class foo:
    def __init__ (self):
        self.name = "My name is foo"
    
    def showBar (self):
        barReference = bar()
        print barReference.name

class bar:
    def __init__ (self):
        self.name = "My name is bar"

    def showFoo (self):
        fooReference = foo()
        print fooReference.name

f = foo()
f.showBar()

This works like I want:

bash-2.05a$ ./circular3.py
My name is bar

Now, let's take a closer look at what happened when I ran this.  The 
first thing that happened is my "class foo" statement was executed.  In 
the process of that, it executed "def showBar (self)", whose body 
contains a call to bar().  Executing the def doesn't run that body, so 
nothing has to look up bar yet.  Just as the forward declaration in the 
C++ example gave the compiler enough information to keep going (without 
fully defining bar), the interpreter has enough to keep going here.

At this point, the Python interpreter knows all it needs to know about 
bar to finish the task at hand, which is executing the def statement.
All it needs to know is that when showBar() gets called, it's going to 
call whatever bar is bound to.  It doesn't need to know that bar is a 
class.  In fact, it may not be:

class foo:
    def __init__ (self):
        self.name = "My name is foo"
    
    def showBar (self):
        barReference = bar()
        print barReference.name

def bar ():
    print "I'm a function!"

f = foo()
f.showBar()

When I run this, I get:

bash-2.05a$ ./circular4.py
I'm a function!
Traceback (most recent call last):
  File "./circular4.py", line 15, in ?
    f.showBar()
  File "./circular4.py", line 9, in showBar
    print barReference.name
AttributeError: 'NoneType' object has no attribute 'name'

The call to bar() worked perfectly fine.  Of course, it returned None 
rather than an object with a name attribute, so the subsequent print 
statement failed, but that's a different story.
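
To push the same idea one step further, the sketch below (Python 3 
syntax; the method name show_bar and the variables first and second are 
mine) rebinds bar between two calls.  The method never changes, but what 
it ends up calling does, because the name is looked up each time the 
call runs:

```python
class Foo:
    def show_bar(self):
        # 'bar' is looked up when this line runs, not when the def executes
        return bar()

def bar():
    return "I'm a function!"

f = Foo()
first = f.show_bar()          # bar is currently bound to the function

class bar:                    # rebind the same name, this time to a class
    name = "My name is bar"

second = f.show_bar().name    # the same method now creates an instance

print(first)
print(second)
```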

The C++ compiler builds up state as it reads and processes your program, 
learning about symbols as you define them, and using that knowledge to 
process subsequent code.  This means symbols have to be defined before 
they can be referenced in code that's being compiled.

Python on the other hand pushes almost all interpretation of symbols to 
the latest possible moment.  Symbols still need to be defined (bound, if 
you want to be pedantic), but they only need to be defined by the time 
they're used in code that's being executed.
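
That also covers the original example of two functions calling each 
other.  A minimal sketch in Python 3 syntax, with is_even and is_odd as 
made-up names for the illustration; no forward declaration is needed 
because is_odd only has to be bound by the time is_even is first called:

```python
def is_even(n):
    # is_odd is not bound yet when this def statement runs; that's fine,
    # since the name is only looked up when the call below executes.
    return True if n == 0 else is_odd(n - 1)

def is_odd(n):
    return False if n == 0 else is_even(n - 1)

# Both names are bound by now, so the mutual recursion works.
results = [is_even(10), is_odd(7), is_even(3)]
print(results)
```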

The real point is...

The solution to resolving circular data structures in C++ is the forward 
declaration.  The solution in Python is delaying interpretation of 
symbols until they've had a chance to be defined at runtime.  Same 
problem, but solved differently by different tools.
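
If you really do want two objects that hold references to each other, 
one way to apply that idea (a sketch only, in Python 3 syntax, with 
attribute names I picked for the illustration) is to create both 
instances first and wire up the references afterwards:

```python
class Foo:
    def __init__(self):
        self.name = "My name is Foo"
        self.bar_reference = None    # filled in once a Bar exists

    def show_bar(self):
        return self.bar_reference.name

class Bar:
    def __init__(self):
        self.name = "My name is Bar"
        self.foo_reference = None    # filled in once a Foo exists

    def show_foo(self):
        return self.foo_reference.name

# Both classes are defined by now, so create the objects and then
# connect them; neither constructor has to create the other object.
f = Foo()
b = Bar()
f.bar_reference = b
b.foo_reference = f

print(f.show_bar())
print(b.show_foo())
```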

While it's true you can drive a screw with a hammer if you try hard 
enough (and don't mind how ugly the result is), you'll also get very 
funny looks if you walk into a hardware store and ask for a hammer with 
a #2 Phillips head on it.



