[Doc-SIG] Finding canonical names for objects

Edward D. Loper edloper@gradient.cis.upenn.edu
Sat, 21 Apr 2001 16:19:29 EDT


When writing a documentation tool, it would be nice to be able to
figure out the "parent" of an object, where by parent I mean:

  - for a module in a package, its package
  - for a function or class, the module it was originally defined in
  - for a member function, its class

Among other things, this is useful for trying to establish a unique
"canonical" name for something that we're documenting, so we can
make sure that inter-documentation pointers are correct (e.g., if
we're converting docs to HTML).

However, it's not clear how to do this in several cases.  The cases
where it *is* straightforward to do it are:

  - for non-builtin modules, extract the package information from
    the __name__ field.  Will this work for built-in packages, too?
    What's an example of a built-in package?
  - for non-builtin classes, consult the __module__ field
  - for non-builtin member functions, consult the im_class field
  - built-in classes all seem to have a __module__ field (e.g.,
    exceptions.Exception or sys.last_type).  Is this always true?
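
For what it's worth, the straightforward cases above could be sketched
roughly as follows.  The helper name simple_parent_name is hypothetical,
and the fallback through __self__ is an assumption for interpreters
whose methods do not carry im_class; this is a sketch, not a tested
part of any tool.

```python
import inspect


def simple_parent_name(obj):
    """Return the parent's name for the straightforward cases, else None.

    Hypothetical helper: the name and the None fallback are my own choices.
    """
    if inspect.ismodule(obj):
        # A module's package is everything before the last dot of __name__.
        package, dot, _ = obj.__name__.rpartition(".")
        return package if dot else None
    if inspect.isclass(obj):
        # Classes record the name of their defining module.
        return getattr(obj, "__module__", None)
    if inspect.ismethod(obj):
        # Assumption: where im_class is missing, fall back to the class
        # of the object the method is bound to.
        cls = getattr(obj, "im_class", None)
        if cls is None:
            owner = obj.__self__
            cls = owner if inspect.isclass(owner) else type(owner)
        return cls.__module__ + "." + cls.__name__
    # Functions and built-in objects are the hard cases; not handled here.
    return None
```

Returning None for everything else is deliberate: functions and
built-in objects are exactly the cases I don't know how to handle.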

In the case of non-builtin functions, I can think of two ways to do 
it::

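    import inspect
    import sys
    from os.path import basename, splitext

    def find_function_module_1(func):
        # Find the loaded module whose namespace *is* the function's
        # global namespace (identity, not equality).  getattr() also
        # skips any sys.modules entries that lack a __dict__.
        for module in sys.modules.values():
            if getattr(module, "__dict__", None) is func.func_globals:
                return module.__name__
        raise ValueError("Couldn't find the module for this function")

    def find_function_module_2(func):
        # Derive the module name from the name of the source file.
        try:
            return splitext(basename(inspect.getabsfile(func)))[0]
        except Exception:
            raise ValueError("Couldn't find the module for this func")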

Is one of these approaches preferable?  Will they ever give different
results?  Is there a reason that non-builtin functions don't have a
__module__ field, like classes do?  (Or a reason that built-in methods
*do* have the __module__ field?)
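
A small experiment suggests one place where the two approaches can
disagree: functions that live inside a package.  The sketch below
restates both approaches under hypothetical names (module_from_globals,
module_from_filename) and assumes that func_globals may be spelled
__globals__ on the running interpreter.

```python
import inspect
import sys
import xml.dom.minidom
from os.path import basename, splitext


def module_from_globals(func):
    # Approach 1: find the module whose namespace is the function's globals.
    # Assumption: the attribute may be named __globals__ or func_globals,
    # depending on the interpreter, so both are tried.
    namespace = getattr(func, "__globals__", None)
    if namespace is None:
        namespace = func.func_globals
    for module in list(sys.modules.values()):
        if getattr(module, "__dict__", None) is namespace:
            return module.__name__
    raise ValueError("Couldn't find the module for this function")


def module_from_filename(func):
    # Approach 2: strip the directory and extension from the source file.
    return splitext(basename(inspect.getabsfile(func)))[0]


func = xml.dom.minidom.parseString
print(module_from_globals(func))
print(module_from_filename(func))
```

If I'm reading this right, the first call gives "xml.dom.minidom" and
the second only "minidom", since the file name alone says nothing about
the enclosing package.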

The other difficult cases are built-in objects.  In general, I don't
see any way to get parents for built-in objects.  The relevant
built-in objects that I know of are:

  - built-in functions (e.g., len, min, sys.settrace)
  - built-in methods (e.g., [].append, file(...).read)
  - non-builtin methods with underlying builtin functions (e.g.,
    Exception.__str__)

Is there any way to get the "parents" for these objects?  (It would be
*nice* if doctools could process built-in objects as well as
non-builtin ones.)  
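
One way to at least see what the interpreter offers for these objects
is to probe for attributes that might point at a parent.  The helper
probe_parent_attributes and the choice of attribute names (__module__,
__self__, __objclass__) are my guesses; which of them exist, and what
they contain, depends on the object and the Python version.

```python
def probe_parent_attributes(obj):
    """Report which parent-like attributes an object exposes.

    Hypothetical helper; the attribute list is only a guess at what
    might be useful for finding a parent.
    """
    found = {}
    for attr in ("__module__", "__self__", "__objclass__"):
        try:
            found[attr] = getattr(obj, attr)
        except AttributeError:
            # Missing attributes are simply left out of the report.
            pass
    return found


# Print the attribute names found for each kind of built-in object.
for candidate in (len, [].append, Exception.__str__):
    print(candidate, sorted(probe_parent_attributes(candidate)))
```

This only lists what is available; whether any of it amounts to a
reliable "parent" is still the open question.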

Another possible approach to finding canonical names for objects is
to use their ids (as returned by the builtin function id()).  This
wouldn't be as nice, since it would result in basically arbitrary
names, but at least everything we document could be given a unique,
canonical name (within a given session).  But I'm somewhat confused
about id().  In particular, it seems to return a value for integers.
But since the returned value is itself an integer, that seems to imply
that at least 2 *different* values will have the same id.  Am I
missing something?  Is there somewhere I can read about what
guarantees are given about whether two values' ids will be different?
(e.g., if a value is GC'ed, can its id be recycled?  I assume yes.)
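
To make the id() idea concrete, here is a rough sketch of a naming
scheme built on it.  It assumes that an id is only guaranteed unique
among objects that exist at the same time, which is why the
hypothetical NameRegistry class keeps a reference to everything it
names; the "obj-%d" name format is likewise just illustrative.

```python
class NameRegistry:
    """Hand out session-unique names based on id().

    Hypothetical helper.  It keeps a reference to every object it names,
    on the assumption that an id can only be reused once the object it
    belonged to has been collected.
    """

    def __init__(self):
        self._objects = {}

    def name_for(self, obj):
        key = id(obj)
        # Holding obj here keeps it alive, so its id stays reserved.
        self._objects.setdefault(key, obj)
        return "obj-%d" % key


registry = NameRegistry()
first, second = [1, 2], [1, 2]
# Equal values, but two distinct live objects, hence two distinct names.
assert first == second
assert registry.name_for(first) != registry.name_for(second)
# Asking again for the same object gives the same name.
assert registry.name_for(first) == registry.name_for(first)
```

The cost of this design is that nothing the registry has named can be
garbage collected until the registry itself goes away.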

-Edward

p.s., Is there a reason that __builtins__.__name__ == '__builtin__'
instead of '__builtins__'?