How does pydoc parse code?

Fri Oct 17 19:14:09 EDT 2003

On Fri, 17 Oct 2003 11:34:05 GMT, Michael Hudson <mwh at python.net> wrote:

>greg at conifold.math.ucdavis.edu (Greg Kuperberg) writes:
>
>> I plan to use pydoc for my Python project.  After looking through the
>> standard documentation, I am not sure how pydoc interprets its input.
>> In its basic operation it evidently looks at the first string literal in
>> the module and in each function definition.  But there is more to the
>> story than that, obviously.  What other vestigial code does it detect?
>> Every string literal?  Variables of the form __xxx__, I gather?  Which of
>> these variables have a special meaning?  How does it divide the initial
>> string literal into the "name" and "description" sections?  What other
>> directives can I send to pydoc to alter its presentation?
>
>It would be nice if there was a concise, simple place in the
>documentation I could point you to to answer that question.
>
>Alas, it does what it does, and that's about all that can be said.
>
>Oh, and it *doesn't* parse the module: it imports and then introspects
>it.
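
To make "imports and then introspects" concrete, here is a minimal sketch in
Python 3 syntax. The module name demo_mod, the DEMO_MOD_RAN environment
variable and the helper import_and_introspect are all made up for
illustration. pydoc.splitdoc is a helper found in pydoc's own source rather
than a documented interface, so treat its use here as an assumption about how
the synopsis line gets separated from the rest of the docstring.

```python
import importlib
import inspect
import os
import pydoc
import sys
import tempfile

# Hypothetical module source: a docstring plus a top-level side effect.
SOURCE = '''"""demo_mod - one-line synopsis.

Longer description goes here.
"""
import os
os.environ["DEMO_MOD_RAN"] = "1"   # executed as soon as the module is imported

def f():
    """Docstring of f."""
'''

def import_and_introspect(source, name="demo_mod"):
    """Import `source` as a module and return (synopsis, rest, doc of f)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(os.path.join(tmpdir, name + ".py"), "w") as fh:
            fh.write(source)
        sys.path.insert(0, tmpdir)
        importlib.invalidate_caches()
        try:
            # The import runs every top-level statement in the module.
            module = importlib.import_module(name)
        finally:
            sys.path.remove(tmpdir)
            sys.modules.pop(name, None)
    # Only after the import are the docstrings read off the live objects.
    synopsis, rest = pydoc.splitdoc(inspect.getdoc(module))
    return synopsis, rest, inspect.getdoc(module.f)
```

Calling import_and_introspect(SOURCE) returns the docstrings, but it also
leaves DEMO_MOD_RAN set in os.environ: the side effect happened merely because
the documentation was requested.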

I wonder if we shouldn't take that seriously. E.g., check the file's md5
against a set (or a dict mapping digest to name, so you could ask why if the
name has changed) of known safe module sources, and issue an (untested)

    prompt = """
WARNING: Module is not known to system.
There is a SECURITY RISK in proceeding, because pydoc executes the module
by importing it to make the doc info available to it via inspect.

Proceed anyway? (only typing exactly "Yes" w/o quotes will proceed): """
    if raw_input(prompt) != 'Yes':
        raise SystemExit('Unsafe pydoc inspection abandoned by user.')

if not found. You could also give the user the option to declare a given module
trusted by having the md5 set persist in site info, for convenience.
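
A rough sketch of the whole idea, again in Python 3 syntax and not meant as a
patch to pydoc. Everything here is an assumption made for illustration: the
helper names, the JSON file standing in for "site info", and the choice to
store a digest-to-filename dict. The prompt wording is shortened from the one
above.

```python
import hashlib
import json
import os
import tempfile

PROMPT = (
    "WARNING: Module is not known to system.\n"
    "pydoc executes the module by importing it.\n"
    'Proceed anyway? (only typing exactly "Yes" will proceed): '
)

def file_md5(path):
    """Return the hex MD5 digest of the file's bytes."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()

def load_trusted(store_path):
    """Load the digest -> filename dict, or an empty dict if nothing is saved."""
    if not os.path.exists(store_path):
        return {}
    with open(store_path) as fh:
        return json.load(fh)

def is_trusted(path, trusted):
    """True if the file's current digest is among the known-safe digests."""
    return file_md5(path) in trusted

def trust(path, store_path):
    """Record the file's digest as trusted and persist the updated dict."""
    trusted = load_trusted(store_path)
    trusted[file_md5(path)] = os.path.basename(path)
    with open(store_path, "w") as fh:
        json.dump(trusted, fh)

def confirm_or_exit(path, store_path, ask=input):
    """Return silently for trusted files; otherwise require an exact 'Yes'."""
    if is_trusted(path, load_trusted(store_path)):
        return
    if ask(PROMPT) != "Yes":
        raise SystemExit("Unsafe pydoc inspection abandoned by user.")

def demo():
    """Exercise the helpers on a throwaway file; returns (before, refused, after)."""
    with tempfile.TemporaryDirectory() as tmp:
        mod = os.path.join(tmp, "example_mod.py")
        store = os.path.join(tmp, "trusted.json")
        with open(mod, "w") as fh:
            fh.write("x = 1\n")
        before = is_trusted(mod, load_trusted(store))
        try:
            confirm_or_exit(mod, store, ask=lambda prompt: "no")
            refused = False
        except SystemExit:
            refused = True
        trust(mod, store)
        after = is_trusted(mod, load_trusted(store))
        return before, refused, after
```

The check would have to run before the import, since by the time inspect sees
the module its top-level code has already executed. Any edit to the file
changes its digest, so a modified module drops back to untrusted until
someone declares it trusted again.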

Regards,
Bengt Richter