[Doc-SIG] An 'apropos' utility for documentations

Wed, 9 Dec 1998 18:28:32 +0000 (GMT)

On Mon, 7 Dec 1998, Fred L. Drake wrote:

> asking for a more elaborate form of the traditional apropos command,

No, and for a very simple reason. Although the Linux apropos
gives you a vague idea of any term, it is never enough when
you need precise information. It's a good starting point and
that's all. And I think the maintainers of programs are
responsible for feeding the apropos database. So I think I
could do or gain very little unless the rest of the world's
programmers pay more attention to supplying good apropos
information with their programs. For example, if the programmers
of 'fetchmail' don't tell apropos what 'fetchmail' is, then
a better apropos tool can't do anything about it.

> or are you looking for an apropos that operates on the Python library?
> If the former, I can see it taking the form of an advanced
> manual-searching interface, hopefully tied in (somehow) with the
> standard man/apropos system.

Well, this is the first point: although the Python Library
documentation is very good, it can be better, and so can all the
information that comes with the Python distribution.

The idea is as simple as this:
Imagine I want to write a communications program with sockets.
The documentation is straightforward: Module socket, Module
select.

Or is it? There's a module called SocketServer, and ftplib and
telnetlib contain further examples, and on Sun, for example, there
may be other types of sockets. And in some contrib directory
or in a FAQ there could be related info that is relevant
to my problem.

So, instead of reading a great deal of documentation, scanning
an even greater deal, and suspecting that some information is hidden
in some FAQ or lib or module, why not make the information
reveal itself? It's not a matter of more comments or more decoration
in the documentation. It's as if all the information started to say:
"Hey, I'm sockets-related information, I'm waiting for you!"

For example, let's take a look at ftplib.py:
The second line says: ... RFC 959 ...
Another one says (a.o.s.): ftp.login
a.o.s.: python ... localhost ...
a.o.s.: import socket
a.o.s.: SOCK_STREAM
a.o.s.: gethostbyname
a.o.s.: netrc
a.o.s.: macros
a.o.s.: sys.argv
....

Fortunately, I'm a lamer and a real newbie in most things, so
I can enjoy a pleasure that most wizards enjoyed long ago:
learning new things.

So perhaps that list of words says very little to you, or perhaps
you find it logical (it's ftplib, what do you expect?).
Well, here is what I can see in this list:
RFC 959 is related to ftp. So if I come across RFC 959 in an email,
or I want to know which RFC is the ftp RFC, that information is
useful in both directions.

The next item, ftp.login, says several things:
ftp is another user of the system;
ftp.login seems the logical way of doing a 'low level' ftp.

Then you see 'localhost'. It's incredible, but a newbie doesn't know
that his machine is an ftp server too! And he doesn't know that
localhost is the natural test room for his ftp scripts! I'd go
further: a newbie doesn't know what localhost is for at all!

Next, I see sockets. Well, obviously ftp is a communications utility,
and all of those are based on sockets. Not so obvious for real
newbies. And even for the wizard who is looking for some example
of sockets, this information could be a great reminder, much more
than many may think.

SOCK_STREAM is a good piece of information that tells us:
"Hey, in ftplib you've got an example of SOCK_STREAM". Think of
the reverse problem: you're looking for an example of SOCK_STREAM
and you start doing greps here and there.

As you see, a lib is not only a collection of useful routines, 
it's a source of information and examples.
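
The "reverse problem" (finding where SOCK_STREAM is used without
grepping here and there) could be sketched as a small inverted index.
This is only an illustration under stated assumptions: the file
contents below are made-up stand-ins, not the real library sources,
and build_index is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Made-up stand-ins for library sources (assumption: the real files
# are much longer and would be read from disk).
SOURCES = {
    "ftplib.py": "import socket\n# see RFC 959\n"
                 "s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n",
    "telnetlib.py": "import socket\n"
                    "sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)\n",
    "netrc.py": "# parse the .netrc file: machine, login, macros\n",
}

# Anything that looks like an identifier counts as a searchable word.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def build_index(sources):
    """Map every identifier-like word to the set of files that mention it."""
    index = defaultdict(set)
    for filename, text in sources.items():
        for word in WORD.findall(text):
            index[word].add(filename)
    return index

index = build_index(SOURCES)
print(sorted(index["SOCK_STREAM"]))
```

With such an index the lookup goes from the word to the files, which
is the direction grep makes you rediscover every time.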

But the system I suggest goes further. Imagine we do to the Python FAQ
the very same thing we've done with ftplib, so we get a combined
library-FAQ apropos system. And we do a 'search' for sockets: now it
will show the problems people had with sockets, solutions, examples,
and related stuff in the library.
This sounds comprehensive. And if you keep a digested file of emails
from comp.python.org, for example, and you have that file attributed
with those keywords, you can benefit as a library developer, not
only from feedback but also by drastically reducing the time you
spend reading email, while supplying a knowledge database of
interest to anyone.

Well, let's summarize point 1:

- The information is there, it's rich, but it's dispersed!

Now, let's go to point 2:

Imagine now that you want to know how Python deals with multitasking.
Hmm, that's rather general. So you can't use the structure
of the FAQ, nor the HTML structure of the library documentation.
So, smile, you'll have to read all of the Python documentation!

Or ...
Suppose that when you visited Module Select you marked it
'asynchronous-calls', 'multitasking', ...; when you visited
SocketServer you marked 'fork', 'thread', 'multitasking', ...; and
when you visited Thread you marked 'multitasking'. So what do you
see? Well, it's as if the 'information' could be seen in several ways.
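
As a rough sketch of "seeing the information in several ways": the
marks can be stored per module and then turned around, so the same
data is also readable per mark. The dictionary layout and the name
by_sema are my own assumptions, just for illustration.

```python
from collections import defaultdict

# Marks made while reading, taken from the example in the text
# (module -> marks). The storage format is an assumption.
marks = {
    "Select": {"asynchronous-calls", "multitasking"},
    "SocketServer": {"fork", "thread", "multitasking"},
    "Thread": {"multitasking"},
}

def by_sema(marks):
    """Invert module -> marks into mark -> modules."""
    view = defaultdict(set)
    for module, semas in marks.items():
        for sema in semas:
            view[sema].add(module)
    return view

view = by_sema(marks)
print(sorted(view["multitasking"]))
```

The question "how does Python deal with multitasking?" then becomes
a single lookup instead of a walk through all the documentation.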

Another example: Python allows arbitrary function arguments. Well,
the tutorial says so and gives an example, but as a matter
of fact you can find more and more examples in the modules, perhaps
richer or closer to what you want!

A question: do you know where there is an example of polynomials in
the lib? Yes, in zmod!

But in fact, when somebody writes a program that does something,
perhaps it's far more interesting for the community to see how
he solved certain problems; that is, instead of the documentation,
it'd be far more interesting to see how he approached sockets or
the GUI.

So, let's summarize point 2:

- It doesn't matter whether it's LaTeX, HTML, XML, ...: the
information will always be HARDWIRED in some way. People will want
other perspectives.

Now, let's go to point 3: 

Well, what's the system you are talking about? :)

This is the hard part of this letter, and it has to do with
linguistics. Yeah, I mean semas (that's what they're called in
Spanish). If we have this sentence:
I'll go to the cinema tomorrow
or
I'm going to the cinema very soon.

The sema decomposition will be:
me go cinema future

A sema is a unit of meaning: a very simple meaning, usually
indivisible, but not always.
What I mean is attributing semas to pieces of source, supplying
meaning to the sources.
Semantics and semas are simply the universal HTML, XML, ...: the
universal reference for humans. You don't need pointers, links
or sections, just sets of semas attributed to pieces of code.
The final look of the information should depend on the semas
requested.

What I'm saying is not new: indices, apropos, yahoo and man are all
a kind of sema dealer, at a very low level, I admit.

Point 4: 

Is this craziness? Is this programmable? Can it be done in an
automatic way? Where are the limits of semantics?

Let's start with the upper limit of semantic documentation, which
is this:
You sit in front of the computer, you tell it what you want,
it consults its semantic knowledge database, extracts the related
info and writes the program! (Please don't mention Godel or Turing
to me, nobody is going to do such weird computations!)

The lower limit is the Unix apropos: the programmer tells the system:
"Hey system, you've got a new item related to the sema 'graphics'",
for example.

Well, a step beyond the apropos model would be to include such semas
in Python code, in a fixed and structured way. For example:
Keyword : sockets, multitasking, nice-list-handling
So when the files are parsed, the semas can be put into a database.
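
A minimal sketch of that parsing step, assuming the line is written
as a comment of the form "# Keyword : a, b, c" (the comment form, the
function names and the plain dict used as a 'database' are my
assumptions, nothing standard):

```python
import re

# Hypothetical convention: a comment line "# Keyword : a, b, c".
KEYWORD_LINE = re.compile(r"^\s*#\s*Keyword\s*:\s*(.+)$", re.MULTILINE)

def extract_semas(source_text):
    """Return the semas declared in 'Keyword :' comment lines, in order."""
    semas = []
    for match in KEYWORD_LINE.finditer(source_text):
        for item in match.group(1).split(","):
            item = item.strip()
            if item and item not in semas:
                semas.append(item)
    return semas

def add_to_database(database, filename, source_text):
    """Record sema -> set of files; a plain dict stands in for the database."""
    for sema in extract_semas(source_text):
        database.setdefault(sema, set()).add(filename)

# A made-up source file used only for the demonstration.
example = "# Keyword : sockets, multitasking, nice-list-handling\nimport socket\n"
database = {}
add_to_database(database, "example.py", example)
print(extract_semas(example))
```

A real system would probably want something more permanent than a
dict, but the parsing side stays this small.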

Hmm, is nice-list-handling a sema? If so, how am I going to search
for nice-list-handling?

Well, as a matter of fact, semas are the highest meta-information
man can think of (because thinking is handling language, that is,
semas), so nice-list-handling can be divided into simpler semas:
nice, list, handling. This division can be done with a dictionary
or by analyzing the structure of the sema.
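
Both ways of dividing a sema can be sketched together: look it up in
a dictionary first, and fall back on its structure (here, simply the
hyphens). The function names and the sample dictionary entry are
hypothetical, only there to show the idea.

```python
def split_sema(sema, dictionary=None):
    """Split a compound sema into simpler ones.

    A dictionary entry wins; otherwise the hyphen structure is used.
    """
    if dictionary and sema in dictionary:
        return list(dictionary[sema])
    return [part for part in sema.split("-") if part]

def matches(query, declared_semas, dictionary=None):
    """True if the query is declared directly or is part of a compound."""
    for sema in declared_semas:
        if query == sema or query in split_sema(sema, dictionary):
            return True
    return False

declared = ["sockets", "multitasking", "nice-list-handling"]
# Hypothetical dictionary entry: 'multitasking' decomposed by hand.
dictionary = {"multitasking": ["multi", "task"]}

print(split_sema("nice-list-handling"))
print(matches("list", declared, dictionary))
```

So a search for 'list' finds the code marked nice-list-handling
without anyone having to search for the compound itself.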

But perhaps this system is far more interesting when you generate
the semas automatically. Yes, you only have to scan files searching
for certain patterns. And the more AI techniques you apply to
attribute pieces of code, the more automated information you get.
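
The simplest version of such scanning, with no AI at all, might look
like the sketch below. The pattern table and the sema names in it are
invented for the example; finding the right ones is exactly the open
question.

```python
import re

# Hypothetical pattern -> sema table (both columns are assumptions).
PATTERNS = [
    (re.compile(r"\bimport\s+socket\b"), "sockets"),
    (re.compile(r"\bSOCK_STREAM\b"), "sockets"),
    (re.compile(r"\bRFC\s*\d+"), "rfc"),
    (re.compile(r"\blocalhost\b"), "local-testing"),
    (re.compile(r"\bsys\.argv\b"), "command-line"),
]

def guess_semas(source_text):
    """Return the set of semas whose pattern occurs somewhere in the text."""
    return {sema for pattern, sema in PATTERNS if pattern.search(source_text)}

# A made-up fragment in the spirit of the ftplib.py list above.
sample = "# FTP client, see RFC 959\nimport socket\nhost = 'localhost'\n"
print(sorted(guess_semas(sample)))
```

Two patterns map to the same sema on purpose: several surface forms
can carry one meaning.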

As a matter of fact, yahoo does this.

Finally, the trick in all this is to find the appropriate semas
and the appropriate patterns.
But even this is rewarding, because you get a very literal description
(in the form of semas) of the semantic world of Python: its faults
and its strongholds. It's the same as the several words for snow in
Inuktitut (Eskimo), but applied to computer science.

In any case, any effort in this field pays off: the information is
getting bigger and bigger every day, and hardwired methods of
structuring are not enough.

Regards/Saludos
Manolo
-------------
My addresses / mis direcciones: 

www.ctv.es/USERS/irmina ---> lritaunas peki project/proyecto in python

www.ctv.es/USERS/irmina/pyttex.htm ---> page of spanish users of latex
                              / pagina de usuarios en espanyol de latex
www.ctv.es/USERS/irmina/texpython.htm --> page of drawing utility for tex 
                         / pagina de utilidad de dibujo para Latex

"...abandon the countryside and your houses and come to defend the
sea and the city...do not lament over houses or land, but over
human lives, for those things do not provide us with men,
but men provide those" -Pericles