From "Fred L. Drake, Jr."
References:
Message-ID: <13932.23646.942012.824759@weyr.cnri.reston.va.us>

Manuel Gutierrez Algaba writes:
 > Well, the idea is to do an apropos command (written in python
 > of course) containing all the 'concepts' and tips related with
 > the documentation of a program.

Sorry for not responding to this sooner.

I'm not sure if I understand what you want very well. Are you
asking for a more elaborate form of the traditional apropos command,
or are you looking for an apropos that operates on the Python library?

If the former, I can see it taking the form of an advanced
manual-searching interface, hopefully tied in (somehow) with the
standard man/apropos system.

If you're looking for an apropos that operates on the Python
documentation, that's something for which support could be added to
the logical markup of the documentation to some degree, and then an
external utility could be used to build and query the database. This
is certainly something we can consider as the source form of the
documentation moves from LaTeX to SGML.

Please elaborate / clarify on your idea; I'm interested!

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives
1895 Preston White Dr.
Reston, VA 20191

From da@skivs.ski.org Mon Dec 7 23:04:18 1998
From: da@skivs.ski.org (David Ascher)
Date: Mon, 7 Dec 1998 15:04:18 -0800 (Pacific Standard Time)
Subject: [Doc-SIG] An 'apropos' utility for documentations
In-Reply-To: <13932.23646.942012.824759@weyr.cnri.reston.va.us>
Message-ID:

> Manuel Gutierrez Algaba writes:
> > Well, the idea is to do an apropos command (written in python
> > of course) containing all the 'concepts' and tips related with
> > the documentation of a program.
>
> Sorry for not responding to this sooner.
> I'm not sure if I understand what you want very well. Are you
> asking for a more elaborate form of the traditional apropos command,
> or are you looking for an apropos that operates on the Python library?
> If the former, I can see it taking the form of an advanced
> manual-searching interface, hopefully tied in (somehow) with the
> standard man/apropos system.
> If you're looking for an apropos that operates on the Python
> documentation, that's something for which support could be added to
> the logical markup of the documentation to some degree, and then an
> external utility could be used to build and query the database. This
> is certainly something we can consider as the source form of the
> documentation moves from LaTeX to SGML.
> Please elaborate / clarify on your idea; I'm interested!

FYI: if I understand the idea, it's similar to something I did a long while back using simple tools on Unix. I used the .info version of the manuals and spawned the TTY version of 'info' on them, with a little hacked-up index that mapped words to GNU info nodes. See the code at: http://starship.skyport.net/~da/ihelp/.

Having the markup in the doc would make that kind of project maintainable in the long run. The lack of such markup is why I haven't updated 'ihelp' in years.

--david

From Manuel Gutierrez Algaba Wed Dec 9 18:28:32 1998
From: Manuel Gutierrez Algaba (Manuel Gutierrez Algaba)
Date: Wed, 9 Dec 1998 18:28:32 +0000 (GMT)
Subject: [Doc-SIG] An 'apropos' utility for documentations
In-Reply-To: <13932.23646.942012.824759@weyr.cnri.reston.va.us>
Message-ID:

On Mon, 7 Dec 1998, Fred L. Drake wrote:

> asking for a more elaborate form of the traditional apropos command,

No, for a very simple reason. Linux apropos gives you a vague idea of any term, but it is never enough when you need precise information. It's a good starting point, and that's all. I also think the maintainers of programs are responsible for feeding the apropos database. So I could do or gain very little unless the rest of the world's programmers pay more attention to supplying good apropos information in their programs.
For example, if the programmers of 'fetchmail' don't tell apropos what 'fetchmail' is, then I can't do anything at all with a better apropos tool.

> or are you looking for an apropos that operates on the Python library?
> If the former, I can see it taking the form of an advanced
> manual-searching interface, hopefully tied in (somehow) with the
> standard man/apropos system.

Well, this is the first point. The Python Library documentation is very good, but it can be better, and so can all the information related to the Python distribution. The idea is as simple as this:

Imagine I want to write a communications program with sockets. The documentation seems straightforward: module socket, module select. Or is it? There's a module called SocketServer. ftplib and telnetlib have more examples. On Sun, for example, there may be other types of sockets. And some contrib directory or FAQ may hold related information that could be interesting for my problem.

So why not make the information reveal itself? The alternative is reading a great deal of documentation, scanning an even bigger deal, and always suspecting that something is hidden in some FAQ, library, or module. It's not a matter of more comments or more decoration in the documentation. It's as if all the information started to say: "Hey, I'm sockets-related information, I'm waiting for you!"

For example, let's take a look at ftplib.py:

 The second line says: ... RFC 959 ...
 Another one says: ftp.login
 Another one says: python ... localhost ...
 Another one says: import socket
 Another one says: SOCK_STREAM
 Another one says: gethostbyname
 Another one says: netrc
 Another one says: macros
 Another one says: sys.argv
 ...

Fortunately, I'm a lamer and a real newbie in most things. So I can still enjoy a pleasure most wizards enjoyed long ago: learning new things.

Perhaps that list of words says very little to you, or perhaps you find it obvious (it's ftplib, what do you expect?). Well, I can see the following in this list. RFC 959 is related to FTP.
So if I find RFC 959 mentioned in an email, or if I want to know which RFC is the FTP RFC, that information is useful in both directions.

Next: ftp.login says several things. ftp is another user of the system, and ftp.login seems to be the logical way to do 'low-level' FTP.

Then you see 'localhost'. It's incredible, but a newbie doesn't know that his machine is an FTP server too!! And he doesn't know that localhost is the natural test room for his FTP scripts!! I'd go further: a newbie doesn't know what localhost is for!!!

Next, I see sockets. Obviously FTP is a communications utility, and all of those are based on sockets. That is not so obvious for real newbies. Even for the wizard who is looking for an example of sockets, this information could be a great reminder, much more than many may think.

SOCK_STREAM is a good piece of information that tells us: "Hey, in ftplib you've got an example of SOCK_STREAM." Think of the reverse problem: you're looking for an example of SOCK_STREAM, and you start doing greps here and there.

As you see, a library is not only a collection of useful routines; it's a source of information and examples.

But the system I suggest goes further. Imagine we do with the Python FAQ the very same thing we've done with ftplib. Then we have a library-and-FAQ apropos system. Now we do a 'search' for sockets. It shows information about problems people had with sockets, solutions, examples, and related material in the library. This sounds comprehensive.

Now suppose you keep a digest file of emails from comp.python.org, for example, and you tag that file with those keywords. As a library developer you benefit from the feedback, and you also drastically reduce the time you spend reading email. At the same time you supply a knowledge database that is very interesting for anyone.

Well, let's sum up point 1:

 - The information is there, it's rich, but it's dispersed!

Now, let's go to point 2:

Imagine now that you want to know how Python deals with multitasking.
Uhmm, that's rather general. So you can't use the structure of the FAQ, nor the HTML structure of the library documentation. So, smile, you'll have to read all the Python documentation!! Or...

Suppose you had marked each module as you visited it:

 - select: 'asynchronous-calls', 'multitasking', ...
 - SocketServer: 'fork', 'thread', 'multitasking', ...
 - thread: 'multitasking'

What do you see then? It's as if the 'information' could be seen in several ways.

Another example: Python allows arbitrary function arguments. The tutorial says so and gives an example. As a matter of fact, though, you can find more and more examples in the modules, perhaps richer or closer to what you want!

A question: do you know where there is an example of polynomials in the library? Yes, in zmod!

In fact, when somebody writes a program that does something, it may be far more interesting for the community to see how he solved certain problems. That is, rather than the documentation, it would be far more interesting to see how he approached sockets or the GUI.

So, let's sum up point 2:

 - Whether it's LaTeX, HTML, XML, ..., the information will always be HARDWIRED in some way. People will want other perspectives.

Now, let's go to point 3:

Well, what's the system you are talking about? :) This is the hard part of this letter, and it has to do with linguistics. Yeah, they're semas (that's the Spanish term; English linguistics calls them 'semes').

Take the sentence "I'll go to the cinema tomorrow" or "I'm going to the cinema very soon". The sema decomposition will be:

    me go cinema future

A sema is a unit of meaning: a very simple meaning, usually indivisible, but not always.

What I mean is attributing semas to pieces of source, that is, supplying meaning to the sources.

Semantics and semas are simply the universal HTML, XML, ...: the universal reference for humans. You don't need pointers, links, or sections, ... just sets of semas attributed to pieces of code.
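Manuel's marking scheme amounts to an inverted index from concepts to modules. Here is a minimal sketch in modern Python. It assumes a hypothetical convention in which each module carries a comment line such as `# Keyword: sockets, multitasking`, similar to the 'Keyword:' line he proposes in point 4. The module texts below are invented stand-ins, not the real library sources.

```python
import re
from collections import defaultdict

# Hypothetical convention: a module declares its concepts ("semas") on a
# comment line such as "# Keyword: sockets, multitasking".
KEYWORD_LINE = re.compile(r"^#\s*Keyword\s*:\s*(.+)$", re.MULTILINE)

def extract_keywords(source):
    """Return the lower-cased concepts declared on 'Keyword:' comment lines."""
    concepts = []
    for match in KEYWORD_LINE.finditer(source):
        concepts.extend(c.strip().lower() for c in match.group(1).split(",") if c.strip())
    return concepts

def build_index(sources):
    """Invert {module: source text} into {concept: set of modules}."""
    index = defaultdict(set)
    for module, source in sources.items():
        for concept in extract_keywords(source):
            index[concept].add(module)
    return index

def apropos(index, *concepts):
    """Modules tagged with every requested concept, sorted by name."""
    hits = [index.get(c.lower(), set()) for c in concepts]
    return sorted(set.intersection(*hits)) if hits else []

# Invented stand-ins for real module sources (illustrative only).
SOURCES = {
    "select": "# Keyword: asynchronous-calls, multitasking\n",
    "SocketServer": "# Keyword: fork, thread, multitasking, sockets\n",
    "thread": "# Keyword: multitasking\n",
    "ftplib": "# Keyword: sockets, SOCK_STREAM, RFC 959\n",
}
index = build_index(SOURCES)
print(apropos(index, "multitasking"))             # ['SocketServer', 'select', 'thread']
print(apropos(index, "multitasking", "sockets"))  # ['SocketServer']
```

Asking for several concepts at once narrows the answer to the modules marked with all of them. You get the cross-cutting view of 'multitasking' without reading every module.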
The final presentation of the information should depend on the requested semas. What I'm saying is not new. Indices, apropos, Yahoo, and man are all a kind of sema dealer, though at a very low level, I admit.

Point 4:

Is this craziness? Is this programmable? Can it be done automatically? Where are the limits of semantics?

Let's start with the upper limit of semantic documentation. It's this: you sit in front of the computer and tell it what you want. It consults its semantic knowledge database, extracts the related information, and writes the program! (Please don't mention Gödel or Turing to me; nobody is going to do such weird computations!!)

The lower limit is Unix apropos. The programmer tells the system, for example, "Hey system, you've got a new item related to the sema 'graphics'."

One step beyond the apropos model would be to include such semas in Python code in a fixed, structured way. For example:

    Keyword: sockets, multitasking, nice-list-handling

Then, when they're parsed, they can be added to a database.

Uhmm, is nice-list-handling a sema? If so, how am I going to search for nice-list-handling? As a matter of fact, semas are the highest-level meta-information man can think of, because thinking is handling language, i.e. semas. So nice-list-handling can be divided into simpler semas: nice, list, handling. This division can be done with a dictionary or by analyzing the structure of the sema.

But perhaps this system is far more interesting when you generate the semas automatically. Yes, you only have to scan files looking for certain patterns. The more you apply AI techniques to attribute semas to pieces of code, the more automated information you get. As a matter of fact, Yahoo does this.

Finally, the trick in all this is to find the appropriate semas and the appropriate patterns. Even this is rewarding, because you get a very literal description (in the form of semas) of the semantic world of Python: its faults and its strengths.
It's like the many words for snow in Inuktitut (Eskimo), but applied to computer science.

In any case, any effort in this field pays off. The information is getting bigger and bigger every day, and hardwired methods of structuring are not enough.

Regards,
Manolo

-------------
My addresses:
www.ctv.es/USERS/irmina ---> Lritaunas Peki project in Python
www.ctv.es/USERS/irmina/pyttex.htm ---> page for Spanish users of LaTeX
www.ctv.es/USERS/irmina/texpython.htm --> page of a drawing utility for TeX

"...that you abandon the countryside and your houses and come to defend the sea and the city... not to lament for the houses or the land, but for human lives, for those do not give us men; it is men who give us those." -Pericles

From "Fred L. Drake, Jr."
References: <13932.23646.942012.824759@weyr.cnri.reston.va.us>
Message-ID: <13934.49360.77242.695000@weyr.cnri.reston.va.us>

Manuel Gutierrez Algaba writes:
 > No, for a very simple reason. Although Linux apropos

  [Really long explanation elided.]

 > structuring are not enough.

Manuel,

I think I see what you're looking for. (For context: I have studied traditional information retrieval, but not natural language processing approaches.) Let me try to boil down what you've described to a (much) more concise description, and then follow on with my comments. If I misunderstand what you're asking for, please clarify.

My summary of what you explained:

You are looking for a concept-based search mechanism. Preferably it can also describe what sorts of relationships the located items have to each other ("this is an example of that", etc.). You indicate an advantage of automatic concept extraction based on the content.
From a user interface perspective, it sounds like each "chunk" of documentation presented should have some sort of entry box or button. It would search for other chunks related to the chunk on that page.

My response:

I think this would be really nice to have. As far as I'm aware, such systems are still largely research projects, with some applications having reached deployment (you point to good examples). To do this for the Python documentation (defined as broadly as needed), the most-needed thing to accomplish this is someone who can donate time and know-how. I don't know enough about the AI aspects or the natural language processing aspects. The user interface issues are also non-trivial, especially if the interface is to be distilled all the way down to a single button and maybe a text-entry box. But I'd be glad to work with someone on interpreting the existing documentation. I'd also help with any improvements that could make the processing more effective.

There are two aspects to this which are related but not tightly bound: extraction of "concepts" and use of concepts to locate interesting information.

Concepts can be extracted from the text using AI/NLP tools or can be marked explicitly in the documentation source. I must admit a bias toward the latter approach, but automated techniques may have progressed enough to make them viable. I do not see any reason for the approaches to concept extraction to be mutually exclusive.

What constitutes a "chunk" needs to be clearly defined. This matters both for hyper-navigation and for percolating concept assignments up and down the document structure hierarchy. Software using a concept-to-chunk database may need to know about the extraction techniques, at least the explicit vs. automatic dichotomy. That matters especially for ranking or presentation.

I think we can go a long way using techniques based on explicit markup in the documentation.
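Fred's point about percolating concept assignments through the document hierarchy can be illustrated with a small tree of chunks. This sketch assumes explicit marks only, with no automatic extraction. The `Chunk` class, the sample titles, and the 'explicit'/'inherited' labels are illustrative and are not part of any existing documentation tool.

Concepts percolate in two directions:

- **Up:** a chapter can answer for everything beneath it.
- **Down:** a sub-chunk can be found through its ancestors' concepts. It is still labelled so that ranking can place it below chunks marked directly.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """One node of the document hierarchy: chapter, section, example..."""
    title: str
    concepts: set = field(default_factory=set)    # concepts marked explicitly here
    children: list = field(default_factory=list)

def percolate_up(chunk):
    """All concepts marked on this chunk or anywhere beneath it."""
    found = set(chunk.concepts)
    for child in chunk.children:
        found |= percolate_up(child)
    return found

def concept_entries(chunk, inherited=frozenset(), path=()):
    """Yield (concept, path, source) rows for a concept-to-chunk database.

    source is 'explicit' for marks on the chunk itself and 'inherited' for
    concepts percolated down from an ancestor.
    """
    path = path + (chunk.title,)
    for concept in sorted(chunk.concepts):
        yield concept, path, "explicit"
    for concept in sorted(inherited - chunk.concepts):
        yield concept, path, "inherited"
    for child in chunk.children:
        yield from concept_entries(child, inherited | chunk.concepts, path)

lib = Chunk("Library Reference", children=[
    Chunk("socket", {"sockets"}, [Chunk("Example", {"example", "SOCK_STREAM"})]),
    Chunk("thread", {"multitasking"}),
])

hits = [(path, source) for concept, path, source in concept_entries(lib)
        if concept == "sockets"]
print(sorted(percolate_up(lib)))  # ['SOCK_STREAM', 'example', 'multitasking', 'sockets']
print(hits)
```

The 'explicit'/'inherited' label stands in for the provenance Fred says ranking may need. A real system would also record explicit vs. automatic extraction.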
The index construction markup is one example of "meta" information being located in the documents. Other aspects of the markup are becoming increasingly "logical" rather than presentation-based. There is no reason both of the following can't happen:

 1) additional meta information is added to the documents to allow explicit encoding of concept-like information, and
 2) processing software infers relationships between chunks based on existing markup.

The documentation is about to be converted to SGML. I expect some information present in the documentation today will become more explicit. That should make it somewhat easier to write processing software that makes fewer basic inferences than it must today. (Yes, I realize that this doesn't come from SGML itself. But the conversion is an excellent opportunity to refine the markup in more useful ways than the existing markup allows.)

I'm quite interested in hearing from people about what information would be useful if marked explicitly, and how it could be used.

  -Fred

From Manuel Gutierrez Algaba Thu Dec 10 23:33:06 1998
From: Manuel Gutierrez Algaba (Manuel Gutierrez Algaba)
Date: Thu, 10 Dec 1998 23:33:06 +0000 (GMT)
Subject: [Doc-SIG] Possible Howto gui, narrative functions in computer , sciences
Message-ID:

At www.ctv.es/USERS/irmina I've put up the 'documentation' of my project Lritaunas Peki. Well, it's very peculiar documentation. In fact, I wrote it with the idea that it could serve as a STRUCTURED-PROGRAMMING-HOWTO. I'd like you to take a look at it and tell me whether it's worthwhile as a HOWTO, or how it could be improved to become one. It's not finished, but I'd like your opinion.

It illustrates the cycle of software programming, and it brings up some other interesting points.

First one: should all GUI program documentation be similar to Lritaunas Peki's?
I think so, because it's a very straightforward way of knowing how the programmer wrote the program and what his thoughts and strategies were. I mean, it's documentation for maintainers.

Second point: right now, we're discussing on the doc list the possibility of making an advanced 'apropos' utility, so that the documentation of Python libraries and programs gets much better. While writing this piece of documentation (Lritaunas Peki), I've realised something. Do you know what 'narrative functions' are?

Early in this century a Russian scholar, Vladimir Propp, studied traditional Russian folktales in his 1928 "Morphology of the Folktale". He found that they share a similar structure. In fact, all of them were composed of such functions as initial-situation, unbalancing-action, balancing-action, and so on. If the millions of stories in the world are composed of such a small set of things, it makes me wonder whether programs, especially GUI programs, can be described by sets of computational functions.

So what about starting to talk about (for example, in the context of Lritaunas):

    input-section, labels
    output-section, words
    transform-section, words, grammar
    direct-gui-section

Suppose we find such a set and apply it to the improved-apropos idea. Then we can get all the functions of a class or of a program sorted by their role in the system. You could get a very direct idea of a program just by reading how these functions are placed.

Just think of a class with many functions. Some of them will read the class's arguments. Others will be internal interfaces within the class itself, and others just print results... As a matter of fact, we get that information when we read a program. What about stating it explicitly for higher-level documentation programs?
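One way to make the 'narrative function' of each routine explicit, as Manuel suggests, is a decorator that attaches a role label. A documentation tool can then group functions by that label. This is a sketch under assumptions: the `role` decorator, the `narrative_role` attribute, and the `ExampleWindow` class are hypothetical. The role names are taken from the list above.

```python
from collections import defaultdict

def role(name):
    """Hypothetical decorator that records a function's narrative role."""
    def mark(func):
        func.narrative_role = name   # an ordinary function attribute
        return func
    return mark

def functions_by_role(cls):
    """Group the functions defined directly in a class by their recorded role."""
    groups = defaultdict(list)
    for name, value in vars(cls).items():
        if callable(value):
            groups[getattr(value, "narrative_role", "unclassified")].append(name)
    return dict(groups)

class ExampleWindow:
    """Hypothetical GUI class used only to illustrate the grouping."""

    @role("input-section")
    def read_labels(self):
        return ["label"]

    @role("transform-section")
    def apply_grammar(self, words):
        return words

    @role("output-section")
    def show_words(self, words):
        print(" ".join(words))

    def _refresh(self):   # no role recorded, so it is reported as unclassified
        pass

for section, names in functions_by_role(ExampleWindow).items():
    print(f"{section}: {', '.join(names)}")
```

Because the label is plain data on the function, an apropos-style indexer could collect it the same way it collects keywords. Reading the grouped output gives the "direct idea" of a class that Manuel describes.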
Regards,
Manolo

From skaller@maxtal.com.au Fri Dec 11 03:33:06 1998
From: skaller@maxtal.com.au (John Skaller)
Date: Fri, 11 Dec 1998 13:33:06 +1000
Subject: [Doc-SIG] An 'apropos' utility for documentations
Message-ID: <1.5.4.32.19981211033306.00f66778@triode.net.au>

At 13:26 9/12/98 -0500, Fred L. Drake wrote:
>My response:
>  I think this would be really nice to have. As far as I'm aware,
>such systems are still largely research projects, with some
>applications having reached deployment (you point to good examples).
>To do this for the Python documentation (defined as broadly as
>needed), the most-needed thing to accomplish this is someone who can
>donate time and know-how.

I've been thinking for a while on the following
somewhat related problem: a lot of library docstrings are
missing. It's all very well for Guido to ask for patches,
but this has to be the most inconvenient update mechanism
around.

And then I saw the TODO list Ken put up.
Which is excellent! So, if there were a way to
make an _online_ list of every function, class, and module
in the library, with the current docstring, and allow
everyone to add docstrings over the Internet, like the TODO list,
we'd have more people able to make small contributions.

[There are some interesting issues here, such as foreign
language docstrings. Why English?]
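The online list Skaller describes starts from an inventory of each module, class, and function together with its current docstring. Here is a minimal sketch using a modern Python's `inspect` module. The names `docstring_inventory`, `missing_docstrings`, and `search` are illustrative. Only objects defined in the module itself are reported, not ones it imports.

```python
import inspect
import shutil
import types

def docstring_inventory(module):
    """(qualified name, cleaned docstring or None) for a module and its public members."""
    def doc_of(obj):
        doc = obj.__doc__
        return inspect.cleandoc(doc) if doc else None

    rows = [(module.__name__, doc_of(module))]
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        # Skip names the module merely imported from elsewhere.
        if (inspect.isfunction(obj) or inspect.isclass(obj)) and obj.__module__ == module.__name__:
            rows.append((f"{module.__name__}.{name}", doc_of(obj)))
    return rows

def missing_docstrings(module):
    """Entries that still need a contributed docstring."""
    return [name for name, doc in docstring_inventory(module) if not doc]

def search(module, *words):
    """Entries whose name or docstring mentions every word (case-insensitive)."""
    hits = []
    for name, doc in docstring_inventory(module):
        text = f"{name} {doc or ''}".lower()
        if all(word.lower() in text for word in words):
            hits.append(name)
    return hits

# A tiny throwaway module so the output is predictable.
demo = types.ModuleType("demo", "Demo module.")
exec(
    'def documented():\n    """Copy a file."""\n'
    'def undocumented():\n    pass\n'
    'class Thing:\n    pass\n',
    demo.__dict__,
)
print(missing_docstrings(demo))                              # ['demo.Thing', 'demo.undocumented']
print("shutil.copyfile" in search(shutil, "copy", "file"))   # True
```

The same inventory doubles as a crude word search. Run against the standard library, `search(shutil, "copy", "file")` turns up `shutil.copyfile`, which is the kind of lookup Skaller goes on to ask for.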
Anyhow, the same mechanism could help to build a Python 'thesaurus', at least for the standard distribution. Do you have any idea how long it took me to find the function that copies files?? I knew it was there, somewhere. But the silly thing is called 'copyfile', not 'filecopy', and it's hidden away in a module with an apparently unrelated name: 'shutil'.

-------------------------------------------------------
John Skaller    email: skaller@maxtal.com.au
                http://www.maxtal.com.au/~skaller
                phone: 61-2-96600850
                snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia

From tim_one@email.msn.com Fri Dec 11 04:06:01 1998
From: tim_one@email.msn.com (Tim Peters)
Date: Thu, 10 Dec 1998 23:06:01 -0500
Subject: [Doc-SIG] An 'apropos' utility for documentations
In-Reply-To: <1.5.4.32.19981211033306.00f66778@triode.net.au>
Message-ID: <000501be24bb$9077b0c0$839e2299@tim>

[John Skaller]
> I've been thinking for a while on the following
> somewhat related problem: a lot of library docstrings are
> missing. It's all very well for Guido to ask for patches,
> but this has to be the most inconvenient update mechanism
> around.

Would be interesting to know how many docstring patches Guido has
received -- expect it would be a strong confirmation of the
ineffectiveness of this approach.

> And then I saw the TODO list Ken put up.
> Which is excellent! So, if there were a way to
> make an _online_ list of every function, class, and module
> in the library, with the current docstring, and allow
> everyone to add docstrings over the Internet, like the TODO list,
> we'd have more people able to make small contributions.

I have no idea how to implement that (specifically getting the latest
contributions back into the sources), but it's a truly wonderful idea!
If I can whine at anybody to help this along, just point me in the
poor bastard's direction.

> [There are some interesting issues here, such as foreign
> language docstrings. Why English?]

Because only a handful of misfits speak Australian.
roofully y'rs - tim

From Edward Welbourne Fri Dec 11 10:25:59 1998
From: Edward Welbourne (Edward Welbourne)
Date: Fri, 11 Dec 1998 10:25:59 GMT
Subject: [Doc-SIG] An 'apropos' utility for documentations
In-Reply-To: <1.5.4.32.19981211033306.00f66778@triode.net.au>
References: <1.5.4.32.19981211033306.00f66778@triode.net.au>
Message-ID: <199812111025.KAA01104@lsls4p.lsl.co.uk>

A Wiki would be a pretty good way to collect docstrings.

http://c2.com/cgi/wiki?FrontPage

Eddy.

From fredrik@pythonware.com Fri Dec 11 11:12:19 1998
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Fri, 11 Dec 1998 12:12:19 +0100
Subject: [Doc-SIG] An 'apropos' utility for documentations
Message-ID: <00f101be24f7$1f10a860$f29b12c2@pythonware.com>

>A Wiki would be a pretty good way to collect docstrings.
>
>http://c2.com/cgi/wiki?FrontPage

in python, that's a "groupmind":
http://starship.skyport.net/crew/scharf/GroupMind/GroupMind.cgi?req=home

/F

From guido@CNRI.Reston.VA.US Fri Dec 11 13:01:35 1998
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Fri, 11 Dec 1998 08:01:35 -0500
Subject: [Doc-SIG] An 'apropos' utility for documentations
In-Reply-To: Your message of "Fri, 11 Dec 1998 12:12:19 +0100."
             <00f101be24f7$1f10a860$f29b12c2@pythonware.com>
References: <00f101be24f7$1f10a860$f29b12c2@pythonware.com>
Message-ID: <199812111301.IAA13355@eric.cnri.reston.va.us>

> >A Wiki would be a pretty good way to collect docstrings.
> >
> >http://c2.com/cgi/wiki?FrontPage
>
> in python, that's a "groupmind":
> http://starship.skyport.net/crew/scharf/GroupMind/GroupMind.cgi?req=home

Funny how the circle comes around: GroupMind is derived from my FAQ
wizard, which was first mentioned in this thread.

BTW, GroupMind seems almost empty -- what happened? E.g. I can't find
a single item on the promised TODO list!
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skaller@maxtal.com.au Sat Dec 12 01:18:01 1998
From: skaller@maxtal.com.au (John Skaller)
Date: Sat, 12 Dec 1998 11:18:01 +1000
Subject: [Doc-SIG] An 'apropos' utility for documentations
Message-ID: <1.5.4.32.19981212011801.00f6353c@triode.net.au>

At 23:06 10/12/98 -0500, Tim Peters wrote:
>> And then I saw the TODO list Ken put up.
>> Which is excellent! So, if there were a way to
>> make an _online_ list of every function, class, and module
>> in the library, with the current docstring, and allow
>> everyone to add docstrings over the Internet, like the TODO list,
>> we'd have more people able to make small contributions.
>
>I have no idea how to implement that (specifically getting the latest
>contributions back into the sources),

I agree. This is nasty. But I think it can be done for _python_ modules fairly easily. For C modules, it will require manual patching.

For Python modules, the parser can identify what a docstring is and which object to plug it into (module, class, function). It can also identify when there is NO docstring... and hence a place where one COULD be put. If the parser was temporarily stuffed with by ing a lot at it, it could generate a file of all the docstring insertion points. These would be both the existing docstrings and the places where docstrings could be put.

See below for an alternative.

>> [There are some interesting issues here, such as foreign
>> language docstrings. Why English?]
>
>Because only a handful of misfits speak Australian.

'viously, notu. ucntevn spell Strine right, mate. Youse'dseemorecrickt.

--------------------

I've been thinking about internationalisation. Really, a docstring should be a dictionary, keyed by language.

There's another way to do this, however, which requires a bit of internal Python support and provides other advantages.
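Skaller's observation that the parser can find both existing docstrings and the places where one is missing is easy to check with the standard `ast` module in today's Python. The sketch below lists, for the module and every function and class, the line where a docstring sits or would be inserted. Writing contributed text back into the file is left out, and the `SAMPLE` source is invented for illustration.

```python
import ast

def docstring_sites(source):
    """List (kind, name, line, has_docstring) for a module and each def/class in it.

    'line' is the first statement of the body: the existing docstring if there
    is one, otherwise the statement a new docstring would be inserted above.
    """
    tree = ast.parse(source)
    first = tree.body[0].lineno if tree.body else 1
    sites = [("module", "<module>", first, ast.get_docstring(tree) is not None)]
    for node in ast.walk(tree):  # breadth-first over the whole syntax tree
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            has_doc = ast.get_docstring(node) is not None
            sites.append((kind, node.name, node.body[0].lineno, has_doc))
    return sites

SAMPLE = '''\
"""Sample module."""

def copyfile(src, dst):
    """Copy data from src to dst."""
    pass

class Cache:
    def get(self, key):
        return None
'''

for kind, name, line, has_doc in docstring_sites(SAMPLE):
    status = "docstring at" if has_doc else "could insert at"
    print(f"{kind:8} {name:10} {status} line {line}")
```

The output is the "file of all the docstring insertion points" in miniature. Entries with `has_docstring` false are exactly the slots an online contribution page would offer.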
Suppose that whenever an interned string literal was about to be used, Python looked it up in a table for an equivalent in the user's native language. If no equivalent is found, the original is used; if one is found, it is substituted. For example:

    print "Hello",name

This will print "Guten Morgen John" if two conditions hold. The locale must set the native language to German, and the following function (or something like it) must have been executed:

    set_equivalent_literal("English", "Hello", "German", "Guten Morgen")

The general idea is that you can write your program in your native language. Someone else, not necessarily the author, can then provide a table of equivalents for one or more other languages.

With this idea, it should be possible to have docstrings in more than one language, even if the author only used their native language. It would also support 'internationalising' programs _without_ having to replace every literal in the program with a function call.

This is only an idea!

Hmm. Another idea, just for docstrings. How about a special __get_docstring__ hook that would do the conversion for docstrings? In particular, it could look up docstrings in places _other_ than the standard place (bound to the object).
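Skaller's table of equivalents can be approximated in plain Python, with one important difference. His proposal has the interpreter substitute literals automatically. This sketch needs an explicit `localize()` call, which is exactly the rewriting he wants to avoid.

`set_equivalent_literal` borrows its name from his example. `localize` and `get_docstring` are hypothetical helpers; `get_docstring` is a stand-in for the proposed `__get_docstring__` hook.

```python
# Hypothetical equivalents table: (source language, literal) -> {target language: text}.
_EQUIVALENTS = {}

def set_equivalent_literal(src_lang, text, dst_lang, equivalent):
    """Register a translation of one literal (name borrowed from the proposal)."""
    _EQUIVALENTS.setdefault((src_lang, text), {})[dst_lang] = equivalent

def localize(text, user_lang, src_lang="English"):
    """Return the registered equivalent, or the original text if there is none."""
    return _EQUIVALENTS.get((src_lang, text), {}).get(user_lang, text)

def get_docstring(obj, user_lang, src_lang="English"):
    """Stand-in for a __get_docstring__ hook: look the docstring up in the
    external table instead of relying only on the text bound to the object."""
    doc = obj.__doc__
    return localize(doc, user_lang, src_lang) if doc else doc

def greet(name, lang):
    """Say hello."""
    return f"{localize('Hello', lang)} {name}"

set_equivalent_literal("English", "Hello", "German", "Guten Morgen")
set_equivalent_literal("English", "Say hello.", "German", "Sagt Hallo.")

print(greet("John", "German"))          # Guten Morgen John
print(greet("John", "French"))          # Hello John (no entry, original kept)
print(get_docstring(greet, "German"))   # Sagt Hallo.
```

Keeping translations in a table separate from the source matches the point that someone other than the author can supply them. Docstrings need no special treatment beyond a lookup keyed on their original text.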