From hinsen@cnrs-orleans.fr Thu Jun 17 10:10:29 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Thu, 17 Jun 1999 11:10:29 +0200 Subject: [Doc-SIG] DocBook formatter for pythondoc Message-ID: <199906170910.LAA17920@chinon.cnrs-orleans.fr> I have put a first version of a DocBook formatter for pythondoc 0.6 on my ftp site (ftp://dirac.cnrs-orleans.fr/pub/DocBookFormatter.py). It doesn't yet support all possible markup, and probably never will, because pythondoc is much too layout-oriented for DocBook. For example, it is impossible to have just a "title" in DocBook, titles are always associated with structural units (chapters, sections, etc.). Nevertheless, the DocBook formatter is quite useful when used on appropriate doc strings. The goal of my formatter is to produce reference chapters for end user documentation. Therefore I have introduced an option that I think would be useful for other formatters as well: when you specify the Option "DocBook_undocumented=0", no documentation will be generated for items that have no doc string. I use this to exclude methods like __add__ from end user documentation; instead I mention in the doc string of the class that the class supports addition. If you look at the code, you will notice two methods that are commented out and replaced with different ones. These are the methods that generate function and method documentation. The original (now commented) versions produce the "proper" markup according to the DocBook reference, but unfortunately the existing DocBook stylesheets were made with C in mind and produce C syntax for every function call! I hope this will change in later versions, but until then I prefer to use less specific markup that produces Python syntax. 
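The effect of "DocBook_undocumented=0" described above can be sketched roughly as follows. This is only an illustration of the filtering idea in present-day Python, not pythondoc's or DocBookFormatter.py's actual code; the names documented_items, include_undocumented and Vector are hypothetical.

```python
def documented_items(cls, include_undocumented=False):
    """Return the sorted names defined directly on cls that would be emitted.

    Hypothetical helper: with include_undocumented false, callables
    without a doc string are skipped, which is the behaviour described
    for DocBook_undocumented=0.
    """
    names = []
    for name, member in vars(cls).items():
        if not callable(member):
            continue  # skip plain attributes such as __module__ or __doc__
        if member.__doc__ or include_undocumented:
            names.append(name)
    return sorted(names)


class Vector:
    """A 2-D vector. Instances support addition with '+'."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # Deliberately left without a doc string; the class doc string
        # mentions addition instead.
        return Vector(self.x + other.x, self.y + other.y)

    def length(self):
        """Return the Euclidean length."""
        return (self.x ** 2 + self.y ** 2) ** 0.5


print(documented_items(Vector))
print(documented_items(Vector, include_undocumented=True))
```

With the flag off, only length is listed; __init__ and __add__ appear only when undocumented items are requested.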
-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From Fred L. Drake, Jr." Thanks to Moshe Zadka, a number of modules are making their way to the "documented" list. I've appended an updated list of undocumented modules; I generate the list from the Doc/ directory using the command: ./tools/listmodules -cbi paper-letter/modlib.idx -i paper-letter/modmac.idx (This requires that the DVI or PDF version of the documentation has been built, to get the .idx files that list what's been documented.) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives pcre strop _locale cl curses dl fpectl fpetest nis pure sgi stdwin sv timing asynchat asyncore audiodev bdb chunk codeop dircmp dospath dump find grep htmlentitydefs ihooks knee macurl2path mutex ntpath nturl2path packmail pipes posixpath pty reconvert regex_syntax rlcompleter sched statvfs sunau sunaudio toaiff tty tzparse lib-tk Canvas Dialog FileDialog FixTk ScrolledText SimpleDialog Tkconstants Tkdnd Tkinter tkColorChooser tkCommonDialog tkFileDialog tkFont tkMessageBox tkSimpleDialog turtle plat-* CD CDIO CDROM CL CL_old ERRNO FCNTL FILE GET GLWS IN IOCTL SOCKET STROPTS SUNAUDIODEV SV WAIT cddb cdplayer panel panelparser readcd torgb From guido@CNRI.Reston.VA.US Fri Jun 18 20:31:05 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 18 Jun 1999 15:31:05 -0400 Subject: [Doc-SIG] Undocumented modules update. In-Reply-To: Your message of "Thu, 17 Jun 1999 14:54:08 EDT."
<14185.17488.598311.283482@weyr.cnri.reston.va.us> References: <14185.17488.598311.283482@weyr.cnri.reston.va.us> Message-ID: <199906181931.PAA12276@eric.cnri.reston.va.us> Fred asked me to comment on his list of undocumented modules. Here are my brief comments. > pcre Internal for the "re" module, may go away > strop Internal for the "string" module, will go away > _locale Internal for the "locale" module > cl SGI compression library; may be out of touch with reality > curses Important, deserves to be documented > dl A bit dangerous and platform specific, but worth documenting for those that need it > fpectl > fpetest Very specialized for those that want better error control over floating point operations; a few people at LLNL are probably the only ones who know (or care!) about this. Only they know enough to document it > nis Useful, but incomplete (doesn't do NIS+). > pure Useful when you are using Purify. Twist Barry Warsaw's arm (he wrote it). > sgi SGI specific miscellanea (nap(), _getpty()); worth documenting > stdwin I won't support this in Python 1.6, so not worth documenting. Don't write new code that uses this (but don't remove it from the distribution just yet; there are some users out there) > sv SGI Indigo video; may be out of touch with reality > timing Hmm... I think there's nothing this module does that you can't do by calling time.time(), plus this module is Unix specific.
There are probably some users but it ought to disappear in 1.6 and doesn't deserve documentation > asynchat > asyncore Someone should dig up the documentation that I hope Sam Rushing has sitting somewhere on nightmare.com > audiodev This deserves to be documented, it implements a platform-independent API for playing audio (the platform-specific code it contains should *not* be documented, only the portable API) > bdb Important building block for creating your own Python debugger; deserves docs > chunk Building block for aifc, could also be used (perhaps) in other iff-like file format readers; deserves docs > codeop Deserves docs (separated out from code because JPython has a different implementation) > dircmp Very old, should probably become obsolete or be turned into an application or demo > dospath Internal name for os.path on DOS platforms; could also be used to manipulate DOS pathnames on a non-DOS platform so might be documented as such, but it's important to note that typically you should use os.path > dump We just made this obsolete; use pickle or marshal instead (it would make a nice demo though) > find This is more deserving of a Demo or Tool; perhaps it should become obsolete > grep Useful enough, but more deserving of being a Demo or Tool > htmlentitydefs Deserves documentation (is it up-to-date?) > ihooks Used by rexec, so could be documented; this is very hairy! (Also I can't promise not to change it or to make it obsolete later; the plans for improving import in Python 1.6 will probably make it obsolete.) > knee This is sample code only; perhaps it should become a demo. > macurl2path Internal for urllib, no need to document (but urllib needs to grow another case for Win/CE if that has sockets) > mutex Very weird -- obsolete it?
> ntpath See dospath > nturl2path See macurl2path > packmail A simple "shar archive" creator which probably should become a tool or demo > pipes A framework used by toaiff (read the comments) > posixpath See dospath > pty Useful at times > reconvert Should become a tool > regex_syntax Helper for the old regex module; will become obsolete in 1.6 so no need to document > rlcompleter Very cool code that ought to be documented (and cross-referenced with the readline module) > sched See mutex > statvfs To be documented; this is the counterpart of the "stat" module for the os.statvfs() call present in Linux and some other systems > sunau To be documented; read/write Sun style audio files > sunaudio Huh? This seems to be duplicating 5% of the functionality of sunau. Probably ought to be obsolete or become a tool > toaiff This ought to become a tool; it requires the external program "sox" > tty Like pty, this is useful > tzparse Someone ought to finish this; perhaps it is too specific to be a standard module and should become a tool or demo > lib-tk Canvas To be documented; on the other hand one could argue that it's better to use the Tkinter.Canvas class directly and plan for this module's obsolescence > Dialog To be documented (it interfaces to tk_dialog) > FileDialog This should probably become obsolete, but I think it still has plenty of users. It's better to use tkFileDialog, which interfaces to tk_getOpenFile and tk_getSaveFile > FixTk Internal for Tkinter on Windows > ScrolledText Hmm... This is incomplete and not very robust; but people use it. It should eventually be phased out in favor of something more like Pmw > SimpleDialog Like FileDialog, it's better to use tkSimpleDialog > Tkconstants Internal for Tkinter > Tkdnd To be documented > Tkinter To be documented (can you say "albatross"? 
:-) > tkColorChooser To be documented > tkCommonDialog To be documented > tkFileDialog To be documented > tkFont To be documented > tkMessageBox To be documented > tkSimpleDialog To be documented > turtle To be documented > > plat-* CD > CDIO > CDROM > CL > CL_old > FCNTL > FILE > GET > GLWS > IN > IOCTL > SOCKET > STROPTS > SUNAUDIODEV > SV > WAIT These all define platform-specific constants needed for using certain interfaces (e.g. ioctl). They probably need to be documented in a sense, but not every constant in them needs to be documented; instead, the modules that they belong to should mention "the constants are over there" > ERRNO Obsolete! Use builtin module errno (SGI only) > cddb > cdplayer Should become a tool or demo (SGI only) > panel > panelparser For the *very* old SGI panel library probably not usable any more > readcd SGI specific CD-ROM player control; documentation exists in readcd.doc! But probably should become a demo or tool > torgb Another client of the pipes module; SGI specific, probably not used any more --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Sat Jun 19 14:34:40 1999 From: Moshe Zadka (Moshe Zadka) Date: Sat, 19 Jun 1999 16:34:40 +0300 (IDT) Subject: [Doc-SIG] Undocumented modules update. In-Reply-To: <199906181931.PAA12276@eric.cnri.reston.va.us> Message-ID: (I commented a bit on Guido's comments) On Fri, 18 Jun 1999, Guido van Rossum wrote: > > curses > Important, deserves to be documented I'm working on it. Hopefully I'll have it complete today. > > dl > A bit dangerous and platform specific, > but worth documenting for those that need it Fred already has my docs for that one, but (rightly!) wants to go over it himself. > > grep > Useful enough, but more deserving of being > a Demo or Tool Especially since it uses the old regex, and from a preliminary look I'm not at all sure how to convert it to "re". > > htmlentitydefs > Deserves documentation (is it up-to-date?) 
This module is pretty trivial: ------------------cut here-------------------- # Proposed entity definitions for HTML, taken from # http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_14.html entitydefs is a dictionary mapping entity names to characters. ------------------cut here-------------------- (Should be LaTeXed, of course, but doesn't seem like a big deal) > > packmail > A simple "shar archive" creator > which probably should become a tool or demo And in any way, Fred has my docs for it, in case it doesn't. > > rlcompleter > Very cool code that ought to be documented > (and cross-referenced with the readline > module) I couldn't find any documentation for the readline module either. Huh??? -- Moshe Zadka . #!/usr/bin/tail -1 Just another tail hacker. From gerrit.holl@pobox.com Sat Jun 19 16:29:39 1999 From: gerrit.holl@pobox.com (Gerrit Holl) Date: Sat, 19 Jun 1999 17:29:39 +0200 Subject: [Doc-SIG] Undocumented modules update. In-Reply-To: <14185.17488.598311.283482@weyr.cnri.reston.va.us>; from Fred L. Drake on Thu, Jun 17, 1999 at 02:54:08PM -0400 References: <14185.17488.598311.283482@weyr.cnri.reston.va.us> Message-ID: <19990619172939.A21516@optiplex> On Thu, Jun 17, 1999 at 02:54:08PM -0400, Fred L. Drake wrote: > From: "Fred L. Drake" > Date: Thu, 17 Jun 1999 14:54:08 -0400 (EDT) > To: Doc-SIG List > Subject: [Doc-SIG] Undocumented modules update. > > > > asynchat > asyncore > audiodev > bdb > chunk > codeop > dircmp > dospath > dump > find > grep > htmlentitydefs > ihooks > knee > macurl2path > mutex > ntpath > nturl2path > packmail > pipes > posixpath > pty > reconvert > regex_syntax > rlcompleter > sched > statvfs > sunau > sunaudio > toaiff > tty > tzparse > huh? Is there documentation for the turtle module already?? groeten/regards, Gerrit. -- The Dutch Linuxgames website. De Nederlandse Linuxgames pagina. 
Site address: http://linuxgames.nl.linux.org <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Discoverb: learn words and definitions Discoverb: leer woordjes en definities Site address: http://nl.linux.org/~gerrit/discoverb/ <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Personal homepage: http://nl.linux.org/~gerrit/ From ovidiu@cup.hp.com Sat Jun 19 19:34:44 1999 From: ovidiu@cup.hp.com (ovidiu@cup.hp.com) Date: Sat, 19 Jun 1999 11:34:44 -0700 Subject: [Doc-SIG] Undocumented modules update. In-Reply-To: Your message of "Sat, 19 Jun 1999 16:34:40 +0300." Message-ID: <199906191834.LAA15426@hercules.cup.hp.com> On Sat, 19 Jun 1999 16:34:40 +0300 (IDT), Moshe Zadka wrote: > > > rlcompleter > > Very cool code that ought to be documented > > (and cross-referenced with the readline > > module) > > I couldn't find any documentation for the readline module either. > Huh??? I have a complete re-implementation of the readline module. It has bindings to all the functions defined by the GNU readline library except for some functions for operating on keymaps. One thing that still needs to be done is to test it thoroughly especially with things like the asynchronous interface and redefining keys. The package works just fine with the rlcompleter module although the low level interface to completion could still be used. Is there any interest in using it instead of the limited readline module that comes with Python? If that's the case, what would be the license and copyright that would fit the package the best for inclusion with the Python distribution? The package is now copyrighted by HP and I was thinking to make it available under LGPL. Is this OK from the Python's distribution point of view? Another option would be to make it available as a separate package, but that would be a little bit annoying for normal users of readline. 
Greetings, -- Ovidiu Predescu http://www.geocities.com/SiliconValley/Monitor/7464/ From skip@mojam.com (Skip Montanaro) Mon Jun 28 00:01:37 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sun, 27 Jun 1999 19:01:37 -0400 (EDT) Subject: [Doc-SIG] soundex module status? Message-ID: <14198.43964.125827.437468@cm-24-29-94-19.nycap.rr.com> At one point Guido talked about obsoleting the soundex module. I didn't see it on the list of undocumented modules but also didn't see a "this module is obsolete" comment in the code. It's a very simple module (defines two module methods, defines no objects, makes no significant external calls). I've been meaning to get around to using it in Musi-Cal, but it's always been low enough priority that it's never gotten done. (As Don Beaudry says, "so much code, so little time".) What's its current status? I'll be happy to whip up a simple latex file for it if that saves its life. If it's deemed unworthy of continued module-hood, I propose it simply be moved into the Demo directory. Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 518-372-5583 From Fred L. Drake, Jr." References: <14198.43964.125827.437468@cm-24-29-94-19.nycap.rr.com> Message-ID: <14199.34169.2008.359356@weyr.cnri.reston.va.us> Skip Montanaro writes: > What's its current status? I'll be happy to whip up a simple latex file for > it if that saves its life. If it's deemed unworthy of continued > module-hood, I propose it simply be moved into the Demo directory. Skip, It's still obsolete. There is a documentation file for it; if you build documentation from the LaTeX sources, you can uncomment one line in lib/lib.tex to restore it to the documentation. But it's just too obscure to be considered "standard". It doesn't appear to exactly match any description I've seen of the algorithm (it produces more result than it should!).
There's no reason to doubt its module-hood; if someone tells Guido they'll adopt it, I'm sure he'll consider it a welcome offering. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From hinsen@cnrs-orleans.fr Mon Jun 28 15:58:57 1999 From: hinsen@cnrs-orleans.fr (Konrad Hinsen) Date: Mon, 28 Jun 1999 16:58:57 +0200 Subject: [Doc-SIG] Patches to pythondoc 0.6 Message-ID: <199906281458.QAA13960@chinon.cnrs-orleans.fr> There were two bugs in pythondoc that I found sufficiently annoying to warrant an exploration: 1) Markup in oneliners was not taken into account. 2) Markup for a single letter didn't work. Here's the patch:

diff -r -c pythondoc-0.6/docregex.py pythondoc-0.6-fixed/docregex.py
*** pythondoc-0.6/docregex.py Sat May 1 03:01:15 1999
--- pythondoc-0.6-fixed/docregex.py Mon Jun 28 16:52:34 1999
***************
*** 37,47 ****
  # Strong:
  # format: "**strong text**"
! strong_regex=re.compile(startm + "\*\*(?P[^ \t][^\n*]*[^ \t])\*\*" + endm, re.MULTILINE)
  # Emphasized:
  # format: "*emphasized*"
! emph_regex = re.compile(startm + "\*(?P[^ \t][^\n*]*[^ \t])\*" + endm, re.MULTILINE)
  # Bullet:
  # format: "* bulleted list"
--- 37,47 ----
  # Strong:
  # format: "**strong text**"
! strong_regex=re.compile(startm + "\*\*(?P[^ \t]([^\n*]*[^ \t])?)\*\*" + endm, re.MULTILINE)
  # Emphasized:
  # format: "*emphasized*"
! emph_regex = re.compile(startm + "\*(?P[^ \t]([^\n*]*[^ \t])?)\*" + endm, re.MULTILINE)
  # Bullet:
  # format: "* bulleted list"
diff -r -c pythondoc-0.6/stdmarkup.py pythondoc-0.6-fixed/stdmarkup.py
*** pythondoc-0.6/stdmarkup.py Sat May 1 03:09:09 1999
--- pythondoc-0.6-fixed/stdmarkup.py Mon Jun 28 16:48:00 1999
***************
*** 97,103 ****
      def create_tree_Object(self, docobj):
          """Default implementation for generating doctrees."""
          self.trace_msg("Creating tree for %s %s" % (docobj.tag(), docobj.name()), 2)
!         oneliner, doc = docstring.split_doc(docobj.docstring())
          try:
              oneliner = eval("self.fix_oneliner_%s(oneliner)" % docobj.tag())
          except AttributeError:
--- 97,103 ----
      def create_tree_Object(self, docobj):
          """Default implementation for generating doctrees."""
          self.trace_msg("Creating tree for %s %s" % (docobj.tag(), docobj.name()), 2)
!         oneliner, doc = docstring.split_doc(docobj.docstring())
          try:
              oneliner = eval("self.fix_oneliner_%s(oneliner)" % docobj.tag())
          except AttributeError:
***************
*** 107,113 ****
          st = StructuredText.StructuredText(doc)
          if oneliner:
!             self.__doctree.add_child(doctree.Oneliner(oneliner))
          marker = doctree.DocNode('', 'Marker')
          self.__doctree.add_child(marker)
--- 107,115 ----
          st = StructuredText.StructuredText(doc)
          if oneliner:
!             node = doctree.Oneliner('')
!             self._create_markup_node(oneliner, node)
!             self.__doctree.add_child(node)
          marker = doctree.DocNode('', 'Marker')
          self.__doctree.add_child(marker)

-- ------------------------------------------------------------------------------- Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69 Rue Charles Sadron | Fax: +33-2.38.63.15.17 45071 Orleans Cedex 2 | Deutsch/Esperanto/English/ France | Nederlands/Francais ------------------------------------------------------------------------------- From skip@mojam.com (Skip Montanaro) Mon Jun 28 16:32:11 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 28 Jun 1999 11:32:11 -0400 (EDT) Subject: [Doc-SIG] soundex module status? In-Reply-To: <14199.34169.2008.359356@weyr.cnri.reston.va.us> References: <14198.43964.125827.437468@cm-24-29-94-19.nycap.rr.com> <14199.34169.2008.359356@weyr.cnri.reston.va.us> Message-ID: <14199.37420.840989.983915@cm-24-29-94-19.nycap.rr.com> Fred> But it's just too obscure to be considered "standard". It doesn't Fred> appear to exactly match any description I've seen of the algorithm Fred> (it produces more result than it should!).
It does indeed produce longer results than other implementations. It appears that modulo any bugs, it could be brought into "spec" by just lopping off the last two characters. I compared its output with the examples in the Perl soundex module documentation at http://language.perl.com/newdocs/lib/Text/Soundex.html Except for lopping off the last two characters, the only difference I found was in the mappings for Lukasiewicz and Lissajous. The Perl version yields L222, while the soundex module yields L200. I think the Perl version has a bug, because duplicate digits should be avoided, though I haven't got Knuth's algorithm to refer to. NIST's C implementation at http://physics.nist.gov/cuu/Reference/soundex.html does avoid duplicate digits. Fred> There's no reason to doubt it's module-hood; if someone tells Fred> Guido they'll adopt it, I'm sure he'll consider it a welcome Fred> offering. I'll be happy to "adopt" the module, especially if it will keep it "in the fold". I tried sending mail to the original author but it bounced (not really surprising). Should I do something formal to take it over? Perhaps add my name to the code and send it in for update? Skip From guido@CNRI.Reston.VA.US Mon Jun 28 21:00:26 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 28 Jun 1999 16:00:26 -0400 Subject: [Doc-SIG] soundex module status? In-Reply-To: Your message of "Mon, 28 Jun 1999 11:32:11 EDT." <14199.37420.840989.983915@cm-24-29-94-19.nycap.rr.com> References: <14198.43964.125827.437468@cm-24-29-94-19.nycap.rr.com> <14199.34169.2008.359356@weyr.cnri.reston.va.us> <14199.37420.840989.983915@cm-24-29-94-19.nycap.rr.com> Message-ID: <199906282000.QAA01520@eric.cnri.reston.va.us> > I'll be happy to "adopt" the module, especially if it will keep it "in the > fold". I tried sending mail to the original author but it bounced (not > really surprising). Should I do something formal to take it over? Perhaps > add my name to the code and send it in for update? Sigh. 
I wish that module went away -- it offends my sense of what should be a standard library facility (like several other very old modules, including some I wrote, mind you -- it's just that this one's in C, which makes it more of a liability). If anything, it should be rewritten in Python, I really don't see why it has to be in C. It also makes sense to dig out the official algorithm from somewhere. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Mon Jun 28 21:20:25 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 28 Jun 1999 16:20:25 -0400 (EDT) Subject: [Doc-SIG] soundex module status? In-Reply-To: <199906282000.QAA01520@eric.cnri.reston.va.us> References: <14198.43964.125827.437468@cm-24-29-94-19.nycap.rr.com> <14199.34169.2008.359356@weyr.cnri.reston.va.us> <14199.37420.840989.983915@cm-24-29-94-19.nycap.rr.com> <199906282000.QAA01520@eric.cnri.reston.va.us> Message-ID: <14199.54775.271169.10105@cm-24-29-94-19.nycap.rr.com> >>>>> "Guido" == Guido van Rossum writes: Guido> Sigh. I wish that module went away -- it offends my sense of Guido> what should be a standard library facility (like several other Guido> very old modules, including some I wrote, mind you -- it's just Guido> that this one's in C, which makes it more of a liability). If Guido> anything, it should be rewritten in Python, I really don't see Guido> why it has to be in C. It also makes sense to dig out the Guido> official algorithm from somewhere. I'd be happy to shepherd it into contrib, and into Python as well. I'm sure it can safely be recoded in Python. I doubt my needs for it would suffer if that happened. It sounds to me like Python really needs a way to allow people to easily incorporate "contrib" stuff into their installations. This is way off-topic for this list, so I'll only toss out one dumb comment about how easy I'd like things to be, then shut up.
I like the side-effect documentation of using configure to install Python and associated bits. I can glance at the top of config.status and see how I configured Python six months ago (or egcs or binutils or anything else). I'd love to see something like ./configure --with-thread --with-contrib="soundex msql fred" in config.status that tells me not only how I configured Python, but what non-core modules I downloaded and installed. CPAN works similarly, though doesn't necessarily provide the concise documentation of the installation. The side-effect of --with-contrib would be to download the named modules from PPAN (or wherever) and install them along with the rest of Python. If downloading, configuring and installing non-core modules was this easy, I wouldn't care much if you decided to cut the cgi module out of the core distribution. (That said, I know Greg Ward has been working on installation issues.) Once something fairly transparent is available, I think Guido should partition the core distribution into disjoint "offensive" and "defensive" sets, with the "offensive" stuff winding up in contrib and the "defensive" stuff remaining in the core. Skip From tim_one@email.msn.com Tue Jun 29 05:16:28 1999 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 29 Jun 1999 00:16:28 -0400 Subject: [Doc-SIG] soundex module status? In-Reply-To: <14199.54775.271169.10105@cm-24-29-94-19.nycap.rr.com> Message-ID: <000101bec1e6$2980d360$229e2299@tim> Oh, you people are so charmingly naive . There are at least a dozen algorithms *called* "Soundex" out there, and unless you're eager to dig into the historical archives the original algorithm is most likely nowhere to be found. I've always been particularly fond of Python's, because it's missing a case stmt for the most popular letter in the language . In any case, 1) Soundex certainly doesn't deserve to be a std C module! It makes an OK demo of a Python extension, though.
2) If Skip promises to take it over, I'll attach a Python implementation of Knuth's version of Soundex. This isn't the same algorithm as the current module implements (this one gives better results due to its funky treatment of w and h and sensible treatment of vowels), but I must protest that "Knuth's version" isn't entirely well-defined! This implementation at least matches the concrete examples he gives. P362-ly y'rs - tim PS: > Except for lopping off the last two characters, the only difference I > found was in the mappings for Lukasiewicz and Lissajous. The Perl > version yields L222, while the soundex module yields L200. The Perl result is "correct" here -- any decent variation of Soundex will treat voiced vowels as breaking runs of otherwise-similar letters. This isn't magic: it's supposed to give a crude encoding of how a word *sounds*. Voiced vowels are confusable so don't deserve their own encoding, but they certainly break up the consonant sounds on either side. Speaking of which, the Knuthian rules pretty much suck for many non-English languages -- let's make /F write a Unicode version of this .
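The L222 versus L200 disagreement comes down to whether a vowel between two same-coded consonants separates them. A minimal sketch of just that rule, in present-day Python, is below. It is neither the C module nor any implementation posted in this thread; soundex_sketch and its vowels_break_runs flag are hypothetical names, and dropping non-letters is an assumption of the sketch.

```python
# Sketch only: a compact Soundex with a switch for the disputed rule.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex_sketch(name, vowels_break_runs=True):
    """Return a 4-character code; vowels_break_runs is a hypothetical flag."""
    # Assumption of this sketch: characters that are not letters are dropped.
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("need at least one letter")
    out = letters[0]
    last = CODES.get(letters[0])
    for ch in letters[1:]:
        code = CODES.get(ch)
        if code is None:
            # H and W never separate equal codes; vowels (and Y) do so
            # only when vowels_break_runs is true.
            if vowels_break_runs and ch in "AEIOUY":
                last = None
            continue
        if code != last:
            out += code
        last = code
    # Pad with zeros, then truncate to one letter plus three digits.
    return (out + "000")[:4]


print(soundex_sketch("Lukasiewicz"))
print(soundex_sketch("Lukasiewicz", vowels_break_runs=False))
```

With vowels breaking runs, the K, S and C of Lukasiewicz each contribute a 2; with vowels ignored they collapse into a single 2 and the code is zero-padded, which appears to be where the L200 reported for the existing module comes from.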
Module follows:

NDIGITS = 3

import string
_upper = string.upper
_translate = string.translate

# B for Break -- all characters assumed simply to break runs
_tran = ["B"] * 256
def _setcode(letters, value):
    for ch in letters:
        _tran[ord(_upper(ch))] = _tran[ord(ch)] = value
_setcode("bfpv", "1")
_setcode("cgjkqsxz", "2")
_setcode("dt", "3")
_setcode("l", "4")
_setcode("mn", "5")
_setcode("r", "6")
# B for Break -- these guys break runs
_setcode("aeiouy", "B")
# I for Invisible -- they don't count for anything except as first char
_setcode("hw", "I")
assert len(filter(lambda ch: ch != "B", _tran)) == \
       (26 - len("aeiouy")) * 2, \
       "Soundex initialization screwed up"
_tran = string.join(_tran, "")
del string, _setcode

def soundex(name):
    """name -> Soundex code, following Knuth Vol 3 Ed 2 pg 394."""
    if not name:
        raise ValueError("soundex requires non-empty name argument")
    coded = _translate(name, _tran)
    out = _upper(name[0])
    lastrealcode = coded[0]
    ignore_same = 1
    for thiscode in coded[1:]:
        if thiscode == "B":
            ignore_same = 0
            continue
        if thiscode == "I":
            continue
        if ignore_same and lastrealcode == thiscode:
            continue
        out = out + thiscode
        lastrealcode = thiscode
        ignore_same = 1
        if len(out) > NDIGITS:
            break
    if len(out) < NDIGITS + 1:
        out = out + "0" * (NDIGITS + 1 - len(out))
    return out

def _test():
    global nerrors
    def check(name, expected):
        global nerrors
        got = soundex(name)
        if got != expected:
            nerrors = nerrors + 1
            print "error in Soundex test: name", name, \
                  "expected", expected, "got", got
    nerrors = 0
    check("Euler", "E460")
    check("Ellery", "E460")
    check("guass", "G200")
    check("gauss", "G200")
    check("Ghosh", "G200")
    check("HILBERT", "H416")
    check("Heilbronn", "H416")
    check("Knuth", "K530")
    check("K ** n U123t9247h ", "K530")
    check("Kant", "K530")
    check("Lloyd", "L300")
    check("Liddy", "L300")
    check("Lukasiewicz", "L222")
    check("Lissajous", "L222")
    check("Wachs", "W200")
    check("Waugh", "W200")
    check("HYHYH", "H000")
    check("kkkkkkkwwwwkkkkkhhhhhkkkkemmnmnhmn", "K500")
    check("Rogers", "R262")
    check("Rodgers", "R326")
    check("Sinclair", "S524")
    check("St. Clair", "S324")
    check("Tchebysheff", "T212")
    check("Chebyshev", "C121")
    if nerrors:
        raise SystemError("soundex test failed with " + `nerrors` + " errors")

if __name__ == "__main__":
    _test()

From gerrit.holl@pobox.com Mon Jun 28 21:56:12 1999 From: gerrit.holl@pobox.com (Gerrit Holl) Date: Mon, 28 Jun 1999 22:56:12 +0200 Subject: [Doc-SIG] turtle.py Message-ID: <19990628225611.A3757@optiplex.palga.uucp> Hallo, why isn't turtle.py in the list of undocumented modules? groeten, Gerrit. -- The Dutch Linuxgames homepage: http://linuxgames.nl.linux.org Personal homepage: http://www.nl.linux.org/~gerrit/ Discoverb is a python program (in several languages) which tests the words you learned by asking it. Homepage: http://www.nl.linux.org/~gerrit/discoverb/ From Fred L. Drake, Jr." References: <19990628225611.A3757@optiplex.palga.uucp> <14201.5000.755339.656899@weyr.cnri.reston.va.us> <19990629223042.A3790@optiplex.palga.uucp> Message-ID: <14201.14370.916413.852910@weyr.cnri.reston.va.us> Gerrit Holl writes: > I would like to help, but I don't know La(tex). Is that needed? Gerrit, It certainly helps, but the LaTeX that we use is sufficiently stylized that copying the style from existing documentation goes a long way. Since you appear to be a Linux user, you probably have one TeX installation or another installed, or can do so easily. (I'd recommend teTeX if you don't already have one.) If working with LaTeX is not doable for some reason, I'd be glad to mark up documentation for the standard documents. If you include untested LaTeX based on the examples, that would be better than no markup at all. Examples of whole documents and single-module sections are in the Doc/examples/ directory of the documentation source archive. > Anyway, there is another issue. I can try to document the turtle.py > module, but I don't understand everything myself ;-). That would be a problem.
;-) I can't help there; I've never looked at turtle.py myself. > But what do I have to know if I want to document anything? Or is that > somewhere at the page (sorry, no internet connection this evening). There is a preliminary version of a document called "Documenting Python" at http://www.python.org/doc/current/; that gives some information about writing documentation for Python. It's not complete, and I'm not sure when I'll have any real time to work on it, but specific questions or comments will certainly help me flesh out the most critical parts of it. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fred L. Drake, Jr." References: <14199.54775.271169.10105@cm-24-29-94-19.nycap.rr.com> <000101bec1e6$2980d360$229e2299@tim> Message-ID: <14201.8736.407650.137344@weyr.cnri.reston.va.us> --d503sbm0jq Content-Type: text/plain; charset=us-ascii Content-Description: message body text Content-Transfer-Encoding: 7bit Tim Peters writes: > Oh, you people are so charmingly naive . There are at least a dozen > algorithms *called* "Soundex" out there, and unless you're eager to dig into Tim, Some of us strive very hard to reach our nirvana of naivete! ;-) It wasn't that long ago that I added the reference to Knuth to the documentation that I forgot forging through his history of the approach taken. > 1) Soundex certainly doesn't deserve to be a std C module! It makes an OK > demo of a Python extension, though. When I spoke with Guido about soundex, maybe a year ago, and what we should do about it, his comment wasn't that different from yours. Having a Python implementation made sense to support JPython, so I cooked one up. Once that was done, Guido promptly rejected it because he didn't want to add new code (I think he was close to a release at the time), even though I strove to make it match the existing module in both results and interface.
> 2) If Skip promises to take it over, I'll attach a Python implementation of > Knuth's version of Soundex. This isn't the same algorithm as the current And I'll attach my version as well, since it's compatible with the existing module. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives --d503sbm0jq Content-Type: text/x-python Content-Description: soundex module in Python Content-Disposition: inline; filename="soundex2.py" Content-Transfer-Encoding: 7bit """The soundex algorithm takes an English word, and returns an easily-computed hash of it; this hash is intended to be the same for words that sound alike. This module provides an interface to the soundex algorithm. Note that the soundex algorithm is quite simple-minded, and isn't perfect by any measure. Its main purpose is to help looking up names in databases, when the name may be misspelled -- soundex hashes common misspellings together. """ import string def get_soundex(string): """Return the soundex hash value for a word; it will always be a 6-character string. `string' must contain the word to be hashed, with no leading whitespace; the case of the word is ignored. (Note that the original algorithm produces a 4-character result.)""" s = string.upper(string) if not s: return '000000' r = s[0] s = s[1:] while len(r) < 6 and s: c = s[0] s = s[1:] if c in "WHAIOUY": pass elif c in "BFPV": if r[-1] != '1': r = r + '1' elif c in "CGJKQSXZ": if r[-1] != '2': r = r + '2' elif c in "DT": if r[-1] != '3': r = r + '3' elif c == "L": if r[-1] != '4': r = r + '4' elif c in "MN": if r[-1] != '5': r = r + '5' elif c == "R": if r[-1] != '6': r = r + '6' return r + '0' * (6 - len(r)) def sound_similar(s1, s2): """Returns true if both arguments have the same soundex code.""" return get_soundex(s1) == get_soundex(s2) --d503sbm0jq-- From Fred L. Drake, Jr." 
References: <19990628225611.A3757@optiplex.palga.uucp>
Message-ID: <14201.5000.755339.656899@weyr.cnri.reston.va.us>

Gerrit Holl writes:
> why isn't turtle.py in the list of undocumented modules?

Gerrit,
If you'd like to submit a patch relative to the current version in
the CVS repository, I'm sure it can be incorporated. There are a lot
of other Tk-related modules that aren't documented or listed as well.
On the other hand, if you'd like to submit documentation for any or
all Tk-related modules, I'd be glad to give you a chapter in the
Library Reference! ;-)

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From gerrit.holl@pobox.com Tue Jun 29 21:30:42 1999
From: gerrit.holl@pobox.com (Gerrit Holl)
Date: Tue, 29 Jun 1999 22:30:42 +0200
Subject: [Doc-SIG] turtle.py
In-Reply-To: <14201.5000.755339.656899@weyr.cnri.reston.va.us>; from Fred L. Drake on Tue, Jun 29, 1999 at 02:42:16PM -0400
References: <19990628225611.A3757@optiplex.palga.uucp> <14201.5000.755339.656899@weyr.cnri.reston.va.us>
Message-ID: <19990629223042.A3790@optiplex.palga.uucp>

On Tue, Jun 29, 1999 at 02:42:16PM -0400, Fred L. Drake wrote:
> From: "Fred L. Drake"
> Date: Tue, 29 Jun 1999 14:42:16 -0400 (EDT)
> To: Gerrit Holl
> Cc: doc-sig@python.org
> Subject: Re: [Doc-SIG] turtle.py
>
> Gerrit Holl writes:
> > why isn't turtle.py in the list of undocumented modules?
>
> Gerrit,
> If you'd like to submit a patch relative to the current version in
> the CVS repository, I'm sure it can be incorporated. There are a lot
> of other Tk-related modules that aren't documented or listed as well.
> On the other hand, if you'd like to submit documentation for any or
> all Tk-related modules, I'd be glad to give you a chapter in the
> Library Reference! ;-)
>

I would like to help, but I don't know La(tex). Is that needed?

Anyway, there is another issue. I can try to document the turtle.py
module, but I don't understand everything myself ;-).
But what do I have to know if I want to document anything? Or is that
somewhere on the page (sorry, no internet connection this evening).

Regards,
Gerrit.

--
The Dutch Linuxgames homepage: http://linuxgames.nl.linux.org
Personal homepage: http://www.nl.linux.org/~gerrit/

Discoverb is a Python program (in several languages) which tests
the words you learned by asking it.
Homepage: http://www.nl.linux.org/~gerrit/discoverb/
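
Note on the soundex2.py attachment earlier in this thread: the sketch below restates the same encoding rules in table-driven form for a current Python 3 interpreter. It is only an illustration, not Fred Drake's module or the C soundex module. The names `soundex6`, `sounds_alike` and `_CODES` are hypothetical and appear nowhere in the thread. As in the attachment, the first letter is kept, each later letter maps to a digit group, a digit is skipped when it equals the last character already emitted, and the result is padded to six characters.

```python
# Hypothetical table-driven restatement of the soundex2.py rules
# (names below are illustrative assumptions, not from the thread).
# Letters that are not listed (vowels, W, H, Y) contribute no digit.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        _CODES[letter] = digit


def soundex6(word):
    """Return a 6-character code built with the attachment's rules."""
    s = word.upper()
    if not s:
        return "000000"
    result = s[0]                      # the first letter is kept verbatim
    for c in s[1:]:
        if len(result) == 6:
            break
        digit = _CODES.get(c)
        # Skip uncoded characters and repeats of the last emitted digit.
        if digit is not None and result[-1] != digit:
            result += digit
    return result.ljust(6, "0")        # pad short codes with zeros


def sounds_alike(a, b):
    """True when both words map to the same 6-character code."""
    return soundex6(a) == soundex6(b)
```

With these rules `soundex6("Robert")` and `soundex6("Rupert")` both give "R16300", so `sounds_alike("Robert", "Rupert")` is true. The 4-character codes in Tim Peters' test list (for example "R262" for "Rogers") come from Knuth's variant and are not expected to match this 6-character form.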