[Tutor] Find files without __doc__ strings

spir denis.spir at free.fr
Sun May 17 08:40:12 CEST 2009


On Sat, 16 May 2009 21:46:02 -0400,
David <david at abbottdavid.com> wrote:

> I am doing an exercise in Wesley Chun's book. Find files in the standard 
>   library modules that have doc strings. Then find the ones that don't, 
> "the shame list". I came up with this to find the ones with;
> #!/usr/bin/python
> import os
> import glob
> import fileinput
> import re
> 
> pypath = "/usr/lib/python2.6/"
> fnames = glob.glob(os.path.join(pypath, '*.py'))
> 
> def read_doc():
>      pattern = re.compile('"""*\w')
>      for line in fileinput.input(fnames):
>          if pattern.match(line):
>              print 'Doc String Found: ', fileinput.filename(), line
> 
> read_doc()

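It seems to me that your approach is moderately wrong ;-) The pattern only tests whether a line begins with double quotes followed by a word character, so it misses docstrings written with ''' and never checks that the string comes before any code or that it is closed.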

> There must have been an easier way :)

Not sure. As I see it, the problem is slightly more complicated. A module docstring is (usually) a triple-quoted string placed before any code, and it must be closed, too.
You'll have to skip blank lines and comment lines, then check whether the rest starts with a docstring. That could be done with a single complicated pattern, but you could also go step by step.
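For the single-pattern route, here is a rough sketch only. The names moduleDoc and findModuleDoc are made up for illustration, and the pattern assumes the whole source text has already been read into one string. Like any regex approach, it can be fooled by unusual sources.

```python
import re

# Hypothetical single-pattern version: skip blank/comment lines, then
# require a closed triple-quoted string as the first real content.
moduleDoc = re.compile(r"""
    (?:[ \t]*(?:\#[^\n]*)?\n)*    # any number of blank or comment-only lines
    (?P<doc>
        \"\"\".*?\"\"\"            # a closed triple-double-quoted string
      | '''.*?'''                 # or a closed triple-single-quoted string
    )
""", re.DOTALL | re.VERBOSE)

def findModuleDoc(sourceText):
    """Return the docstring literal (quotes included), or None."""
    result = moduleDoc.match(sourceText)
    if result is None:
        return None
    return result.group('doc')
```

Calling findModuleDoc(open(fileName).read()) would then give either the matched string or None.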
Say I have a file 'dummysource.py' with the following text:
==============
# !/usr/bin/env python
# coding: utf8

# comment
# ''' """

''' foo module
	doc
	'''
def foofunc():
	''' foofuncdoc '''
	pass
==============

Then, the following doc-testing code
==============
import re
# a closed triple-quoted string, in either quote style
doc = re.compile(r'(""".+?""")|(\'\'\'.+?\'\'\')', re.DOTALL)

def checkDoc(sourceFileName):
    sourceFile = open(sourceFileName, 'r')
    # move until first 'code' line
    while True:
        line = sourceFile.readline()
        if line == '':
            # end of file: only blank and comment lines, so no doc
            break
        strip_line = line.strip()
        print "|%s|" % strip_line
        if (strip_line != '') and (not strip_line.startswith('#')):
            break
    # check doc (keep last line read!)
    source = line + sourceFile.read()
    sourceFile.close()
    result = doc.match(source)
    if result is not None:
        print "*** %s *******" % sourceFileName
        print result.group()
        return True
    else:
        return False

# checkDoc expects a file name, not an open file object
print checkDoc("dummysource.py")
==============

will output:

==============
|# !/usr/bin/env python|
|# coding: utf8|
||
|# comment|
|# ''' """|
||
|''' foo module|
*** dummysource.py *******
''' foo module
	doc
	'''
True
==============

It's just for illustration; you can probably make things simpler or find a better way.
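One possibly simpler way, offered as a sketch and not as the exercise's intended answer: let Python's own parser decide, using the standard ast module instead of a regex. The name hasModuleDoc is made up for illustration. Note that ast.parse raises SyntaxError when the source is not valid for the interpreter running the check.

```python
import ast

def hasModuleDoc(sourceText):
    """Return True if the parsed module starts with a docstring."""
    # ast.parse builds the syntax tree; comments and blank lines vanish
    tree = ast.parse(sourceText)
    # ast.get_docstring returns None when the module has no docstring
    return ast.get_docstring(tree) is not None
```

This takes the source text, so you would read each file first and pass its content.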

> Now I have a problem, I can not figure out how to compare the fnames 
> with the result fileinput.filename() and get a list of any that don't 
> have doc strings.

You can use a function like the one above to filter files according to whether they pass the test.
I would start with a list of all files, and just populate 2 new lists, "shame" and "fame" ;-), according to the result of the test.

You could use list comprehension syntax, too:
    fameFileNames = [fileName for fileName in fileNames if checkDoc(fileName)]
But if you do this for shame files too, then every file gets tested twice.
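
To avoid testing each file twice, a plain loop can fill both lists in one pass. This is only a sketch: splitByTest and its parameter names are made up for illustration, and test stands for any yes/no function that takes a file name, such as checkDoc.

```python
def splitByTest(fileNames, test):
    """Return (fame, shame): names that pass the test and names that fail."""
    fame, shame = [], []
    for fileName in fileNames:
        # call the test exactly once per file, then pick the target list
        if test(fileName):
            fame.append(fileName)
        else:
            shame.append(fileName)
    return fame, shame
```

Usage would look like: fameFileNames, shameFileNames = splitByTest(fileNames, checkDoc)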

> thanks

Denis
------
la vita e estrany

