[Tutor] Find files without __doc__ strings
spir
denis.spir at free.fr
Sun May 17 08:40:12 CEST 2009
On Sat, 16 May 2009 21:46:02 -0400,
David <david at abbottdavid.com> wrote:
> I am doing an exercise in Wesley Chun's book. Find files in the standard
> library modules that have doc strings. Then find the ones that don't,
> "the shame list". I came up with this to find the ones with;
> #!/usr/bin/python
> import os
> import glob
> import fileinput
> import re
>
> pypath = "/usr/lib/python2.6/"
> fnames = glob.glob(os.path.join(pypath, '*.py'))
>
> def read_doc():
>     pattern = re.compile('"""*\w')
>     for line in fileinput.input(fnames):
>         if pattern.match(line):
>             print 'Doc String Found: ', fileinput.filename(), line
>
> read_doc()
It seems to me that your approach is moderately wrong ;-)
> There must have been an easier way :)
Not sure. As I see it, the problem is slightly more complicated. A module doc is a string literal, by convention triple-quoted, placed before any code. And it must be closed, too.
You'll have to skip blank and comment lines, then check whether the rest matches a docstring. It could be done with a single complicated pattern, but you could also go for it step by step.
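For illustration, the single-pattern variant might look roughly like the sketch below. It is written in Python 3 syntax, the names MODULE_DOC and find_module_doc are made up here, and it only recognises plain triple-quoted strings (no r/u prefixes, no single-quoted docstrings).

```python
import re

# Hypothetical names; a sketch, not a complete docstring detector.
# Zero or more blank or comment-only lines, then a closed
# triple-quoted string at the first "code" position.
MODULE_DOC = re.compile(
    r"\A(?:[ \t]*(?:#[^\n]*)?\n)*"    # skip blank lines and comment lines
    r"(\"\"\".*?\"\"\"|'''.*?''')",    # a closed triple-quoted string
    re.DOTALL,
)

def find_module_doc(source_text):
    """Return the docstring literal (quotes included), or None."""
    found = MODULE_DOC.match(source_text)
    return found.group(1) if found else None
```

The pattern is compact, but harder to read and to debug than a step-by-step version.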
Say I have a file 'dummysource.py' with the following text:
==============
# !/usr/bin/env python
# coding: utf8

# comment
# ''' """

''' foo module
doc
'''
def foofunc():
    ''' foofuncdoc '''
    pass
==============
Then, the following doc-testing code
==============
import re
doc = re.compile(r'(""".+?""")|(\'\'\'.+?\'\'\')', re.DOTALL)

def checkDoc(sourceFileName):
    sourceFile = open(sourceFileName, 'r')
    # move until first 'code' line
    while True:
        line = sourceFile.readline()
        if line == '':
            # end of file: nothing but blank/comment lines, so no doc
            break
        strip_line = line.strip()
        print "|%s|" % strip_line
        if (strip_line != '') and (not strip_line.startswith('#')):
            break
    # check doc (keep last line read!)
    source = line + sourceFile.read()
    sourceFile.close()
    result = doc.match(source)
    if result is not None:
        print "*** %s *******" % sourceFileName
        print result.group()
        return True
    else:
        return False

# checkDoc expects a file name, not an open file object
print checkDoc("dummysource.py")
==============
will output:
==============
|# !/usr/bin/env python|
|# coding: utf8|
||
|# comment|
|# ''' """|
||
|''' foo module|
*** dummysource.py *******
''' foo module
doc
'''
True
==============
It's just for illustration; you can probably make things simpler or find a better way.
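One possibly simpler route, not used in the code above, is to let the standard library's ast module do the parsing. The sketch below assumes Python 3 and uses a made-up helper name, has_module_doc; ast.parse and ast.get_docstring are real standard-library functions.

```python
import ast

def has_module_doc(source_text):
    """Return True if the source text begins with a module docstring.

    Hypothetical helper. ast.parse raises SyntaxError for source the
    running interpreter cannot parse; that error is left to the caller.
    """
    tree = ast.parse(source_text)
    # get_docstring returns None when the first statement is not a string
    return ast.get_docstring(tree) is not None
```

Note that a Python 3 interpreter cannot parse Python 2 only sources (a print statement, for instance), so such files would have to be handled separately.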
> Now I have a problem, I can not figure out how to compare the fnames
> with the result fileinput.filename() and get a list of any that don't
> have doc strings.
You can use a func like the above one to filter out (or in) files that answer yes/no to the test.
I would start with a list of all files, and just populate 2 new lists for "shame" and "fame" files ;-) according to the result of the test.
You could use list comprehension syntax, too:
    fameFileNames = [fileName for fileName in fileNames if checkDoc(fileName)]
But if you do this for shame files too, then every file gets tested twice.
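To test each file only once, a plain loop can fill both lists in a single pass. This is a minimal sketch with made-up names (split_by_doc, has_doc); has_doc stands for any yes/no test function taking a file name, such as checkDoc.

```python
def split_by_doc(file_names, has_doc):
    """Sort file names into (fame, shame) lists, testing each file once.

    Hypothetical helper: has_doc is any function that takes a file
    name and returns a true value when the file has a docstring.
    """
    fame, shame = [], []
    for name in file_names:
        if has_doc(name):
            fame.append(name)
        else:
            shame.append(name)
    return fame, shame
```

It would be called as: fame, shame = split_by_doc(fileNames, checkDoc)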
> thanks
Denis
------
la vita e estrany