a website information gathering script
Emile van Sebille
emile at fenx.com
Fri Mar 22 15:45:30 EST 2002
Scott Hathaway wrote:
> I have a simple python script which gathers information about a website
> and produces an html report about what it finds. The way it works is
> very clunky and I would appreciate some feedback and help improving it.
----------
if (file.endswith('html')==1) or (file.endswith('htm')==1) or \
   (file.endswith('asp')==1) or (file.endswith('php')==1) or \
   (file.endswith('js')==1) or (file.endswith('vbs')==1) or \
   (file.endswith('inc')==1) or (file.endswith('css')==1) :
    checkedFiles.append(file)
Why not:
if file.split('.')[-1] in ('html','htm','asp','php','js','vbs','inc','css'):
    checkedFiles.append(file)
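
A self-contained sketch of the same check, assuming the file names are plain strings. The names WEB_EXTENSIONS and is_web_file are made up for illustration, and two things are swapped in that the suggestion above does not do: os.path.splitext replaces split('.') so that a name with no dot at all (say, a file called "css") is not mistaken for its own extension, and the comparison is made case-insensitive.

```python
import os.path

# Hypothetical names for illustration; the extension list is the one from the post.
WEB_EXTENSIONS = ('html', 'htm', 'asp', 'php', 'js', 'vbs', 'inc', 'css')

def is_web_file(filename):
    # splitext('index.html') returns ('index', '.html'); drop the leading dot.
    ext = os.path.splitext(filename)[1][1:]
    # Lower-casing is an assumption: it treats 'site.CSS' like 'site.css'.
    return ext.lower() in WEB_EXTENSIONS

# Made-up sample names, only to exercise the check.
checkedFiles = [f for f in ['index.html', 'logo.gif', 'site.CSS', 'css']
                if is_web_file(f)]
```

If case-sensitive matching is what the report needs, dropping the .lower() call restores it.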
----------
Repeatedly concatenating strings is expensive, since every += builds a
new string, so you might eliminate it by collecting the pieces in lists:
a = "<html><head>\n"
a += "<basefont ...
becomes:
a =[ """<html><head>
<basefont ..."""]
and changing each a += ... to a.append(...),
and perhaps even changing from:
a += "<tr><td...-1'>" + str(len(referencedFiles)) + "</font></td></tr>\n"
a += "<tr><td...-1'>" + str(len(unReferencedFiles)) + "</font></td></tr>\n"
to:
a.extend(["<tr><td...-1'>", str(len(referencedFiles)), "</font></td></tr>\n"])
although I'm not sure if that would be preferred over:
a.append("<tr><td...-1'>%s</font></td></tr>\n" %
str(len(referencedFiles)))
Finally, once a and b are lists, do the string conversion with a join, so:
html = str(a) + str(b)
becomes:
html = ''.join(a+b)
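
Putting those pieces together, a minimal sketch of the list-then-join approach might look like the following. The file lists and the simplified markup are stand-ins invented for the example, not taken from the original script.

```python
# Hypothetical stand-ins for the lists the script builds.
referencedFiles = ['index.html', 'about.html']
unReferencedFiles = ['old.css']

a = ["<html><head>\n"]   # header fragments
b = []                   # body fragments

a.append("<title>Site report</title></head>\n")
# Either style works: one formatted string, or several small pieces.
b.append("<tr><td>%d</td></tr>\n" % len(referencedFiles))
b.extend(["<tr><td>", str(len(unReferencedFiles)), "</td></tr>\n"])

# A single join at the end replaces the many intermediate string copies.
html = ''.join(a + b)
```

Note that str(a) on a list gives the list's printed representation, brackets and quotes included, which is why the join is needed once a and b stop being strings.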
----------
You really, really want this to work, don't you? ;-)
try:
    fc = open(cFile,'r').read()
except:
    fc = open(cFile,'r').read()
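
Retrying the identical call in the except branch will just fail the same way. One possible alternative, sketched here in current Python under the assumption that an unreadable file should simply be skipped, is to catch only the I/O error and report it. The helper name read_file is hypothetical.

```python
def read_file(path):
    # Hypothetical helper: return the file's text, or None if it cannot be read.
    try:
        with open(path, 'r') as f:
            return f.read()
    except OSError as err:
        # Report and move on instead of retrying the same failing call.
        print("skipping %s: %s" % (path, err))
        return None
```

The caller can then test the result for None before scanning the contents.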
----------
I'm not real strong on regular expressions so check my code, but it
seems to me that the code that sets found could be something like:
(posting this is a good way to entice someone who knows into cleaning it
up... ;-) )
>>> for afile in allFiles:
...     if re.search('''['"\\/]''' + re.escape(file), afile): found = 1
...
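
A runnable sketch of that idea, with several assumptions stated up front: the search is run against each file's contents rather than its name, the file name is passed through re.escape so its dot is matched literally, and the character class accepts a quote, a backslash, or a slash just before the name. The function is_referenced and the sample pages are invented for the example.

```python
import re

def is_referenced(filename, contents):
    # Match the name only when a quote, backslash, or slash comes right
    # before it, e.g. src="logo.gif" or href='/img/logo.gif'.
    pattern = r'''['"\\/]''' + re.escape(filename)
    return re.search(pattern, contents) is not None

# Hypothetical page contents keyed by file name.
pages = {
    'index.html': '<img src="img/logo.gif"><a href="about.html">About</a>',
    'about.html': '<p>No links here; logo.gif is only mentioned in text.</p>',
}

# For each candidate name, check whether any page refers to it.
found = {name: any(is_referenced(name, text) for text in pages.values())
         for name in ['logo.gif', 'about.html', 'old.css']}
```

A plain mention such as "logo.gif" in running text is not counted, because nothing from the character class precedes it.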
HTH,
Emile van Sebille
emile at fenx.com