a website information gathering script

Fri Mar 22 15:45:30 EST 2002

Scott Hathaway
> I have a simple python script which gathers information about a website
> and produces an html report about what it finds.  The way it works is
> very clunky and I would appreciate some feedback and help improving it.

----------

if (file.endswith('html')==1) or (file.endswith('htm')==1) or \
   (file.endswith('asp')==1) or (file.endswith('php')==1) or \
   (file.endswith('js')==1) or (file.endswith('vbs')==1) or \
   (file.endswith('inc')==1) or (file.endswith('css')==1):
    checkedFiles.append(file)

Why not:

if file.split('.')[-1] in ('html','htm','asp','php','js','vbs','inc','css'):
    checkedFiles.append(file)
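
A slightly more defensive variant, offered only as a sketch: os.path.splitext
isolates the extension, so a name with no dot at all (a file literally called
'css', say) is not picked up, and lowercasing catches names like 'MENU.JS'.
The names CHECKED_EXTENSIONS and is_checked are made up for this example and
are not in the original script.

```python
import os.path

# Hypothetical names for illustration; the extension list is the one above.
CHECKED_EXTENSIONS = ('.html', '.htm', '.asp', '.php',
                      '.js', '.vbs', '.inc', '.css')

def is_checked(filename):
    """Return True if filename ends in one of the extensions we scan."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in CHECKED_EXTENSIONS

# Sample file names, invented for the example.
checkedFiles = [f for f in ['index.html', 'LOGO.GIF', 'menu.JS', 'css']
                if is_checked(f)]
```

Whether the match should be case-insensitive depends on the server the site
lives on, so treat the .lower() call as an assumption.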

----------

Repeated string concatenation is expensive, since each += copies the
string built so far, so you might eliminate it by collecting the pieces
in a list:

a = "<html><head>\n"
a += "<basefont ...

becomes:

a =[ """<html><head>
<basefont ..."""]

and each later a += ... becomes a.append(...)

and perhaps even changing from:

a += "<tr><td...-1'>" + str(len(referencedFiles)) + "</font></td></tr>\n"
a += "<tr><td...-1'>" + str(len(unReferencedFiles)) + "</font></td></tr>\n"

to:

a.extend(["<tr><td...-1'>", str(len(referencedFiles)),
          "</font></td></tr>\n"])

although I'm not sure if that would be preferred over:

a.append("<tr><td...-1'>%s</font></td></tr>\n" %
str(len(referencedFiles)))

finally, the step that assembles the finished string:

html = str(a) + str(b)

becomes:

html = ''.join(a+b)
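
Put together, the list-then-join approach might look like the sketch below.
The markup is deliberately simplified and the two file lists hold made-up
sample data; only the pattern (append pieces, join once at the end) is the
point.

```python
# Made-up sample data standing in for the script's real lists.
referencedFiles = ['index.html', 'menu.js']
unReferencedFiles = ['old.css']

rows = ['<table>\n']
for label, files in (('Referenced', referencedFiles),
                     ('Unreferenced', unReferencedFiles)):
    # %d formats the integer directly, so no str() call is needed.
    rows.append('<tr><td>%s</td><td>%d</td></tr>\n' % (label, len(files)))
rows.append('</table>\n')

# One join at the end instead of many intermediate string copies.
html = ''.join(rows)
```

For a report of a few dozen lines the speed difference is unlikely to be
noticeable; the list form mostly pays off when the page grows large.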

----------

You really, really want this to work, don't you?  ;-)

    try:
        fc = open(cFile,'r').read()
    except:
        fc = open(cFile,'r').read()
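
Since the except branch just repeats the call that failed, it probably
makes more sense to decide what should happen to an unreadable file. One
possibility, as a sketch that assumes a Python recent enough to have the
with statement: report the file and move on. The helper name read_file and
the choice to return None are my own, not something from the script.

```python
def read_file(path):
    """Return the file's text, or None if the file cannot be read."""
    try:
        # errors='replace' keeps odd bytes from aborting the whole scan.
        with open(path, 'r', errors='replace') as f:
            return f.read()
    except OSError as err:
        # Assumption: skipping the file is acceptable for a report script.
        print('skipping %s: %s' % (path, err))
        return None
```

The caller then has to check for None before searching the contents.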

----------

I'm not real strong on regular expressions so check my code, but it
seems to me that the code that sets found could be something like:
(posting this is a good way to entice someone who knows into cleaning it
up... ;-) )

>>> for afile in allFiles:
...     if re.search('''['"\\/]''' + re.escape(file), afile): found = 1
...
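
Spelled out as a function, the same idea might look like the sketch below.
It assumes the second argument is the text of a page rather than its name,
and that a reference counts when the file name comes right after a quote,
a slash or a backslash. The name is_referenced is hypothetical. re.escape
keeps the dot in 'menu.js' from matching any character.

```python
import re

def is_referenced(name, text):
    """True if name appears in text right after a quote, / or backslash."""
    # Raw string: the class holds ', ", a literal backslash and /.
    pattern = r'''['"\\/]''' + re.escape(name)
    return re.search(pattern, text) is not None
```

So is_referenced('menu.js', '<script src="menu.js">') is true, while a page
that only mentions 'submenu.js' does not count as a reference to 'menu.js'.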

HTH,

Emile van Sebille
emile at fenx.com

---------