file handling

Alex Martelli alex at magenta.com
Thu Aug 10 07:49:34 EDT 2000


<tjernstrom at my-deja.com> wrote in message
news:8mtt9m$90j$1 at nnrp1.deja.com...
> I'm new to Python and have trouble finding an answer to a simple
> question.
>
> I have written a small script that processes html-files. All I need to do
> is find a way to send all html files in a directory to this file while
> ignoring the rest of the files in this directory (and do the same for
> all the subdirectories).
> Here's my short script:
>
> import re
> def ProcessFile(file, spath, tpath):
    [snip]
> I'd really appreciate help or some tips on where in the documentation to
> look for answers.

The Library Reference is what you want -- it's available online and
is also installed as part of Python.

If it weren't for the 'all the subdirectories' part, then:

def processAll(spath, tpath):
    import glob
    import os.path
    for file in glob.glob(os.path.join(spath,"*.html")):
        ProcessFile(file, '', tpath)

might work.  Note that glob.glob() preserves the path: each name
it returns already includes the directory part.  You probably don't
want the trouble of running os.path.split on it again, hence the ''
second argument to ProcessFile.
[ProcessFile would also be well-advised to use os.path functions
rather than string-level operations, by the way].
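
As a rough illustration of that os.path advice, here is a minimal
sketch.  The body of ProcessFile was snipped, so target_for is a
hypothetical helper, and it assumes the output file keeps the name of
the source file:

```python
import os.path

def target_for(source_file, tpath):
    """Return the path inside tpath with the same file name as source_file."""
    # os.path.split separates the directory part from the last component
    _directory, name = os.path.split(source_file)
    # os.path.join supplies the right separator for the current platform
    return os.path.join(tpath, name)
```

Because the split and the join are left to os.path, nothing in the
helper depends on whether the separator is '/' or '\'.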

Since you need to look in all the subdirectories too, you'll
need os.path.walk to walk the tree and fnmatch.fnmatch
to do the selection.  Here's a decent approach:

import os.path
import fnmatch

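def ishtml(file):
    # true for names matching the shell-style pattern *.html
    return fnmatch.fnmatch(file,'*.html')
def processadir(tpath,path,files):
    # os.path.walk calls this once per directory, as (arg, dirname, names)
    for file in filter(ishtml,files):
        ProcessFile(file,path,tpath)
def processAll(spath,tpath):
    # tpath is handed through to processadir as its first argument
    os.path.walk(spath,processadir,tpath)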
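
os.path.walk exists only in Python 2 and was removed in Python 3.
On a Python 3 interpreter the same idea could be sketched with
os.walk.  The names find_html and process_all are illustrative, and
process_file stands in for the snipped ProcessFile, assumed to take
(file, spath, tpath) as in the original def line:

```python
import fnmatch
import os

def find_html(spath):
    """Yield (filename, directory) for every *.html file under spath."""
    # os.walk visits spath and each subdirectory in turn
    for dirpath, dirnames, filenames in os.walk(spath):
        # fnmatch.filter keeps only the names matching the pattern
        for name in fnmatch.filter(filenames, '*.html'):
            yield name, dirpath

def process_all(spath, tpath, process_file):
    """Call process_file(file, path, tpath) for each HTML file found."""
    for name, dirpath in find_html(spath):
        process_file(name, dirpath, tpath)
```

Passing the worker in as an argument keeps the tree walk separate
from whatever is done to each file, so process_all(spath, tpath,
ProcessFile) would play the role of processAll above.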


Alex





