recursive file editing

Peter Otten __peter__ at web.de
Sun Apr 4 06:11:25 EDT 2004


TaeKyon wrote:

> I'm a python newbie; here are a few questions relative to a
> problem I'm trying to solve; I'm wandering if python is the best
> instrument or if awk or a mix of bash and sed would be better:
> 
> 1) how would I get recursively descend
> through all files in all subdirectories of
> the one in which my script is called ?
> 
> 2) each file examined should be edited; IF a string of this type is found
> 
> foo.asp=dev?bar (where bar can be a digit or an empty space)
> 
> it should always be substituted with this string
> 
> foo-bar.html (if bar is an empty space the new string is foo-.html)
> 
> 3) the names of files read may themselves be of the sort foo.asp=dev?bar;
> the edited output file should also be renamed according to the same rule
> as above ... or would this be better handled by a bash script ?
> 
> Any hints appreciated
> 

The following code comes with no warranties. Be sure to backup valuable data
before trying it. You may need to edit the regular expressions. Call the
script with the directory you want to process.

Peter

import os, re, sys

class Path(object):
    def __init__(self, folder, name):
        self.folder = folder
        self.name = name

    def _get_path(self):
        return os.path.join(self.folder, self.name)
    path = property(_get_path)

    def rename(self, newname):
        if self.name != newname:
            os.rename(self.path, os.path.join(self.folder, newname))
            self.name = newname

    def processContents(self, operation):
        data = file(self.path).read()
        newdata = operation(data)
        if data != newdata:
            file(self.path, "w").write(newdata)

    def __str__(self):
        return self.path

def files(rootfolder):
    for folder, folders, files in os.walk(rootfolder):
        for name in files:
            yield Path(folder, name)

fileExpr = re.compile(r"^(.+?)\.asp\=dev\?(.*)$")
filePattern = r"\1-\2.html"

textExpr = re.compile(r"([/\'\"])(.+?)\.asp\=dev\?(.*?)([\'\"])")
textPattern = r"\1\2-\3.html\4"

if __name__ == "__main__":
    for f in files(sys.argv[1]):
        f.rename(fileExpr.sub(filePattern, f.name))
        f.processContents(lambda s: textExpr.sub(textPattern, s))




More information about the Python-list mailing list