comparing huge files

s99999999s2003 at yahoo.com s99999999s2003 at yahoo.com
Wed Mar 15 21:13:25 EST 2006


Hi,
I wrote some code to compare two files. One is the base file; the other
is a file I got from somewhere else. I need to compare this file against
the base. For example, the base file:
abc
def
ghi

and the other file:
abc
def
ghi
jkl

After the comparison, the base file is overwritten so that it contains
only the new line(s), here "jkl". Both files also tend to grow beyond 20 MB.
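If the other file is always the base file with new lines appended at the end, as in the example, then no diff is needed at all. That is an assumption, since nothing above guarantees it. You can check that the other file starts with the base's bytes and copy whatever follows. Here is a minimal sketch; `append_new_lines`, its `chunk_size` parameter and the `.tmp` suffix are hypothetical names:

```python
import os
import shutil


def append_new_lines(filename, basename, chunk_size=1 << 20):
    """Hypothetical helper: overwrite `basename` with whatever `filename`
    contains beyond the base's own contents.

    Assumption: `filename` is the old base plus lines appended at the end.
    """
    tmp_path = basename + ".tmp"  # hypothetical temporary name
    # Binary mode so the prefix comparison is byte-exact
    with open(basename, "rb") as base, open(filename, "rb") as other:
        # Check the assumption chunk by chunk (1 MB by default) so neither
        # file is loaded into memory whole
        while True:
            chunk = base.read(chunk_size)
            if not chunk:
                break
            if other.read(len(chunk)) != chunk:
                raise ValueError("other file does not start with the base")
        # `other` now sits just past the shared prefix: copy the remainder
        with open(tmp_path, "wb") as out:
            shutil.copyfileobj(other, out)
    # Replace the base with only the new lines, as the original code does
    os.replace(tmp_path, basename)
```

Each file is read only once, in fixed-size chunks, so this stays fast and light on memory at 20 MB and beyond. If the prefix check raises, a real comparison is still needed.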

Here is my code, using difflib:

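import difflib

def difference(filename, basename):
    # Read both files completely into lists of lines
    with open(basename) as base:
        a = base.readlines()
    with open(filename) as other:
        b = other.readlines()
    # Differ prefixes every output line with '  ' (in both), '- ' (only in
    # the base), '+ ' (only in the other file) or '? ' (intraline hints)
    diff = difflib.Differ().compare(a, b)
    # Keep only the added lines and cut off the two-character '+ ' prefix
    # (lstrip("\+ ") would also eat leading spaces or '+' in the text itself)
    added = [line[2:] for line in diff if line.startswith("+ ")]
    # Mode "w" truncates the old base, so os.remove() is not needed
    # ("aU" is not a valid mode for writing in Python 3)
    with open(basename, "w") as out:
        out.writelines(added)  # write the new base
    return added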
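Differ probably spends much of its time on large inputs beyond the basic line matching. For each changed block, it also searches for similar line pairs and builds the character-level '? ' hints, which the code above then throws away. The sketch below asks difflib.SequenceMatcher for its opcodes directly. It keeps the lines of the other file that fall in 'insert' and 'replace' blocks, which are normally the same lines Differ prints with '+ '. They can differ in edge cases where SequenceMatcher's autojunk heuristic has set frequent lines aside. `added_lines` is a hypothetical helper name:

```python
import difflib


def added_lines(a, b):
    """Hypothetical helper: lines of `b` that are not matched against `a`.

    Uses only the line-level SequenceMatcher pass, skipping the
    character-level comparison Differ performs on replaced blocks.
    """
    sm = difflib.SequenceMatcher(None, a, b)
    added = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        # b[j1:j2] is new or changed text for these two tags
        if tag in ("insert", "replace"):
            added.extend(b[j1:j2])
    return added


base = ["abc\n", "def\n", "ghi\n"]
other = ["abc\n", "def\n", "ghi\n", "jkl\n"]
print(added_lines(base, other))  # ['jkl\n']
print(list(difflib.Differ().compare(base, other)))
# ['  abc\n', '  def\n', '  ghi\n', '+ jkl\n']
```

SequenceMatcher itself can still take quadratic time in the worst case, so this trims overhead rather than changing the algorithm.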

Whenever the two files get very large, the comparison becomes very slow.
Any good advice on speeding things up? I thought of dropping the
readlines() method and comparing line by line instead. Would that be
a better way?
Thanks
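Iterating over a file object instead of calling readlines() avoids holding the whole other file in memory. The bigger win is probably dropping the diff altogether. Suppose "new" may simply mean "a line that does not appear anywhere in the base". That is an assumption: unlike difflib, it ignores line order and duplicates. Then a single pass that loads the base lines into a set and streams the other file past it runs in roughly linear time. Here is a sketch; `new_lines_streaming` and the `.tmp` suffix are hypothetical:

```python
import os


def new_lines_streaming(filename, basename):
    """Hypothetical helper: overwrite `basename` with the lines of `filename`
    that do not occur anywhere in the old base.

    Assumption: order and duplicates do not matter. This is NOT the same
    result difflib would give for reordered or repeated lines.
    """
    # One pass over the base; a set gives average O(1) membership tests.
    # rstrip("\n") so a last line without a newline still matches.
    with open(basename) as base:
        seen = {line.rstrip("\n") for line in base}
    tmp_path = basename + ".tmp"  # hypothetical temporary name
    # Iterate over the file object lazily instead of calling readlines()
    with open(filename) as other, open(tmp_path, "w") as out:
        for line in other:
            if line.rstrip("\n") not in seen:
                out.write(line)
    # Swap the new base into place only after it has been fully written
    os.replace(tmp_path, basename)
```

The set still uses noticeably more memory than the base file itself, because each line becomes a separate string object. At around 20 MB that is usually acceptable.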
