using python to parse md5sum list

Michael Hoffman cam.ac.uk at mh391.invalid
Sun Mar 6 04:49:29 EST 2005


Ben Rf wrote:

> I'm new to programming and i'd like to write a program that will parse
> a list produced by md5summer and give me a report in a text file on
> which md5 sums appear more than once and where they are located.

This should do the trick:

"""
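import fileinput

# Map each md5 sum to the list of filenames that have it.
md5s = {}
for line in fileinput.input():
    # Split on the first run of whitespace only, so filenames
    # containing spaces are kept whole.
    md5, filename = line.rstrip().split(None, 1)
    md5s.setdefault(md5, []).append(filename)

# Report every md5 sum that belongs to more than one file.
for md5, filenames in md5s.iteritems():
    if len(filenames) > 1:
        print "\t".join(filenames)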
"""

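Put this in md5dups.py and run it as
md5dups.py [FILE]... to find duplicates in any of the files you
specify. With no FILE arguments it reads standard input. Each group of
files sharing an md5 sum is printed on its own line as a tab-delimited
list; redirect the output to a file to get your report.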
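
If you are on Python 3, the script needs two small changes: print is a
function there, and dict.iteritems() is gone in favour of dict.items().
Here is a rough sketch of the same idea in that form. The function name
find_duplicates and the sample lines are my own inventions for
illustration, and I'm assuming each line is an md5 sum, then whitespace,
then the filename:

```python
def find_duplicates(lines):
    """Return {md5: [filenames]} for every md5 sum seen more than once."""
    md5s = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the first run of whitespace only, so filenames
        # containing spaces stay in one piece.
        md5, filename = line.split(None, 1)
        md5s.setdefault(md5, []).append(filename)
    return {md5: names for md5, names in md5s.items() if len(names) > 1}


# Hypothetical sample input, standing in for a real md5 list.
sample = [
    "d41d8cd98f00b204e9800998ecf8427e  empty.txt",
    "900150983cd24fb0d6963f7d28e17f72  docs/abc.txt",
    "d41d8cd98f00b204e9800998ecf8427e  backup/also empty.txt",
]

for md5, filenames in find_duplicates(sample).items():
    print(md5 + "\t" + "\t".join(filenames))
```

To run it over real files, pass fileinput.input() to find_duplicates()
in place of sample. A line with no filename after the md5 sum will still
raise ValueError, which is probably what you want for a malformed list.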

Key things you might want to look up to understand this:

* the dict datatype
* dict.setdefault()
* dict.iteritems()
* the fileinput module
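
Of these, dict.setdefault() is probably the least obvious, so here is a
small sketch of what it saves you from writing. The keys and filenames
are made up for the example:

```python
md5s = {}

# Long-hand: create the list the first time a key is seen.
if "abc" not in md5s:
    md5s["abc"] = []
md5s["abc"].append("one.txt")

# setdefault() returns the existing value for the key, or stores and
# returns the default if the key is missing, all in one call.
md5s.setdefault("abc", []).append("two.txt")
md5s.setdefault("def", []).append("three.txt")

print(md5s)
```

Both forms leave the dict in the same state; setdefault() just folds
the "is it there yet?" check into the lookup.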
-- 
Michael Hoffman


