os.walk/list

Peter Otten __peter__ at web.de
Sun Mar 20 04:28:38 EDT 2011


ecu_jon wrote:

> so i am trying to add md5 checksum calc to my file copy stuff, to make
> sure the source and dest. are same file.
> i implemented it fine with the single file copy part. something like :
> for files in sourcepath:
>         f1=file(files ,'rb')
>         try:
>             shutil.copy2(files, os.path.join(destpath,os.path.basename(files)))
>         except:
>             print "error file"
>         f2=file(os.path.join(destpath,os.path.basename(files)), 'rb')
>         truth = md5.new(f1.read()).digest() == md5.new(f2.read()).digest()
>         if truth == 0:
>             print "file copy error"
> 
> this worked swimmingly. i moved on to my backupall function, something
> like
> for (path, dirs, files) in os.walk(source):
>         #os.walk drills down thru all the folders of source
>         for fname in dirs:
>             currentdir = destination+leftover
>             try:
>                 os.mkdir(os.path.join(currentdir,fname),0755)
>             except:
>                 print "error folder"
>         for fname in files:
>             leftover = path.replace(source, '')
>             currentdir = destination+leftover
>             f1=file(files ,'rb')
>             try:
>                 shutil.copy2(os.path.join(path,fname),
>                              os.path.join(currentdir,fname))
>                 f2 = file(os.path.join(currentdir,fname,files))
>             except:
>                 print "error file"
>             truth = md5.new(f1.read()).digest() == md5.new(f2.read()).digest()
>             if truth == 0:
>                 print "file copy error"
> 
> but here, "fname" is a list, not a single file. i didn't really want to
> spend a lot of time on the md5 part. thought it would be an easy add-on.
> i don't really want to write the file names out to a list and
> parse through them one at a time doing the calc, but it sounds like i
> will have to do something like that.

If you have something working for one file, don't copy the code into the 
os.walk() for-loop, put it into a function, say:

def safe_copy(sourcefile, destfolder):
    # your code
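Applied to the single-file code from the question, that function could look roughly like the sketch below. It assumes Python 3, so it uses hashlib.md5 (the old md5 module was deprecated in Python 2.5 and no longer exists in Python 3) and open() instead of file(). The bare except is left out on purpose, so a failed copy raises instead of printing "error file" and then hashing a destination that may not exist. Returning True/False is an assumption made for the sketch, not something from the thread.

```python
import hashlib
import os
import shutil


def safe_copy(sourcefile, destfolder):
    """Copy sourcefile into destfolder, then compare MD5 digests.

    Returns True if the source and the copy have the same digest.
    """
    destfile = os.path.join(destfolder, os.path.basename(sourcefile))
    shutil.copy2(sourcefile, destfile)
    # Hash both files only after the copy has finished.
    # Note: read() loads each whole file into memory, as the original did.
    with open(sourcefile, "rb") as f1, open(destfile, "rb") as f2:
        same = hashlib.md5(f1.read()).digest() == hashlib.md5(f2.read()).digest()
    if not same:
        print("file copy error:", sourcefile)
    return same
```

Because it takes one file, it can be tried on a single throwaway file before it is ever run over a whole tree.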

Then call that thoroughly tested function from within the os.walk() loop:

for path, folders, files in os.walk(sourceroot):
    destfolder = ... # os.path.relpath() might help here
    # ... (make subdirectories)
    for name in files:
        sourcefile = os.path.join(path, name)
        safe_copy(sourcefile, destfolder)
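One way to fill in the loop is sketched below, assuming Python 3.2 or later (for os.makedirs(..., exist_ok=True)). The names copy_tree, destroot and copy_one are hypothetical. os.path.relpath() turns the current walk directory into a path relative to the source root, which is then joined onto the destination root; this replaces the path.replace(source, '') trick from the question, which would also rewrite any later occurrence of the source string in the path. The per-file work is passed in, so the checked safe_copy() can be plugged in once it has been tested on its own.

```python
import os
import shutil


def copy_tree(sourceroot, destroot, copy_one=shutil.copy2):
    """Copy every file under sourceroot to the matching place under destroot.

    copy_one(sourcefile, destfolder) does the per-file work. It defaults to
    shutil.copy2, but a verifying function such as safe_copy can be passed in.
    """
    for path, folders, files in os.walk(sourceroot):
        # e.g. path "src/a/b" with sourceroot "src" gives "a/b"; the root itself gives "."
        rel = os.path.relpath(path, sourceroot)
        destfolder = os.path.normpath(os.path.join(destroot, rel))
        # Every directory is visited, so this also recreates empty folders.
        os.makedirs(destfolder, exist_ok=True)
        for name in files:
            sourcefile = os.path.join(path, name)
            copy_one(sourcefile, destfolder)
```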

If you find a bug in safe_copy() you'll only have to fix it in one place.
Also, you can test it with a single file which should be easier and faster 
than processing a whole directory tree.

Generally speaking, breaking code into small functions that can be tested 
individually is a powerful technique. And you don't have to stop here; you 
can break safe_copy() into

def safe_copy(sourcefile, destfolder):
    destfile = ...
    copyfile(sourcefile, destfile)
    if not equal_content(sourcefile, destfile):
        # print a warning or raise an exception
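A possible filling-in of that outline is sketched below (Python 3). The helper md5_digest() is a hypothetical name, and building destfile from the source's base name is an assumption borrowed from the question's single-file code. This version raises an exception rather than printing a warning. Reading in fixed-size blocks keeps memory use flat for large files, unlike hashing the result of a single read().

```python
import hashlib
import os
import shutil


def md5_digest(filename, blocksize=65536):
    """MD5 digest of a file, read in blocks so large files need little memory."""
    h = hashlib.md5()
    with open(filename, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            h.update(block)
    return h.digest()


def equal_content(file1, file2):
    """True if both files have the same MD5 digest."""
    return md5_digest(file1) == md5_digest(file2)


def copyfile(sourcefile, destfile):
    """Copy the data plus metadata such as timestamps and permission bits."""
    shutil.copy2(sourcefile, destfile)


def safe_copy(sourcefile, destfolder):
    # Assumption: the copy keeps the source file's base name.
    destfile = os.path.join(destfolder, os.path.basename(sourcefile))
    copyfile(sourcefile, destfile)
    if not equal_content(sourcefile, destfile):
        raise IOError("copy of %r differs from the original" % sourcefile)
    return destfile
```

Each piece can now be checked separately, e.g. equal_content() against two files known to differ.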

Sometimes you'll even find that the smaller, more specialized routines 
already exist in the standard library.
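For this particular case, a sketch using only existing standard-library calls (Python 3.3 or later, where shutil.copy2 returns the destination path) might be the following. filecmp.cmp() stands in for a hand-written checksum comparison; it compares bytes directly instead of MD5 digests. shallow=False matters here: with the default shallow=True, filecmp may declare files equal when their os.stat() type, size and modification time match, and copy2 deliberately copies the modification time.

```python
import filecmp
import shutil


def safe_copy(sourcefile, destfolder):
    # copy2 accepts a directory as destination and returns the new file's path.
    destfile = shutil.copy2(sourcefile, destfolder)
    # Force a byte-by-byte comparison instead of trusting stat() signatures.
    if not filecmp.cmp(sourcefile, destfile, shallow=False):
        raise IOError("copy of %r differs from the original" % sourcefile)
    return destfile
```

For the whole tree, shutil.copytree(source, destination, copy_function=safe_copy) should also work, since copytree passes the full destination file path and copy2 accepts that as well; note that copytree expects the destination directory not to exist yet unless dirs_exist_ok=True is given (Python 3.8+).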




More information about the Python-list mailing list