Code Review

Chris Torek nospam at torek.net
Wed May 25 03:37:51 EDT 2011


In article
<37ba7b40-3663-4094-b507-696fc598bf48 at l26g2000yqm.googlegroups.com>
ad  <adsquaired at gmail.com> wrote:
>Please review the code pasted below. I am wondering what other ways
>there are of performing the same tasks. ...  I imagine one enhancement
>would be to create a function out of some of this.

Indeed -- "recommended styles" include defining a main()
function and calling main at the bottom, e.g., with:

    if __name__ == '__main__':
        main()

or:

    if __name__ == '__main__':
        main(sys.argv)

Something has also double-spaced your code; I will undo that below.

>import os
>import time
>import shutil
>import argparse

So far so good.

Let me start by putting the next parts into main().  I will use
the no-argument format since we can simply let parser.parse_args()
fetch sys.argv the same way you did (we could def main(argv) and
pass argv; this is sometimes useful for testing purposes, but here
it makes little difference one way or another).

def main():
>    CurrentTime = time.time()
>    epocDay = 86400   # seconds

(You could write 24 * 60 * 60, but again this is sort of a six-of-one
half-a-dozen-of-the-other situation.)  What is not clear is why you
are setting these now, rather than later, closer to where they are
going to be used.

Many style guides, including Guido's and pylint's, dislike
UppercaseVariableName and camelCase style names within functions.
(UppercaseNames are reserved to classes.)  I am leaving these alone
for now though.

>    parser = argparse.ArgumentParser(description = "Delete files and
>    folders in a directory N days old", add_help=False,
>    prog='directorycleaner', usage='%(prog)s 7 c:\\temp')

Line wrap has done bad things to this.  Still, there is something
odd about it other than the line wrap:

  - sometimes you write:
        param_name = value
    but sometimes you use:
        param_name=value
    and in one case you even use both:
        usage= [string]

  - you switch back and forth between single and double quoted
    strings, without much reason.

Consistency is not *required* but it is generally a good idea.
(This is also minor, like deciding between 86400 or 24*60*60.
Eventually, though, lots of minor things add up.)

(I have to admit here that I tend to switch back and forth on
quotes too. :-) )

Another "sixes" case occurs here with the backslashes: to get
a literal backslash in a string, you can use "raw strings".  Here
it makes no difference but I will go ahead and do it just for
illustration:

    parser = argparse.ArgumentParser(
        description='Delete files and folders in a directory N days old',
        add_help=False,
        prog='directorycleaner',
        usage=r'%(prog)s 7 c:\temp')

Finally, I am not sure why you do not want to allow a -h / --help 
option (but if you take out the add_help=False, it's probably best
to take out the call to parser.print_help() too), and there is no
need to supply a usage= argument at all -- argparse() will build
one for you.

>    parser.add_argument('days', type=int, help="Numeric value: delete
>    files and folders older then N days")
>    parser.add_argument('directory', help="delete files and folders in
>    this directory")

(Again, line wrap has broken these; the fixes are obvious so I skip
over them here.)

>    parser.print_help()
>    args = parser.parse_args()

(So far so good, although again, you probably want to remove the call
to print_help() if you allow argparse to add a -h / --help option,
at least.)

>    dictKeys = (vars(args))

There is no *need* to do this, although it is documented as allowed.
I prefer to just use args.<field> myself:

>    HowManyDays = dictKeys['days']
>    WhatDirectory = dictKeys['directory']

so this would become:

    HowManyDays = args.days
    WhatDirectory = args.directory

>    print (HowManyDays)
>    print (WhatDirectory)

These are presumably debug statements and should be removed, but
until then, it might be good to prefix the output with what is
being printed (i.e., a debug message).  (I have taken them out
of my copy, for output shown below.)

(In a fancier program, you could use the logging module and
logging.debug().)

>    DaysToDelete = HowManyDays * epocDay

Right before this would be a good place to create epocDay.

>    dirExists = os.path.exists(WhatDirectory)
>    if dirExists == False: print ("The directory is missing")

An explicit "if expr == False" is generally a bad idea -- if an
expression can be "considered boolean" (and the return value of
os.path.exists certainly can), just write "if not expr".

Most style guides suggest putting subsequent statements on new
lines, rather than right after the ":".

Checking that the directory exists seems reasonable enough.  However,
after printing that it does not, you continue on with code that is
going to immediately raise an OSError exception:

>    DirListing = os.listdir(WhatDirectory)

In general, it is better to try to do the operation, and catch the
failure and do something about it at that point, than to test to
see if the operation is going to succeed.  (Among other things,
this avoids a "race" between program 1 that says "does some directory
exist" and program 2 that says "delete the directory".  If program
1 "wins" this race, the directory does exist at the point of the
test, then program 2 deletes it, then program 1 goes on to access
the now-deleted directory ... and crashes.)

I am using a Unix-like system so what I get may not be quite the
same as what you would get on a Windows-like system, but:

    % cd /tmp
    % python foo.py 1 /tmp/nosuchdir
    The directory is missing
    Traceback (most recent call last):
     ...
    OSError: [Errno 2] No such file or directory: '/tmp/nosuchdir'
    %

More significantly, consider this:

    % python foo.py 1 /tmp/foo.py
    Traceback (most recent call last):
     ...
    OSError: [Errno 20] Not a directory: '/tmp/foo.py'
    %

So instead of the three previous lines, consider:

    try:
        DirListing = os.listdir(WhatDirectory)
    except OSError as err:
        sys.exit("can't read %s: %s" % (WhatDirectory, err))

(you will need to "import sys", and I am using an older version of
Python and the "except OSError, err" syntax, but the effect is the
same).  Now the second example results in:

    % python foo.py 1 /tmp/foo.py
    can't read /tmp/foo.py: [Errno 20] Not a directory: '/tmp/foo.py'
    %

>    for files in DirListing:
>        # Get the absolute path of the file name
>        abspath = (os.path.join(WhatDirectory, files))

This is not necessarily an absolute path -- for instance, if the
program is run as:

    python foo.py 7 rel\ative\path

the joined file names (on a Windows-like system) will be things
like "rel\ative\path\file.txt" and so on.  I would suggest shortening
the variable name to just "path".

The outermost set of parentheses are unnecessary, too.

>        # Get the current creation time of the file in epoc format
>        # (midnight 1/1/1970)
>        FileCreationTime = (os.path.getctime(abspath))

Beware: on Unix-like systems, this gets a "time of last change"
rather than a "time of create".  Even if you are sure you will use
Windows you may wish to use the file's mtime (time of last
"modification"; on Unix-like systems, the distinction between a
"modification" and a "change" is that "change" covers alterations
to the file's meta-data as well as the data, e.g., "chmod a-w file",
making it read-only, changes its ctime but not its mtime, while
"echo foo >> file" changes both its ctime and its mtime -- provided
it is not read-only, of course :-) ).

Again, the outermost parentheses here are unnecessary.

>        # time.ctime converts epoch to a normal date
>        #print (time.ctime(CurrentTime))
>        # Get the date from seven days ago
>        WeekOldFileDate = CurrentTime - DaysToDelete
>        #print (CurrentTime)
>        #print (FileCreationTime)
>        #print (WeekOldFileDate)
>
>        #If the file is older than seve days doe something

Apparently, the program has evolved somewhat: originally you had
"seven days" hardcoded, now you have a variable number.  The
comments, however, have not changed -- and the final variable name
is no longer appropriate.

It is probably also time to ditch the commented-out debug print
statements (and fix the comments, including the typo on the last
one above).

>        if FileCreationTime < WeekOldFileDate:
>            #check if the object is a file
>            if os.path.isfile(abspath): os.remove(abspath)
>            # It is not a file it is a directory
>            elif os.path.isdir(abspath): shutil.rmtree(abspath)

Again, the comment and code do not quite agree: the comment says
"if it is not a file it *is* a directory" but the code says "if it
is not a file, check to see if it is a directory", which leaves
open the possibility that it is some other kind of entity (this is
definitely possible on a Unix-like system, where it could be a
socket, symbolic link, or device node, for instance).

In this particular program, written the way it is, there is no
actual benefit (yet) to doing this, but I suggest moving the guts
of the "clean out a directory" process into a function.  What
this allows is the ability to list more than one directory,
provided of course you also change the argument parser a bit.

Having put this all together (and neutered the actual file-or-directory
removing code) gives me the code below.  There are still plenty of
things you could do with it -- for instance, exiting partway through
processing a list of directories if one in the middle does not exist
is perhaps not optimal:

    % python foo.py 7 /tmp/dir1 /tmp/oops /tmp/dir2

(where /tmp/dir1 and /tmp/dir2 do exist, but /tmp/oops does not)
will clean out /tmp/dir1 but then exit without ever processing
/tmp/dir2.  (There are lots of ways to handle this; you would
have to choose one and implement it.)  Or, instead of the kind of
processing done here, you could generalize it into a Unix-like
"find" command.  (Perhaps with a less-ugly syntax. :-) )  The
"find" command can do what this script does:

    find DIRLIST -ctime +N ( -type d -o -type f ) -exec rm -rf {} \;

but can also a great deal more since (a) it has many other options
than just -ctime, and (b) -exec will execute any arbitrary command.

                ---------------------------

import os
import time
import shutil
import argparse
import sys

def main():
    """
    main program: parse arguments, and clean out directories.
    """
    parser = argparse.ArgumentParser(
        description="Delete files and folders in a directory N days old",
        prog="directorycleaner")
    parser.add_argument("days", type=int,
        help="Numeric value: delete files and folders older than N days")
    parser.add_argument("directory", nargs="+",
        help="delete files and folders in this directory")

    args = parser.parse_args()

    for dirname in args.directory:
        clean_dir(dirname, args.days)

def clean_dir(dirname, n_days):
    """
    Clean one directory of files / subdirectories older than
    the given number of days.
    """
    time_to_live = n_days * 86400 # 86400 = seconds-per-day
    current_time = time.time()

    try:
        contents = os.listdir(dirname)
    except OSError, err:
        sys.exit("can't read %s: %s" % (dirname, err))
    for filename in contents:
        # Get the path of the file name
        path = os.path.join(dirname, filename)
        # Get the creation time of the file
        # NOTE: this only works on Windows-like systems
        when_created = os.path.getctime(path)
        # If the file/directory has expired, remove it
        if when_created + time_to_live < current_time:
            if os.path.isfile(path):
                print "os.remove(%s)" % path
            # It is not a file it is a directory
            elif os.path.isdir(path):
                print "shutil.rmtree(%s)" % path

if __name__ == "__main__":
    main()
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html



More information about the Python-list mailing list