Strategy/ Advice for How to Best Attack this Problem?

Thu Apr 2 19:43:01 EDT 2015

On Thursday, April 2, 2015 at 5:11:20 PM UTC-4, Dave Angel wrote:
> On 04/02/2015 09:06 AM, Saran A wrote:
> 
> >
> > Thanks for your help on this homework assignment. I started from scratch last night. I have added some comments that will perhaps help clarify my intentions and my thought process. Thanks again.
> >
> > from __future__ import print_function
> 
> I'll just randomly comment on some things I see here.  You've started 
> several threads, on two different forums, so it's impractical to figure 
> out what's really up.
> 
> 
>     <snip some code I'm not commenting on>
> >
> > #Helper Functions for the Success and Failure Folder Outcomes, respectively
> >
> >      def file_len(filename):
> 
> This is an indentation error, as you forgot to start at the left margin
> 
> >          with open(filename) as f:
> >              for i, l in enumerate(f):
> >                  pass
> >              return i + 1
> >
> >
> >      def copy_and_move_File(src, dest):
> 
> ditto
> 
> >          try:
> >              shutil.rename(src, dest)
> 
> Is there a reason you don't use the move function?  rename won't work if 
> the two directories aren't on the same file system.
> 
> >          # eg. src and dest are the same file
> >          except shutil.Error as e:
> >              print('Error: %s' % e)
> >          # eg. source or destination doesn't exist
> >          except IOError as e:
> >              print('Error: %s' % e.strerror)
> >
> >
> > # Call main(), with a loop that calls # validate_files(), with a sleep after each pass. Before, my present #code was assuming all filenames come directly from the commandline.  There was no actual searching #of a directory.
> >
> > # I am assuming that this is appropriate since I moved the earlier versions of the files.
> > # I let the directory name be the argument to main, and let main do a dirlist each time through the loop,
> > # and pass the corresponding list to validate_files.
> >
> >
> > path = "/some/sample/path/"
> > dirlist = os.listdir(path)
> > before = dict([(f, None) for f in dirlist)
> >
> > #####Syntax Error?     before = dict([(f, None) for f in dirlist)
> >                                               ^
> > SyntaxError: invalid syntax
> 
> Look at the line in question. There's an unmatched set of brackets.  Not 
> that it matters, since you don't need these 2 lines for anything.  See 
> my comments on some other forum.
> 
> >
> > def main(dirlist):
> 
> bad name for a directory path variable.
> 
> >      while True:
> >          time.sleep(10) #time between update check
> 
> Somewhere inside this loop, you want to obtain a list of files in the 
> specified directory.  And you want to do something with that list.  You 
> don't have to worry about what the files were last time, because 
> presumably those are gone.  Unless in an unwritten part of the spec, 
> you're supposed to abort if any filename is repeated over time.
> 
> 
> >      after = dict([(f, None) for f in dirlist)
> >      added = [f for f in after if not f in before]
> >      if added:
> >          print('Sucessfully added new file - ready to validate')
> >        ####add return statement here to pass to validate_files
> > if __name__ == "__main__":
> >      main()
> 
> You'll need an argument to call main()
> 
> >
> >
> > #check for record time and record length - logic to be written to either pass to Failure or Success folder respectively
> >
> > def validate_files():
> 
> Where are all the parameters to this function?
> 
> >      creation = time.ctime(os.path.getctime(added))
> >      lastmod = time.ctime(os.path.getmtime(added))
> >
> >
> >
> > #Potential Additions/Substitutions  - what are the implications/consequences for this
> >
> > def move_to_failure_folder_and_return_error_file():
> >      os.mkdir('Failure')
> >      copy_and_move_File(filename, 'Failure')
> >      initialize_logger('rootdir/Failure')
> >      logging.error("Either this file is empty or there are no lines")
> >
> >
> > def move_to_success_folder_and_read(f):
> >      os.mkdir('Success')
> >      copy_and_move_File(filename, 'Success')
> >      print("Success", f)
> >      return file_len()
> >
> > #This simply checks the file information by name------> is this needed anymore?
> >
> > def fileinfo(file):
> >      filename = os.path.basename(f)
> >      rootdir = os.path.dirname(f)
> >      filesize = os.path.getsize(f)
> >      return filename, rootdir, filesize
> >
> > if __name__ == '__main__':
> >     import sys
> >     validate_files(sys.argv[1:])
> >
> > # -- end of file
> >
> 
> 
> -- 
> DaveA

@DaveA

I debugged and rewrote everything. Here is the full version. Feel free to tear this apart. The homework assignment is not due until tomorrow, so I am also experimenting with pyinotify. I do have questions about how to make this function compatible with the ProcessEvent class; I will create another post for those.

What would you advise with regard to renaming the inaptly named dirlist?
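
A minimal sketch of one possibility, following the advice to take a fresh listing on every pass: hand the function the directory path itself rather than a pre-built listing. The names watch_dir and new_entries are made up for illustration and are not part of the assignment.

```python
import os


def new_entries(watch_dir, seen):
    """Return the names now in watch_dir that are not in `seen`, sorted.

    watch_dir is a directory path (hypothetical name); seen is any
    iterable of names recorded on an earlier pass.
    """
    current = set(os.listdir(watch_dir))  # fresh listing on every call
    return sorted(current - set(seen))
```

A polling loop would call this once per pass and then add the returned names to its record of what has been seen.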

# Without data to examine here, I can only guess, based on the wording of the requirement, that
# fixed records are in the input.  If so, here's a slight revision to the helper functions that I wrote earlier, which
# takes the function fileinfo as a starting point and demonstrates calling a function from within a function.
# I tested this little sample on a small set of files created with MD5 checksums.  I wrote the Python so that it
# works with Python 2.x or 3.x (note the __future__ import at the top).

# There are so many wonderful ways to fail that, from a development standpoint, I would probably spend a bit
# more time determining which failure(s) I want to report to the user, and how (perhaps by creating my own exceptions).
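
As a rough sketch of the "own exceptions" idea: a small hierarchy lets the caller decide which failures to report and how. Every name here (ValidationError, EmptyFileError, RecordLengthError, check_fixed_records) is hypothetical, and treating "fixed records" as "every line has the same length" is an assumption, not something the assignment states.

```python
class ValidationError(Exception):
    """Base class for every way an input file can fail validation."""


class EmptyFileError(ValidationError):
    """The input contains no records at all."""


class RecordLengthError(ValidationError):
    """A record does not have the expected fixed length."""

    def __init__(self, line_number, expected, actual):
        message = "line %d: expected %d characters, found %d" % (
            line_number, expected, actual)
        super(RecordLengthError, self).__init__(message)
        self.line_number = line_number


def check_fixed_records(lines, record_length):
    """Return the number of records, or raise a ValidationError.

    lines is any iterable of strings (an open file works); the length
    of each record is measured without its line ending.
    """
    count = 0
    for count, line in enumerate(lines, 1):
        actual = len(line.rstrip("\r\n"))
        if actual != record_length:
            raise RecordLengthError(count, record_length, actual)
    if count == 0:
        raise EmptyFileError("no records found")
    return count
```

Catching ValidationError in one place would cover both failure types, while the subclasses keep enough detail to write a useful log message.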

# The only other comments I would make are about safe file handling.

#   #1:  Question: After a user has created a file that has failed (in
#        processing), can the user create a file with the same name?
#        If so, then you will probably want to look at some sort
#        of file-naming strategy to avoid overwriting evidence of
#        earlier failures.

# File naming is a tricky thing.  I looked at the tempfile module [1] and the Maildir naming scheme as two different
# solutions to the problem of choosing a unique filename.
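
One way the tempfile approach could look, as a sketch only: tempfile.mkstemp creates a new, uniquely named file, so a name derived from the original can be reserved without overwriting an earlier failure of the same name. The helper name reserve_unique_name is made up, and keeping the original base name and extension around the random part is my own choice.

```python
import os
import tempfile


def reserve_unique_name(dest_dir, filename):
    """Create an empty, uniquely named file in dest_dir and return its path.

    For "report.txt" the result looks like "report.<random>.txt".
    """
    base, ext = os.path.splitext(filename)
    fd, path = tempfile.mkstemp(prefix=base + ".", suffix=ext, dir=dest_dir)
    os.close(fd)  # only the reserved name is needed, not the open handle
    return path
```

The failed file could then be moved onto the reserved path, so two failures with the same original name end up side by side.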

# I am assuming that all of my files are going to be specified in Unicode.

# I used the Spyder IDE to debug, check for indentation errors, and test the function suite.

from __future__ import print_function

import os
import time
import logging

def initialize_logger(output_dir):
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)

    # create console handler and set level to info
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create error file handler and set level to error
    handler = logging.FileHandler(os.path.join(output_dir, "error.log"), "w", encoding=None, delay=True)
    handler.setLevel(logging.ERROR)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create debug file handler and set level to debug
    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

#Returns filename, rootdir and filesize 

def fileinfo(f):
    filename = os.path.basename(f)
    rootdir = os.path.dirname(f)  
    filesize = os.path.getsize(f)
    return filename, rootdir, filesize

# Returns the number of lines in the file (0 for an empty file)
def file_len(f):
    with open(f) as fh:
        return sum(1 for _ in fh)

# Attempts to move the file to its destination path
def copy_and_move_file(src, dest):
    try:
        os.rename(src, dest)
    # e.g. src does not exist, or dest is on another file system
    except (IOError, OSError) as e:
        print('Error: %s' % e.strerror)

path = "."  # directory to watch

def main(watch_dir):
    # Snapshot the directory, then poll it for files that appear later.
    before = set(os.listdir(watch_dir))
    while True:
        time.sleep(1)  # time between update checks
        after = set(os.listdir(watch_dir))
        for name in sorted(after - before):
            f = os.path.join(watch_dir, name)
            if os.path.isfile(f):  # skip the Success/Failure folders
                print('Successfully added %s - ready to validate' % f)
                validate_files(f)
        before = after

def validate_files(f):
    creation = time.ctime(os.path.getctime(f))
    lastmod = time.ctime(os.path.getmtime(f))
    if creation == lastmod and file_len(f) > 0:
        return move_to_success_folder_and_read(f)
    # file_len() cannot return a negative count, so this condition never
    # holds; the intended rule still needs to be decided.
    if file_len(f) < 0 and creation != lastmod:
        return move_to_success_folder_and_read(f)
    else:
        return move_to_failure_folder_and_return_error_file(f)

#Potential Additions/Substitutions

def move_to_failure_folder_and_return_error_file(f):
    filename, rootdir, filesize = fileinfo(f)
    failure_dir = os.path.join(rootdir, 'Failure')
    if not os.path.isdir(failure_dir):
        os.mkdir(failure_dir)
    copy_and_move_file(f, os.path.join(failure_dir, filename))
    initialize_logger(failure_dir)
    logging.error("Either this file is empty or there are no lines")

def move_to_success_folder_and_read(f):
    filename, rootdir, filesize = fileinfo(f)
    success_dir = os.path.join(rootdir, 'Success')
    if not os.path.isdir(success_dir):
        os.mkdir(success_dir)
    line_count = file_len(f)  # count the lines before the file is moved
    copy_and_move_file(f, os.path.join(success_dir, filename))
    print("Success", f)
    return line_count

if __name__ == '__main__':
    main(path)
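
On the earlier point about rename versus move, here is a minimal sketch of a helper built on shutil.move, which falls back to copying and then deleting the source when the two locations are on different file systems. The name move_into is made up for illustration; creating the destination folder on demand is my own assumption about what the Success and Failure folders need.

```python
import errno
import os
import shutil


def move_into(src, dest_dir):
    """Move the file src into dest_dir, creating dest_dir if needed.

    Returns the new path of the file.
    """
    try:
        os.makedirs(dest_dir)
    except OSError as e:
        if e.errno != errno.EEXIST:  # an existing folder is fine
            raise
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.move(src, dest)
    return dest
```

Calling it twice with the same folder works, because an already existing destination directory is not treated as an error.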