Strategy/ Advice for How to Best Attack this Problem?

Thu Apr 2 09:06:53 EDT 2015

On Wednesday, April 1, 2015 at 7:52:27 PM UTC-4, Dave Angel wrote:
> On 04/01/2015 09:43 AM, Saran A wrote:
> > On Tuesday, March 31, 2015 at 9:19:37 AM UTC-4, Dave Angel wrote:
> >> On 03/31/2015 07:00 AM, Saran A wrote:
> >>
> >>   > @DaveA: This is a homework assignment. .... Is it possible that you
> >> could provide me with some snippets or guidance on where to place your
> >> suggestions (for your TO DOs 2,3,4,5)?
> >>   >
> >>
> >>
> >>> On Monday, March 30, 2015 at 2:36:02 PM UTC-4, Dave Angel wrote:
> >>
> >>>>
> >>>> It's missing a number of your requirements.  But it's a start.
> >>>>
> >>>> If it were my file, I'd have a TODO comment at the bottom stating known
> >>>> changes that are needed.  In it, I'd mention:
> >>>>
> >>>> 1) your present code is assuming all filenames come directly from the
> >>>> commandline.  No searching of a directory.
> >>>>
> >>>> 2) your present code does not move any files to success or failure
> >>>> directories
> >>>>
> >>
> >> In function validate_files()
> >> Just after the line
> >>                   print('success with %s on %d reco...
> >> you could move the file, using shutil.  Likewise after the failure print.
> >>
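A minimal sketch of the move step suggested above, assuming the success and failure folders are ordinary directories that may not exist yet. The helper name move_to_outcome_dir and its arguments are hypothetical and are not taken from the original script.

```python
import os
import shutil


def move_to_outcome_dir(src, outcome_dir):
    """Move the file `src` into `outcome_dir` and return its new path.

    `outcome_dir` (for example a "Success" or "Failure" folder) is
    created first if it does not exist yet.
    """
    if not os.path.isdir(outcome_dir):
        os.makedirs(outcome_dir)
    dest = os.path.join(outcome_dir, os.path.basename(src))
    shutil.move(src, dest)
    return dest
```

Called right after the success or failure print, this would take the file out of the watched location so that the next pass does not see it again.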
> >>>> 3) your present code doesn't calculate or write to a text file any
> >>>> statistics.
> >>
> >> You successfully print to sys.stderr.  So you could print to some other
> >> file in the exact same way.
> >>
> >>>>
> >>>> 4) your present code runs once through the names, and terminates.  It
> >>>> doesn't "monitor" anything.
> >>
> >> Make a new function, perhaps called main(), with a loop that calls
> >> validate_files(), with a sleep after each pass.  Of course, unless you
> >> fix TODO#1, that'll keep looking for the same files.  No harm in that if
> >> that's the spec, since you moved the earlier versions of the files.
> >>
> >> But if you want to "monitor" the directory, let the directory name be
> >> the argument to main, and let main do a dirlist each time through the
> >> loop, and pass the corresponding list to validate_files.
> >>
> >>>>
> >>>> 5) your present code doesn't check for zero-length files
> >>>>
> >>
> >> In validate_and_process_data(), instead of checking filesize against
> >> ftell, check it against zero.
> >>
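The zero-length check described above can be sketched as follows. The function name is_empty_file is hypothetical; os.path.getsize() reports the size in bytes without opening the file.

```python
import os


def is_empty_file(path):
    """Return True if the file at `path` contains no bytes at all."""
    return os.path.getsize(path) == 0
```

A file that fails this check could go straight to the failure folder before any record-by-record validation is attempted.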
> >>>> I'd also wonder why you bother checking whether the
> >>>> os.path.getsize(file) function returns the same value as the os.SEEK_END
> >>>> and ftell() code does.  Is it that you don't trust the library?  Or that
> >>>> you have to run on Windows, where the line-ending logic can change the
> >>>> apparent file size?
> >>>>
> >>>> I notice you're not specifying a file mode on the open.  So in Python 3,
> >>>> your sizes are going to be specified in unicode characters after
> >>>> decoding.  Is that what the spec says?  It's probably safer to
> >>>> explicitly specify the mode (and the file encoding if you're in text).
> >>>>
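To illustrate the point about file modes, here is a small sketch that compares the size in bytes with the length in decoded characters. The function name size_report is hypothetical, and UTF-8 is only an assumed default encoding; the real data files may use something else.

```python
import io
import os


def size_report(path, encoding="utf-8"):
    """Return (size on disk, bytes read in binary mode, characters read in text mode)."""
    byte_size = os.path.getsize(path)
    with io.open(path, "rb") as f:
        raw_len = len(f.read())
    # newline="" turns off line-ending translation, so only decoding
    # can make the character count differ from the byte count.
    with io.open(path, "r", encoding=encoding, newline="") as f:
        char_len = len(f.read())
    return byte_size, raw_len, char_len
```

For a file holding any non-ASCII text the last number is smaller than the first two, which is why a size taken from text-mode reads should not be compared with os.path.getsize().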
> >>>> I see you call strip() before comparing the length.  Could there ever be
> >>>> leading or trailing whitespace that's significant?  Is that the actual
> >>>> specification of line size?
> >>>>
> >>>> --
> >>>> DaveA
> >>>
> >>>
> >>
> >>> I ask this because I have been searching fruitlessly for some time and there are so many permutations that I am bamboozled as to which is considered best practice.
> >>>
> >>> Moreover, as to the other comments, those are too specific. The scope of the assignment is very limited, but I am learning what I need to look out for, or ask questions about, regarding specs in the future.
> >>>
> >>
> >>
> >> --
> >> DaveA
> >
> > @DaveA
> >
> > My most recent commit (https://github.com/ahlusar1989/WGProjects/blob/master/P1version2.0withassumptions_mods.py) has more annotations and comments for each file.
> 
> Perhaps you don't realize how github works.  The whole point is it 
> preserves the history of your code, and you use the same filename for 
> each revision.
> 
> Or possibly it's I that doesn't understand it.  I use git, but haven't 
> actually used github for my own code.
> 
> >
> > I have attempted to address the functional requirements that you brought up:
> >
> > 1) Before, my present code was assuming all filenames come directly from the commandline.  No searching of a directory. I think that I have addressed this.
> >
> 
> Have you even tried to run the code?  It quits immediately with an 
> exception since your call to main() doesn't pass any arguments, and main 
> requires one.
> 
>  > def main(dirslist):
>  >     while True:
>  >         for file in dirslist:
>  >         	return validate_files(file)
>  >         	time.sleep(5)
> 
> In addition, you aren't actually doing anything to find what the files 
> in the directory are.  I tried to refer to dirlist, as a hint.  A 
> stronger hint:  look up  os.listdir()
> 
> And that list of files has to change each time through the while loop, 
> that's the whole meaning of scanning.  You don't just grab the names 
> once, you look to see what's there.
> 
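One way to sketch a single scanning pass, assuming only file names (not contents or timestamps) are compared between passes. The function name scan_for_new_files and the `seen` set are hypothetical.

```python
import os


def scan_for_new_files(directory, seen):
    """List `directory` again and report names that were not in `seen`.

    Returns (added, current): `added` is a sorted list of new names and
    `current` is the full set of names, to be passed back in as `seen`
    on the next pass.
    """
    current = set(os.listdir(directory))
    added = sorted(current - seen)
    return added, current
```

A monitoring loop would call this once per pass, hand `added` to the validation function, keep `current` as the next `seen`, and then sleep before looking again.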
> The next thing is that you're using a variable called 'file', while 
> that's a built-in type in Python.  So you really want to use a different 
> name.
> 
> Next, you have a loop through the magical dirslist to get individual 
> filenames, but then you call the validate_files() function with a single 
> file, but that function is expecting a list of filenames.  One or the 
> other has to change.
> 
> Next, you return from main after validating the first file, so no others 
> will get processed.
> 
> > 2) My present code does not move any files to success or failure directories (requirements for this assignment1). I am still wondering if and how I should use shutil like you advised me to. I keep receiving a syntax error when declaring this below the print statement.
> 
> You had a function to do that in your very first post to this thread. 
> Have you forgotten it already?  As for syntax errors, you can't expect 
> any help on that when you don't show any code nor the syntax error 
> traceback.  Remember that when a syntax error is shown, it's frequently 
> the previous line or two that actually was wrong.
> 
> >
> > 3) You correctly reminded me that my present code doesn't calculate or write to a text file any statistics or errors for the cause of the error.
> 
>  > # I wrote an error class in case I needed such a specification for a notification - this is not in the requirements
>  > class ErrorHandler:
>  >
>  >     def __init__(self):
>  >         pass
>  >
>  >     def write(self, string):
>  >
>  >         # write error to file
>  >         fname = " Error report (" + time.strftime("%Y-%m-%d %I-%M%p") + ").txt"
>  >         handler = open(fname, "w")
>  >         handler.write(string)
>  >         handler.close()
> 
> 
> 
> 
> This looks like Java code.  With no data attributes, and only one 
> method(function), what's the reason you don't just use a function?
> 
> 
>  > original_stderr = sys.stderr # stored in case want to revert
>  > sys.stderr = ErrorHandler()
> 
> This is just bogus.  There are rare times when a programmer might want 
> to replace stderr, but that's just unnecessarily confusing in a program 
> like this one.  When you want to write printable data to another file, 
> just use that file's handle as the argument to the file= argument of 
> print().  You could use an instance of ErrorHandler as a file handle. 
> Or do it one of a dozen other straightforward ways.
> 
> While we're at it, why do you keep creating new files for your error 
> report?  And overwriting all the old messages with whatever new ones 
> come in the same minute?
> 
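A minimal sketch of writing messages to a report file with print() instead of replacing sys.stderr. It assumes Python 3, or Python 2 with "from __future__ import print_function" at the top of the module. The name log_error and the default file name errors.log are hypothetical.

```python
def log_error(message, log_path="errors.log"):
    """Append one message line to the report file at `log_path`."""
    # Mode "a" appends, so earlier messages are kept rather than overwritten.
    with open(log_path, "a") as log:
        print(message, file=log)
```

Using one fixed file name opened in append mode avoids both problems raised above: no new report file per minute, and no overwriting of earlier messages.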
> 
> > (Should I use the copy or copy2 method in order to preserve metadata? If so, should I wrap it in try/except logic?)
> 
> No idea what you mean here.  What metadata?  And what do copy and copy2 
> have to do with anything here?
> 
> >
> > 4) Before, my present code runs once through the names, and terminates.  It doesn't "monitor" anything. I think I have addressed this with the  main function - correct?
> >
> Not even close.
> 
> > 5) Before, my present code doesn't check for zero-length files (I have added a comment there in case that is needed)
> >
> > I really appreciate your invaluable feedback. I have grown a lot with this assignment!
> >
> > Sincerely,
> >
> > Saran
> >
> 
> 
> -- 
> DaveA

@DaveA

Thanks for your help on this homework assignment. I started from scratch last night. I have added some comments that will perhaps help clarify my intentions and my thought process. Thanks again. 

from __future__ import print_function

import glob
import logging
import os
import shutil
import sys
import time

def initialize_logger(output_dir):
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)

    # create console handler and set level to info
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create error file handler and set level to error
    handler = logging.FileHandler(os.path.join(output_dir, "error.log"), "w", encoding=None, delay=True)
    handler.setLevel(logging.ERROR)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create debug file handler and set level to debug
    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

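# Helper functions for the Success and Failure folder outcomes, respectively

def file_len(filename):
    """Return the number of lines in the file (0 for an empty file)."""
    count = 0
    with open(filename) as f:
        for count, _line in enumerate(f, 1):
            pass
    return count

def copy_and_move_File(src, dest):
    """Move src to dest, reporting (rather than raising) common failures."""
    try:
        shutil.move(src, dest)  # shutil has no rename(); move() does the job
    # e.g. the dest directory already holds a file of that name
    except shutil.Error as e:
        print('Error: %s' % e)
    # e.g. source or destination doesn't exist
    except (IOError, OSError) as e:
        print('Error: %s' % e.strerror)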

# main() runs a loop that calls validate_files(), with a sleep after each pass.
# Before, my code assumed all filenames came directly from the command line;
# there was no actual searching of a directory.
#
# I am assuming that this is appropriate since I moved the earlier versions of the files.
# I let the directory name be the argument to main, and let main do a dirlist each time
# through the loop, and pass the corresponding list to validate_files.

path = "/some/sample/path/"
dirlist = os.listdir(path)
# This line originally raised "SyntaxError: invalid syntax" because the closing "]"
# of the list comprehension was missing before the final ")".
before = dict([(f, None) for f in dirlist])

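def main(path):
    # Take a fresh snapshot of the directory before the loop starts.
    before = dict([(f, None) for f in os.listdir(path)])
    while True:
        time.sleep(10)  # time between update checks
        # Re-list the directory on every pass so that new files are noticed.
        after = dict([(f, None) for f in os.listdir(path)])
        added = [f for f in after if f not in before]
        if added:
            print('Successfully added new file - ready to validate')
            # No return here: returning would end the monitoring loop.
            validate_files([os.path.join(path, f) for f in added])
        before = after

# main() must only be called after validate_files() has been defined,
# i.e. from the bottom of the file.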

# Check the record time and record length - logic still to be written to pass each
# file to either the Failure or the Success folder

def validate_files(added):
    # `added` is a list of file paths, so look at each file in turn
    for name in added:
        creation = time.ctime(os.path.getctime(name))
        lastmod = time.ctime(os.path.getmtime(name))

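# Potential additions/substitutions - what are the implications/consequences of this?

def move_to_failure_folder_and_return_error_file(filename):
    if not os.path.isdir('Failure'):  # os.mkdir() raises if the folder already exists
        os.mkdir('Failure')
    copy_and_move_File(filename, 'Failure')
    # 'rootdir/Failure' was a literal path that need not exist, so log into 'Failure'.
    # Note: each call adds another set of handlers to the root logger.
    initialize_logger('Failure')
    logging.error("Either this file is empty or there are no lines")

def move_to_success_folder_and_read(f):
    line_count = file_len(f)  # count the lines before the file is moved
    if not os.path.isdir('Success'):
        os.mkdir('Success')
    copy_and_move_File(f, 'Success')
    print("Success", f)
    return line_count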

# This simply checks the file information by name ------> is this needed anymore?

def fileinfo(f):
    filename = os.path.basename(f)
    rootdir = os.path.dirname(f)
    filesize = os.path.getsize(f)
    return filename, rootdir, filesize

if __name__ == '__main__':
    # Watch the directory named on the command line, falling back to `path`
    main(sys.argv[1] if len(sys.argv) > 1 else path)

# -- end of file