Strategy/ Advice for How to Best Attack this Problem?
Saran A
ahlusar.ahluwalia at gmail.com
Thu Apr 2 09:06:53 EDT 2015
On Wednesday, April 1, 2015 at 7:52:27 PM UTC-4, Dave Angel wrote:
> On 04/01/2015 09:43 AM, Saran A wrote:
> > On Tuesday, March 31, 2015 at 9:19:37 AM UTC-4, Dave Angel wrote:
> >> On 03/31/2015 07:00 AM, Saran A wrote:
> >>
> >> > @DaveA: This is a homework assignment. .... Is it possible that you
> >> could provide me with some snippets or guidance on where to place your
> >> suggestions (for your TO DOs 2,3,4,5)?
> >> >
> >>
> >>
> >>> On Monday, March 30, 2015 at 2:36:02 PM UTC-4, Dave Angel wrote:
> >>
> >>>>
> >>>> It's missing a number of your requirements. But it's a start.
> >>>>
> >>>> If it were my file, I'd have a TODO comment at the bottom stating known
> >>>> changes that are needed. In it, I'd mention:
> >>>>
> >>>> 1) your present code is assuming all filenames come directly from the
> >>>> commandline. No searching of a directory.
> >>>>
> >>>> 2) your present code does not move any files to success or failure
> >>>> directories
> >>>>
> >>
> >> In function validate_files()
> >> Just after the line
> >> print('success with %s on %d reco...
> >> you could move the file, using shutil. Likewise after the failure print.
> >>
> >>>> 3) your present code doesn't calculate or write to a text file any
> >>>> statistics.
> >>
> >> You successfully print to sys.stderr. So you could print to some other
> >> file in the exact same way.
> >>
> >>>>
> >>>> 4) your present code runs once through the names, and terminates. It
> >>>> doesn't "monitor" anything.
> >>
> >> Make a new function, perhaps called main(), with a loop that calls
> >> validate_files(), with a sleep after each pass. Of course, unless you
> >> fix TODO#1, that'll keep looking for the same files. No harm in that if
> >> that's the spec, since you moved the earlier versions of the files.
> >>
> >> But if you want to "monitor" the directory, let the directory name be
> >> the argument to main, and let main do a dirlist each time through the
> >> loop, and pass the corresponding list to validate_files.
> >>
> >>>>
> >>>> 5) your present code doesn't check for zero-length files
> >>>>
> >>
> >> In validate_and_process_data(), instead of checking filesize against
> >> ftell, check it against zero.
> >>
> >>>> I'd also wonder why you bother checking whether the
> >>>> os.path.getsize(file) function returns the same value as the os.SEEK_END
> >>>> and ftell() code does. Is it that you don't trust the library? Or that
> >>>> you have to run on Windows, where the line-ending logic can change the
> >>>> apparent file size?
> >>>>
> >>>> I notice you're not specifying a file mode on the open. So in Python 3,
> >>>> your sizes are going to be specified in unicode characters after
> >>>> decoding. Is that what the spec says? It's probably safer to
> >>>> explicitly specify the mode (and the file encoding if you're in text).
> >>>>
> >>>> I see you call strip() before comparing the length. Could there ever be
> >>>> leading or trailing whitespace that's significant? Is that the actual
> >>>> specification of line size?
> >>>>
> >>>> --
> >>>> DaveA
> >>>
> >>>
> >>
> >>> I ask this because I have been searching fruitlessly through for some time and there are so many permutations that I am bamboozled by which is considered best practice.
> >>>
> >>> Moreover, as to the other comments, those are too specific. The scope of the assignment is very limited, but I am learning what I need to look out or ask questions regarding specs - in the future.
> >>>
> >>
> >>
> >> --
> >> DaveA
> >
> > @DaveA
> >
> > My most recent commit (https://github.com/ahlusar1989/WGProjects/blob/master/P1version2.0withassumptions_mods.py) has more annotations and comments for each file.
>
> Perhaps you don't realize how github works. The whole point is it
> preserves the history of your code, and you use the same filename for
> each revision.
>
> Or possibly it's I that doesn't understand it. I use git, but haven't
> actually used github for my own code.
>
> >
> > I have attempted to address the functional requirements that you brought up:
> >
> > 1) Before, my present code was assuming all filenames come directly from the commandline. No searching of a directory. I think that I have addressed this.
> >
>
> Have you even tried to run the code? It quits immediately with an
> exception since your call to main() doesn't pass any arguments, and main
> requires one.
>
> > def main(dirslist):
> > while True:
> > for file in dirslist:
> > return validate_files(file)
> > time.sleep(5)
>
> In addition, you aren't actually doing anything to find what the files
> in the directory are. I tried to refer to dirlist, as a hint. A
> stronger hint: look up os.listdir()
>
> And that list of files has to change each time through the while loop,
> that's the whole meaning of scanning. You don't just grab the names
> once, you look to see what's there.
>
> The next thing is that you're using a variable called 'file', while
> that's a built-in type in Python. So you really want to use a different
> name.
>
> Next, you have a loop through the magical dirslist to get individual
> filenames, but then you call the validate_files() function with a single
> file, but that function is expecting a list of filenames. One or the
> other has to change.
>
> Next, you return from main after validating the first file, so no others
> will get processed.
>
> > 2) My present code does not move any files to success or failure directories (requirements for this assignment1). I am still wondering if and how I should use shututil() like you advised me to. I keep receiving a syntax error when declaring this below the print statement.
>
> You had a function to do that in your very first post to this thread.
> Have you forgotten it already? As for syntax errors, you can't expect
> any help on that when you don't show any code nor the syntax error
> traceback. Remember that when a syntax error is shown, it's frequently
> the previous line or two that actually was wrong.
>
> >
> > 3) You correctly reminded me that my present code doesn't calculate or write to a text file any statistics or errors for the cause of the error.
>
> > #I wrote an error class in case the I needed such a specification for
> a notifcation - this is not in the requirements
> > class ErrorHandler:
> >
> > def __init__(self):
> > pass
> >
> > def write(self, string):
> >
> > # write error to file
> > fname = " Error report (" + time.strftime("%Y-%m-%d %I-%M%p")
> + ").txt"
> > handler = open(fname, "w")
> > handler.write(string)
> > handler.close()
>
>
>
>
> This looks like Java code. With no data attributes, and only one
> method(function), what's the reason you don't just use a function?
>
>
> original_stderr = sys.stderr # stored in case want to revert
> sys.stderr = ErrorHandler()
>
> This is just bogus. There are rare times when a programmer might want
> to replace stderr, but that's just unnecessarily confusing in a program
> like this one. When you want to write printable data to another file,
> just use that file's handle as the argument to the file= argument of
> print(). You could use an instance of ErrorHandler as a file handle.
> Or do it one of a dozen other straightforward ways.
>
> While we're at it, why do you keep creating new files for your error
> report? And overwriting all the old messages with whatever new ones
> come in the same minute?
>
>
> > (Should I use the copy or copy2 method in order provide metadata? If so, should I wrap it into a try and except logic?)
>
> No idea what you mean here. What metadata? And what do copy and copy2
> have to do with anything here?
>
> >
> > 4) Before, my present code runs once through the names, and terminates. It doesn't "monitor" anything. I think I have addressed this with the main function - correct?
> >
> Not even close.
>
> > 5) Before, my present code doesn't check for zero-length files - I have added a comment there in case that is needed)
> >
> > I realize appreciate your invaluable feedback. I have grown a lot with this assignment!
> >
> > Sincerely,
> >
> > Saran
> >
>
>
> --
> DaveA
@DaveA
Thanks for your help on this homework assignment. I started from scratch last night. I have added some comments that will perhaps help clarify my intentions and my thought process. Thanks again.
from __future__ import print_function
import os
import time
import glob
import sys
def initialize_logger(output_dir):
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
# create console handler and set level to info
handler = logging.StreamHandler()
handler.setLevel(logging.INFO)
formatter = logging.Formatter("%(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
# create error file handler and set level to error
handler = logging.FileHandler(os.path.join(output_dir, "error.log"),"w", encoding=None, delay="true")
handler.setLevel(logging.ERROR)
formatter = logging.Formatter("%(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
# create debug file handler and set level to debug
handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
#Helper Functions for the Success and Failure Folder Outcomes, respectively
def file_len(filename):
with open(filename) as f:
for i, l in enumerate(f):
pass
return i + 1
def copy_and_move_File(src, dest):
try:
shutil.rename(src, dest)
# eg. src and dest are the same file
except shutil.Error as e:
print('Error: %s' % e)
# eg. source or destination doesn't exist
except IOError as e:
print('Error: %s' % e.strerror)
# Call main(), with a loop that calls # validate_files(), with a sleep after each pass. Before, my present #code was assuming all filenames come directly from the commandline. There was no actual searching #of a directory.
# I am assuming that this is appropriate since I moved the earlier versions of the files.
# I let the directory name be the argument to main, and let main do a dirlist each time through the loop,
# and pass the corresponding list to validate_files.
path = "/some/sample/path/"
dirlist = os.listdir(path)
before = dict([(f, None) for f in dirlist)
#####Syntax Error? before = dict([(f, None) for f in dirlist)
^
SyntaxError: invalid syntax
def main(dirlist):
while True:
time.sleep(10) #time between update check
after = dict([(f, None) for f in dirlist)
added = [f for f in after if not f in before]
if added:
print('Sucessfully added new file - ready to validate')
####add return statement here to pass to validate_files
if __name__ == "__main__":
main()
#check for record time and record length - logic to be written to either pass to Failure or Success folder respectively
def validate_files():
creation = time.ctime(os.path.getctime(added))
lastmod = time.ctime(os.path.getmtime(added))
#Potential Additions/Substitutions - what are the implications/consequences for this
def move_to_failure_folder_and_return_error_file():
os.mkdir('Failure')
copy_and_move_File(filename, 'Failure')
initialize_logger('rootdir/Failure')
logging.error("Either this file is empty or there are no lines")
def move_to_success_folder_and_read(f):
os.mkdir('Success')
copy_and_move_File(filename, 'Success')
print("Success", f)
return file_len()
#This simply checks the file information by name------> is this needed anymore?
def fileinfo(file):
filename = os.path.basename(f)
rootdir = os.path.dirname(f)
filesize = os.path.getsize(f)
return filename, rootdir, filesize
if __name__ == '__main__':
import sys
validate_files(sys.argv[1:])
# -- end of file
More information about the Python-list
mailing list