Strategy/ Advice for How to Best Attack this Problem?

Fri Apr 3 08:50:28 EDT 2015

On Friday, April 3, 2015 at 8:05:14 AM UTC-4, Dave Angel wrote:
> On 04/02/2015 07:43 PM, Saran A wrote:
>     <snip>
> >
> > I debugged and rewrote everything. Here is the full version. Feel free to tear this apart. The homework assignment is not due until tomorrow, so I am currently also experimenting with pyinotify as well. I do have questions regarding how to make this function compatible with the ProcessEvent Class. I will create another post for this.
> >
> > What would you advise in regards to renaming the inaptly named dirlist?
> 
> Asked and answered.  You called it path at one point.  dir_name would 
> also be good.  The point is that if you have a good name, you're less 
> likely to have unreasonable code processing that name.  For example,
> 
> >
> > # # # Without data to examine here, I can only guess based on this requirement's language that
> > # # fixed records are in the input.  If so, here's a slight revision to the helper functions that I wrote earlier which
> > # # takes the function fileinfo as a starting point and demonstrates calling a function from within a function.
> > # I tested this little sample on a small set of files created with MD5 checksums.  I wrote the Python in such a way as it
> > # would work with Python 2.x or 3.x (note the __future__ at the top).
> >
> > # # # There are so many wonderful ways of failure, so, from a development standpoint, I would probably spend a bit
> > # # more time trying to determine which failure(s) I would want to report to the user, and how (perhaps creating my own Exceptions)
> >
> > # # # The only other comments I would make are about safe-file handling.
> >
> > # # #   #1:  Question: After a user has created a file that has failed (in
> > # # #        processing),can the user create a file with the same name?
> > # # #        If so, then you will probably want to look at some sort
> > # # #        of file-naming strategy to avoid overwriting evidence of
> > # # #        earlier failures.
> >
> > # # # File naming is a tricky thing.  I referenced the tempfile module [1] and the Maildir naming scheme to see two different
> > # # types of solutions to the problem of choosing a unique filename.
> >> ## I am assuming that all of my files are going to be specified in unicode
> >
> > ## Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite
> >
> > from __future__ import print_function
> >
> > import os.path
> > import time
> > import logging
> >
> >
> > def initialize_logger(output_dir):
>             <snip>
> 
> I didn't ever bother to read the body of this function, since a simple
> 
>        print(mydata, file=mylog_file)
> 
> will suffice to add data to the chosen text file.  Why is it constantly 
> getting more complex, to solve a problem that was simple in the beginning?
> 
> >
> >
> > #Returns filename, rootdir and filesize
> >
> > def fileinfo(f):
> >      filename = os.path.basename(f)
> >      rootdir = os.path.dirname(f)
> >      filesize = os.path.getsize(f)
> >      return filename, rootdir, filesize
> >
> > #returns length of file
> > def file_len(f):
> >      with open(f) as f:
> >          for i, l in enumerate(f):
> >              pass
> >              return i + 1
> 
> Always returns 1 or None.  Check the indentation, and test to see what 
> it does for empty file, for a file with one line, and for a file with 
> more than one line.
> 
> 
> > #attempts to copy file and move file to it's directory
> > def copy_and_move_file(src, dest):
> 
> Which is it, are you trying to copy it, or move it?  Pick one and make a 
> good function name that shows your choice.
> 
> >      try:
> >          os.rename(src, dest)
> 
> Why are you using rename, when you're trying to move the file?  Take a 
> closer look at shutil, and see if it has a function that does it safer 
> than rename.  The function you need uses rename, when it'll work, and 
> does it other ways when rename will not.
> 
> >          # eg. src and dest are the same file
> >      except IOError as e:
> >          print('Error: %s' % e.strerror)
> 
> A try/except that doesn't try to correct the problem is not generally 
> useful.  Figure out what could be triggering the exception, and how 
> you're going to handle it.  If it cannot be handled, terminate the program.
> 
> For example, what if you don't have permissions to modify one of the 
> specified directories?  You can't get any useful work done, so you 
> should notify the user and exit.  An alternative is to produce 50 
> thousand reports of each file you've got, telling how it succeeded or 
> failed, over and over.
> 
> >
> > path = "."
> > dirlist = os.listdir(path)
> >
> > def main(dirlist):
> >      before = dict([(f, 0) for f in dirlist])
> 
> Since dirlist is a path, it's a string.  So you're looping through the 
> characters of the name of the path.  I still don't have a clue what the 
> dict is supposed to mean here.
> 
> >      while True:
> >          time.sleep(1) #time between update check
> 
> That loop goes forever, so the following code will never run.
> 
> >      after = dict([(f, None) for f in dirlist])
> 
> Once again, you're looping through the letters of the directory name. 
> Or if dirlist is really a list, and you're deciding that's what it 
> should be, then of course after will be identical to before.
> 
> >      added = [f for f in after if not f in before]
> >      if added:
> >          f = ''.join(added)
> >          print('Sucessfully added %s file - ready to validate') %()
> >          return validate_files(f)
> >      else:
> >          return move_to_failure_folder_and_return_error_file(f)
> >
> >
> >
> > def validate_files(f):
> 
> You completely changed the scope of this function.  Why does it still 
> have an s in its name?
> 
> >      creation = time.ctime(os.path.getctime(f))
> >      lastmod = time.ctime(os.path.getmtime(f))
> 
> What do timestamps have to do with anything?
> 
> >      if creation == lastmod and file_len(f) > 0:
> >          return move_to_success_folder_and_read(f)
> 
> How have you decided that the records are all the same length?
> 
> >      if file_len < 0 and creation != lastmod:
> 
> file_len cannot be safely compared to 0, it's a function.  If this were 
> Python 3, it would give you a runtime error for trying.  Even if you had 
> successfully called the function here, how is it possible for it to 
> return a negative value?
> 
> >          return move_to_success_folder_and_read(f)
> >      else:
> >          return move_to_failure_folder_and_return_error_file(f)
> >
> >
> > #Potential Additions/Substitutions
> >
> > def move_to_failure_folder_and_return_error_file():
> >      filename, rootdir, lastmod, creation, filesize = fileinfo(file)
> 
> fileinfo() returns 3 things, and you're stuffing them in 5 variables.
> 
> >      os.mkdir('Failure')
> 
> What do you do the second time through this function, when that 
> directory is already existing?
> 
> >      copy_and_move_file( 'Failure')
> 
> The function takes two arguments, neither of which is likely to be that 
> string.
> 
> >      initialize_logger('rootdir/Failure')
> >      logging.error("Either this file is empty or there are no lines")
> 
> 
> >
> > def move_to_success_folder_and_read():
> >      filename, rootdir, lastmod, creation, filesize = fileinfo(file)
> >      os.mkdir('Success')
> >      copy_and_move_file(rootdir, 'Success') #file name
> >      print("Success", file)
> >      return file_len(file)
> >
> >
> >
> > if __name__ == '__main__':
> >     main(dirlist)
> 
> So now you're passing a list of filenames to main()?  How then is it 
> going to know when new files arrive?  You were originally going to tell 
> it a directory name.  That's why I suggested that you call the parameter 
> 'path' or 'dirname'.
> 
> 
> 
> -- 
> DaveA

@DaveA

I addressed most of the issues. I do admit that, as a novice, I feel beholden to the computer - hence the over-engineering.

As you correctly stated:

"
> What do you do the second time through this function, when that 
> directory is already existing?
> 
> >      copy_and_move_file( 'Failure')
> 
> The function takes two arguments, neither of which is likely to be that 
> string.
> 
> >      initialize_logger('rootdir/Failure')
> >      logging.error("Either this file is empty or there are no lines")"

How would I ensure that this s directory is made only once and every file that is passeed goes only to 'success' or 'failure'?