Strategy/ Advice for How to Best Attack this Problem?
Saran A
ahlusar.ahluwalia at gmail.com
Fri Apr 3 08:50:28 EDT 2015
On Friday, April 3, 2015 at 8:05:14 AM UTC-4, Dave Angel wrote:
> On 04/02/2015 07:43 PM, Saran A wrote:
> <snip>
> >
> > I debugged and rewrote everything. Here is the full version. Feel free to tear this apart. The homework assignment is not due until tomorrow, so I am currently also experimenting with pyinotify as well. I do have questions regarding how to make this function compatible with the ProcessEvent Class. I will create another post for this.
> >
> > What would you advise in regards to renaming the inaptly named dirlist?
>
> Asked and answered. You called it path at one point. dir_name would
> also be good. The point is that if you have a good name, you're less
> likely to have unreasonable code processing that name. For example,
>
> >
> > # # # Without data to examine here, I can only guess based on this requirement's language that
> > # # fixed records are in the input. If so, here's a slight revision to the helper functions that I wrote earlier which
> > # # takes the function fileinfo as a starting point and demonstrates calling a function from within a function.
> > # I tested this little sample on a small set of files created with MD5 checksums. I wrote the Python in such a way as it
> > # would work with Python 2.x or 3.x (note the __future__ at the top).
> >
> > # # # There are so many wonderful ways of failure, so, from a development standpoint, I would probably spend a bit
> > # # more time trying to determine which failure(s) I would want to report to the user, and how (perhaps creating my own Exceptions)
> >
> > # # # The only other comments I would make are about safe-file handling.
> >
> > # # # #1: Question: After a user has created a file that has failed (in
> > # # # processing),can the user create a file with the same name?
> > # # # If so, then you will probably want to look at some sort
> > # # # of file-naming strategy to avoid overwriting evidence of
> > # # # earlier failures.
> >
> > # # # File naming is a tricky thing. I referenced the tempfile module [1] and the Maildir naming scheme to see two different
> > # # types of solutions to the problem of choosing a unique filename.
> >> ## I am assuming that all of my files are going to be specified in unicode
> >
> > ## Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite
> >
> > from __future__ import print_function
> >
> > import os.path
> > import time
> > import logging
> >
> >
> > def initialize_logger(output_dir):
> <snip>
>
> I didn't ever bother to read the body of this function, since a simple
>
> print(mydata, file=mylog_file)
>
> will suffice to add data to the chosen text file. Why is it constantly
> getting more complex, to solve a problem that was simple in the beginning?
>
> >
> >
> > #Returns filename, rootdir and filesize
> >
> > def fileinfo(f):
> > filename = os.path.basename(f)
> > rootdir = os.path.dirname(f)
> > filesize = os.path.getsize(f)
> > return filename, rootdir, filesize
> >
> > #returns length of file
> > def file_len(f):
> > with open(f) as f:
> > for i, l in enumerate(f):
> > pass
> > return i + 1
>
> Always returns 1 or None. Check the indentation, and test to see what
> it does for empty file, for a file with one line, and for a file with
> more than one line.
>
>
> > #attempts to copy file and move file to it's directory
> > def copy_and_move_file(src, dest):
>
> Which is it, are you trying to copy it, or move it? Pick one and make a
> good function name that shows your choice.
>
> > try:
> > os.rename(src, dest)
>
> Why are you using rename, when you're trying to move the file? Take a
> closer look at shutil, and see if it has a function that does it safer
> than rename. The function you need uses rename, when it'll work, and
> does it other ways when rename will not.
>
> > # eg. src and dest are the same file
> > except IOError as e:
> > print('Error: %s' % e.strerror)
>
> A try/except that doesn't try to correct the problem is not generally
> useful. Figure out what could be triggering the exception, and how
> you're going to handle it. If it cannot be handled, terminate the program.
>
> For example, what if you don't have permissions to modify one of the
> specified directories? You can't get any useful work done, so you
> should notify the user and exit. An alternative is to produce 50
> thousand reports of each file you've got, telling how it succeeded or
> failed, over and over.
>
> >
> > path = "."
> > dirlist = os.listdir(path)
> >
> > def main(dirlist):
> > before = dict([(f, 0) for f in dirlist])
>
> Since dirlist is a path, it's a string. So you're looping through the
> characters of the name of the path. I still don't have a clue what the
> dict is supposed to mean here.
>
> > while True:
> > time.sleep(1) #time between update check
>
> That loop goes forever, so the following code will never run.
>
> > after = dict([(f, None) for f in dirlist])
>
> Once again, you're looping through the letters of the directory name.
> Or if dirlist is really a list, and you're deciding that's what it
> should be, then of course after will be identical to before.
>
> > added = [f for f in after if not f in before]
> > if added:
> > f = ''.join(added)
> > print('Sucessfully added %s file - ready to validate') %()
> > return validate_files(f)
> > else:
> > return move_to_failure_folder_and_return_error_file(f)
> >
> >
> >
> > def validate_files(f):
>
> You completely changed the scope of this function. Why does it still
> have an s in its name?
>
> > creation = time.ctime(os.path.getctime(f))
> > lastmod = time.ctime(os.path.getmtime(f))
>
> What do timestamps have to do with anything?
>
> > if creation == lastmod and file_len(f) > 0:
> > return move_to_success_folder_and_read(f)
>
> How have you decided that the records are all the same length?
>
> > if file_len < 0 and creation != lastmod:
>
> file_len cannot be safely compared to 0, it's a function. If this were
> Python 3, it would give you a runtime error for trying. Even if you had
> successfully called the function here, how is it possible for it to
> return a negative value?
>
> > return move_to_success_folder_and_read(f)
> > else:
> > return move_to_failure_folder_and_return_error_file(f)
> >
> >
> > #Potential Additions/Substitutions
> >
> > def move_to_failure_folder_and_return_error_file():
> > filename, rootdir, lastmod, creation, filesize = fileinfo(file)
>
> fileinfo() returns 3 things, and you're stuffing them in 5 variables.
>
> > os.mkdir('Failure')
>
> What do you do the second time through this function, when that
> directory is already existing?
>
> > copy_and_move_file( 'Failure')
>
> The function takes two arguments, neither of which is likely to be that
> string.
>
> > initialize_logger('rootdir/Failure')
> > logging.error("Either this file is empty or there are no lines")
>
>
> >
> > def move_to_success_folder_and_read():
> > filename, rootdir, lastmod, creation, filesize = fileinfo(file)
> > os.mkdir('Success')
> > copy_and_move_file(rootdir, 'Success') #file name
> > print("Success", file)
> > return file_len(file)
> >
> >
> >
> > if __name__ == '__main__':
> > main(dirlist)
>
> So now you're passing a list of filenames to main()? How then is it
> going to know when new files arrive? You were originally going to tell
> it a directory name. That's why I suggested that you call the parameter
> 'path' or 'dirname'.
>
>
>
> --
> DaveA
@DaveA
I addressed most of the issues. I do admit that, as a novice, I feel beholden to the computer - hence the over-engineering.
As you correctly stated:
"
> What do you do the second time through this function, when that
> directory is already existing?
>
> > copy_and_move_file( 'Failure')
>
> The function takes two arguments, neither of which is likely to be that
> string.
>
> > initialize_logger('rootdir/Failure')
> > logging.error("Either this file is empty or there are no lines")"
How would I ensure that this s directory is made only once and every file that is passeed goes only to 'success' or 'failure'?
More information about the Python-list
mailing list