New to Programming: TypeError: coercing to Unicode: need string or buffer, list found

Saran A ahlusar.ahluwalia at gmail.com
Thu Apr 2 20:14:30 EDT 2015


On Thursday, April 2, 2015 at 8:03:53 PM UTC-4, Dennis Lee Bieber wrote:
> On Thu, 2 Apr 2015 05:46:57 -0700 (PDT), Saran A
> <ahlusar.ahluwalia at gmail.com> declaimed the following:
> 
> >
> >@ChrisA - this is a smaller function that will take the most updated file. My intention is the following:
> >
> >* Monitor a folder for files that are dropped throughout the day
> >
> 	I would suggest that your first prototype is to be a program that
> contains a function whose only purpose is to report on the files it finds
> -- forget about all the processing/moving of the files until you can
> successfully loop around the work of fetching the directory and handling
> the file names found (by maybe printing the names of the ones determined to
> be new since last fetch).
> 
> >* When a file is dropped in the folder the program should scan the file
> >
> >o IF all the contents in the file have the same length (let's assume line length)
> >
> >o THEN the file should be moved to a "success" folder and a text file written indicating the total number of records/lines/words processed
> >
> >o IF the file is empty OR the contents are not all of the same length
> >
> >o THEN the file should be moved to a "failure" folder and a text file written indicating the cause for failure (for example: Empty file or line 100 was not the same length as the rest).
> >
> 	You still haven't defined how you determine the "correct length" of the
> record. What if the first line is 79 characters, and all the others are 80
> characters? Do you report ALL lines EXCEPT the first as being the wrong
> length, when really it is the first line that is wrong?
> 
> 	Also, if the files are Unicode (UTF-8, in particular) -- the byte
> length of a line could differ but the character length could be the same.
> 
> >Here is the code I have written:
> >
> >import os
> >import time
> >import glob
> >import sys
> >
> >def initialize_logger(output_dir):
> >    logger = logging.getLogger()
> >    logger.setLevel(logging.DEBUG)
> >     
> >    # create console handler and set level to info
> >    handler = logging.StreamHandler()
> >    handler.setLevel(logging.INFO)
> >    formatter = logging.Formatter("%(levelname)s - %(message)s")
> >    handler.setFormatter(formatter)
> >    logger.addHandler(handler)
> > 
> >    # create error file handler and set level to error
> >    handler = logging.FileHandler(os.path.join(output_dir, "error.log"),"w", encoding=None, delay="true")
> >    handler.setLevel(logging.ERROR)
> >    formatter = logging.Formatter("%(levelname)s - %(message)s")
> >    handler.setFormatter(formatter)
> >    logger.addHandler(handler)
> >
> >    # create debug file handler and set level to debug
> >    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
> >    handler.setLevel(logging.DEBUG)
> >    formatter = logging.Formatter("%(levelname)s - %(message)s")
> >    handler.setFormatter(formatter)
> >    logger.addHandler(handler)
> >
> >#Helper Functions for the Success and Failure Folder Outcomes, respectively
> >
> >#checks the length of the file
> >    def file_len(filename
> >        with open(filename) as f:
> >            for i, l in enumerate(f):
> >                pass
> >            return i + 1
> >
> >#copies file to new destination
> >
> >    def copyFile(src, dest):
> >        try:
> >            shutil.copy(src, dest)
> >        # eg. src and dest are the same file
> >        except shutil.Error as e:
> >            print('Error: %s' % e)
> >        # eg. source or destination doesn't exist
> >        except IOError as e:
> >            print('Error: %s' % e.strerror)
> >
> >#Failure Folder
> >
> >def move_to_failure_folder_and_return_error_file():
> >    os.mkdir('Failure')
> >    copyFile(filename, 'Failure')
> >    initialize_logger('rootdir/Failure')
> >    logging.error("Either this file is empty or the lines")
> >     
> ># Success Folder Requirement
> >             
> >def move_to_success_folder_and_read(file):
> >    os.mkdir('Success')
> >    copyFile(filename, 'Success')
> >    print("Success", file)
> >    return file_len()
> >
> >
> >#This simply checks the file information by name
> >
> >def fileinfo(file):
> >    filename = os.path.basename(file)
> >    rootdir = os.path.dirname(file)
> >    lastmod = time.ctime(os.path.getmtime(file))
> >    creation = time.ctime(os.path.getctime(file))
> >    filesize = os.path.getsize(file)
> >    return filename, rootdir, lastmod, creation, filesize
> >
> >if __name__ == '__main__':
> >   import sys
> >   validate_files(sys.argv[1:])
> 
> 	Yeesh... Did you even try running that?
> 
> 	validate_files		is not defined
> 	file_len				is at the wrong indentation
> 						is syntactically garbage
> 						is a big time-waste (you read the file just to
> enumerate the number of lines? Why didn't you count the lines while
> checking the line lengths)
> 	copyFile			is at the wrong indentation
> 						(after a bunch of word_word, why camelCase here)
> 
> 	Correct all the edit errors and copy/paste the actual file that at
> least attempts to run.
> 
> 	You might also want to look at os.stat, rather than using three os.path
> calls.
> -- 
> 	Wulfraed                 Dennis Lee Bieber         AF6VN
>     wlfraed at ix.netcom.com    HTTP://wlfraed.home.netcom.com/

@Dennis:

Below is my full program (so far). Please feel free to tear it apart and provide me with constructive criticism. I have been programming for 8 months now and this is a huge learning experience for me. Feedback and modifications are very welcome.

What would be a better name for dirlist?

# # # Without data to examine here, I can only guess based on this requirement's language that 
# # fixed records are in the input.
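
A minimal sketch of what a fixed-record check could look like, counting the lines in the same pass that compares their lengths. The function name check_record_lengths is hypothetical, and treating the first line as the reference length is an assumption: the requirements do not say how the "correct" length is determined.

```python
def check_record_lengths(lines):
    """Return (ok, line_count, reason) for an iterable of text lines.

    Assumption: the first line defines the expected record length.
    """
    expected = None
    count = 0
    for count, line in enumerate(lines, start=1):
        # Compare lengths without the trailing line terminator.
        length = len(line.rstrip("\r\n"))
        if expected is None:
            expected = length
        elif length != expected:
            reason = "line %d has length %d, expected %d" % (count, length, expected)
            return False, count, reason
    if count == 0:
        return False, 0, "empty file"
    return True, count, ""


# Possible usage with a real file (the path is up to the caller):
#     import io
#     with io.open(some_path, encoding="utf-8") as handle:
#         ok, count, reason = check_record_lengths(handle)
```

Because the function accepts any iterable of lines, it can be tried on a plain list before it is pointed at real files.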

##I made the assumption that the directories are in the same filesystem

# # Takes the function fileinfo as a starting point and demonstrates calling a function from within a function.  
# I tested this little sample on a small set of files created with MD5 checksums.  I wrote the Python in such a way as it 
# would work with Python 2.x or 3.x (note the __future__ at the top).

# # # There are so many wonderful ways to fail, so, from a development standpoint, I would probably spend a bit 
# # more time determining which failure(s) I want to report to the user, and how (perhaps by creating my own exceptions)

# # # The only other comments I would make are about safe-file handling.

# # #   #1:  Question: After a user has created a file that has failed (in
# # #        processing),can the user create a file with the same name?
# # #        If so, then you will probably want to look at some sort
# # #        of file-naming strategy to avoid overwriting evidence of
# # #        earlier failures.

# # # File naming is a tricky thing.  I referenced the tempfile module [1] and the Maildir naming scheme to see two different 
# # types of solutions to the problem of choosing a unique filename.
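
One simple naming strategy, sketched under assumptions: append a counter to the file name until no existing file has that name. The helper unique_destination and its "name.N.ext" pattern are hypothetical, not taken from tempfile or Maildir.

```python
import os


def unique_destination(directory, filename, exists=os.path.exists):
    """Return a path inside `directory` that `exists` reports as unused.

    `exists` is injectable so the naming logic can be tried without
    touching the filesystem.
    """
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while exists(candidate):
        # data.txt -> data.1.txt -> data.2.txt -> ...
        candidate = os.path.join(directory, "%s.%d%s" % (base, counter, ext))
        counter += 1
    return candidate
```

Checking first and writing later leaves a small window in which another process could claim the same name. tempfile.mkstemp avoids that window by creating the file as it picks the name, so it may be the safer choice if several writers share the folder.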

## I am assuming that all of my files are going to be specified in unicode  
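
With Unicode input, the byte length and the character length of a line can differ, so it matters which one the check compares. A small illustration, assuming UTF-8 as the encoding; the helper name lengths is hypothetical.

```python
def lengths(text, encoding="utf-8"):
    """Return (character_count, byte_count) for a Unicode string."""
    return len(text), len(text.encode(encoding))


plain = lengths(u"naive")          # ASCII only: one byte per character
accented = lengths(u"na\u00efve")  # U+00EF takes two bytes in UTF-8
```

Here plain is (5, 5) and accented is (5, 6): the same number of characters, but a different number of bytes.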

## Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite

from __future__ import print_function

import os
import time
import logging

def initialize_logger(output_dir):
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
     
    # create console handler and set level to info
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)
 
    # create error file handler and set level to error
    handler = logging.FileHandler(os.path.join(output_dir, "error.log"), "w", encoding=None, delay=True)
    handler.setLevel(logging.ERROR)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create debug file handler and set level to debug
    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)


#This function's purpose is to obtain the filename, rootdir and filesize 

def fileinfo(f):
    filename = os.path.basename(f)
    rootdir = os.path.dirname(f)  
    filesize = os.path.getsize(f)
    return filename, rootdir, filesize

#This helper function returns the number of lines in the file (0 for an empty file)
def file_len(f):
    count = 0
    with open(f) as handle:
        for count, _line in enumerate(handle, start=1):
            pass
    return count

#This helper function attempts to copy file and move file to the respective directory
#I am assuming that the directories are in the same filesystem

# If directories ARE in different file systems, I would use the following helper function:

# def move(src, dest): 
#     shutil.move(src, dest)

def copy_and_move_file(src, dest):
    try:
        os.rename(src, dest)
    # os.rename raises OSError, e.g. when src is missing or dest is on another filesystem
    except OSError as e:
        print('Error: %s' % e.strerror)


path = "."
dirlist = os.listdir(path)


# One caveat of the "main" function is that it does not scale well 
#(although it is appropriate if one assumes that there will be few changes)

# It does not account for updated files existing in the directory - only new files "dropped" in
# (If this was included in the requirements, os.stat would be appropriate here)

 
def main(dirlist):   
    before = set(dirlist)
    while True:
        time.sleep(1) #time between update check
        # re-read the folder on every pass; `path` is the module-level folder name
        after = set(os.listdir(path))
        for f in sorted(after - before):
            full = os.path.join(path, f)
            if os.path.isfile(full): #skip the Success/Failure folders themselves
                print('Successfully added %s file - ready to validate' % f)
                validate_files(full)
        before = after


    
def validate_files(f):
    creation = time.ctime(os.path.getctime(f))
    lastmod = time.ctime(os.path.getmtime(f))
    # a file passes only if it is non-empty and unmodified since creation;
    # file_len never returns a negative number, so no separate branch is needed for that
    if creation == lastmod and file_len(f) > 0:
        return move_to_success_folder_and_read(f)
    else:
        return move_to_failure_folder_and_return_error_file(f)


# Failure/Success Folder Functions

def move_to_failure_folder_and_return_error_file(f):
    filename, rootdir, filesize = fileinfo(f)
    failure_dir = os.path.join(rootdir, 'Failure')
    if not os.path.isdir(failure_dir):
        os.mkdir(failure_dir)
    copy_and_move_file(f, os.path.join(failure_dir, filename))
    initialize_logger(failure_dir)
    logging.error("Either this file is empty or there are no lines")
     
             
def move_to_success_folder_and_read(f):
    filename, rootdir, filesize = fileinfo(f)
    success_dir = os.path.join(rootdir, 'Success')
    if not os.path.isdir(success_dir):
        os.mkdir(success_dir)
    dest = os.path.join(success_dir, filename)
    lines = file_len(f) #count the lines before the file is moved
    copy_and_move_file(f, dest)
    print("Success", dest)
    return lines



if __name__ == '__main__':
   main(dirlist) 
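
The caveat noted above main (only newly dropped files are noticed, not updated ones) could be addressed by comparing os.stat results between polls. This is a rough sketch rather than a drop-in replacement; snapshot, diff_snapshots and watch are hypothetical names, and the one-second interval is an assumption.

```python
import os
import stat
import time


def snapshot(path):
    """Map each regular file name in `path` to its modification time."""
    result = {}
    for name in os.listdir(path):
        full = os.path.join(path, name)
        try:
            info = os.stat(full)
        except OSError:
            continue  # the file vanished between listdir and stat
        if stat.S_ISREG(info.st_mode):
            result[name] = info.st_mtime
    return result


def diff_snapshots(before, after):
    """Return (added, modified) file names, each as a sorted list."""
    added = sorted(name for name in after if name not in before)
    modified = sorted(name for name in after
                      if name in before and after[name] != before[name])
    return added, modified


def watch(path, interval=1.0):
    """Poll `path` forever and report new or changed files."""
    before = snapshot(path)
    while True:
        time.sleep(interval)
        after = snapshot(path)
        added, modified = diff_snapshots(before, after)
        for name in added:
            print("new: %s" % name)
        for name in modified:
            print("modified: %s" % name)
        before = after
```

Keeping the comparison in diff_snapshots, separate from the loop, means it can be exercised with plain dictionaries before any real folder is watched.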


