fixing a horrifically formatted csv file.

flebber flebber.crue at gmail.com
Fri Jul 4 06:48:10 EDT 2014


On Friday, 4 July 2014 16:19:09 UTC+10, Gregory Ewing  wrote:
> flebber wrote:
> 
> > so in my file I had on line 44 this trainer name.
> > 
> > "Michael, Wayne & John Hawkes"
> > 
> > and in line 95 this horse name. Inz'n'out
> > 
> > this throws off my capturing correct item 9. How do I protect against this?
> 
> Use python's csv module to read the file. Don't try to
> do it yourself; the rules for handling embedded commas
> and quotes in csv are quite complicated. As long as
> the file is a well-formed csv file, the csv module
> should parse fields like that correctly.
> 
> -- 
> Greg

True, Greg. Using the csv module worked, and it was much easier.
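
For anyone reading along, here is a minimal sketch of why the csv module copes with those two lines where a plain split(',') does not. The sample rows and the parse_rows helper are made up for illustration; they only resemble my file and are not copied from it.

```python
import csv
import io

# Hypothetical sample rows that resemble the two problem lines (not real data).
SAMPLE = (
    'Horse,1,Some Horse,x,"Michael, Wayne & John Hawkes",Rosehill\n'
    "Horse,2,Inz'n'out,x,A Trainer,Randwick\n"
)


def parse_rows(text):
    """Parse csv text with csv.reader and return a list of rows."""
    return list(csv.reader(io.StringIO(text)))


ROWS = parse_rows(SAMPLE)
for row in ROWS:
    # Field count, horse name and trainer name for each row.
    print(len(row), row[2], '|', row[4])

# For comparison: a naive split breaks the quoted trainer name apart.
NAIVE = SAMPLE.splitlines()[0].split(',')
print(len(NAIVE))
```

With the default dialect the quoted trainer name stays as one field, so both rows come back with 6 fields, while the naive split of the first row gives 7. The apostrophes in the horse name are left alone because the default quote character is the double quote.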

import csv
import re


def race_table(rows):
    """Reorganise the poorly laid out csv rows into one tuple per horse.

    rows is any iterable of already-parsed records, e.g. a csv.reader.
    """
    output_table = []
    for record in rows:
        if not record:
            continue  # csv.reader yields [] for blank lines
        if record[0] == 'Meeting':
            meeting = record[3]
        elif record[0] == 'Race':
            date = record[13]
            race = record[1]
        elif record[0] == 'Horse':
            number = record[1]
            name = record[2]
            results = record[9]
            # Item 9 packs starts, wins, seconds, thirds and (sometimes)
            # prizemoney into one field, separated by '-' or ' '.
            res_split = re.split('[- ]', results)
            starts = res_split[0]
            wins = res_split[1]
            seconds = res_split[2]
            thirds = res_split[3]
            try:
                prizemoney = res_split[4]
            except IndexError:
                # No prizemoney recorded for this horse.
                # (The earlier try/finally always reset this to 0.)
                prizemoney = 0
            trainer = record[4]
            location = record[5]
            print(name, wins, seconds)
            output_table.append((meeting, date, race, number, name,
                                 starts, wins, seconds, thirds, prizemoney,
                                 trainer, location))
    return output_table
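
A note on the results field, since that is where the try/finally went wrong: finally always runs, so prizemoney ended up as 0 every time. A small sketch of the split on its own is below. The helper name split_results and the sample strings are made up for illustration, and the exact layout of item 9 in the real file is an assumption.

```python
import re


def split_results(results):
    """Split a results field into starts, wins, seconds, thirds, prizemoney.

    Assumes the pieces are separated by '-' or ' ', as in race_table().
    prizemoney falls back to 0 when the field has only four pieces.
    """
    parts = re.split('[- ]', results)
    starts, wins, seconds, thirds = parts[:4]
    prizemoney = parts[4] if len(parts) > 4 else 0
    return starts, wins, seconds, thirds, prizemoney


# Hypothetical field values, not taken from the real file.
WITH_MONEY = split_results('12 3-2-1 45000')
WITHOUT_MONEY = split_results('12 3-2-1')
print(WITH_MONEY)
print(WITHOUT_MONEY)
```

The pieces stay as strings, as they do in race_table(); only the missing prizemoney is replaced by 0.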

# FILENAME and out_file_name() are not shown in this post.
MY_FILE = out_file_name(FILENAME)

with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
    # csv.reader handles the embedded commas and quotes for us.
    CONTENT = csv.reader(f_in)
    FILE_CONTENTS = race_table(CONTENT)
    # For now this just dumps the list of tuples as text.
    f_out.write(str(FILE_CONTENTS))
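
The output above is just the str() of a list of tuples, which is not a csv file. A possible next step, sketched here under the assumption that the output should be csv too, is to hand the tuples to csv.writer so that the trainer name gets quoted again on the way out. The rows_to_csv helper and the sample row are made up for illustration.

```python
import csv
import io

# Hypothetical row in the shape race_table() returns (not real data).
ROWS = [
    ('Rosehill', '05/07/2014', '1', '3', "Inz'n'out", '12', '3', '2', '1', 0,
     'Michael, Wayne & John Hawkes', 'Rosehill'),
]


def rows_to_csv(rows):
    """Return the rows as csv text, quoting fields only where needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    return buffer.getvalue()


CSV_TEXT = rows_to_csv(ROWS)
print(CSV_TEXT)

# Reading the text back gives the original fields (as strings).
ROUND_TRIP = list(csv.reader(io.StringIO(CSV_TEXT)))
```

When writing to a real file rather than a StringIO, the csv documentation recommends opening it with newline='' so the writer controls the line endings.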

Sayth


