csv read clean up and write out to csv

Neil Cerutti neilc at norwich.edu
Fri Nov 2 15:33:41 EDT 2012


On 2012-11-02, Sacha Rook <sacharook at gmail.com> wrote:
> Hi
>
> I have a problem with a csv file from a supplier: they
> export data to csv, but the last column in each record is a
> description marked up with HTML.
>
> I'm trying to automate the processing of this csv so I can upload
> it elsewhere in a usable format. If I open the csv with CSVed, it
> looks like not all the records are escaped correctly, because after
> a while I find HTML tags and text on the next line/record.
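
The symptom described above is what an unquoted field containing a newline looks like. Python's csv module keeps a newline inside a field only when that field is quoted; otherwise the newline ends the record early. The snippet below is a minimal illustration using made-up sample data (the `id,desc` layout is an assumption, not the supplier's real format):

```python
import csv
import io

# A correctly quoted field may contain a newline; csv.reader keeps it in one record.
quoted = 'id,desc\n1,"<p>line one\nline two</p>"\n'
# The same data without quoting: the embedded newline splits the record in two.
unquoted = 'id,desc\n1,<p>line one\nline two</p>\n'

# newline='' is what the csv docs recommend so embedded newlines are preserved.
quoted_rows = list(csv.reader(io.StringIO(quoted, newline='')))
unquoted_rows = list(csv.reader(io.StringIO(unquoted, newline='')))

print(quoted_rows)    # two rows: header plus one record
print(unquoted_rows)  # three rows: the HTML tail became its own "record"
```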

Maybe compose a simple parser to disambiguate the lines of the
file.

Something like this (you'll have to write is_html, and since my
Python 2 is mighty rusty, you'll have to fix it up. Note that
infile doesn't have to be opened in binary mode with this scheme,
but it would fail on bizarre newlines in the file):

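import csv

# is_html(line) is yours to write: return True when a line is a stray
# fragment of the HTML description rather than a normal csv record.

def parse_records(lines):
    """Tag each physical line as ('html', line) or ('csv', fields)."""
    for line in lines:
        if is_html(line):
            yield ('html', line)
        else:
            # Parse this single line as one csv record.
            yield ('csv', next(csv.reader([line.strip()])))

# Raw strings keep the backslashes in Windows paths literal.
infile = open(r'c:\data\input.csv')
outfile = open(r'c:\data\output.csv', 'wb')

writer = csv.writer(outfile)

for tag, rec in parse_records(infile):
    if tag == 'html':
        print rec
    elif tag == 'csv':
        writer.writerow(rec)
    else:
        raise ValueError("Unknown record type %s" % tag)

infile.close()
outfile.close()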
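
For anyone on Python 3, here is a minimal sketch of the same scheme. The is_html heuristic is purely hypothetical: it assumes every genuine record starts with a numeric ID followed by a comma (and that there is no header line), so any other line is treated as a stray HTML fragment. Adjust it to the supplier's real layout. The output side follows the csv module's advice for Python 3: open the file in text mode with newline='' (e.g. `open(r'c:\data\output.csv', 'w', newline='')`) instead of 'wb'. In-memory StringIO objects stand in for the real files here.

```python
import csv
import io
import re

# Hypothetical heuristic: a real record begins with digits and a comma.
RECORD_START = re.compile(r'^\d+,')

def is_html(line):
    """Return True if the line does not look like the start of a record."""
    return not RECORD_START.match(line)

def parse_records(lines):
    """Tag each physical line as ('html', line) or ('csv', fields)."""
    for line in lines:
        if is_html(line):
            yield ('html', line)
        else:
            # Parse this single line as one csv record.
            yield ('csv', next(csv.reader([line.strip()])))

def split_stream(infile, outfile):
    """Write csv records to outfile; return the stray HTML lines."""
    writer = csv.writer(outfile)
    stray = []
    for tag, rec in parse_records(infile):
        if tag == 'html':
            stray.append(rec.rstrip('\n'))
        elif tag == 'csv':
            writer.writerow(rec)
        else:
            raise ValueError("Unknown record type %s" % tag)
    return stray

# Made-up sample: record 1's description was broken across two lines.
sample = '1,Widget,<p>Small\nblue widget</p>\n2,Gadget,<p>Red</p>\n'
out = io.StringIO(newline='')
stray_lines = split_stream(io.StringIO(sample), out)
print(out.getvalue())
print(stray_lines)
```

Collecting the stray lines instead of printing them makes it easy to inspect them afterwards, or to glue them back onto the previous record's description if that turns out to be what the data needs.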

-- 
Neil Cerutti


