csv read clean up and write out to csv
Neil Cerutti
neilc at norwich.edu
Fri Nov 2 15:33:41 EDT 2012
On 2012-11-02, Sacha Rook <sacharook at gmail.com> wrote:
> Hi
>
> I have a problem with a csv file from a supplier, so they
> export data to csv however the last column in the record is a
> description which is marked up with html.
>
> trying to automate the processing of this csv to upload
> elsewhere in a useable format. If i open the csv with csved it
> looks like all the records aren't escaped correctly as after a
> while i find html tags and text on the next line/record.
Maybe compose a simple parser to disambiguate the lines from the
file.
Something like this (you'll have to write is_html, and my Python 2
is mighty rusty, so you'll have to fix it up. Note that infile
doesn't have to be in binary mode with this scheme, but it would
fail on bizarre newlines in the file):
import csv

def parse_records(lines):
    # Tag each line as either a stray HTML line or a parsed CSV record.
    for line in lines:
        if is_html(line):
            yield ('html', line)
        else:
            yield ('csv', csv.reader([line.strip()]).next())

# Raw strings keep the backslashes in the Windows paths literal.
infile = open(r'c:\data\input.csv')
outfile = open(r'c:\data\output.csv', 'wb')
writer = csv.writer(outfile)
for tag, rec in parse_records(infile):
    if tag == 'html':
        print rec
    elif tag == 'csv':
        writer.writerow(rec)
    else:
        raise ValueError("Unknown record type %s" % tag)
infile.close()
outfile.close()
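
For is_html, a minimal sketch might look like the following. It is
only a guess at what the supplier's data looks like: it assumes a
stray line begins with an HTML tag, so spilled-over lines of plain
text with no leading tag would still be treated as CSV. The regex
and the name _TAG_AT_START are illustrative, not anything from the
csv module.

```python
import re

# Assumption: a stray description line starts (after optional
# whitespace) with an opening or closing HTML tag such as <p>,
# </div> or <br />.
_TAG_AT_START = re.compile(r'\s*</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>')

def is_html(line):
    """Guess whether a line is stray HTML rather than a CSV record."""
    return _TAG_AT_START.match(line) is not None
```

A record whose HTML sits inside its last column still starts with
ordinary field data, so it should fall through to the csv branch.
If the real file has continuation lines that start with bare text,
counting the fields csv.reader returns may be a better test.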
--
Neil Cerutti