csv read clean up and write out to csv

Hans Mulder hansmu at xs4all.nl
Fri Nov 2 15:51:55 EDT 2012


On 2/11/12 18:25:09, Sacha Rook wrote:
> I have a problem with a CSV file from a supplier. They export data to CSV,
> but the last column in each record is a description which is marked up
> with HTML.
> 
> I'm trying to automate the processing of this CSV to upload elsewhere in a
> usable format. If I open the CSV with csved, it looks like the records
> aren't escaped correctly, as after a while I find HTML tags and text on the
> next line/record.

The example line you gave was correctly escaped: the description starts
with a double quote, and ends several lines later with another double
quote.  Double quotes in the HTML are represented by '"'.

Maybe csved doesn't recognize this escape convention?
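
To see that Python's csv module does follow this convention, here is a
small check written for Python 3.  The sample data is made up for
illustration (it is not the supplier's file), and it assumes the usual
CSV rule that a literal double quote inside a quoted field is written
as two double quotes:

```python
import csv
import io

# Hypothetical sample: a header row plus one record whose last field
# spans two physical lines and contains doubled double quotes.
sample = 'id,description\r\n1,"<p class=""x"">first line\nsecond line</p>"\r\n'

# newline="" keeps the embedded line break exactly as written.
rows = list(csv.reader(io.StringIO(sample, newline="")))

for row in rows:
    print(row)
```

The reader should hand back two rows, with the whole description,
embedded newline included, as a single field of the second row.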

> If I 'openwith' excel the description stays on the correct line/record?

Excel implements this convention.

> I want to use Python to read these records in and output a valid CSV with
> the descriptions intact, preferably without the HTML tags, i.e. a string of
> text formatted with newline/CR where appropriate.

How about this:

import csv

# Binary mode is what the Python 2 csv module expects.
infile = open("input.csv", "rb")
outfile = open("output.csv", "wb")

reader = csv.reader(infile)
writer = csv.writer(outfile)

for line in reader:
    # The description is the last column; flatten its embedded newlines.
    line[-1] = line[-1].replace("\n", " ")
    print line
    writer.writerow(line)

infile.close()
outfile.close()
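
The script above is Python 2.  Under Python 3 the same idea might look
roughly like the sketch below.  The function name flatten_last_column
and the utf-8 default are my assumptions, not anything from your data;
the sketch also replaces "\r\n" as well as "\n" and skips empty rows,
which the loop above does not do:

```python
import csv


def flatten_last_column(in_path, out_path, encoding="utf-8"):
    """Copy a CSV file, replacing line breaks in the last column by spaces.

    The function name and the utf-8 default are illustrative assumptions.
    """
    # Python 3's csv module wants text-mode files opened with newline="".
    with open(in_path, newline="", encoding=encoding) as infile, \
         open(out_path, "w", newline="", encoding=encoding) as outfile:
        writer = csv.writer(outfile)
        for row in csv.reader(infile):
            if row:  # a blank input line yields an empty row
                row[-1] = row[-1].replace("\r\n", " ").replace("\n", " ")
            writer.writerow(row)


if __name__ == "__main__":
    flatten_last_column("input.csv", "output.csv")
```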


That will replace the newlines inside the HTML, which your csved
doesn't seem to handle, with spaces.  When rendered as HTML, a
newline counts as whitespace just like a space (outside <pre>
blocks), so this replacement shouldn't alter the meaning of the
HTML text.
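
If you also want the tags gone, as you mentioned, one possible approach
is the standard library's html.parser (Python 3).  This is only a rough
sketch: the names TextExtractor and html_to_text are made up here, and
the choice of which tags produce a newline (br, p, div, li) is a guess
at what "where appropriate" means for your descriptions:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML fragment (hypothetical helper class)."""

    def __init__(self):
        super().__init__()  # character references are decoded by default
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        # Assumption: these block-level tags should end a line of text.
        if tag in ("p", "div", "li"):
            self.parts.append("\n")

    def handle_data(self, data):
        # Fold runs of whitespace, source newlines included, into one space.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip()


print(html_to_text("<p>Hello <b>big</b>\nworld</p><p>Tom &amp; Jerry<br>end</p>"))
```

You could then assign html_to_text(line[-1]) to line[-1] instead of
doing the plain replace; the csv writer will quote any field that
still contains a newline.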

Hope this helps,

-- HansM


