Trying to fix Invalid CSV File

John Machin sjmachin at lexicon.net
Mon Aug 4 05:34:43 EDT 2008


On Aug 4, 6:15 pm, Ryan Rosario <uclamath... at gmail.com> wrote:
> On Aug 4, 1:01 am, John Machin <sjmac... at lexicon.net> wrote:
>
> > On Aug 4, 5:49 pm, Ryan Rosario <uclamath... at gmail.com> wrote:
>
> > > Thanks Emile! Works almost perfectly, but is there some way I can
> > > adapt this to quote fields that contain a comma in them?
>
> > You originally said "I have a very large CSV file that contains double
> > quoted fields (since they contain commas)". Are you now saying that
> > if a field contained a comma, you didn't wrap the field in quotes? Or
> > is this a separate question unrelated to your original problem?
>
> I enclosed all text fields within quotes. The problem is that I have
> quotes embedded inside those text fields as well and I did not double/
> escape them. Emile's snippet takes care of the escaping but it strips
> the outer quotes from the text fields and if there are commas inside
> the text field, the field is split into multiple fields. Of course, it
> is possible that I am not using the snippet correctly I suppose.

Without you actually showing how you are using it, I can only surmise:

Emile's snippet pushes each fixed line through the csv reader only to
demonstrate that his series of replaces works (on your *sole* example,
at least). Note carefully that his output for one line is a *list* of
fields. The repr() of that list looks superficially like a line of csv
input. My guess is that you are csv-reading that repr() a second time,
using quotechar="'", after stripping off the enclosing []. If this
guess is not correct, please show what you are actually doing.
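
To illustrate the difference between the repr() of a parsed row and a
real line of csv, here is a minimal sketch. It is written in Python 3
syntax, and the sample row is made up for illustration; it is not taken
from your file.

```python
import csv
import io

# One parsed row, as csv.reader would hand it back: a plain Python list.
row = ['1', 'He said "hi", then left', '3']

# The repr() is Python list syntax. It only resembles a csv line.
print(repr(row))

# A real csv line comes from csv.writer. With the default quoting it
# wraps only the fields that need it (those containing a comma or a
# quote) and doubles any embedded quotes.
buf = io.StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue()
print(line, end="")

# Reading that line back gives the original list again.
assert next(csv.reader(io.StringIO(line))) == row
```

The point is that the list is the parsed data; if you want csv text
again, write the list with csv.writer rather than re-parsing its repr().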

If (as you said) you require a repaired csv file, you need to read the
bad file line by line, apply Emile's chain of replaces to each line,
and write each fixed line out to the new file.
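
The read/fix/write loop could be sketched roughly as follows, in
Python 3 syntax. Note that fix_line() here is a hypothetical stand-in
of my own, NOT Emile's snippet: it assumes that every field is wrapped
in double quotes, that the sequence "," occurs only between fields, and
that no field contains a newline. The sample line is made up.

```python
import csv
import io


def fix_line(line):
    """Hypothetical repair: double any quote embedded inside a field.

    Assumes every field is quoted and that the three characters ","
    appear only at field boundaries.
    """
    inner = line.rstrip("\r\n")[1:-1]      # drop the outermost quotes
    fields = inner.split('","')            # split at the field boundaries
    return '"' + '","'.join(f.replace('"', '""') for f in fields) + '"'


def repair(src, dst):
    """Copy lines from src to dst, repairing each one on the way."""
    for line in src:
        dst.write(fix_line(line) + "\n")


# In-memory stand-ins for the bad input file and the new output file.
bad = io.StringIO('"1","He said "hi", then left","3"\n')
good = io.StringIO()
repair(bad, good)

# The repaired text now parses as three fields, commas and quotes intact.
rows = list(csv.reader(io.StringIO(good.getvalue())))
print(rows)
```

With real files, pass two objects from open() to repair() instead of
the StringIO objects, and substitute whatever chain of replaces actually
suits your data for fix_line().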


