Trying to fix Invalid CSV File

Larry Bates larry.bates at websafe.com`
Mon Aug 4 16:56:15 EDT 2008


Ryan Rosario wrote:
> On Aug 4, 8:30 am, Emile van Sebille <em... at fenx.com> wrote:
>> John Machin wrote:
>>> On Aug 4, 6:15 pm, Ryan Rosario <uclamath... at gmail.com> wrote:
>>>> On Aug 4, 1:01 am, John Machin <sjmac... at lexicon.net> wrote:
>>>>> On Aug 4, 5:49 pm, Ryan Rosario <uclamath... at gmail.com> wrote:
>>>>>> Thanks Emile! Works almost perfectly, but is there some way I can
>>>>>> adapt this to quote fields that contain a comma in them?
>> <snip>
>>
>>> Emile's snippet is pushing it through the csv reading process, to
>>> demonstrate that his series of replaces works (on your *sole* example,
>>> at least).
>> Exactly -- just print out the results of the passed argument:
>>
>>  >>>
>> rec.replace(',"',",'''").replace('",',"''',").replace('"','""').replace("'''",'"')
>>
>> '123,"Here is some, text ""and some quoted text"" where the quotes
>> should have been doubled",321'
>>
>> Where it won't work is if any of the field embedded quotes are next to
>> commas.
>>
>> I'd run it against the file.  Presumably, you've got a consistent field
>> count expectation per record.  Any resulting record not matching is
>> suspect and will identify records this approach won't address.
>>
>> There's probably better ways, but sometimes it's fun to create
>> executable line noise.  :)
>>
>> Emile
> 
> Thanks for your responses. I think John may be right that I am reading
> it a second time. I will take a look at the CSV reader documentation
> and see if that helps. Then once I run it I can see if I need to worry
> about the comma-next-to-quote issue.

This is a perfect demonstration of why tab delimited files are so much better 
than comma and quote delimited.  Virtually all software can handle table 
delimited as well as comma and quote delimited, but you would have none of these 
problems if you had used tab delimited.  The chances of tabs being embedded in 
most data is virtually nil.

-Larry



More information about the Python-list mailing list