Some help in refining this regex for CSV files

Thu Dec 6 02:57:58 EST 2012

On 06/12/2012 07:21, Oltmans wrote:
> Hi guys,
>
> I have to deal with CSVs that look like the following:
>
> CSV (with one header and 3 legit rows where each legit row has 3 columns)
> ----
> Some info
> Date: 12/6/2012
> Author: Some guy
> Total records: 100
>
> header1, header2, header3
> one, two, three
> one, "Python is great, so are other languages, isn't ?", three
> one, two, 'some languages, are realyl beautiful\r\n, I really cannot deny \n this \t\t\t fact. \t\t\t\tthis fact alone is amazing'
> ----
>
> So inside this CSV there will always be bad lines like the top 4 (they could appear at the beginning, in the middle, or even at the end). The sample above has 3 legit lines and a header. I want to read those three lines, and here is a regex that I came up with (which clearly isn't working):
>
>      #print line
>      pattern = r"([^\t]+\t|,+)"
>      matches = re.match(pattern, line)
>
> Do you have any better ideas, guys? I would really appreciate any help.
>

I'd simply use the csv module from the standard library to read your 
files, discarding anything that you regard as bad.  I'd certainly not 
use a regex for this.
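
A minimal sketch of that approach, assuming a "bad" line is any line that does not parse to exactly three fields. The names read_legit_rows and EXPECTED_COLUMNS are made up for illustration, and the sample is trimmed to the double-quoted rows from your post:

```python
import csv
import io

# Assumption: a legit row always has exactly 3 fields; everything else is "bad".
EXPECTED_COLUMNS = 3


def read_legit_rows(lines, expected=EXPECTED_COLUMNS):
    """Return only the rows that parse to exactly `expected` fields."""
    # skipinitialspace=True drops the blank after each comma, so a quote
    # that follows ", " is recognised as the start of a quoted field.
    reader = csv.reader(lines, skipinitialspace=True)
    return [row for row in reader if len(row) == expected]


sample = '''Some info
Date: 12/6/2012
Author: Some guy
Total records: 100

header1, header2, header3
one, two, three
one, "Python is great, so are other languages, isn't ?", three
'''

# For a real file, pass the open file object instead of io.StringIO(sample).
rows = read_legit_rows(io.StringIO(sample))
header, records = rows[0], rows[1:]
```

Note that the csv module accepts only one quote character per reader. With the default quotechar of '"', the single-quoted row in your sample would be split on its embedded commas, end up with more than three fields, and be discarded by this filter. If such rows matter, they would need separate handling, for example a second reader created with quotechar="'".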

-- 
Cheers.

Mark Lawrence.