Finding Blank Columns in CSV

MRAB python at mrabarnett.plus.com
Tue Oct 6 11:03:56 EDT 2015


On 2015-10-06 12:24, Jaydip Chakrabarty wrote:
> On Tue, 06 Oct 2015 01:34:17 +1100, Chris Angelico wrote:
>
>> On Tue, Oct 6, 2015 at 1:06 AM, Tim Chase
>> <python.list at tim.thechases.com> wrote:
>>> That way, if you determine by line 3 that your million-row CSV file has
>>> no blank columns, you can get away with not processing all million
>>> rows.
>>
>> Sure, although that effectively means the entire job is moot. I kinda
>> assume that the OP knows that there are some blank columns (maybe lots
>> of them). The extra check is unnecessary unless it's actually plausible
>> that there'll be no blanks whatsoever.
>>
>> Incidentally, you have an ordered_headers list which is the blank
>> columns in order; I think the OP was looking for a list of the
>> _non_blank columns. But that's a trivial difference, easy to tweak.
>>
>> ChrisA
>
> Thanks to you all. I got it this far. But while writing back to another
> csv file, I got this error - "ValueError: dict contains fields not in
> fieldnames: None". Here is my code.
>
> rdr = csv.DictReader(fin, delimiter=',')
> header_set = set(rdr.fieldnames)

Initially, header_set contains all of the field names.

> for r in rdr:
>      header_set = set(h for h in header_set if not r[h])

Keeping the field name if the field is empty.

>      if not header_set:
>          break
>
At this point, header_set will contain the field names of the columns
in which every value seen so far was empty.

Wasn't the original question about excluding columns where all of the
values are empty? The code that follows keeps only those columns and
drops the ones that contain data.
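
A minimal sketch of one way to get the non-blank columns, assuming
Python 3: narrow a set down to the names that are empty in every row,
then take the remaining names from rdr.fieldnames so that the original
column order is kept. SAMPLE and find_blank_columns are made-up names
for illustration, not part of the posted code.

```python
import csv
import io

# Hypothetical sample data; column "b" is blank in every row.
SAMPLE = "a,b,c\n1,,3\n,,6\n"

def find_blank_columns(rows, fieldnames):
    """Return the set of field names whose value is empty in every row."""
    blank = set(fieldnames)
    for row in rows:
        # Keep a name only while its value is still empty in this row.
        blank = {name for name in blank if not row[name]}
        if not blank:
            break  # every column has held data at least once
    return blank

rdr = csv.DictReader(io.StringIO(SAMPLE))
blank = find_blank_columns(rdr, rdr.fieldnames)

# A set has no defined order, so rebuild the list from the header.
non_blank = [name for name in rdr.fieldnames if name not in blank]
print(sorted(blank), non_blank)
```

With the sample above, blank ends up holding only "b" and non_blank is
the list of the other two names in header order.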

> for r in rdr:
>      data = list(r[i] for i in header_set)

data will contain each processed row in turn. Because of the
indentation, only the final data (row) would be written out.
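
One possible fix, as a sketch under assumptions rather than a drop-in
replacement: build a list with one dict per row, restricted to the
columns being kept, and pass that list to writerows. Restricting each
dict also drops any stray None key, which DictReader adds when a row
has more fields than the header and which DictWriter rejects by
default. The names SAMPLE and keep are hypothetical, and io.StringIO
stands in for the real files. If rdr has already been consumed by an
earlier loop, the input would need to be rewound or reopened and a new
DictReader created first.

```python
import csv
import io

SAMPLE = "a,b,c\n1,,3\n,,6\n"  # hypothetical input
keep = ["a", "c"]               # assumed result of the blank-column check

fin = io.StringIO(SAMPLE)
rdr = csv.DictReader(fin)

# One dict per row, containing only the columns being kept.
data = [{name: row[name] for name in keep} for row in rdr]

fout = io.StringIO()
dw = csv.DictWriter(fout, fieldnames=keep, lineterminator="\n")
dw.writeheader()
dw.writerows(data)  # writes every collected row, not just the last
print(fout.getvalue())
```

Passing a list for fieldnames, rather than a set, also fixes the
column order in the output.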

>
> dw = csv.DictWriter(fout, header_set)
> dw.writeheader()
> dw.writerows(data)
>
> Also, there is difference between len(header_set) and len(data[0].keys).
> Why is so?
> Thanks again for all your help.
>
> Thanks.
>



