[Python-ideas] csv.DictReader could handle headers more intelligently.

Wed Jan 23 02:51:38 CET 2013

On Jan 23, 11:06 am, "J. Cliff Dyer" <j... at sdf.lonestar.org> wrote:
> I'm working with some poorly-formed CSV files, and I noticed that
> DictReader always and only pulls headers off of the first row.  But many
> of the files I see have blank lines before the row of headers, sometimes
> with commas to the appropriate field count, sometimes without.  The
> current implementation's behavior in this case is likely never correct,
> and certainly always annoying.

I don't think we should start adding support for every kind of
malformed CSV file that exists. It's easy enough to remove the
unwanted lines yourself before passing the rest to DictReader:

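    from csv import DictReader

    # Python 3: open in text mode with newline='' (Python 2 wants 'rb').
    with open('malformed.csv', newline='') as csvfile:
        # Drop blank lines and lines that contain nothing but commas.
        csvlines = [line for line in csvfile if line.strip(' \t\r\n,')]
        csvreader = DictReader(csvlines)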

Personally, if I were dealing with this as often as you are, I'd
probably write a custom context manager instead. The problem lies in
the files themselves, not in how the csv module responds to them.
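
For what it's worth, here is a rough sketch of what such a context
manager might look like. It assumes Python 3 and a comma delimiter, and
open_dictreader is just a name I made up for illustration; nothing like
it exists in the csv module. It filters line by line, so it would not
cope with quoted fields that contain blank lines.

```python
import csv
from contextlib import contextmanager


@contextmanager
def open_dictreader(path):
    """Yield a DictReader over `path`, skipping blank and comma-only lines.

    Hypothetical helper for illustration; not part of the csv module.
    """
    with open(path, newline='') as csvfile:
        # Lazily drop lines holding nothing but whitespace or commas, so the
        # first line DictReader sees is the real header row.
        lines = (line for line in csvfile if line.strip(' \t\r\n,'))
        yield csv.DictReader(lines)
```

It would be used like this (the file must stay open while you iterate,
which is why the loop sits inside the with block):

    with open_dictreader('malformed.csv') as reader:
        for row in reader:
            print(row)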