[Python-ideas] csv.DictReader could handle headers more intelligently.

Thu Jan 24 17:12:09 CET 2013

On 2013-01-24 15:24, Chris Angelico wrote:
> On Fri, Jan 25, 2013 at 2:11 AM, J. Cliff Dyer <jcd at sdf.lonestar.org> wrote:
>> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
>>> > 1. Do any data conditioning by ignoring empty lines and lines of
>>> > just field delimiters before the header row (consensus seems to be
>>> > "no")
>>
>> Well, I wouldn't necessarily say we have a consensus on this one.  This
>> idea received a +1 from Bruce Leban and an "I don't see any reason not
>> to" from Steven D'Aprano.
>
> I've been lurking this thread, but fwiw, I'd put +1 on ignoring empty
> lines/just delimiter lines. For a row of column headers, a completely
> blank line makes no sense. It's a backward-incompatible change, yes,
> but I can't imagine any code actively relying on this. ISTM this would
> probably be safe for a minor release (Python 3.4), though of course
> not for Python 3.3.1.
>
Ignoring empty lines before a header seems OK to me, but ignoring
just-delimiter lines doesn't.

To me, a just-delimiter line where a header is expected would mean
that all of the columns are unnamed. The alternative is to insist that
a line is not a header unless at least one column is named, and I
don't think that should be the default behaviour.
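
A minimal sketch of that distinction, assuming a comma-delimited file.
The helper skip_leading_blank_lines is hypothetical and not part of the
csv module. It drops only truly empty lines before the header, so a
just-delimiter line is still read as a header whose columns are all
unnamed:

```python
import csv
import io
from itertools import dropwhile


def skip_leading_blank_lines(lines):
    """Hypothetical helper: drop empty lines before the first non-empty one.

    Only line endings are stripped, so a line made up of delimiters
    (for example ",,") is NOT treated as empty.
    """
    return dropwhile(lambda line: not line.strip("\r\n"), lines)


# Empty lines before a real header are skipped.
data = io.StringIO("\n\nname,age\nAlice,30\n", newline="")
reader = csv.DictReader(skip_leading_blank_lines(data))
header = reader.fieldnames
rows = list(reader)

# A just-delimiter line is kept and becomes a header of unnamed columns.
unnamed = io.StringIO("\n,,\n1,2,3\n", newline="")
unnamed_reader = csv.DictReader(skip_leading_blank_lines(unnamed))
unnamed_header = unnamed_reader.fieldnames
```

Here header is ['name', 'age'] and unnamed_header is ['', '', ''].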

As for duplicate column names, I think that it should probably raise
an exception unless you've specified that duplicates should be put into
a list.
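
Roughly, that behaviour could look like the sketch below. The function
read_dicts and its collect_duplicates flag are made-up names for
illustration, not an existing or proposed csv API, and the choice of
ValueError is also an assumption:

```python
import csv
import io
from collections import Counter


def read_dicts(lines, collect_duplicates=False):
    """Hypothetical reader: yield one dict per data row.

    Duplicate column names raise ValueError unless collect_duplicates
    is true, in which case their values are gathered into a list.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    repeated = {name for name, count in Counter(header).items() if count > 1}
    if repeated and not collect_duplicates:
        raise ValueError(f"duplicate column names: {sorted(repeated)}")
    for row in reader:
        record = {}
        for name, value in zip(header, row):
            if name in repeated:
                # Values of a repeated column are collected in order.
                record.setdefault(name, []).append(value)
            else:
                record[name] = value
        yield record


sample = "a,b,a\n1,2,3\n"
collected = list(read_dicts(io.StringIO(sample), collect_duplicates=True))
```

With the sample above, collected is [{'a': ['1', '3'], 'b': '2'}].
Because read_dicts is a generator, the ValueError for the default case
is raised when the first row is requested, not when it is called.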