[Python-ideas] csv.DictReader could handle headers more intelligently.

Fri Jan 25 00:53:51 CET 2013

On 25/01/13 02:11, J. Cliff Dyer wrote:
> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
>>> 1. Do any data conditioning by ignoring empty lines and lines of
>>> just field delimiters before the header row (consensus seems to be
>>> "no")
>
> Well, I wouldn't necessarily say we have a consensus on this one.  This
> idea received a +1 from Bruce Leban and an "I don't see any reason not
> to" from Steven D'Aprano.
>
> Objections are:
>
> 1. It's a backwards-incompatible change.

All bug fixes are backwards-incompatible changes. The question is, is
there anyone relying on this behaviour?

DictReader already ignores blank lines, *except for the very first line*.
Using Python 3.3:

py> from io import StringIO
py> from csv import DictReader
py> data = StringIO('spam,ham,eggs\n\n\n\n1,2,3\n\n\n\n\n4,5,6\n')
py> x = DictReader(data)
py> next(x)
{'eggs': '3', 'ham': '2', 'spam': '1'}
py> next(x)
{'eggs': '6', 'ham': '5', 'spam': '4'}

I don't expect that anyone is relying on a CSV file with a leading
blank line being treated as one having no columns at all:

py> data = StringIO('\n\n\n\nspam,ham,eggs\n1,2,3\n4,5,6\n')
py> x = DictReader(data)
py> next(x)
{None: ['spam', 'ham', 'eggs']}
py> x.fieldnames
[]
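
As far as I can tell, the empty fieldnames come from DictReader taking
the first row of the underlying csv.reader as the header, and that
reader returns an empty list for a blank line. A small sketch of that,
using only the plain reader (the variable names are just for
illustration):

```python
from csv import reader
from io import StringIO

data = StringIO('\n\nspam,ham,eggs\n1,2,3\n')
rows = list(reader(data))

# The plain reader yields an empty list for each blank line, so the
# first "row" available as a header here is [].
first_row = rows[0]
# The real header only turns up after the blank lines.
header_row = rows[2]
```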

I expect that there is probably code that works around this issue, by
skipping blank lines somehow, e.g.

DictReader(row for row in data if row.strip())

These work-arounds may (or may not) be fragile or buggy, but they ought
to continue working even if DictReader changes its header detection.
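
For comparison, here is a rough sketch of a narrower work-around that
only drops blank lines *before* the header and passes everything else
through untouched. The helper name skip_leading_blank_lines is made up
for this example, and it assumes the first non-blank line is the header
row. Unlike the generator expression above, it does not remove blank
lines that sit inside a quoted multi-line field:

```python
from csv import DictReader
from io import StringIO


def skip_leading_blank_lines(lines):
    """Yield lines unchanged, minus any blank lines before the first non-blank one."""
    lines = iter(lines)
    for line in lines:
        if line.strip():
            # First non-blank line: assumed to be the header row.
            yield line
            break
    # Everything after the header is passed through as-is.
    yield from lines


data = StringIO('\n\n\n\nspam,ham,eggs\n1,2,3\n4,5,6\n')
reader = DictReader(skip_leading_blank_lines(data))
rows = list(reader)
```

With this wrapper, reader.fieldnames should come out as the three
column names rather than an empty list, and it keeps working whether or
not DictReader itself learns to skip leading blank lines.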

-- 
Steven