[Python-ideas] csv.DictReader could handle headers more intelligently.

Shane Green shane at umbrellacode.com
Fri Jan 25 01:05:43 CET 2013


If this is part of the same response…

> A row of delimiters should be treated by the reader object as a row with
> explicitly empty fields. If the caller wishes to discard them, they can.
> But the reader object shouldn't make that decision.
> 
> An empty row, on the other hand, should be just ignored. DictReader *already*
> ignores empty rows, provided that they are not in the first row.

Then I think my description was unclear.  I wasn't suggesting we add methods for manipulating individual headers, only a way to tell the DictReader to drop its existing headers and re-evaluate them on the next row.  That would make it easy to do something like:

while not any(records.fieldnames):
    records.discard_fieldnames()  # or something to that effect…

without changing any existing behaviour.
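
For comparison, roughly the same effect can be had today by skipping the unwanted rows before the DictReader ever sees them. This is only a minimal sketch: open_dict_reader is a hypothetical helper name, not part of the csv module, and it assumes that a row with no non-empty field (an empty line or a line of bare delimiters) is never a legitimate header.

```python
import csv
import io


def open_dict_reader(lines, **fmtparams):
    """Hypothetical helper: skip leading rows that have no non-empty
    field, then use the first remaining row as the header."""
    lines = iter(lines)  # share one iterator between both readers
    header = []
    for row in csv.reader(lines, **fmtparams):
        if any(row):  # at least one non-empty field
            header = row
            break
    # Explicit fieldnames stop DictReader from consuming a header row itself.
    return csv.DictReader(lines, fieldnames=header, **fmtparams)


# An empty line, then a line of bare delimiters, then the real header.
sample = io.StringIO("\n,,\nname,qty\napple,3\n")
records = open_dict_reader(sample)
rows = list(records)
```

Here the caller, not the reader, decides that the row of delimiters is not a header, which is the point of making the step explicit.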






Shane Green 
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 3:15 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> On 25/01/13 03:08, J. Cliff Dyer wrote:
>> On Thu, 2013-01-24 at 07:28 -0800, Shane Green wrote:
>>> Since every form of CSV file counts EOL as a line terminator, I think
>>> discarding empty lines preceding the headers is arguably acceptable,
>>> but do not think discarding lines of just delimiters would be.  What
>>> about extending the DictReader API so it was easy to perform these
>>> actions explicitly, such as being able to discard() the field names to
>>> be re-evaluated on the next line?
>> 
>> I think I like this idea.  There's something a little distasteful about
>> making the user manually delve into the underlying reader, but this
>> makes it more user-friendly and more obvious how to proceed.
> 
> I couldn't disagree more. I think:
> 
> - it adds burden to the caller, since the caller is now expected to manually
>  inspect the field names and decide whether some should be discarded;
> 
> - it is less obvious: *how* does the caller decide that there are too many
>  field names?
> 
> - incomplete: if there is a discard(), where is the add()?
> 
> - completely irrelevant for the topic being discussed ("DictReader should
>  ignore leading blank lines... I know, let's give the caller the ability
>  to *discard* field names" -- but auto-detecting *too many* field names is
>  not the problem);
> 
> - and being able to change the field names on the fly is so far beyond
>  anything required for ordinary CSV that it doesn't belong in the CSV
>  module.
> 
> 
>> For clarity's sake, what is your objection to discarding lines of
>> delimiters?  The reason I suggest doing it is that it is a common output
>> situation when exporting Excel files or LibreCalc files that have a
>> blank row at the top.
> 
> 
> A row of delimiters should be treated by the reader object as a row with
> explicitly empty fields. If the caller wishes to discard them, they can.
> But the reader object shouldn't make that decision.
> 
> An empty row, on the other hand, should be just ignored. DictReader *already*
> ignores empty rows, provided that they are not in the first row.
> 
> 
> 
> -- 
> Steven
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
