[Python-ideas] csv.DictReader could handle headers more intelligently.

Thu Jan 24 13:38:58 CET 2013

On Thu, 24 Jan 2013 22:33:07 +1000,
Nick Coghlan <ncoghlan at gmail.com>
wrote:
> On Thu, Jan 24, 2013 at 9:55 PM, Shane Green
> <shane at umbrellacode.com> wrote:
> > Not sure if I'm reading the discussion correctly, but it sounds
> > like there's discussion about whether to swallow CSV values when
> > confronted with multiple columns of the same name, which seems very
> > incorrect if so.  CSV doesn't even mandate that column headers exist
> > at all, as far as I know.  If anything, I would think mapping column
> > positions to header values would make sense, such that
> > header.items() -> [(0, header1), (1, header2), (2, header3), etc.],
> > and header1 and header2 could be equal.  To work with rows as
> > dictionaries, they can follow the FieldStorage model and have lists
> > of values (either when there's a collision, or always) so that all
> > column values are preserved.
> 
> That's not quite the discussion. The discussion is specifically about
> *DictReader*, and whether it should:
> 
> 1. Do any data conditioning by ignoring empty lines and lines of just
> field delimiters before the header row (consensus seems to be "no")
> 2. Give an error when encountering a duplicate field name (since a
> duplicate name currently leads to data loss when reading the file)
> (consensus seems to be "yes")
> 
> The problem with the latter suggestion is that it's a backwards
> incompatible change - code where "use the last column with that name"
> is the correct behaviour currently works, but would be broken if that
> situation was declared an error.
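To make the data-loss point concrete, here is a minimal sketch of what `csv.DictReader` does today with a repeated header name. It assumes Python 3, where each row is built as `dict(zip(fieldnames, row))`, so the last column with a given name wins and earlier values are silently dropped. The sample data is made up for illustration:

```python
import csv
import io

# Hypothetical sample data: the header repeats the field name "score".
data = "name,score,score\nalice,1,2\n"

reader = csv.DictReader(io.StringIO(data))
row = next(reader)

print(reader.fieldnames)  # ['name', 'score', 'score']
# dict(row) normalises the repr across Python versions (older ones use OrderedDict).
print(dict(row))          # {'name': 'alice', 'score': '2'}: the first "score" value, '1', is lost
```

This "last column wins" result is exactly the behaviour that some existing code may rely on, which is why turning it into an error by default would be backwards incompatible.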

It's not really a problem if the new behaviour is made conditional on a
constructor argument.
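A sketch of what an opt-in constructor argument could look like. Everything here is hypothetical: `MultiDictReader` and its `duplicates` parameter are not part of the `csv` module, and the class is built on plain `csv.reader` rather than by modifying `DictReader`. The default `"last"` keeps today's behaviour, `"error"` raises on a repeated header name, and `"list"` follows the FieldStorage-style idea of always mapping each name to a list of values. Unlike `DictReader`, it reads the header eagerly (so duplicates fail fast) and ignores `restkey`/`restval` handling for short or long rows:

```python
import csv
import io
from collections import Counter


class MultiDictReader:
    """Hypothetical DictReader-like reader; not part of the csv module.

    duplicates="last"  -> keep the last column with a given name (DictReader's behaviour)
    duplicates="error" -> raise ValueError if the header repeats a name
    duplicates="list"  -> map every name to a list of all its values
    """

    def __init__(self, f, duplicates="last", **reader_kwargs):
        if duplicates not in ("last", "error", "list"):
            raise ValueError(f"unknown duplicates mode: {duplicates!r}")
        self.duplicates = duplicates
        self.reader = csv.reader(f, **reader_kwargs)
        # Read the header eagerly so a duplicate name is reported immediately.
        self.fieldnames = next(self.reader, [])
        repeated = [name for name, n in Counter(self.fieldnames).items() if n > 1]
        if repeated and duplicates == "error":
            raise ValueError(f"duplicate field names: {repeated}")

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self.reader)
        while row == []:  # skip blank lines, as DictReader does
            row = next(self.reader)
        if self.duplicates == "list":
            record = {}
            for name, value in zip(self.fieldnames, row):
                record.setdefault(name, []).append(value)
            return record
        # "last" (or "error" once the header passed the check):
        # later columns overwrite earlier ones with the same name.
        return dict(zip(self.fieldnames, row))


data = "name,score,score\nalice,1,2\n"
print(next(MultiDictReader(io.StringIO(data))))
# {'name': 'alice', 'score': '2'}
print(next(MultiDictReader(io.StringIO(data), duplicates="list")))
# {'name': ['alice'], 'score': ['1', '2']}
```

Because the default mode reproduces the current result, existing callers are unaffected; only code that passes the new argument sees an error or list-valued rows.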

Regards

Antoine.