csv Parser Question - Handling of Double Quotes

John Machin sjmachin at lexicon.net
Thu Mar 27 18:07:21 EDT 2008


On Mar 28, 8:40 am, jwbrow... at gmail.com wrote:
> On Mar 27, 1:53 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> wrote:
>
>
>
> > On Thu, 27 Mar 2008 17:37:33 -0300, Aaron Watters
> > <aaron.watt... at gmail.com> wrote:
>
> > >> "this";"is";"a";"test"
>
> > >> Resulting in an output of:
>
> > >> ['this', 'is', 'a', 'test']
>
> > >> However, if I modify the csv to:
>
> > >> "t"h"is";"is";"a";"test"
>
> > >> The output changes to:
>
> > >> ['th"is"', 'is', 'a', 'test']
>
> > > I'd be tempted to say that this is a bug,
> > > except that I think the definition of "csv" is
> > > informal, so the "bug/feature" distinction
> > > cannot be exactly defined, unless I'm mistaken.
>
> > AFAIK, the csv module tries to mimic Excel behavior as closely as possible.
> > It has some test cases that look horrible, but that's what Excel does...
> > I'd try actually using Excel to see what happens.
> > Perhaps the behavior could be more configurable, like the codecs are.
>
> > --
> > Gabriel Genellina
>
> Thank you Aaron and Gabriel.  I was also hesitant to use the term
> "bug" since, as you said, CSV isn't a standard.  By the same token, I
> couldn't readily think of an instance where the quote should be
> removed if it's not sitting right next to the delimiter (or at the
> very beginning/end of the line).
>
> I'm not even sure if it should be patched since there could be cases
> where this is how people want it to behave and I wouldn't want their
> code to break.
>
> I think rolling out a custom class seems like the only solution but if
> anyone else has any other advice I'd like to hear it.
>
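
For reference, the behaviour quoted above can be reproduced directly
with the standard csv module. This is only a minimal sketch, assuming a
current Python 3 interpreter; the variable names are made up for
illustration. It also shows the reader's strict=True option, which
makes the reader raise csv.Error on this kind of input instead of
guessing.

```python
import csv

# The problem line from the quoted message (semicolon-delimited).
line = '"t"h"is";"is";"a";"test"'

# Default, lenient mode: the quote after "t" ends the quoted section
# and the remaining quotes are kept as literal characters.
lenient = next(csv.reader([line], delimiter=';'))
print(lenient)

# Strict mode: a closing quote must be followed by the delimiter
# (or the end of the record), otherwise csv.Error is raised.
strict_error = None
try:
    next(csv.reader([line], delimiter=';', strict=True))
except csv.Error as exc:
    strict_error = exc
    print("rejected:", exc)
```

Strict mode does not recover the field as the poster wanted it; it only
turns silent reinterpretation into an error the caller can handle.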

I have code in awk, C, and Python for reading bad-CSV data under these
assumptions: (1) there are no embedded newlines; (2) embedded quotes
are not doubled as they should be; (3) there is an even number of
quotes in each original field; (4) the caller prefers an exception or
error return when there is anomalous data.
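
A rough Python sketch of one way to read such data under the four
assumptions above follows. It is not the code referred to in this
message; the names split_bad_csv and BadCSVError are hypothetical, and
the choice to reject quotes inside unquoted fields is an extra
assumption. The idea is that a quote only closes a quoted field when it
is followed by the delimiter or the end of the line and the number of
quotes seen so far in that field is even.

```python
class BadCSVError(ValueError):
    """Hypothetical error type: the line does not fit the assumptions."""


def split_bad_csv(line, delimiter=";", quote='"'):
    """Split one line whose embedded quotes were not doubled (sketch)."""
    line = line.rstrip("\r\n")
    if "\n" in line or "\r" in line:
        # Assumption (1): no embedded newlines.
        raise BadCSVError("embedded newline")
    fields = []
    pos = 0
    n = len(line)
    while True:
        if pos < n and line[pos] == quote:
            # Quoted field: look for the closing quote.
            nquotes = 1
            i = pos + 1
            while i < n:
                if line[i] == quote:
                    nquotes += 1
                    at_boundary = i + 1 == n or line[i + 1] == delimiter
                    # Assumption (3): an even number of embedded quotes,
                    # so the total including both wrappers is even too.
                    if at_boundary and nquotes % 2 == 0:
                        break
                i += 1
            else:
                # Assumption (4): report anomalous data to the caller.
                raise BadCSVError("no closing quote for field at column %d" % pos)
            fields.append(line[pos + 1:i])
            pos = i + 1
        else:
            # Unquoted field: runs up to the next delimiter.
            end = line.find(delimiter, pos)
            if end == -1:
                end = n
            field = line[pos:end]
            if quote in field:
                # Extra assumption of this sketch: treat as anomalous.
                raise BadCSVError("quote inside unquoted field at column %d" % pos)
            fields.append(field)
            pos = end
        if pos == n:
            return fields
        pos += 1  # skip the delimiter


print(split_bad_csv('"t"h"is";"is";"a";"test"'))
```

With the line from the original question, the first field comes back
with its embedded quotes intact, while a field with an odd number of
quotes raises BadCSVError rather than being silently reinterpreted.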




