csv Parser Question - Handling of Double Quotes

Aaron Watters aaron.watters at gmail.com
Thu Mar 27 16:37:33 EDT 2008


> "this";"is";"a";"test"
>
> Resulting in an output of:
>
> ['this', 'is', 'a', 'test']
>
> However, if I modify the csv to:
>
> "t"h"is";"is";"a";"test"
>
> The output changes to:
>
> ['th"is"', 'is', 'a', 'test']

I'd be tempted to say that this is a bug,
except that I think the definition of "csv" is
informal, so the "bug/feature" distinction
cannot be exactly defined, unless I'm mistaken.
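
For reference, here is a minimal sketch that reproduces the quoted behavior with the standard csv module. It assumes a Python version whose csv module accepts the strict option; the variable names are only illustrative. In the default lenient mode the reader seems to treat the stray quote as the end of quoted mode and takes the rest of the field literally, while strict=True asks it to raise csv.Error on such input instead.

```python
import csv

line = '"t"h"is";"is";"a";"test"'

# Default (lenient) mode: the quote after "t" ends quoted mode, and the
# remaining characters of the field, quotes included, are kept as-is.
lenient = next(csv.reader([line], delimiter=';'))
print(lenient)  # ['th"is"', 'is', 'a', 'test']

# strict=True makes the reader raise csv.Error on malformed quoting.
try:
    next(csv.reader([line], delimiter=';', strict=True))
    strict_error = None
except csv.Error as exc:
    strict_error = exc
print(strict_error)
```

So whether the lenient result counts as a bug partly depends on whether you expect the reader to be forgiving or strict by default.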

What I would do is roll my own
parser using very simple python and check
that it works for the examples of interest.
If, for example, you can assume that the
delimiter will never occur inside the
payload and the payload contains no
"quoted" characters you could do something like:

==== cut
def trimQuotes(txt):
    txt = txt.strip()
    if txt:
        start = txt[0]
        end = txt[-1]
        if start==end and start in ('"', "'"):
            return txt[1:-1]
    return txt

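def simpleCsv(lines, delimiter):
    for line in lines:
        # naive split: assumes the delimiter never occurs inside a field
        fields = line.split(delimiter)
        # build a real list so it prints the same on Python 2 and 3
        fields = [trimQuotes(field) for field in fields]
        yield fields

def test():
    lines = ['"t"h"is";"is";"a";"test"']
    for fields in simpleCsv(lines, ';'):
        # parenthesized form works on both Python 2 and Python 3
        print(fields)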

if __name__=="__main__":
    test()
==== cut
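
A quick sketch of where those assumptions matter. The snake_case functions below are compact, illustrative re-spellings of the two functions above (not from any library), and the second input line is a made-up example with the delimiter inside a quoted field.

```python
def trim_quotes(txt):
    # Strip one pair of matching outer quotes, if present.
    txt = txt.strip()
    if txt and txt[0] == txt[-1] and txt[0] in ('"', "'"):
        return txt[1:-1]
    return txt

def simple_csv(lines, delimiter):
    # Split on every delimiter, with no awareness of quoting.
    for line in lines:
        yield [trim_quotes(field) for field in line.split(delimiter)]

rows = list(simple_csv(['"t"h"is";"is";"a";"test"', '"a;b";"c"'], ';'))
print(rows[0])  # ['t"h"is', 'is', 'a', 'test']
print(rows[1])  # ['"a', 'b"', 'c']
```

The first line keeps its inner quotes untouched, which may be what is wanted here. The second line shows the cost: a delimiter inside a quoted field splits the field in two, so this approach is only safe when that never happens in your data.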

If you want fame and admiration you could fix
what is arguably a bug in the csv module and send
the patch to the python bugs mailing list.
However, I just had a perusal of csv.py....
good luck :).
   -- Aaron Watters

===
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=too+general