[Csv] Devil in the details, including the small one between delimiters and quotechars

Cliff Wells LogiplexSoftware at earthlink.net
Wed Jan 29 17:58:37 CET 2003


Okay, despite claims to the contrary, Pure Evil can in fact be broken
down into little bits and stored in ASCII files.

This spaces-around-quoted-data bit is starting to bother me.  Consider
the following:

1, "not quoted","quoted"

It seems reasonable to parse this as:

[1, ' "not quoted"', "quoted"]

which is the described Excel behavior.

Now consider

1,"not quoted" ,"quoted"

Is the second field quoted or not?  If it is, do we discard the
extraneous whitespace following it or raise an exception?

Worse, consider this

"quoted", "not quoted, but this ""field"" has delimiters and quotes"

How should this parse?  I say free exceptions for everyone.
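
For comparison, here is a small sketch that feeds the three lines above
to the csv module in Python's standard library with its default
settings.  That module is a later implementation and is an assumption
of this example; it is not the CSV.py attached to this message, and
strict_error is a hypothetical helper name.

```python
import csv

lines = [
    '1, "not quoted","quoted"',
    '1,"not quoted" ,"quoted"',
    '"quoted", "not quoted, but this ""field"" has delimiters and quotes"',
]

# Default dialect: skipinitialspace=False, strict=False.
rows = list(csv.reader(lines))
for row in rows:
    print(row)


def strict_error(line):
    """Return True if the stdlib reader rejects the line when strict=True."""
    try:
        list(csv.reader([line], strict=True))
    except csv.Error:
        return True
    return False


print(strict_error(lines[1]))
```

With the defaults, a field that starts with a space is treated as
unquoted, so its quotes stay in the data and the third line splits into
three fields.  The space after the closing quote in the second line is
quietly appended to the field unless strict=True is given, in which
case the reader raises csv.Error.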

While we're on the topic, I heard back from my DSV user who had
mentioned this corner case of spaces between delimiters and quotes.  He
admitted that the files were created by hand, by him (figures).  He
seems to recall some now-forgotten application that may have done this,
but wasn't sure.  His memory was vague on whether he saw it on a PC or
in a barn eating hay.

I propose that space between delimiters and quotes raise an exception,
and let's be done with it.  I don't think this really affects Excel
compatibility since Excel will never generate this type of file and
doesn't require it for import.  It's true that some files that Excel
would import (probably incorrectly) won't import in CSV, but I think
that's outside the scope of Excel compatibility.
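
To make the proposal concrete, here is a minimal sketch of a one-line
parser that raises when whitespace sits between a delimiter and a
quote.  It is not the attached CSV.py; parse_line and
SpaceAroundQuoteError are hypothetical names made up for this example,
and it assumes a single-character delimiter, doubled quotes as the
escape, and no embedded newlines.

```python
class SpaceAroundQuoteError(ValueError):
    """Hypothetical error: whitespace between a delimiter and a quotechar."""


def parse_line(line, delimiter=",", quotechar='"'):
    """Split one line into fields, rejecting space next to a quoted field."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quotechar:
            # Quoted field: read to the closing quote, honoring doubled quotes.
            i += 1
            buf = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                ch = line[i]
                if ch == quotechar:
                    if i + 1 < n and line[i + 1] == quotechar:
                        buf.append(quotechar)
                        i += 2
                        continue
                    i += 1
                    break
                buf.append(ch)
                i += 1
            # Only a delimiter or end of line may follow the closing quote.
            if i < n and line[i] != delimiter:
                if line[i] in " \t":
                    raise SpaceAroundQuoteError("space after closing quote")
                raise ValueError("delimiter expected after closing quote")
            fields.append("".join(buf))
        else:
            # Unquoted field: everything up to the next delimiter.
            j = line.find(delimiter, i)
            if j == -1:
                j = n
            raw = line[i:j]
            stripped = raw.lstrip(" \t")
            # Leading whitespace followed by a quote is the disputed case.
            if stripped != raw and stripped.startswith(quotechar):
                raise SpaceAroundQuoteError("space before opening quote")
            fields.append(raw)
            i = j
        if i >= n:
            return fields
        i += 1  # step over the delimiter


print(parse_line('1,"quoted","a ""b"" c"'))
```

Under this rule all three problem lines above are rejected rather than
guessed at, while ordinary unquoted fields with leading spaces (such as
"a, b") still pass through untouched.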


Anyway, I know no one has said "On your mark, get set" yet, but I can't
think without code sitting in front of me, breaking worse with every
keystroke, so in addition to creating some test cases, I've hacked up a
very preliminary CSV module so we have something to play with.  I was up
'til 6am, so if there's anything odd, I blame it on lack of sleep and the
feverish optimism and glossing over of detail that comes with it.  Note that
while the entire test.csv gets imported without exception, the last few
lines aren't parsed correctly.  At least, I don't think they are.  I
can't remember now.  Also, this code is based upon what was discussed up
until yesterday when I went home, so recent conversations may not be
reflected.

Mercilessly dissect away.

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CSV.py
Type: text/x-python
Size: 5570 bytes
Desc: not available
Url : http://mail.python.org/pipermail/csv/attachments/20030129/d80b9ba0/attachment.py 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.csv
Type: text/x-comma-separated-values
Size: 720 bytes
Desc: not available
Url : http://mail.python.org/pipermail/csv/attachments/20030129/d80b9ba0/attachment.bin 

