DSVWizard.py

Cliff Wells LogiplexSoftware at earthlink.net
Mon Jan 27 02:47:46 CET 2003


On Sun, 2003-01-26 at 16:33, Skip Montanaro wrote:
> I'm adding Dave Cole to the distribution list on this note.  Dave, Kevin
> Altis, Cliff Wells (author of DSV) and I have exchanged a few messages about
> trying to develop a CSV API for Python.
> 
>     >> I suspect most of the differences I see between the DSV and csv
>     >> modules are due to interpretation differences between Cliff and Dave.
> 
>     Cliff> Or a bug in an older version of DSV.  If you have anything that
>     Cliff> differs using 1.4, please pass it on so I can take a look at it.
> 
> I downloaded 1.4 just now.  The sfsample.csv file is now processed
> identically by the two modules.  The nastiness.csv file generates three
> differences though:
> 
>     % python shootout.py nastiness.csv 
>     DSV: 0.01 seconds, 13 rows
>     csv: 0.00 seconds, 13 rows
>     2
>     DSV: ['Test 1', 'Fred said "hey!", and left the room', '']
>     csv: ['Test 1', ' "Fred said ""hey!""', ' and left the room"', ' ""']

IMO, Dave's module is incorrect on this one (unless he has specific
reasons for the behavior).  The original line (from the csv file) is:

Test 1, "Fred said ""hey!"", and left the room", ""

The "" at the end is an empty, quoted field.  Maybe someone should run
this through Excel to see what it claims (I'd be willing to accept
Dave's interpretation if Excel does it this way, although I'd still feel
it was incorrect).  I handled this case specifically at a user's
request.
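
For readers who want to try this line themselves, here is a minimal
sketch.  It assumes the csv module in today's Python standard library,
which is not necessarily the module version compared in this thread; the
helper name parse() is made up for illustration.  The skipinitialspace
option of that module skips whitespace right after a delimiter, which is
one way to get the leading quote recognized as the start of a quoted field.

```python
import csv
import io

# The line from nastiness.csv quoted above.
line = 'Test 1, "Fred said ""hey!"", and left the room", ""\n'


def parse(text, **options):
    """Parse a single line with the standard-library csv reader.

    parse() is a hypothetical helper for this example, not part of DSV
    or of the csv module.
    """
    return next(csv.reader(io.StringIO(text), **options))


# Default settings: the space after each comma is kept as field data, so
# the quote that follows it is not treated as opening a quoted field.
default_row = parse(line)

# skipinitialspace=True drops whitespace immediately after the delimiter,
# so the quoted fields are recognized as quoted.
lenient_row = parse(line, skipinitialspace=True)

print(default_row)
print(lenient_row)
```

With skipinitialspace=True the row comes out as three fields, the last
one empty, which is the reading argued for above.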

>     10
>     DSV: ['Test 9', 'no spaces around this', ' but single spaces around this ']
>     csv: ['Test 9', ' "no spaces around this" ', ' but single spaces around this ']
>     12
>     DSV: ['Test 11', 'has no spaces around anything', 'because the data is quoted']
>     csv: ['   "Test 11"  ', '   "has no spaces around anything"   ', '   "because the data is quoted"    ']
> 
> All three lines have white space immediately following separating
> commas.  DSV appears to skip over this white space, while csv treats it as
> part of the field contents.

Again, this was at a user's request, and is special-case code in DSV
that can easily be removed.  The user noted, and I concurred, that given
a quoted field with whitespace around it, the whitespace should be
ignored.  However, once again I'd be willing to follow Excel's lead in
this because I'd also consider this to be malformed or at least
ambiguous data.
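
The rule described above (ignore whitespace around a quoted field, keep
it in an unquoted field) can be sketched roughly as follows.  This is
not the DSV source, only an illustration written for this note: the
function name split_line() is hypothetical, and the input line is
reconstructed from the csv output for Test 9 quoted above.

```python
def split_line(line, delimiter=",", quote='"'):
    """Split one record, ignoring whitespace around quoted fields only.

    Hypothetical helper illustrating the rule; not DSV's actual code.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        # Look past leading whitespace to see whether the field is quoted.
        j = i
        while j < n and line[j] in " \t":
            j += 1
        if j < n and line[j] == quote:
            # Quoted field: read up to the closing quote; "" is one quote.
            j += 1
            chars = []
            while j < n:
                if line[j] == quote:
                    if j + 1 < n and line[j + 1] == quote:
                        chars.append(quote)
                        j += 2
                        continue
                    j += 1
                    break
                chars.append(line[j])
                j += 1
            # Discard whatever sits between the closing quote and the
            # next delimiter (normally just whitespace).
            while j < n and line[j] != delimiter:
                j += 1
            fields.append("".join(chars))
        else:
            # Unquoted field: keep whitespace exactly as written.
            j = line.find(delimiter, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
        if j >= n:
            break
        i = j + 1
    return fields


# Reconstructed from the csv output for Test 9 (an assumption).
test9 = 'Test 9, "no spaces around this" , but single spaces around this '
row9 = split_line(test9)
print(row9)
```

The quoted field loses its surrounding spaces while the unquoted third
field keeps them, which is the behavior reported for DSV in the
comparison above.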

> 
> Skip
> 
> PS, Just so Dave has the same "test harness", I've attached shootout.py and
> nastiness.csv.  The shootout.py script now assumes DSV is installed with the
> package structure of DSV 1.4.0.
-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308



