[CSV] Re: First Cut at CSV PEP

Cliff Wells LogiplexSoftware at earthlink.net
Wed Jan 29 01:03:45 CET 2003


On Tue, 2003-01-28 at 15:28, Dave Cole wrote:

> I suppose that exporting should raise an exception if you specify any
> variation on the dialect in the writer function.
> 
>     csvwriter = csv.writer(file("newnastiness.csv", "w"),
>                            dialect='excel2000', delimiter='"')
> 
> That should raise an exception.

I still don't see a good reason for this.  The programmer asked for it,
let her do it.  I don't see a problem with letting the programmer shoot
herself in the foot, as long as the gun doesn't start out pointing at
it.
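
To make the permissive behaviour concrete, here is a minimal sketch in
which explicit keyword arguments are simply layered over the dialect's
defaults. The DIALECTS table, its values for 'excel2000', and the
resolve_params helper are all assumptions made up for illustration; none
of them is part of any agreed API.

```python
# Hypothetical dialect table; the parameter values are assumptions.
DIALECTS = {
    "excel2000": {"delimiter": ",", "quotechar": '"', "lineterminator": "\r\n"},
}


def resolve_params(dialect, **overrides):
    """Return the dialect's parameters with explicit overrides applied."""
    params = dict(DIALECTS[dialect])  # copy, so the dialect itself is untouched
    unknown = set(overrides) - set(params)
    if unknown:
        # A misspelled parameter name is a programming error, so still complain.
        raise TypeError("unknown parameter(s): %s" % ", ".join(sorted(unknown)))
    params.update(overrides)          # the programmer asked for it, let her do it
    return params


# Odd, but allowed: the caller explicitly asked for a '"' delimiter.
params = resolve_params("excel2000", delimiter='"')
```

The dialect still supplies safe defaults (the gun starts out pointing
away from the foot), and only an explicit argument changes them.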

> This probably shouldn't raise an exception though:
> 
>     csvwriter = csv.writer(file("newnastiness.csv", "w"),
>                            dialect='excel2000')
>     csvwriter.setparams(delimiter='"')

While this provides a workaround, it seems a bit non-obvious why this
should work when passing delimiter as an argument raises an exception.
I'm not dead-set against it; it's JMHO.

> >> I think that we need some way to handle a potentially different set
> >> of options on each dialect.
> 
> Kevin> I'm not real comfortable with the dialect idea, it doesn't seem
> Kevin> to add any value over simply specifying a separator and
> Kevin> delimiter.
> 
> It makes things *a lot* easier for module users who are not fully
> conversant in the vagaries of CSV.

I agree.

> Kevin> The CR, CR/LF, and LF line endings probably have something to
> Kevin> do with saving in Mac format, but it may also do some 8-bit
> Kevin> character translation.
> 
> Should we be trying to handle unicode?  I think we should since Python
> is now unicode capable.

What issues is unicode support going to raise?

> Kevin> The universal readlines support in Python 2.3 may impact the
> Kevin> use of a file reader/writer when processing different text
> Kevin> files, but would returns or newlines within a field be
> Kevin> impacted? Should the PEP and API specify that the record
> Kevin> delimiter can be either CR, LF, or CR/LF, but use of those
> Kevin> characters inside a field requires the field to be quoted or an
> Kevin> exception will be thrown?
> 
> Should we raise an exception or just pass the data through?
> 
> If it is not a newline, then it is not a newline.

This seems like a particularly intractable problem.  If a file can't
decide what sort of newlines it is going to use, then I'm not convinced
it's the parser's problem.

So the question becomes whether to raise an exception or pass the data
through.  The two things to consider in this case are:

1)  The data might be correct, in which case it should be passed through.
2)  The target for the data might be someone's mission-critical SQL
    server and we don't want to help them mung up their data.  An
    exception would seem appropriate.
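
The two options can be sketched as a single check with a strictness
switch. This is only an illustration under stated assumptions: the
function names and the strict flag are hypothetical, "can't decide" is
taken to mean more than one line-ending style appearing outside quoted
fields, and quotes are assumed to be escaped by doubling.

```python
def line_endings_outside_quotes(text, quotechar='"'):
    """Return the set of line-ending styles found outside quoted fields."""
    styles = set()
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == quotechar:
            # A doubled quote toggles twice, so the state ends up unchanged.
            in_quotes = not in_quotes
        elif not in_quotes and ch in "\r\n":
            if text[i:i + 2] == "\r\n":
                styles.add("CRLF")
                i += 1  # skip the LF half of the pair
            elif ch == "\r":
                styles.add("CR")
            else:
                styles.add("LF")
        i += 1
    return styles


def check_newlines(text, strict=True):
    """Raise on mixed record terminators if strict, else pass the data through."""
    styles = line_endings_outside_quotes(text)
    if strict and len(styles) > 1:
        raise ValueError("mixed line endings: %s" % ", ".join(sorted(styles)))
    return text
```

Newlines inside a quoted field are ignored by the check, so a quoted
multi-line field never triggers the exception on its own.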

Frankly, I think I lean towards an exception on this one.  There are
enough text-processing tools available (dos2unix and kin) that someone
should be able to pre-process a CSV file that is raising exceptions and
get it into a form acceptable to the parser.  A little work up front is
far more acceptable than putting out a fire on someone's database.
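
For anyone without such tools at hand, the pre-processing step might look
like the sketch below. It is written in the spirit of dos2unix rather
than as a copy of its behaviour, and the function name is made up for
illustration.

```python
def normalize_newlines(text):
    """Convert CRLF and bare CR line endings to LF."""
    # Replace CRLF first so its CR is not converted a second time.
    return text.replace("\r\n", "\n").replace("\r", "\n")


cleaned = normalize_newlines("a,b\r\nc,d\re,f\n")
```

Note that this is not quote-aware: a CR or CRLF inside a quoted field is
rewritten as well, so it is only appropriate when that is acceptable for
the data at hand.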

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308



