First Cut at CSV PEP

Cliff Wells LogiplexSoftware at earthlink.net
Tue Jan 28 22:17:32 CET 2003


On Mon, 2003-01-27 at 20:56, Dave Cole wrote:

> I only have one issue with the PEP as it stands.  It is still aiming
> too low.  One of the things that we support in our parser is the
> ability to handle CSV without quote characters.
> 
>         field1,field2,field3\, field3,field4
> 
> One of our customers has data like the above.  To handle this we would
> need something like the following:
> 
>     # Use the 'raw' dialect to get access to all tweakables.
>     writer(fileobj,
>            dialect='raw', quotechar=None, delimiter=',', escapechar='\\')

+1 on escapechar, -1 on 'raw' dialect.

Why would a 'raw' dialect be needed?  It isn't clear to me why
escapechar would be mutually exclusive with any particular dialect.
Further, not specifying a dialect (dialect=None) should be the default,
which would seem to be the same thing as 'raw'.
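
To make the escapechar case concrete, here is a minimal sketch using the
csv module as it later shipped in the Python standard library.  That is an
assumption relative to this thread, where the API was still being designed.
It uses quoting=csv.QUOTE_NONE where the proposal above used quotechar=None,
and needs no 'raw' dialect.

```python
import csv
import io

# Write a row whose third field contains the delimiter, with quoting disabled.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=',', quoting=csv.QUOTE_NONE,
                    escapechar='\\', lineterminator='\n')
row = ['field1', 'field2', 'field3, field3', 'field4']
writer.writerow(row)
text = buf.getvalue()

# Read it back with the same settings; the escaped comma stays in the field.
reader = csv.reader(io.StringIO(text), delimiter=',',
                    quoting=csv.QUOTE_NONE, escapechar='\\')
parsed = next(reader)
print(repr(text))
print(parsed)
```

The written line matches Dave's customer data, and reading it back yields
the original four fields.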

> I think that we need some way to handle a potentially different set of
> options on each dialect.

I don't understand how this is different from Skip's suggestion to
use

    reader(fileobj, dialect="excel2000", delimiter='\t')

Or are you suggesting that not all options would be available on all
dialects?  Can you suggest an example?
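
For illustration, the stdlib csv module that eventually came out of this
work accepts exactly this kind of call: a named dialect plus keyword
overrides.  The sketch below uses the 'excel' dialect, since 'excel2000' is
only a name from this discussion and not something I know to be registered
anywhere.

```python
import csv
import io

# One tab-separated line; the second field contains a comma and the
# third field is quoted because it contains a tab.
data = 'a\tb,c\t"d\te"\r\n'

# Start from the 'excel' defaults but override only the delimiter.
reader = csv.reader(io.StringIO(data), dialect='excel', delimiter='\t')
fields = next(reader)
print(fields)
```

The quoting rules still come from the dialect; only the delimiter changes.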

> When you CSV export from Excel, do you have the ability to use a
> delimiter other than comma?  Do you have the ability to change the
> quotechar?

I think there is an option to save as a TSV file (IIRC), which is the
same as a CSV file but with tabs as the delimiter.

> Should the wrapper protect you from yourself so that when you select
> the Excel dialect you are limited to the options available within
> Excel?

No.  I think this would be unnecessarily limiting.

> Maybe the dialect should not limit you, it should just provide the
> correct defaults.

This is what I'm thinking.

> Since we are going to have one parsing engine in an extension module
> below the Python layer, we are probably going to evolve more tweakable
> settings in the parser over time.  It would be nice if we could hide
> new tweakables from application code by associating default values
> with dialect names in the Python layer.  We should not be exposing the
> low level parser interface to user code if it can be avoided.

+1
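
A rough sketch of what "dialects only supply defaults" could look like in
the Python layer.  Every name here (BASE_OPTIONS, DIALECT_DEFAULTS,
resolve_options) is hypothetical and made up for illustration, as are the
option values; none of it is the proposed API.

```python
# Hypothetical baseline used when no dialect is given (dialect=None).
BASE_OPTIONS = {'delimiter': ',', 'quotechar': '"', 'escapechar': None}

# Hypothetical table mapping dialect names to the defaults they change.
DIALECT_DEFAULTS = {
    'excel': {},
    'excel-tab': {'delimiter': '\t'},
}


def resolve_options(dialect=None, **overrides):
    """Merge baseline, dialect defaults, and caller overrides, in that order."""
    options = dict(BASE_OPTIONS)
    if dialect is not None:
        options.update(DIALECT_DEFAULTS[dialect])
    unknown = set(overrides) - set(options)
    if unknown:
        raise TypeError('unknown options: %s' % ', '.join(sorted(unknown)))
    # Caller overrides always win; the dialect never restricts them.
    options.update(overrides)
    return options


excel_tabs = resolve_options('excel', delimiter='\t')
unquoted = resolve_options(quotechar=None, escapechar='\\')
```

New tweakables in the low-level parser would then only need a new entry in
the baseline table, and existing application code would keep working.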

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308



