[Csv] Status

Thu Jan 30 04:12:54 CET 2003

>I'd like to get the wording in the PEP to converge on our current thoughts
>and announce it on c.l.py and python-dev sometime tomorrow.  I think we will
>get a lot of feedback from both camps, hopefully some of it useful. ;-)
>
>Sound like a plan?

Yep, pending an ACK from the others.

>I just finished making a pass through the messages I hadn't deleted (and
>then saved them to a csv mbox file since the list appears to still not be
>archiving).  Here's what I think we've concluded:

I have all the messages archived, which I can forward to you in a
convenient form for feeding to mailman.

>    * Dialects are a set of defaults, probably implemented as classes (which
>      allows subclassing, whereas dicts wouldn't) and the default dialect
>      named as something like csv.dialects.excel or "excel" if we allow
>      string specifiers.  (I think strings work well at the API, simply
>      because they are shorter and can more easily be presented in GUI
>      tools.)

I think you are right - we need strings as well, and a way to list
them. But exposing the "dialects are classes" to the user of the module
is valuable.

I'd vote +1 on giving the class a "name" attribute, and the dialects
should probably share a common null root class (say "dialect") - the
"list_dialects()" function could then walk the csv.dialects namespace
returning the names of any classes found that are subclasses of dialect?
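
Roughly what I have in mind, as a sketch only. The names here (Dialect,
Excel, ExcelTab, and the SimpleNamespace standing in for csv.dialects) are
made up for illustration; nothing about them is settled:

```python
import inspect
import types


class Dialect:
    """Hypothetical empty root class; concrete dialects subclass it."""
    name = None


class Excel(Dialect):
    name = "excel"
    delimiter = ","
    quote_char = '"'


class ExcelTab(Excel):
    # Subclassing overrides only what differs from the parent dialect.
    name = "excel-tab"
    delimiter = "\t"


# Stand-in for the proposed csv.dialects namespace (an assumption).
dialects = types.SimpleNamespace(Dialect=Dialect, Excel=Excel, ExcelTab=ExcelTab)


def list_dialects(namespace=dialects):
    """Return the names of all Dialect subclasses found in the namespace."""
    found = []
    for obj in vars(namespace).values():
        if inspect.isclass(obj) and issubclass(obj, Dialect) and obj is not Dialect:
            found.append(obj.name)
    return sorted(found)
```

The string names fall out of the classes for free, so a GUI tool could
present list_dialects() while code that wants to subclass still can.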

>    * These individual parameters are necessary (hopefully the names will be
>      enough clue as to their meaning): quote_char, quoting ("auto",
>      "always", "nonnumeric", "never"), delimiter, line_terminator,
>      skip_whitespace, escape_char, hard_return.  Are there others?

Not that I can think of at the moment. As other dialects appear, we may
want to add new parameters anyway.

>    * We're still undecided about None (I certainly don't think it's a valid
>      value to be writing to CSV files)

I suspect we're in violent agreement? If the user happens to pass None,
it should be written as a null field. On input, a null field should be
returned as a zero length string. Is that what you were suggesting?
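
To make that concrete, a toy sketch. The helpers write_row and read_row
are hypothetical and ignore quoting and escaping entirely; they only show
the None-to-null-field and null-field-to-empty-string behaviour:

```python
def write_row(fields, delimiter=","):
    """Join fields into one line; None is written as an empty (null) field."""
    return delimiter.join("" if f is None else str(f) for f in fields)


def read_row(line, delimiter=","):
    """Split a line into fields; a null field comes back as ''."""
    return line.split(delimiter)
```

So write_row(["a", None, 3]) gives "a,,3", and reading that line back
gives ["a", "", "3"]: the None does not survive the round trip, and every
value comes back as a string.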

>    * Don't raise exceptions needlessly.  For example, specifying
>      quoting="never" and not specifying a value for escape_char would be
>      okay until you encounter a field when writing which contains the
>      delimiter.

I don't like this specific one. Because it depends on the data, the
module user may not pick up their error during testing. Better to raise
an exception immediately if we know the format is invalid.

This is an argument I have over and over - I believe it's nearly always
better to push errors back towards their source. In spite of how it
sounds, this isn't really at odds with "be liberal in what you accept,
be strict in what you generate".
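
As a sketch of what I mean by failing early: the check below runs when the
writer is set up, before any data is seen. check_format is a hypothetical
helper, and ValueError is only a stand-in for whatever exception we end up
with:

```python
VALID_QUOTING = ("auto", "always", "nonnumeric", "never")


def check_format(quoting="auto", escape_char=None):
    """Reject parameter combinations that cannot produce valid output."""
    if quoting not in VALID_QUOTING:
        raise ValueError("unknown quoting style: %r" % (quoting,))
    if quoting == "never" and escape_char is None:
        # Without quoting or escaping, a field containing the delimiter
        # could not be written, so complain now rather than on that field.
        raise ValueError("quoting='never' requires an escape_char")
```

The error then shows up on the first test run, whatever the test data
happens to contain.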

>    * Files have to be opened in binary mode (we can check the mode
>      attribute I believe) so we can do the right thing with line
>      terminators.

We need to be a little careful when using uncommon interfaces on the
file class, because file-like classes may not have implemented them
(for example, StringIO doesn't have the mode attribute).
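
Something defensive like this would cope with file-like objects that lack
the attribute. looks_binary is a hypothetical helper name, and returning
None for "can't tell" is just one possible policy:

```python
def looks_binary(fileobj):
    """Return True/False if the mode is known, None if it cannot be determined."""
    mode = getattr(fileobj, "mode", None)
    if not isinstance(mode, str):
        # No mode attribute (or a non-string one): we cannot tell.
        return None
    return "b" in mode
```

The caller can then refuse a file that is definitely in text mode and give
anything returning None the benefit of the doubt.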

>    * Data values should always be returned as strings, even if they are
>      valid numbers.  Let the application do data conversion.

Yes. +1

>Other stuff we haven't talked about much:
>
>    * Unicode.  I think we punt on this for now and just pretend that
>      passing codecs.open(csvfile, mode, encoding) is sufficient.  I'm sure
>      Martin von Löwis will let us know if it isn't. ;-) Dave said, "The low
>      level parser (C code) is probably going to need to handle unicode."
>      Let's wait and see how well codecs.open() works for us.

I'm almost 100% certain the C code will need work. But it should be the
sort of work that can be done without disturbing the interface too much?

>    * We know we need tests but haven't talked much about them.  I vote for
>      PyUnit as much as possible, though a certain amount of manual testing
>      using existing spreadsheets and databases will be required.

This is the big one - tests are absolutely essential. 

I put a bit of effort into coming up with a bunch of "this is how Excel
does it with this unusual case" tests for our csv module - we can use
this as a start.

I haven't investigated how the official Python test harness works - it
predates PyUnit.

>    * Exceptions.  We know we need some.  We should start with CSVError and
>      try to avoid getting carried away with things.  If need be, we can add
>      a code field to the class.  I don't like the idea of having 17
>      different subclasses of CSVError though.  It's too much complexity for
>      most users.

Agreed.
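
A single class with an optional code might look something like this (a
sketch only; the code attribute and its values are hypothetical):

```python
class CSVError(Exception):
    """Single exception class for the module, with an optional code."""

    def __init__(self, message, code=None):
        super().__init__(message)
        # Optional discriminator, instead of many CSVError subclasses.
        self.code = code
```

Most users would just catch CSVError and print it; only those who care
need to look at the code.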

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/