[Python-checkins] python/nondist/peps pep-0305.txt,1.7,1.8

Fri, 31 Jan 2003 13:49:35 -0800

Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1:/tmp/cvs-serv9751

Modified Files:
	pep-0305.txt 
Log Message:
various cleanups
expanded Rationale a tad
added Post-History date (announcing it in a moment)
added pointer to sandbox implementation
mentioned implementation in the (massive ;-) Testing section

Index: pep-0305.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0305.txt,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** pep-0305.txt	30 Jan 2003 13:34:29 -0000	1.7
--- pep-0305.txt	31 Jan 2003 21:49:32 -0000	1.8
***************
*** 12,16 ****
  Content-Type: text/x-rst
  Created: 26-Jan-2003
! Post-History: 

--- 12,16 ----
  Content-Type: text/x-rst
  Created: 26-Jan-2003
! Post-History: 31-Jan-2003

***************
*** 25,29 ****
  PEP defines an API for reading and writing CSV files which should make
  it possible for programmers to select a CSV module which meets their
! requirements.

--- 25,30 ----
  PEP defines an API for reading and writing CSV files which should make
  it possible for programmers to select a CSV module which meets their
! requirements.  It is accompanied by a corresponding module which
! implements the API.

***************
*** 47,55 ****
  CSV files:

! - Object Craft's CSV module [1]_

! - Cliff Wells's Python-DSV module [2]_

! - Laurence Tratt's ASV module [3]_

  Each has a different API, making it somewhat difficult for programmers
--- 48,56 ----
  CSV files:

! - Object Craft's CSV module [2]_

! - Cliff Wells' Python-DSV module [3]_

! - Laurence Tratt's ASV module [4]_

  Each has a different API, making it somewhat difficult for programmers
***************
*** 70,73 ****
--- 71,89 ----
  distribution.

+ CSV formats are not well-defined and different implementations have a
+ number of subtle corner cases.  It has been suggested that the "V" in
+ the acronym stands for "Vague" instead of "Values".  Different
+ delimiters and quoting characters are just the start.  Some programs
+ generate whitespace after the delimiter.  Others quote embedded
+ quoting characters by doubling them or prefixing them with an escape
+ character.  The list of weird ways to do things seems nearly endless.
+ 
+ Unfortunately, all this variability and subtlety means it is difficult
+ for programmers to reliably parse CSV files from many sources or
+ generate CSV files designed to be fed to specific external programs
+ without deep knowledge of those sources and programs.  This PEP and
+ the software which accompanies it attempt to make the process less
+ fragile.
+ 

  Module Interface
***************
*** 77,81 ****
  writing.  The basic reading interface is::

!     reader(fileobj [, dialect='excel2000'] [optional keyword args])

  A reader object is an iterable which takes a file-like object opened
--- 93,98 ----
  writing.  The basic reading interface is::

!     obj = reader(fileobj [, dialect='excel2000']
!                  [optional keyword args])

  A reader object is an iterable which takes a file-like object opened
***************
*** 92,102 ****
  The writing interface is similar::

!     writer(fileobj [, dialect='excel2000'], [, fieldnames=list]
!            [optional keyword args])

  A writer object is a wrapper around a file-like object opened for
  writing.  It accepts the same optional keyword parameters as the
  reader constructor.  In addition, it accepts an optional fieldnames
! argument.  This is a list which defines the order of fields in the
  output file.  It allows the write() method to accept mapping objects
  as well as sequence objects.
--- 109,119 ----
  The writing interface is similar::

!     obj = writer(fileobj [, dialect='excel2000'] [, fieldnames=seq]
!                  [optional keyword args])

  A writer object is a wrapper around a file-like object opened for
  writing.  It accepts the same optional keyword parameters as the
  reader constructor.  In addition, it accepts an optional fieldnames
! argument.  This is a sequence that defines the order of fields in the
  output file.  It allows the write() method to accept mapping objects
  as well as sequence objects.
***************
*** 116,119 ****
--- 133,138 ----
          csvwriter.write(row)

+ or arrange for it to be the first row in the iterable being written.
+ 

  Dialects
***************
*** 123,141 ****
  convenient handle on a group of lower level parameters.

! When dialect is a string it identifies one of the dialect which is
  known to the module, otherwise it is processed as a dialect class as
  described below.
!  
  Dialects will generally be named after applications or organizations
  which define specific sets of format constraints.  The initial dialect
! is excel2000, which describes the format constraints of Excel 2000's
! CSV format.  Another possible dialect (used here only as an example)
! might be "gnumeric".

! Dialects are implemented as attribute only classes to enable user to
! construct variant dialects by subclassing.  The excel2000 dialect is
  implemented as follows::

!     class excel2000:
          quotechar = '"'
          delimiter = ','
--- 142,160 ----
  convenient handle on a group of lower level parameters.

! When dialect is a string it identifies one of the dialects which is
  known to the module, otherwise it is processed as a dialect class as
  described below.
! 
  Dialects will generally be named after applications or organizations
  which define specific sets of format constraints.  The initial dialect
! is "excel", which describes the format constraints of Excel 97 and
! Excel 2000 regarding CSV input and output.  Another possible dialect
! (used here only as an example) might be "gnumeric".

! Dialects are implemented as attribute-only classes to enable users to
! construct variant dialects by subclassing.  The "excel" dialect is
  implemented as follows::

!     class excel:
          quotechar = '"'
          delimiter = ','
***************
*** 151,161 ****
          delimiter = '\t'

! Two functions are defined in the API to set and retrieve dialects::

      set_dialect(name, dialect)
      dialect = get_dialect(name)

  The dialect parameter is a class or instance whose attributes are the
! formatting parameters defined in the next section.

--- 170,184 ----
          delimiter = '\t'

! Three functions are defined in the API to set, get and list dialects::

      set_dialect(name, dialect)
      dialect = get_dialect(name)
+     known_dialects = list_dialects()

  The dialect parameter is a class or instance whose attributes are the
! formatting parameters defined in the next section.  The
! list_dialects() function returns all the registered dialect names as
! given in previous set_dialect() calls (both predefined and
! user-defined).

***************
*** 168,209 ****
  for the set_dialect() and get_dialect() module functions.

! - quotechar specifies a one-character string to use as the quoting
    character.  It defaults to '"'.

! - delimiter specifies a one-character string to use as the field
    separator.  It defaults to ','.

! - escapechar specifies a one character string used to escape the
    delimiter when quotechar is set to None.

! - skipinitialspace specifies how to interpret whitespace which
    immediately follows a delimiter.  It defaults to False, which means
!   that whitespace immediate following a delimiter is part of the
    following field.

! - lineterminator specifies the character sequence which should
    terminate rows.

! - quoting controls when quotes should be generated by the
!   writer.

!     "minimal" means only when required, for example, when a field
!     contains either the quotechar or the delimiter

!     "always" means that quotes are always placed around fields.

!     "nonnumeric" means that quotes are always placed around fields
!     which contain characters other than [+-0-9.].

! ... XXX More to come XXX ...

  When processing a dialect setting and one or more of the other
  optional parameters, the dialect parameter is processed first, then
  the others are processed.  This makes it easy to choose a dialect,
! then override one or more of the settings.  For example, if a CSV file
! was generated by Excel 2000 using single quotes as the quote
! character and TAB as the delimiter, you could create a reader like::

!     csvreader = csv.reader(file("some.csv"), dialect="excel2000",
                             quotechar="'", delimiter='\t')

--- 191,235 ----
  for the set_dialect() and get_dialect() module functions.

! - ``quotechar`` specifies a one-character string to use as the quoting
    character.  It defaults to '"'.

! - ``delimiter`` specifies a one-character string to use as the field
    separator.  It defaults to ','.

! - ``escapechar`` specifies a one character string used to escape the
    delimiter when quotechar is set to None.

! - ``skipinitialspace`` specifies how to interpret whitespace which
    immediately follows a delimiter.  It defaults to False, which means
!   that whitespace immediately following a delimiter is part of the
    following field.

! - ``lineterminator`` specifies the character sequence which should
    terminate rows.

! - ``quoting`` controls when quotes should be generated by the
!   writer.  It can take on any of the following module constants::

!     csv.QUOTE_MINIMAL means only when required, for example, when a
!     field contains either the quotechar or the delimiter

!     csv.QUOTE_ALL means that quotes are always placed around fields.

!     csv.QUOTE_NONNUMERIC means that quotes are always placed around
!     fields which contain characters other than [+-0-9.].

! - ``doublequote`` (tbd)
! 
! - are there more to come?

  When processing a dialect setting and one or more of the other
  optional parameters, the dialect parameter is processed first, then
  the others are processed.  This makes it easy to choose a dialect,
! then override one or more of the settings without defining a new
! dialect class.  For example, if a CSV file was generated by Excel 2000
! using single quotes as the quote character and TAB as the delimiter,
! you could create a reader like::

!     csvreader = csv.reader(file("some.csv"), dialect="excel",
                             quotechar="'", delimiter='\t')

***************
*** 212,219 ****

  Testing
  =======

! TBD.

--- 238,253 ----

+ Implementation
+ ==============
+ 
+ There is a sample implementation available.  [1]_ The goal is for it
+ to efficiently implement the API described in the PEP.  It is heavily
+ based on the Object Craft csv module. [2]_
+ 
+ 
  Testing
  =======

! The sample implementation [1]_ includes a set of test cases.

***************
*** 284,294 ****
  ==========

! .. [1] csv module, Object Craft
!    (http://www.object-craft.com.au/projects/csv) 

! .. [2] Python-DSV module, Wells
!    (http://sourceforge.net/projects/python-dsv/) 

! .. [3] ASV module, Tratt
     (http://tratt.net/laurie/python/asv/)

--- 318,331 ----
  ==========

! .. [1] csv module, Python Sandbox
!    (http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/csv/)

! .. [2] csv module, Object Craft
!    (http://www.object-craft.com.au/projects/csv)

! .. [3] Python-DSV module, Wells
!    (http://sourceforge.net/projects/python-dsv/)
! 
! .. [4] ASV module, Tratt
     (http://tratt.net/laurie/python/asv/)
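
For readers who want to try the dialect mechanism described in this revision, here is a minimal sketch. It assumes the csv module as later released in the Python standard library, not the sandbox code at revision 1.8. In the released module the draft's set_dialect() is spelled register_dialect(), and rows are written with writerow()/writerows() rather than write(). The dialect name "excel-sq-tab" and the class ExcelSingleQuoteTab are made up for illustration.

```python
import csv
import io


# Hypothetical variant dialect, built by subclassing the "excel" dialect
# as the Dialects section describes.
class ExcelSingleQuoteTab(csv.excel):
    quotechar = "'"
    delimiter = "\t"


# register_dialect() is the released module's counterpart of the
# draft's set_dialect(name, dialect).
csv.register_dialect("excel-sq-tab", ExcelSingleQuoteTab)

rows = [["name", "note"], ["Ann", "it's fine"], ["Bob", "tab\there"]]

# Write the rows using the registered dialect name.
buf = io.StringIO(newline="")
csv.writer(buf, dialect="excel-sq-tab").writerows(rows)

# Read them back by choosing "excel" and overriding two settings with
# keyword arguments, without defining a new dialect class.
buf.seek(0)
parsed = list(csv.reader(buf, dialect="excel", quotechar="'", delimiter="\t"))

# QUOTE_ALL places quotes around every field, including numbers.
quoted = io.StringIO(newline="")
csv.writer(quoted, quoting=csv.QUOTE_ALL).writerow(["a", 1])
```

Both routes describe the same format here. Subclassing is probably the better fit when the variant is reused in several places, while keyword overrides suit a one-off file.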