[Python-checkins] python/nondist/peps pep-0305.txt,1.7,1.8
montanaro@users.sourceforge.net
Fri, 31 Jan 2003 13:49:35 -0800
Update of /cvsroot/python/python/nondist/peps
In directory sc8-pr-cvs1:/tmp/cvs-serv9751
Modified Files:
pep-0305.txt
Log Message:
various cleanups
expanded Rationale a tad
added Post-History date (announcing it in a moment)
added pointer to sandbox implementation
mentioned implementation in the (massive ;-) Testing section
Index: pep-0305.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0305.txt,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** pep-0305.txt 30 Jan 2003 13:34:29 -0000 1.7
--- pep-0305.txt 31 Jan 2003 21:49:32 -0000 1.8
***************
*** 12,16 ****
Content-Type: text/x-rst
Created: 26-Jan-2003
! Post-History:
--- 12,16 ----
Content-Type: text/x-rst
Created: 26-Jan-2003
! Post-History: 31-Jan-2003
***************
*** 25,29 ****
PEP defines an API for reading and writing CSV files which should make
it possible for programmers to select a CSV module which meets their
! requirements.
--- 25,30 ----
PEP defines an API for reading and writing CSV files which should make
it possible for programmers to select a CSV module which meets their
! requirements. It is accompanied by a corresponding module which
! implements the API.
***************
*** 47,55 ****
CSV files:
! - Object Craft's CSV module [1]_
! - Cliff Wells's Python-DSV module [2]_
! - Laurence Tratt's ASV module [3]_
Each has a different API, making it somewhat difficult for programmers
--- 48,56 ----
CSV files:
! - Object Craft's CSV module [2]_
! - Cliff Wells' Python-DSV module [3]_
! - Laurence Tratt's ASV module [4]_
Each has a different API, making it somewhat difficult for programmers
***************
*** 70,73 ****
--- 71,89 ----
distribution.
+ CSV formats are not well-defined and different implementations have a
+ number of subtle corner cases. It has been suggested that the "V" in
+ the acronym stands for "Vague" instead of "Values". Different
+ delimiters and quoting characters are just the start. Some programs
+ generate whitespace after the delimiter. Others quote embedded
+ quoting characters by doubling them or prefixing them with an escape
+ character. The list of weird ways to do things seems nearly endless.
+
+ Unfortunately, all this variability and subtlety means it is difficult
+ for programmers to reliably parse CSV files from many sources or
+ generate CSV files designed to be fed to specific external programs
+ without deep knowledge of those sources and programs. This PEP and
+ the software which accompanies it attempt to make the process less
+ fragile.
+
Module Interface
***************
*** 77,81 ****
writing. The basic reading interface is::
! reader(fileobj [, dialect='excel2000'] [optional keyword args])
A reader object is an iterable which takes a file-like object opened
--- 93,98 ----
writing. The basic reading interface is::
! obj = reader(fileobj [, dialect='excel2000']
! [optional keyword args])
A reader object is an iterable which takes a file-like object opened
***************
*** 92,102 ****
The writing interface is similar::
! writer(fileobj [, dialect='excel2000'], [, fieldnames=list]
! [optional keyword args])
A writer object is a wrapper around a file-like object opened for
writing. It accepts the same optional keyword parameters as the
reader constructor. In addition, it accepts an optional fieldnames
! argument. This is a list which defines the order of fields in the
output file. It allows the write() method to accept mapping objects
as well as sequence objects.
--- 109,119 ----
The writing interface is similar::
! obj = writer(fileobj [, dialect='excel2000'] [, fieldnames=seq]
! [optional keyword args])
A writer object is a wrapper around a file-like object opened for
writing. It accepts the same optional keyword parameters as the
reader constructor. In addition, it accepts an optional fieldnames
! argument. This is a sequence that defines the order of fields in the
output file. It allows the write() method to accept mapping objects
as well as sequence objects.
***************
*** 116,119 ****
--- 133,138 ----
csvwriter.write(row)
+ or arrange for it to be the first row in the iterable being written.
+
Dialects
***************
*** 123,141 ****
convenient handle on a group of lower level parameters.
! When dialect is a string it identifies one of the dialect which is
known to the module, otherwise it is processed as a dialect class as
described below.
!
Dialects will generally be named after applications or organizations
which define specific sets of format constraints. The initial dialect
! is excel2000, which describes the format constraints of Excel 2000's
! CSV format. Another possible dialect (used here only as an example)
! might be "gnumeric".
! Dialects are implemented as attribute only classes to enable user to
! construct variant dialects by subclassing. The excel2000 dialect is
implemented as follows::
! class excel2000:
quotechar = '"'
delimiter = ','
--- 142,160 ----
convenient handle on a group of lower level parameters.
! When dialect is a string it identifies one of the dialects which are
known to the module, otherwise it is processed as a dialect class as
described below.
!
Dialects will generally be named after applications or organizations
which define specific sets of format constraints. The initial dialect
! is "excel", which describes the format constraints of Excel 97 and
! Excel 2000 regarding CSV input and output. Another possible dialect
! (used here only as an example) might be "gnumeric".
! Dialects are implemented as attribute-only classes to enable users to
! construct variant dialects by subclassing. The "excel" dialect is
implemented as follows::
! class excel:
quotechar = '"'
delimiter = ','
***************
*** 151,161 ****
delimiter = '\t'
! Two functions are defined in the API to set and retrieve dialects::
set_dialect(name, dialect)
dialect = get_dialect(name)
The dialect parameter is a class or instance whose attributes are the
! formatting parameters defined in the next section.
--- 170,184 ----
delimiter = '\t'
! Three functions are defined in the API to set, get and list dialects::
set_dialect(name, dialect)
dialect = get_dialect(name)
+ known_dialects = list_dialects()
The dialect parameter is a class or instance whose attributes are the
! formatting parameters defined in the next section. The
! list_dialects() function returns all the registered dialect names as
! given in previous set_dialect() calls (both predefined and
! user-defined).
***************
*** 168,209 ****
for the set_dialect() and get_dialect() module functions.
! - quotechar specifies a one-character string to use as the quoting
character. It defaults to '"'.
! - delimiter specifies a one-character string to use as the field
separator. It defaults to ','.
! - escapechar specifies a one character string used to escape the
delimiter when quotechar is set to None.
! - skipinitialspace specifies how to interpret whitespace which
immediately follows a delimiter. It defaults to False, which means
! that whitespace immediate following a delimiter is part of the
following field.
! - lineterminator specifies the character sequence which should
terminate rows.
! - quoting controls when quotes should be generated by the
! writer.
! "minimal" means only when required, for example, when a field
! contains either the quotechar or the delimiter
! "always" means that quotes are always placed around fields.
! "nonnumeric" means that quotes are always placed around fields
! which contain characters other than [+-0-9.].
! ... XXX More to come XXX ...
When processing a dialect setting and one or more of the other
optional parameters, the dialect parameter is processed first, then
the others are processed. This makes it easy to choose a dialect,
! then override one or more of the settings. For example, if a CSV file
! was generated by Excel 2000 using single quotes as the quote
! character and TAB as the delimiter, you could create a reader like::
! csvreader = csv.reader(file("some.csv"), dialect="excel2000",
quotechar="'", delimiter='\t')
--- 191,235 ----
for the set_dialect() and get_dialect() module functions.
! - ``quotechar`` specifies a one-character string to use as the quoting
character. It defaults to '"'.
! - ``delimiter`` specifies a one-character string to use as the field
separator. It defaults to ','.
! - ``escapechar`` specifies a one-character string used to escape the
delimiter when quotechar is set to None.
! - ``skipinitialspace`` specifies how to interpret whitespace which
immediately follows a delimiter. It defaults to False, which means
! that whitespace immediately following a delimiter is part of the
following field.
! - ``lineterminator`` specifies the character sequence which should
terminate rows.
! - ``quoting`` controls when quotes should be generated by the
! writer. It can take on any of the following module constants::
! csv.QUOTE_MINIMAL means only when required, for example, when a
! field contains either the quotechar or the delimiter
! csv.QUOTE_ALL means that quotes are always placed around fields.
! csv.QUOTE_NONNUMERIC means that quotes are always placed around
! fields which contain characters other than [+-0-9.].
! - ``doublequote`` (tbd)
!
! - are there more to come?
When processing a dialect setting and one or more of the other
optional parameters, the dialect parameter is processed first, then
the others are processed. This makes it easy to choose a dialect,
! then override one or more of the settings without defining a new
! dialect class. For example, if a CSV file was generated by Excel 2000
! using single quotes as the quote character and TAB as the delimiter,
! you could create a reader like::
! csvreader = csv.reader(file("some.csv"), dialect="excel",
quotechar="'", delimiter='\t')
***************
*** 212,219 ****
Testing
=======
! TBD.
--- 238,253 ----
+ Implementation
+ ==============
+
+ There is a sample implementation available. [1]_ The goal is for it
+ to efficiently implement the API described in the PEP. It is heavily
+ based on the Object Craft csv module. [2]_
+
+
Testing
=======
! The sample implementation [1]_ includes a set of test cases.
***************
*** 284,294 ****
==========
! .. [1] csv module, Object Craft
! (http://www.object-craft.com.au/projects/csv)
! .. [2] Python-DSV module, Wells
! (http://sourceforge.net/projects/python-dsv/)
! .. [3] ASV module, Tratt
(http://tratt.net/laurie/python/asv/)
--- 318,331 ----
==========
! .. [1] csv module, Python Sandbox
! (http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/csv/)
! .. [2] csv module, Object Craft
! (http://www.object-craft.com.au/projects/csv)
! .. [3] Python-DSV module, Wells
! (http://sourceforge.net/projects/python-dsv/)
!
! .. [4] ASV module, Tratt
(http://tratt.net/laurie/python/asv/)
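
Editorial note, not part of the check-in: the dialect mechanism in revision 1.8 can be sketched with the csv module as it later shipped in the standard library. This is a rough illustration under stated assumptions, not the sandbox implementation. The released module names the draft's set_dialect() as register_dialect() and the draft's write() as writerow(). The dialect name "singlequote-tab", the class name SingleQuoteTab and the sample data are made up for the example.

```python
import csv
import io

# A variant dialect built by subclassing, as the Dialects section describes.
# SingleQuoteTab is a hypothetical name chosen for this sketch.
class SingleQuoteTab(csv.excel):
    quotechar = "'"
    delimiter = "\t"

# The PEP draft calls this set_dialect(); the released module calls it
# register_dialect(). The registered name is an assumption for illustration.
csv.register_dialect("singlequote-tab", SingleQuoteTab)

# Made-up sample input: tab-delimited, single-quoted fields.
data = "name\t'free text'\r\nspam\t'eggs\tand ham'\r\n"

# Option 1: select the registered dialect by name.
rows_by_name = list(
    csv.reader(io.StringIO(data, newline=""), dialect="singlequote-tab")
)

# Option 2: start from "excel" and override individual settings with
# keyword arguments, without defining a new dialect class.
rows_by_override = list(
    csv.reader(io.StringIO(data, newline=""), dialect="excel",
               quotechar="'", delimiter="\t")
)

# Writing with a quoting constant: QUOTE_NONNUMERIC quotes the string
# field and leaves the numeric fields bare.
out = io.StringIO(newline="")
writer = csv.writer(out, dialect="excel", quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(["spam", 1, 2.5])
```

Both reader calls should yield the same rows, which is the point of processing the dialect first and the keyword overrides second.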