[Python-checkins] python/nondist/sandbox/csv libcsv.tex,NONE,1.1

Sun, 02 Feb 2003 19:09:30 -0800

Update of /cvsroot/python/python/nondist/sandbox/csv
In directory sc8-pr-cvs1:/tmp/cvs-serv2201

Added Files:
	libcsv.tex 
Log Message:
First cut at a libref section.  I'm certain there are mistakes, but this
gets some ink on the paper.

--- NEW FILE: libcsv.tex ---
\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}

\index{csv}
\indexii{data}{tabular}

The \module{csv} module implements classes to read and write tabular data.
The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases.  While the delimiters and
quoting characters vary, the overall format is similar enough that it is
possible to write a single module which can manipulate such data.

There is no ``CSV standard'', so the format is operationally defined by the
many applications which read and write it.  The lack of a standard means
that the data produced and consumed by different applications can differ in
subtle, hard-to-spot ways.

The \module{csv} module allows programmers to say, ``write this data in the format
preferred by Excel (tm),'' without knowing all the fiddly little details of
the CSV format used by Excel.  Programmers can also easily define their own
CSV formats.

\subsection{Relationship to other Python modules}

The \module{csv} module reads and writes sequences.  It can also read data
and return the rows as dicts.  Sequence types other than lists and tuples
(e.g. \code{array} objects) can be written.  To make it as easy as possible
to interface with modules which implement the DB API, the value \code{None}
is written as the empty string.  While this isn't a reversible
transformation, it makes it easier to dump SQL NULL data values to CSV files
without preprocessing the data returned from a {}\code{cursor.fetch*()} call.
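
The handling of \code{None} and of non-list sequences described above can be
sketched as follows.  This is only an illustration: it assumes the module as
later released in the standard library, where the row-writing method is named
\method{writerow}, and it writes to an in-memory buffer rather than a file.

```python
import csv
import io
from array import array

# Write to an in-memory text buffer so the result can be inspected.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")

# None (e.g. a SQL NULL from a DB API cursor) is written as an empty field.
writer.writerow(("alice", None, 30))

# Sequences other than lists and tuples can be written as rows too.
writer.writerow(array("i", [1, 2, 3]))

output = buf.getvalue()
print(output)
```

Reading this output back yields an empty string, not \code{None}, for the
middle field of the first row, which is why the transformation is not
reversible.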

The \module{csv} module defines the following classes.

\begin{classdesc}{reader}{iterable\optional{, dialect="excel"}
			  \optional{, fmtparam}}
Create a reader object which will iterate over lines in the given
{}\var{iterable}.  An optional \var{dialect} parameter can be given which is
used to define a set of parameters specific to a particular CSV dialect.
The other optional \var{fmtparam} keyword arguments can be given to override
individual formatting parameters in the current dialect.  For details of the
dialect and formatting parameters, see section {}\ref{fmt-params},
``Dialects and Formatting Parameters''.
\end{classdesc}
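
As a minimal sketch of the signature above, the reader below is given a plain
list of strings as its iterable and overrides one formatting parameter of the
\code{"excel"} dialect.  The sample data is made up for illustration, and the
behaviour shown is that of the released standard-library module.

```python
import csv

# Any iterable that yields one line of text per item will do; a file
# object is the usual choice, but a list of strings works as well.
lines = ["name;qty", "widget;3", "gadget;7"]

# Start from the "excel" dialect and override only the delimiter.
rows = list(csv.reader(lines, dialect="excel", delimiter=";"))
print(rows)
```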

\begin{classdesc}{writer}{fileobj\optional{, dialect="excel"}
			  \optional{, fieldnames}
			  \optional{, fmtparam}}

Create a writer object responsible for converting the user's data into
delimited strings on the given file-like object.  An optional \var{dialect}
parameter can be given which is used to define a set of parameters specific
to a particular CSV dialect.  If a sequence of strings is given as the
optional \var{fieldnames} parameter, the writer will use them to properly
order mapping objects passed to the object's \method{write} and
{}\method{writelines} methods.  The other optional \var{fmtparam} keyword
arguments can be given to override individual formatting parameters in the
current dialect.  For details of the dialect and formatting parameters, see
section {}\ref{fmt-params}, ``Dialects and Formatting Parameters''.
\end{classdesc}
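
The use of \var{fieldnames} to order mapping objects can be sketched as
follows.  Note the assumption: in the module as later released, this role is
played by a separate \class{DictWriter} class rather than by a
{}\var{fieldnames} argument to \class{writer}, and that released class is what
the sketch uses.

```python
import csv
import io

buf = io.StringIO()

# fieldnames fixes the column order, regardless of the order of the
# keys in each mapping that is written.
writer = csv.DictWriter(buf, fieldnames=["name", "qty"], lineterminator="\n")
writer.writeheader()
writer.writerow({"qty": 3, "name": "widget"})

output = buf.getvalue()
print(output)
```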

The \module{csv} module defines the following functions.

\begin{funcdesc}{register_dialect}{name, dialect}
Associate \var{dialect} with \var{name}.  \var{dialect} must be a subclass
of \class{csv.Dialect}.  \var{name} must be a string or Unicode object.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}.  A \exception{KeyError} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
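
A possible use of the three functions above, assuming the behaviour of the
released standard-library module.  The class \class{PipeDialect} and the name
{}\code{"pipes"} are hypothetical and exist only for this illustration.

```python
import csv


class PipeDialect(csv.excel):
    """Hypothetical dialect: like "excel", but fields are separated by '|'."""
    delimiter = "|"


# Associate the dialect class with a name.
csv.register_dialect("pipes", PipeDialect)

# The name now appears among the registered dialects and can be looked up.
registered = "pipes" in csv.list_dialects()
dialect = csv.get_dialect("pipes")

# Once registered, the name can be passed wherever a dialect is expected.
rows = list(csv.reader(["a|b|c"], dialect="pipes"))
print(registered, dialect.delimiter, rows)
```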

The \module{csv} module defines the following constants.

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
the current \var{delimiter} or begin with the current \var{quotechar}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric fields.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields.  When the current
{}\var{delimiter} occurs in output data it is preceded by the current
{}\var{escapechar} character.  When \constant{QUOTE_NONE} is in effect, it is
an error not to have a single-character \var{escapechar} defined, even if no
data to be written contains the \var{delimiter} character.
\end{datadesc}
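
The effect of the quoting constants can be compared by writing the same row
under each of them.  This is a sketch only: the helper \function{render} is
hypothetical, and the constant names and output shown are those of the
released standard-library module.

```python
import csv
import io


def render(row, **fmtparams):
    """Hypothetical helper: write one row and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmtparams).writerow(row)
    return buf.getvalue()


row = ["plain", "a,b", 42]

# Quote only the field that contains the delimiter.
minimal = render(row, quoting=csv.QUOTE_MINIMAL)
# Quote every field, including the number.
everything = render(row, quoting=csv.QUOTE_ALL)
# Quote every field that is not a number.
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
# Never quote; the delimiter inside a field is escaped instead.
unquoted = render(row, quoting=csv.QUOTE_NONE, escapechar="\\")

print(minimal, everything, nonnumeric, unquoted, sep="")
```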

\subsection{Dialects and Formatting Parameters\label{fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects.  A
dialect is a subclass of the \class{Dialect} class having a set of specific
methods and a single \method{validate} method.  When creating \class{reader}
or \class{writer} objects, the programmer can specify a string or a subclass
of the \class{Dialect} class as the dialect parameter.  In addition to, or
instead of, the \var{dialect} parameter, the programmer can also specify
individual formatting parameters, described in the following section.
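
The three ways of selecting a format mentioned above (a dialect name, a
{}\class{Dialect} subclass, and individual formatting parameters layered on
top of a dialect) might look like this, assuming the released
standard-library module and made-up sample data.

```python
import csv

line = ["a,b;c"]

# The dialect given as a registered name ...
by_name = list(csv.reader(line, dialect="excel"))
# ... or as a subclass of csv.Dialect.
by_class = list(csv.reader(line, dialect=csv.excel))
# An individual formatting parameter overrides the dialect's value.
overridden = list(csv.reader(line, dialect="excel", delimiter=";"))

print(by_name, by_class, overridden)
```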

\subsubsection{Formatting Parameters} 

Both the \class{reader} and \class{writer} classes take several specific
formatting parameters, specified as keyword parameters.

\begin{description}
\item[quotechar] specifies a one-character string to use as the quoting
  character.  It defaults to \code{"}.

\item[delimiter] specifies a one-character string to use as the field
  separator.  It defaults to \code{,}.

\item[escapechar] specifies a one-character string used to escape the
  delimiter when \var{quoting} is set to \constant{QUOTE_NONE}.

\item[skipinitialspace] specifies how to interpret whitespace which
  immediately follows a delimiter.  It defaults to \code{False}, which means
  that whitespace immediately following a delimiter is part of the
  following field.

\item[lineterminator] specifies the character sequence which should
  terminate rows.

\item[quoting] controls when quotes should be generated by the
  writer.  It can take on any of the following module constants:

\begin{description}
\item[QUOTE_MINIMAL] means only when required, for example, when a
    field contains either the quotechar or the delimiter.

\item[QUOTE_ALL] means that quotes are always placed around all fields.

\item[QUOTE_NONNUMERIC] means that quotes are always placed around
    fields which contain characters other than [+-0-9.].

\item[QUOTE_NONE] means that quotes are never placed around fields.
	Instead, the \var{escapechar} is used to escape any instances of the
	\var{delimiter} which occur in the data.
\end{description}

\item[doublequote] controls the handling of quotes inside fields.  When
  \code{True}, two consecutive quotes are interpreted as one during read, and
  when writing, each quote is written as two quotes.
\end{description}
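
Two of these parameters, \var{skipinitialspace} and \var{doublequote}, are
easiest to understand by example.  The sketch below assumes the defaults of
the released standard-library module and uses made-up data.

```python
import csv
import io

# By default, whitespace after a delimiter belongs to the next field.
kept = next(csv.reader(["a, b, c"]))
# With skipinitialspace=True it is dropped.
stripped = next(csv.reader(["a, b, c"], skipinitialspace=True))

# With doublequote in effect (the default), an embedded quote character
# is written as two quotes inside a quoted field ...
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(['say "hi"'])
written = buf.getvalue()

# ... and two consecutive quotes are read back as one.
read_back = next(csv.reader([written]))

print(kept, stripped, written, read_back)
```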

\subsection{Reader Objects}

\class{reader} objects have the following public methods.

\begin{methoddesc}{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}
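
Fetching rows one at a time, for instance to treat the first row as a header,
can be sketched as follows.  The sketch assumes a current Python, where the
iterator protocol is driven by the built-in \function{next} rather than by
calling the \method{next} method directly.

```python
import csv

reader = csv.reader(["name,qty", "widget,3"])

# Each call returns the next row as a list of strings.
header = next(reader)
first = next(reader)

# When the underlying iterable is exhausted, StopIteration is raised.
try:
    next(reader)
    exhausted = False
except StopIteration:
    exhausted = True

print(header, first, exhausted)
```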

\subsection{Writer Objects}

\class{writer} objects have the following public methods.

\begin{methoddesc}{write}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}{writelines}{rows}
Write each row in \var{rows} to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}{close}{}
Close the underlying file object.
\end{methoddesc}
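
For comparison, here is a sketch of the same operations in the module as
later released.  The method names differ from this draft: single rows are
written with \method{writerow}, several rows with \method{writerows}, and
there is no \method{close} method, so the caller closes the file object
(here via a \keyword{with} statement on an in-memory buffer).

```python
import csv
import io

rows = [["name", "qty"], ["widget", 3], ["gadget", 7]]

# The caller owns the file object and is responsible for closing it.
with io.StringIO() as buf:
    writer = csv.writer(buf)
    writer.writerow(rows[0])    # write a single row
    writer.writerows(rows[1:])  # write all remaining rows
    output = buf.getvalue()

print(repr(output))
```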

\subsection{Examples}

The ``hello world'' of CSV reading is

\begin{verbatim}
    import csv

    # Each row is returned as a list of fields.
    reader = csv.reader(file("some.csv"))
    for row in reader:
        print row
\end{verbatim}

The corresponding simplest possible writing example is

\begin{verbatim}
    import csv

    # someiterable is any iterable which yields rows (sequences of fields).
    writer = csv.writer(file("some.csv", "w"))
    for row in someiterable:
        writer.write(row)
\end{verbatim}

Both the \class{reader} and \class{writer} classes accept a number of
optional arguments which are used to tailor them to the dialect of the input
or output file.