csv module strangeness.

John Machin sjmachin at lexicon.net
Wed Aug 30 18:28:36 EDT 2006


tobiah wrote:

>
> The docs clearly state what the defaults are, but they are not
> in the code.  It seems so clumsy to have to specify every one
> of these, just to change the delimiter from comma to tab.
>

That particular case is handled by the built-in (but cunningly
concealed) 'excel-tab' dialect:
>>> import csv
>>> csv.list_dialects()
['excel-tab', 'excel']
>>> td = csv.get_dialect('excel-tab')
>>> dir(td)
['__doc__', '__init__', '__module__', '_name', '_valid', '_validate',
'delimiter', 'doublequote', 'escapechar', 'lineterminator',
'quotechar', 'quoting', 'skipinitialspace']
>>> td.delimiter
'\t'
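
To actually use the dialect, its registered name can be passed as the
dialect argument of csv.reader or csv.writer. A minimal sketch, assuming
Python 3 and an in-memory file; the sample data is made up for
illustration:

```python
import csv
import io

# Made-up tab-separated sample; any file-like object yielding lines would do.
sample = io.StringIO("name\tcity\nAlice\tNew York\n", newline="")

# Passing the registered dialect name selects tab as the delimiter.
rows = list(csv.reader(sample, dialect="excel-tab"))
print(rows)
```

Each row comes back as a list of strings, split on tabs rather than
commas.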

However, more generally, the docs also clearly state that "In addition
to, or instead of, the dialect parameter, the programmer can also
specify individual formatting parameters, which have the same names as
the attributes defined below for the Dialect class."
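
A small sketch of what that sentence means, assuming Python 3 and
in-memory files (the field values are made up): a single keyword
argument changes one setting, and a keyword can also be combined with a
named dialect.

```python
import csv
import io

# Instead of a dialect: override only the delimiter; everything else
# keeps the default 'excel' settings.
out1 = io.StringIO(newline="")
csv.writer(out1, delimiter="\t").writerow(["a", "b c", "d,e"])
tab_only = out1.getvalue()

# In addition to a dialect: start from 'excel-tab' and also change quoting.
out2 = io.StringIO(newline="")
csv.writer(out2, dialect="excel-tab", quoting=csv.QUOTE_ALL).writerow(["a", "b"])
tab_quoted = out2.getvalue()

print(repr(tab_only))
print(repr(tab_quoted))
```

Neither call needs every attribute spelled out; anything not mentioned
falls back to the dialect's value.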

In practice, using a Dialect class would be a rather rare occurrence.

E.g. here's the guts of the solution to the "fix a csv file by
rsplitting one column" problem, using the "quoting" attribute on the
assumption that the solution really needs those usually redundant
quotes:

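import sys, csv

def fix(inf, outf, fixcol):
    # Quote every output field, including ones that don't strictly need it.
    wtr = csv.writer(outf, quoting=csv.QUOTE_ALL)
    for fields in csv.reader(inf):
        # Split column fixcol at its last run of whitespace and splice
        # the resulting pieces back in its place.
        fields[fixcol:fixcol+1] = fields[fixcol].rsplit(None, 1)
        wtr.writerow(fields)

if __name__ == "__main__":
    # Usage: script infile outfile column_index
    av = sys.argv
    # Binary mode is what the Python 2 csv module expects.
    fix(open(av[1], 'rb'), open(av[2], 'wb'), int(av[3]))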
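
To see what the function does without touching the file system, here is
a hedged sketch that runs the same logic on in-memory data. It assumes
Python 3 (where csv wants text-mode files opened with newline='' rather
than binary mode), and the two sample rows are invented for
illustration:

```python
import csv
import io

def fix(inf, outf, fixcol):
    # Same logic as the function above, repeated so this snippet runs alone.
    wtr = csv.writer(outf, quoting=csv.QUOTE_ALL)
    for fields in csv.reader(inf):
        fields[fixcol:fixcol+1] = fields[fixcol].rsplit(None, 1)
        wtr.writerow(fields)

# Invented input: column 1 holds text whose last word should become
# a column of its own.
inf = io.StringIO("1,John Smith,x\r\n2,Mary Ann Jones,y\r\n", newline="")
outf = io.StringIO(newline="")

fix(inf, outf, 1)
result = outf.getvalue()
print(result)
```

With this input, "John Smith" becomes the two fields "John" and "Smith",
and "Mary Ann Jones" becomes "Mary Ann" and "Jones", since rsplit with a
maximum of one split only breaks at the last run of whitespace.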

HTH,
John



