Any suggestions for working w/ CSVs?

Skip Montanaro skip at mojam.com
Fri Oct 27 18:00:00 EDT 2000


    Jen> Is there something already written that deals with CSVs?  In
    Jen> particular, where quotes are used to enclose fields that contain
    Jen> commas.

Jen, et al,

Here's a reference to a locally modified version of Laurence Tratt's CSV.py
module:

    http://www.musi-cal.com/~skip/python/CSV.py

Modifications include properly handling embedded quotation marks.
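
To make "embedded quotation marks" concrete: in the usual CSV convention, a
field containing commas is wrapped in double quotes, and a quote inside such
a field is doubled.  Here is a minimal sketch of that convention.  It assumes
Python 3 and the standard-library csv module rather than CSV.py (whose API
may differ), and the sample row is made up for illustration:

```python
import csv
import io

# Hypothetical sample row: the second field contains a comma, and the
# third contains embedded double quotes (doubled inside the quoted field).
raw = 'Jen,"Smith, Jones & Co.","She said ""hi"" twice"\r\n'

# Parse the row; the enclosing quotes are stripped and doubled quotes
# collapse back to single ones.
rows = list(csv.reader(io.StringIO(raw)))
fields = rows[0]

# Write the parsed fields back out with the default (minimal) quoting.
out = io.StringIO()
csv.writer(out).writerow(fields)
roundtrip = out.getvalue()
```

With this input, fields holds three strings with the comma and the inner
quotes intact, and roundtrip matches raw.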

I use it all the time.  Should be fairly stable.

Here's a simple example of its use (I call it csv2csv.py).  It takes a
list of field names (-f) plus input and output filenames (both CSVs).  It
assumes the input is quoted and comma-separated and dumps the specified
fields to the output file, honoring the -s (output separator) and -n (no
quoting) flags, e.g.:

    csv2csv.py -f "performers date venue city state" master.csv subset.csv

It's not as stable as CSV.py (I was just working on it today!), but should
do reasonable things with reasonable inputs.  The progress module it uses is
at

    http://www.musi-cal.com/~skip/python/progress.py

Cheers,

-- 
Skip Montanaro (skip at mojam.com)
http://www.mojam.com/
http://www.musi-cal.com/

#!/usr/bin/env python

import CSV, sys, string, getopt
import progress

def usage(prog):
    print 'usage: %(prog)s -f "f1 f2 f3 ..." [ -s separator ] [ -n ] infile outfile' % locals()
    print '    -f is required and lists a set of field names to dump'
    print '    -s is optional and specifies an alternate output field separator'
    print '       string (default is a comma)'
    print '    -n is optional and indicates the output fields are not to be quoted.'
    sys.exit(1)

def main():
    usequotes = 1
    separator = ','
    fieldnames = ""
    opts,args = getopt.getopt(sys.argv[1:], "s:nf:")
    for opt,arg in opts:
        if opt == "-f":
            fieldnames = string.split(arg)
        elif opt == "-s":
            separator = arg
        elif opt == "-n":
            usequotes = 0

    if len(args) != 2 or not fieldnames:
        usage(sys.argv[0])

    input = args[0]
    output = args[1]

    csv1 = CSV.CSV()
    csv1.load(input, 1, 0, ",")
    csv2 = CSV.CSV()
    csv2.append(CSV.Entry(fieldnames))
    ticker = progress.Progress()
    for item in csv1:
        newitem = CSV.Entry([""]*len(fieldnames), fieldnames)
        for f in fieldnames:
            try:
                newitem[f] = item[f]
            except ValueError:
                print f,item,newitem
                break
        csv2.append(newitem)
        ticker.tick()
    csv2.save(output, separator, usequotes, usequotes)

if __name__ == "__main__":
    main()
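
For comparison, the same field-subsetting idea can be sketched with Python's
standard-library csv module.  This is a rough Python 3 equivalent under some
assumptions, not a port of the script above: it assumes the first input row
holds the field names, the function name subset_csv is made up for
illustration, and a field missing from a row is written out empty.

```python
import csv

def subset_csv(infile, outfile, fieldnames, separator=",", usequotes=True):
    """Copy only the named columns from infile to outfile."""
    # With usequotes false, fields are quoted only when they need it
    # (for example, when they contain the separator).
    quoting = csv.QUOTE_ALL if usequotes else csv.QUOTE_MINIMAL
    with open(infile, newline="") as src, \
         open(outfile, "w", newline="") as dst:
        reader = csv.DictReader(src)  # first row supplies the field names
        writer = csv.DictWriter(dst, fieldnames=fieldnames,
                                delimiter=separator, quoting=quoting)
        writer.writeheader()
        for row in reader:
            # Keep only the requested fields, in the requested order.
            writer.writerow({name: row.get(name, "") for name in fieldnames})
```

Something like subset_csv("master.csv", "subset.csv", "performers date venue
city state".split()) would mirror the command line shown earlier.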



