[SciPy-User] Is there a better way to read a CSV file and store for processing?

mdekauwe mdekauwe at gmail.com
Thu Mar 8 22:31:10 EST 2012


Hi,

So I was wondering if there might be a "better" way to go about reading a
CSV file and storing it for later post-processing. What I have written does
the job fine, but I suspect it duplicates some steps to work around things I
don't know how to do. For example, ideally I would like to read the CSV file
into a numpy array whose columns can be accessed by variable name, but I
couldn't work out how. Any thoughts welcome.

Thanks...

CSV file looks a bit like this:

Year,Day of the year,NPP, etc...
--,--,some units, etc...
YEAR,DOY,NPP, etc...
1996.0,1.0,10.09, etc...
etc
etc

Code...

#!/usr/bin/env python

"""
Example of reading CSV file and some simple processing...

    1. Read CSV file into a python dictionary/list
    2. Save the data to a pickle object, to speed up reading back in 
    3. Read the object back in to test everything is fine
    4. Get the timeseries of one of the variables, print it and plot it...
"""
__author__ = "Martin De Kauwe"
__version__ = "1.0 (09.03.2012)"
__email__ = "mdekauwe at gmail.com"

import numpy as np
import sys
import glob
import csv
import cPickle as pickle

def main():
    for fname in glob.glob("*.csv"): 
        data = read_csv_file(fname, head_length=3, delim=",")
        
        # save the data to the hard disk for quick access later
        pkl_fname = "test_model_data.pkl"
        save_dictionary(data, pkl_fname)
        
        # read the data back in to check it worked...
        # the with-block closes the file once the data has been loaded
        with open(pkl_fname, 'rb') as f:
            data = pickle.load(f)
        
        npp = get_var(data, "NPP")
        for value in npp:
            print value
            
        import matplotlib.pyplot as plt
        plt.plot(npp, "ro-")
        plt.show()


def read_csv_file(fname, head_length=1, delim=","):
    """ read the csv file into a list of dictionaries, one per data row """
    with open(fname, "rb") as f:
        # the column keys sit on the last header line, so skip the lines
        # above it
        f = find_header_keys(f, line_with_keys=head_length - 1)

        # DictReader takes its keys from the first line it reads
        reader = csv.DictReader(f, delimiter=delim)
        data = [row for row in reader]

    return data
    
def find_header_keys(fp, line_with_keys=0):
    """ In case the csv file doesn't have the header keys on the first line,
    advance the file pointer to the line we want """
    # skip the lines that come before the line holding the keys
    for i in xrange(line_with_keys):
        next(fp)
    return fp
    
def save_dictionary(data, outfname):
    """ save dictionary to disk, i.e. pickle it """
    out_dict = open(outfname, 'wb')
    pickle.dump(data, out_dict, pickle.HIGHEST_PROTOCOL)
    out_dict.close()    
    
def get_var(data, var):
    """ return the entire time series for a given variable as floats """
    # csv.DictReader yields every value as a string, so convert explicitly
    return np.asarray([float(row[var]) for row in data])
    

if __name__ == "__main__":
    
    main()
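
For the "numpy array one could access by variable names" part, one possible route is a structured array built with np.genfromtxt. The following is only a minimal sketch, written for Python 3 and a recent NumPy rather than the Python 2 used above. The sample data, the function name read_csv_as_array and the .npy file name are made up for illustration, and it assumes every column is numeric.

```python
import io
import os
import tempfile

import numpy as np

# Hypothetical sample mirroring the layout described above: two descriptive
# header lines, then the line holding the column keys, then the data rows.
SAMPLE = u"""Year,Day of the year,NPP
--,--,some units
YEAR,DOY,NPP
1996.0,1.0,10.09
1996.0,2.0,11.25
1996.0,3.0,9.80
"""


def read_csv_as_array(fobj, head_length=3, delim=","):
    """Read a CSV file (name or open file object) into a structured array."""
    # skip_header drops the lines above the key line; names=True then takes
    # the field names from the first line that remains.
    return np.genfromtxt(fobj, delimiter=delim, skip_header=head_length - 1,
                         names=True, dtype=float)


data = read_csv_as_array(io.StringIO(SAMPLE))

# Columns are now addressable by name, with no dictionary lookups per row.
npp = data["NPP"]
print(data.dtype.names)
print(npp)

# Round trip through NumPy's own binary format as an alternative to pickle.
npy_fname = os.path.join(tempfile.mkdtemp(), "test_model_data.npy")
np.save(npy_fname, data)
restored = np.load(npy_fname)
```

Passing a file name instead of the StringIO object should work the same way. If some columns hold text, dtype=None lets genfromtxt guess a type per column instead of forcing floats.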

