[SciPy-user] Record Array: How to add a column?

John Hunter jdh2358 at gmail.com
Tue Oct 14 06:35:06 EDT 2008


On Mon, Oct 13, 2008 at 7:41 PM, Robert Kern <robert.kern at gmail.com> wrote:

> This is somewhat more straightforward:
>
> http://projects.scipy.org/pipermail/numpy-discussion/2007-September/029357.html

I took Robert's suggestion from the link above and added
rec_append_fields to matplotlib.mlab. I think it may have been
called rec_append_field in 0.98.3, but we changed it in svn HEAD to
support adding multiple columns at once.  There are a number of nice
helper functions for record arrays in mlab:

   * rec2txt          : pretty print a record array
   * rec2csv          : store record array in CSV file
   * csv2rec          : import record array from CSV file with type inspection
   * rec_append_fields: add field(s)/array(s) to record array
   * rec_drop_fields  : drop fields from record array
   * rec_join         : join two record arrays on sequence of fields
   * rec_groupby      : summarize data by groups (similar to SQL GROUP BY)
   * rec_summarize    : helper code to filter rec array fields into new fields
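The core trick behind adding (or dropping) a column is to build a new dtype holding the fields you want, allocate a fresh array with it, and copy each field across. Here is a minimal sketch of that idea, not mlab's actual code; `add_columns` and `drop_columns` are hypothetical names, and it assumes Python 3 with a current NumPy:

```python
import numpy as np

def add_columns(rec, names, arrays):
    """Return a copy of `rec` with one or more columns appended (hypothetical helper)."""
    if isinstance(names, str):
        names, arrays = [names], [arrays]
    arrays = [np.asarray(a) for a in arrays]
    # New dtype = every existing field, plus one field per new column.
    dtype = [(n, rec.dtype[n]) for n in rec.dtype.names]
    dtype += [(n, a.dtype) for n, a in zip(names, arrays)]
    out = np.empty(rec.shape, dtype=dtype)
    for n in rec.dtype.names:
        out[n] = rec[n]          # copy the old columns
    for n, a in zip(names, arrays):
        out[n] = a               # fill in the new ones
    return out.view(np.recarray)

def drop_columns(rec, names):
    """Return a copy of `rec` without the named columns (hypothetical helper)."""
    drop = {names} if isinstance(names, str) else set(names)
    keep = [n for n in rec.dtype.names if n not in drop]
    out = np.empty(rec.shape, dtype=[(n, rec.dtype[n]) for n in keep])
    for n in keep:
        out[n] = rec[n]
    return out.view(np.recarray)
```

Both helpers return new arrays and leave the input untouched. Current NumPy also ships numpy.lib.recfunctions.append_fields and drop_fields for the same job; note that append_fields returns a masked array unless you pass usemask=False.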

rec_join is really nice: it supports inner and outer joins, default
fill values for rows missing from one side of an outer join, and
customizable postfixes for column names when the two record arrays
have identically named fields.
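To make those join options concrete, here is a minimal sketch of the behaviour described above, assuming a single key field whose values are unique within each array. `join_on` is a hypothetical helper, not rec_join's implementation; in an outer join, fields for a missing side take the value from `defaults` (or 0 if none is given):

```python
import numpy as np

def join_on(key, r1, r2, jointype='inner', defaults=None,
            r1postfix='1', r2postfix='2'):
    """Join two record arrays on one key field (hypothetical sketch).

    Assumes `key` values are unique within each array.
    """
    defaults = defaults or {}
    # Map each key value to its row index in each input.
    idx1 = {k: i for i, k in enumerate(r1[key])}
    idx2 = {k: i for i, k in enumerate(r2[key])}
    if jointype == 'inner':
        keys = sorted(set(idx1) & set(idx2))
    elif jointype == 'outer':
        keys = sorted(set(idx1) | set(idx2))
    else:
        raise ValueError("jointype must be 'inner' or 'outer'")

    # Non-key fields present in both inputs get a postfix to stay distinct.
    others1 = [n for n in r1.dtype.names if n != key]
    others2 = [n for n in r2.dtype.names if n != key]
    common = set(others1) & set(others2)
    out1 = [(n, n + r1postfix if n in common else n) for n in others1]
    out2 = [(n, n + r2postfix if n in common else n) for n in others2]

    dtype = ([(key, r1.dtype[key])]
             + [(new, r1.dtype[old]) for old, new in out1]
             + [(new, r2.dtype[old]) for old, new in out2])
    out = np.zeros(len(keys), dtype=dtype)
    out[key] = keys
    for row, k in enumerate(keys):
        for src, idx, fields in ((r1, idx1, out1), (r2, idx2, out2)):
            for old, new in fields:
                if k in idx:
                    out[new][row] = src[old][idx[k]]
                else:
                    # Key missing on this side (outer join only): use the default.
                    out[new][row] = defaults.get(new, 0)
    return out.view(np.recarray)
```

The output is sorted by key. NumPy's numpy.lib.recfunctions.join_by offers a similar interface (jointype, r1postfix, r2postfix, defaults) if you would rather not roll your own.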

Here is an example showing many of these functions in action:

"""
Illustrate the rec array utility functions by loading prices from
CSV files, computing the daily returns, appending the results to the
record arrays, and joining them on date
"""
import urllib
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab

# grab the price data off yahoo
u1 = urllib.urlretrieve('http://ichart.finance.yahoo.com/table.csv?s=AAPL&d=9&e=14&f=2008&g=d&a=8&b=7&c=1984&ignore=.csv')
u2 = urllib.urlretrieve('http://ichart.finance.yahoo.com/table.csv?s=GOOG&d=9&e=14&f=2008&g=d&a=8&b=7&c=1984&ignore=.csv')

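# load the CSV files into record arrays; Yahoo lists the newest day
# first, so sort each array by date before computing returns
r1 = mlab.csv2rec(file(u1[0]))
r2 = mlab.csv2rec(file(u2[0]))
r1.sort(order='date')
r2.sort(order='date')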

# compute the daily returns and add these columns to the arrays
gains1 = np.zeros_like(r1.adj_close)
gains2 = np.zeros_like(r2.adj_close)
gains1[1:] = np.diff(r1.adj_close)/r1.adj_close[:-1]
gains2[1:] = np.diff(r2.adj_close)/r2.adj_close[:-1]
r1 = mlab.rec_append_fields(r1, 'gains', gains1)
r2 = mlab.rec_append_fields(r2, 'gains', gains2)

# now inner-join them on date; the default postfixes '1' and '2' turn
# the two 'gains' columns into gains1 and gains2
r = mlab.rec_join('date', r1, r2)

# daily return of going long AAPL and short GOOG
g = r.gains1-r.gains2
tr = (1+g).cumprod()  # the total return

# plot the return
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(r.date, tr)
ax.set_title('total return: long AAPL, short GOOG')
ax.grid()
fig.autofmt_xdate()
plt.show()
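The return arithmetic in the script can be checked offline, without the Yahoo download. A minimal sketch with made-up prices (not real AAPL/GOOG data), listed newest-first on purpose so the sort step matters; `daily_returns` is a hypothetical helper that mirrors the gains computation above, written for Python 3 and a current NumPy:

```python
import numpy as np

def daily_returns(prices):
    """Simple daily returns; the first day has no previous close, so it gets 0."""
    prices = np.asarray(prices, dtype=float)
    gains = np.zeros_like(prices)
    gains[1:] = np.diff(prices) / prices[:-1]
    return gains

# Made-up closing prices, deliberately listed newest day first.
r = np.array([('2008-10-03', 110.0, 396.0),
              ('2008-10-02', 105.0, 400.0),
              ('2008-10-01', 100.0, 400.0)],
             dtype=[('date', 'datetime64[D]'), ('a_close', float), ('b_close', float)])
r.sort(order='date')  # oldest first, so np.diff runs forward in time

g = daily_returns(r['a_close']) - daily_returns(r['b_close'])  # long A, short B
tr = (1 + g).cumprod()  # cumulative total return, starting from 1.0
```

After sorting, A returns 0, 5/100 and 5/105 while B returns 0, 0 and -4/400, so the last value of tr is 1.05 × (1 + 5/105 + 0.01) = 1.1105. Without the sort, np.diff would run backwards in time and every return would be computed against the wrong day.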
