[SciPy-User] pandas data frame building by outer join

Wes McKinney wesmckinn at gmail.com
Thu Dec 23 11:27:10 EST 2010


On Thu, Dec 23, 2010 at 10:43 AM, Fabrizio Pollastri
<f.pollastri at inrim.it> wrote:
> Hello,
>
> A pandas question:
> Is it possible to build a data frame from several time series, starting with an
> empty data frame, reading the time series one at a time from a file, and
> joining each to the data frame in outer mode? How can I control the column name
> of each added time series?
>
> Here is a code example; it does not work because join expects two data frames.
>
> from pandas import *
> from pandas.io.parsers import parseCSV
> import sys
>
> global_df = DataFrame()
>
> for fname in sys.argv[1:]:
>    current_time_series = parseCSV(fname)['col_of_interest']
>    global_df = global_df.join(current_time_series,how='outer')
>
>
> TIA,
> Fabrizio Pollastri
>

A preferable approach (faster and simpler) would be to create a dict
of time series and pass that to the DataFrame constructor, e.g.

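import sys
from pandas import DataFrame
from pandas.io.parsers import parseCSV

data_dict = {}  # file name -> time series read from that file
for fname in sys.argv[1:]:
    data_dict[fname] = parseCSV(fname)['col_of_interest']
df = DataFrame(data_dict)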

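The keys of the dict become the column names, and the constructor aligns
the series on the union of their indexes, which gives the outer-join result.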
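
For illustration, here is a minimal sketch of the dict approach using a
current pandas (imported as pd, which is an assumption; the snippet above
uses the 2010 API). The two series, their dates, and the "a.csv"/"b.csv"
keys are made up and stand in for the data read from the files.

```python
import pandas as pd

# Hypothetical series with partly overlapping dates; in the real script
# these would be the 'col_of_interest' columns read from each file.
s1 = pd.Series([1.0, 2.0], index=pd.to_datetime(["2010-12-01", "2010-12-02"]))
s2 = pd.Series([10.0, 20.0], index=pd.to_datetime(["2010-12-02", "2010-12-03"]))

# The dict keys become the column names. Dates present in only one
# series are kept, and the missing entries are filled with NaN.
df = pd.DataFrame({"a.csv": s1, "b.csv": s2})
print(df)
```

The frame should have one row per distinct date across both series, so no
data is dropped when the indexes only partly overlap.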

hth,
Wes


