[Pandas-dev] Mean, stdev, and var for array elements
Tyler Hardin
th020394 at gmail.com
Sat Oct 7 02:04:12 EDT 2017
That was a bad example. Better:
import pandas as pd
a = pd.Series([1, 2, 3, 4]) * 1.
b = pd.Series([1, 2, 3, 4]) * 2.
c = pd.Series([1, 2, 3, 4]) * 3.
d = pd.Series([1, 2, 3, 4]) * 4.
df = pd.DataFrame({
    'date' : ['20170103'] * 4,
    'stock' : ['AAPL', 'GOOG', 'MSFT', 'TSLA'],
    'pnl_curve' : [a,b,c,d]
})

def proc_grp(grp):
    # Summing the object column adds the stored Series elementwise.
    return pd.DataFrame({'pnl_curve' : grp.pnl_curve.sum()})

print(df.groupby('date').apply(proc_grp))
Output:

            pnl_curve
date
20170103 0       10.0
         1       20.0
         2       30.0
         3       40.0
The goal is meaningful dimensionality reduction with curves: reduce away one
dimension (here, stock) and get back a single curve per date.
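
For what it's worth, one possible workaround is sketched below. It is only a sketch, not a fix in pandas itself: it assumes every curve within a date shares (or can be aligned on) the same index, and the names curve_stats, wide, and stats are made up for illustration. The idea is to lay the curves of one group side by side with pd.concat and then reduce across columns, where mean, std, and var work as usual.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4]) * 1.
b = pd.Series([1, 2, 3, 4]) * 2.
c = pd.Series([1, 2, 3, 4]) * 3.
d = pd.Series([1, 2, 3, 4]) * 4.

df = pd.DataFrame({
    'date': ['20170103'] * 4,
    'stock': ['AAPL', 'GOOG', 'MSFT', 'TSLA'],
    'pnl_curve': [a, b, c, d],
})

def curve_stats(grp):
    # Hypothetical helper: put the group's curves side by side
    # (one column per stock), aligned on their shared index.
    wide = pd.concat(grp['pnl_curve'].tolist(), axis=1)
    # Reduce across columns, i.e. across stocks, for each point on the curve.
    return pd.DataFrame({
        'sum': wide.sum(axis=1),
        'mean': wide.mean(axis=1),
        'std': wide.std(axis=1),   # pandas default: sample std (ddof=1)
        'var': wide.var(axis=1),   # pandas default: sample var (ddof=1)
    })

# One block of statistics per date, stacked under a (date, minute) index.
stats = pd.concat(
    {date: curve_stats(grp) for date, grp in df.groupby('date')},
    names=['date', 'minute'],
)
print(stats)
```

The 'sum' column matches what df.pnl_curve.sum() gives per date, and 'mean' and 'std' are what I would plot as the mean pnl and the bands around it. If the curves do not share an index, concat aligns them and fills the gaps with NaN, which the reductions then skip by default.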
On Sat, Oct 7, 2017 at 1:55 AM, Tyler Hardin <th020394 at gmail.com> wrote:
> Hi,
>
> I'd really like to be able to calculate the mean, stdev, and var across the
> Series stored within cells of a dataframe. It already works as I expect it
> to with sum.
>
> Example:
>
> import pandas as pd
>
> a = pd.Series([1, 2, 3, 4]) * 1.
> b = pd.Series([1, 2, 3, 4]) * 2.
> c = pd.Series([1, 2, 3, 4]) * 3.
> d = pd.Series([1, 2, 3, 4]) * 4.
>
> df = pd.DataFrame({'a' : [a,b,c,d]}, index=[0, 1, 2, 3])
>
> print(df.a.sum())
>
> Output:
>
> 0 10.0
> 1 20.0
> 2 30.0
> 3 40.0
> dtype: float64
>
> This is very useful for embedding a third dimension within a single column
> (because it's only needed there) instead of going full multi-index.
>
> For example, say you have a dataframe indexed on (date, stock) and in the
> dataframe you have columns for close pnl, close gmv, etc. Further, say you
> have a pnl_curve column, a minute-indexed (intraday) timeseries (again,
> unique per date, stock). As in, each (date, stock) has an associated
> intraday pnl curve (pd.Series object) in the column.
>
> From that setup, I want to reduce away the stock dimension. I might want
> to sum the pnl curves (to get overall intraday pnl curves for each date).
> This actually works already. (As simple as df.pnl_curve.sum()). But I'd
> also like to plot the mean pnl and std bands around that. Neither mean nor
> std works for this.
>
> Can someone implement these functions for series, or help me do it right?
> Or is there a better way?
>
> It seems the implementation for mean is as simple as removing
> _ensure_numeric in core/nanops.py. As for nanvar, I'm really not sure how
> to 1) use numpy functions to calculate what I need and 2) extend the
> function to accept dtype object without making it more likely to give
> cryptic errors when someone accidentally uses it with objects. (E.g. Pandas
> seems to be careful to throw meaningful ValueErrors and TypeErrors when it
> can. Amateurishly loosening restrictions defeats that.)
>
> Regards,
> Tyler
>