[Pandas-dev] Mean, stdev, and var for array elements

Tyler Hardin th020394 at gmail.com
Sat Oct 7 02:04:12 EDT 2017


That was a bad example. Better:

import pandas as pd

a = pd.Series([1, 2, 3, 4]) * 1.
b = pd.Series([1, 2, 3, 4]) * 2.
c = pd.Series([1, 2, 3, 4]) * 3.
d = pd.Series([1, 2, 3, 4]) * 4.

df = pd.DataFrame({
    'date' : ['20170103'] * 4,
    'stock' : ['AAPL', 'GOOG', 'MSFT', 'TSLA'],
    'pnl_curve' : [a,b,c,d]
})

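def proc_grp(grp):
    # Summing an object column of Series adds them element-wise,
    # giving one combined curve per group.
    return pd.DataFrame({'pnl_curve' : grp.pnl_curve.sum()})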

print(df.groupby('date').apply(proc_grp))


Output:

            pnl_curve
date
20170103 0       10.0
         1       20.0
         2       30.0
         3       40.0

The goal is to reduce away one dimension (here, stock) while keeping the
curves intact, so each date ends up with a single aggregated curve.
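
One possible workaround, as a rough sketch rather than a proposed change to
pandas: line the per-stock curves up as columns of a temporary frame with
pd.concat and reduce along axis=1. The helper name curve_stats and the output
column names are made up for illustration, and the sketch assumes all curves
in a group share the same index.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4]) * 1.
b = pd.Series([1, 2, 3, 4]) * 2.
c = pd.Series([1, 2, 3, 4]) * 3.
d = pd.Series([1, 2, 3, 4]) * 4.

df = pd.DataFrame({
    'date': ['20170103'] * 4,
    'stock': ['AAPL', 'GOOG', 'MSFT', 'TSLA'],
    'pnl_curve': [a, b, c, d],
})

def curve_stats(grp):
    # Hypothetical helper: one column per stock, one row per curve point.
    curves = pd.concat(grp['pnl_curve'].tolist(), axis=1,
                       keys=grp['stock'].tolist())
    # Reduce across stocks (axis=1), keeping the curve index.
    return pd.DataFrame({
        'mean': curves.mean(axis=1),
        'std': curves.std(axis=1),   # sample std (ddof=1), the pandas default
        'var': curves.var(axis=1),
    })

# One block of statistics per date, stacked under a (date, point) index.
stats = pd.concat({date: curve_stats(grp) for date, grp in df.groupby('date')})
print(stats)
```

The mean column can then be plotted with bands at mean +/- std. The cost is
building a temporary frame per group, which may or may not be acceptable for
long intraday curves.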

On Sat, Oct 7, 2017 at 1:55 AM, Tyler Hardin <th020394 at gmail.com> wrote:

> Hi,
>
> I'd really like to be able to calculate the mean, stdev, and var across
> Series stored within the cells of a dataframe. It already works as I expect
> it to with sum.
>
> Example:
>
> import pandas as pd
>
> a = pd.Series([1, 2, 3, 4]) * 1.
> b = pd.Series([1, 2, 3, 4]) * 2.
> c = pd.Series([1, 2, 3, 4]) * 3.
> d = pd.Series([1, 2, 3, 4]) * 4.
>
> df = pd.DataFrame({'a' : [a,b,c,d]}, index=[0, 1, 2, 3])
>
> print(df.a.sum())
>
> Output:
>
> 0    10.0
> 1    20.0
> 2    30.0
> 3    40.0
> dtype: float64
>
> This is very useful for embedding a third dimension within a single column
> (because it's only needed there) instead of going full multi-index.
>
> For example, say you have a dataframe indexed on (date, stock) and in the
> dataframe you have columns for close pnl, close gmv, etc. Further, say you
> have a pnl_curve column, a minute-indexed (intraday) timeseries (again,
> unique per date, stock). As in, each (date, stock) has an associated
> intraday pnl curve (pd.Series object) in the column.
>
> From that setup, I want to reduce away the stock dimension. I might want
> to sum the pnl curves (to get overall intraday pnl curves for each date).
> This actually works already (it is as simple as df.pnl_curve.sum()). But I'd
> also like to plot the mean pnl and std bands around that. Neither mean nor
> std works for this.
>
> Can someone implement these functions for series, or help me do it right?
> Or is there a better way?
>
> It seems the implementation for mean is as simple as removing
> _ensure_numeric in core/nanops.py. As for nanvar, I'm really not sure how
> to 1) use numpy functions to calculate what I need and 2) extend the
> function to accept dtype object without making it more likely to give
> cryptic errors when someone accidentally uses it with objects. (E.g. Pandas
> seems to be careful to throw meaningful ValueErrors and TypeErrors when it
> can. Amateurishly loosening restrictions defeats that.)
>
> Regards,
> Tyler
>