[Pandas-dev] [pydata] Feedback request for return value of empty or all-NA sum (0 or NA?)

Stephen Simmons mail at stevesimmons.com
Sun Dec 3 05:34:06 EST 2017


Nat Smith wrote:

> I am baffled by the idea that sum([]) would return NaN.

So am I. Here are two cases that leave me confused about what the intention is.

Case #1 - Summing an empty integer series

Not only does the answer change from 0 to NaN, but the type changes from int to float.

That occurs whether skipna is True or False!

> pd.Series([], dtype=int).sum()
nan
> pd.Series([], dtype=int).sum(skipna=True)
nan
> pd.Series([], dtype=int).sum(skipna=False)
nan
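For comparison, plain Python treats an empty sum as the additive identity and keeps the integer type. The stdlib-only snippet below is not pandas code. It also shows why a NaN answer forces the int-to-float change: NaN exists only as a float, and there is no integer NaN.

```python
import math

# Built-in sum() starts from the additive identity 0, so an empty
# list of ints sums to the int 0: value and type are both preserved.
empty_ints = []
total = sum(empty_ints)
print(total, type(total).__name__)  # 0 int

# math.fsum() is the float counterpart and likewise returns 0.0.
print(math.fsum([]))  # 0.0

# NaN only exists as a float. An integer result cannot hold it, so
# answering NaN for an empty int series necessarily changes the type.
result = float("nan")
print(type(result).__name__, math.isnan(result))  # float True
```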

The empty integer case confused me, so I went back to the docstring and tried it with a float Series:

> pd.Series.sum?
Signature: pd.Series.sum(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Docstring:
Return the sum of the values for the requested axis

Parameters
----------
axis : {index (0)}
skipna : boolean, default True
     Exclude NA/null values. If an entire row/column is NA or empty, the result
     will be NA
level : int or level name, default None
     If the axis is a MultiIndex (hierarchical), count along a
     particular level, collapsing into a scalar
numeric_only : boolean, default None
     Include only float, int, boolean columns. If None, will attempt to use
     everything, then use only numeric data. Not implemented for Series.

I would expect skipna=True to mean that we don't want NaNs affecting the sum.
So why would we want NaN when the series is empty?
In fact, for an empty float series the output is NaN for both
skipna=True and skipna=False:

> pd.Series([], dtype=float).sum(skipna=False)
nan
> pd.Series([], dtype=float).sum(skipna=True)
nan

It looks even stranger in this case:

> pd.Series([0, float('nan')], dtype=float).sum(skipna=True)
0.0    # NaN is skipped, sum is non-NaN. So far so good...

So what happens with a different non-empty input?

> pd.Series([float('nan')], dtype=float).sum(skipna=True)
nan   # Skip all NaNs, get empty series to sum, so return NaN???
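The docstring rule can be modelled in a few lines of plain Python. This sketch is illustrative only: `documented_sum` is a hypothetical helper, not the pandas implementation, and `math.isnan` stands in for pandas' NA check. It reproduces each `nan` in the sessions above and makes the point concrete: once the NaNs are skipped, an all-NaN input is indistinguishable from an empty one, and for empty input `skipna` has nothing to act on.

```python
import math


def documented_sum(values, skipna=True):
    """Hypothetical model of the documented rule: exclude NaNs when
    skipna is True, and return NaN if nothing is left to add up."""
    values = list(values)
    if not skipna and any(math.isnan(v) for v in values):
        return float("nan")  # a NaN propagates when it is not skipped
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        return float("nan")  # "NA or empty": the disputed case
    return math.fsum(kept)


nan = float("nan")
for skipna in (True, False):
    print(documented_sum([], skipna=skipna))  # nan both times
print(documented_sum([0.0, nan]))  # 0.0
print(documented_sum([nan]))       # nan: all NaNs skipped, nothing left
```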

So if we want to avoid NaNs in our output, the skipna parameter doesn't help.
For every use of sum(), we now need to separately check two special cases:
- empty input
- input with only NaNs
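Both special cases reduce to one question: how many non-NaN values are left after skipping? The plain-Python sketch below uses a hypothetical helper, `sum_with_min_count`; its `min_count` parameter borrows the name of the keyword that later pandas releases added to `Series.sum`, but the code is not pandas. With `min_count=0`, empty and all-NaN input both fall back to 0; with `min_count=1`, both give NaN, matching the behaviour reported in the sessions above.

```python
import math


def sum_with_min_count(values, min_count=0):
    """Hypothetical helper: skip NaNs, then return NaN only if fewer than
    min_count non-NaN values remain; otherwise return their sum."""
    kept = [v for v in values if not math.isnan(v)]
    if len(kept) < min_count:
        return float("nan")
    return math.fsum(kept)  # fsum([]) is 0.0, the additive identity


nan = float("nan")
# min_count=0: both special cases fall back to the identity, 0.0.
print(sum_with_min_count([]), sum_with_min_count([nan]))  # 0.0 0.0
# min_count=1: both special cases give NaN.
print(sum_with_min_count([], min_count=1), sum_with_min_count([nan], min_count=1))  # nan nan
print(sum_with_min_count([0.0, nan], min_count=1))  # 0.0
```

A single threshold keeps the caller's intent explicit at the call site, instead of requiring separate checks for empty and all-NaN input after every `sum()`.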


I can't see how this behaviour helps anyone!

Regards

Stephen
