[Pandas-dev] [pydata] Proposal to change the default number of rows for DataFrame display (lower max_rows)

Wed Mar 28 07:44:30 EDT 2018

2018-03-28 12:16 GMT+02:00 Joris Van den Bossche <
jorisvandenbossche at gmail.com>:

> Coming back to this (we are again discussing a concrete PR proposing this
> change: https://github.com/pandas-dev/pandas/pull/20514)
>
> 2017-12-08 16:11 GMT+01:00 Tom Augspurger <tom.augspurger88 at gmail.com>:
>
>> On Fri, Dec 8, 2017 at 8:54 AM, Joris Van den Bossche <
>> jorisvandenbossche at gmail.com> wrote:
>>
>>> *[Note for those reading this on the pydata mailing list: please reply to
>>> pandas-dev at python.org to keep the discussion centralised there]*
>>>
>>> Hi all,
>>>
>>> I am reposting the mail of Clemens below, but with slightly changed
>>> focus, as I think the main discussion point is about the number of rows.
>>>
>>> The proposal in https://github.com/pandas-dev/pandas/pull/17023 is to
>>> lower the default number of rows shown when displaying a Series or
>>> DataFrame from 60 to 20.
>>> Thoughts on that?
>>>
>>
>>
>> Personally, I always set the max rows to 10 or 20, so I'd be OK with it
>> if the community is on board.
>>
>
> I also often set this at a lower value like that (eg typically for
> tutorials), so I am also in favor of changing *something*.
> However, my main 'problem' is that, in interactive usage, a lower default
> makes it very cumbersome to actually look at more data (you have to change
> the setting just to inspect some data). For example, if the new max_rows
> default were 10, doing df.head(20) to quickly inspect some more data
> would still only show 10 rows.
>
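
A minimal sketch of that head(20) behaviour, assuming a throwaway 100-row
frame (the name `df` and its contents are made up for illustration) and
using the existing `display.max_rows` option:

```python
import pandas as pd

# Hypothetical example frame; any frame with more than 20 rows would do.
df = pd.DataFrame({"a": range(100)})

# Temporarily lower max_rows to 10, as if that were the new default.
with pd.option_context("display.max_rows", 10):
    text = repr(df.head(20))

# head(20) returns 20 rows, but the repr is truncated to fit max_rows.
print(text)
```

The returned object still holds all 20 rows; only its printed form is cut
down, which is why head() on its own does not help here.
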
> We cannot change what a function like head does (its output is still a
> normal repr following the same options, since it needs to actually return a
> dataframe, not only display it), so I have another proposal:
>
> - We have 2 thresholds instead of 1 (the current 'max_rows'): a number of
> rows to show *in* a truncated repr, and a max number of rows to show
> without truncating
> - For 'big' dataframes, we show a truncated repr. And then I would go even
> lower than 20 and only show first/last 5 (so like a max_rows of 10)
> - For 'small' dataframes, we show the full dataframe without truncating,
> up to the threshold.
>
> Of course, then the difficulty is to determine what we call 'big' and
> 'small', so what is the threshold to show a truncated repr (and this part
> will again get more subjective :)).
> But for example, using the current max_rows of 60: we could show a full
> repr up to 60 rows, and once the number of rows > 60, we only show 10
> (first/last 5).
>
> You can then still set both thresholds at the same number (like 20) to not
> get this variable behaviour.
>
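
To make the two-threshold idea concrete, here is a rough sketch in plain
Python. The function name `visible_rows` and the parameter names `max_rows`
and `truncated_rows` are hypothetical, chosen only for this illustration;
this is not an existing pandas API.

```python
def visible_rows(n_rows, max_rows=60, truncated_rows=10):
    """Return the row positions a repr would show under the proposal.

    max_rows: largest frame that is still shown in full (hypothetical name).
    truncated_rows: rows shown once the frame exceeds max_rows (hypothetical).
    """
    if n_rows <= max_rows:
        # 'Small' frame: show everything.
        return list(range(n_rows))
    # 'Big' frame: show the first and last few rows only.
    head = truncated_rows // 2
    tail = truncated_rows - head
    return list(range(head)) + list(range(n_rows - tail, n_rows))


print(len(visible_rows(60)))   # full repr at the threshold
print(visible_rows(61))        # truncated repr just above it
print(len(visible_rows(25, max_rows=20, truncated_rows=20)))  # equal thresholds
```

Setting both values to the same number reproduces the current
single-threshold behaviour.
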
> This is actually similar to what numpy arrays do (but with a bigger
> threshold: eg np.random.randn(1000) shows all 1000 elements,
> np.random.randn(1001) shows the first/last 3).
>
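
The numpy behaviour can be checked with a short sketch. It uses np.arange
instead of random data so the result is deterministic, and passes numpy's
default print settings (threshold=1000, edgeitems=3) explicitly; the helper
name `is_summarised` is made up for this example.

```python
import numpy as np


def is_summarised(n, threshold=1000, edgeitems=3):
    """Return True if an n-element array is printed in abbreviated form."""
    text = np.array2string(np.arange(n), threshold=threshold, edgeitems=edgeitems)
    # numpy marks the elided middle part with "...".
    return "..." in text


print(is_summarised(1000))
print(is_summarised(1001))
```
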
And it seems this is also what R tibbles do: they have "print_min" and
"print_max" options with exactly this behaviour, only their "print_max" is
lower (the defaults are 10 and 20, respectively):

> options(tibble.print_max = n, tibble.print_min = m): if there are more than
> n rows, print only the first m rows. Use options(tibble.print_max = Inf)
> to always show all rows.
>

(from https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html)


> It's just an idea, but I think this might be a way to satisfy more use
> cases at once (and give more possibilities to fine-tune the behaviour).
>
> Joris
>
>
>>
>> Tom
>>
>>
>>>
>>>
>