[Pandas-dev] [pydata] Re: Upcoming Index repr changes

Joris Van den Bossche jorisvandenbossche at gmail.com
Fri May 22 01:31:32 CEST 2015


Follow-up on this discussion: as you may have seen, the changes were
released in 0.16.1 (see the whatsnew docs:
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#index-representation
).
In the end, we followed John's suggestion and went for a more numpy-style
output.

There will probably still be some quirks and things to improve; you can
report them in this follow-up issue: https://github.com/pydata/pandas/issues/10095
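
To give a feel for the truncation part of the change, here is a minimal
sketch in plain Python. It is not the pandas implementation:
summarize_index and edge_items are hypothetical names made up for this
example, and max_seq_items only mirrors the display option of the same name
discussed in this thread.

```python
def summarize_index(values, name=None, max_seq_items=10, edge_items=2):
    """Hypothetical helper: build an Index-like repr string.

    Sequences longer than max_seq_items are shortened to the first and
    last edge_items values, and the total length is appended.
    """
    if edge_items < 1:
        raise ValueError("edge_items must be at least 1")
    values = list(values)
    if len(values) > max_seq_items:
        # Keep only the head and the tail, with an ellipsis in between.
        head = [repr(v) for v in values[:edge_items]]
        tail = [repr(v) for v in values[-edge_items:]]
        body = ", ".join(head + ["..."] + tail)
        length_part = ", length=%d" % len(values)
    else:
        body = ", ".join(repr(v) for v in values)
        length_part = ""
    name_part = "" if name is None else ", name=%r" % (name,)
    return "Index([%s]%s%s)" % (body, name_part, length_part)


print(summarize_index(range(4), name="foo"))
# Index([0, 1, 2, 3], name='foo')
print(summarize_index(range(104), name="foo"))
# Index([0, 1, ..., 102, 103], name='foo', length=104)
```

The length is only added when values are hidden, which matches the examples
quoted below, where length=104 appears only in the truncated output.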

Joris

2015-04-21 2:59 GMT+02:00 Joris Van den Bossche <jorisvandenbossche at gmail.com>:

> I like the suggestion of John to have something more like the output of
> numpy arrays.
>
> For example, the proposed repr:
>
> In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
> Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')
>
> would then be something like this:
>
> In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
> Out[12]:
> DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ...,
>                '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'],
>               dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')
>
>
> 2015-04-21 2:53 GMT+02:00 Jeff <jeffreback at gmail.com>:
>
>>
>> John, you are quoting the current implementation (which is shown first);
>> the new one looks like this:
>>
>> In [11]: pd.date_range('20130101',periods=4,name='foo',tz='US/Eastern')
>> Out[11]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'], dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')
>>
>> In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
>> Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')
>>
>> Lorenzo, to answer your question, MultiIndex is unchanged (and
>> CategoricalIndex is new). We *could* make them a single line, but that
>> would be pretty crowded.
>>
>> Note that MultiIndex and CategoricalIndex have a multi-line repr and do not
>> truncate sequences (of e.g. labels); this is consistent with previous
>> versions. (This is easy to change, though.)
>>
>> In [1]: MultiIndex.from_product([list('abcdefg'),range(10)],names=['first','second'])
>> Out[1]:
>> MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
>>            labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
>>            names=[u'first', u'second'])
>>
>> In [4]: pd.CategoricalIndex(np.random.randint(0,5,size=100),name='foo')
>> Out[4]:
>> CategoricalIndex([3, 0, 0, 3, 1, 3, 0, 4, 2, 3, 0, 4, 0, 1, 2, 0, 4, 1, 4, 2, 3, 1, 0, 4, 4, 3, 0, 3, 0, 1, 2, 3, 3, 1, 1, 0, 0, 4, 4, 1, 1, 3, 1, 1, 4, 4, 3, 0, 0, 0, 4, 4, 0, 1, 3, 1, 2, 0, 3, 1, 2, 2, 2, 1, 1, 4, 1, 0, 4, 3, 3, 0, 0, 0, 4, 4, 1, 4, 2, 2, 1, 4, 0, 0, 0, 4, 3, 0, 4, 0, 0, 0, 3, 3, 1, 2, 2, 3, 4, 1],
>>                  categories=[0, 1, 2, 3, 4],
>>                  ordered=False,
>>                  name=u'foo',
>>                  dtype='category')
>>
>>
>>
>>
>>
>> On Monday, April 20, 2015 at 8:37:01 PM UTC-4, John E wrote:
>>>
>>> This is probably not the sort of comment you're looking for, but I'd
>>> like to see more of a table-style output.  I can just put a 'values' at the
>>> end to get the more numpy-like output (which is easier to read IMO), but it
>>> won't stop at 10 or 100 unless I tell it to.  Nevertheless, I think it's
>>> much easier to read this:
>>>
>>> pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern').values
>>> Out[442]:
>>> array(['2013-01-01T00:00:00.000000000-0500',
>>>        '2013-01-02T00:00:00.000000000-0500',
>>>        '2013-01-03T00:00:00.000000000-0500',
>>>        '2013-01-04T00:00:00.000000000-0500',
>>>        '2013-01-05T00:00:00.000000000-0500',
>>>
>>> than this:
>>>
>>> pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
>>> Out[443]:
>>> <class 'pandas.tseries.index.DatetimeIndex'>
>>> [2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
>>> Length: 104, Freq: D, Timezone: US/Eastern
>>>
>>>
>>> On Friday, April 17, 2015 at 6:07:44 AM UTC-4, Joris Van den Bossche
>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> We have a PR pending to unify the string representation of the
>>>> different Index objects: https://github.com/pydata/pandas/pull/9901
>>>>
>>>> What are the most important changes:
>>>>
>>>>    - We propose to reduce the default number of values shown from 100
>>>>    to 10 (an option controllable as pd.options.display.max_seq_items).
>>>>    - The datetime-like indices (DatetimeIndex, TimedeltaIndex,
>>>>    PeriodIndex) were always somewhat different and get a new repr that is now
>>>>    more consistent with how it is for other Index types like Int64Index. This
>>>>    is the biggest change.
>>>>
>>>> So for e.g. Int64Index not much changes (only 'name' is now also shown,
>>>> and the number of shown values has changed), but for DatetimeIndex the
>>>> change is larger.
>>>>
>>>> *But we would like to get some feedback on this!*
>>>>
>>>> Do you like the changes? For DatetimeIndex? For the number of shown
>>>> values?
>>>> Would you want different behaviour for repr() and str()?
>>>>
>>>> Some examples of the changes with the current state of the PR are shown
>>>> below:
>>>>
>>>> Previous Behavior
>>>>
>>>> In [1]: pd.get_option('max_seq_items')
>>>> Out[1]: 100
>>>>
>>>> In [2]: pd.Index(range(4), name='foo')
>>>> Out[2]: Int64Index([0, 1, 2, 3], dtype='int64')
>>>>
>>>> In [3]: pd.Index(range(104), name='foo')
>>>> Out[3]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
>>>> 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
>>>> 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
>>>> 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
>>>> 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
>>>> 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], dtype='int64')
>>>>
>>>> In [4]: pd.date_range('20130101', periods=4, name='foo', tz='US/Eastern')
>>>> Out[4]:
>>>> <class 'pandas.tseries.index.DatetimeIndex'>
>>>> [2013-01-01 00:00:00-05:00, ..., 2013-01-04 00:00:00-05:00]
>>>> Length: 4, Freq: D, Timezone: US/Eastern
>>>>
>>>> In [5]: pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
>>>> Out[5]:
>>>> <class 'pandas.tseries.index.DatetimeIndex'>
>>>> [2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
>>>> Length: 104, Freq: D, Timezone: US/Eastern
>>>>
>>>> New Behavior
>>>>
>>>> In [1]: pd.get_option('max_seq_items')
>>>> Out[1]: 10
>>>>
>>>> In [9]: pd.Index(range(4), name='foo')
>>>> Out[9]: Int64Index([0, 1, 2, 3], dtype='int64', name=u'foo')
>>>>
>>>> In [10]: pd.Index(range(104), name='foo')
>>>> Out[10]: Int64Index([0, 1, ..., 102, 103], dtype='int64', name=u'foo', length=104)
>>>>
>>>> In [11]: pd.date_range('20130101', periods=4, name='foo', tz='US/Eastern')
>>>> Out[11]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'], dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')
>>>>
>>>> In [12]: pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
>>>> Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')
>>>>