[Pandas-dev] Upcoming Index repr changes

Mon Apr 20 20:53:17 EDT 2015

John, you are quoting the current implementation (which is shown first in 
Joris's mail); the new repr looks like this:

In [11]: pd.date_range('20130101',periods=4,name='foo',tz='US/Eastern')
Out[11]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'], dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')

In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')

Lorenzo, to answer your question: the MultiIndex repr is unchanged (and 
CategoricalIndex is new). We *could* put them on a single line, but that 
would be pretty crowded. 

Note that MultiIndex and CategoricalIndex have a multi-line repr and do not 
truncate sequences (e.g. of labels); this is consistent with previous 
versions (though it would be easy to change).

In [1]: MultiIndex.from_product([list('abcdefg'),range(10)],names=['first','second'])
Out[1]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=[u'first', u'second'])

In [4]: pd.CategoricalIndex(np.random.randint(0,5,size=100),name='foo')
Out[4]: 
CategoricalIndex([3, 0, 0, 3, 1, 3, 0, 4, 2, 3, 0, 4, 0, 1, 2, 0, 4, 1, 4, 2, 3, 1, 0, 4, 4, 3, 0, 3, 0, 1, 2, 3, 3, 1, 1, 0, 0, 4, 4, 1, 1, 3, 1, 1, 4, 4, 3, 0, 0, 0, 4, 4, 0, 1, 3, 1, 2, 0, 3, 1, 2, 2, 2, 1, 1, 4, 1, 0, 4, 3, 3, 0, 0, 0, 4, 4, 1, 4, 2, 2, 1, 4, 0, 0, 0, 4, 3, 0, 4, 0, 0, 0, 3, 3, 1, 2, 2, 3, 4, 1],
                 categories=[0, 1, 2, 3, 4],
                 ordered=False,
                 name=u'foo',
                 dtype='category')
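
If you want to see how the truncation setting affects a flat index, here is a 
minimal sketch. It assumes a pandas version where the option is registered 
as 'display.max_seq_items' and pd.option_context is available. The variable 
names (idx, short, full) are only for illustration. The exact layout of the 
repr differs between versions, so the sketch checks only for the ellipsis 
and not for a specific output string.

```python
import numpy as np
import pandas as pd

# Build from an array so we get a plain integer index; newer pandas
# versions turn a bare range() into a RangeIndex, whose repr is never
# truncated.
idx = pd.Index(np.arange(104), name='foo')

# Temporarily lower the limit: only the head and tail of the values
# are shown, with '...' in between.
with pd.option_context('display.max_seq_items', 10):
    short = repr(idx)

# Setting the option to None removes the limit, so all values are shown.
with pd.option_context('display.max_seq_items', None):
    full = repr(idx)

print(short)
print('...' in short, '...' in full)
```

Using option_context keeps the change local to the with block, so the 
global display setting is left untouched afterwards.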

On Monday, April 20, 2015 at 8:37:01 PM UTC-4, John E wrote:
>
> This is probably not the sort of comment you're looking for, but I'd like 
> to see more of a table-style output.  I can just put a 'values' at the end 
> to get the more numpy like output (which is easier to read IMO), but it 
> won't stop at 10 or 100 unless I tell it to.  Nevertheless, I think it's 
> much easier to read this:
>
> pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern').values
> Out[442]: 
> array(['2013-01-01T00:00:00.000000000-0500',
>        '2013-01-02T00:00:00.000000000-0500',
>        '2013-01-03T00:00:00.000000000-0500',
>        '2013-01-04T00:00:00.000000000-0500',
>        '2013-01-05T00:00:00.000000000-0500',
>
> than this:
>
> pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
> Out[443]: 
> <class 'pandas.tseries.index.DatetimeIndex'>
> [2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
> Length: 104, Freq: D, Timezone: US/Eastern
>
>
> On Friday, April 17, 2015 at 6:07:44 AM UTC-4, Joris Van den Bossche wrote:
>>
>> Hi all,
>>
>> We have a PR pending to unify the string representation of the different 
>> Index objects: https://github.com/pydata/pandas/pull/9901
>>
>> What are the most important changes:
>>
>>    - We propose to reduce the default number of values shown from 100 to 
>>    10 (an option controllable as pd.options.display.max_seq_items). 
>>    - The datetime-like indices (DatetimeIndex, TimedeltaIndex, 
>>    PeriodIndex) were always somewhat different and get a new repr that is now 
>>    more consistent with how it is for other Index types like Int64Index. This 
>>    is the biggest change.
>>
>> So for eg Int64Index not much changes (only 'name' is now also shown, and 
>> the number of shown values has changed), but for DatetimeIndex the change 
>> is larger.
>>
>> *But we would like to get some feedback on this!*
>>
>> Do you like the changes? For DatetimeIndex? For the number of shown 
>> values?
>> Would you want different behaviour for repr() and str()?
>>
>> Some examples of the changes with the current state of the PR are shown 
>> below:
>>
>> Previous Behavior
>>
>> In [1]: pd.get_option('max_seq_items')
>> Out[1]: 100
>>
>> In [2]: pd.Index(range(4), name='foo')
>> Out[2]: Int64Index([0, 1, 2, 3], dtype='int64')
>>
>> In [3]: pd.Index(range(104), name='foo')
>> Out[3]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
>> 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
>> 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 
>> 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 
>> 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 
>> 92, 93, 94, 95, 96, 97, 98, 99, ...], dtype='int64')
>>
>> In [4]: pd.date_range('20130101', periods=4, name='foo', tz='US/Eastern')
>> Out[4]:
>> <class 'pandas.tseries.index.DatetimeIndex'>
>> [2013-01-01 00:00:00-05:00, ..., 2013-01-04 00:00:00-05:00]
>> Length: 4, Freq: D, Timezone: US/Eastern
>>
>> In [5]: pd.date_range('20130101', periods=104, name='foo', 
>> tz='US/Eastern')
>> Out[5]:
>> <class 'pandas.tseries.index.DatetimeIndex'>
>> [2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
>> Length: 104, Freq: D, Timezone: US/Eastern
>>
>> New Behavior
>>
>> In [1]: pd.get_option('max_seq_items')
>> Out[1]: 10
>>
>> In [9]: pd.Index(range(4), name='foo')
>> Out[9]: Int64Index([0, 1, 2, 3], dtype='int64', name=u'foo')
>>
>> In [10]: pd.Index(range(104), name='foo')
>> Out[10]: Int64Index([0, 1, ..., 102, 103], dtype='int64', name=u'foo', 
>> length=104)
>>
>> In [11]: pd.date_range('20130101', periods=4, name='foo', tz='US/Eastern')
>> Out[11]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 
>> 00:00:00-05:00', '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'], 
>> dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')
>>
>> In [12]: pd.date_range('20130101', periods=104 ,name='foo', 
>> tz='US/Eastern')
>> Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 
>> 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 
>> 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, 
>> freq='D', tz='US/Eastern')
>>
>>