From jeffreback at gmail.com Mon May 11 17:42:11 2015
From: jeffreback at gmail.com (Jeff Reback)
Date: Mon, 11 May 2015 11:42:11 -0400
Subject: [Pandas-dev] ANN: pandas 0.16.1 released
Message-ID:

Hello,

We are proud to announce v0.16.1 of pandas, a minor release from 0.16.0.

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug
fixes.

This was a release of 7 weeks with 222 commits by 57 authors encompassing 85
issues.

We recommend that all users upgrade to this version.

*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block
for doing practical, real world data analysis in Python. Additionally, it has
the broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.

Highlights of this release include:

- Support for *CategoricalIndex*, a category based index, see here
- New section on how-to-contribute to *pandas*, see here
- Revised "Merge, join, and concatenate" documentation, including graphical
  examples to make it easier to understand each operation, see here
- New method *sample* for drawing random samples from Series, DataFrames and
  Panels. See here
- The default *Index* printing has changed to a more uniform format, see here
- *BusinessHour* datetime-offset is now supported, see here
- Further enhancement to the *.str* accessor to make string operations
  easier, see here

See the Whatsnew in v0.16.1

Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs and Windows binaries are available on PyPI:
https://pypi.python.org/pypi/pandas

Windows binaries are courtesy of Christoph Gohlke and are built on NumPy 1.8
Mac OS X wheels are courtesy of Matthew Brett

Please report any issues here:
https://github.com/pydata/pandas/issues

Thanks

The Pandas Development Team

Contributors to the 0.16.1 release

- Alfonso MHC
- Andy Hayden
- Artemy Kolchinsky
- Chris Gilmer
- Chris Grinolds
- Dan Birken
- David BROCHART
- David Hirschfeld
- David Stephens
- Dr. Leo
- Evan Wright
- Frans van Dunné
- Hatem Nassrat
- Henning Sperr
- Hugo Herter
- Jan Schulz
- Jeff Blackburne
- Jeff Reback
- Jim Crist
- Jonas Abernot
- Joris Van den Bossche
- Kerby Shedden
- Leo Razoumov
- Manuel Riel
- Mortada Mehyar
- Nick Burns
- Nick Eubank
- Olivier Grisel
- Phillip Cloud
- Pietro Battiston
- Roy Hyunjin Han
- Sam Zhang
- Scott Sanderson
- Stephan Hoyer
- Tiago Antao
- Tom Ajamian
- Tom Augspurger
- Tomaz Berisa
- Vikram Shirgur
- Vladimir Filimonov
- William Hogman
- Yasin A
- Younggun Kim
- behzad nouri
- dsm054
- floydsoft
- flying-sheep
- gfr
- jnmclarty
- jreback
- ksanghai
- lucas
- mschmohl
- ptype
- rockg
- scls19fr
- sinhrks
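
Two of the highlights in the announcement, the new *sample* method and the more uniform *Index* printing, can be tried out with a minimal sketch like the one below. The toy data and the column name `value` are made up for illustration, and the exact repr text depends on the installed pandas version, so no output is shown here.

```python
import pandas as pd

# Hypothetical toy data; the column name "value" is an assumption for illustration.
df = pd.DataFrame({"value": list(range(104))})

# Draw 5 rows at random; random_state fixes the seed so reruns pick the same rows.
picked = df.sample(n=5, random_state=0)
print(picked)

# Build a long index from a list and limit the number of items shown in its repr.
idx = pd.Index(list(range(104)), name="foo")
with pd.option_context("display.max_seq_items", 10):
    text = repr(idx)  # long indexes are elided with "..." and report their length
print(text)
```

Using `pd.option_context` keeps the display setting local to the `with` block, so the global default is left untouched.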
From jorisvandenbossche at gmail.com Fri May 22 01:31:32 2015
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Fri, 22 May 2015 01:31:32 +0200
Subject: [Pandas-dev] [pydata] Re: Upcoming Index repr changes
In-Reply-To:
References: <3cb616db-fbad-44a9-970d-93c7dda3d42d@googlegroups.com>
 <6ab996ef-e112-4f52-9d58-60947629011f@googlegroups.com>
Message-ID:

Follow-up of this discussion: as you may have seen, the changes were
released in 0.16.1 (see the whatsnew docs:
http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#index-representation
).
In the end, we used the suggestion of John to go for a bit more numpy style
output.

There will probably still be some quirks/things to improve; you can report
them at this follow-up issue: https://github.com/pydata/pandas/issues/10095

Joris

2015-04-21 2:59 GMT+02:00 Joris Van den Bossche <
jorisvandenbossche at gmail.com>:

> I like the suggestion of John to have something more like the output of
> numpy arrays.
>
> For example, the proposed repr:
>
> In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
> Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02
> 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14
> 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104,
> freq='D', tz='US/Eastern')
>
> would then be something like this:
>
> In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
> Out[12]:
> DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
>                ...,
>                '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'],
>               dtype='datetime64[ns]', name=u'foo', length=104, freq='D',
>               tz='US/Eastern')
>
>
> 2015-04-21 2:53 GMT+02:00 Jeff :
>
>>
>> John, you are quoting the current impl (which is first), the new is like
>> this:
>>
>> In [11]: pd.date_range('20130101',periods=4,name='foo',tz='US/Eastern')
>> Out[11]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'], dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')
>>
>> In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
>> Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')
>>
>> Lorenzo, to answer your question, MultiIndexes are unchanged (and
>> CategoricalIndex are new). We *could* make them a single line but would be
>> pretty crowded.
>>
>> Note that MultiIndex and CategoricalIndex are multi-line repr and do not
>> truncate sequences (of e.g. labels), this is consistent with previous
>> versions. (easy to change this though)
>>
>> In [1]: MultiIndex.from_product([list('abcdefg'),range(10)],names=['first','second'])
>> Out[1]:
>> MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
>>            labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
>>            names=[u'first', u'second'])
>>
>> In [4]: pd.CategoricalIndex(np.random.randint(0,5,size=100),name='foo')
>> Out[4]:
>> CategoricalIndex([3, 0, 0, 3, 1, 3, 0, 4, 2, 3, 0, 4, 0, 1, 2, 0, 4, 1, 4, 2, 3, 1, 0, 4, 4, 3, 0, 3, 0, 1, 2, 3, 3, 1, 1, 0, 0, 4, 4, 1, 1, 3, 1, 1, 4, 4, 3, 0, 0, 0, 4, 4, 0, 1, 3, 1, 2, 0, 3, 1, 2, 2, 2, 1, 1, 4, 1, 0, 4, 3, 3, 0, 0, 0, 4, 4, 1, 4, 2, 2, 1, 4, 0, 0, 0, 4, 3, 0, 4, 0, 0, 0, 3, 3, 1, 2, 2, 3, 4, 1],
>>                  categories=[0, 1, 2, 3, 4],
>>                  ordered=False,
>>                  name=u'foo',
>>                  dtype='category')
>>
>>
>> On Monday, April 20, 2015 at 8:37:01 PM UTC-4, John E wrote:
>>>
>>> This is probably not the sort of comment you're looking for, but I'd
>>> like to see more of a table-style output. I can just put a 'values' at the
>>> end to get the more numpy like output (which is easier to read IMO), but it
>>> won't stop at 10 or 100 unless I tell it to. Nevertheless, I think it's
>>> much easier to read this:
>>>
>>> pd.date_range('20130101', periods=104, name='foo',
>>> tz='US/Eastern').values
>>> Out[442]:
>>> array(['2013-01-01T00:00:00.000000000-0500',
>>>        '2013-01-02T00:00:00.000000000-0500',
>>>        '2013-01-03T00:00:00.000000000-0500',
>>>        '2013-01-04T00:00:00.000000000-0500',
>>>        '2013-01-05T00:00:00.000000000-0500',
>>>
>>> than this:
>>>
>>> pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
>>> Out[443]:
>>>
>>> [2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
>>> Length: 104, Freq: D, Timezone: US/Eastern
>>>
>>>
>>> On Friday, April 17, 2015 at 6:07:44 AM UTC-4, Joris Van den Bossche
>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> We have a PR pending to unify the string representation of the
>>>> different Index objects: https://github.com/pydata/pandas/pull/9901
>>>>
>>>> What are the most important changes:
>>>>
>>>> - We propose to reduce the default number of values shown from 100
>>>> to 10 (an option controllable as pd.options.display.max_seq_items).
>>>> - The datetime-like indices (DatetimeIndex, TimedeltaIndex,
>>>> PeriodIndex) were always somewhat different and get a new repr that is now
>>>> more consistent with how it is for other Index types like Int64Index. This
>>>> is the biggest change.
>>>>
>>>> So for e.g. Int64Index not much changes (only 'name' is now also shown,
>>>> and the number of shown values has changed), but for DatetimeIndex the
>>>> change is larger.
>>>>
>>>> *But we would like to get some feedback on this!*
>>>>
>>>> Do you like the changes? For DatetimeIndex? For the number of shown
>>>> values?
>>>> Would you want different behaviour for repr() and str()?
>>>>
>>>> Some examples of the changes with the current state of the PR are shown
>>>> below:
>>>>
>>>> Previous Behavior
>>>>
>>>> In [1]: pd.get_option('max_seq_items')
>>>> Out[1]: 100
>>>>
>>>> In [2]: pd.Index(range(4), name='foo')
>>>> Out[2]: Int64Index([0, 1, 2, 3], dtype='int64')
>>>>
>>>> In [3]: pd.Index(range(104), name='foo')
>>>> Out[3]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
>>>> 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
>>>> 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
>>>> 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
>>>> 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
>>>> 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], dtype='int64')
>>>>
>>>> In [4]: pd.date_range('20130101', periods=4, name='foo',
>>>> tz='US/Eastern')
>>>> Out[4]:
>>>>
>>>> [2013-01-01 00:00:00-05:00, ..., 2013-01-04 00:00:00-05:00]
>>>> Length: 4, Freq: D, Timezone: US/Eastern
>>>>
>>>> In [5]: pd.date_range('20130101', periods=104, name='foo',
>>>> tz='US/Eastern')
>>>> Out[5]:
>>>>
>>>> [2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
>>>> Length: 104, Freq: D, Timezone: US/Eastern
>>>>
>>>> New Behavior
>>>>
>>>> In [1]: pd.get_option('max_seq_items')
>>>> Out[1]: 10
>>>>
>>>> In [9]: pd.Index(range(4), name='foo')
>>>> Out[9]: Int64Index([0, 1, 2, 3], dtype='int64', name=u'foo')
>>>>
>>>> In [10]: pd.Index(range(104), name='foo')
>>>> Out[10]: Int64Index([0, 1, ..., 102, 103], dtype='int64', name=u'foo',
>>>> length=104)
>>>>
>>>> In [11]: pd.date_range('20130101', periods=4, name='foo',
>>>> tz='US/Eastern')
>>>> Out[11]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02
>>>> 00:00:00-05:00', '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'],
>>>> dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')
>>>>
>>>> In [12]: pd.date_range('20130101', periods=104, name='foo',
>>>> tz='US/Eastern')
>>>> Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02
>>>> 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14
>>>> 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104,
>>>> freq='D', tz='US/Eastern')
>>>>

From jorisvandenbossche at gmail.com Fri May 29 22:36:48 2015
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Fri, 29 May 2015 22:36:48 +0200
Subject: [Pandas-dev] Pandas development meeting: tuesday June 2 at 17:00 UTC
Message-ID:

Hi all,

We are planning the next online Pandas Development Meeting for this coming
Tuesday, June 2nd, at 17:00 UTC (which should correspond to 19:00 CEST in
most of Europe, and to 13:00 EDT and 10:00 PDT on the two coasts of North
America).

Some first provisional topics to discuss are listed here:
https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit#heading=h.fwo3xcbnullz
If you think of other points, or have remarks on the listed ones, feel free
to add them to the Google doc.

If you are interested in joining (and you don't need to be a core developer
of pandas for that!), let us know and we will make sure to invite you to the
Google Hangout.

Regards,
Joris
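
The local times and the weekday given in the meeting announcement can be double-checked with the standard library. This is only a sketch under stated assumptions: the fixed summer-time offsets (UTC+2 for central Europe, UTC-4 and UTC-7 for the US east and west coasts in June) and the dictionary labels are not from the email.

```python
from datetime import datetime, timedelta, timezone

# Meeting time as announced in the subject line: June 2, 2015 at 17:00 UTC.
meeting_utc = datetime(2015, 6, 2, 17, 0, tzinfo=timezone.utc)

# Assumed fixed summer-time offsets (illustrative labels, not from the email).
zones = {
    "Central Europe": timezone(timedelta(hours=2)),
    "US East Coast": timezone(timedelta(hours=-4)),
    "US West Coast": timezone(timedelta(hours=-7)),
}

# Convert the UTC time to each local wall-clock time.
local_times = {
    name: meeting_utc.astimezone(tz).strftime("%H:%M")
    for name, tz in zones.items()
}

# weekday() counts from Monday == 0, so 1 means Tuesday.
weekday_number = meeting_utc.weekday()

print(weekday_number, local_times)
```

Fixed offsets are enough for a single date; for recurring meetings a real time zone database would be the safer choice, since the offsets change with daylight saving time.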