Pandas GroupBy does not behave consistently

Sun May 15 07:05:00 EDT 2016

Hello, Michael,
Pandas GroupBy does not behave consistently.
Last time, when we talked, I used groupby and it worked well.
Now I want to rewrite the program so that I end up with a clean script.
But the problem is that a lot of columns are missing after applying groupby.
Any idea?
Regards.
David 
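One possible cause, assuming the aggregation was something like mean(): older pandas releases silently dropped non-numeric ("nuisance") columns from such aggregations, and recent releases no longer drop them silently. A minimal sketch, with hypothetical column names and data, that avoids the problem by naming an aggregation for every column that should survive:

```python
import pandas as pd

# Hypothetical data: a text column and a numeric column for each state.
df = pd.DataFrame({
    "state": ["AL", "AL", "AR", "AR"],
    "county": ["c1", "c2", "c3", "c4"],
    "population": [10, 20, 30, 40],
})

# Name an aggregation for every column that should appear in the result,
# so no column disappears because it cannot be summed or averaged.
summary = df.groupby("state").agg({
    "population": "sum",  # total population per state
    "county": "count",    # number of county rows per state
})
print(summary)
```

Columns not listed in the dict are left out of the result, so the dict also documents exactly what the output contains.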

    On Saturday, 14 May 2016, 17:00, Michael Selik <michael.selik at gmail.com> wrote:

This StackOverflow question was the first search result when I Googled for "Python why is there a little u": http://stackoverflow.com/questions/11279331/what-does-the-u-symbol-mean-in-front-of-string-values
On Sat, May 14, 2016, 11:40 AM David Shi <davidgshi at yahoo.co.uk> wrote:

Hello, Michael,
Why is there a little u, as in u'ID'?
What can be done about it? How do I handle such objects?
Can it be turned into a list easily?
Regards.
David 
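The u prefix is Python 2's notation for a Unicode string; the reprs in this thread suggest a Python 2 session. In Python 3 every str is Unicode, so the prefix is not shown. Either way the elements are ordinary strings, and a pandas Index converts to a plain list with tolist(). A minimal sketch (the index values mirror the state codes printed in this thread):

```python
import pandas as pd

# An index of state codes like the one printed in this thread.
idx = pd.Index(["AL", "AR", "AZ"], name="State")

codes = idx.tolist()   # a plain Python list
print(codes)           # ['AL', 'AR', 'AZ']

# Each element is an ordinary string, so string methods work as usual.
print([c.lower() for c in codes])  # ['al', 'ar', 'az']
```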

    On Saturday, 14 May 2016, 15:34, Michael Selik <michael.selik at gmail.com> wrote:

You might also be interested in "Python for Data Analysis" for a thorough discussion of Pandas: http://shop.oreilly.com/product/0636920023784.do

On Sat, May 14, 2016 at 10:29 AM Michael Selik <michael.selik at gmail.com> wrote:

David, it sounds like you'll need a thorough introduction to the basics of Python. Check out the tutorial: https://docs.python.org/3/tutorial/
On Sat, May 14, 2016 at 6:19 AM David Shi <davidgshi at yahoo.co.uk> wrote:

Hello, Michael,
I discovered that the problem is that two columns of data are put together and recognised as one column.
This is very strange. I would like to understand the subject well.
And how many ways are there to investigate the nature of objects dynamically?
Some object types are only shown as "object". Is there anything I can type in Python to reveal what an object is?
Regards.
David 
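A few introspection tools answer "what is this object?": type(), a DataFrame's columns, dtypes and info(). One possible way two columns end up as one (an assumption about David's data, not something confirmed in the thread) is reading a file with the wrong separator. A minimal sketch with a hypothetical semicolon-separated file:

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data, first read with the default comma separator.
raw = "sid;value\n1;133738.70\n4;295256.11\n"
wrong = pd.read_csv(io.StringIO(raw))
right = pd.read_csv(io.StringIO(raw), sep=";")

print(type(wrong))             # the object's class: a pandas DataFrame
print(wrong.columns.tolist())  # ['sid;value']: one column where two were expected
print(right.columns.tolist())  # ['sid', 'value']
print(right.dtypes)            # the dtype of each column
right.info()                   # shape, column names, dtypes and non-null counts
```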

    On Saturday, 14 May 2016, 4:30, Michael Selik <michael.selik at gmail.com> wrote:

What were you hoping to get from ``df[0]``? When you say it "yields nothing", do you mean it raised an error? What was the error message?
Have you tried a Google search for "pandas set index"? http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html
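With a single key, df[key] looks up a column label, so df[0] only returns something if a column is literally labelled 0; rows are reached through iloc (position) or, after set_index, through loc (label). A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data with a 'sid' column.
df = pd.DataFrame({"sid": [1, 4, 5], "value": [10.0, 20.0, 30.0]})

# df[key] selects a column by its label, not a row.
try:
    df[0]
except KeyError:
    print("no column is labelled 0")

print(df["value"])  # a column, selected by label
print(df.iloc[0])   # the first row, selected by position

# Make 'sid' the row index, then look rows up by their sid.
by_sid = df.set_index("sid")
print(by_sid.loc[4, "value"])  # 20.0
```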

On Fri, May 13, 2016 at 11:18 PM David Shi <davidgshi at yahoo.co.uk> wrote:

Hello, Michael,
I tried to discover the problem.
df[0] yields nothing, df[1] yields nothing, df[2] yields nothing.
However, df[3] gives the following:
    sid
    -9223372036854775808          NaN
     1                      133738.70
     4                      295256.11
     5                      137733.09
     6                      409413.58
     8                      269600.97
     9                       12852.94
Can we split this back to normal, or turn it into a dictionary, so that I can put the values back properly?
I would like to use sid as the index somehow.
Regards.
David 
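The output above is a Series indexed by sid, so it can become a dictionary with to_dict() or a two-column DataFrame with reset_index(). The first label, -9223372036854775808, is the smallest value an int64 can hold; a missing sid forced into an integer type can end up as that number, so that row is probably a missing value rather than a real sid (an assumption worth checking). A minimal sketch using three rows from the output above; the column name 'value' is hypothetical:

```python
import pandas as pd

# Three rows from the df[3] output above, as a Series indexed by sid.
s = pd.Series([133738.70, 295256.11, 137733.09],
              index=pd.Index([1, 4, 5], name="sid"))

# A dictionary keyed by sid.
lookup = s.to_dict()
print(lookup[4])  # 295256.11

# Or a two-column DataFrame ('value' is a hypothetical name),
# which can then be indexed by sid again.
df = s.rename("value").reset_index()
print(df.columns.tolist())     # ['sid', 'value']
by_sid = df.set_index("sid")
print(by_sid.loc[5, "value"])  # 137733.09
```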

    On Friday, 13 May 2016, 22:58, Michael Selik <michael.selik at gmail.com> wrote:

What code have you tried? What error message are you receiving?
On Fri, May 13, 2016, 5:54 PM David Shi <davidgshi at yahoo.co.uk> wrote:

Hello, Michael,
How do I convert a float column into an integer, label, or string type?
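Series.astype() does the conversion, but a plain integer column cannot hold NaN, so missing values have to be dropped or filled first, or the column converted to pandas' nullable "Int64" dtype instead. A minimal sketch with a hypothetical StateFIPS column:

```python
import pandas as pd

# Hypothetical float column; None is stored as NaN.
df = pd.DataFrame({"StateFIPS": [1.0, 4.0, None]})

# Plain integers: remove (or fill) the missing value first.
as_int = df["StateFIPS"].dropna().astype(int)
# Nullable integers: the missing value is kept as <NA>.
as_nullable = df["StateFIPS"].astype("Int64")
# String labels: convert via integers to avoid a trailing ".0".
as_str = as_int.astype(str)

print(as_int.tolist())  # [1, 4]
print(as_nullable)
print(as_str.tolist())  # ['1', '4']
```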

    On Friday, 13 May 2016, 22:02, Michael Selik <michael.selik at gmail.com> wrote:

To make clear that you're specifying the index by label, use df.loc; to specify a row by position, use df.iloc.
    >>> import pandas as pd
    >>> df = pd.DataFrame({'X': range(4)}, index=list('abcd'))
    >>> df
       X
    a  0
    b  1
    c  2
    d  3
    >>> df.loc['a']
    X    0
    Name: a, dtype: int64
    >>> df.iloc[0]
    X    0
    Name: a, dtype: int64
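In the example above, label 'a' happens to sit at position 0, so loc and iloc return the same row. With integer labels that do not match positions the difference becomes visible. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Integer labels 10, 20, 30 sit at positions 0, 1, 2.
s = pd.Series(["a", "b", "c"], index=[10, 20, 30])

print(s.loc[10])   # a  (label 10)
print(s.iloc[0])   # a  (position 0)
print(s.iloc[2])   # c  (position 2)
# s.loc[0] would raise KeyError, because there is no label 0.
```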
On Fri, May 13, 2016 at 4:54 PM David Shi <davidgshi at yahoo.co.uk> wrote:

Dear Michael,
To avoid complications, I now group by only one column.
It works now. But how do I refer to the new row index? How do I use a floating-point index?
Float64Index([ 1.0,  4.0,  5.0,  6.0,  8.0,  9.0, 10.0, 11.0, 12.0, 13.0, 16.0,
              17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0,
              28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0,
              39.0, 40.0, 41.0, 42.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0,
              51.0, 53.0, 54.0, 55.0, 56.0],
             dtype='float64', name=u'StateFIPS')
Regards.
David 
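pandas stores integer columns that contain NaN as floats, which may be why the StateFIPS labels appear as 1.0, 4.0 and so on. A float index can be used with loc directly, but converting it to integers gives cleaner labels. A minimal sketch with hypothetical totals:

```python
import pandas as pd

# Hypothetical grouped totals with float labels like the Float64Index above.
totals = pd.Series([10, 20, 30], index=pd.Index([1.0, 4.0, 5.0], name="StateFIPS"))

# Convert the labels to integers; this is safe only because none of them
# has a fractional part and none is NaN.
totals.index = totals.index.astype(int)

print(totals.index.tolist())  # [1, 4, 5]
print(totals.loc[4])          # 20, looked up by label
```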

    On Friday, 13 May 2016, 21:43, Michael Selik <michael.selik at gmail.com> wrote:

 Here's an example.
    >>> import pandas as pd
    >>> df = pd.DataFrame({'group': list('AB') * 2, 'data': range(4)}, index=list('wxyz'))
    >>> df
       data group
    w     0     A
    x     1     B
    y     2     A
    z     3     B
    >>> df = df.reset_index()
    >>> df
      index  data group
    0     w     0     A
    1     x     1     B
    2     y     2     A
    3     z     3     B
    >>> df.groupby('group').max()
          index  data
    group
    A         y     2
    B         z     3
If that doesn't help, you'll need to explain what you're trying to accomplish in detail -- what variables you started with, what transformations you want to do, and what variables you hope to have when finished.
On Fri, May 13, 2016 at 4:36 PM David Shi <davidgshi at yahoo.co.uk> wrote:

Hello, Michael,
I changed the groupby to use one column.
The index is now different:
Index([   u'AL',    u'AR',    u'AZ',    u'CA',    u'CO',    u'CT',    u'DC',
          u'DE',    u'FL',    u'GA',    u'IA',    u'ID',    u'IL',    u'IN',
          u'KS',    u'KY',    u'LA',    u'MA',    u'MD',    u'ME',    u'MI',
          u'MN',    u'MO',    u'MS',    u'MT',    u'NC',    u'ND',    u'NE',
          u'NH',    u'NJ',    u'NM',    u'NV',    u'NY',    u'OH',    u'OK',
          u'OR',    u'PA',    u'RI',    u'SC',    u'SD', u'State',    u'TN',
          u'TX',    u'UT',    u'VA',    u'VT',    u'WA',    u'WI',    u'WV',
          u'WY'],
      dtype='object', name=0)
How do I use this index?
Regards.
David 
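An Index of strings is used through loc, and tolist() or a new name make it easier to work with. The label 'State' among the state codes, together with name=0, suggests (an assumption) that the file was read without a header, so the header row became a data row. A minimal sketch with hypothetical totals; the name 'state_code' is also hypothetical:

```python
import pandas as pd

# Hypothetical result of grouping on a column labelled 0 (hence name=0 above).
grouped = pd.DataFrame({"total": [5, 7, 0]},
                       index=pd.Index(["AL", "AR", "State"], name=0))

grouped.index.name = "state_code"  # a more readable name
print(grouped.loc["AR", "total"])  # 7, looked up by label
print(grouped.index.tolist())      # ['AL', 'AR', 'State']

# If 'State' is a header row that was read as data, drop that row.
grouped = grouped.drop("State")
print(grouped.index.tolist())      # ['AL', 'AR']
```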

    On Friday, 13 May 2016, 21:19, David Shi <davidgshi at yahoo.co.uk> wrote:

 Hello, Michael,
I typed in df.index and got the following:
MultiIndex(levels=[[1.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 53.0, 54.0, 55.0, 56.0], [u'AL', u'AR', u'AZ', u'CA', u'CO', u'CT', u'DC', u'DE', u'FL', u'GA', u'IA', u'ID', u'IL', u'IN', u'KS', u'KY', u'LA', u'MA', u'MD', u'ME', u'MI', u'MN', u'MO', u'MS', u'MT', u'NC', u'ND', u'NE', u'NH', u'NJ', u'NM', u'NV', u'NY', u'OH', u'OK', u'OR', u'PA', u'RI', u'SC', u'SD', u'State', u'TN', u'TX', u'UT', u'VA', u'VT', u'WA', u'WI', u'WV', u'WY']],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48], [0, 2, 1, 3, 4, 5, 7, 6, 8, 9, 11, 12, 13, 10, 14, 15, 16, 19, 18, 17, 20, 21, 23, 22, 24, 27, 31, 28, 29, 30, 32, 25, 26, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 45, 44, 46, 48, 47, 49]],
           names=[u'StateFIPS', 0])
Regards.
David 
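A MultiIndex row is addressed with a tuple holding one value per level; get_level_values() pulls out one level, and reset_index() turns the levels back into ordinary columns. A minimal sketch with hypothetical totals; the second level is named 'code' here (an assumption) because a level named 0 is easily confused with level position 0:

```python
import pandas as pd

# Hypothetical two-level index shaped like the one above: (StateFIPS, state code).
idx = pd.MultiIndex.from_tuples([(1.0, "AL"), (4.0, "AZ"), (5.0, "AR")],
                                names=["StateFIPS", "code"])
df = pd.DataFrame({"total": [10, 20, 30]}, index=idx)

print(df.loc[(4.0, "AZ"), "total"])                # 20: one value per level
print(df.index.get_level_values("code").tolist())  # ['AL', 'AZ', 'AR']
print(df.xs("AR", level="code"))                   # rows for one state code

# Or turn both levels back into ordinary columns.
flat = df.reset_index()
print(flat.columns.tolist())  # ['StateFIPS', 'code', 'total']
```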

    On Friday, 13 May 2016, 21:11, David Shi <davidgshi at yahoo.co.uk> wrote:

 Dear Michael,
I have done a number of operations in between, so providing that information would not help you.
What I am interested in is how to reset the index after grouping and various other operations.
What command can I type to see the current DataFrame?
Regards.
David 
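groupby moves the grouping key into the index of the result; reset_index() moves it back into an ordinary column, and as_index=False keeps it as a column in the first place. Printing the frame, or calling head(), shows its current state at any point. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"state": ["AL", "AL", "AR"], "value": [1.5, 2.5, 4.0]})

totals = df.groupby("state").sum()
print(totals.index.name)  # state: the key is now the index

flat = totals.reset_index()                         # key back as a column
flat2 = df.groupby("state", as_index=False).sum()   # key never leaves the columns

print(flat.head())          # inspect the current DataFrame
print(flat.columns.tolist())  # ['state', 'value']
print(flat.equals(flat2))
```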

    On Friday, 13 May 2016, 20:58, Michael Selik <michael.selik at gmail.com> wrote:

 Just in case I misunderstood, why don't you make a little example of before and after the grouping? This mailing list does not accept attachments, so you'll have to make do with pasting a few rows of comma-separated or tab-separated values.
On Fri, May 13, 2016 at 3:56 PM Michael Selik <michael.selik at gmail.com> wrote:

In order to preserve your index after the aggregation, you need to make sure it is considered a data column (via reset_index) and then choose how your aggregation will operate on that column.
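The two steps above can be sketched with the data from the example in this thread, using named aggregation (pandas 0.25 or later). Collecting the former index into a list is one possible choice; the output names labels, first_label and total are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"group": list("AB") * 2, "data": range(4)},
                  index=list("wxyz"))

# Step 1: reset_index turns the index into an ordinary column named 'index'.
# Step 2: state explicitly what the aggregation does with that column.
result = (
    df.reset_index()
      .groupby("group")
      .agg(labels=("index", list),          # keep every original label
           first_label=("index", "first"),  # or only the first one
           total=("data", "sum"))
)
print(result)
```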
On Fri, May 13, 2016 at 3:29 PM David Shi <davidgshi at yahoo.co.uk> wrote:

Hello, Michael,
Why should I call reset_index before grouping?
Regards.
David 

  On Friday, 13 May 2016, 17:57, Michael Selik <michael.selik at gmail.com> wrote:

On Fri, May 13, 2016 at 12:27 PM David Shi via Python-list <python-list at python.org> wrote:

I lost my indexes after grouping in Pandas.
I managed to use reset_index and got back the index column.
But how can I get back an index row?

Was the grouping an aggregation? If so, the original indexes are meaningless. What you could do is reset_index before the grouping and when you aggregate decide how to handle the formerly-known-as-index column (min, max, mean, ?).
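A minimal sketch of that approach, assuming a hypothetical numeric index named row_id: reset_index keeps the index name as the new column name, and a dict passed to agg decides per column how it is aggregated:

```python
import pandas as pd

# Hypothetical data with a meaningful numeric index.
df = pd.DataFrame({"group": ["A", "B", "A", "B"], "data": [1.0, 2.0, 3.0, 4.0]},
                  index=pd.Index([101, 102, 103, 104], name="row_id"))

# The former index becomes the column 'row_id'; keep its min and max per group.
summary = df.reset_index().groupby("group").agg(
    {"row_id": ["min", "max"], "data": "mean"})
print(summary)
```

Because one column gets two aggregations, the result has two-level column labels such as ('row_id', 'min').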