Combining 2 data series into one

Bhaskar Dhariyal dhariyalbhaskar at gmail.com
Thu Jun 29 00:34:56 EDT 2017


On Wednesday, 28 June 2017 23:43:57 UTC+5:30, Albert-Jan Roskam  wrote:
> (sorry for top posting)
> Yes, I'd try pd.concat([df1, df2]).
> Or this:
> df['both_names'] = df.apply(lambda row: row['name'] + ' ' + row['surname'], axis=1)
> ________________________________
> From: Python-list <python-list-bounces+sjeik_appie=hotmail.com at python.org> on behalf of Paul Barry <paul.james.barry at gmail.com>
> Sent: Wednesday, June 28, 2017 12:30:25 PM
> To: Bhaskar Dhariyal
> Cc: python-list at python.org
> Subject: Re: Combining 2 data series into one
> 
> Maybe look at using .concat instead of +
> 
> See:
> http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.06-Concat-And-Append.ipynb
> 
> On 28 June 2017 at 13:02, Paul Barry <paul.james.barry at gmail.com> wrote:
> 
> >
> > Maybe try your code on a sub-set of your data - perhaps 1000 lines of
> > data? - to see if that works.
> >
> > Anyone else on the list suggest anything to try here?
> >
> > On 28 June 2017 at 12:50, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
> > wrote:
> >
> >> No, it didn't work. I am getting a memory error, on a system with 32GB of RAM.
> >>
> >> On Wed, Jun 28, 2017 at 5:17 PM, Paul Barry <paul.james.barry at gmail.com>
> >> wrote:
> >>
> >>> On the line that's failing, your code is this:
> >>>
> >>>     combinedX=combinedX+dframe['tf']
> >>>
> >>> which uses combinedX on both sides of the assignment statement - note
> >>> that Python is reporting a 'MemoryError', which may be happening due to
> >>> this "double use" (maybe).  What happens if you create a new dataframe,
> >>> like this:
> >>>
> >>>     newX = combinedX + dframe['tf']
> >>>
> >>> Regardless, it looks like you are doing a dataframe merge.  Jake V's
> >>> book has an excellent section on it here:
> >>> http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.07-Merge-and-Join.ipynb
> >>> - this should take about 20 minutes to read, and may be of use to you.
> >>>
> >>> Paul.
> >>>
> >>>
> >>>
> >>> On 28 June 2017 at 12:19, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
> >>> wrote:
> >>>
> >>>> On Wednesday, 28 June 2017 14:43:48 UTC+5:30, Paul Barry  wrote:
> >>>> > This should do it:
> >>>> >
> >>>> > >>> import pandas as pd
> >>>> > >>>
> >>>> > >>> df1 = pd.DataFrame(['bhaskar', 'Rohit'], columns=['first_name'])
> >>>> > >>> df1
> >>>> >   first_name
> >>>> > 0    bhaskar
> >>>> > 1      Rohit
> >>>> > >>> df2 = pd.DataFrame(['dhariyal', 'Gavval'], columns=['last_name'])
> >>>> > >>> df2
> >>>> >   last_name
> >>>> > 0  dhariyal
> >>>> > 1    Gavval
> >>>> > >>> df = pd.DataFrame()
> >>>> > >>> df['name'] = df1['first_name'] + ' ' + df2['last_name']
> >>>> > >>> df
> >>>> >                name
> >>>> > 0  bhaskar dhariyal
> >>>> > 1      Rohit Gavval
> >>>> > >>>
> >>>> >
> >>>> > Again, I draw your attention to Jake VanderPlas's excellent book, which is
> >>>> > available for free on the web.  All of these kinds of data manipulations are
> >>>> > covered there:  https://github.com/jakevdp/PythonDataScienceHandbook - the
> >>>> > hard copy is worth owning too (if you plan to do a lot of work using
> >>>> > numpy/pandas).
> >>>> >
> >>>> > I'd also recommend the upcoming 2nd edition of Wes McKinney's "Python for
> >>>> > Data Analysis" book - I've just finished tech reviewing it for O'Reilly,
> >>>> > and it is very good, too - highly recommended.
> >>>> >
> >>>> > Regards.
> >>>> >
> >>>> > Paul.
> >>>> >
> >>>> > On 28 June 2017 at 07:11, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
> >>>> > wrote:
> >>>> >
> >>>> > > Hi!
> >>>> > >
> >>>> > > I have 2 dataframes, i.e. df1['first_name'] and df2['last_name']. I want to
> >>>> > > combine them into df['name']. How can I do this using a pandas dataframe?
> >>>> > >
> >>>> > > first_name
> >>>> > > ----------
> >>>> > > bhaskar
> >>>> > > Rohit
> >>>> > >
> >>>> > >
> >>>> > > last_name
> >>>> > > -----------
> >>>> > > dhariyal
> >>>> > > Gavval
> >>>> > >
> >>>> > > should appear as
> >>>> > >
> >>>> > > name
> >>>> > > ----------
> >>>> > > bhaskar dhariyal
> >>>> > > Rohit Gavval
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > > Thanks
> >>>> > > --
> >>>> > > https://mail.python.org/mailman/listinfo/python-list
> >>>> > >
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > Paul Barry, t: @barrypj <https://twitter.com/barrypj> - w:
> >>>> > http://paulbarry.itcarlow.ie - e: paul.barry at itcarlow.ie
> >>>> > Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.
> >>>>
> >>>> https://drive.google.com/open?id=0Bw2Avni0DUa3aFJKdC1Xd2trM2c
> >>>> link to code
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
> 
> 
> 

Hi Albert!
Thanks for replying.
That issue was resolved, but I'm stuck on a new problem.
I generated a TF-IDF representation for a pandas dataframe where each row contains some text. I also have some numerical features that I want to combine with the TF-IDF matrix, but doing so gives a memory error.
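
For illustration, here is a minimal sketch of one way such a combination might be done without running out of memory. It assumes the TF-IDF matrix comes from scikit-learn's TfidfVectorizer (which returns a scipy sparse matrix) and that the memory error arises when that matrix is turned into a dense array or added to a dense pandas object; the actual code is only in the linked file, so this is a guess at the cause. The column names ("text", "length", "score") and the sample rows are made up for the example.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical data: one text column plus two numeric feature columns.
dframe = pd.DataFrame({
    "text": ["the cat sat", "the dog barked", "a cat and a dog"],
    "length": [11, 14, 15],
    "score": [0.5, 1.5, 2.0],
})

# fit_transform returns a scipy sparse matrix; leave it sparse.
tfidf = TfidfVectorizer().fit_transform(dframe["text"])

# Convert only the (small) numeric columns to a sparse matrix.
numeric = sparse.csr_matrix(dframe[["length", "score"]].to_numpy(dtype=float))

# Stack column-wise: the numeric features become the last two columns,
# and no dense copy of the TF-IDF matrix is ever created.
combined = sparse.hstack([tfidf, numeric], format="csr")
print(combined.shape)
```

Most scikit-learn estimators accept a sparse matrix like `combined` directly, so with this approach there is usually no need to call .toarray() or to store the TF-IDF values in a dataframe column.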


