Combining 2 data series into one

Bhaskar Dhariyal dhariyalbhaskar at gmail.com
Sat Jul 1 06:24:05 EDT 2017


Thanks Albert!
I have successfully completed the project. Thanks all for your support.

On Sat, Jul 1, 2017 at 1:59 PM, Albert-Jan Roskam <sjeik_appie at hotmail.com>
wrote:

> Hi,
>
> Does your code run on a sample of the data?
> Does your code have categorical data in it? If so:
> https://pandas.pydata.org/pandas-docs/stable/categorical.html. Also,
> check out http://www.pytables.org.
>
> Albert-Jan
> ------------------------------
> *From:* Python-list <python-list-bounces+sjeik_appie=
> hotmail.com at python.org> on behalf of Bhaskar Dhariyal <
> dhariyalbhaskar at gmail.com>
> *Sent:* Thursday, June 29, 2017 4:34:56 AM
> *To:* python-list at python.org
> *Subject:* Re: Combining 2 data series into one
>
> On Wednesday, 28 June 2017 23:43:57 UTC+5:30, Albert-Jan Roskam  wrote:
> > (sorry for top posting)
> > Yes, I'd try pd.concat([df1, df2]).
> > Or this:
> > df['both_names'] = df.apply(lambda row: row['name'] + ' ' + row['surname'], axis=1)
> > ________________________________
> > From: Python-list <python-list-bounces+sjeik_appie=
> hotmail.com at python.org> on behalf of Paul Barry <
> paul.james.barry at gmail.com>
> > Sent: Wednesday, June 28, 2017 12:30:25 PM
> > To: Bhaskar Dhariyal
> > Cc: python-list at python.org
> > Subject: Re: Combining 2 data series into one
> >
> > Maybe look at using .concat instead of +
> >
> > See:
> > http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.06-Concat-And-Append.ipynb
> >
> > On 28 June 2017 at 13:02, Paul Barry <paul.james.barry at gmail.com> wrote:
> >
> > >
> > > Maybe try your code on a sub-set of your data - perhaps 1000 lines of
> > > data? - to see if that works.
> > >
> > > Anyone else on the list suggest anything to try here?
> > >
> > > On 28 June 2017 at 12:50, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
> > > wrote:
> > >
> > >> No, it didn't work. I am getting a memory error, on a system with 32GB of RAM.
> > >>
> > >> On Wed, Jun 28, 2017 at 5:17 PM, Paul Barry <paul.james.barry at gmail.com>
> > >> wrote:
> > >>
> > >>> On the line that's failing, your code is this:
> > >>>
> > >>>     combinedX=combinedX+dframe['tf']
> > >>>
> > >>> which uses combinedX on both sides of the assignment statement - note
> > >>> that Python is reporting a 'MemoryError', which may be happening due to
> > >>> this "double use" (maybe).  What happens if you create a new dataframe,
> > >>> like this:
> > >>>
> > >>>     newX = combinedX + dframe['tf']
> > >>>
> > >>> Regardless, it looks like you are doing a dataframe merge.  Jake V's
> > >>> book has an excellent section on it here:
> > >>> http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.07-Merge-and-Join.ipynb
> > >>> - this should take about 20 minutes to read, and may be of use to you.
> > >>>
> > >>> Paul.
> > >>>
> > >>>
> > >>>
> > >>> On 28 June 2017 at 12:19, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
> > >>> wrote:
> > >>>
> > >>>> On Wednesday, 28 June 2017 14:43:48 UTC+5:30, Paul Barry  wrote:
> > >>>> > This should do it:
> > >>>> >
> > >>>> > >>> import pandas as pd
> > >>>> > >>>
> > >>>> > >>> df1 = pd.DataFrame(['bhaskar', 'Rohit'], columns=['first_name'])
> > >>>> > >>> df1
> > >>>> >   first_name
> > >>>> > 0    bhaskar
> > >>>> > 1      Rohit
> > >>>> > >>> df2 = pd.DataFrame(['dhariyal', 'Gavval'], columns=['last_name'])
> > >>>> > >>> df2
> > >>>> >   last_name
> > >>>> > 0  dhariyal
> > >>>> > 1    Gavval
> > >>>> > >>> df = pd.DataFrame()
> > >>>> > >>> df['name'] = df1['first_name'] + ' ' + df2['last_name']
> > >>>> > >>> df
> > >>>> >                name
> > >>>> > 0  bhaskar dhariyal
> > >>>> > 1      Rohit Gavval
> > >>>> > >>>
> > >>>> >
> > >>>> > Again, I draw your attention to Jake VanderPlas's excellent book, which is
> > >>>> > available for free on the web.  All of these kinds of data manipulations are
> > >>>> > covered there:  https://github.com/jakevdp/PythonDataScienceHandbook - the
> > >>>> > hard copy is worth owning too (if you plan to do a lot of work using
> > >>>> > numpy/pandas).
> > >>>> >
> > >>>> > I'd also recommend the upcoming 2nd edition of Wes McKinney's "Python for
> > >>>> > Data Analysis" book - I've just finished tech reviewing it for O'Reilly,
> > >>>> > and it is very good, too - highly recommended.
> > >>>> >
> > >>>> > Regards.
> > >>>> >
> > >>>> > Paul.
> > >>>> >
> > >>>> > On 28 June 2017 at 07:11, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
> > >>>> > wrote:
> > >>>> >
> > >>>> > > Hi!
> > >>>> > >
> > >>>> > > I have 2 dataframes, i.e. df1['first_name'] and df2['last_name']. I want to
> > >>>> > > combine them into df['name']. How can I do this with pandas?
> > >>>> > >
> > >>>> > > first_name
> > >>>> > > ----------
> > >>>> > > bhaskar
> > >>>> > > Rohit
> > >>>> > >
> > >>>> > >
> > >>>> > > last_name
> > >>>> > > -----------
> > >>>> > > dhariyal
> > >>>> > > Gavval
> > >>>> > >
> > >>>> > > should appear as
> > >>>> > >
> > >>>> > > name
> > >>>> > > ----------
> > >>>> > > bhaskar dhariyal
> > >>>> > > Rohit Gavval
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > Thanks
> > >>>> > > --
> > >>>> > > https://mail.python.org/mailman/listinfo/python-list
> > >>>> > >
> > >>>> >
> > >>>> >
> > >>>> >
> > >>>> > --
> > >>>> > Paul Barry, t: @barrypj <https://twitter.com/barrypj> - w:
> > >>>> > http://paulbarry.itcarlow.ie - e: paul.barry at itcarlow.ie
> > >>>> > Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.
> > >>>>
> > >>>> https://drive.google.com/open?id=0Bw2Avni0DUa3aFJKdC1Xd2trM2c
> > >>>> link to code
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> > >
> >
> >
> >
>
> Hi Albert!
> Thanks for replying.
> That issue was resolved, but I'm stuck on a new problem.
> I generated a tf-idf representation for a pandas dataframe where each row
> contains some text. I also had some numerical features which I wanted to
> combine with the tf-idf matrix, but this gives a memory error.
>
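
The failing code for the tf-idf problem quoted above is not shown in the
thread, so the following is only a sketch under assumptions. One common cause
of a memory error in this situation is converting the sparse tf-idf matrix to
a dense array or DataFrame before combining it with the numeric columns. If
that is what happened, keeping everything sparse and stacking column-wise with
scipy.sparse.hstack may avoid it. The names `tfidf` and `numeric` and all the
values below are made up for illustration; a real tf-idf matrix would come
from a vectorizer.

```python
import numpy as np
from scipy import sparse

# Stand-in for a tf-idf matrix: 4 documents x 6 terms, mostly zeros.
# (Hypothetical values; a real matrix would come from a vectorizer.)
tfidf = sparse.csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.0, 0.8, 0.0],
    [0.3, 0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.6, 0.0, 0.0],
]))

# Hypothetical numeric features, one row per document.
numeric = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
])

# Wrap the dense block as sparse and stack the two blocks side by side.
# The result is still a sparse matrix; nothing is expanded to dense.
combined = sparse.hstack([tfidf, sparse.csr_matrix(numeric)], format="csr")

print(combined.shape)
```

The row counts of the two blocks must match. Whether this helps depends on
where the memory error actually occurred, which the thread does not say.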


