Fwd: Unable to convert pandas object to string

Paul Barry paul.james.barry at gmail.com
Sat Jun 24 07:31:48 EDT 2017


Forgot to include this reply to the list (as others may want to comment).

---------- Forwarded message ----------
From: Paul Barry <paul.james.barry at gmail.com>
Date: 24 June 2017 at 12:21
Subject: Re: Unable to convert pandas object to string
To: Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>


Note that .info(), according to its docs, gives you a "Concise summary of a
DataFrame".  Everything is an object in Python, including strings, so the
output from .info() is technically correct (but maybe not very helpful in
your case).
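
To illustrate the point about .info(), here is a minimal sketch. The sample values are made up (they are not taken from train.csv), and the column is built with dtype=object on purpose so that it mirrors the 2017 output; newer pandas releases may infer a dedicated string dtype by default.

```python
import pandas as pd

# Hypothetical stand-in for the "desc" column; not the real data.
df = pd.DataFrame({
    "desc": pd.Series(
        ["a short project description", "another description"],
        dtype=object,  # explicit, to mirror the dtype reported by .info()
    ),
    "goal": [20.0, 300.0],
})

# The column as a whole is reported with the generic "object" dtype ...
desc_dtype = df["desc"].dtype

# ... but each individual value is still an ordinary Python str.
first_type = type(df["desc"].iloc[0])

print(desc_dtype)
print(first_type)
```

So an "object" column can already hold strings, and calling astype(str) on it would not change what .info() reports here.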

As I've shown, we can work out that the data you want to work with is in
fact a string, so I've added some code to my notebook to show you how to
tokenize the first row of data.  This should get you started on doing this
to the rest of your data.
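
The attached notebook is not part of the archive, so the following is only a sketch of what tokenizing the first row could look like. It assumes plain lowercasing and whitespace splitting, which may differ from what the notebook does, and the sample rows are invented.

```python
import pandas as pd

# Hypothetical sample rows; the real data lives in train.csv.
df = pd.DataFrame({
    "desc": pd.Series(
        ["Help us build a community garden", "A new album"],
        dtype=object,
    ),
})

# Tokenize only the first row: pull out the str, lowercase it, split on whitespace.
first_desc = df["desc"].iloc[0]
tokens = first_desc.lower().split()
print(tokens)
# → ['help', 'us', 'build', 'a', 'community', 'garden']

# The same idea applied to every row, using the vectorized .str accessor.
all_tokens = df["desc"].str.lower().str.split()
```

A dedicated tokenizer (for example from NLTK) could replace split() if punctuation handling matters.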

Note, too, that some of the data in these specific columns contains
something other than a string, so you'll need to clean that up first (see
the end of the updated notebook, attached, for how I worked out that this
was indeed the case).
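
One way to find such non-string values is sketched below. This is an assumption about the approach, not a copy of the notebook, and the sample values (including the stray numbers) are invented for illustration.

```python
import pandas as pd

# Hypothetical column mixing strings with other values.
df = pd.DataFrame({
    "desc": pd.Series(
        ["A board game", 3.14, "Help fund my film", 42],
        dtype=object,
    ),
})

# True where the value is a real Python str, False otherwise.
is_text = df["desc"].map(lambda value: isinstance(value, str))

# Rows that need attention before tokenizing.
bad_rows = df[~is_text]
print(bad_rows)

# One possible clean-up: keep only the rows that hold strings.
clean = df[is_text].copy()
```

Whether to drop the offending rows or convert them with str() depends on what the odd values turn out to be.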

I hope this all helps.

Paul.



On 24 June 2017 at 11:31, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
wrote:

> The data type shown there is object (see In[4] on the first page). I wanted
> to tokenize the name & desc columns and clean them.
>
>
> On Sat, Jun 24, 2017 at 3:54 PM, Paul Barry <paul.james.barry at gmail.com>
> wrote:
>
>> Hi Bhaskar.
>>
>> Please see attached PDF of a small Jupyter notebook.  As you'll see, the
>> data in the fields you mentioned are *already* strings.  What is it you are
>> trying to do here?
>>
>> Paul.
>>
>> On 24 June 2017 at 10:51, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
>> wrote:
>>
>>> train.csv
>>> <https://drive.google.com/file/d/0B1D4AyluMGU0enoxbElGTV94Q0E/view?usp=drive_web>
>>> Here it is. Thanks for the quick reply.
>>>
>>> On Sat, Jun 24, 2017 at 3:14 PM, Paul Barry <paul.james.barry at gmail.com>
>>> wrote:
>>>
>>>> Any chance you could post one line of data so we can see what we have
>>>> to work with?
>>>>
>>>> Also - have you taken a look at Jake VanderPlas's notebooks? There's a
>>>> lot of help with pandas to be found there:
>>>> https://github.com/jakevdp/PythonDataScienceHandbook
>>>>
>>>> Paul.
>>>>
>>>> On 24 June 2017 at 10:32, Bhaskar Dhariyal <dhariyalbhaskar at gmail.com>
>>>> wrote:
>>>>
>>>>> <class 'pandas.core.frame.DataFrame'>
>>>>> Int64Index: 171594 entries, 0 to 63464
>>>>> Data columns (total 7 columns):
>>>>> project_id          171594 non-null object
>>>>> desc                171594 non-null object
>>>>> goal                171594 non-null float64
>>>>> keywords            171594 non-null object
>>>>> diff_creat_laun     171594 non-null int64
>>>>> diff_laun_status    171594 non-null int64
>>>>> diff_status_dead    171594 non-null int64
>>>>> dtypes: float64(1), int64(3), object(3)
>>>>>
>>>>> I am not able to convert desc and keywords to strings for preprocessing.
>>>>> I tried astype(str). Please help.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Paul Barry, t: @barrypj <https://twitter.com/barrypj> - w:
>>>> http://paulbarry.itcarlow.ie - e: paul.barry at itcarlow.ie
>>>> Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.
>>>>
>>>
>>>
>>
>>
>>
>
>


-- 
Paul Barry, t: @barrypj <https://twitter.com/barrypj> - w:
http://paulbarry.itcarlow.ie - e: paul.barry at itcarlow.ie
Lecturer, Computer Networking: Institute of Technology, Carlow, Ireland.





