[scikit-learn] Help With Text Classification

pybokeh pybokeh at gmail.com
Wed Aug 2 23:12:36 EDT 2017


Thanks, Joel, for recommending FeatureUnion.  I did run across it, but for
just 2 features I thought it might be overkill.  I am aware of Pipeline,
which the scikit-learn example explains very well, and I was planning to
use it once I finalize my script.  I did not want to abstract away too
much early on, since I am in the beginning stages of learning machine
learning and scikit-learn.
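
For reference, a minimal sketch of the manual two-feature approach, without
Pipeline or FeatureUnion.  It assumes a recent scikit-learn in which
OneHotEncoder accepts string categories directly (so the LabelEncoder step
is dropped).  The column names and toy rows are made up for illustration and
the real notebook may differ.  The point is that every transformer is fitted
once on the training data and only transform() is called on new data, which
is the usual way to keep the test matrix at the same width as the training
matrix:

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import OneHotEncoder

# Toy data; "complaint" and "part" are hypothetical column names.
train = pd.DataFrame({
    "complaint": ["engine makes noise", "brake pedal soft",
                  "engine stalls at idle", "brake squeals loudly"],
    "part": ["A100", "B200", "A100", "B200"],
})
y_train = [1, 0, 1, 0]
test = pd.DataFrame({
    "complaint": ["engine noise at idle", "window will not close"],
    "part": ["A100", "C300"],  # "C300" never appears in the training data
})

count_vect = CountVectorizer()
tfidf = TfidfTransformer()
# handle_unknown="ignore" encodes unseen part numbers as an all-zero row
# instead of raising an error at prediction time.
onehot = OneHotEncoder(handle_unknown="ignore")

# Fit each transformer on the training data only.
X_text_train = tfidf.fit_transform(count_vect.fit_transform(train["complaint"]))
X_part_train = onehot.fit_transform(train[["part"]])
X_train = hstack([X_text_train, X_part_train]).tocsr()

clf = MultinomialNB().fit(X_train, y_train)

# Reuse the fitted transformers: transform(), not fit_transform().
X_text_test = tfidf.transform(count_vect.transform(test["complaint"]))
X_part_test = onehot.transform(test[["part"]])
X_test = hstack([X_text_test, X_part_test]).tocsr()

pred = clf.predict(X_test)
```

Calling fit_transform() again on the test rows would rebuild the vocabulary
and the category list from the test data, so the two matrices would generally
end up with different numbers of columns.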

- Daniel

On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman <joel.nothman at gmail.com>
wrote:

> Use a Pipeline to help avoid this kind of issue (and others). You might
> also want to do something like
> http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html
>
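
A rough sketch of what the Pipeline plus FeatureUnion suggestion could look
like for these two features.  This is not the code from the linked example:
the column names, the helper functions select_text and select_part, and the
toy rows are assumptions made for illustration.  TfidfVectorizer stands in
for CountVectorizer followed by TfidfTransformer, and a recent scikit-learn
is assumed so that OneHotEncoder can take string categories directly:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder


def select_text(df):
    # Hypothetical helper: return the free-text column as a 1-D Series.
    return df["complaint"]


def select_part(df):
    # Hypothetical helper: return the part number as a one-column DataFrame.
    return df[["part"]]


features = FeatureUnion([
    ("text", Pipeline([
        ("select", FunctionTransformer(select_text, validate=False)),
        ("tfidf", TfidfVectorizer()),
    ])),
    ("part", Pipeline([
        ("select", FunctionTransformer(select_part, validate=False)),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])),
])

model = Pipeline([("features", features), ("clf", MultinomialNB())])

# Toy data for illustration only.
train = pd.DataFrame({
    "complaint": ["engine makes noise", "brake pedal soft",
                  "engine stalls at idle", "brake squeals loudly"],
    "part": ["A100", "B200", "A100", "B200"],
})
y_train = [1, 0, 1, 0]
test = pd.DataFrame({
    "complaint": ["engine noise at idle", "window will not close"],
    "part": ["A100", "C300"],
})

model.fit(train, y_train)    # fits every transformer on the training data
pred = model.predict(test)   # applies transform() only, then predicts
```

Because fit() and predict() go through the same fitted steps, the feature
matrix built at prediction time has the same columns as the one used for
training.
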
> On 3 August 2017 at 12:01, pybokeh <pybokeh at gmail.com> wrote:
>
>> Hello,
>> I am studying this example from scikit-learn's site:
>> http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
>>
>> The problem that I need to solve is very similar to this example, except
>> I have one additional feature column (part #) that is categorical, of
>> type string.  My label or target values consist of just 2 values: 0 or 1.
>>
>> I am transforming that additional feature column with a LabelEncoder and
>> then encoding it with the OneHotEncoder.
>>
>> Then I am concatenating that one-hot encoded column (part #) with the
>> text/document feature column (complaint), to which I had applied the
>> CountVectorizer and TfidfTransformer transformations.
>>
>> Then I chose the MultinomialNB model and fit it on my concatenated
>> training data.
>>
>> The problem I run into is that when I invoke the prediction, I get a
>> dimension mismatch error.
>>
>> Here's my jupyter notebook gist:
>> http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85ef86ba41424b311
>>
>> I would greatly appreciate it if someone could show me where I went wrong.
>> Thanks!
>>
>> - Daniel
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>