Sentiment analysis using sklearn

Dan Stromberg drsalists at gmail.com
Sat Jan 27 20:20:46 EST 2018


On Sat, Jan 27, 2018 at 1:05 PM, qrious <mittra at juno.com> wrote:
> I am attempting to understand how scikit learn works for sentiment analysis and came across this blog post:
>
> https://marcobonzanini.wordpress.com/2015/01/19/sentiment-analysis-with-python-and-scikit-learn
>
> The corresponding code is at this location:
>
> https://gist.github.com/bonzanini/c9248a239bbab0e0d42e
>
> My question is while trying to predict, why does the curr_class in Line 44 of the code need a classification (pos or neg) for the test data? After all, am I not trying to predict it? Without any initial value of curr_class, the program has a run time error.

I'm a real neophyte when it comes to modern AI, but I believe the
intent is to divide your inputs into three groups: "training data",
"test data" and "real world data".

So you create your models using training data including correct
classifications as part of the input.

And you check how well your models are doing on inputs they haven't
seen before with test data, which is also classified in advance so
that the predictions can be compared against the known answers.

And then you use real world, as-yet-unclassified data in production,
after you've selected your best model, to derive a classification from
what your model has seen in the past.

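So both the training data and test data need accurate labels in
advance, but with real world data you trust the model to do pretty
well without further labeling.

If I'm reading the gist right, that is why curr_class is recorded for
the test data: it isn't an input to the prediction, it's the known
answer the predictions get compared against afterwards.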
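
Here is a rough sketch of that three-way split using scikit-learn. It
is not the code from the gist: the toy reviews, the variable names and
the choice of LinearSVC with accuracy_score are my own assumptions,
picked only to keep the example short. The point is that test_labels
is never passed to predict(); it is only used afterwards to score the
predictions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

# Made-up toy reviews, purely for illustration (not the blog's dataset).
train_texts = [
    "a wonderful, moving film",
    "great acting and a great story",
    "I loved every minute",
    "terrible plot and awful acting",
    "boring, a waste of time",
    "I hated this movie",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Test data: labeled in advance, but never shown to the model during fit().
test_texts = ["a great and wonderful story", "awful, boring movie"]
test_labels = ["pos", "neg"]

# "Real world" data: no labels exist, so the model's output is all we have.
real_world_texts = ["I loved the acting"]

# Learn the vocabulary from the training data only, then reuse it.
vectorizer = TfidfVectorizer()
train_vectors = vectorizer.fit_transform(train_texts)
test_vectors = vectorizer.transform(test_texts)

# Training uses the texts AND their correct labels.
classifier = LinearSVC()
classifier.fit(train_vectors, train_labels)

# Prediction takes only the texts; the test labels are not an input.
test_predictions = classifier.predict(test_vectors)

# The test labels come in here, to measure how good the predictions were.
test_accuracy = accuracy_score(test_labels, test_predictions)
print("accuracy on test data:", test_accuracy)

# In production there is nothing to compare against.
real_world_predictions = classifier.predict(
    vectorizer.transform(real_world_texts)
)
print("real world predictions:", list(real_world_predictions))
```

If you drop the test labels, the predict() call still works; what you
lose is the ability to measure how well the model is doing.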


