[scikit-learn] Predict Method of OneVsRestClassifier Integration with Google Cloud ML

Sebastian Raschka mail at sebastianraschka.com
Wed Apr 10 13:35:07 EDT 2019


Hi Liam,

I'm not sure what your exact error message is, but it may also be that the XGBClassifier only accepts dense arrays. The TfidfVectorizer returns a SciPy sparse matrix. You could probably fix your issue by inserting a "DenseTransformer" into your pipeline (a simple class that just converts an array from sparse to dense format). I've implemented something like that, which you can import or copy and paste from here:

https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py

The usage would then basically be

model = Pipeline([('tfidf', TfidfVectorizer()), ('to_dense', DenseTransformer()), ('clf', OneVsRestClassifier(XGBClassifier()))])
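In case it helps, here is a minimal sketch of what such a transformer might look like. It is not the exact mlxtend implementation, just the same idea written out, and it assumes NumPy, SciPy, and scikit-learn are installed:

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.base import BaseEstimator, TransformerMixin


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Convert a SciPy sparse matrix to a dense NumPy array.

    Illustrative sketch only; dense input is passed through unchanged.
    """

    def fit(self, X, y=None):
        # Nothing to learn; this exists only to satisfy the transformer API.
        return self

    def transform(self, X):
        # Densify sparse input; otherwise just make sure we return an ndarray.
        return X.toarray() if issparse(X) else np.asarray(X)


# Small check on a toy sparse matrix
X_sparse = csr_matrix([[0, 1], [2, 0]])
X_dense = DenseTransformer().fit_transform(X_sparse)
print(type(X_dense).__name__, X_dense.shape)
```

Keep in mind that densifying a large TF-IDF matrix can use a lot of memory, so this is mostly practical for small to medium vocabularies.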

Best,
Sebastian




> On Apr 10, 2019, at 12:25 PM, Liam Geron <liam at chatdesk.com> wrote:
> 
> Hi all,
> 
> I was hoping to get some guidance on changing the result of the predict method of the OneVsRestClassifier to return a dense array rather than a sparse array, given that Google Cloud ML only accepts dense numpy arrays as the result of a given model's predict method. Right now my model architecture looks like:
> 
> model = Pipeline([('tfidf', TfidfVectorizer()), ('clf', OneVsRestClassifier(XGBClassifier()))])
> 
> Which returns a sparse array with the predict method. I saw the Stack Overflow post here: https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba
> 
> which recommends overriding the predict method with the predict_proba method; however, I found that I can't serialize the model after doing so. I also have a Stack Overflow post here: https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a which details the specific pickling error.
> 
> Is this a known issue? Is there an accepted way to convert this into a dense array?
> 
> Thanks,
> Liam Geron
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn


