Introducing NimbusML - experimental Python bindings for ML.NET

Gani Nazirov ganaziro at microsoft.com
Fri Nov 2 17:38:27 EDT 2018


We are excited to announce that yesterday we released and open sourced NimbusML<https://github.com/Microsoft/NimbusML>! This project provides experimental Python bindings for ML.NET<https://github.com/dotnet/machinelearning>, an open-source, cross-platform machine learning framework for .NET.


NimbusML lets you build ML.NET pipelines in Python and also use its components inside Scikit-learn pipelines.



Highlights



·         Cross-platform: NimbusML is supported on Mac, Linux, and Windows.

·         Efficient interop with Scikit-learn/Pandas: NimbusML accepts Pandas dataframes as input, and its components can be used within Scikit-learn pipelines.

·         Most ML.NET components are available: they can be used through NimbusML.

·         Performance parity with ML.NET: When using only NimbusML components (loaders, transforms, scorers, and evaluators), NimbusML performance matches ML.NET performance.

·         Familiar APIs for Scikit-learn users: NimbusML follows existing Scikit-learn conventions but also introduces some new concepts, such as how to work with multiple columns in a pipeline.

·         Open-source: NimbusML will be developed in the open, and we encourage you to post any non-confidential issues or questions on GitHub. Please let us know if you are interested in contributing.

·         Interop with ML.NET models: models trained in NimbusML can be deployed in .NET applications using ML.NET (see here<https://docs.microsoft.com/en-us/NimbusML/loadsavemodels> for an example).



Click here to view the NimbusML repo<https://github.com/Microsoft/NimbusML>.

Click here to view the NimbusML samples<https://github.com/Microsoft/NimbusML-samples>.

Click here to view the NimbusML docs<https://docs.microsoft.com/en-us/NimbusML/overview>.



Installation



NimbusML can be installed using pip:

pip install nimbusml




You can run a quick test with:

python -m nimbusml.examples.FastLinearClassifier




NimbusML has been tested on Windows 10, macOS 10.13, Ubuntu 14.04, Ubuntu 16.04, Ubuntu 18.04, CentOS 7, and RHEL 7.



NimbusML requires Python 2.7, 3.5, or 3.6 (64 bit). Python 3.7 is not supported yet.
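Before running `pip install nimbusml`, a small preflight check can confirm that the interpreter matches the requirements stated above (Python 2.7, 3.5, or 3.6, 64 bit). This is only a sketch based on those stated requirements; `is_supported` and `SUPPORTED_VERSIONS` are hypothetical names, not part of NimbusML, and the supported set may change in later releases.

```python
import struct
import sys

# Versions listed as supported in this announcement (64-bit interpreters only).
# Hypothetical constant: update it if later NimbusML releases change the list.
SUPPORTED_VERSIONS = {(2, 7), (3, 5), (3, 6)}


def is_supported(version_info=None, pointer_bits=None):
    """Return True if the interpreter matches the stated NimbusML requirements."""
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit interpreter.
        pointer_bits = struct.calcsize("P") * 8
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS and pointer_bits == 64


if __name__ == "__main__":
    bits = struct.calcsize("P") * 8
    if is_supported():
        print("Interpreter looks supported; next step: pip install nimbusml")
    else:
        print("Unsupported interpreter: Python %d.%d, %d-bit"
              % (sys.version_info[0], sys.version_info[1], bits))
```

The check takes optional arguments so it can be exercised without switching interpreters; called with no arguments, it inspects the running Python.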



Getting Started



Documentation can be found here<https://docs.microsoft.com/en-us/NimbusML/overview>. Sample notebooks can be found here<https://github.com/Microsoft/NimbusML-Samples>. A few examples:



·         Twitter Sentiment Analysis<https://docs.microsoft.com/en-us/NimbusML/tutorials/b_b-sentiment-analysis-2-data-streaming-with-filedatastream>

·         Ranking with LightGBM<https://docs.microsoft.com/en-us/NimbusML/tutorials/b_e-learning-to-rank-with-microsoft-bing-data>

·         Image clustering using a TensorFlow model<https://docs.microsoft.com/en-us/NimbusML/tutorials/b_f-image-processing-clustering>

·         Binary classification with Logistic Regression<https://docs.microsoft.com/en-us/python/api/nimbusml/nimbusml.linear_model.logisticregressionbinaryclassifier?view=nimbusml-py-latest>

·         Save and load models (and use NimbusML models in ML.NET)<https://docs.microsoft.com/en-us/nimbusml/loadsavemodels?view=nimbusml-py-latest>



Sentiment analysis example with NimbusML components:



from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from nimbusml.feature_extraction.text import NGramFeaturizer

# paths to the sample Twitter sentiment data sets provided by nimbusml.datasets
train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

# read the tab-separated files as FileDataStream objects (streamed, not loaded into pandas)
train_data = FileDataStream.read_csv(train_file, sep='\t')
test_data = FileDataStream.read_csv(test_file, sep='\t')

pipeline = Pipeline([ # nimbusml pipeline
    # featurize the 'Text' column into a new 'Features' column
    NGramFeaturizer(columns={'Features': ['Text']}),
    FastTreesBinaryClassifier(feature=['Features'], label='Label')
])

# fit and predict
pipeline.fit(train_data)
results = pipeline.predict(test_data)




A complete notebook for this example can be found here<https://github.com/Microsoft/NimbusML-Samples/blob/master/samples/2.2%20%5BText%5D%20Sentiment%20Analysis%202%20-%20Data%20Streaming%20with%20FileDataStream.ipynb>.

Sentiment analysis example with NimbusML + Scikit-learn components:

from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# paths to the sample Twitter sentiment data sets provided by nimbusml.datasets
train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

# load the tab-separated files into pandas DataFrames
train_data = pd.read_csv(train_file, sep='\t')
test_data = pd.read_csv(test_file, sep='\t')

pipeline = Pipeline([ # sklearn pipeline
    ('tfidf', TfidfVectorizer()), # sklearn transform
    ('clf', FastTreesBinaryClassifier()) # nimbusml learner
])

# fit on the raw text and labels, then predict labels for the test text
pipeline.fit(train_data["Text"], train_data["Label"])
results = pipeline.predict(test_data["Text"])




A complete notebook for this example can be found here<https://github.com/Microsoft/NimbusML-Samples/blob/master/samples/2.3%20%5BText%5D%20Sentiment%20Analysis%203%20-%20Combining%20NimbusML%20and%20Scikit-learn.ipynb>.
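Neither example scores its predictions. As a quick sanity check, accuracy can be computed in plain Python. This is a minimal sketch assuming that `results` holds one predicted label per test row, encoded the same way as the `Label` column; `accuracy` is a hypothetical helper, not a NimbusML function. If `predict` returns a DataFrame rather than a flat sequence of labels, select its predicted-label column first. For pipelines built only from NimbusML components, the NimbusML evaluators mentioned in the highlights are likely the better tool.

```python
def accuracy(predicted, actual):
    """Fraction of rows where the predicted label equals the true label.

    Hypothetical helper: assumes both sequences are aligned row by row
    and use the same label values.
    """
    predicted = list(predicted)
    actual = list(actual)
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("nothing to score")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return float(matches) / len(predicted)
```

With the Scikit-learn example, usage would be `print(accuracy(results, test_data["Label"]))`. Checking lengths explicitly avoids `zip` silently dropping unmatched rows.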



Thank you!



-ML.NET Team



More information about the Python-announce-list mailing list