Analyze sentiment using the ML.NET CLI - ML.NET (original) (raw)

Learn how to use ML.NET CLI to automatically generate an ML.NET model and underlying C# code. You provide your dataset and the machine learning task you want to implement, and the CLI uses the AutoML engine to create model generation and deployment source code, as well as the classification model.

In this tutorial, you will:

  • Prepare your data for the machine learning task.
  • Run the 'mlnet classification' command.
  • Explore the generated C# code to make predictions with the model.
  • Explore the generated C# code that was used to train the model.

Note

This article refers to the ML.NET CLI tool, which is currently in preview, and material is subject to change. For more information, visit the ML.NET page.

The ML.NET CLI is part of ML.NET. Its main goal is to "democratize" ML.NET for .NET developers, so you don't need to write model-training code from scratch to get started.

You can run the ML.NET CLI at any command prompt (Windows, macOS, or Linux) to generate good-quality ML.NET models and source code based on the training datasets you provide.

Prerequisites

You can either run the generated C# code projects from Visual Studio or with dotnet run (.NET CLI).
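If you haven't installed the ML.NET CLI yet, it ships as a .NET global tool. As a sketch (the exact tool package id can vary by CLI version and platform; `mlnet` is the classic id):

```shell
# Install the ML.NET CLI as a .NET global tool
# (package id may differ for newer versions, e.g. platform-specific ids)
dotnet tool install --global mlnet

# Verify the installation
mlnet --version
```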

Prepare your data

We are going to use an existing dataset for a 'Sentiment Analysis' scenario, which is a binary classification machine learning task. You can use your own dataset in a similar way, and the model and code will be generated for you.

  1. Download the UCI Sentiment Labeled Sentences dataset zip file (see the citation in the following note), and unzip it into any folder you choose.
    Note
    This tutorial uses a dataset from 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015, hosted at the UCI Machine Learning Repository - Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
  2. Copy the yelp_labelled.txt file into any folder of your choice (such as /cli-test).
  3. Open your preferred command prompt and move to the folder where you copied the dataset file. For example:
cd /cli-test  

Using any text editor such as Visual Studio Code, you can open and explore the yelp_labelled.txt dataset file. Each row contains a review sentence followed by a tab and a sentiment label (1 for positive, 0 for negative):

Wow... Loved this place.	1  
Crust is not good.	0  

Run the 'mlnet classification' command

  1. Run the following ML.NET CLI command:
mlnet classification --dataset "yelp_labelled.txt" --label-col 1 --has-header false --train-time 10  

This command runs the mlnet classification command with the following options:

    • --dataset: the dataset file to train on.
    • --label-col: the index of the column to predict (the sentiment label, in column 1).
    • --has-header: whether the dataset file has a header row (false for this file).
    • --train-time: the maximum time, in seconds, for the CLI to explore different models.

  2. The previous command execution generated the following assets:
    • A serialized model .zip ("best model") ready to use.
    • C# code to run/score that generated model (to make predictions in your end-user apps with that model).
    • C# training code used to generate that model (for learning purposes).
    • A log file with detailed information about every iteration explored: each algorithm tried along with its combination of hyper-parameters and data transformations.
      The first two assets (the .zip model file and the C# code to run that model) can be used directly in your end-user apps (ASP.NET Core web app, services, desktop app, etc.) to make predictions with the generated ML model.
      The third asset, the training code, shows you what ML.NET API code was used by the CLI to train the generated model, so you can investigate what specific trainer/algorithm and hyper-parameters were selected by the CLI.

These assets are explained in the following steps of the tutorial.

Explore the generated C# code to use for running the model to make predictions

  1. In Visual Studio, open the solution generated in the folder named SampleClassification within your original destination folder (it was named /cli-test in the tutorial). You should see a solution similar to:
    VS solution generated by the CLI
    Note
    The tutorial suggests using Visual Studio, but you can also explore the generated C# code (two projects) with any text editor and run the generated console app with the dotnet CLI on a macOS, Linux, or Windows machine.
    • The generated console app contains execution code that you should review. You'll typically reuse the 'scoring code' (the code that runs the ML model to make predictions) by moving those few lines into the end-user application where you want to make predictions.
    • The generated mbconfig file is a configuration file that can be used to retrain your model, either through the CLI or through Model Builder. It has two associated code files and a zip file:
      * The training file contains the code to build the model pipeline using the ML.NET API.
      * The consumption file contains the code to consume the model.
      * The zip file is the model generated by the CLI.
  2. Open the SampleClassification.consumption.cs file within the mbconfig file. You'll see that there are input and output classes. These are data classes, or POCO classes, used to hold data. The classes contain boilerplate code that's useful if your dataset has tens or even hundreds of columns.
    • The ModelInput class is used when reading data from the dataset.
    • The ModelOutput class is used to get the prediction result (prediction data).
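The generated consumption file contains data classes along these lines (a simplified sketch; the class names, column attributes, and types in your generated file may differ):

```csharp
using Microsoft.ML.Data;

// Sketch of the POCO classes found in SampleClassification.consumption.cs
public class ModelInput
{
    [LoadColumn(0)]
    public string Col0 { get; set; }   // The review text

    [LoadColumn(1)]
    public float Col1 { get; set; }    // The sentiment label (0 or 1)
}

public class ModelOutput
{
    // The predicted label for the input
    [ColumnName("PredictedLabel")]
    public float PredictedLabel { get; set; }

    // Per-class scores for the prediction
    public float[] Score { get; set; }
}
```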
  3. Open the Program.cs file and explore the code. In just a few lines, you are able to run the model and make a sample prediction.
static void Main(string[] args)  
{  
    // Create single instance of sample data from first line of dataset for model input  
    ModelInput sampleData = new ModelInput()  
    {  
        Col0 = @"Wow... Loved this place.",  
    };  
    // Make a single prediction on the sample data and print results  
    var predictionResult = SampleClassification.Predict(sampleData);  
    Console.WriteLine("Using model to make single prediction -- Comparing actual Col1 with predicted Col1 from sample data...\n\n");  
    Console.WriteLine($"Col0: {sampleData.Col0}");  
    Console.WriteLine($"\n\nPredicted Col1 value {predictionResult.PredictedLabel} \nPredicted Col1 scores: [{String.Join(",", predictionResult.Score)}]\n\n");  
    Console.WriteLine("=============== End of process, hit any key to finish ===============");  
    Console.ReadKey();  
}  
You can also replace the sample data with your own hard-coded sentence:

ModelInput sampleData = new ModelInput()  
{  
    Col0 = "The ML.NET CLI is great for getting started. Very cool!"  
};  
  4. Run the project, either with the original sample data loaded from the first row of the dataset or with your own custom hard-coded sample data. You should get a prediction comparable to:
    ML.NET CLI run the app from Visual Studio

Try changing the hard-coded sample data to other sentences with different sentiments and see how the model predicts positive or negative sentiment.

Infuse your end-user applications with ML model predictions

You can use similar 'ML model scoring code' to run the model in your end-user application and make predictions.

For instance, you could move that code directly into any Windows desktop application, such as WPF or WinForms, and run the model in the same way as in the console app.

However, the way you implement those lines of code should be optimized, especially if your application needs to be scalable, such as a web application or distributed service: cache the model .zip file so it's loaded only once, and use singleton objects instead of creating them on every request, as explained in the following section.
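As a sketch of that caching pattern (assuming the generated ModelInput/ModelOutput classes and a model file named SampleClassification.zip), you can load the model once and reuse it:

```csharp
using Microsoft.ML;

public static class ModelCache
{
    private static readonly MLContext _mlContext = new MLContext();

    // ITransformer is thread-safe, so load the model .zip once and share it.
    public static readonly ITransformer Model =
        _mlContext.Model.Load("SampleClassification.zip", out _);

    public static ModelOutput Predict(ModelInput input)
    {
        // Creating a PredictionEngine per call is shown only for simplicity;
        // PredictionEngine itself is NOT thread-safe (see the next section).
        var engine = _mlContext.Model
            .CreatePredictionEngine<ModelInput, ModelOutput>(Model);
        return engine.Predict(input);
    }
}
```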

Running ML.NET models in scalable ASP.NET Core web apps and services (multi-threaded apps)

The creation of the model object (an ITransformer loaded from the model's .zip file) and the PredictionEngine object should be optimized, especially when running in scalable web apps and distributed services. For the first object, the ITransformer, the optimization is straightforward: because ITransformer is thread-safe, you can cache it as a singleton or static object so the model is loaded only once.

The second object, the PredictionEngine, is not as easy, because PredictionEngine is not thread-safe; therefore you cannot instantiate it as a singleton or static object in an ASP.NET Core app. This thread-safety and scalability problem is discussed in depth in this blog post.

However, things are easier for you than what's explained in that blog post. ML.NET provides an integration package (Microsoft.Extensions.ML) that you can use in your ASP.NET Core apps and services by registering it in the application's dependency injection (DI) services and then using it directly from your code. Check the following tutorial and example for doing that:
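A typical way this integration package is wired up looks like the following sketch, using the PredictionEnginePool from Microsoft.Extensions.ML (the model file name, endpoint route, and ModelInput/ModelOutput classes are assumptions for this sketch):

```csharp
using Microsoft.Extensions.ML;

var builder = WebApplication.CreateBuilder(args);

// Register a pooled, thread-safe prediction engine in DI services.
builder.Services.AddPredictionEnginePool<ModelInput, ModelOutput>()
    .FromFile("SampleClassification.zip");

var app = builder.Build();

// The pool can be injected into any request handler and used safely
// from concurrent requests.
app.MapPost("/predict", (PredictionEnginePool<ModelInput, ModelOutput> pool,
                         ModelInput input) => pool.Predict(input));

app.Run();
```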

Explore the generated C# code that was used to train the "best quality" model

For more advanced learning purposes, you can also explore the generated C# code that was used by the CLI tool to train the generated model.

That training model code is generated in the file named SampleClassification.training.cs, so you can investigate that training code.

More importantly, for this particular scenario (Sentiment Analysis model) you can also compare that generated training code with the code explained in the following tutorial:

It is interesting to compare the chosen algorithm and pipeline configuration in the tutorial with the code generated by the CLI tool. Depending on how much time you spend iterating and searching for better models, the chosen algorithm might be different along with its particular hyper-parameters and pipeline configuration.

In this tutorial, you learned how to:

  • Prepare your data for the machine learning task.
  • Run the 'mlnet classification' command.
  • Explore the generated C# code to make predictions with the model.
  • Explore the generated C# code that was used to train the model.

See also