
From Scikit-Learn To ThingWorx Analytics


In ThingWorx Analytics, you can use an external model for scoring. This written tutorial gives an overview of how to use a model developed in Python with the scikit-learn library in ThingWorx Analytics.

The provided attachment contains an archive with the following files:

  • iris_data.csv: A dataset for pattern recognition that has a categorical goal. You can click here to read more about this dataset
  • TestRFToPmml.ipynb: A Jupyter notebook file with the source code for the Python model as well as the steps to export it to PMML
  • RF_Iris.pmml: The PMML file with the model that you can directly upload in Analytics without going through the steps of training the model in Python

The tutorial assumes you already have some knowledge of ThingWorx and ThingWorx Analytics. Also, if you plan to run the Python code and train the model yourself, you need to have Jupyter notebook installed (I used the one from the Anaconda distribution).

For demonstration purposes, I have created a very simple random forest model in Python. To convert the model to PMML, I used the sklearn2pmml library. Because ThingWorx Analytics supports PMML format 4.3, you need to install sklearn2pmml version 0.56.2 (the highest version that supports PMML 4.3). To read more about this library, please click here.

Furthermore, so that the model works with this older version of sklearn2pmml, I installed scikit-learn version 0.23.2. You will find the commands to install the two libraries in the first two cells of the notebook.
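Before running the notebook, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper names `installed_version` and `check_versions` are my own and are not part of either library.

```python
from importlib import metadata

# Versions named in this tutorial.
EXPECTED = {"sklearn2pmml": "0.56.2", "scikit-learn": "0.23.2"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected):
    """Map each package name to (expected, installed, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (wanted, found, found == wanted)
    return report


for package, (wanted, found, ok) in check_versions(EXPECTED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{package}: expected {wanted}, installed {found} -> {status}")
```

A MISMATCH line means the package is either missing (installed shows None) or present in a different version than the one used here.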

 

Code Walkthrough

The first step is to import the required libraries (note that the pandas library is also required, to load the .csv file into a DataFrame object):

 

import pandas
from sklearn.ensemble import RandomForestClassifier
from sklearn2pmml import sklearn2pmml
from sklearn.model_selection import GridSearchCV
from sklearn2pmml.pipeline import PMMLPipeline

 

After importing the required libraries, we load iris_data.csv into a pandas DataFrame and then create the features (X) and the goal (y) vectors:

 

iris_df = pandas.read_csv("iris_data.csv")
iris_X = iris_df[iris_df.columns.difference(["class"])]
iris_y = iris_df["class"]

 

To tune the random forest, we will use GridSearchCV with cross-validation. We want to see which parameters give the best validation metrics, and for this we will use a utility function that prints the results:

 

def print_results(results):
    print('BEST PARAMS: {}\n'.format(results.best_params_))
    means = results.cv_results_['mean_test_score']
    stds = results.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, results.cv_results_['params']):
        print('{} (+/-{}) for {}'.format(round(mean, 3), round(std * 2, 3), params))
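
To see what this function produces without training anything, here is a small sketch that feeds a stand-in object through the same formatting logic. The `fake_results` object and its scores are invented placeholders (not results from the iris data), and `format_results` is my own variant that returns the lines instead of printing them, without the blank separator line.

```python
from types import SimpleNamespace

# Stand-in for a fitted GridSearchCV object. The scores are invented
# placeholders, NOT results from the iris data.
fake_results = SimpleNamespace(
    best_params_={"max_depth": 4, "n_estimators": 5},
    cv_results_={
        "mean_test_score": [0.9, 0.95],
        "std_test_score": [0.05, 0.02],
        "params": [
            {"max_depth": 2, "n_estimators": 5},
            {"max_depth": 4, "n_estimators": 5},
        ],
    },
)


def format_results(results):
    """Build the lines that print_results prints and return them as a list."""
    lines = ["BEST PARAMS: {}".format(results.best_params_)]
    means = results.cv_results_["mean_test_score"]
    stds = results.cv_results_["std_test_score"]
    for mean, std, params in zip(means, stds, results.cv_results_["params"]):
        # Mean score, plus/minus two standard deviations, per parameter set.
        lines.append("{} (+/-{}) for {}".format(round(mean, 3), round(std * 2, 3), params))
    return lines


report = format_results(fake_results)
print("\n".join(report))
```

Each line after the first shows the mean cross-validation score, a spread of two standard deviations, and the parameter combination it belongs to.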

 

We create the random forest model and train it with different numbers of estimators and maximum depth. We will then call the previous function to compare the results for the different parameters:

 

rf = RandomForestClassifier()
parameters = {
    'n_estimators': [5, 50, 250],
    'max_depth': [2, 4, 8, 16, 32, None]
}
cv = GridSearchCV(rf, parameters, cv=5)
cv.fit(iris_X, iris_y)
print_results(cv)
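
To get a feel for how much work this grid search does, the sketch below enumerates the parameter combinations with the standard library only. It does not train anything; the variable names are my own.

```python
from itertools import product

# Same grid as in the tutorial.
parameters = {
    "n_estimators": [5, 50, 250],
    "max_depth": [2, 4, 8, 16, 32, None],
}
n_folds = 5  # cv=5 in the GridSearchCV call

# Every combination of one value per parameter.
names = list(parameters)
combinations = [dict(zip(names, values)) for values in product(*parameters.values())]

n_candidates = len(combinations)
n_cv_fits = n_candidates * n_folds

print(f"{n_candidates} candidates x {n_folds} folds = {n_cv_fits} cross-validation fits")
print("first candidate:", combinations[0])
```

With 3 values for n_estimators and 6 for max_depth, there are 18 candidates and 90 cross-validation fits. With the default refit=True, GridSearchCV also fits the best candidate once more on the full dataset.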

 

To convert the model to a PMML file, we need to create a PMMLPipeline object that wraps the RandomForestClassifier, configured with the tuning parameters identified in the previous step (your best parameters may differ from mine). You can check the sklearn2pmml documentation to see other examples of creating a PMMLPipeline object:

 

pipeline = PMMLPipeline([
    ("classifier", RandomForestClassifier(max_depth=4, n_estimators=5))
])
pipeline.fit(iris_X, iris_y)

 

Then we perform the export:

 

sklearn2pmml(pipeline, "RF_Iris.pmml", with_repr=True)

 

The model has now been exported as a PMML file in the same folder as the Jupyter Notebook file and we can upload it to ThingWorx Analytics.
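Since ThingWorx Analytics expects PMML 4.3, it may be worth checking the version declared in the exported file before uploading it. This is a minimal sketch using the standard library XML parser; `pmml_version` is a helper name of my own, and the `sample` string is a tiny hand-written stand-in, not the real export.

```python
import xml.etree.ElementTree as ET


def pmml_version(pmml_text):
    """Return the value of the version attribute on the root <PMML> element."""
    root = ET.fromstring(pmml_text)
    # Tags carry the namespace as a '{uri}' prefix; strip it before comparing.
    local_name = root.tag.split("}")[-1]
    if local_name != "PMML":
        raise ValueError("root element is <{}>, not <PMML>".format(local_name))
    return root.get("version")


# Tiny hand-written stand-in for a real export, just to exercise the function.
sample = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">'
    '<Header/><DataDictionary/></PMML>'
)
print(pmml_version(sample))

# For the exported file (assumes it sits in the working directory):
# with open("RF_Iris.pmml", "rb") as handle:
#     print(pmml_version(handle.read()))
```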

 

Uploading and Exploring the PMML in Analytics

To upload and use the model for scoring, you need to complete two steps:

  1. First, upload the PMML file to a ThingWorx File Repository.
  2. Then, go to your Analytics Results thing (the name should be YourAnalyticsGateway_ResultsThing) and execute the service UploadModelFromRepository. Here you will need to specify the repository name and path for your PMML file, as well as a name for your model (and optionally a description).

image

 

If everything goes well, the service returns an id. Save this id (for example, in a separate file), because you will need it later for scoring.

You can check the status of the model, and whether it is ready to use, by executing the service GetDetails:

image

 

image

If you want to use the PMML for scoring but did not develop the model yourself, you may not know what inputs the model expects or what output it produces. Two services can help with this:

  • QueryInputFields – to verify the fields expected as input parameters for a scoring job

image

 

  • QueryOutputFields – to verify the expected output of the model

image

The resultType input parameter can be either MODELS or CLUSTERS, depending on the type of model.
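If you have the PMML file at hand, you can cross-check the field names locally as well. The sketch below reads the first MiningSchema element in the document and separates input fields from target fields by their usageType attribute. The helper `mining_fields` is my own, and the `sample` document is hand-written for illustration; a real export will contain much more.

```python
import xml.etree.ElementTree as ET


def mining_fields(pmml_text):
    """Return (inputs, targets) field names from the first MiningSchema found."""
    root = ET.fromstring(pmml_text)
    inputs, targets = [], []
    for element in root.iter():  # document order, so the top-level schema comes first
        if element.tag.split("}")[-1] != "MiningSchema":
            continue
        for field in element:
            if field.tag.split("}")[-1] != "MiningField":
                continue
            # usageType defaults to "active" (an input) when omitted.
            usage = field.get("usageType", "active")
            if usage in ("target", "predicted"):
                targets.append(field.get("name"))
            elif usage == "active":
                inputs.append(field.get("name"))
        break  # the first schema is enough for this check
    return inputs, targets


# Hand-written stand-in, NOT the real RF_Iris.pmml.
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <MiningModel functionName="classification">
    <MiningSchema>
      <MiningField name="class" usageType="target"/>
      <MiningField name="petal_length"/>
      <MiningField name="sepal_width"/>
    </MiningSchema>
  </MiningModel>
</PMML>"""

inputs, targets = mining_fields(sample)
print("inputs:", inputs)
print("targets:", targets)
```

This is only a local sanity check; the QueryInputFields and QueryOutputFields services remain the reference for what the uploaded model expects.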

 

Using the PMML for Scoring

With all this information at hand, we are now ready to use this PMML for real-time scoring. In a Thing of your choice, define a service to test out the scoring for the PMML we have just uploaded.

Create a new service with an infotable as the output (don’t add a datashape). The input data for scoring will be hardcoded in the service, but you can also add it as service input parameters and pass them via a Mashup or from another source. The script will be as follows:

 

// Dataset reference wrapper expected by the scoring service
let datasetRef = DataShapes["AnalyticsDatasetRef"].CreateValues();
// Input row; assumes a data shape named IrisData with these four fields exists
let data = DataShapes["IrisData"].CreateValues();
data.AddRow({
    sepal_length: 2.7,
    sepal_width: 3.1,
    petal_length: 2.1,
    petal_width: 0.4
});
datasetRef.AddRow({ data: data });
// Adjust the thing name to match your own installation
let result = Things["AnalyticsServer_PredictionThing"].RealtimeScore({
    modelUri: "results:/models/" + "97471e07-137a-41bb-9f29-f43f107bf9ca", // replace with your own id
    datasetRef: datasetRef /* INFOTABLE */
});

 

Once you execute the service, the output should look like this (as we would have expected, according to the output fields in the PMML model):

image
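If you would rather trigger the scoring service from outside ThingWorx, one option is the platform's REST interface. The sketch below only builds the request and does not send it. The host, the application key, the thing name MyScoringThing, and the service name ScoreIris are all placeholders that I made up; substitute the thing and service you created above.

```python
import json
from urllib import request


def build_service_request(base_url, thing, service, app_key, params=None):
    """Prepare (but do not send) a POST that invokes a service on a Thing."""
    url = "{}/Thingworx/Things/{}/Services/{}".format(base_url.rstrip("/"), thing, service)
    body = json.dumps(params or {}).encode("utf-8")
    headers = {
        "appKey": app_key,
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return request.Request(url, data=body, headers=headers, method="POST")


req = build_service_request(
    "https://thingworx.example.com",         # placeholder host
    "MyScoringThing",                        # hypothetical thing name
    "ScoreIris",                             # hypothetical service name
    "00000000-0000-0000-0000-000000000000",  # placeholder application key
)
print(req.get_method(), req.full_url)

# To actually send it against a reachable server:
# with request.urlopen(req) as response:
#     print(json.load(response))
```

Because the input data is hardcoded in the service, the request body is an empty JSON object; if you expose the inputs as service parameters, they would go into `params`.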

 

As you have seen, it is easy to use a model built in Python in ThingWorx Analytics. Please note that you may use it only for scoring, and the model will not appear in Analytics Builder since you have created it on a different platform.

If you have any questions about this brief written tutorial, let me know.

Version history
Last update: Jun 02, 2022 08:27 AM