In ThingWorx Analytics, you can use an external model for scoring. In this written tutorial, I would like to provide an overview of how you can use a model developed in Python with the scikit-learn library in ThingWorx Analytics.

The provided attachment contains an archive with the following files:

- iris_data.csv: The Iris dataset, a dataset for pattern recognition that has a categorical goal.
- TestRFToPmml.ipynb: A Jupyter notebook file with the source code for the Python model as well as the steps to export it to PMML.
- RF_Iris.pmml: The PMML file with the model, which you can upload directly to Analytics without going through the steps of training the model in Python.

The tutorial assumes you already have some knowledge of ThingWorx and ThingWorx Analytics. Also, if you plan to run the Python code and train the model yourself, you need to have Jupyter Notebook installed (I used the one from the Anaconda distribution).

For demonstration purposes, I have created a very simple random forest model in Python. To convert the model to PMML, I have used the sklearn2pmml library. Because ThingWorx Analytics supports PMML format 4.3, you need to install sklearn2pmml version 0.56.2 (the highest version that supports PMML 4.3). To read more about this library, see the sklearn2pmml project documentation. Furthermore, to work with this older version of sklearn2pmml, I have installed scikit-learn version 0.23.2. You will find the commands to install the two libraries in the first two cells of the notebook.

Code Walkthrough

The first step is to import the required libraries (please note that the pandas library is also required, to load the .csv file into a DataFrame object):

import pandas
from sklearn.ensemble import RandomForestClassifier
from sklearn2pmml import sklearn2pmml
from sklearn.model_selection import GridSearchCV
from sklearn2pmml.pipeline import PMMLPipeline

After importing the required libraries, we load iris_data.csv into a pandas DataFrame and then create the features (X) as well as the goal (y) vectors:

iris_df = pandas.read_csv("iris_data.csv")
iris_X = iris_df[iris_df.columns.difference(["class"])]
iris_y = iris_df["class"]

To tune the random forest, we will use GridSearchCV with cross-validation. We want to see which parameters give the best validation metrics, and for this we will use a utility function that prints the results:

def print_results(results):
    # Print the best parameter set, then the mean score (+/- two standard deviations) for each combination
    print('BEST PARAMS: {}\n'.format(results.best_params_))
    means = results.cv_results_['mean_test_score']
    stds = results.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, results.cv_results_['params']):
        print('{} (+/-{}) for {}'.format(round(mean, 3), round(std * 2, 3), params))

We create the random forest model and train it with different numbers of estimators and maximum depths. We will then call the previous function to compare the results for the different parameters:

rf = RandomForestClassifier()
parameters = {
    'n_estimators': [5, 50, 250],
    'max_depth': [2, 4, 8, 16, 32, None]
}
cv = GridSearchCV(rf, parameters, cv=5)
cv.fit(iris_X, iris_y)
print_results(cv)

To convert the model to a PMML file, we need to create a PMMLPipeline object, to which we pass the RandomForestClassifier with the tuning parameters we identified in the previous step (please note that in your case, the best parameters can be different from those in my example). You can check the sklearn2pmml documentation to see other examples of creating this PMMLPipeline object:

pipeline = PMMLPipeline([
    ("classifier", RandomForestClassifier(max_depth=4, n_estimators=5))
])
pipeline.fit(iris_X, iris_y)

Then we perform the export:

sklearn2pmml(pipeline, "RF_Iris.pmml", with_repr = True)

The model has now been exported as a PMML file in the same folder as the Jupyter notebook file, and we can upload it to ThingWorx Analytics.

Uploading and Exploring the PMML in Analytics

To upload and use the model for scoring, there are two steps that you need to do:

1. First, upload the PMML file to a ThingWorx File Repository.
2. Then, go to your Analytics Results thing (the name should be YourAnalyticsGateway_ResultsThing) and execute the service UploadModelFromRepository. Here you will need to specify the repository name and path for your PMML file, as well as a name for your model (and optionally a description).

If everything goes well, the result of the service will be an id. Save this id to a separate file, because you will use it later on. You can verify the status of this model, and whether it is ready to use, by executing the service GetDetails.

If you want to use the PMML for scoring but were not the one who developed the model, you may not know the expected inputs and output of the model. There are two services that can help you with this:

- QueryInputFields: to verify the fields expected as input parameters for a scoring job
- QueryOutputFields: to verify the expected output of the model

The resultType input parameter can be either MODELS or CLUSTERS, depending on the type of model.

Using the PMML for Scoring

With all this information at hand, we are now ready to use this PMML for real-time scoring. In a Thing of your choice, define a service to test out the scoring for the PMML we have just uploaded. Create a new service with an infotable as the output (don’t add a datashape). The input data for scoring will be hardcoded in the service, but you can also define it as service input parameters and pass the values via a Mashup or from another source. The script will be as follows:

// Values: INFOTABLE dataShape: ""
let datasetRef = DataShapes["AnalyticsDatasetRef"].CreateValues();
// Values: INFOTABLE dataShape: ""
let data = DataShapes["IrisData"].CreateValues();
data.AddRow({
    sepal_length: 2.7,
    sepal_width: 3.1,
    petal_length: 2.1,
    petal_width: 0.4
});
datasetRef.AddRow({ data: data});
// predictiveScores: INFOTABLE dataShape: ""
let result = Things["AnalyticsServer_PredictionThing"].RealtimeScore({
    modelUri: "results:/models/" + "97471e07-137a-41bb-9f29-f43f107bf9ca", // replace with your own id
    datasetRef: datasetRef /* INFOTABLE */,
});

Once you execute the service, the output should contain the fields we would expect according to the output fields in the PMML model.

As you have seen, it is easy to use a model built in Python in ThingWorx Analytics. Please note that you may use it only for scoring, and the model will not appear in Analytics Builder, since you have created it on a different platform.

If you have any questions about this brief written tutorial, let me know.
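
For a PMML file that you did not build yourself, the expected inputs and the goal can also be checked offline, before uploading, by reading the file directly. The following is a minimal sketch of that idea using only the Python standard library. The function name describe_pmml_fields and the embedded sample document are hypothetical and hand-written for illustration; a real export from sklearn2pmml contains many more elements, and this is not a replacement for QueryInputFields and QueryOutputFields.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML 4.3 fragment used only for illustration.
SAMPLE_PMML = """
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary>
    <DataField name="class" optype="categorical" dataType="string"/>
    <DataField name="petal_length" optype="continuous" dataType="double"/>
    <DataField name="petal_width" optype="continuous" dataType="double"/>
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="sepal_width" optype="continuous" dataType="double"/>
  </DataDictionary>
  <MiningModel functionName="classification">
    <MiningSchema>
      <MiningField name="class" usageType="target"/>
      <MiningField name="petal_length"/>
      <MiningField name="petal_width"/>
      <MiningField name="sepal_length"/>
      <MiningField name="sepal_width"/>
    </MiningSchema>
  </MiningModel>
</PMML>
"""


def describe_pmml_fields(pmml_text):
    """Return (inputs, targets) as lists of (field name, data type) tuples."""
    root = ET.fromstring(pmml_text)
    # The root tag looks like "{namespace}PMML"; reuse its namespace for lookups.
    ns = {"p": root.tag[1:root.tag.index("}")]}
    # Map every declared field to its data type.
    types = {
        field.get("name"): field.get("dataType")
        for field in root.findall("p:DataDictionary/p:DataField", ns)
    }
    # The first MiningSchema in document order belongs to the top-level model.
    schema = root.find(".//p:MiningSchema", ns)
    inputs, targets = [], []
    for field in schema.findall("p:MiningField", ns):
        name = field.get("name")
        usage = field.get("usageType", "active")  # "active" is the PMML default
        if usage in ("target", "predicted"):
            targets.append((name, types.get(name)))
        elif usage == "active":
            inputs.append((name, types.get(name)))
    return inputs, targets


inputs, targets = describe_pmml_fields(SAMPLE_PMML)
print("Inputs:", inputs)
print("Targets:", targets)
```

To try this on the exported model, pass in the contents of RF_Iris.pmml read in binary mode instead of SAMPLE_PMML. The input names listed this way should correspond to the field names used in the scoring service above.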