How to interpret Predictive Scoring & Important Field Weights

Question

Hello how are you community, I hope very well.
I want to share with you a question about Thingworx Analytics, specifically about how to use the Predictive Scoring option available in Analytics Builder and interpret its results. I finished the learning path on "Vehicle Predictive Pre-Failure Detection with ThingWorx Platform", which helped me to understand several concepts about Thingworx Analytics, managing to generate predictions for values in "real time".
 
I would like to complement the predictions obtained with uncertainty probability or other practical information. Unfortunately, this guide does not cover topics that complement the predictions with information such as Predictive Scoring or confidence modeling. For my part I wanted to try and used the data and the model created to perform Predictive Scoring tests obtaining successful results but without knowing how to give a practical meaning to the Important Field Weights. On the other hand, according to the ThingWorx Analytics 9 documentation, the confidence models (which provide a probability of uncertainty about the prediction) are only available for continuous or ordinal data.
 
So I would like to know if there is extra information with which I can complement the predictions for the example "Vehicle Predictive Pre-Failure Detection with ThingWorx Platform", and how I could interpret the Important Field Weights. 
 
At the end of the text I attach an image with 2 predictive scoring results and Important Field Weights (Feature Weigth). 
 
Thank you for reading.

sniculescu · Accepted Answer

Hello @unknown ,

In order to deliver practical value, I recommend you discuss with our Field team. They can advise once you have a concrete use case, or discuss sample use cases.

Regarding your questions:

1) You can check for model calibration by splitting your validation set predictions in bins: [0-0.1),[0.1-0.2), ..., [0.9-1]. Then, in each bin you compute the predicted (average predictions) and actual risk (percent failure from your data). Plot the predicted vs actual risk for each bin. If the model is calibrated as a probability, then your points will arrange somewhat close to the main diagonal. Note that even if the model is not calibrated, it can still be very successfully used for risk prediction, but the predictions cannot be interpreted as probabilities. In that case, the model automatically identifies the "optimal" threshold to transform the scores ("_mo" values) into failure predictions.

2) Typically, the requirements for the solution are use case dependent. From a data science perspective, you want to have enough high quality data to build accurate models. There is not a hard and fast number for the size of the dataset, but generally speaking, the more variables you want to model, the more data you need. Also, if failures are rare, you need to track the process / assets over a longer period of time to ensure enough failures are collected. Before building models you may want to perform some data cleanup (drop variables with too much missing data, fill missing data otherwise, check for outliers / incorrect values, etc).

Regards,

--Stefan

nsampat · Answer

@unknown ,

Thank you for posting to the PTC Community.

Your questions are a bit advanced, I will do my best to assist you with your questions.

Have you had the opportunity to review our HelpCenter Documentation: Working With Predictive Models

We also have an older Community Post, which can be found here: PTC Community - Predictive Analytics

Regards,

Neel

Sign up

Please use your PTC eSupport account.

Welcome to the PTC Community

Please use your PTC eSupport account.

Scanning file for viruses.

This file cannot be downloaded