1-Visitor
December 3, 2021
Solved

How to interpret Predictive Scoring & Important Field Weights

  • 3699 views

Hello community, I hope you are all doing well.


I want to share a question about ThingWorx Analytics, specifically about how to use the Predictive Scoring option available in Analytics Builder and how to interpret its results. I finished the learning path on "Vehicle Predictive Pre-Failure Detection with ThingWorx Platform", which helped me understand several ThingWorx Analytics concepts and generate predictions for values in "real time".

 

I would like to complement the predictions with an uncertainty probability or other practical information. Unfortunately, that guide does not cover topics such as Predictive Scoring or confidence modeling. On my own, I used the data and the model created in the guide to run Predictive Scoring tests, with successful results, but without knowing how to give a practical meaning to the Important Field Weights. On the other hand, according to the ThingWorx Analytics 9 documentation, confidence models (which provide a probability of uncertainty for a prediction) are only available for continuous or ordinal data.

 

So I would like to know whether there is extra information with which I can complement the predictions for the "Vehicle Predictive Pre-Failure Detection with ThingWorx Platform" example, and how I could interpret the Important Field Weights.

 

At the end of this post I attach an image with two predictive scoring results and their Important Field Weights (Feature Weight).

 

Thank you for reading.

 

predictive_scoring_results.png

 

Best answer by sniculescu (reply of December 13, 2021, below)


17-Peridot
December 3, 2021

@unknown ,

 

Thank you for posting to the PTC Community.

 

Your questions are a bit advanced, but I will do my best to assist you.

 

Have you had the opportunity to review our HelpCenter documentation: Working With Predictive Models?

 

We also have an older Community Post, which can be found here: PTC Community - Predictive Analytics

Regards,

 

Neel

 

1-Visitor
December 9, 2021

Regarding complementary information, I have found material that helps to validate, compare, and refine the models, but not much that complements individual predictions.
I have read the documentation and found several interesting metrics such as ROC, RMSE, Pearson correlation, the confusion matrix, etc. These are useful for refining a model, but they are oriented toward validating, comparing, and refining models, not individual predictions.
As I mentioned before, it would be very useful to deliver a prediction together with, for example, its probability of uncertainty.
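For context, model-level metrics like the ones mentioned above can also be computed outside ThingWorx on exported scoring results; a minimal sketch with scikit-learn and NumPy, using made-up numbers purely for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix, mean_squared_error

# Made-up held-out labels and model scores, for illustration only
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Classification-style metrics (Boolean goal: failure / no failure)
auc = roc_auc_score(y_true, y_score)          # area under the ROC curve
cm = confusion_matrix(y_true, y_score > 0.5)  # rows: actual, columns: predicted

# Regression-style metrics (continuous goal)
rmse = np.sqrt(mean_squared_error(y_true, y_score))
pearson = np.corrcoef(y_true, y_score)[0, 1]  # Pearson correlation
```

As the post notes, these summarize a whole validation set rather than qualifying any single prediction.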


I take this opportunity to ask you about Prescriptive Models, using an example I found very interesting. At the end of the post you will find an image indicating the current probability of engine failure; actions are then recommended on relevant process variables, which reduce the failure probability significantly. That said, my questions are the following:

predictive_scoring.png

 


1) How could I set up a model of this nature, i.e. with a target variable indicating the probability of failure for 3 different parts?
2) What would be the nature of the target variable (I assume it is probability of failure)? Could this probability come from a confidence model for a Boolean goal (failure / no failure)?
3) Where in ThingWorx Analytics do I declare the actions needed to start building prescriptive models?

prescriptive_scoring.png

 

These questions could perhaps be continued in another thread.

12-Amethyst
December 13, 2021

Hello @unknown ,

 

In order to deliver practical value, I recommend you discuss with our Field team. They can advise once you have a concrete use case, or discuss sample use cases.

 

Regarding your questions:

 

1) You can check for model calibration by splitting your validation-set predictions into bins: [0-0.1), [0.1-0.2), ..., [0.9-1]. Then, in each bin, compute the predicted risk (average prediction) and the actual risk (percent failure in your data), and plot predicted vs. actual risk for each bin. If the model is calibrated as a probability, the points will fall close to the main diagonal. Note that even if the model is not calibrated, it can still be used very successfully for risk prediction; the predictions just cannot be interpreted as probabilities. In that case, the model automatically identifies the "optimal" threshold to transform the scores (the "_mo" values) into failure predictions.
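The binning procedure described here can be sketched in a few lines (NumPy only; `scores` stands in for the model's "_mo" values on the validation set, `labels` for the observed failure flags):

```python
import numpy as np

def calibration_table(scores, labels, n_bins=10):
    """Bin predicted scores and compare mean prediction vs observed failure rate."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # bin edges [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(scores, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((edges[b], edges[b + 1],
                         scores[mask].mean(),    # predicted risk in this bin
                         labels[mask].mean()))   # actual failure rate in this bin
    return rows
```

Plotting the last two columns of each row against each other gives the predicted-vs-actual curve; calibration shows up as points near the diagonal.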

 

2) Typically, the requirements for the solution are use-case dependent. From a data science perspective, you want enough high-quality data to build accurate models. There is no hard and fast number for the size of the dataset, but generally speaking, the more variables you want to model, the more data you need. Also, if failures are rare, you need to track the process / assets over a longer period of time to ensure enough failures are collected. Before building models you may want to perform some data cleanup (drop variables with too much missing data, fill in missing data otherwise, check for outliers / incorrect values, etc.).
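As a rough illustration of that cleanup step (pandas; the threshold and fill strategies here are assumptions for the sketch, and real choices are use-case dependent):

```python
import pandas as pd

def basic_cleanup(df, max_missing_frac=0.4):
    """Illustrative pre-modeling cleanup: drop sparse columns, fill the rest."""
    # 1) drop variables with too much missing data
    keep = df.columns[df.isna().mean() <= max_missing_frac]
    df = df[keep].copy()
    # 2) fill remaining gaps: median for numeric, most frequent value otherwise
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df
```

Outlier and incorrect-value checks would come on top of this, since they depend on the physical meaning of each variable.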

 

Regards,

 

--Stefan