Hello community, I hope you are all doing well.
I have a question about ThingWorx Analytics, specifically about how to use the Predictive Scoring option available in Analytics Builder and how to interpret its results. I finished the learning path "Vehicle Predictive Pre-Failure Detection with ThingWorx Platform", which helped me understand several ThingWorx Analytics concepts, and I managed to generate predictions for values in "real time".
I would like to complement these predictions with an uncertainty probability or other practical information. Unfortunately, the guide does not cover topics such as Predictive Scoring or confidence modeling. I tried it myself, using the data and the model created in the guide to run Predictive Scoring tests. The tests ran successfully, but I do not know how to give a practical meaning to the Important Field Weights. In addition, according to the ThingWorx Analytics 9 documentation, confidence models (which provide a probability of uncertainty about the prediction) are only available for continuous or ordinal data.
So I would like to know whether there is additional information I can use to complement the predictions in the "Vehicle Predictive Pre-Failure Detection with ThingWorx Platform" example, and how I should interpret the Important Field Weights.
At the end of the text I attach an image with 2 predictive scoring results and their Important Field Weights (Feature Weight).
Thank you for reading.
Thank you for posting to the PTC Community.
Your questions are a bit advanced, but I will do my best to assist you.
Have you had the opportunity to review our HelpCenter Documentation: Working With Predictive Models?
We also have an older Community post, which can be found here: PTC Community - Predictive Analytics
Regards,
Neel
Hi Neel, thanks for your time
My goal is to explore the scope and limitations of ThingWorx Analytics and the technical requirements for turning a business problem into an IoT + Analytics project, so that I can explain these points to potential customers. Ideally I would have a mockup of Prescriptive Models to show the potential of ThingWorx Analytics, but Predictive Models are enough for now. Any information that complements the prediction, such as the likelihood of the prediction, is appreciated. For this purpose I have been studying the field weights of the Predictive Scores.
About the field weights the following is explained:
"Important field weights - For each important field, a field weight represents the relative impact of that field on the target variable. If the field weights of all fields in a training data set could be summed for a record, the sum would equal 1. In the sample results shown below, the weights of the important fields in each row add up to something less than one."
The interpretation of the weights seems to be similar to Signals, which give a measure of how relevant each field is to the target, but in this case for a single record only. Unlike Signals, there is no indication of the measurement method (e.g. Mutual Information for Signals).
It would be great if you could confirm or refute this hypothesis.
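To make the quoted description concrete, here is a minimal Python sketch of how the weights for one scored record could be read. The field names and weight values are hypothetical (they are not taken from my results), and reading the remainder as the share of the fields not listed as important is my own interpretation of the quoted text, not something the documentation states.

```python
# Hypothetical field names and weights for ONE scored record (illustration only).
important_field_weights = {"sensor_a": 0.40, "sensor_b": 0.25, "sensor_c": 0.15}


def summarize_weights(weights):
    """Rank the important fields and report how much weight they cover.

    Per the quoted documentation, the weights of ALL fields would sum to 1
    for a record, so 1 - total is read here (an assumption) as the share of
    the fields that were not reported as important.
    """
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(weights.values())
    remainder = 1.0 - total
    return ranked, total, remainder


if __name__ == "__main__":
    ranked, total, remainder = summarize_weights(important_field_weights)
    for name, weight in ranked:
        print(f"{name}: {weight:.2f}")
    print(f"covered by important fields: {total:.2f}")
    print(f"left for all other fields:   {remainder:.2f}")
```

Read this way, the weights only rank the fields within a single prediction; they say nothing about how certain the prediction itself is.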
Regarding complementary information, I have read the documentation and found several interesting metrics such as ROC, RMSE, Pearson Correlation, Confusion Matrix, etc. These are useful to validate, compare and refine the models, but they do not describe individual predictions, and I have not found much complementary information at that level.
As I mentioned before, it would be very useful to deliver the prediction together with, for example, its probability of uncertainty.
I would also like to take this opportunity to ask about Prescriptive Models, using an example that I found very interesting. At the end of the post you will find an image showing the current probability of engine failure. Actions are then recommended on relevant process variables, which significantly reduce the failure probability. That said, my questions are the following:
1) How could I set up a model of this nature, i.e. with a target variable indicating the probability of failure for 3 different parts?
2) What would be the nature of the target variable (I assume it is probability of failure/failure)? Could this probability come from a confidence model for boolean data (failure - no failure)?
3) Where in ThingWorx Analytics do I declare the actions to start building prescriptive models?
These questions could perhaps be continued in another thread.
As these questions are beyond my ability to answer, I have asked our Analytics Field Team and use case experts to review your questions.
Either they or I will respond to this thread.
Regards,
Neel
@JimmyZamora ,
I have sent you a private message regarding this post. The people I asked to review it indicated that a case and a meeting would be the best path forward, as these are use case, process and enablement based questions.
Regards,
Neel
Hello @JimmyZamora ,
There are several questions in the discussion above. Let's look at them individually:
If you would like to delve deeper into the details of your use cases, we would be happy to connect you with our Field team who can advise how to tailor the approach to your specific needs.
Regards,
Stefan
Hi @sniculescu,
Thank you very much for the answers, Stefan. Knowing that the important field weights amount to a sensitivity analysis of the prediction clarifies the big picture for me. But I still can't figure out how to deliver practical value with this information. I need to give it some thought; if you have an example, that would be very helpful.
The information on building a multi-failure model for an asset is very interesting and confirms a couple of theories. My main questions on this topic are:
1) How can I be sure that the model is correctly calibrated and that the "_mo" value actually corresponds to the real probability?
2) What are the requirements for this?
Thank you again for your willingness to help and to clear up the community's doubts. For my part, I will keep building and refining predictive models. Once I get good datasets and have correctly posed the data science problems, I will move on to prescriptive models, which look super interesting.
Best regards,
Jimmy
Hello @JimmyZamora ,
To deliver practical value, I recommend discussing this with our Field team. They can advise once you have a concrete use case, or discuss sample use cases.
Regarding your questions:
1) You can check model calibration by splitting your validation set predictions into bins: [0, 0.1), [0.1, 0.2), ..., [0.9, 1]. In each bin, compute the predicted risk (the average prediction) and the actual risk (the percentage of failures in your data). Then plot predicted vs. actual risk for each bin. If the model is calibrated as a probability, the points will lie close to the main diagonal. Note that even if the model is not calibrated, it can still be used very successfully for risk prediction, but the predictions cannot be interpreted as probabilities. In that case, the model automatically identifies the "optimal" threshold to transform the scores ("_mo" values) into failure predictions.
2) Typically, the requirements for the solution are use case dependent. From a data science perspective, you want enough high quality data to build accurate models. There is no hard and fast number for the size of the dataset, but generally speaking, the more variables you want to model, the more data you need. Also, if failures are rare, you need to track the process / assets over a longer period of time to ensure enough failures are collected. Before building models you may want to perform some data cleanup (drop variables with too much missing data, fill in missing data otherwise, check for outliers / incorrect values, etc).
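The binning check in 1) can be sketched in Python as follows. This is a minimal sketch, not ThingWorx Analytics functionality: it assumes you have exported the validation scores (the "_mo" values, in the range [0, 1]) and the actual failure labels (0/1) into arrays. The function name `calibration_table` and the synthetic data are hypothetical.

```python
import numpy as np


def calibration_table(scores, outcomes, n_bins=10):
    """Compare predicted and actual risk per score bin.

    Bins are [0, 0.1), [0.1, 0.2), ..., [0.9, 1] for n_bins=10.
    Returns one tuple per non-empty bin:
    (bin_low, bin_high, count, mean_predicted_risk, actual_failure_rate).
    """
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the interior edges maps each score to a bin index in
    # 0..n_bins-1; a score of exactly 1.0 lands in the last bin.
    bin_index = np.digitize(scores, edges[1:-1], right=False)
    rows = []
    for b in range(n_bins):
        mask = bin_index == b
        if not mask.any():
            continue  # skip empty bins instead of dividing by zero
        rows.append((
            float(edges[b]),
            float(edges[b + 1]),
            int(mask.sum()),
            float(scores[mask].mean()),    # predicted risk in this bin
            float(outcomes[mask].mean()),  # actual failure rate in this bin
        ))
    return rows


if __name__ == "__main__":
    # Synthetic, well-calibrated example: failures are drawn with
    # probability equal to the score. Replace with your validation data.
    rng = np.random.default_rng(0)
    demo_scores = rng.uniform(0.0, 1.0, size=5000)
    demo_outcomes = rng.uniform(0.0, 1.0, size=5000) < demo_scores
    for low, high, n, predicted, actual in calibration_table(demo_scores, demo_outcomes):
        print(f"[{low:.1f}, {high:.1f}]  n={n:5d}  predicted={predicted:.3f}  actual={actual:.3f}")
```

Plotting the `predicted` column against the `actual` column gives the diagonal plot described above. Bins with very few records produce noisy failure rates, so the count column is worth checking before drawing conclusions.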
Regards,
--Stefan