Solved: Re: create a predictive score including significant fields

alessandro96 · ‎Feb 27, 2019

Hi,

I created a model with a Boolean goal variable. The goal is named class.

Next, I would like to create a predictive score with additional data that includes the fields classified as the most important by Signal.

I do not understand why the model output values do not change when I create a score including these fields in one case and excluding them in another. I used the full range technique in all cases.

The field names are bj_000 and ck_000.

An example row:

Case with the 2 important fields included:

| bj_000 | ck_000 | class | class_mo | Feature_1_Name | Feature_1_Weight | Feature_2_Name | Feature_2_Weight | errorMessage |
|---|---|---|---|---|---|---|---|---|
| 0.3828125 | 1.0 | false | 0.33333333333333337 | bb_000 | 0.12 | az_000 | 0.08000000000000002 | |

Case without the important fields:

| class | class_mo | errorMessage |
|---|---|---|
| false | 0.33333333333333337 | |

Does anyone have any suggestions?

cmorfin · ‎Feb 28, 2019

Hi @alessandro96

When you run a scoring job with a request for important features, this does not change the scoring result (the prediction). What it does is output, in addition to the actual score, the x most important features that impact that score. So if you ask for 2 important features, each record will have the actual score plus the 2 features, with their weights, that most impacted the prediction for that specific record.
In the example you submitted, the 2 most important features for this specific record are bb_000 and az_000.
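
To make the point above concrete, here is a toy sketch. It is not ThingWorx Analytics code: `WEIGHTS`, `BIAS` and `score_record` are hypothetical, the weights and the az_000/bb_000 values are made up, and ranking features by absolute contribution is only one possible definition of "importance". It only shows that asking for important features adds output columns without changing the prediction.

```python
# Toy illustration only: a hypothetical linear scorer, not ThingWorx Analytics code.
WEIGHTS = {"az_000": 0.4, "bb_000": -0.6, "bj_000": 0.1, "ck_000": 0.05}  # made-up weights
BIAS = 0.2  # made-up intercept


def score_record(record, important_feature_count=0):
    """Return the prediction plus, optionally, the top-N contributing features."""
    contributions = {name: WEIGHTS[name] * record[name] for name in WEIGHTS}
    result = {"prediction": BIAS + sum(contributions.values())}
    # The ranking is computed after the prediction and never feeds back into it.
    ranked = sorted(contributions, key=lambda n: abs(contributions[n]), reverse=True)
    for i, name in enumerate(ranked[:important_feature_count], start=1):
        result[f"Feature_{i}_Name"] = name
        result[f"Feature_{i}_Weight"] = contributions[name]
    return result


# bj_000 and ck_000 come from the example row; the other two values are invented.
row = {"az_000": 0.2, "bb_000": 0.5, "bj_000": 0.3828125, "ck_000": 1.0}
plain = score_record(row)
explained = score_record(row, important_feature_count=2)
print(plain["prediction"] == explained["prediction"])
print(explained["Feature_1_Name"], explained["Feature_2_Name"])
```

In this sketch the two calls return the same prediction; the second one simply carries the extra Feature_N_Name and Feature_N_Weight entries.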

See also Causal scoring at https://support.ptc.com/help/thingworx_hc/thingworx_analytics_8/#page/analytics%2Ftwxa-thingpredictor.html

Hope this helps

Kind regards

Christophe

alessandro96 · ‎Feb 28, 2019

Thanks. So if I understand correctly, to make a predictive score that is influenced by these two fields I have to use a filter excluding the others. Is that correct?

If I would like to do this without excluding the others, how can I do it?

cmorfin · ‎Feb 28, 2019

Hi @alessandro96

I don't quite understand what you are trying to do.

Why do you want to filter out those fields?

Can you describe what you are trying to achieve so I can understand better?

Thank you

Kind regards

Christophe

alessandro96 · ‎Feb 28, 2019

I would like to use the Signal output to get a predicted score based on these fields, to see whether the result is better than the result with all fields.

Another question: for a model with a Boolean goal variable, created from a dataset of 60000 rows and 171 fields, which learner is better among Linear Regression, SVM and NN, and why?

cmorfin · ‎Feb 28, 2019

Hi @alessandro96

If you want to see the impact of the fields with the highest value in Signal, you will need to create 2 prediction models.

For model 1, you can exclude the features with a low value from Signal.

For model 2, you include all the features.

You can then score the same row against each model and see the difference.
The Important Feature count can be left at 0, as this does not impact the scoring result. It only tells you what had the most impact on a specific prediction.
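
As a rough sketch of that two-model comparison, assuming a simple nearest-centroid classifier as a stand-in for the real learner: the helpers `train` and `predict`, the `noise` field and all row values except the bj_000/ck_000 pair of the scored row are hypothetical. In ThingWorx Analytics the two models would be created and scored in Builder or through the services instead.

```python
# Hypothetical sketch: compare a model trained on selected fields with one trained on all.
# A nearest-centroid classifier stands in for the real learner; the data is made up.

def train(rows, labels, fields):
    """Compute one centroid per class over the chosen fields."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [sum(r[f] for r in members) / len(members) for f in fields]
    return {"fields": fields, "centroids": centroids}


def predict(model, row):
    """Return the label whose centroid is closest to the row."""
    def distance(centroid):
        return sum((row[f] - c) ** 2 for f, c in zip(model["fields"], centroid))
    return min(model["centroids"], key=lambda label: distance(model["centroids"][label]))


rows = [
    {"bj_000": 0.1, "ck_000": 0.0, "noise": 0.9},
    {"bj_000": 0.2, "ck_000": 0.1, "noise": 0.1},
    {"bj_000": 0.8, "ck_000": 1.0, "noise": 0.8},
    {"bj_000": 0.9, "ck_000": 0.9, "noise": 0.2},
]
labels = [False, False, True, True]

model_1 = train(rows, labels, ["bj_000", "ck_000"])           # low-Signal field excluded
model_2 = train(rows, labels, ["bj_000", "ck_000", "noise"])  # all fields

# Score the same row against each model and compare the results.
new_row = {"bj_000": 0.3828125, "ck_000": 1.0, "noise": 0.5}
print("model 1:", predict(model_1, new_row))
print("model 2:", predict(model_2, new_row))
```

Scoring a single row only shows whether the two models disagree on that row; judging which model is better needs a larger set of held-out rows.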

Regarding the model to choose (Linear Regression, SVM, Neural Net ...), there is no single answer.

It all depends on the data that you have. The fact that your goal is of a specific type is not enough to decide which algorithm/learner to use.

If you want the answer for your specific dataset, you can simply create a model with a single learner for each of Linear Regression, SVM and Neural Net, and check which one gives the best prediction.

Another alternative is to use an ensemble technique with elite average, which will let ThingWorx Analytics pick the best results out of the chosen learners.
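
The idea behind comparing learners and averaging the best ones can be sketched as follows. This is only an illustration of the concept, not the ThingWorx Analytics implementation: `mean_abs_error`, `elite_average`, the `elite_count` parameter and all the numbers are assumptions made up for the example.

```python
# Hypothetical sketch of the "elite average" idea; not the ThingWorx Analytics implementation.

def mean_abs_error(predictions, actuals):
    """Average absolute difference between predictions and known values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


def elite_average(validation_preds, actuals, new_preds, elite_count=2):
    """Average the new-data outputs of the learners with the lowest validation error."""
    ranked = sorted(
        validation_preds,
        key=lambda name: mean_abs_error(validation_preds[name], actuals),
    )
    elite = ranked[:elite_count]
    row_count = len(new_preds[elite[0]])
    averaged = [sum(new_preds[name][i] for name in elite) / len(elite)
                for i in range(row_count)]
    return elite, averaged


# Made-up validation outputs of three learners on four rows with known goal values.
actuals = [0.0, 1.0, 1.0, 0.0]
validation_preds = {
    "linear_regression": [-0.1, 0.9, 1.2, 0.2],
    "svm": [0.3, 0.6, 0.7, 0.4],
    "neural_net": [0.1, 0.8, 0.9, 0.1],
}
# Made-up outputs of the same learners on one new row.
new_preds = {"linear_regression": [0.25], "svm": [0.5], "neural_net": [0.75]}

elite, averaged = elite_average(validation_preds, actuals, new_preds)
print(elite, averaged)
```

Creating one single-learner model per algorithm corresponds to looking at each learner's validation error on its own; the ensemble step only changes how the outputs are combined.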

Hope this helps

Kind regards

Christophe

alessandro96 · ‎Mar 01, 2019

Thanks. I thought I had finished, but after I created the model and the score, I see many negative values in the score file.

Do you know what this means?

cmorfin · ‎Mar 01, 2019

Hi @alessandro96

I don't have enough information to answer your question.

Could you please upload your dataset CSV + JSON files, as well as the score output with the negative values that you receive?

Also, what exact version of ThingWorx Analytics are you using?

I would also need the details of the model configuration so I can create a model similar to yours.

I should then be able to look into this.

Thanks

Christophe

alessandro96 · ‎Mar 02, 2019

I use Analytics Server 8.3.

To create the model I used only Linear Regression in solo mode.

I created 2 models with different fields, but in both of them there are negative values.

For the first I used all fields; for the second, only the fields of the first population of the profile.

cmorfin · ‎Mar 04, 2019

Hi @alessandro96

Thank you for the data.

When you say you see some negative values, do you see this for the feature named class_mo, or are you referring to something else?

Thanks

Christophe

alessandro96 · ‎Mar 04, 2019

Thank you for the help.

Yes, I see this for the feature named class_mo.

But I don't know how the model works to create the score.

I don't know why, with different learners on the same dataset, the scale of the score is different,

and why I cannot create a score with only a test dataset. Can I only upload it in addition to the dataset used to create the model, or is there a solution that allows this?

cmorfin · ‎Mar 04, 2019

Hi @alessandro96

Thank you for clarifying.

Here are some answers for you:

- class_mo is an internal field that you should not worry too much about. The value that you want is the class field. For a Boolean goal, the _mo field is roughly a measure of how likely the result is to be false or true. The closer _mo is to 0, the more likely it is to be false. The closer to 1, the more likely it is to be true.
ThingWorx Analytics uses a threshold to switch between false and true (it is not always 0.5, but usually close).

- Using different learners will indeed change the score values; this is expected. Each learner is a different algorithm that tries to predict the score. This is based on statistics, so each algorithm will reach a different value.
This is why a big part of machine learning model creation is trial and error, to check which algorithm gives the best results on a specific dataset.
As mentioned earlier, using an ensemble technique (like elite average) makes this easier because it averages the output of the best algorithms selected. But there can be cases where a single learner gives a better result, which is why testing is required.

- To score new data through the Builder UI, the easiest way is to add a new column to your dataset beforehand to use as a filter. For the original data, you can set a value of training.

Then when you add new data, you can set it to scoring04032019 and score only those filtered rows.
This is explained at https://community.ptc.com/t5/IoT-Tech-Tips/How-to-score-new-data-with-ThingWorx-Analytics-8-3-x/m-p/571051 (check time 2:05 in the video).

If you use the services of the predictionThing instead of Builder, you can pass the new scoring dataset directly to the score job, without creating filters.
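
As a small illustration of the first point, the relation between class_mo and class can be sketched like this. The helper `mo_to_class`, the default threshold of 0.5 and the `>=` comparison are assumptions; the actual threshold is chosen by ThingWorx Analytics. A plain linear regression output is not bounded to the 0 to 1 interval, which would be consistent with the negative class_mo values, though that is an inference rather than something confirmed from the uploaded data.

```python
# Hypothetical helper; the real threshold is chosen by ThingWorx Analytics
# and may differ from 0.5.
def mo_to_class(mo_value, threshold=0.5):
    """Map a raw _mo score to a Boolean class."""
    return mo_value >= threshold


# A negative value simply falls on the "false" side of the threshold.
for mo in (-0.2, 0.33333333333333337, 0.62, 1.1):
    print(mo, "->", mo_to_class(mo))
```

With this assumed threshold, the class_mo value 0.33333333333333337 from the example row maps to false, which matches the class value in that row.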

Hope this helps

Kind regards

Christophe
