
Generating dataset for predictive model

drieder
15-Moonstone

Hello everyone,

 

I would like to create a predictive model for a simple Thing with a few properties I defined. These properties are already logged and the Thing has a value stream assigned.

 

My question is: how can I retrieve a dataset (as CSV) and a dataset field configuration (as JSON) from it? Apparently I need these two files in order to analyse my Thing.

 

Or maybe you can tell me if there is anything wrong with my approach in the first place. I would also like to know how much historical data you actually need to make good predictions.

 

Best Regards,

Dominik

1 ACCEPTED SOLUTION

Accepted Solutions
cmorfin
19-Tanzanite
(To:drieder)

Hi Dominik

 

Creating the CSV and JSON files will require some manual implementation on your side. This thread gives some hints on how to export to CSV.

There is ongoing work to provide an easier way to do this in a future release.
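To make the two files concrete, here is a minimal sketch in plain Python (outside ThingWorx) of what a dataset CSV and a field-configuration JSON might look like. The property names (temperature, pressure, failure) are made up for illustration, and the exact JSON schema Analytics expects depends on your version, so treat the field-configuration entries as an assumption and check the PTC documentation.

```python
import csv
import json

# Hypothetical logged property values; in practice these come from your
# value stream.
rows = [
    {"temperature": 71.3, "pressure": 1.02, "failure": False},
    {"temperature": 88.9, "pressure": 1.31, "failure": True},
]
fields = ["temperature", "pressure", "failure"]

# Dataset CSV: one header row, then one row per logged record.
with open("dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)

# Field configuration JSON: one entry per column describing its type.
# The key names here are an assumption about the Analytics schema.
field_config = [
    {"fieldName": "temperature", "dataType": "DOUBLE", "opType": "CONTINUOUS"},
    {"fieldName": "pressure", "dataType": "DOUBLE", "opType": "CONTINUOUS"},
    {"fieldName": "failure", "dataType": "BOOLEAN", "opType": "BOOLEAN"},
]
with open("dataset.json", "w") as f:
    json.dump(field_config, f, indent=2)
```

The important point is that the CSV column names and the `fieldName` entries in the JSON must match one-to-one.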

 

I have not tested this, but it might also be possible to build an infotable with the logged data and pass it to the data infotable of datasetRef in the training Thing's CreateJob() service. Similarly, create an infotable with the metadata and pass it to the metadata infotable of datasetRef.
I have done this with scoring, so I am thinking it could work with training too, though I do not know how well it would scale. For large amounts of data the CSV option is probably better.

 

Regarding how much data you need for training, there is no simple answer.
It depends on the number of fields and the correlation between them.

For example, if you try to predict z, which is related to x and y by z = x + 2y, then 5 to 10 points would be enough.

If you have a lot of fields with no obvious relations, then you need a much larger number of records to establish them. https://www.ptc.com/en/support/article?n=CS255070 indicates at least 30 records per field, but with machine learning the more records the better.
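The two extremes above can be checked with plain Python. When the target is an exact linear function z = x + 2y, a least-squares fit (normal equations, no intercept) recovers the coefficients from just five points; by contrast, the CS255070 rule of thumb scales with the number of fields.

```python
# Five noise-free records generated from z = x + 2y.
xs = [1, 2, 1, 3, 2]
ys = [1, 1, 2, 2, 3]
zs = [x + 2 * y for x, y in zip(xs, ys)]

# Normal equations for the fit z ~ a*x + b*y.
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxz = sum(x * z for x, z in zip(xs, zs))
syz = sum(y * z for y, z in zip(ys, zs))
det = sxx * syy - sxy * sxy
a = (sxz * syy - syz * sxy) / det
b = (syz * sxx - sxz * sxy) / det
print(a, b)  # -> 1.0 2.0, the true coefficients, from only 5 records

# The 30-records-per-field rule of thumb for uncorrelated fields:
# a dataset with 10 fields would want at least 300 records.
min_records = 30 * 10
```

With correlated, noise-free data a handful of records suffices; with many unrelated fields the record count has to grow with the field count.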

 

Hope this helps

Christophe


3 REPLIES 3

drieder
15-Moonstone
(To:cmorfin)

Hi Christophe,

 

Thank you for the detailed hints, I will try out both approaches.

 

Best Regards,

Dominik

drieder
15-Moonstone
(To:cmorfin)

Thank you again, I was able to solve this by using the CSV Parser extension. I simply used the "WriteCSVFile" service from the CSVParserFunctions service collection.

 

For the data parameter I used MyEntity.QueryPropertyHistory (I had a value stream assigned to it and the necessary properties logged). This saved the corresponding CSV file in the ThingworxStorage/repository/MyRepository folder.
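For readers following the same route, the two service calls described above might look roughly like this inside a ThingWorx service script. This is a hedged sketch only: the entity names (MyEntity, MyRepository) come from the post, but the exact parameter names of QueryPropertyHistory and WriteCSVFile are assumptions, so verify them against the CSV Parser extension documentation for your version.

```javascript
// Pull the logged property history from the value stream.
// (Parameter names are assumptions -- check your platform version.)
var history = Things["MyEntity"].QueryPropertyHistory({
    maxItems: 10000 /* NUMBER */
});

// Write it as a CSV file into a file repository via the CSV Parser extension.
Resources["CSVParserFunctions"].WriteCSVFile({
    path: "dataset.csv" /* STRING */,
    data: history /* INFOTABLE */,
    fileRepository: "MyRepository" /* THINGNAME */,
    withHeader: true /* BOOLEAN */
});
```

The resulting file should then appear under ThingworxStorage/repository/MyRepository, as described above.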

 

I guess I won't have trouble using this CSV file as a dataset for TWX Analytics.

 

 
