cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Showing results for 
Search instead for 
Did you mean: 

Community Tip - Did you get called away in the middle of writing a post? Don't worry you can find your unfinished post later in the Drafts section of your profile page. X

IoT Tips

Sort by:
The following videos are provided to help users get started with ThingWorx: ThingWorx Installation Installing ThingWorx (Neo4j) in Windows ThingWorx PostgreSQL Setup for Windows ThingWorx PostgreSQL for RHEL ThingWorx Data Storage Introduction to Streams Introduction to Value Streams Introduction to DataTables Introduction to InfoTables ThingWorx Concepts & Functionality Introduction to Media Entities Using State Formatting in a Mashup Configuring Properties ThingWorx REST API REST API (Part 1) REST API (Part 2) ThingWorx Edge SDK Configuring File Transfer with the .NET SDK ThingWorx Analytics *new* Getting Started with ThingWorx Analytics Part 1 Getting Started with ThingWorx Analytics Part 2 Installing ThingWorx Analytics Builder Part 1 of 3 Installing ThingWorx Analytics Builder Part 2 of 3 Installing ThingWorx Analytics Builder Part 3 of 3 Creating Signals in the Analytics Builder How to Access the ThingWorx Analytics Interactive API Guide ThingWorx Widgets How to Create and Configure the Auto Refresh Widget How to Create and Define a Blog Widget How to Create and Configure a Button Widget How to Use the Divider and Shape Widgets How to Create and Configure a Chart Widget How to Use a Contained Mashup How to Use the Data Filter Widget How to Use an Expression Widget How to Create and Configure a Gauge Widget How to Create and Configure a Checkbox Widget How to Use a Contained Mashup Widget How to Use a Data Export Widget How to Use the DateTime Picker Widget How to Use the Editable Grid Widget Using Fieldset and Panel Widgets How to Use the File Upload Widget How to Use the Folding Panel Widget How to Use the Google Location Picker How to Use the Google Map Widget How to Use a Grid Widget How to Use an HTML TextArea Widget How to Use the List Widget How to Use a Label Widget How to Use the Layout Widget How to Use the LED Display Widget How to Use the List Widget How to Use the Masked Textbox Widget Navigation in ThingWorx: Using Menus, the Navigation Widget, Link Widget, and Contained Mashups How to Use the Numeric Entry Widget How to Use the Pie Chart Widget How to Use the Property Display Widget How to Use the Radio Button Widget How to Use the Repeater Widget How to Use the Slider Widget How to Use the SQUEAL Search Widget How to Use the Responsive Tab Widget How to Use the Tag Cloud Widget How to Use the Tag Picker Widget How to Use the TextArea and TextBox Widgets How to Use the Time Selector Widget How to Use the Tree Widget How to Use the Value Display Widget How to Use the Web Frame Widget How to Create and Define a Wiki How to Use the XY Chart Quick note: Thread will be updated with more videos as they are added.
View full tip
Anomaly Detection (also known as Outlier Detection) is a set of techniques that identify unusual occurrences in data. The premise is that such occurrences may be early indicators of future negative events (e.g. failure of assets or production lines).  Data Science algorithms for Anomaly Detection include both Supervised and Unsupervised methods. In Unsupervised Anomaly Detection, the algorithms make the assumption that most of the data points are "normal" (e.g. normal operation of the asset) and are looking for data points that are most dissimilar to the remainder of the dataset. Supervised Anomaly Detection requires a labeled set of Anomalies, in which case predictive algorithms can be applied directly on this data.   Thingworx employs a number of algorithms in support of Anomaly Detection: Simple threshold alerts. These are easy to setup on Thing properties but require a domain expert to provide such thresholds. Then, the alert will automatically fire when the value of the monitored property goes outside the predefined range of values, often seen as the "bad" side of the threshold. Statistical Process Control (SPC). This can be implemented using Thingworx Analytics Property Transforms. Most companies use a subset of SPC charts and rules to monitor production processes. Examples include the X Bar and R charts, as well as the Western Electric rules (e.g. one point outside the average +/- three sigma range). Explainable and widely accepted, SPC can also provide an earlier warning system compared to simple threshold alerts, in that it captures more complex patterns. Clustering. Using Thingworx Analytics, one can build an optimal clustering for the available data points. Under the assumption that data is representative of mostly normal operation and that there is not a significant pre-defined pattern of anomalies that form their own cluster, one can identify outliers by looking at the distance between points and their corresponding cluster centers. Points that are very far from their corresponding cluster center can be labeled as anomalies. Semi-supervised Anomaly Alerting (formerly known as ThingWatcher). This functionality identifies single property time series behavior that is statistically different than what was seen in a finite window of “known normal operation”. As such, it does not identify a “bad” event, or even a precursor to a “bad” event.  Rather, it points the end user to further investigate a situation which may lead to a “bad” event. Anomaly Alerts can be easily setup like any other Alert on a Thing property. Multiple Anomaly Alerts can be setup on the same or different properties of a Thing. Behind the scenes, the platform builds a time series neural network model for the known normal operation data, which is then applied to incoming data, and, if the errors are significantly different than those on known normal operation over a period of time, then an Anomaly Alert is produced. The techniques mentioned above are either unsupervised or semi-supervised. If the dataset contains labeled anomalies (e.g. asset faults, or suspicious patterns) then supervised predictive techniques (such as regression, decision trees, neural nets, or ensemble methods) are available to model the relationships between such anomalies (dependent variables) and various variables of interest (independent variables). These models can then be employed to monitor assets or production lines for upcoming anomalies. In many real-world use cases, anomalies are relatively rare; care needs to be taken when building such predictive models. Techniques such as up-sampling can prove beneficial in such situations.   What constitutes an Anomaly depends on the observed data and the current context. If only few data points are initially available, then it is possible that a lot of future data is predicted as an Anomaly, despite being normal operation. Also, in terms of context, if an Anomaly Detection is trained on a connected product in the Winter, it is likely to say all Summer operation is anomalous. This can be tackled by having multiple anomaly detection alerts implemented, one for each different context of operation (e.g. season, recipe being manufactured, operation done by a robot).   Another consideration is lead time vs explainability. For example, when a threshold alert fires, it is obvious why, but it may not be early enough to take action. As more advanced methods are employed, more complex patterns can be captured, hence more lead time, but typically at the expense of explainability. For example, semi supervised Anomaly Alerting (formerly known as ThingWatcher) uses time windows, aggregations, and derivatives of up to the third order, resulting in significantly less explainability when an Anomaly is presented to the end user.   Choosing the appropriate Anomaly Detection technique is use case dependent, balancing the desired lead time and explainability. If historical data on failures/anomalies is not available, a good place to start is Statistical Process Control, as it provides a balanced approach between the two dimensions, in addition to being already in use across many manufacturing companies.
View full tip
Predicting time to failure (TTF) or remaining useful life (RUL) is a common need in IIOT world. We are looking here at some  ways to implement it. We are going to use one of the Nasa dataset publicly available that simulates the Turbofan engine degradation (https://c3.nasa.gov/dashlink/resources/139/) . The original dataset has got 26 features as below Column 1 – asset id Column 2 – cycle/time of sensor data collection Column 3- 5 – operational setting Column 6-26 – sensor measurement In the training dataset the sensor measurement ends when the failure occurs.     Data Collection Since the prediction model is based on historic data, the data collection is a critical point. In some cases the data would have been already collected form the past and you need to make the best out of it. See the Data preparation chapter below. In situation where you are collecting data, a few points are good to keep in mind, some may or may not apply depending on the type of data to be collected. More frequent (higher frequency) collection is usually better, especially for electronic measure. In situation where one or more specific sensor values are known to impact the TTF, it is good to take measure at different values of this sensor until the failure without artificially modifying the values. For example, for a light bulb with normal working voltage of 1.5V, it is good to take some measure at let’s say 1V, 1.5V , 2V , 3V and 4V. But each time run till the failure. Do not start at 1.5V and switch to 4V after 1h. This would compromise what the model can learn. More variation is better as it helps the prediction model to generalize. In the same example of voltage it is best to collect data for 1V, 1.5V , 2V , 3V and 4V rather than just 1.5V which would be the normal running condition. This also depends on the use case, for example if we know for sure that voltage will always be between 1.45V and 1.55V, then we could focus only on data collection in this range. Once the failure is reached, stop collecting data. We are indeed not interested in what happens after the failure. Collecting data after the failure will also lead to lower prediction model accuracy. Each failure run should be a separate cycle in the dataset. In other word from a metadata stand point, each failure run should be represented by a different ENTITY_ID. TTF business need Before going into data preparation and model creation we need to understand what information is important in term of TTF prediction for our business need. There are several ways to conceive the TTF, for example: Exact time value when failure might occur This is probably going to be the most challenging to predict However one should consider if it is really necessary. Indeed do we need to know that a failure might occur in 12 min as opposed to 14 min ? Very often knowing that the time to failures is less than X min, is what is important, not the exact time. So the following options are often more appropriate. Threshold For some application knowing that a critical threshold is reached is all that is needed. In this case a Boolean goal, for example lessThan30min or healthy with yes/no values, can be used This is usually much easier than the exact value above. Range For other applications we may need to have a bit more insight and try to predict some ranges, for example: lessThan30min, 30to60min, 60to90min and moreThan90min In this case we will define an ordinal goal The caveat here is that currently ThingWorx Analytics Builder does not support ordinal goal, though ThingWorx Analytics Server does support it. So it only means that the model creation needs to be done through the API. This is the option we will take with the NASA dataset. The picture below shows the 3 different types of TTF listed above   Data Preparation   General Feature engineering Data Preparation is always a very important step for any machine learning work. It is important to present the data in the best suitable way for the algorithms to give the best results. There are a lot of practices that can be used but beyond the scope of this post. The Feature Engineering  post gives some starting point on this. There are also a lot of resources available on the Internet to get started, though the use of a data scientist may be necessary. As an example, in the original NASA dataset we can see that a few features have a constant value therefore there are unlikely to impact the prediction and will be removed. This will allow to free computational resources and prevent confusion in the model. Sensor data resampling The data sampling across the different sensor should be uniform. In a real case scenario we may though have sensors data collected at different time interval. Data transformation/extrapolation should  be done so that all sensor values are at the same frequency in the uploaded dataset. TTF feature Since we want to predict the time to failure, we do need a column in the dataset that represent this values for the data we have. In a real case scenario we obviously cannot measure the time to failure, but we usually have sensor data up to the point of failure, which we can use to derived the TTF values. This is what happens in the NASA dataset, the last cycle corresponds to the time when the failure occurred. We can therefore derive a new feature TTF in this dataset. This will start at 0 for the last cycle when failure occurred, and will be incremented by 1 up to the very first measurement, as shown below:   Once this TTF column is defined, we may need to transform it further depending on the path we choose for TTF prediction, as described in the TTF business need chapter. In the case of the NASA dataset we are choosing a range TTF with values of more100, 50to100, 10to50 and less10 to represent the number of remaining cycles till the predicted failure. This is the information we need to predict in order to plan a suitable maintenance action. Our transformed TTF column look as below:     Once the data in csv is ready, we need to create the json file to represent the metadata. In the case of range TTF this will be defined as an ordinal goal as below (see attachment for the full matadata json file) {         "fieldName": "TTF",         "values": ["less10",                   "10to50",                   "50to100",                   "more100"],         "range": null,         "dataType": "STRING",         "opType": "ORDINAL",         "timeSamplingInterval": null,         "isStatic": false   }   Model creation Once the data is ready it can be uploaded into ThingWorx Analytics and work on the prediction model can start. ThingWorx Analytics is designed to make machine learning easy and accessible to non data scientists, so this steps will be easier than when using other solutions. However some trial and error are needed to refine the model which may also involve reworking the dataset. Important considerations: When dealing with Time to failure prediction, it is usually needed to unset the Use Goal History in the Advanced parameters of the model creation wizard. If using API, the equivalent is to set the virtualSensor parameter to true. Tests with Redundancy Filter enabled should be done as this has shown to give better results. In a first attempt it is a good idea to keep lookback Size to 0. This indicates to ThingWorx Analytics to find the best lookback size between 2, 4, 8 and 16. If you need a different value or know that a different value is better suited, you can change this value accordingly. However bear in mind the following: Larger lookback size will lead to less data being available to train, since more data are needed to predict one goal. Larger lookback do lead to significant memory increase – See https://www.ptc.com/en/support/article?n=CS294545     In the case of the NASA dataset, since we are using an ordinal goal, we need to execute it through API. This can be done through mashup and services (see How to work with ordinal and categorical data in ThingWorx Analytics ? for an example) for a more productive way. As a test the TrainingThing.CreateJob service can be called from the Composer directly, as shown below:       Once the model is created we can check some performance statistics in ThingWorx Analytics Builder or, in the case of ordinal goal, via the ValidationThing.RetrieveResults service. The parameter most relevant in the case of ordinal goal will be the confusion matrix. Here is the confusion matrix I get   Another validation is to compute some PVA (Predicted Vs Actual) results for some validation data. ThingWorx Analytics does validation automatically when using ThingWorx Analytics Builder and present some useful performance metrics and graph. In the case of ordinal goal, we can still get this automatic validation run (hence the above confusion matrix), but no PVA graph or data is available. This can be done manually if some data are kept aside and not passed to the training microservice. Once the model is completed, we can then score (using PredictionThing.RealTimeScore or BatchScore for ordinal goal, or Builder UI for other goal) this validation dataset and compare the prediction result with the actual value. here is one example:     Depending on the business case this model can be deemed acceptable or may need rework, such as change the range values, change learners’ parameters, modify dataset … There is certainly a fair amount of experimentation before creating the optimal model but hopefully this post does give some good starting points.   Resources:   Original Dataset attached as train_FD001-original.csv Transformed dataset attached as train_FD001-TTF-transformed.csv json metadata file for transformed dataset attached as train_FD001-ttford.json                    
View full tip
Analytics projects typically involve using the Analytics API rather than the Analytics Builder to accomplish different tasks. The attached documentation provides examples of code snippets that can be used to automate the most common analytics tasks on a project such as: Creating a dataset Training a Model Real time scoring predictive and prescriptive Retrieving the validation metrics for a model Appending additional data to a dataset Retraining the model The documentation also provides examples that are specific to time series datasets. The attached .zip file contains both the document as well as some entities that you need to import in ThingWorx to access the services provided in the examples. 
View full tip
I have created a mashup which allows you to easily use and test the Prescriptions functionality in Thingworx Analytics (TWA). This is where you choose 1 or more fields for optimization, and TWA tells you how to adjust those fields to get an optimal outcome.   The functionality is based on a public sample dataset for concrete mixtures, full details are included in the attached documentation.  
View full tip
In ThingWorx Analytics, you have the possibility to use an external model for scoring. In this written tutorial, I would like to provide an overview of how you can use a model developed in Python, using the scikit-learn library in ThingWorx Analytics. The provided attachment contains an archive with the following files: iris_data.csv: A dataset for pattern recognition that has a categorical goal. You can click here to read more about this dataset TestRFToPmml.ipynb: A Jupyter notebook file with the source code for the Python model as well as the steps to export it to PMML RF_Iris.pmml: The PMML file with the model that you can directly upload in Analytics without going through the steps of training the model in Python The tutorial assumes you already have some knowledge of ThingWorx and ThingWorx Analytics. Also, if you plan to run the Python code and train the model yourself, you need to have Jupyter notebook installed (I used the one from the Anaconda distribution). For demonstration purposes, I have created a very simple random forest model in Python. To convert the model to PMML, I have used the sklearn2pmml library. Because ThingWorx Analytics supports PMML format 4.3, you need to install sklearn2pmml version 0.56.2 (the highest version that supports PMML 4.3). To read more about this library, please click here Furthermore, to use your model with the older version of the sklearn2pmml, I have installed scikit-learn version 0.23.2.  You will find the commands to install the two libraries in the first two cells of the notebook.   Code Walkthrough The first step is to import the required libraries (please note that pandas library is also required to transform the .csv to a Dataframe object):   import pandas from sklearn.ensemble import RandomForestClassifier from sklearn2pmml import sklearn2pmml from sklearn.model_selection import GridSearchCV from sklearn2pmml.pipeline import PMMLPipeline   After importing the required libraries, we convert the iris_data.csv to a pandas dataframe and then create the features (X) as well as the goal (Y) vectors:   iris_df = pandas.read_csv("iris_data.csv") iris_X = iris_df[iris_df.columns.difference(["class"])] iris_y = iris_df["class"]   To best tune the random forest, we will use the GridSearchCSV and cross-validation. We want to test what parameters have the best validation metrics and for this, we will use a utility function that will print the results:   def print_results(results): print('BEST PARAMS: {}\n'.format(results.best_params_)) means = results.cv_results_['mean_test_score'] stds = results.cv_results_['std_test_score'] for mean, std, params in zip(means, stds, results.cv_results_['params']): print('{} (+/-{}) for {}'.format(round(mean, 3), round(std * 2, 3), params))   We create the random forest model and train it with different numbers of estimators and maximum depth. We will then call the previous function to compare the results for the different parameters:   rf = RandomForestClassifier() parameters = { 'n_estimators': [5, 50, 250], 'max_depth': [2, 4, 8, 16, 32, None] } cv = GridSearchCV(rf, parameters, cv=5) cv.fit(iris_X, iris_y) print_results(cv)   To convert the model to a PMML file, we need to create a PMMLPipeline object, in which we pass the RandomForestClassifier with the tuning parameters we identified in the previous step (please note that in your case, the parameters can be different than in my example). You can check the sklearn2pmml  documentation  to see other examples for creating this PMMLPipeline object :   pipeline = PMMLPipeline([ ("classifier", RandomForestClassifier(max_depth=4,n_estimators=5)) ]) pipeline.fit(iris_X, iris_y)   Then we perform the export:   sklearn2pmml(pipeline, "RF_Iris.pmml", with_repr = True)   The model has now been exported as a PMML file in the same folder as the Jupyter Notebook file and we can upload it to ThingWorx Analytics.   Uploading and Exploring the PMML in Analytics To upload and use the model for scoring, there are two steps that you need to do: First, the PMML file needs to be uploaded to a ThingWorx File Repository Then, go to your Analytics Results thing (the name should be YourAnalyticsGateway_ResultsThing) and execute the service UploadModelFromRepository. Here you will need to specify the repository name and path for your PMML file, as well as a name for your model (and optionally a description)   If everything goes well, the result of the service will be an id. You can save this id to a separate file because you will use it later on. You can verify the status of this model and if it’s ready to use by executing the service GetDetails:   Assuming you want to use the PMML for scoring, but you were not the one to develop the model, maybe you don’t know what the expected inputs and the output of the model are. There are two services that can help you with this: QueryInputFields – to verify the fields expected as input parameters for a scoring job   QueryOutputFields – to verify the expected output of the model The resultType input parameter can be either MODELS or CLUSTERS, depending on the type of model,    Using the PMML for Scoring With all this information at hand, we are now ready to use this PMML for real-time scoring. In a Thing of your choice, define a service to test out the scoring for the PMML we have just uploaded. Create a new service with an infotable as the output (don’t add a datashape). The input data for scoring will be hardcoded in the service, but you can also add it as service input parameters and pass them via a Mashup or from another source. The script will be as follows:   // Values: INFOTABLE dataShape: "" let datasetRef = DataShapes["AnalyticsDatasetRef"].CreateValues(); // Values: INFOTABLE dataShape: "" let data = DataShapes["IrisData"].CreateValues(); data.AddRow({ sepal_length: 2.7, sepal_width: 3.1, petal_length: 2.1, petal_width: 0.4 }); datasetRef.AddRow({ data: data}); // predictiveScores: INFOTABLE dataShape: "" let result = Things["AnalyticsServer_PredictionThing"].RealtimeScore({ modelUri: "results:/models/" + "97471e07-137a-41bb-9f29-f43f107bf9ca", //replace with your own id datasetRef: datasetRef /* INFOTABLE */, });   Once you execute the service, the output should look like this (as we would have expected, according to the output fields in the PMML model):   As you have seen, it is easy to use a model built in Python in ThingWorx Analytics. Please note that you may use it only for scoring, and the model will not appear in Analytics Builder since you have created it on a different platform. If you have any questions about this brief written tutorial, let me know.
View full tip
Users of ThingWorx Analytics (TWA) may choose to create a predictive model using TWA or import a predictive model that was created using other software. When importing into or exporting out of TWA, this predictive model must be in a PMML (Predictive Model Markup Language) version 4.3+ format. This post describes how to complete the import and export processes. Exporting: The user may create a model in two main ways inside of TWA: using the Builder user interface, or by using ‘Create Job’ service that exists the Training Thing. Whichever method is used, a model Job Id is created automatically by TWA for that model. It is this model Job Id that is used to identify the model inside of TWA, regardless of what is being done with that model.   If a model is trained using Builder, the user may highlight that model, click ‘Job Details’, and then copy the Job ID. This is done as follows:   Next, the user will navigate to Browse --> Things --> …TrainingThing. This is the Training Microservice inside of TWA where all the functionality involved with training a model exists. Within the …TrainingThing, the user will execute the ‘RetrieveModel’ service under Services. When executing the service, the user will paste the model Job ID (ex. 49704f1a-7fcd-4e38-ab53-84ef46517d0a) they copied earlier, and press ‘Execute’. The resulting text can then be highlighted and copied to Notepad or some other text editor, and saved as .pmml format (ex. ‘ModelExport.pmml’).   Importing Through Results Microservice: To import a model that has been saved in PMML 4.3+ format into TWA using the Results Microservice, the user will navigate to Manage --> Repositories (ex. AnalyticsUploadStorage) --> Actions --> Upload, and choose the PMML file. The user will then navigate to Browse --> Things --> …ResultsThing. This is the Results Microservice inside of TWA where all the functionality exists related to previously trained models. Within the …ResultsThing, the user will execute the ‘UploadModel’ service under Services. Alternatively, the user can upload the model from any repository using ‘UploadModelFromRepository” service.   To create a model from the uploaded PMML inside of TWA, the user will fill out the filePath and name then execute the service. Note: This model will not show up in Builder, as that would require model validation information that is not part of the imported PMML file.   The resulting Job Id can be used to make predictions, such as by using the …PredictionThing’s BatchScore or RealtimeScore services. At this point, the uploaded model acts the same way as if the model were created inside of that TWA environment.       Importing Through Analytics Manager: To import a model that has been saved in PMML 4.3+ format into TWA using the Analytics Manager, the user will navigate to Analytics --> Analytics Manager --> Analysis Models, and click the green “New” button. Next the user will choose the provider name (or create a new one by navigating to Analytics --> Analytics Manager --> Analysis Providers). The user will also check the box to “Upload Model”, and click the grey “Choose File” button to find the PMML file. Finally, the user will click the black “Upload” button, then the green “Save” button.     At this point, the model is uploaded into ThingWorx Analytics, and the user may progress through the subsequent steps to set up “Analysis Events” and “Analysis Jobs” that will be powered by the imported model.
View full tip
Here is a tutorial to explain the process of uploading a PMML file from an external system to Thingworx Analytics. The tutorial steps are explained in the attached PDF and all referenced files can be found in the attached ZIP.  
View full tip
  PLEASE NOTE DataConnect has now been deprecated and no longer in use and supported.   We are regularly asked in the community how to send data from ThingWorx platform to ThingWorx Analytics in order to perform some analytics computation. There are different ways to achieve this and it will depend on the business needs. If the analytics need is about anomaly detection, the best way forward is to use ThingWatcher without the use of ThingWorx Analytics server. The ThingWatcher Help Center is an excellent place to start, and a quick start up program can be found in this blog. If the requirement is to perform a full blown analytics computation, then sending data to ThingWorx Analytics is required. This can be achieved by Using ThingWorx DataConnect, and this is what this blog will cover Using custom implementation. I will be very pleased to get your feedback on your experience in implementing custom solution as this could give some good ideas to others too. In this blog we will use the example of a smart Tractor in ThingWorx where we collect data points on: Speed Distance covered since last tyre change Tyre pressure Amount of gum left on the tyre Tyre size. From an Analytics perspective the gum left on the tyre would be the goal we want to analyse in order to know when the tyre needs changing.   We are going to cover the following: Background Workflow DataConnect configuration ThingWorx Configuration Data Analysis Definition Configuration Data Analysis Definition Execution Demo files   Background For people not familiar with ThingWorx Analytics, it is important to know that ThingWorx Analytics only accepts a single datafile in a .csv format. Each columns of the .csv file represents a feature that may have an impact on the goal to analyse. For example, in the case of the tyre wear, the distance covered, the speed, the pressure and tyre size will be our features. The goal is also included as a column in this .csv file. So any solution sending data to ThingWorx Analytics will need to prepare such a .csv file. DataConnect will perform this activity, in addition to some transformation too.   Workflow   Decide on the properties of the Thing to be collected, that are relevant to the analysis. Create service(s) that collect those properties. Define a Data Analysis Definition (DAD) object in ThingWorx. The DAD uses a Data Shape to define each feature that is to be collected and sends them to ThingWorx Analytics. Part of the collection process requires the use of the services created in point 2. Upon execution, the DAD will create one skinny csv file per feature and send those skinny .csv files to DataConnect. In the case of our example the DAD will create a speed.csv, distance.csv, pressure.csv, gumleft.csv, tyresize.csv and id.csv. DataConnect processes those skinny csv files to create a final single .csv file that contains all these features. During the processing, DataConnect will perform some transformation and synchronisation of the different skinny .csv files. The resulting dataset csv file is sent to ThingWorx Analytics Server where it can then be used as any other dataset file.     DataConnect configuration   As seen in this workflow a ThingWorx server, DataConnect server and a ThingWorx Analytics server will need to be installed and configured. Thankfully, the installation of DataConnect is rather simple and well described in the ThingWorx DataConnect User’s guide. Below I have provided a sample of a working dataconnect.conf file for reference, as this is one place where syntax can cause a problem:   ThingWorx Configuration The platform Subsystem needs to be configured to point to the DataConnect server . This is done under SYSTEM > Subsystems > PlatformSubsystem:     DAD Configuration The most critical part of the process is to properly configure the DAD as this is what will dictate the format and values filled in the skinny csv files for the specific features. The first step is to create a data shape with as many fields as feature(s)/properties collected.   Note that one field must be defined as the primary key. This field is the one that uniquely identifies the Thing (more on this later). We can then create the DAD using this data shape as shown below   For each feature, a datasource needs to be defined to tell the DAD how to collect the values for the skinny csv files. This is where custom services are usually needed. Indeed, the Out Of The Box (OOTB) services, such as QueryNumberPropertyHistory, help to collect logged values but the id returned by those services is continuously incremented. This does not work for the DAD. The id returned by each services needs to be what uniquely identifies the Thing. It needs to be the same for all records for this Thing amongst the different skinny csv files. It is indeed this field that is then used by DataConnect to merge all the skinny files into one master dataset csv file. A custom service can make use of the OOTB services, however it will need to override the id value. For example the below service uses QueryNumberPropertyHistory to retrieve the logged values and timestamp but then overrides the id with the Thing’s name.     The returned values of the service then needs to be mapped in the DAD to indicate which output corresponds to the actual property’s value, the Thing id and the timestamp (if needed). This is done through the Edit Datasource window (by clicking on Add Datasource link or the Datasource itself if already defined in the Define Feature window).   On the left hand side, we define the origin of the datasource. Here we have selected the service GetHistory from the Thing Template smartTractor. On the right hand side, we define the mapping between the service’s output and the skinny .csv file columns. Circled in grey are the output from the service. Circled in green are what define the columns in the .csv file. A skinny csv file will have 1 to 3 columns, as follow: One column for the ID. Simply tick the radio button corresponding to the service output that represents the ID One column representing the value of the Thing property. This is indicated by selecting the link icon on the left hand side in front of the returned data which represent the value (in our example the output data from the service is named value) One column representing the Timestamp. This is only needed when a property is time dependant (for example, time series dataset). On the example the property is Distance, the distance covered by the tyre does depend on the time, however we would not have a timestamp for the TyreSize property as the wheel size will remain the same. How many columns should we have (and therefore how many output should our service has)? The .csv file representing the ID will have one column, and therefore the service collecting the ID returns only one output (Thing name in our smartTractor example – not shown here but is available in the download) Properties that are not time bound will have a csv file with 2 columns, one for the ID and one for the value of the property. Properties that are time bound will have 3 columns: 1 for the ID, one for the value and one for the timestamp. Therefore the service will have 3 outputs.   Additionally the input for the service may need to be configured, by clicking on the icon.   Once the datasources are configured, you should configure the Time Sampling Interval in the General Information tab. This sampling interval will be used by DataConnect to synchronize all the skinny csv files. See the Help Center for a good explanation on this.   DAD Execution Once the above configuration is done, the DAD can be executed to collect properties’ value already logged on the ThingWorx platform. Select Execution Settings in the DAD and enter the time range for property collection:   A dataset with the same name as the DAD is then created in DataConnect as well as in ThingWorx Analytics Server Dataconnect:     ThingWorx Analytics:   The dataset can then be processed as any other dataset inside ThingWorx Analytics.   Demo files   For convenience I have also attached a ThingWorx entities export that can be imported into a ThingWorx platform for you to take a closer look at the setup discussed in this blog. Attached is also a small simulator to populate the properties of the Tractor_1 Thing. The usage is : java -jar TWXTyreSimulatorClient.jar  hostname_or_IP port AppKey For example: java -jar TWXTyreSimulatorClient.jar 192.168.56.106 8080 d82510b7-5538-449c-af13-0bb60e01a129   Again feel free to share your experience in the comments below as they will be very interesting for all of us. Thank you
View full tip
In this video we cover: a short introduction of Thingworx Analytics Builder The import of the Thingworx Analytics Builder extension   This video applies to ThingWorx Analytics 52.1 till 8.1   Updated Link for access to this video:  Installing Thingworx Analytics Builder:  Part 1 of 3
View full tip
Best Practices in Data Preparation for ThingWorx Analytics Data Preparation is an important phase in the process of Data Analysis when using ThingWorx Analytics. Basically, it is getting your Data from being Raw Data that you might have gathered through your Operational system or from your Data warehouse to the kind of Data ready to be analyzed. In this Document we will be using “Talend Data Preparation Free Desktop” as a Tool to illustrate some examples of the Data Preparations process. This tool could be downloaded under the following Link: https://www.talend.com/products/data-preparation (You could also choose to use another tool) We would also use the Beanpro Dataset in our Examples and illustrations. Checking data formats The analysis starts with a raw data file. The user needs to make sure that the data files can be read. Raw data files come in many different shapes and sizes. For example, spreadsheet data is formatted differently than web data or Sensors collected data and so forth. In ThingWorx Analytics the Data Format acceptable are CSV. So the Data retrieved needs to be inputted into that format before it could be uploaded to TWA Data Example (BeanPro dataset used): After that is done the user needs to actually look at what each field contains. For example, a field is listed as a character field could actually contains none character data. Verify data types Verifying the data types for each feature or field in the Dataset used.  All data falls into one of four categories that affect what sort of analytics that could be applied to it: Nominal data is essentially just a name or an identifier. Ordinal data puts records into order from lowest to highest. Interval data represents values where the differences between them are comparable. Ratio data is like interval data except that it also allows for a value of 0. It's important to understand which categories your data falls into before you feed it into ThingWorx Analytics. For example when doing Predictive Analytics TWA would not accept a Nominal Data Field as Goal. The Goal feature data would have to be of a numerical non nominal type so this needs to be confirmed in an early stage.                 Creating a Data Dictionary A data dictionary is a metadata description of the features included in the Dataset when displayed it consists of a table with 3 columns: - The first column represents a label: that is, the name of a feature, or a combination of multiple (up to 3) features which are fields in the used Dataset. It points to “fieldname” in the configuration json file. - The second column is the Datatype value attached to the label. (Integer, String, Datetime…). It points to “dataType” in the configuration json file. - The third column is a description of the Feature related to the label used in the first column. It points to “description” in the configuration json file. In the context of TWA this Metadata is represented by a Data configuration “json” file that would be uploaded before even uploading the Dataset itself. Sample of BeanPro dataset configuration file below: Verify data accuracy Once it is confirmed that the data is formatted the way that is acceptable by TWA, the user still need to make sure it's accurate and that it makes sense. This step requires some knowledge of the subject area that the Dataset is related to. There isn't really a cut-and-dried approach to verifying data accuracy. The basic idea is to formulate some properties that you think the data should exhibit and test the data to see if those properties hold. Are stock prices always positive? Do all the product codes match the list of valid ones? Essentially, you're trying to figure out whether the data really is what you've been told it is. Identifying outliers Outliers are data points that are distant from the rest of the distribution. They are either very large or very small values compared with the rest of the dataset. Outliers are problematic because they can seriously compromise the Training Models that TWA generates. A single outlier can have a huge impact on the value of the mean. Because the mean is supposed to represent the center of the data, in a sense, this one outlier renders the mean useless. When faced with outliers, the most common strategy is to delete them. Example of the effect of an Outlier in the Feature “AVG Technician Tenure” in BeanPro Dataset:   Dataset with No Outlier: Dataset with Outlier: Deal with missing values Missing values are one of the most common (and annoying) data problems you will encounter. In TWA dealing with the Null values is done by one of the below methods: - Dropping records with missing values from your Dataset. The problem with this is that missing values are frequently not just random little data glitches so this would consider as the last option. - Replacing the NULL values with average values of the responses from the other records of the same field to fill in the missing value Transforming the Dataset - Selecting only certain columns to load which would be relevant to records where salary is not present (salary = null). - Translating coded values: (e.g., if the source system codes male as "1" and female as "2", but the warehouse codes male as "M" and female as "F") - Deriving a new calculated value: (e.g., sale_amount = qty * unit_price) - Transposing or pivoting (turning multiple columns into multiple rows or vice versa) - Splitting a column into multiple columns (e.g., converting a comma-separated list, specified as a string in one column, into individual values in different columns) Please note that: Issue with Talend should be reported to the Talend Team Data preparation is outside the scope of PTC Technical Support so please use this article as an advisable Best Practices document
View full tip
This Guide contains all the Linux commands that you may have to use for ThingWorx Analytics Installation or day to day use. Command/Category Description Network/port ip a List the ips of all of the network interfaces ssh How to jump from one machine to another ping Send packets to a remote machine.  useful for testing connectivity netstat –anp Check active port cat < /dev/tcp/localhost/8080 Test connection to a port Replace localhost with desired hostname or ip, replace 8080 with desired port number (/dev/tcp/host/port) exit Exit my current sign in.  this lets one disconnect from remove ssh sessions or if one has changed one's user e.g. switched to root scp Retrieve something via ssh Resource Usage free -m Check memory -m is for output in Mb Mpstat -P ALL CPU usage top Process usage jvmtop Collect cpu usage of jvm and its thread https://github.com/patric-r/jvmtop (requires jdk to be installed) File Interaction cp / mv Copy and move respectively. mv just deletes the source file.  Usage: cp /source/location/file /output/location/file cat Mostly used to just print the contents of a file to the command line.  can also print multiple files at once:  cat /var/log/gridworker/warning.log /var/log/gridworker/error.log vim / vi A command line text editor. Not the most user friendly (none of them are) but really useful. Here's a cheat sheet for the commands https://www.fprintf.net/vimCheatSheet.html rm Remove files chmod Change the access permissions of files chown Change the user or group ownership of files grep A text based filtering.  Useful for making a larger list smaller and more targeted.  Almost always used after a pipe (see pipe below) less Generally used to view the contents of a file with more friendly scrolling locate Find a file by name Directory ls What’s in the directory.  Will do the current directory but you can also pass the directory e.g. ls /var/log/tomcat. Black writing is files Blue writing is directories Red writing is compressed file pwd Tell me which directory I'm currently in cd Change directory.  provide the directory to change to or just use cd to return to the user's home directory clear Clears the screen Terminal clears all provided commands mkdir Creates Directories Running Processes ps Query what services are running. usually use ps -aux to get a full, sorted list.  using grep with this is helpful systemctl The correct way to interact with services that are running Package installation yum install <packageName> Install a package. More useful commands: https://www.centos.org/docs/5/html/5.1/Deployment_Guide/s1-yum-useful-commands.html yum list installed List installed packages yum list <package> List available packages yum --showduplicates list java-1.7.0-openjdk-devel Use --showduplicates to see all versions Can use * for package name: *openjdk* rpm -ql <packagename> Find where package are installed Note: works if package installed with yum Yumdownloader --urls <packageName> Find URL where a package is downloaded from. Note: need to install yum-utils package Repoquery --requires <packageName> Find dependencies of a package Note: need to install yum-utils package repoquery --qf=%{name} -g --list --grouppkgs=all [groups] | xargs repotrack -a x86_64 -p /repos/Packages Download a package with all its dependencies. Need to install yum-utils package From  <http://unix.stackexchange.com/questions/50642/download-all-dependencies-with-yumdownloader-even-if-already-installed> Other Commands curl http://localhost:8080/1.0/about/versioninfo Send REST call via command line Use -X POST (default GET) for a POST (see man page - https://curl.haxx.se/docs/manual.html for example) See also http://www.codingpedia.org/ama/how-to-test-a-rest-api-from-command-line-with-curl/ Find / -type f -exec grep -I mystring {} \; Search string in files Sudo -u user command Execute a command as different user The below helpers are not commands themselves, but can be used in conjunction with the above commands. Helper Description 'pipe' The | character.  lets one chain commands.  e.g. ps -aux | grep java ./ The shorthand way to refer to this directory explicitly ../ The shorthand way to refer to the parent directory 'tab completion' Pressing tab will let linux guess what command/option best fits what's currently written.  very useful for navigating directories and long-named files (NOTE: not necessarily tab based upon one's keyboard layout/language) 'ctrl-r' Look up the mostly likely command that matches what one typing.  so if one earlier used ps -aux | grep java | less and the hit ctrl-r and typed -aux it would likely pull that command or at least the most recent one that matches
View full tip
There have been a number of questions from customers and partners on when they should use different tools for calculation of descriptive analytics within ThingWorx applications. The platform includes two different approaches for the implementation of many common statistical calculations on data for a property: descriptive services and property transforms. Both of these tools are easy to implement and orchestrate as part of a ThingWorx application. However, these tools are targeted for handling different scenarios and also differ in utilization of compute resources. When choosing between these two approaches it is important to consider the specific use case being implemented along with how the implemented approach will fit into the overall design and architecture of the ThingWorx environment. This article will provide some guidance on scenarios to use each of these approaches in ThingWorx applications and things to consider with each approach.   Let's look at the two different approaches and some guidelines for when they should be used.   Descriptive services (click for more details) provide a set of ThingWorx services to analyze a data set and perform many common data transformations.  These services are targeted for performing calculations and transformations on recent operating history of a single property.  Descriptive services are called on demand to perform batch calculations. Scenarios to use descriptive services: On demand calculations performed within a mashup, a service call or an event to determine action and calculation results are not (always) stored Regular occurring calculations on logged property values or generated datasets (batch calculations) Calculations are done regularly in minutes, hours or days on a discrete set of data.  Examples: average value in last hour, median value in last day, or max value in last half hour.  Time between data creation and analysis is minutes or hours.  Some latency in the calculation result is acceptable for the use case. Input data set has 10s to 100s to 1000s of values.  Keep the size of the input data at 10,800 values or less.  If larger data sizes are required, then break them into micro batches if possible or use other tools to handle the processing. Multiple calculations need to be done from the same set of input data.  Examples: average value in last hour, max value in the last hour and standard deviation value in the last hour are all required to be calculated. Things to consider when using descriptive services Requires input dataset to be in the specific datashape format that is used by descriptive services.  If property values are logged in a value stream, there is a service to query the values and prepare the dataset for processing.  If scenarios where the data is not for a logged property, then another service or sql query can be used to prepare the dataset for processing. Requires javascript development work to implement.   This includes creation of a service to execute the descriptive services and usage of subscriptions and events to orchestrate calculations. An example of the javascript to execute descriptive services is available in the help center (here) Typically retrieval of the input data from value stream (QueryTimedValuesForProperty) is slowest part of the process. The input data is sent to an out of process platform analytics service for all calculations. Broader set of calculation services available (see table at the end of this article) Remember that these services are not meant to be used for big data calculations or big data preparation.  Look for other approaches if the input data sets grow larger than 10,800 values Property Transforms (click for more details) provide a set of transformation services for streaming data as it enters ThingWorx.   Property transforms are targeted for performing continuous calculations on recent values in the stream of a single property and delivering results in (near) real-time.  Since property transforms are continuous calculations, they are always running and using compute resources. Before implementing property transforms review the information in the property transform sizing guide to better understand factors that impact the scaling of property transforms. Scenarios to use: Continuous calculations on a stream for a single property as new data comes into ThingWorx New values enter the stream faster than one value per minute (as a general guideline) Calculations required to be done in seconds or minutes.  Examples: average electrical current in last 10 seconds, median pressure in the last 10 readings,  or max torque in last minute Time between data creation and analysis is small (in seconds).  Results of property transform is required for rapid decisions and action so reducing latency is critical Data sets used for calculation are small and contain 10s to 100s of values.  Calculated results are stored in a new property in the ThingModel Things to consider when using property transforms Codeless process to create new property transforms on a single property in the ThingModel Does not require input property values to be logged as calculations are performed on streaming data as it enters ThingWorx Unlike descriptive services which only execute when called, each property transform creates a continuously running job that will always be using compute resources.  Resource allocations for property transforms must be included in the overall system architecture.  Before selecting the property transform approach, refer to the Property Transform Sizing Guide for more information about how different parameters affect the performance of Property Transforms and results of performance load test scenarios. Let’s apply these guidelines to a few different use cases to determine which approach to select. 1. Mashup application that allows users to calculate and view median temperature over a selected time window In this scenario, the calculation will be executed on-demand with a user defined time window. Descriptive services are the only option here since there is not a pre-defined schedule and the user can select which data to use for the calculation.   2. Calculate the max torque (readings arriving one per second) on a press over each minute without storing all of the individual readings. In this scenario, the calculation will be executed without storing the individual readings coming from the machine. The transformation is made to the data on its way into ThingWorx and continuously calculating based on new values. Property transforms are the only option here since the individual values are not being stored.   3. Calculation of average pressure value (readings arriving one per second) over a five minute window to monitor conditions and raise an alert when the median value is more than two standard deviations from expected. In this scenario, both descriptive services and property transforms can perform the calculation required. The calculation is going to occur every 5 minutes and each data set will have about 300 values. The selection of batch (descriptive services) or streaming (property transforms) will likely be determined by the usage of the result. In this case, the calculation result will be used to raise an alert for a specific five minute window which likely will require immediate action. Since the alert needs to be raised as soon as possible, property transforms are the best option (although descriptive services will handle this case also with less compute resource requirements).   4, Calculation of median temperature (readings each 20 seconds) over 48 hour period to use as input to predict error conditions in the next week. In this scenario, the calculation will be performed relatively infrequently (once every 48 hours) over a larger data set (about 8,640 values). Descriptive services are the best option in this case due to the data size and calculation frequency. If property transforms were used, then compute resources would be tied up holding all of the incoming values in memory for an extended period before performing a calculation. With descriptive services, the calculation will only consume resource when needed, or once every 48 hours.   Hopefully this information above provides some more insight and guidelines to help choose between property transforms and descriptive services. The table below provides some additional comparisons between the two approaches.     Descriptive Services Property Transforms Purpose Provide a set of ThingWorx services to analyze a data set and perform many common data transformations. Provide a set of prescribed transformation services for streaming data as it enters ThingWorx. Processing Mode Batch Streaming / Continuous Delivery API / Service Composer interface API / Service Input Data Discrete data set Must be logged Single property Configurable by time or lookback Rolling data set on property X Persistence is optional Single property Configurable by time or lookback Output Data Return object handled programmatically Single output for discrete data set New property f_X in the input model Continuous output at configurable frequency Output time aligned with input data Available Services Statistics (min, max, mean, median, mode, std deviation) SPC calculations (# continuous data points: above threshold, in / out of range, increasing / decreasing, alternating) Data distribution: count by bins (histogram) Five numbers (min, lower quartile, median, upper quartile, max) Confidence interval Sampling frequency Frequency transform (FFT) Statistics (min, max, mean, median, mode, std deviation) SPC calculations (# continuous data points: above threshold, in / out of range, increasing / decreasing, alternating)
View full tip
We are excited to announce ThingWorx 8.4 is now available for download!    Key functional highlights ThingWorx 8.4 covers the following areas of the product portfolio: ThingWorx Analytics and ThingWorx Foundation which includes Connection Server and Edge capabilities.   ThingWorx Foundation Next Generation Composer: File Repository Editor added for application file management New entity Config Table Editor to enable application configurability and customization Localization support fornew languages: Italian, Japanese, Korean, Spanish, Russian, Chinese/Taiwan, Chinese/Simplified Mashup Builder: Responsive Layout with new Layout Editor 13 new and updated widgets (beta) Theming Editor (beta) New Functions Editor New Personalized Workspace Platform: Added support for AzureSQL, a relational database-as-a-service (DBaaS) as the new persistence provider A PaaS database that is always running on the latest stable version of SQL Server Database Engine and  patched OS with 99.99% availability.   Added support for InfluxData, a leading time series storage platform as the new ThingWorx persistence provider Supports ingesting large amounts of IoT data and offers high availability with clustering setup New extension for Remote Access and Control Supports VNC, RDP desktop sharing for any remote device HTTP and SSH connectivity supported An optional microservice to offload the ThingWorx server by allowing query execution to occur in a separate process on the same or on a different physical machine. Installers for Postgres versions of ThingWorx running on Windows or RHEL AzureSQL InfluxDB Thing Presence feature introduced which indicates whether the connection of a thing is “normal” based on the expected behavior of the device. Remote Access Extension Query Microservice: Click and Go Installers for Windows and Linux (RHEL) Security: Major investments include updating 3rd party libraries, handling of data to address cross-site scripting (XSS)  issues and enhancements to the password policy, including a password blacklist. A significant number of security issues have been fixed in this release. It is recommended that customers upgrade as soon as possible to take advantage of these important improvements. Docker Support  Added Dockerfile as a distribution media for ThingWorx Foundation and Analytics Allows building Docker container image that unlocks the potential of Dev and Ops Note:  Legacy Composer has been removed and replaced with the New Composer.   Documentation: ThingWorx 8.4 Reference Documents ThingWorx Platform 8.4 Release Notes ThingWorx Platform Help Center ThingWorx Analytics Help Center ThingWorx Connection Services Help Center  
View full tip
This post covers how to build and operationalize a time series model using Thingworx Analytics. A lookback window is used to read multiple previous rows before the current one, and base the prediction on those lookback rows.   In this example we use time series data to predict water flow for different water pumps in a system.   There is a full explanation of the method attached, also all necessary resources are included in the attached files.
View full tip
Put together a quick example mashup in support of a couple of analytics projects to demonstrate use of the new TWX Analytics 8.1 APIs within TWX mashup builder. The intention here is for use in POCs to provide a quick way of demonstrating customer-facing analytics outputs along with the more detailed view available in Analytics Builder. Required pre-requisites are: ThingWorx 8.1 + Analytics Extensions ThingWorx Analytics Server 8.1 Carousel TWX UI widget (attached) imported into TWX Data set(s) loaded with signals / profiles generated. The demo can be installed by importing the attached entities file into TWX composer then launching the mashup 'EMEA.Analytics.CustomerInsightMashUp'. A quick run through of the functionality ... On launching the mashup, data sets and models are displayed for selection on the left hand-side. On selecting dataset and model, signals are presented in two tabs - first an overview of all signals. The list on the left can be expanded by changing the value for 'Top <n> Contributing Features'. On selecting a signal from the list, the 'Selected Signal Details' tab displays additional charting for value ranges, average goal etc. The number of 'bins' to display can be edited. Similarly, profiles can be viewed from the 'Profiles' tab - each profile can be selected by dragging the upper carousel. This is all done using the Analytics 8.1 "Things" in TWX along with an additional custom Thing with some scripted services (EMEA.Analytics.Helper). Thanks to Arian Van Huelsen & Tanveer Saifee at PTC for their support; all comments / feedback welcome.
View full tip
A Feature - a piece of information that is potentially useful for prediction. Any attribute could be a feature, as long as it is useful to the model. Feature engineering – Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data. It’s a vaguely agreed space of tasks related to designing feature sets for Machine Learning applications. Components: First, understanding the properties of the task you’re trying to solve and how they might interact with the strengths and limitations of the model you are going to use. Second, experimental work were you test your expectations and find out what actually works and what doesn’t. Feature engineering as a technique, has three sub categories of techniques: Feature selection, Dimension reduction and Feature generation. Feature Selection: Sometimes called feature ranking or feature importance, this is the process of ranking the attributes by their value to predictive ability of a model. Algorithms such as decision trees automatically rank the attributes in the data set. The top few nodes in a decision tree are considered the most important features from a predictive stand point. As a part of a process, feature selection using entropy based methods like decision trees can be employed to filter out less valuable attributes before feeding the reduced dataset to another modeling algorithm. Regression type models usually employ methods such as forward selection or backward elimination to select the final set of attributes for a model. For example: Project Development decision-tree:                                                  Dimension Reduction: This is sometimes called feature extraction. The most classic example of dimension reduction is principle component analysis or PCA. PCA allows us to combine existing attributes into a new data frame consisting of a much reduced number of attributes by utilizing the variance in the data. The attributes which "explain" the highest amount of variance in the data form the first few principal components and we can ignore the rest of the attributes if data dimensionality is a problem from a computational standpoint. Feature Generation or Feature Construction: Quite simply, this is the process of manually constructing new attributes from raw data. It involves intelligently combining or splitting existing raw attributes into new one which have a higher predictive power. For example a date stamp may be used to generate 2 new attributes such as AM and PM which may be useful in discriminating whether day or night has a higher propensity to influence the response variable. Feature construction is essentially a data transformation process. Tips for Better Feature Engineering Tip 1: Think about inputs you can create by rolling up existing data fields to a higher/broader level or category. As an example, a person’s title can be categorized into strategic or tactical. Those with titles of “VP” and above can be coded as strategic. Those with titles “Director” and below become tactical. Strategic contacts are those that make high-level budgeting and strategic decisions for a company. Tactical are those in the trenches doing day-to-day work.  Other roll-up examples include: Collating several industries into a higher-level industry: Collate oil and gas companies with utility companies, for instance, and call it the energy industry, or fold high tech and telecommunications industries into a single area called “technology.” Defining “large” companies as those that make $1 billion or more and “small” companies as those that make less than $1 billion.   Tip 2: Think about ways to drill down into more detail in a single field. As an example, a contact within a company may respond to marketing campaigns, and you may have information about his or her number of responses. Drilling down, we can ask how many of these responses occurred in the past two weeks, one to three months, or more than six months in the past. This creates three additional binary (yes=1/no=0) data fields for a model. Other drill-down examples include: Cadence: Number of days between consecutive marketing responses by a contact: 1–7, 8–14, 15–21, 21+ Multiple responses on same day flag (multiple responses = 1, otherwise =0) Tip 3: Split data into separate categories also called bins. For example, annual revenue for companies in your database may range from $50 million (M) to over $1 billion (B). Split the revenue into sequential bins: $50–$200M, $201–$500M, $501M–$1B, and $1B+. Whenever a company falls with the revenue bin it receives a one; otherwise the value is zero. There are now four new data fields created from the annual revenue field. Other examples are: Number of marketing responses by contact: 1–5, 6–10, 10+ Number of employees in company: 1–100, 101–500, 502–1,000, 1,001–5,000, 5,000+ Tip 4: Think about ways to combine existing data fields into new ones. As an example, you may want to create a flag (0/1) that identifies whether someone is a VP or higher and has more than 10 years of experience. Other examples of combining fields include: Title of director or below and in a company with less than 500 employees Public company and located in the Midwestern United States You can even multiply, divide, add, or subtract one data field by another to create a new input. Tip 5: Don’t reinvent the wheel – use variables that others have already fashioned. Tip 6: Think about the problem at hand and be creative. Don’t worry about creating too many variables at first, just let the brainstorming flow.
View full tip
This Blog presents a simple Java utility to validate the deployment of ThingWatcher. It is important to note that the utility used is not a real life situation, the intent was to keep it as simple as possible in order to achieve its aim: validation of the deployment. An understanding of Java IDE (such as Eclipse) is necessary in order to run the utility with relevant dependency and classpath setup. Those are beyond the scope of this posting. We will cover the following points: Pre-Requisites Using the sample utility Code walk through Validate training job creation Validate model job creation Update for ThingWorx Analytics 8.0 Pre-requisites A strict adherence to the ThingWatcher deployment guide is recommended in order to first deploy training and model microservices as well as to familiarize yourself with ThingWatcher APIs. Prior to testing ThingWatcher, both the training and model microservices should be up and running The media for ThingWatcher (including model and training micro-service) should be downloaded from PTC Software Download page . The commands to deploy the micro-services will vary depending on the platform used and are presented in the ThingWatcher deployment guide. As a reference example, on Windows the command will be similar to the following: Start Docker: Start > Program > Docker > Docker Quick Start Terminal Load model micro service tar $ docker load < "D:\PTC\MED-61147-CD-522_F000_ThingWorx-Analytics-ThingWatcher-52-2\components\ModelService\ModelService\model-service.tar"     3. Install model service: $ docker run -d -p 8080:8080 -v '/d/TWatcherStorage/model:/data/models' -v '/d/TWatcherStorage/db:/tmp/' twxml/model-service:1.0 -Dfile.storage.path=/data/models -jar maven/model-1.0.jar server maven/standalone-evaluator.yml     4. Load training micro service tar file                         $ docker load < "D:\PTC\MED-61147-CD-522_F000_ThingWorx-Analytics-ThingWatcher-52-2\components\TrainingService\TrainingService\training-service.tar"     5. Install training service                         $ docker run -d -p 8090:8080  twxml/training-service:1.0.0  -Dmodel.destination.uri=model://192.168.99.100:8080/models -jar maven/training-standalone-1.0.0-bin.jar server /maven/training-standalone-single.yml Note: the -Dmodel.destination.uri points here to the model micro-service host. To find the ip address, enter docker-machine ip on the model micro-service docker machine.     6. Validate micro-services deployment: Execute docker ps  and confirmed that both services are up, as in the following example: CONTAINER ID        IMAGE                          COMMAND                      CREATED            STATUS              PORTS NAMES 5b6a29b95611        twxml/training-service:1.0.0  "java -Dmodel.destina"  13 days ago        Up 44 minutes      8081/tcp, 0.0.0.0:8090->8080/tcp  modest_albattani 8c13c0bc910e        twxml/model-service:1.0        "java -Dfile.storage."      2 weeks ago        Up 44 minutes      0.0.0.0:8080->8080/tcp, 8081/tcp  thirsty_ptolemy   Using the sample utility Download the attachment Main.java Import Main.java into Eclipse (or IDE of choice) with the ThingWatcher dependencies added in classpath. Update the trainingBaseURI (see below) to points to the training micro-services. The utility should be ready to execute. Code walk through The code declares a thingwatcher in the following snippet: ThingWatcher thingwatcher = new ThingWatcherBuilder() .certainty(90.0) .trainingDataDuration(60) .trainingDataDurationUnit(DurationUnit.SECOND) .trainingBaseURI("http://192.168.99.100:8090/training") .getThingWatcher(); In the above code it is important to update the trainingBaseURI argument with the correct ip address and port for the training micro-service host. The code then loops 10000 times and sends a new value, which simulates the sensor data, at a simulated 100 ms interval. The value is computed as Math.sin(i) for the whole calibrating phase and most of the monitoring phase too. We artificially introduce an anomaly by sending a value of Math.incremetExact(i) between the 9000 th and 9900 th iterations. During the Monitoring phase, the code logs the value, the anomalous status and the thingwatcher state. It is advised to save the output to a file in order to review the logging once the utility has run. In Eclipse this can be done by selecting the Main.java with right mouse button > Run As… > Run Configuration > Common and tick Output File under the Standard Input and Output, and specify a location for the output file. A review of the output log file will shows that somewhere between timestamp 900000 and 990000, the isAnomalousValue is true. Note that this does not starts and ends exactly at 900000 and 990000, as ThingWatcher needs a few occurrences before reporting it as anomaly. Sample output indicating an anomalous state: [main] INFO com.thingworx.analytics.demo.Main - Value = 901700,9017.0,-9016.403802019577 [main] INFO com.thingworx.analytics.demo.Main - isAnomalousValue = true [main] INFO com.thingworx.analytics.demo.Main - ThingWatcherStat = MONITORING As part of validating the successful deployment of ThingWatcher, it is recommended to validate the correct creation of a training and model job. Validate training job creation In order to validate the successful creation of a training job, execute a GET request to the training micro service : http://192.168.99.100:8090/training (update the ip address to the one on your system) This should return a COMPLETED job whose body starts with something similar to: Validate model job creation In order to validate the successful creation of a model job, execute a GET request to http://192.168.99.100:8080/models (update the ip address to the one on your system) to see all the models that have been created. For example: Alternatively, click (or use) the URI reported in the training job output, here http://192.168.99.100:8080/models/6/pmml.xml, to see the complete model definition. The output will be similar to: When this sample test runs correctly, the ThingWatcher deployment has been validated. Update for ThingWorx Analytics 8.0 Deploying the microservices, see Video Link : 1937 Updated Java code: see Does anyone know how to use java api to achieve anomaly detection with Thingwatcher8.0? To Note: The utility provided is for testing purpose only. The code does not represent any kind of best practice and is not meant to be a perfect java coding example. It is provided as is with no guarantee.
View full tip
The following is valid  for ThingWorx Analytics (TWA) 52.0.2 till 8.0 For release 8.3.0 and above see How to score new data in ThingWorx Analytics 8.3.x ?   Overview The main steps are as follow: Create a dataset Configure the dataset Upload data to the dataset Optimize the dataset Create filters for training and scoring data Train the model Execute scoring on existing data Upload new data to dataset Execute scoring on new data TWA models are dataset centric, which means a model created with one dataset cannot be reused with a different dataset. In order to be able to score new data, a specific feature, record purpose in the below example, is included in the dataset. This feature needs to be included from the beginning when the data is first uploaded to TWA. A filter on that feature can then be created to allow to isolate desired data. When new data comes in, they are added to the original dataset but with a specific value for the filtered feature (record purpose), which allows to discriminate and score only those new records. Process Create a dataset This example uses the beanpro demo dataset Create dataset is done through a POST on datasets REST API as below 2. Configure dataset This is done through a POST on <dataset>/configuration REST API 3.      Upload data         4.      Optimize the dataset         5.      Create filters The dataset includes a feature named record purpose created especially to differentiate between the rows to be used for training and the rows to be used for scoring. New data to be added will have record purpose set to scoringnew, which will allow to execute a scoring job limited to those filtered new rows Filter for training data: Filter for new scoring data        6.      Train the model This is done through a POST on <dataset>/prediction API        7.      Score the training data This is done through a POST on <dataset>/predictive_scores API. Note the use of the filter TrainingData created earlier. This allow to score only the rows with training as value for record purpose feature. Note: scoring could also be done without filter at this stage, in which case all the data in the dataset will be scored and not just the ones with training fore record purpose   Retrieving the scoring result show all the records in the dataset:   8.      Upload new data The newly uploaded csv file should only contains new record. This will be appended to the existing ones.   Note that the new record (it could be more than one) has a value scoringnew for the record purpose feature: This will allow to use the previously created filter ScoringNewData so that a new scoring job will only take into account this new record.   9.      Scoring new data A POST on API predictive_scores is executed however using the filter ScoringNewData. This results in only the newly added data to be scored and therefore a much quicker execution time too. Retrieving the scoring result shows only the new record:
View full tip
With ThingWorx, we can already use univariate anomaly alerts (on a single sensor value). However, in many situations, the readings from an individual sensor may not tell you much about the overall issue and a multivariate anomaly detector can be more useful. This post is intended to provide an overview of the Azure Anomaly Detector and how it can be integrated with ThingWorx. The attachment contains: A document with detailed instructions about the setup; A .csv file with the multivariate timeseries dataset; A .twx file with some entities that need to be imported in ThingWorx as well as the CSVParser extension that needs to be installed; A .zip file that will need to uploaded in an Azure Blob Container at some point in the setup
View full tip
In our interactions with PTC customers we often learn they have previously performed Analytics modeling in Python, Matlab, R, or even built home grown analyses in languages such as Java or C++. As expected, when adopting an Industrial Innovation Platform such as ThingWorx that also has its own ThingWorx Analytics module, customers do not want to reimplement everything from scratch and would rather integrate their previous work in the Smart Applications built in ThingWorx, leveraging a combination of their existing toolset together with ThingWorx Analytics modeling. That is certainly possible and there are multiple ways to do that. In this article we will focus on several general ways to make that happen, but it is important to keep in mind that language specific approaches are also possible and we are happy to discuss those in the specific context of the customer.   Here are five different ways to bring existing Analytics into ThingWorx: If the task is to reuse an existing predictive model developed in a language such as Python/R/Matlab, typically one can export that model in PMML (Predictive Model Markup Language), an xml format, and import it in ThingWorx Analytics using the AnalyticsServer_ResultsThing -> UploadModel service. Libraries such as sklearn2pmml & r2pmml can be utilized towards that goal. The imported model can then be used in the same fashion as a ThingWorx Analytics developed model to power smart applications built in ThingWorx. If the Analysis involves more complex tasks than Predictive Modeling, such as custom data normalizations or non-standard Machine Learning models or home grown algorithms, one can use the options below. Call the ThingWorx exposed REST Web API from Python/Matlab/R/Java/Javascript. Every service from ThingWorx can be called that way, and the API can also be used to push analyses results into ThingWorx for further consumption, perhaps together with other sources of data such as sensor readings, in the smart applications built there. The documentation for the ThingWorx REST API can be found here.  Expose the existing Analytics via using a thin layer of REST Web Services. For example, in Python, this can be done using Flask, with few lines of code. Then, the orchestration can happen from ThingWorx by calling the exposed Web Service and weaving the results back into smart applications. Often our customers' current architecture involves a relational database (e.g. SQL Server, Oracle, etc) that is powering the existing Analytics, and stores the end results (predictions, correlations, etc). In this scenario, we can connect ThingWorx directly to that database to read these results.  Finally, in the case of complex Analytics, where a tighter integration with ThingWorx is desired, existing Analytics / algorithms can be wrapped into a ThingWorx Extension or an Analytics Provider using the corresponding PTC SDKs.  When choosing an integration option, customers need to carefully balance complexity of integration, constraints of their architecture, Analytics modeling complexity, as well as end user consumption requirements.
View full tip