
IoT Tips

Video Author: Christophe Morfin
Original Post Date: June 9, 2017
Applicable Releases: ThingWorx Analytics 8.0

Description: In this video we go through the steps to install ThingWorx Analytics Server 8.0.
View full tip
A confusion matrix is a technique for summarizing the performance of a classification algorithm. Classification accuracy alone can be misleading if you have an unequal number of observations in each class or if you have more than two classes in your data set. Calculating a confusion matrix gives you a better idea of what your classification model is getting right and what types of errors it is making.

Classification Accuracy and its Limitations:

Classification Accuracy = Correct Predictions / Total Predictions

The main problem with classification accuracy is that it hides the detail you need to understand the performance of your classification model. Below are two examples:

1. When your data has more than 2 classes. With 3 or more classes you may get a classification accuracy of 80%, but you don't know whether that is because all classes are being predicted equally well or because one or two classes are being neglected by the model.
2. When your classes are imbalanced, that is, they do not contain an even number of observations. You may achieve an accuracy of 90% or more, but this is not a good score if 90 of every 100 records belong to one class, because you can achieve this score by always predicting the most common class value.

Classification accuracy can hide the detail you need to diagnose the performance of your model. Thankfully, we can tease apart this detail by using a confusion matrix.

Confusion Matrix Terminology:

A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known. Let's start with an example for a binary classifier:

N=165          Predicted no   Predicted yes
Actual no      50             10
Actual yes     5              100

What can we learn from this confusion matrix?

- There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for example, "yes" would mean the patient has the disease, and "no" would mean the patient does not have the disease.
- The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).
- Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.
- In reality, 105 patients in the sample have the disease, and 60 patients do not.

Let's now define the most basic terms, which are whole numbers (not rates):

- True positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
- True negatives (TN): We predicted no, and they don't have the disease.
- False positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
- False negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")

N=165          Predicted No   Predicted Yes   Total
Actual No      TN=50          FP=10           60
Actual Yes     FN=5           TP=100          105
Total          55             110

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

Accuracy: Overall, how often is the classifier correct?
1. (TP+TN)/total = (100+50)/165 = 0.91

Misclassification Rate: Overall, how often is it wrong?
1. (FP+FN)/total = (10+5)/165 = 0.09
2. Equivalent to 1 minus Accuracy
3. Also known as "Error Rate"

True Positive Rate: When it's actually yes, how often does it predict yes?
1. TP/actual yes = 100/105 = 0.95
2. Also known as "Sensitivity" or "Recall"

False Positive Rate: When it's actually no, how often does it predict yes?
1. FP/actual no = 10/60 = 0.17

Specificity: When it's actually no, how often does it predict no?
1. TN/actual no = 50/60 = 0.83
2. Equivalent to 1 minus False Positive Rate

Precision: When it predicts yes, how often is it correct?
1. TP/predicted yes = 100/110 = 0.91

Prevalence: How often does the yes condition actually occur in our sample?
1. Actual yes/total = 105/165 = 0.64
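The rates listed above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `confusion_rates` and the dictionary keys are illustrative choices, not part of any ThingWorx API. The second call illustrates the imbalance problem described earlier, using an assumed data set of 100 records in which a model always predicts the majority class.

```python
def _ratio(numerator, denominator):
    """Return numerator/denominator, or None when the rate is undefined."""
    return numerator / denominator if denominator else None


def confusion_rates(tp, tn, fp, fn):
    """Compute the common binary-classifier rates from the four raw counts."""
    total = tp + tn + fp + fn
    actual_yes = tp + fn
    actual_no = tn + fp
    predicted_yes = tp + fp
    return {
        "accuracy": _ratio(tp + tn, total),
        "misclassification_rate": _ratio(fp + fn, total),
        "true_positive_rate": _ratio(tp, actual_yes),   # sensitivity / recall
        "false_positive_rate": _ratio(fp, actual_no),
        "specificity": _ratio(tn, actual_no),
        "precision": _ratio(tp, predicted_yes),
        "prevalence": _ratio(actual_yes, total),
    }


# Counts taken from the example matrix in the text (N=165).
rates = confusion_rates(tp=100, tn=50, fp=10, fn=5)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")

# Assumed imbalanced case: 90 "no" records, 10 "yes" records,
# and a model that always predicts "no".
baseline = confusion_rates(tp=0, tn=90, fp=0, fn=10)
print("baseline accuracy:", baseline["accuracy"])
print("baseline true positive rate:", baseline["true_positive_rate"])
```

The baseline reaches 90% accuracy while never detecting a single positive case, which is the kind of detail that accuracy alone hides.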
View full tip
Key Functional Highlights

ThingWorx 8.0 covers the following areas of the product portfolio: ThingWorx Analytics, ThingWorx Utilities and ThingWorx Foundation, which includes Core, Connection Server and Edge capabilities. Highlights of the release include:

ThingWorx Foundation

- Native Industrial Connectivity: Enhancements to ThingWorx allow users to seamlessly map data from ThingWorx Industrial Connectivity to the ThingModel. With over 150 protocols supporting thousands of devices, ThingWorx Industrial Connectivity allows users to connect, monitor, and manage diverse automation devices from ThingWorx. With this new capability, users can quickly integrate industrial operations data in IoT solutions for smart, connected operations.
- Native AWS IoT and Azure IoT Cloud Support: ThingWorx 8 now has deeper, native integration with AWS IoT and Azure IoT Hub clouds so you can gain cost efficiencies and standardize on the device cloud provider of your choice. This support strengthens the connection between leading cloud providers and ThingWorx.
- Next Generation Composer: Re-imagined Composer using modern browser concepts to improve developer efficiency, including enhanced functionality, updated user interface and optimized workflows.
- Product Installers: New, Docker-based product installers for Foundation and Analytics make it easy and fast for customers to get the core platform and analytics server running.
- Single Sign On (SSO): Provides the ability to log in once and access all PTC apps and enterprise systems.
- License Management: Simple, automated licensing system for collection, storage, reporting, management and auditing of licensing entitlements.
- Integration Connectors: Integration Connectors allow ThingWorx developers and administrators quick and easy access to the data stored on external ERP, PLM, Manufacturing and other systems to quickly develop applications providing improved Contextualization and Analysis. ThingWorx 8.0 delivers 'OData' and 'SAP OData' connectors plus the ability to connect to generic web services to supplement the 'Swagger' and 'Windchill Swagger' Connectors released in ThingWorx 7.4. An improved mapping tool allows Business Administrators to quickly and easily transform retrieved data into a standard ThingWorx format for easy consumption. Includes single sign on support for improved user experience.

ThingWorx Analytics

- Native Anomaly Detection: ThingWorx 8 features more tightly integrated analytics capabilities, including the ability to configure anomaly alerts on properties directly from the ThingWorx Composer. ThingWatcher technology is utilized to increase machine monitoring capabilities by automatically learning normal behavior, continuously monitoring data streams and raising alerts when abnormal conditions are identified.

ThingWorx Utilities

- Software Content Management (SCM) Auto Retry: Provides the ability to automatically retry delivery of patches to devices if interrupted. This ensures the ability to successfully update devices.

ThingWorx Trial Edition

ThingWorx Trial Edition will be available to internal PTC resources at launch and will be made available externally on the Developer Portal shortly after launch.

- Developer Enablement: Enhancements have been made to the Trial Edition installation tool, providing a native installation process of the ThingWorx platform including:
  - ThingWorx Foundation
  - ThingWorx Utilities
  - ThingWorx Analytics
  - ThingWorx Industrial Connectivity

Documentation

- ThingWorx 8.0 Reference Documents
- ThingWorx Analytics 8.0 Reference Documents
- ThingWorx Core 8.0 Release Notes
- ThingWorx Core Help Center
- ThingWorx Edge SDKs and WebSocket-based Edge MicroServer Help Center
- ThingWorx Connection Services Help Center
- ThingWorx Industrial Connectivity Help Center
- ThingWorx Utilities Help Center
- ThingWorx Utilities Installation Guide
- ThingWorx Analytics Help Center
- ThingWorx Trial Edition User Guide

Additional information

- ThingWorx eSupport Portal
- ThingWorx Developer Portal
- ThingWorx Marketplace

Download

The following items are available for download from the PTC Software Download site.

- ThingWorx Platform: Select Release 8.0
- ThingWorx Utilities: Select Release 8.0
- ThingWorx Analytics: Select Release 8.0

You can also read this post in the Developer Community from Jeremy Little about the technical changes in ThingWorx 8.0.
View full tip
In this video we show the setup for anomaly detection (ThingWatcher) in release 8.4. We also show how to create an anomaly alert.  
View full tip
How to score new data with ThingWorx Analytics?

The following is valid starting with ThingWorx Analytics (TWA) 8.3.0.

Overview

Once a model has been trained, one of the main objectives is to score new data to predict the value of the goal. ThingWorx Analytics can score new data in 2 ways:

- Batch scoring
- Real time scoring

Batch scoring

Batch scoring is used when a large amount of data needs to be scored. To perform batch scoring we usually follow steps similar to these:

1. Upload the historic data
2. Create a new model with this historic data
3. Upload new data (the data to be scored)
4. Perform a prediction job to score the new data
5. Retrieve the prediction job result

Uploading the new data can be done in different ways. With a large amount of data, it can be easier to upload the data via a csv file, in the same way as the historic data. This is the approach used in ThingWorx Analytics Builder. If the amount of data is more limited, it can be sent in the body of the scoring request. The post Analytics: Prediction Methods Mashup shows a good example of how to do this using the PredictionThing.BatchScore service.

We focus below on ThingWorx Analytics Builder, that is, uploading new data via a csv file.

In order to run the scoring job in step 4 only on the new data, we need to be able to filter the added data. If the dataset already has a suitable column/feature, such as a timestamp, we can use it to score only the rows where timestamp > newdate, assuming all data are in chronological order. If the dataset has no such feature, we have to add one beforehand, when we first upload the historic data in step 1. We often use a new column/feature named record_purpose for this. The initial data can take the value training for this record_purpose feature, since they are used to create the initial model. The newly added data to be scored can then get any value that identifies only those rows.

It is important to note that this record_purpose feature needs to be set with the opType INFORMATIONAL so that it is not taken into account by the learning algorithms.

The video below shows those steps using ThingWorx Analytics Builder.

Real time scoring

Real time scoring is better suited to small amounts of data. Real time scoring can be done either via the Analytics Server PredictionThing RealTimeScore service or using the Analytics Manager framework. The posts How to work with ordinal and categorical data in ThingWorx Analytics and Analytics: Prediction Methods Mashup give examples of the use of the RealTimeScore service.

We concentrate below on Analytics Manager. The process involves the following steps:

In Analytics Manager
- Create an Analysis Provider that uses the AnalyticsServerConnector connector
- Publish the model created in ThingWorx Analytics Builder to Analytics Manager
- Enable the model created
- Create an Analysis Event
- Map the properties to the datashape field
- Enable the Event

In ThingWorx Composer
- Relevant properties of the Thing used in the Analysis Event are updated in some way
- This triggers the analysis job to be executed
- The scoring result is populated into the result property mapped in the Analysis Event

The Help Center has more details about this process. The following video shows those steps.

The following articles can also be of interest for this topic:

- How to use ThingPredictor in release 8.3 of ThingWorx Analytics Server ?
- Publish model from Analytics Builder into Analytics Manager using TW.AnalysisServices.AnalyticsServer.AnalyticsServerConnector
- Creating Template For Thing, And Configure Analysis Event For Real-Time Scoring via Analytics Manager

Note that the AnalyticsServerConnector connector in release 8.3 replaces the ThingPredictor connector from previous releases.
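The record_purpose approach described for batch scoring can be prepared before the csv upload. The sketch below is only an illustration in plain Python: the column names `temperature` and `failed`, the value `scoring` and the helper names are assumptions, and it does not call any ThingWorx service. Only the `record_purpose` column and the `training` value come from the text.

```python
import csv
import io


def tag_rows(rows, purpose):
    """Return copies of the rows with a record_purpose column added."""
    return [dict(row, record_purpose=purpose) for row in rows]


def rows_to_score(rows, purpose="scoring"):
    """Select only the rows flagged for scoring (the filter used in step 4)."""
    return [row for row in rows if row["record_purpose"] == purpose]


def to_csv(rows):
    """Serialize the rows to csv text, ready to be written to a file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical historic rows (goal known) and new rows (goal unknown).
historic = [
    {"temperature": "71.2", "failed": "0"},
    {"temperature": "98.6", "failed": "1"},
]
new = [{"temperature": "93.1", "failed": ""}]

dataset = tag_rows(historic, "training") + tag_rows(new, "scoring")
print(to_csv(dataset))
print(rows_to_score(dataset))
```

Any value other than `training` would work for the new rows, as long as it identifies only the rows to be scored.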
View full tip
The accuracy of a predictive model can be boosted in two ways: either by embracing feature engineering or by applying boosting algorithms straight away. There are multiple boosting algorithms, such as Gradient Boosting, XGBoost, AdaBoost and Gentle Boost. Every algorithm has its own underlying mathematics, and slight variations are observed when applying them.

While working with boosting algorithms, we come across two frequently occurring buzzwords: Bagging and Boosting.

- Bagging: An approach where you take random samples of the data, build a learner on each sample and take simple means of their outputs to find the bagging probabilities.
- Boosting: Boosting is similar, but the samples are selected more intelligently: we progressively give more and more weight to hard-to-classify observations.

Below are the default algorithms used in predictive models generated in ThingWorx Analytics:

- Decision Tree
- Gradient Boost
- Linear Regression
- Neural Net
- Random Forest
- Logistic Regression

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

Let's begin with an easy example. Assume you are given a previous model M to improve on. Currently you observe that the model has an accuracy of 80% (any metric). How do you go further? One simple way is to build an entirely different model using a new set of input variables and trying better ensemble learners. In contrast, we have a much simpler way to suggest. It goes like this:

Y = M(x) + error

What if we see that the error is not white noise but has some correlation with the outcome (Y)? What if we can develop a model on this error term? For example:

error = G(x) + error2

We will probably see accuracy improve to a higher number, say 84%. Let's take another step and regress against error2:

error2 = H(x) + error3

Now we combine all of these together:

Y = M(x) + G(x) + H(x) + error3

This will probably have an accuracy of even more than 84%. What if we can find optimal weights for each of the three learners?

Y = alpha * M(x) + beta * G(x) + gamma * H(x) + error4

How Gradient Boosting Works:

1. Loss Function: The loss function used depends on the type of problem being solved. It must be differentiable, but many standard loss functions are supported and you can define your own. A benefit of the gradient boosting framework is that a new boosting algorithm does not have to be derived for each loss function you may want to use; instead, the framework is generic enough that any differentiable loss function can be used.
2. Weak Learner: Decision trees are used as the weak learner in gradient boosting. Specifically, regression trees are used, which output real values for splits and whose outputs can be added together, allowing the outputs of subsequent models to be added and to "correct" the residuals in the predictions. Trees are constructed in a greedy manner, choosing the best split points based on purity scores like Gini or to minimize the loss.
3. Additive Model: Trees are added one at a time, and existing trees in the model are not changed. A gradient descent procedure is used to minimize the loss when adding trees. Here, instead of a set of parameters to update, we have weak learner sub-models, or more specifically decision trees. After calculating the loss, to perform the gradient descent procedure we must add a tree to the model that reduces the loss.

Improvements to Basic Gradient Boosting:

1. Tree Constraints: It is important that the weak learners have skill but remain weak. Below are some constraints that can be imposed on the construction of decision trees:
- Number of trees: Generally, adding more trees to the model is very slow to overfit. The advice is to keep adding trees until no further improvement is observed.
- Tree depth: Deeper trees are more complex trees, and shorter trees are preferred. Generally, better results are seen with 4-8 levels.
- Number of nodes or number of leaves: Like depth, this can constrain the size of the tree, but it does not force a symmetrical structure if other constraints are used.
- Number of observations per split: Imposes a minimum constraint on the amount of training data at a training node before a split can be considered.
- Minimum improvement to loss: A constraint on the improvement of any split added to a tree.
2. Weighted Updates: The contribution of each tree to the sum can be weighted to slow down the learning by the algorithm. This weighting is called shrinkage or a learning rate: each update is simply scaled by the value of the "learning rate parameter" v.
3. Stochastic Gradient Boosting: At each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. The randomly selected subsample is then used, instead of the full sample, to fit the base learner.
4. Penalized Gradient Boosting: An additional regularization term in the objective helps to smooth the final learnt weights to avoid over-fitting. Intuitively, the regularized objective will tend to select a model employing simple and predictive functions.
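The idea of repeatedly fitting a new model to the remaining error, and scaling each update by a learning rate, can be sketched in plain Python. This is a minimal illustration under stated assumptions: it uses squared-error loss (for which the negative gradient is simply the residual), one-level trees (stumps) on a single input variable, and a toy data set. It is not the implementation used by ThingWorx Analytics, and all function names are illustrative.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Additive model: start from the mean, then add one stump per iteration."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps, history = [], []
    for _ in range(n_trees):
        # For squared error, the residuals are the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)  # earlier stumps are never modified
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    model = {"base": base, "stumps": stumps, "learning_rate": learning_rate}
    return model, history


def predict(model, x):
    return model["base"] + model["learning_rate"] * sum(
        predict_stump(stump, x) for stump in model["stumps"])


# Toy data set (an assumption for illustration): y = x squared.
xs = list(range(1, 11))
ys = [x * x for x in xs]
model, history = boost(xs, ys)
print("training MSE after the first tree:", round(history[0], 2))
print("training MSE after the last tree:", round(history[-1], 2))
```

Because each stump is a least-squares fit to the current residuals and the learning rate is below 1, the training error cannot increase from one iteration to the next. Error on held-out data is what should be watched to decide when to stop adding trees.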
View full tip
This video is the 3rd part of a series of 3 videos walking you through how to set up ThingWatcher for Anomaly Detection. In this video we will use the Anomaly Mashup to visualize data received from my remote device.

Updated Link for access to this video: Anomaly Detection 8.0: Viewing Data via Anomaly Mashup: Part 3 of 3
View full tip
This video is the 2nd part of a series of 3 videos walking you through how to set up ThingWatcher for Anomaly Detection. In this video you will learn how to use the "Discover UI" from the "New Composer" to bind simulated data coming through KEPServer for Anomaly Detection.

Updated Link for access to this video: Anomaly Detection 8.0: Configuring Anomaly Alerts: Part 2 of 3
View full tip
Sampling Strategy

This blog post covers the 4 sampling strategies that are available in ThingWorx Analytics. It explains how each sampling strategy runs behind the scenes, when you may want to use it, and its pros and cons.

SAMPLE_WITH_REPLACEMENT

This strategy is not often used by professionals but may still be useful in certain circumstances. When you sample with replacement, the value that you randomly selected is returned to the sample pool, so there is a chance that the same record appears multiple times in your sample.

Example
Let's say you have a hat that contains 3 cards with different people's names on them:
- John
- Sarah
- Tom
Let's say you make 2 random selections. On the first selection you pull out the name Tom. When you sample with replacement, you put the name Tom back into the hat and then randomly select a card again. For your second selection, it is possible to get another name, like Sarah, or the same one you selected, Tom.

Pros
- May find improved models in smaller datasets with low row counts

Cons
- The accuracy of the model may be artificially inflated due to duplicates in the sample

SAMPLE_WITHOUT_REPLACEMENT

This is the default setting in ThingWorx Analytics and the sampling strategy most commonly used by professionals. With this strategy, after a value is randomly selected from the sample pool, it is not returned. This ensures that all the values selected for the sample are unique.

Example
Let's say you have a hat that contains 3 cards with different people's names on them:
- John
- Sarah
- Tom
Let's say you make 2 random selections. On the first selection you pull out the name Tom. When you sample without replacement, you randomly select a card from the hat again without putting the Tom card back. For your second selection, you can only get the Sarah or John card.

Pros
- This is the sampling strategy that is most commonly used
- It will deliver the best results in most cases

Cons
- May not be the best choice if the desired goal is underrepresented in the dataset

UPSAMPLE_AND_SAMPLE_WITHOUT_REPLACEMENT

This is useful when the desired goal is underrepresented in the dataset. The records that represent the desired outcome of the goal are copied multiple times so that they represent a larger share of the total dataset.

Example
Let's say you are trying to discover whether a patient is at risk of developing a rare condition, like chronic kidney failure, that affects around 0.5% of the US population. In this case, the most accurate model that could be generated would say that no one will get this condition, and according to the numbers it would be right 99.5% of the time. In reality, this is not helpful at all for the use case, since you want to know whether the patient is at risk of developing the condition. To prevent this, copies are made of the records where the patient did develop the condition, so that they represent a larger share of the dataset. Doing this gives ThingWorx Analytics more examples to help it generate a more accurate model.

Pros
- Patterns from the original dataset remain intact

Cons
- Longer training time

DOWNSAMPLE_AND_SAMPLE_WITHOUT_REPLACEMENT

This is also useful when the desired goal is underrepresented in the dataset. In downsample and sample without replacement, some records that do not represent the desired goal outcome are removed. This is done to increase the share of the dataset occupied by the desired records.

Example
Let's continue with the medical example from above. Instead of creating copies of the desired records, undesired records are removed from the dataset. This causes the records where patients did develop the condition to occupy a larger percentage of the dataset.

Pros
- Shorter training time

Cons
- Patterns from the original dataset may be lost
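The four strategies above can be illustrated with Python's standard `random` module. This is only a sketch of the general ideas: the post does not describe how ThingWorx Analytics implements them internally, and the helper names, the `copies` and `keep_count` parameters and the synthetic patient records are all assumptions made for illustration.

```python
import random


def sample_with_replacement(records, k, rng):
    """Each draw is returned to the pool, so duplicates are possible."""
    return rng.choices(records, k=k)


def sample_without_replacement(records, k, rng):
    """Each draw is removed from the pool, so every selected record is unique."""
    return rng.sample(records, k)


def upsample(records, is_goal, copies):
    """Repeat goal records so each appears `copies` times in total."""
    result = []
    for record in records:
        result.extend([record] * (copies if is_goal(record) else 1))
    return result


def downsample(records, is_goal, keep_count, rng):
    """Keep all goal records and only `keep_count` randomly chosen other records."""
    goal = [r for r in records if is_goal(r)]
    other = [r for r in records if not is_goal(r)]
    return goal + rng.sample(other, keep_count)


def goal_share(records, is_goal):
    return sum(1 for r in records if is_goal(r)) / len(records)


rng = random.Random(42)  # fixed seed so the sketch is repeatable

names = ["John", "Sarah", "Tom"]
print(sample_with_replacement(names, 2, rng))     # may contain the same name twice
print(sample_without_replacement(names, 2, rng))  # always two different names

# Synthetic data set: 1000 patients, 5 of whom (0.5%) developed the condition.
patients = [{"id": i, "condition": i < 5} for i in range(1000)]
has_condition = lambda record: record["condition"]

upsampled = upsample(patients, has_condition, copies=20)
downsampled = downsample(patients, has_condition, keep_count=95, rng=rng)
print(goal_share(patients, has_condition))
print(goal_share(upsampled, has_condition))
print(goal_share(downsampled, has_condition))
```

Upsampling grows the data set (1,095 rows here), which is consistent with the longer training time noted above, while downsampling shrinks it to 100 rows and discards most of the original records.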
View full tip
This video will walk you through the first steps of how to set up Analytics Manager for Real-Time Scoring. More specifically, this video demonstrates how to share your predictive model from Analytics Builder into Analytics Manager and test the shared model.

Updated Link for access to this video: ThingWorx Analytics Manager: Publish & Test a Predictive Model
View full tip
In this blog, we will shed some light on Gradient boost, which is a default algorithm in our Analytics platform. We will touch on:

1) The main purpose of Gradient boost and how the technique works.
2) Its advantages and constraints.
3) Some "nice to know" tips when working with Gradient boost.

Gradient boost is a machine learning technique whose main purpose is to help weak prediction models become stronger. Gradient boost works by building one tree at a time and correcting the errors made by the previous trees. The underlying theory relies on reweighting: misclassified observations gain weight, and those that are classified correctly lose weight. It is a similar strategy to dealing with stocks, where you balance the investment between bonds and shares. An analogy can also be made with illnesses: if a doctor tells you that you have a rare disease, you want to get a few more opinions from other doctors, and you evaluate all the information to make a better decision about how to treat it.

Why use gradient boost:

- Gradient boost provides the user with a powerful tool to boost/improve weak prediction models.
- Gradient boost works well with regression and classification problems, therefore decision trees can benefit from applying gradient boost.
- Gradient boost is known in the industry to be one of the best techniques to use when dealing with model improvement.
- Gradient boost works in a stagewise fashion: once it has adjusted a tree, it does not go back and readjust it when dealing with the next tree.

As with all machine learning algorithms, gradient boost also has a constraint:

- There is a chance of overfitting.

"Nice to know" tips:

- A natural way to reduce this risk of overfitting is to monitor and adjust the number of iterations.
- The depth of the tree might have an influence on the prediction error; observe what happens if the tree is a stump (1 level deep).
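One common way to "monitor and adjust the iterations" is a patience rule: record the error on held-out validation data after each iteration and stop once it has not improved for a given number of iterations. The sketch below assumes such a list of validation errors is already available. The function name, the `patience` parameter and the error values are illustrative, and the post does not say whether ThingWorx Analytics exposes a setting like this.

```python
def best_iteration(validation_errors, patience=5):
    """Return how many iterations to keep, given one validation error per iteration.

    Stops scanning once the error has failed to improve for `patience`
    consecutive iterations, and returns the iteration with the lowest error
    seen up to that point (1-based).
    """
    if not validation_errors:
        raise ValueError("at least one validation error is required")
    best_index = 0
    for index, error in enumerate(validation_errors):
        if error < validation_errors[best_index]:
            best_index = index
        elif index - best_index >= patience:
            break  # no improvement for `patience` iterations: likely overfitting
    return best_index + 1


# Hypothetical validation errors: they fall at first, then rise as the model overfits.
errors = [0.40, 0.31, 0.26, 0.24, 0.23, 0.235, 0.24, 0.25, 0.27, 0.30]
print(best_iteration(errors, patience=3))
```

With these example values the rule keeps 5 iterations, the point after which the validation error only gets worse.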
View full tip
In our interactions with PTC customers we often learn they have previously performed Analytics modeling in Python, Matlab or R, or even built home grown analyses in languages such as Java or C++. As expected, when adopting an Industrial Innovation Platform such as ThingWorx, which also has its own ThingWorx Analytics module, customers do not want to reimplement everything from scratch. They would rather integrate their previous work in the Smart Applications built in ThingWorx, leveraging a combination of their existing toolset together with ThingWorx Analytics modeling. That is certainly possible, and there are multiple ways to do it. In this article we focus on several general ways to make that happen, but it is important to keep in mind that language specific approaches are also possible, and we are happy to discuss those in the specific context of the customer.

Here are five different ways to bring existing Analytics into ThingWorx:

1. If the task is to reuse an existing predictive model developed in a language such as Python/R/Matlab, one can typically export that model in PMML (Predictive Model Markup Language), an XML format, and import it into ThingWorx Analytics using the AnalyticsServer_ResultsThing -> UploadModel service. Libraries such as sklearn2pmml and r2pmml can be used towards that goal. The imported model can then be used in the same fashion as a model developed in ThingWorx Analytics to power smart applications built in ThingWorx.

If the analysis involves more complex tasks than predictive modeling, such as custom data normalizations, non-standard Machine Learning models or home grown algorithms, one can use the options below.

2. Call the ThingWorx exposed REST Web API from Python/Matlab/R/Java/JavaScript. Every service from ThingWorx can be called that way, and the API can also be used to push analysis results into ThingWorx for further consumption, perhaps together with other sources of data such as sensor readings, in the smart applications built there. The documentation for the ThingWorx REST API can be found here.
3. Expose the existing Analytics through a thin layer of REST Web Services. For example, in Python, this can be done using Flask with a few lines of code. The orchestration can then happen from ThingWorx by calling the exposed Web Service and weaving the results back into smart applications.
4. Often our customers' current architecture involves a relational database (e.g. SQL Server, Oracle, etc.) that powers the existing Analytics and stores the end results (predictions, correlations, etc.). In this scenario, we can connect ThingWorx directly to that database to read these results.
5. Finally, in the case of complex Analytics where a tighter integration with ThingWorx is desired, existing Analytics/algorithms can be wrapped into a ThingWorx Extension or an Analytics Provider using the corresponding PTC SDKs.

When choosing an integration option, customers need to carefully balance the complexity of integration, the constraints of their architecture, the complexity of the Analytics modeling, and end user consumption requirements.
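As an illustration of the REST option, the sketch below builds (but does not send) a request that calls a service on a Thing from Python, using only the standard library. It assumes the usual ThingWorx REST URL pattern of `/Thingworx/Things/<thing>/Services/<service>` with an application key passed in the `appKey` header. The host name, Thing name, service name, parameters and key are hypothetical placeholders, so check the ThingWorx REST API documentation for your release before relying on it.

```python
import json
import urllib.parse
import urllib.request


def build_service_request(base_url, thing, service, params, app_key):
    """Build a POST request that invokes a service on a ThingWorx Thing."""
    url = "{}/Thingworx/Things/{}/Services/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(thing, safe=""),
        urllib.parse.quote(service, safe=""),
    )
    return urllib.request.Request(
        url,
        data=json.dumps(params).encode("utf-8"),  # service inputs as a JSON body
        headers={
            "appKey": app_key,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Hypothetical values: replace with a real server, Thing, service and application key.
request = build_service_request(
    base_url="https://thingworx.example.com",
    thing="ExternalAnalyticsThing",
    service="StorePrediction",
    params={"assetId": "pump-17", "failureProbability": 0.82},
    app_key="00000000-0000-0000-0000-000000000000",
)
print(request.get_method(), request.full_url)

# To actually send it (not executed here):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the example runnable without a server and makes the URL and headers easy to inspect.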
View full tip
To help explain some of the different ways in which a prediction can be triggered from a ThingWorx Analytics model, I've built a mashup which allows you to easily trigger these types of prediction:

- API Realtime Prediction
- Analytics Manager: Event
- API Batch Prediction

For information on setting up this environment to use the mashup with some sample data, please see the attached instructions document: Prediction-Methods-Mashup.pdf. The referenced resource files can be found inside resources.zip.

For more information on prediction scoring please see this related post: How to score new data with ThingWorx Analytics 8.3.x
View full tip
One of the interesting features of ThingWorx Analytics Manager is its ability to run distributed models created in Excel (and more of course).  Most people having been tasked with understanding data have built models in Excel and have sometimes built quite complex models (or even applications) with it.   The ability to tie these models to real data coming from various systems connected through ThingWorx and operationalise their execution is a really simple way for people to leverage their existing work and I.P. on a connected analytics journey.   To demonstrate this power and ease of implementation, I created a sample data set with historical data, traffic profile, and a simple anomaly detection model to execute with Analytics Manager.  (files are attached)   The online help center was quite helpful in explaining the process of Creating the Excel Workbook, however I got stuck at the XML mapping stage.  The Analytics and Excel documentation both neglect to mention one important detail -- you must be using the Windows version of Excel in order to get the XML Source functionality (and I use Mac).  Once using Windows, it was easy to do - here is a video of the XML mapping part of the process (for the inputs and results).   
View full tip
Getting Started on the ThingWorx Platform Learning Path   Learn hands-on how ThingWorx simplifies the end-to-end process of implementing IoT solutions.   NOTE: Complete the following guides in sequential order. The estimated time to complete this learning path is 210 minutes.   Get Started with ThingWorx for IoT   Part 1 Part 2 Part 3 Part 4 Part 5 Data Model Introduction Configure Permissions Part 1 Part 2 Build a Predictive Analytics Model  Part 1 Part 2
View full tip
Video Author: Christophe Morfin
Original Post Date: October 2, 2017
Applicable Releases: ThingWorx Analytics 8.1

Description: In this video we walk through the installation steps of ThingWorx Analytics Server 8.1. This covers the native Linux installation, though the steps will be similar for a Docker installation on Windows or Linux.
View full tip
Connecting Existing Things to ThingWorx Industrial Gateway for Anomaly Detection

In this video you will learn how to:

- Bind a property of an existing entity to the KEPServerEX Data Feed
- Create an Alert on that property and monitor its behavior

Updated Link for access to this video: Connecting Existing Things to ThingWorx Industrial Gateway for Anomaly Detection
View full tip
In this video we cover:

- A short introduction to ThingWorx Analytics Builder
- The import of the ThingWorx Analytics Builder extension

This video applies to ThingWorx Analytics 52.1 through 8.1.

Updated Link for access to this video: Installing ThingWorx Analytics Builder: Part 1 of 3
View full tip
There have been a number of questions from customers and partners about when to use the different tools for calculating descriptive analytics within ThingWorx applications. The platform includes two approaches for implementing many common statistical calculations on the data of a property: descriptive services and property transforms. Both tools are easy to implement and orchestrate as part of a ThingWorx application. However, they target different scenarios and differ in how they use compute resources. When choosing between them, consider the specific use case being implemented and how the chosen approach fits into the overall design and architecture of the ThingWorx environment. This article provides guidance on when to use each approach in ThingWorx applications and what to consider with each.

Let's look at the two approaches and some guidelines for when they should be used.

Descriptive services provide a set of ThingWorx services to analyze a data set and perform many common data transformations. These services are targeted at calculations and transformations on the recent operating history of a single property. Descriptive services are called on demand to perform batch calculations.

Scenarios to use descriptive services:
- On-demand calculations performed within a mashup, a service call, or an event to determine an action, where the calculation results are not (always) stored.
- Regularly occurring calculations on logged property values or generated data sets (batch calculations).
- Calculations are done at intervals of minutes, hours, or days on a discrete set of data. Examples: average value in the last hour, median value in the last day, or max value in the last half hour.
- Time between data creation and analysis is minutes or hours. Some latency in the calculation result is acceptable for the use case.
- Input data set has 10s to 1000s of values. Keep the size of the input data at 10,800 values or less. If larger data sizes are required, break them into micro batches if possible, or use other tools to handle the processing.
- Multiple calculations need to be done from the same set of input data. Example: the average, max, and standard deviation over the last hour are all required.

Things to consider when using descriptive services:
- Requires the input data set to be in the specific data shape format used by descriptive services. If property values are logged in a value stream, there is a service to query the values and prepare the data set for processing. In scenarios where the data does not come from a logged property, another service or SQL query can be used to prepare the data set.
- Requires JavaScript development work to implement. This includes creating a service to execute the descriptive services and using subscriptions and events to orchestrate calculations. An example of the JavaScript to execute descriptive services is available in the help center.
- Retrieval of the input data from the value stream (QueryTimedValuesForProperty) is typically the slowest part of the process.
- The input data is sent to an out-of-process platform analytics service for all calculations.
- A broader set of calculation services is available (see the table at the end of this article).
- These services are not meant to be used for big data calculations or big data preparation. Look for other approaches if the input data sets grow larger than 10,800 values.

Property transforms provide a set of transformation services for streaming data as it enters ThingWorx. Property transforms are targeted at continuous calculations on recent values in the stream of a single property, delivering results in (near) real time. Since property transforms are continuous calculations, they are always running and using compute resources. Before implementing property transforms, review the property transform sizing guide to better understand the factors that affect their scaling.

Scenarios to use property transforms:
- Continuous calculations on a stream for a single property as new data comes into ThingWorx.
- New values enter the stream faster than one value per minute (as a general guideline).
- Calculations are required to be done in seconds or minutes. Examples: average electrical current in the last 10 seconds, median pressure in the last 10 readings, or max torque in the last minute.
- Time between data creation and analysis is small (seconds). The result of the property transform is required for rapid decisions and action, so reducing latency is critical.
- Data sets used for calculation are small and contain 10s to 100s of values.
- Calculated results are stored in a new property in the ThingModel.

Things to consider when using property transforms:
- Creating a new property transform on a single property in the ThingModel is a codeless process.
- Input property values do not need to be logged, because calculations are performed on streaming data as it enters ThingWorx.
- Unlike descriptive services, which only execute when called, each property transform creates a continuously running job that always uses compute resources. Resource allocations for property transforms must be included in the overall system architecture. Before selecting the property transform approach, refer to the Property Transform Sizing Guide for more information about how different parameters affect the performance of property transforms, and for the results of performance load test scenarios.
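To make the idea of a continuous calculation over a lookback window concrete, here is a minimal sketch in plain JavaScript. It is not the property transform implementation and uses no ThingWorx API; the function name createRollingMedian and the sample readings are hypothetical. It only illustrates the "median pressure in the last 10 readings" style of calculation, where a result is produced each time a new value arrives and older values fall out of the window.

```javascript
// Illustrative only: plain JavaScript, not a ThingWorx property transform.
// Keeps the most recent `lookback` readings and returns the median of that
// window every time a new value is pushed.
function createRollingMedian(lookback) {
  var recent = [];
  return function push(value) {
    recent.push(value);
    if (recent.length > lookback) {
      recent.shift(); // drop the oldest reading
    }
    // Sort a copy numerically so the arrival order in `recent` is preserved.
    var sorted = recent.slice().sort(function (a, b) { return a - b; });
    var mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1
      ? sorted[mid]
      : (sorted[mid - 1] + sorted[mid]) / 2;
  };
}

// Hypothetical readings with a lookback of 3 values.
var medianOfLast3 = createRollingMedian(3);
console.log(medianOfLast3(10));  // window [10]          -> 10
console.log(medianOfLast3(30));  // window [10, 30]      -> 20
console.log(medianOfLast3(20));  // window [10, 30, 20]  -> 20
console.log(medianOfLast3(100)); // window [30, 20, 100] -> 30
```

The window has to stay in memory for as long as the calculation runs, which is why the guidance above recommends small windows for this approach.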
Let's apply these guidelines to a few different use cases to determine which approach to select.

1. Mashup application that allows users to calculate and view the median temperature over a selected time window.
In this scenario, the calculation is executed on demand with a user-defined time window. Descriptive services are the only option here, since there is no predefined schedule and the user can select which data to use for the calculation.

2. Calculate the max torque (readings arriving one per second) on a press over each minute without storing all of the individual readings.
In this scenario, the calculation is executed without storing the individual readings coming from the machine. The transformation is applied to the data on its way into ThingWorx and is continuously recalculated based on new values. Property transforms are the only option here, since the individual values are not being stored.

3. Calculate the average pressure value (readings arriving one per second) over a five-minute window to monitor conditions and raise an alert when the average value is more than two standard deviations from expected.
In this scenario, both descriptive services and property transforms can perform the required calculation. The calculation occurs every 5 minutes and each data set has about 300 values. The choice between batch (descriptive services) and streaming (property transforms) will likely be determined by how the result is used. In this case, the result is used to raise an alert for a specific five-minute window, which will likely require immediate action. Since the alert needs to be raised as soon as possible, property transforms are the best option (although descriptive services can also handle this case, with lower compute resource requirements).

4. Calculate the median temperature (readings every 20 seconds) over a 48-hour period to use as input to predict error conditions in the next week.
In this scenario, the calculation is performed relatively infrequently (once every 48 hours) over a larger data set (about 8,640 values). Descriptive services are the best option in this case due to the data size and calculation frequency. If property transforms were used, compute resources would be tied up holding all of the incoming values in memory for an extended period before performing a calculation. With descriptive services, the calculation only consumes resources when needed, that is, once every 48 hours.

Hopefully the information above provides some more insight and guidelines to help choose between property transforms and descriptive services. The table below provides some additional comparisons between the two approaches.

Purpose
  Descriptive Services: Provide a set of ThingWorx services to analyze a data set and perform many common data transformations.
  Property Transforms: Provide a set of prescribed transformation services for streaming data as it enters ThingWorx.

Processing Mode
  Descriptive Services: Batch
  Property Transforms: Streaming / Continuous

Delivery
  Descriptive Services: API / Service
  Property Transforms: Composer interface; API / Service

Input Data
  Descriptive Services: Discrete data set; must be logged; single property; configurable by time or lookback
  Property Transforms: Rolling data set on property X; persistence is optional; single property; configurable by time or lookback

Output Data
  Descriptive Services: Return object handled programmatically; single output for discrete data set
  Property Transforms: New property f_X in the input model; continuous output at configurable frequency; output time aligned with input data

Available Services
  Descriptive Services: Statistics (min, max, mean, median, mode, std deviation); SPC calculations (# continuous data points: above threshold, in / out of range, increasing / decreasing, alternating); data distribution: count by bins (histogram); five numbers (min, lower quartile, median, upper quartile, max); confidence interval; sampling frequency; frequency transform (FFT)
  Property Transforms: Statistics (min, max, mean, median, mode, std deviation); SPC calculations (# continuous data points: above threshold, in / out of range, increasing / decreasing, alternating)
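The data set sizes quoted in the use cases above follow from simple arithmetic, and the same arithmetic shows when an input would exceed the 10,800-value guideline for descriptive services. The sketch below is plain JavaScript with hypothetical helper names (valuesInWindow, toMicroBatches); it does not call any ThingWorx service and only shows one way to reason about sizing and micro batches.

```javascript
// Size guideline quoted in the article for one descriptive-services input.
var MAX_INPUT_VALUES = 10800;

// Number of readings collected in a time window, given the spacing of readings.
function valuesInWindow(windowSeconds, secondsBetweenReadings) {
  return Math.floor(windowSeconds / secondsBetweenReadings);
}

// Split an array into consecutive micro batches of at most `maxSize` values.
function toMicroBatches(values, maxSize) {
  var batches = [];
  for (var i = 0; i < values.length; i += maxSize) {
    batches.push(values.slice(i, i + maxSize));
  }
  return batches;
}

// Use case 3: one reading per second over five minutes.
var useCase3 = valuesInWindow(5 * 60, 1);       // 300 values
// Use case 4: one reading every 20 seconds over 48 hours.
var useCase4 = valuesInWindow(48 * 3600, 20);   // 8640 values
console.log(useCase3 <= MAX_INPUT_VALUES, useCase4 <= MAX_INPUT_VALUES);

// Hypothetical larger case: one reading per second over 48 hours.
var large = new Array(valuesInWindow(48 * 3600, 1)).fill(0); // 172800 values
console.log(toMicroBatches(large, MAX_INPUT_VALUES).length);  // 16 batches
```

Note that only some statistics combine trivially across micro batches (for example min, max, count, and sum). A median or mode over the full window cannot be obtained by simply combining per-batch results, so micro batching is not a drop-in fix for every calculation.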
View full tip
Users of ThingWorx Analytics (TWA) may choose to create a predictive model using TWA or import a predictive model that was created using other software. When importing into or exporting out of TWA, the predictive model must be in PMML (Predictive Model Markup Language) format, version 4.3+. This post describes how to complete the import and export processes.

Exporting:
The user may create a model in two main ways inside of TWA: using the Builder user interface, or using the ‘Create Job’ service that exists on the Training Thing. Whichever method is used, TWA automatically creates a model Job ID for that model. This model Job ID is what identifies the model inside of TWA, regardless of what is being done with that model.

If a model is trained using Builder, the user may highlight that model, click ‘Job Details’, and then copy the Job ID.

Next, the user will navigate to Browse --> Things --> …TrainingThing. This is the Training Microservice inside of TWA, where all the functionality involved with training a model exists. Within the …TrainingThing, the user will execute the ‘RetrieveModel’ service under Services. When executing the service, the user will paste the model Job ID (ex. 49704f1a-7fcd-4e38-ab53-84ef46517d0a) copied earlier, and press ‘Execute’. The resulting text can then be highlighted, copied to Notepad or some other text editor, and saved in .pmml format (ex. ‘ModelExport.pmml’).

Importing Through Results Microservice:
To import a model that has been saved in PMML 4.3+ format into TWA using the Results Microservice, the user will navigate to Manage --> Repositories (ex. AnalyticsUploadStorage) --> Actions --> Upload, and choose the PMML file. The user will then navigate to Browse --> Things --> …ResultsThing. This is the Results Microservice inside of TWA, where all the functionality related to previously trained models exists. Within the …ResultsThing, the user will execute the ‘UploadModel’ service under Services. Alternatively, the user can upload the model from any repository using the ‘UploadModelFromRepository’ service.

To create a model inside of TWA from the uploaded PMML, the user will fill out the filePath and name, then execute the service. Note: This model will not show up in Builder, as that would require model validation information that is not part of the imported PMML file.

The resulting Job ID can be used to make predictions, such as by using the …PredictionThing’s BatchScore or RealtimeScore services. At this point, the uploaded model acts the same way as if it had been created inside of that TWA environment.

Importing Through Analytics Manager:
To import a model that has been saved in PMML 4.3+ format into TWA using the Analytics Manager, the user will navigate to Analytics --> Analytics Manager --> Analysis Models, and click the green “New” button. Next, the user will choose the provider name (or create a new one by navigating to Analytics --> Analytics Manager --> Analysis Providers). The user will also check the “Upload Model” box and click the grey “Choose File” button to find the PMML file. Finally, the user will click the black “Upload” button, then the green “Save” button.

At this point, the model is uploaded into ThingWorx Analytics, and the user may progress through the subsequent steps to set up “Analysis Events” and “Analysis Jobs” that will be powered by the imported model.
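Because both import paths above expect PMML 4.3 or later, it can be useful to check a file before uploading it. The sketch below is plain JavaScript and is not part of TWA; the helper names pmmlVersion and isAtLeast and the sample text are hypothetical. It relies only on the fact that a PMML document declares its version in the version attribute of the root PMML element, and it uses a simple regular expression rather than a full XML parser.

```javascript
// Reads the version attribute from the root <PMML ...> element of a PMML file.
// Returns the version string (for example "4.3"), or null if none is found.
function pmmlVersion(pmmlText) {
  var match = /<PMML\b[^>]*\bversion\s*=\s*["']([^"']+)["']/.exec(pmmlText);
  return match ? match[1] : null;
}

// True when `version` (a "major.minor" string) is at least minMajor.minMinor.
function isAtLeast(version, minMajor, minMinor) {
  if (version === null) {
    return false;
  }
  var parts = version.split(".");
  var major = parseInt(parts[0], 10);
  var minor = parseInt(parts[1] || "0", 10);
  return major > minMajor || (major === minMajor && minor >= minMinor);
}

// Hypothetical start of an exported file such as 'ModelExport.pmml'.
var sample =
  '<?xml version="1.0" encoding="UTF-8"?>\n' +
  '<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">';

console.log(pmmlVersion(sample));                  // "4.3"
console.log(isAtLeast(pmmlVersion(sample), 4, 3)); // true
```

The XML declaration on the first line also carries a version attribute, which is why the pattern anchors on the PMML element rather than on the first occurrence of the word version.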
View full tip
ThingWorx Analytics is offered through a user interface called Analytics Builder, which comes with some pre-configured functionality. However, should you want to create your own jobs and mashups, all features from Analytics Builder, and some more, are available through the ThingWorx services. Most of this functionality requires that you provide some data for the Analytics services to run on. This is where the datasetRef parameter is required.

Data uploaded through Analytics Builder
- Any dataset uploaded through Builder will have a datasetUri, as shown in the image above, and its format will be parquet (all lowercase).
- The datasetUri can be obtained from the list of datasets in Builder.

Passing data as an in-body Dataset
- If data isn't uploaded through Analytics Builder, it can be supplied as an infotable in the data parameter of the datasetRef.
- Metadata will also need to be supplied if a new dataset is being created (Create Job of the AnalyticsServer_DataThing).
- If the data is being supplied for a scoring job, ThingWorx Analytics will infer the columns appropriately as long as the column names match what the model is expecting.

The filter parameter applies to parquet datasets already uploaded into ThingWorx Analytics and takes an ANSI SQL statement that adds conditions to reduce the number of rows.

Exclusions is a single-column infotable listing the columns you wish to remove from the job you are trying to submit. Example: if you want Profiles to run on only 5 out of 10 columns, list the 5 columns that you don't want to include in this exclusions infotable.

In some cases, data may also be supplied as a csv file in the file repository. In that case, give the datasetUri parameter the location of the file on the ThingWorx file repository (in the format thingworx://UseCaseFileRepo/tempdata.csv) and set the format to csv.
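The pieces of a datasetRef described above can be sketched as plain JavaScript objects. This is only an illustration of the values involved: the field names datasetUri, format and filter come from the text above, but the helper names, the column names, the filter condition and the placeholder datasetUri are hypothetical, and how these values are wrapped into the actual datasetRef infotable on the platform is not shown here.

```javascript
// Columns to exclude = every available column that is not in the wanted list.
function exclusionsFor(allColumns, wantedColumns) {
  return allColumns.filter(function (name) {
    return wantedColumns.indexOf(name) === -1;
  });
}

// Values for a dataset already uploaded through Analytics Builder (parquet).
function uploadedDatasetRef(datasetUri, filter) {
  var ref = { datasetUri: datasetUri, format: "parquet" };
  if (filter) {
    ref.filter = filter; // optional condition to reduce the number of rows
  }
  return ref;
}

// Values for a csv file stored in a ThingWorx file repository.
function csvDatasetRef(repositoryPath) {
  return { datasetUri: repositoryPath, format: "csv" };
}

// Hypothetical 10-column dataset where Profiles should run on only 5 columns.
var allColumns = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10"];
var wanted = ["c1", "c2", "c3", "c4", "c5"];
console.log(exclusionsFor(allColumns, wanted)); // the 5 columns to exclude

// Placeholder URI and hypothetical filter condition.
console.log(uploadedDatasetRef("<datasetUri copied from Builder>", "c1 > 100"));
console.log(csvDatasetRef("thingworx://UseCaseFileRepo/tempdata.csv"));
```

Computing the exclusions from the full column list, rather than typing them by hand, keeps the list correct when columns are added to the dataset later.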
View full tip
Announcements