IoT Tips

Use Case: You have published a model from Analytics Builder to Analytics Manager, and then used the service CreateOrUpdateThingTemplateForModel on the resource TW.AnalysisServices.ModelManagementServicesAPI. A Thing created from the resulting template has an infotable property called "data" that must be populated in order to trigger an Analysis Event & Job. For example, you might have been following the online documentation for Analytics Manager > Working with Thing Predictor > Demo: Using Thing Predictor, link here. The script below makes it easy to write a line of test data into the "data" property on your Thing to trigger the analysis event and job. The fields causalTechnique, goalName, and importantFieldCount are also set programmatically, since they are required for the analysis event and job. The script may also be useful as a general example of how to write to an infotable property on a Thing. The JavaScript code is shown here and is also attached as a text file to this post:

me.causalTechnique = 'FULL_RANGE';
me.goalName = 'predict_Compressor_failure';
me.importantFieldCount = 3;

// 1 - CREATE AN EMPTY INFOTABLE FROM THE MODEL'S INPUT DATA SHAPE
var params = {
    infoTableName: "InfoTable",
    dataShapeName: "ThingPredictor.test-integer_afebaef3-b2cf-4347-824c-a39c11ddbb4a.InputParamsdataDataShape"
};
var myInfoTable = Resources["InfoTableFunctions"].CreateInfoTableFromDataShape(params);

// 2 - CREATE AN INFOTABLE ROW AS A PLAIN OBJECT
var newEntry = new Object();
newEntry._Pressure = 10.5;    // NUMBER
newEntry._Temperature = 45.1; // NUMBER
newEntry._VibrationX = 81;    // NUMBER
newEntry._VibrationY = 65;    // NUMBER
//newEntry.key = 4;           // STRING - isPrimaryKey = true

// 3 - ADD THE ROW TO THE INFOTABLE
myInfoTable.AddRow(newEntry);

// 4 - PERSIST THE INFOTABLE TO THE THING PROPERTY 'data'
me.data = myInfoTable;
View full tip
The metadata for the 8.1 release has been updated from that of previous releases as part of the architecture update that is new in 8.1. The goal of this blog post is to inform you of these changes to the metadata, and where you can find more information on both the new metadata format and all of the other changes that are new for the 8.1 release.

New Structure

The image below provides an excerpt of what the metadata file itself will look like in practice. The list below defines each field in the metadata and how it should be populated in your metadata file.

fieldName (Required) - The exact name of the field as it appears in the data file.
values (Required if the opType is Ordinal; optional for the Categorical opType; do not use for Boolean and Continuous) - A list of the acceptable values for the field. Note: For Ordinal opTypes, the values must be presented in the correct order.
range (Optional) - For a Continuous field, defines the minimum and maximum values the field can accept. For informational purposes only.
dataType (Required) - Describes what type of data the field contains. Options include: Long, Integer, Short, Byte, Double, Boolean, String, Other. Note: Select the most accurate dataType; selecting the String dataType for numeric data can lead to undesirable results.
opType (Required) - Describes how the data in the field can be used. Options include: Categorical, Boolean, Ordinal, Continuous, Informational, Temporal, Entity_ID.
timeSamplingInterval (Required if the opType is Temporal; do not use for other opTypes) - An integer representing the time between observations in a temporal field.
isStatic (Optional) - A flag indicating whether or not the value in a temporal field can change over time. Marking a field as static reduces training time by removing redundant data points for fields that do not change.

Things to Remember

Remember that the metadata file you create needs to match your data file; all of the columns in your dataset must be represented in the metadata file. The metadata file needs to be a JSON file. Setting the opType parameter incorrectly can have a severe impact on system performance. For example, setting a numerical field that has thousands of different values as Categorical instead of Continuous will cause the system to handle each value as an independent category instead of just a number, which will result in significantly longer processing time.

Additional References

For more information on all the other changes that are new in the 8.1 release, please follow this link for the complete reference document. Feel free to use the blank example metadata file attached to this post to help you get started on your own.
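To make the structure concrete, a single field entry built from the parameters above might look like the sketch below. This is an illustrative assumption, not the authoritative format; the field names come from the list above, but the exact file structure should be taken from the blank example metadata file attached to this post.

{
    "fieldName": "Temperature",
    "dataType": "Double",
    "opType": "Continuous",
    "range": { "min": 0, "max": 150 },
    "isStatic": false
}

An Ordinal field would instead carry an ordered values list, e.g. "values": ["Low", "Medium", "High"], and a Temporal field would add a timeSamplingInterval.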
View full tip
Put together a quick example mashup in support of a couple of analytics projects to demonstrate use of the new ThingWorx Analytics 8.1 APIs within the ThingWorx mashup builder. The intention here is for use in POCs, providing a quick way of demonstrating customer-facing analytics outputs alongside the more detailed view available in Analytics Builder. The required prerequisites are:

ThingWorx 8.1 + Analytics Extensions
ThingWorx Analytics Server 8.1
Carousel TWX UI widget (attached) imported into TWX
Data set(s) loaded, with signals / profiles generated

The demo can be installed by importing the attached entities file into TWX Composer, then launching the mashup 'EMEA.Analytics.CustomerInsightMashUp'. A quick run-through of the functionality: On launching the mashup, data sets and models are displayed for selection on the left-hand side. On selecting a dataset and model, signals are presented in two tabs, the first being an overview of all signals. The list on the left can be expanded by changing the value for 'Top <n> Contributing Features'. On selecting a signal from the list, the 'Selected Signal Details' tab displays additional charting for value ranges, average goal, etc. The number of 'bins' to display can be edited. Similarly, profiles can be viewed from the 'Profiles' tab; each profile can be selected by dragging the upper carousel. This is all done using the Analytics 8.1 "Things" in TWX along with an additional custom Thing with some scripted services (EMEA.Analytics.Helper). Thanks to Arian Van Huelsen & Tanveer Saifee at PTC for their support; all comments / feedback welcome.
View full tip
Preface

In this blog post, we will discuss how to start and stop ThingWorx Analytics, as well as some other useful triaging/troubleshooting commands. This applies to all flavors of the native Linux installation of the application.

In order to perform these steps, you will need sudo or root access on the host machine, as you will have to execute a shell script and view its outputs.

The example screenshots below were taken on a virtual CentOS 7 server with a GUI, as the root user.

Checking ThingWorx Analytics Server Application Status

1. Change directory to the installation destination of the ThingWorx Analytics (TWA) application. In the screenshot below, the application is installed to the /opt/ThingWorxAnalyticsServer directory.

2. The install directory contains a series of folders and files. You can use the ls command to see a list of files and folders in the installation directory.
     a. You will need to navigate one more level down into the ./ThingWorxAnalyticsServer/bin directory by using the command cd ./bin
     b. As you can see above, we used the pwd command to verify that we are in the correct directory.

3. In the ./ThingWorxAnalyticsServer/bin directory, there should be three shell files: configure-apirouter.sh, configure-user.sh, and twas.sh
     a. To run a status check of the application, use the command ./twas.sh status
          i. This will produce a list of outputs and a few warning messages. This is normal; see the screenshot below.
     b. You will see a series of services, each marked with a green "active (running)" or a red "not active (stopped)".
          i. List of services:
               twas-results-ms.service - ThingWorx Analytics - Results Microservice
               twas-data-ms.service - ThingWorx Analytics - Data Microservice
               twas-analytics-ms.service - ThingWorx Analytics - Analytics Microservice
               twas-profiling-ms.service - ThingWorx Analytics - Profiling Microservice
               twas-clustering-ms.service - ThingWorx Analytics - Clustering Microservice
               twas-prediction-ms.service - ThingWorx Analytics - Prediction Microservice
               twas-training-ms.service - ThingWorx Analytics - Training Microservice
               twas-validation-ms.service - ThingWorx Analytics - Validation Microservice
               twas-apirouter.service - ThingWorx Analytics - API Router
               twas-edge-ms.service - ThingWorx Analytics - Edge Microservice

Starting and Stopping ThingWorx Analytics

If you encounter any errors or stopped services in the above, a good first step is to restart the TWA Server application.

There are two methods to restart the application: the restart command, or the stop and start commands.

Method 1 - Restart Command:

1. In the same ./ThingWorxAnalyticsServer/bin directory, run the following command: ./twas.sh restart
     a. The output of a successful restart will look like the following:
2. The restart should only take a few seconds to complete.

Method 2 - Stop / Start Commands:

1. In the same ./ThingWorxAnalyticsServer/bin directory, run the following command: ./twas.sh stop
2. After the application stops, run the following command: ./twas.sh start

Note: You can confirm the status of the TWA Server application by following the steps in the "Checking ThingWorx Analytics Server Application Status" section above.
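For quick reference, the twas.sh subcommands covered in this post, all run from the ./ThingWorxAnalyticsServer/bin directory, are:

./twas.sh status     (check the state of all TWA microservices)
./twas.sh restart    (stop and start the application in one step)
./twas.sh stop       (stop the application)
./twas.sh start      (start the application)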
View full tip
Mapping previous versions of the ThingWorx Analytics API to ThingWorx Analytics 8.1 Services

Since ThingWorx Analytics 8.1, the classic server monolith has been replaced by a series of independent microservices. This new structure groups services around specific elements of functionality (data, training, results). Thus, the previous API commands used to access ThingWorx Analytics functions have been replaced by ThingWorx Services. Those services exist within specific microservice Things accessible in ThingWorx Platform 8.1. The mapping below covers the most common API commands from version 8.0 and earlier and the related 8.1 services. It is not an exhaustive listing of either API commands or services. The API commands shown are samples which might require further information, such as headers and a body, once used; they are given for reference purposes only.

1. Version Info
   Previous: GET http://<IP Address>:8080/1.0/about/versioninfo
   8.1 Service: VersionInfo, available on each microservice Thing inheriting from Analytics Server
   Description: Returns the internal version number for a specific microservice. The first two digits are the ThingWorx Core version; the next three digits are the version of the microservice.

2. Registering a new dataset
   Previous: POST http://<IP Address>:8080/1.0/datasets/
   8.1 Service: CreateDataset, on the Data Microservice
   Description: Creates the dataset, uploads the data along with its metadata, and optimizes it automatically.

3. Checking dataset status
   Previous: GET http://<IP Address>:8080/1.0/datasets/<DataSet Name>
   8.1 Service: ListCreatedDatasets, on the Data Microservice
   Description: This old functionality is replaced by a service that lists all created datasets.

4. Creating metadata
   Previous: POST http://<IP Address>:8080/1.0/datasets/<DataSet Name>/configuration
   8.1 Service: CreateDataset, on the Data Microservice (see entry 2)

5. Checking dataset configuration
   Previous: GET http://<IP Address>:8080/1.0/datasets/<DataSet Name>/configuration
   8.1 Service: GetDatasetSchema, on the Data Microservice
   Description: Retrieves the metadata from a dataset.

6. Loading dataset CSV
   Previous: POST http://<IP Address>:8080/1.0/datasets/<DataSet Name>/data
   8.1 Service: CreateDataset, on the Data Microservice (see entry 2)

7. Checking job status
   Previous: GET http://<IP Address>:8080/1.0/status/<Job ID>
   8.1 Service: GetJobStatus, available in all created microservices inheriting from AnalyticsJob Server
   Description: Retrieves the status of a specific job.

8. Signals job
   Previous: POST http://<IP Address>:8080/1.0/datasets/<DataSet Name>/signals
   8.1 Service: CreateJob, on the Signals Microservice
   Description: Creates a job to identify signals.

9. Signals result job
   Previous: GET http://<IP Address>:8080/1.0/datasets/<DataSet Name>/signals/<Job ID>/results
   8.1 Service: RetrieveResult, on the Signals Microservice
   Description: Retrieves the result of a Signals job.

10. Profile job
    Previous: POST http://<IP Address>:8080/1.0/datasets/<DataSet Name>/profiles
    8.1 Service: CreateJob, on the Profiling Microservice
    Description: Creates a job to generate profiles.

11. Profile result job
    Previous: GET http://<IP Address>:8080/1.0/datasets/<DataSet Name>/profiles/<Job ID>/results
    8.1 Service: RetrieveResult, on the Profiling Microservice
    Description: Retrieves the results of a profiles job.

12. Train model job
    Previous: POST http://<IP Address>:8080/1.0/datasets/<DataSet Name>/prediction
    8.1 Service: CreateJob, on the Training Microservice
    Description: Creates a prediction model job.

13. Train model result job
    Previous: GET http://<IP Address>:8080/1.0/datasets/<DataSet Name>/prediction/<Job ID>/results
    8.1 Service: RetrieveModel, on the Training Microservice
    Description: Only retrieves the PMML model. But if a holdout for validation was specified in the CreateJob, a validation job is auto-created and run.

14. Scoring job
    Previous: POST http://<IP Address>:8080/1.0/datasets/<DataSet Name>/predictive_scores
    8.1 Service: BatchScore, on the Prediction Microservice
    Description: Submits a predictive scoring job.

15. Scoring job result
    Previous: GET http://<IP Address>:8080/1.0/datasets/<DataSet Name>/predictive_scores/<Job ID>/results
    8.1 Service: RetrieveResult, on the Prediction Microservice
    Description: Retrieves results from predictive scoring jobs.
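Because the 8.1 services are exposed as standard ThingWorx services, they can also be invoked from server-side script instead of raw HTTP. The sketch below is an assumption-laden illustration: the Thing name and the parameter structure will depend on how your AnalyticsServerThing and its microservice Things were configured, so treat the names here as placeholders rather than fixed APIs.

// Hypothetical sketch: invoking an 8.1 Signals service from a ThingWorx server-side script.
// "MyAnalyticsServer_SignalsThing" and the parameter names are placeholders.
var jobId = Things["MyAnalyticsServer_SignalsThing"].CreateJob({
    datasetRef: { datasetUri: "dataset/my_dataset" }, // which dataset to analyze
    goalField: "outcome"                              // the goal column for signals
});

var status = Things["MyAnalyticsServer_SignalsThing"].GetJobStatus({ jobId: jobId });
// once the status indicates completion, retrieve the result
var signals = Things["MyAnalyticsServer_SignalsThing"].RetrieveResult({ jobId: jobId });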
View full tip
ThingWorx Analytics Sample Data for the 8.1 Release
View full tip
The following Expert Session videos are now available for viewing within the ThingWorx Community:

ThingWorx Analytics Installation - This Expert Session will walk you through the complete installation of ThingWorx Analytics, from the prerequisites to confirming that the installation is successful, and all steps in between. The first half of the video gives a breakdown of the components and the process of the installation, with the second half being an actual demo of the installation.

ThingWorx Analytics API Overview - This Expert Session is designed to help beginners get up and running with ThingWorx Analytics. It covers basic concepts like what APIs are and how to configure the metadata file, and includes a live demo that shows you how to interact with and use ThingWorx Analytics in real time. This Expert Session would also be useful for experienced users who need a refresher course.

Decision Tree, ThingWorx Analytics Builder - This Expert Session reviews the concept of "decision trees" and the functionality that is available in ThingWorx Analytics Builder. First, you will learn how to create and upload a dataset in ThingWorx Analytics Builder. After that, it shows you how to train a model and score against the model that was just generated. It then goes into detail on how the "Decision Tree" prediction learner operates and classifies inputs.

Use Case Identification - This Expert Session goes over ways to identify and develop a successful use case for ThingWorx Analytics. The example use case presented here is on employee retention in a fictional company, with the goal of maximizing employee retention. This presentation will provide you with all the fundamentals you need to develop your own ThingWorx Analytics use cases from the ground up.

ThingWorx Analytics Signals - This Expert Session will provide you with an in-depth explanation of how Signals are calculated in ThingWorx Analytics, what purpose they serve, and why we use them. Some basic mathematical concepts are discussed, so viewers will have a better idea of how ThingWorx Analytics operates behind the scenes.

Related Links

For more information, you can visit a new space dedicated to these helpful technical videos. Additional Expert Sessions will be highlighted here in the ThingWorx Community every few weeks. Visit the Online Success Guide to access additional information about ThingWorx training and services.
View full tip
In the last while I've seen a few things which got me thinking about how value is created or unlocked from connected data. Multiple components are required to create, send, store and manage the data created by edge devices; doing these things enables value to be unlocked, but what does it take to unlock the value?

It was this article which first got me interested in considering this question. In particular, it was a section near the bottom of the article where the author describes a number of creative business use cases for car manufacturers which could be enabled by connected data. If this author could come up with several creative and potentially valuable use case examples for one industry, I started to wonder what other sorts of use cases could exist in other industries. Could there be a series of use cases which, with a little variation, could be applied across different industries?

The second article which further sparked my interest in where the value originates is this one, on using a cryptocurrency with IoT. While the idea of using a blockchain-like technology with IoT is intriguing, it was the second image in the article (below) which resonated with me on value. This image is a graphical representation of the connection between the key components of a connected system, and it makes clear that each component has a critical role to play; remove any one of the parts and the system doesn't function. The image also makes it clear that it's the "Analyze" phase which drives the action to do something, and taking an action is the system's reason for existing.

Which brings me to the third and final article, describing Industry 4.0. Like the other two articles, it wasn't the main point of the article I found most interesting; rather it was the image below, and in particular the sidebar 'Value Creation through', which brought me back to the question of where value comes from. The idea that in a manufacturing setting value can be created through product or process innovations, as well as through new business models, is intriguing. I think a fourth idea missing from this list is one where network effects from accumulating more and more proprietary data create a compounding effect, as with Facebook or LinkedIn. If there are at least four modes of value creation, maybe there are others?

While these articles caused me to ask some questions, none of them really answered the question of where the value is unlocked. To answer it, I decided to restate the question as "how is value unlocked from data", making the assumption that the value is derived from the data. This question is a little easier to address. The best visual representation of the answer I've seen is the data value road map (below) from the book Creating a Data-Driven Organization, which was released a couple of years ago. While I think the author is probably missing at least two boxes above 'optimization' ("new business models" and "data-driven network effects"), the graphic does a good job of communicating that as the value created from data increases, the complexity of the analytic task also increases, suggesting the value is unlocked by the analytics. For me, the value from a system of connected devices is unlocked in the "analysis" phase, as seen in the first image. But performing the "analysis" requires two things: first, asking the right high-value questions of the data (product managers / beginning with the end in mind / use cases), and second, using the right set of technologies to address those questions, which in many instances means artificial intelligence of some sort. Interestingly, although artificial intelligence is required for many high-value use cases, both parts of the analysis require distinctly human skills (choosing the right use cases and controlling the technology) to create externalized intelligence and generate value.

Creating a Data-Driven Organization: Practical Advice from the Trenches, Carl Anderson, eBook - Amazon.com
View full tip
Starting with the 8.1 release, the architecture of ThingWorx Analytics has changed from being a single server to being split into several independent microservices. This has been done to allow services to run concurrently; it also prevents issues with one microservice from affecting the others.

Overview

The new Analytics Server architecture consists of a suite of 9 microservices:

Data
Clustering
Profiling
Signals
Training
Prediction
Validation
Prescriptive
Results

All of the microservices work together to create an experience similar to that of past releases. The data that is uploaded to and generated by the Analytics Server is stored directly in a file system, instead of a PostgreSQL database as it was in the past.

Closer Integration with ThingWorx

Please note that ThingWorx Foundation is required to be installed and operating before installing Analytics. During the install you will be asked to supply the IP address of the ThingWorx instance that will be used for Analytics. At this step, the AnalyticsServerThing is configured, which allows the user to interact with Analytics Server through ThingWorx. All of the configured microservices are represented as Things under the AnalyticsServerThing. This is because ThingWorx Analytics has become a native part of ThingWorx Foundation functionality and is dependent on ThingWorx for user interaction. Because of these changes, there is no longer a direct ThingWorx Analytics Server REST API. Support for accessing the services via REST calls is now provided through the ThingWorx Core REST API layer, so a new URI pattern is required moving forward. One other update from the older versions is that application keys and application IDs are no longer necessary. This should come as a welcome relief, as application keys and IDs were a source of issues for users who may have misplaced them.

Less Data-Centric

In the old versions, jobs, models, signals, etc. were all tied to the dataset, so there was no way to move a model from one dataset to another. With the new architecture, this is no longer the case: you are able to move a model from one dataset to another seamlessly. Please note that when moving a model from one dataset to another, both datasets must have the same metadata. This is because a model created to increase efficiency in a factory would provide no insight on a dataset that monitors the soil moisture in a corn field.

Updates to Metadata

Although going over the exact changes to the metadata is out of scope for this post, it is worth mentioning. For more details on the changes, please follow this link.

Summary

In conclusion, the new architecture of ThingWorx Analytics was designed to increase scalability and to produce a more robust system. The new release is much more integrated into the ThingWorx Platform to increase ease of use compared to previous releases, and it is much less data-centric, geared more toward the solutions themselves.
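Regarding the new URI pattern: since access now goes through the ThingWorx Core REST API layer, calls follow the platform's standard service-invocation convention rather than the old /1.0/... routes. As a sketch (the Thing and service names below are placeholders; the general shape is the documented ThingWorx Core pattern):

POST /Thingworx/Things/<MicroserviceThingName>/Services/<ServiceName>
Content-Type: application/json

{ ...service input parameters as JSON... }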
View full tip
This video covers the new features of ThingWorx Analytics Builder 8.1   Updated Link for access to this video: What's New in ThingWorx Analytics Builder 8.1
View full tip
Parquet Data Format used in ThingWorx Analytics

Starting with ThingWorx Analytics version 8.1, data storage no longer requires the installation of a PostgreSQL database. Instead, uploaded CSV data is converted to the optimized Apache Parquet format and stored directly in the file system. This blog explains some of the features of Apache Parquet that justify this transition in ThingWorx Analytics data storage.

What is Apache Parquet?

Apache Parquet is a column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides efficient data compression and encoding schemes with enhanced performance for handling complex data in bulk. Below is an illustration of the columnar storage model.

Apache Parquet Features and Benefits

Apache Parquet is implemented using the record shredding and assembly algorithm, taking into account the complex data structures that can be used to store the data. Apache Parquet stores data such that the values in each column are physically stored in contiguous memory locations. Due to this columnar storage, Apache Parquet provides the following benefits:

Column-wise compression is efficient and saves storage space
Compression techniques specific to a type can be applied, as the column values tend to be of the same type
Queries that fetch specific column values need not read the entire row data, thus improving performance
Different encoding techniques can be applied to different columns

Some advantages of using Parquet for ThingWorx Analytics

Apart from the above benefits of using Parquet, which amount to higher efficiency and increased performance, below are some advantages that apply specifically to ThingWorx Analytics:

The change from using a database to using Parquet removes the limitations on the number of data columns the system can handle.
It allows for streamlining the dataset creation process. Since the data is converted to the Parquet format, there is no need to separately optimize the dataset; even when new data is appended to an existing dataset, a new partition is added and re-optimization is optional, not required.
Data can be appended easily, so there is no longer a need to re-load the full dataset when new data values are added.

The illustration below shows the transition from the row-based data storage model to the columnar storage of Parquet.
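To make the row-versus-column distinction concrete, here is a small conceptual sketch in plain JavaScript (an illustration of the layout idea only, not Parquet itself):

// Row-oriented layout: each record is stored together, so reading one
// column still touches every record in full.
var rows = [
    { deviceId: "A", temperature: 21.5, pressure: 1.2 },
    { deviceId: "B", temperature: 22.1, pressure: 1.1 }
];

// Column-oriented (Parquet-style) layout: each column is stored contiguously,
// so a query over "temperature" never reads the other columns, and the
// same-typed values in each column compress and encode well together.
var columns = {
    deviceId:    ["A", "B"],
    temperature: [21.5, 22.1],
    pressure:    [1.2, 1.1]
};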
View full tip
First we need to understand the terms below.

Quantitative variable: A quantitative variable is naturally measured as a number for which meaningful arithmetic operations make sense. Examples: height, age, crop yield, GPA, salary, temperature, area, air pollution index (measured in parts per million), etc.

Categorical variable: Any variable that is not quantitative is categorical. Categorical variables take a value that is one of several possible categories. As naturally measured, categorical variables have no numerical meaning. Examples: hair color, gender, field of study, college attended, political affiliation, status of disease infection.

Ordinal variables: An ordinal variable is a categorical variable for which the possible values are ordered. Ordinal variables can be considered "in between" categorical and quantitative variables. For example, educational level might be categorized as:

1: Elementary school education
2: High school graduate
3: Some college
4: College graduate
5: Graduate degree

In this example (and for many ordinal variables), the quantitative differences between the categories are uneven, even though the differences between the labels are the same (e.g., the difference between 1 and 2 is four years, whereas the difference between 2 and 3 could be anything from part of a year to several years). Thus it does not make sense to take a mean of the values. A common mistake is treating ordinal variables like quantitative variables without thinking about whether this is appropriate in the particular situation at hand.

Ordinal regression: In statistics, ordinal regression (also called "ordinal classification") is a type of regression analysis used for predicting an ordinal variable. The ordinal regression procedure allows you to build models, generate predictions, and evaluate the importance of various predictor variables in cases where the dependent (target) variable is ordinal in nature.

Ordinal dependents and linear regression: When you are trying to predict ordinal responses, the usual linear regression models don't work very well. Those methods can work only by assuming that the outcome (dependent) variable is measured on an interval scale. Because this is not true for ordinal outcome variables, the simplifying assumptions on which linear regression relies are not satisfied, and thus the regression model may not accurately reflect the relationships in the data. In particular, linear regression is sensitive to the way you define categories of the target variable. With an ordinal variable, the important thing is the ordering of categories. So, if you collapse two adjacent categories into one larger category, you are making only a small change, and models built using the old and new categorizations should be very similar. Unfortunately, because linear regression is sensitive to the categorization used, a model built before merging categories could be quite different from one built after.

Below are some examples of ordered logistic regression:

Example 1: A marketing research firm wants to investigate what factors influence the size of soda (small, medium, large or extra large) that people order at a fast-food chain. These factors may include what type of sandwich is ordered (burger or chicken), whether or not fries are also ordered, and the age of the consumer. While the outcome variable, size of soda, is obviously ordered, the difference between the various sizes is not consistent: the difference between small and medium is 10 ounces, between medium and large 8, and between large and extra large 12.

Example 2: A researcher is interested in what factors influence medaling in Olympic swimming. Relevant predictors include training hours, diet, age, and popularity of swimming in the athlete's home country. The researcher believes that the distance between gold and silver is larger than the distance between silver and bronze.

Example 3: A study looks at factors that influence the decision of whether to apply to graduate school. College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school. Hence, our outcome variable has three categories. Data on parental educational status, whether the undergraduate institution is public or private, and current GPA is also collected. The researchers have reason to believe that the "distances" between these three points are not equal. For example, the "distance" between "unlikely" and "somewhat likely" may be shorter than the distance between "somewhat likely" and "very likely".

How to use and get results from ordinal regression: click this link for the PDF.

PDF source: http://www.norusis.com
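For reference, the most common form of ordinal regression is the proportional-odds (cumulative logit) model, which models the probability of falling at or below each category j of an outcome with J ordered categories. This is the standard textbook formulation, not something specific to the linked PDF:

\operatorname{logit}\, P(Y \le j \mid x) = \theta_j - \beta^{\top} x, \qquad j = 1, \ldots, J-1

where the thresholds \theta_1 < \theta_2 < \cdots < \theta_{J-1} separate adjacent categories, and a single coefficient vector \beta is shared across all categories (the "proportional odds" assumption).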
View full tip
In this video we walk through a few steps to verify that the installation process completed successfully.   Updated Link for access to this video:  Validating a ThingWorx Analytics Server 8.1 Installation
View full tip
In this video we walk through the installation steps of ThingWorx Analytics Server 8.1. This covers the native Linux installation, though the steps will be similar for a Docker installation on Windows or Linux.   Updated Link for access to this video:  Installing ThingWorx Analytics Server 8.1 - Native Linux
View full tip
The accuracy of a predictive model can be boosted in two ways: either by embracing feature engineering or by applying boosting algorithms straight away. There are multiple boosting algorithms, such as Gradient Boosting, XGBoost, AdaBoost, GentleBoost, etc. Every algorithm has its own underlying mathematics, and a slight variation is observed while applying them. While working with boosting algorithms, we come across two frequently occurring buzzwords: bagging and boosting.

Bagging: An approach where you take random samples of data, build learning algorithms, and take simple means to find bagging probabilities.

Boosting: Boosting is similar; however, the selection of samples is made more intelligently. We subsequently give more and more weight to hard-to-classify observations.

Below are the default algorithms used in predictive models generated in ThingWorx Analytics:

Decision Tree
Gradient Boost
Linear Regression
Neural Net
Random Forest
Logistic Regression

Gradient boosting is a machine learning technique for regression and classification problems which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion, like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

Let's begin with an easy example. Assume you are given a previous model M to improve on, and you observe that the model currently has an accuracy of 80% (by some metric). How do you go further? One way is to build an entirely different model using a new set of input variables and better ensemble learners. On the contrary, there is a much simpler approach. It goes like this:

Y = M(x) + error

What if we can see that the error is not white noise, but has some correlation with the outcome (Y)? What if we can develop a model on this error term?

error = G(x) + error2

We will probably see the error rate improve, say to 84%. Let's take another step and regress against error2:

error2 = H(x) + error3

Now we combine all of these together:

Y = M(x) + G(x) + H(x) + error3

This will probably have an accuracy of even more than 84%. What if we can find optimal weights for each of the three learners?

Y = alpha * M(x) + beta * G(x) + gamma * H(x) + error4

How Gradient Boosting Works:

1. Loss Function: The loss function used depends on the type of problem being solved. It must be differentiable, but many standard loss functions are supported and you can define your own. A benefit of the gradient boosting framework is that a new boosting algorithm does not have to be derived for each loss function that may want to be used; instead, it is a generic enough framework that any differentiable loss function can be used.

2. Weak Learner: Decision trees are used as the weak learner in gradient boosting. Specifically, regression trees are used that output real values for splits and whose outputs can be added together, allowing subsequent models' outputs to be added to "correct" the residuals in the predictions. Trees are constructed in a greedy manner, choosing the best split points based on purity scores like Gini, or to minimize the loss.

3. Additive Model: Trees are added one at a time, and existing trees in the model are not changed. A gradient descent procedure is used to minimize the loss when adding trees. After calculating the loss, to perform the gradient descent procedure, we must add a tree to the model that reduces the loss.

Improvements to Basic Gradient Boosting:

1. Tree Constraints: It is important that the weak learners have skill but remain weak. Below are some constraints that can be imposed on the construction of decision trees; a worked sketch of the residual-fitting idea follows this list.

Number of trees: Generally, adding more trees to the model is very slow to overfit; the advice is to keep adding trees until no further improvement is observed.
Tree depth: Deeper trees are more complex, and shorter trees are preferred. Generally, better results are seen with 4-8 levels.
Number of nodes or number of leaves: Like depth, this can constrain the size of the tree, but the tree is not constrained to a symmetrical structure if other constraints are used.
Number of observations per split: Imposes a minimum constraint on the amount of training data at a training node before a split can be considered.
Minimum improvement to loss: A constraint on the improvement of any split added to a tree.

2. Weighted Updates: The contribution of each tree to the ensemble sum can be weighted to slow down the learning by the algorithm. This weighting is called a shrinkage or a learning rate: each update is simply scaled by the value of the learning rate parameter v.

3. Stochastic Gradient Boosting: At each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. The randomly selected subsample is then used, instead of the full sample, to fit the base learner.

4. Penalized Gradient Boosting: An additional regularization term helps to smooth the final learned weights to avoid over-fitting. Intuitively, the regularized objective will tend to select a model employing simple and predictive functions.
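To make the residual-fitting idea concrete, here is a minimal sketch of gradient boosting for regression with squared-error loss, using one-split "stumps" as the weak learners. This is an illustrative toy in plain JavaScript under simplifying assumptions; it is not how ThingWorx Analytics implements its Gradient Boost learner.

// Fit a one-dimensional regression stump: predict lMean if x < t, else rMean.
function fitStump(xs, ys) {
    var best = null;
    for (var i = 0; i < xs.length; i++) {
        var t = xs[i];
        var lSum = 0, lN = 0, rSum = 0, rN = 0;
        for (var j = 0; j < xs.length; j++) {
            if (xs[j] < t) { lSum += ys[j]; lN++; } else { rSum += ys[j]; rN++; }
        }
        if (lN === 0 || rN === 0) continue; // skip degenerate splits
        var lMean = lSum / lN, rMean = rSum / rN, err = 0;
        for (var k = 0; k < xs.length; k++) {
            var pred = (xs[k] < t) ? lMean : rMean;
            err += (ys[k] - pred) * (ys[k] - pred); // squared-error loss
        }
        if (best === null || err < best.err) best = { t: t, lMean: lMean, rMean: rMean, err: err };
    }
    return best;
}

function boost(xs, ys, rounds, learningRate) {
    var stumps = [];
    var residuals = ys.slice(); // start with the raw targets
    for (var r = 0; r < rounds; r++) {
        var s = fitStump(xs, residuals); // each stump fits what the ensemble still gets wrong
        stumps.push(s);
        for (var i = 0; i < xs.length; i++) {
            var pred = (xs[i] < s.t) ? s.lMean : s.rMean;
            residuals[i] -= learningRate * pred; // shrinkage: scale each update by the learning rate
        }
    }
    return function (x) { // the additive model: a shrunken sum of all stumps
        var y = 0;
        for (var i = 0; i < stumps.length; i++) {
            y += learningRate * ((x < stumps[i].t) ? stumps[i].lMean : stumps[i].rMean);
        }
        return y;
    };
}

// Toy usage: fit y = 2x; predictions approach the targets as stumps accumulate.
var model = boost([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], 50, 0.3);
// model(4) will be close to 8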
View full tip
This video shows the commands to execute to deploy the training and results microservices as Docker containers. It is based on Docker Toolbox, to highlight the specific settings required on Toolbox.   Updated Link for access to this video:  Deploying Training & Result Microservices via Docker Containers for Anomaly Detection
View full tip
There are four types of analytics; this post focuses on prescriptive analytics: What should I do about it?

Prescriptive analytics is about using data and analytics to improve decisions and therefore the effectiveness of actions. It is related to both descriptive and predictive analytics: while descriptive analytics aims to provide insight into what has happened and predictive analytics helps model and forecast what might happen, prescriptive analytics seeks to determine the best solution or outcome among various choices, given the known parameters. It has been defined as "any combination of analytics, math, experiments, simulation, and/or artificial intelligence used to improve the effectiveness of decisions made by humans or by decision logic embedded in applications." These analytics go beyond descriptive and predictive analytics by recommending one or more possible courses of action. Essentially they predict multiple futures and allow companies to assess a number of possible outcomes based upon their actions. Prescriptive analytics uses a combination of techniques and tools such as business rules, algorithms, machine learning and computational modelling procedures. Prescriptive analytics can also suggest decision options for how to take advantage of a future opportunity or mitigate a future risk, and illustrate the implications of each decision option. In practice, prescriptive analytics can continually and automatically process new data to improve the accuracy of predictions and provide better decision options.

Prescriptive analytics can be used in two ways:

Inform decision logic with analytics: Decision logic needs data as an input to make the decision. The veracity and timeliness of the data ensure that the decision logic operates as expected. It doesn't matter whether the decision logic is that of a person or embedded in an application; in both cases, prescriptive analytics provides the input to the process. Prescriptive analytics can be as simple as aggregate analytics about how much a customer spent on products last month, or as sophisticated as a predictive model that predicts the next best offer to a customer. The decision logic may even include an optimization model to determine how much, if any, discount to offer to the customer.

Evolve decision logic: Decision logic must evolve to improve or maintain its effectiveness. In some cases, decision logic itself may be flawed or degrade over time. Measuring and analyzing the effectiveness or ineffectiveness of enterprise decisions allows developers to refine or redo decision logic to make it even better. It can be as simple as marketing managers reviewing email conversion rates and adjusting the decision logic to target an additional audience. Alternatively, it can be as sophisticated as embedding a machine learning model in the decision logic for an email marketing campaign to automatically adjust what content is sent to target audiences.

Different technologies of prescriptive analytics to create action:

Search and knowledge discovery: Information leads to insights, and insights lead to knowledge. That knowledge enables employees to become smarter about the decisions they make for the benefit of the enterprise. Developers can embed search technology in decision logic to find knowledge used to make decisions in large pools of unstructured big data.

Simulation: Simulation imitates a real-world process or system over time using a computer model. Because digital simulation relies on a model of the real world, the usefulness and accuracy of simulation for improving decisions depends a lot on the fidelity of the model. Simulation has long been used in multiple industries to test new ideas, or to test how modifications will affect an existing process or system.

Mathematical optimization: Mathematical optimization is the process of finding the optimal solution to a problem that has numerically expressed constraints.

Machine learning: "Learning" means that the algorithms analyze sets of data to look for patterns and/or correlations that result in insights. Those insights can become deeper and more accurate as the algorithms analyze new data sets. The models created and continuously updated by machine learning can be used as input to decision logic, or to improve the decision logic automatically.

Pragmatic AI: Enterprises can use AI to program machines to continuously learn from new information, build knowledge, and then use that knowledge to make decisions and interact with people and/or other machines.

Use of Prescriptive Analytics in ThingWorx Analytics:

Thing Optimizer: Thing Optimizer functionality provides the prescriptive scoring and optimization capabilities of ThingWorx Analytics. While predictive scoring allows you to make predictions about future outcomes, prescriptive scoring allows you to see how certain changes might affect future outcomes. After you have generated a prediction model (also called training a model), you can modify the prescriptive attributes in your data (those attributes marked as levers) to alter the predictions. The prescriptive scoring process evaluates each lever attribute and returns an optimal value for that feature, depending on whether you want to minimize or maximize the goal variable. Prescriptive scoring results include both an original score (the score before any lever attributes are changed) and an optimized score (the score after optimal values are applied to the lever attributes). In addition, for each attribute identified in your data as a lever, original and optimal values are included in the prescriptive scoring results.

How to Access Thing Optimizer Functionality: ThingWorx Analytics prescriptive scoring can only be accessed via the REST API service. Using a REST client, you can access the Scoring service, which includes a series of API endpoints to submit scoring requests, retrieve results, list jobs, and more. This requires installation of the ThingWorx Analytics Server.

How to avoid mistakes: below are some common mistakes made when doing prescriptive analytics:

Starting digital analytics without a clear goal
Ignoring core metrics
Choosing overkill analytics tools
Creating beautiful reports with little business value
Failing to detect tracking errors

Image source: Wikipedia. Content: go.forrester.com (partially)
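As a conceptual illustration of what prescriptive scoring does with a lever (this sketch is not the ThingWorx Analytics REST API; the function and field names are hypothetical): candidate values for a lever attribute are evaluated against the model's scoring function, and the value that best moves the goal is reported alongside the original score.

// Hypothetical sketch of lever optimization; "score" stands in for a trained model.
function optimizeLever(record, leverName, candidateValues, score) {
    var original = score(record);                       // original score: levers untouched
    var best = { value: record[leverName], score: original };
    candidateValues.forEach(function (v) {
        var trial = JSON.parse(JSON.stringify(record)); // copy the record
        trial[leverName] = v;                           // apply one candidate lever value
        var s = score(trial);
        if (s > best.score) {                           // here we maximize the goal variable
            best = { value: v, score: s };
        }
    });
    return { originalScore: original, optimizedScore: best.score, optimalValue: best.value };
}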
View full tip
This blog is intended to help diagnose and fix the most common issues that may be encountered when working with ThingWatcher. It cannot be stressed strongly enough that you should be familiar with your data, including the average time interval between data points, and the collection duration and certainty threshold you specified.

Before you start troubleshooting ThingWatcher, check that the result and training microservices are running. To test the result microservice, open a web browser and paste in the result URL: http://<IP of microservices>:<Port of results microservice>/results/models (e.g., http://localhost:8096/results/models). To test the training microservice, open a web browser and paste in the training URL: http://<IP of microservices>:<Port of training microservice>/training (e.g., http://localhost:8091/training). If you see either {"values":[],"total":0,"next":null,"previous":null} or a list of training jobs in JSON format, the result and training microservices are available.

1. Question: I haven't seen an anomaly, but I believe that my property is anomalous.

This can have different causes; here are the most common ones:

The certainty is too high. If the certainty is too high, ThingWatcher is conservative in its categorization of "true positives" and may therefore emit more "false negatives". Reducing the certainty will change this behavior, but note that ThingWatcher may then categorize too many "false positives" as a result. In other words, ThingWatcher may detect the desired anomalies but also some non-anomalies.

The property was anomalous during training data collection. If ThingWatcher creates a predictive model from anomalous data, it may not be able to detect the desired anomalies during MONITORING, because the data does not really appear to be anomalous; ThingWatcher treats this pattern as normal. Therefore, ensure that property values are non-anomalous during training.

There are long time gaps during the monitoring state, so ThingWatcher stays in BUFFERING and categorizes these data points as non-anomalous.

2. Question: ThingWatcher detects an anomaly, but my property is non-anomalous.

The certainty might be too low. In this case, ThingWatcher reports anomalies when the incoming data pattern looks even slightly different from the expected data pattern.

ThingWatcher might need more training data. If the property data has a pattern that occurs over a long time span, ThingWatcher needs to collect multiple cycles of all these patterns in order to detect a true anomaly without emitting too many false positives.

3. Question: ThingWatcher is in the FAILED state. Why?

There are many possible reasons for a failed state; here are the most likely causes:

ThingWatcher emits a FAILED state because the training service has not been set up or is down, or, similarly, because the result service is not available. Note the log signatures: messageText=Unexpected exception. {Throwable=[ConnectException: Operation timed out}]] or messageText=Unexpected exception. {Throwable=[ConnectException: Connection refused}]]. Note that ThingWatcher is still able to collect all training data; you will only begin to see these failed states after ThingWatcher has tried to post the training request.

ThingWatcher emits a FAILED state because time gaps prevent the data collection for training. You will see this warning in the log messages: "A long time gap was detected in the data that is greater than the threshold of {n}". This means you have a long gap in the training data, and ThingWatcher will recollect the data. If there are more than 3 recollections due to long time gaps, ThingWatcher transitions to a failed state and will not be able to recover. In this case you can either instruct ThingWatcher to retrain and try again, or check the data source to make sure it does not have long gaps.

4. Question: Why does ThingWatcher remain in BUFFERING?

There are many possible reasons for ThingWatcher to remain in BUFFERING, but the most likely issue is time gaps, which cause ThingWatcher to remain stuck in buffering. If the incoming data regularly contains long time gaps, you will notice that ThingWatcher keeps alternating between the MONITORING and BUFFERING states. You may need to provide better quality data, i.e., more evenly spaced data.

Source: Alex Meng, Specialist Software Engineer
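If you prefer to run the health check above from inside ThingWorx rather than a browser, a minimal sketch using the platform's ContentLoaderFunctions resource could look like the following (the host and ports are assumptions taken from the examples above):

// Minimal health-check sketch; adjust host and ports to your deployment.
var trainingStatus = Resources["ContentLoaderFunctions"].GetJSON({
    url: "http://localhost:8091/training"
});
var resultModels = Resources["ContentLoaderFunctions"].GetJSON({
    url: "http://localhost:8096/results/models"
});
// A JSON response (even an empty list) means the microservice is reachable.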
View full tip
A confusion matrix is a technique for summarizing the performance of a classification algorithm. Classification accuracy alone can be misleading if you have an unequal number of observations in each class or if you have more than two classes in your data set. Calculating a confusion matrix can give you a better idea of what your classification model is getting right and what types of errors it is making.

Classification Accuracy and its Limitations:

Classification Accuracy = Correct Predictions / Total Predictions

The main problem with classification accuracy is that it hides the detail you need to better understand the performance of your classification model. Below are two examples:

1. When your data has more than 2 classes. With 3 or more classes you may get a classification accuracy of 80%, but you don't know if that is because all classes are being predicted equally well or whether one or two classes are being neglected by the model.

2. When your data does not have an even number of classes. You may achieve an accuracy of 90% or more, but this is not a good score if 90 records in every 100 belong to one class, because you can achieve this score by always predicting the most common class value.

Classification accuracy can hide the detail you need to diagnose the performance of your model. But thankfully we can tease apart this detail by using a confusion matrix.

Confusion Matrix Terminology:

A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known. Let's start with an example for a binary classifier:

N = 165      | Predicted no | Predicted yes
Actual no    | 50           | 10
Actual yes   | 5            | 100

What can we learn from this confusion matrix? There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for example, "yes" would mean they have the disease, and "no" would mean they don't. The classifier made a total of 165 predictions (e.g., 165 patients were tested for the presence of the disease). Out of those 165 cases, the classifier predicted "yes" 110 times and "no" 55 times. In reality, 105 patients in the sample have the disease, and 60 patients do not.

Let's now define the most basic terms, which are whole numbers (not rates):

True positives (TP): Cases in which we predicted yes (they have the disease), and they do have the disease.
True negatives (TN): We predicted no, and they don't have the disease.
False positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
False negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")

N = 165      | Predicted no | Predicted yes | Total
Actual no    | TN = 50      | FP = 10       | 60
Actual yes   | FN = 5       | TP = 100      | 105
Total        | 55           | 110           |

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

Accuracy: Overall, how often is the classifier correct? (TP+TN)/total = (100+50)/165 = 0.91
Misclassification rate: Overall, how often is it wrong? (FP+FN)/total = (10+5)/165 = 0.09. Equivalent to 1 minus accuracy; also known as the "error rate".
True positive rate: When it's actually yes, how often does it predict yes? TP/actual yes = 100/105 = 0.95. Also known as "sensitivity" or "recall".
False positive rate: When it's actually no, how often does it predict yes? FP/actual no = 10/60 = 0.17
Specificity: When it's actually no, how often does it predict no? TN/actual no = 50/60 = 0.83. Equivalent to 1 minus the false positive rate.
Precision: When it predicts yes, how often is it correct? TP/predicted yes = 100/110 = 0.91
Prevalence: How often does the yes condition actually occur in our sample? Actual yes/total = 105/165 = 0.64
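The rates above are simple enough to compute directly from the four counts; the small helper below (plain JavaScript, added here for illustration) reproduces the example's figures:

function confusionRates(tp, tn, fp, fn) {
    var total = tp + tn + fp + fn;
    return {
        accuracy:              (tp + tn) / total,
        misclassificationRate: (fp + fn) / total,
        truePositiveRate:      tp / (tp + fn),  // sensitivity / recall
        falsePositiveRate:     fp / (fp + tn),
        specificity:           tn / (fp + tn),
        precision:             tp / (tp + fp),
        prevalence:            (tp + fn) / total
    };
}

// Using the example above: TP = 100, TN = 50, FP = 10, FN = 5
var rates = confusionRates(100, 50, 10, 5);
// rates.accuracy ~ 0.91, rates.truePositiveRate ~ 0.95, rates.specificity ~ 0.83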
View full tip
Learning is a wide domain, and thus the field of machine learning has branched into several subfields dealing with different types of learning tasks.

Supervised vs Unsupervised Learning

Since learning involves an interaction between the learner and the environment, we can divide tasks according to the nature of that interaction. Consider the task of detecting hand-written numbers among a large set of digital numbers versus the task of anomaly detection. For the hand-written number detection task, we consider a setting in which the learner receives a set of training numbers, labeled as 'Hand-Written' or 'Digital'. On the basis of such training, the learner should figure out a rule for labeling a newly arriving number. In contrast, for the task of anomaly detection, all the learner gets as training is a large set of numbers, without any labels, and the learner's task is to detect 'unusual' numbers.

Consider learning as a process of 'using experience to gain expertise'. Supervised learning describes a scenario in which the 'experience', a training example, contains significant information (the labels 'Hand-Written' or 'Digital') that is missing from the future arriving numbers to which the learned expertise is to be applied. Here, the acquired expertise is aimed at predicting the missing information for the validation data. In unsupervised learning, however, there is no difference between the training data and the validation data. The learner processes input data with the goal of coming up with some summary or compressed version of the data. Clustering a data set into subsets of similar objects is a good example of such a task.

There is also an intermediate learning setting in which the training set contains more information than the validation set, and the learner is required to predict even more information for the validation set. Consider a game of chess, where a value function describes each configuration of a chess board. One may want to learn the degree by which White's position is better than Black's. The only information available to the learner at training time is positions that occurred in actual chess games and who won them. Such learning frameworks are investigated under reinforcement learning.

Source: 'Understanding Machine Learning: From Theory to Algorithms' by Shai Shalev-Shwartz and Shai Ben-David
View full tip
Overview

Time-series predictive models generated by ThingWorx Analytics Server in Analytics Manager will have additional columns in the DataShape generated by ThingPredictor. These columns are known as "transformation fields" and are used for internal processing, but they are not necessary for inclusion in the DataShape. So there is no need to worry about mapping all of these additional fields, since they are handled internally by ThingPredictor. There is one additional step that the user must take, which is detailed below.

Step to Import: Edit the DataShape generated by ThingPredictor to match the format of the data that was provided during the model training process. In other words, remove all the transformation fields from the DataShape.
View full tip