cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Showing results for 
Search instead for 
Did you mean: 

ThingWorx Navigate is now Windchill Navigate Learn More

IoT & Connectivity Tips

Sort by:
In this video we are walking through the installation steps of ThingWorx Analytics Server 8.1. This cover the Native Linux installation though the steps will be similar for a docker installation on Windows or Linux.   Updated Link for access to this video:  Installing ThingWorx Analytics Server 8.1 - Native Linux
View full tip
This video shows the commands to execute to deploy the training and results microservices as docker container. This is based on Docker Toolbox to highlight the specific settings required on Toolbox.   Updated Link for access to this video:  Deploying Training & Result Microservices via Docker Containers for Anomaly Detection
View full tip
This blog is intended to help diagnose and fix the most common issues that may be encountered when working with ThingWatcher. It cannot be stressed strongly enough that you should be familiar with your data including the average time interval between data points, and the collection duration and certainty threshold you specified. Before you start troubleshooting ThingWatcher, check that result and training microservices is running. For testing result microservices open a web browser and paste result URL; http://<IP of microservices>:<Port of results microservices>/results/models (e.g., http://localhost:8096/results/models) For testing training microservices open a web browser and paste training URL; http://<IP of microservices>:<Port of training microservices>/training (e.g., http://localhost:8091/training) If you see either: {"values":[],"total":0,"next":null,"previous":null} or a list of training jobs in JSON format, this means the result and training microservice service is available. 1. Question. I haven't seen an anomaly but I believe that my 'property' is anomalous?         This can be caused by different reasons, here are the most common causes: The certainty is too high. If the certainty is too high ThingWatcher is conservative in its categorization of "true positives" and therefore may emit more "false negatives". Reducing the certainty will change this behavior but note that ThingWatcher may now categorize too many "false positives" as a result. In other words, ThingWatcher may detect the desired anomalies but also some non-anomalies. The 'property' is anomalous during training data collection. If ThingWatcher creates a predictive model from anomalous data, it may not be able to detect the desired anomalies during MONITORING because the data does not really appear to be anomalous. So ThingWatcher treats this pattern as 'normal'. Therefore, ensure that 'property' values are also non-anomalous during training. There are long time gaps during the monitoring state so ThingWatcher stays in Buffering and categorizes these data points as non-anomalous. 2. Question. ThingWatcher detects an anomaly but my 'property' is non-anomalous? The certainty might be too low. In this case, ThingWatcher reports anomalies when the incoming data pattern looks even slightly different from the expected data pattern. ThingWatcher might need more training data. If the 'property' data has a pattern that occurs over a long time span, ThingWatcher needs to collect multiple cycles of all these patterns in order to detect a true anomaly without emitting too many false positives. 3. Question. ThingWatcher is in FAILED State, why?     There are many possible reasons for a failed state, here are the most likely problems that can cause a failed state. ThingWatcher emits a FAILED ThingWatcher State because the training service has not been setup or is down. similarly, the result service is not available. NotemessageText=Unexpected exception. {Throwable=[ConnectException: Operation timed out}]]messageText=Unexpected exception. {Throwable=[ConnectException: Connection refused}]]. Note that ThingWatcher is still able to collect all training data and you will only begin to see these failed states after ThingWatcher tried to post the training request. ThingWatcher emits a FAILED ThingWatcher State because time gaps prevent the data collection for training.You will see this warning in the log messages : "A long time gap was detected in the data that is greater than the threshold of {n}". This means you have a long gap in the training data and ThingWatcher will recollect the data. If there are more than 3 recollections due to a long time gap, ThingWatcher transitions to a failed state and will not be able to recover. In this case you can either instruct ThingWatcher to retrain and try again or check the data source to make sure it does not have long gaps. 4. Question. Why does ThingWatcher remain in Buffering? There are many possible reasons for ThingWatcher to remain in Buffering, but the most likely issue is time gaps which cause ThingWatcher to remain stuck in Buffering. If the incoming data regularly contains long time gaps, you will notice that ThingWatcher keeps alternating between the monitoring and buffering states. You may need to provide better quality data i.e. more evenly spaced data. Source: Alex Meng, Specialist Software Engineer
View full tip
Learning is a wide domain and thus, the field of machine learning has branched into several sub fields dealing with different types of learning tasks. Supervised vs Unsupervised Learning Since learning involves an interaction between the learner and the environment, we can divide tasks according to the nature of the interaction. Consider the task of detecting the hand written numbers among the lots of digital numbers vs task of anomaly detection. For the hand-written numbers detection task, we consider a setting in which the learner receives a set of training numbers, labeled as ‘Hand-Written’ and ‘Digital’. On the basis of such training, the learner should figure out a rule for labeling a newly arriving number. In contrast, for the task of anomaly detection, all the learner gets, as a training is a large set of numbers, without any labels on it, and the learner’s task is to detect ‘unusual’ numbers. Consider learning as a process of ‘using the experience to gain expertise’. Supervised learning describes a scenario in which the ‘experience’, a training example, consist significant information (labels Hand-Written or Digital) that is missing in the future arriving numbers, to which the learned experience is to be applied. Here, the acquired expertise is aimed at predicting the missing information for the validation data. However, in unsupervised learning, there is no difference between the training data and validation data. The learner processes input data with the goal of coming up with some summary or compressed version of the data. Clustering a data set into subsets of similar objects is a good example of such task. There is also an intermediate learning setting in which, the training set contains more information than the validation set. The learner here is required to predict even more information for the validation set. Consider a game of chess, where a value function describes each setting of a chess board. Now one may want to learn the degree why which White’s position is better than the black’s. The only information available to the learner at the training time is positions that occurred in an actual chess game and who won it. Such learning frameworks are investigated under reinforcement learning. Source: ‘Understanding machine learning: From theory to algorithms’ written by Shai Shalev-Shwartz and Shai Ben-David
View full tip
Many users of our software have submitted cases regarding the Third-Party Components and their functions within ThingWorx Analytics. This short blog post will provide the main components used by our software and explain their functionality.   ThingWorx Analytics uses the following components in its default installation:   Apache ZooKeeper ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services ThingWorx Analytics uses ZooKeeper as the gatekeeper to API calls and processes to the Application Component Homepage: https://zookeeper.apache.org/ Apache Tomcat ​Apache Tomcat software is an open source implementation of the Java Servlet, JavaServer Pages, Java Expression Language and Java WebSocket technologies ThingWorx Analytics uses Tomcat to handle webservices and API communications This enables the use of ThingWorx Foundation (Core) mashups with ThingWorx Analytics Server Component Homepage: http://tomcat.apache.org/ PostgreSQL Server PostgreSQL is an open source object-relational database system ThingWorx Analytics uses PostgresSQL server to store analytical results for later retrieval Component Homepage: https://www.postgresql.org/
View full tip
Neural Net is a learning algorithm that is inspired by how the human brain works. For example, imagine you love chocolate cake so much that, you joyfully exercise a bit more during the week just to enjoy that delicious chocolate cake without feeling guilty. But if the weather is terrible there is no way you go exercise, and then you can’t eat the delicious chocolate cake. Although, if your beautiful girlfriend / boyfriend exercise with you, then you ignore the weather and joyfully exercise and then you can enjoy that delicious chocolate cake without feeling guilty. The brain’s nervous system passes information using a synapse structure which allows neurons to pass information to other neurons and finally make a decision. This structure of passing information and decision making is the construction behind neural net algorithm. The data structure provides weights on the edges for the nodes/synapses in a directed graph. For example, our chocolate cake decision making could be translated into: w1=6 Whether or not your girlfriend or boyfriend exercise together with you w2=3 Enjoying delicious chocolate cake w3=2 Weather The high weights for example indicates the condition to have a high influence on your output decision making, while lower weight is not that influential. The illustration below is an example of neural net with 5 different input of information: The edges/arrows represent weights each input/node have a weight associated with it. These weights are applied when training neural net. Three of the inputs could represent 1= Delicious chocolate cake, 2= The weather, 3= Your girlfriend or boyfriend, and the last two inputs could be other information. The output is the condition of the decision determined by the hidden layer. ThingWorx Analytics Server applies neural net with full interconnection layer, which means each value from the input layer is duplicated and sent to each node in the hidden layer, just like in following illustration.
View full tip
Retraining the Model in ThingWorx Analytics When using ThingWorx Analytics Products to build Prediction Models, it is not enough to end up with models that are a Technical Success. The purpose is to ultimately have models that are a Business Success. What the user would want to achieve is to have Models that remain reliable and accurate in a potentially changing production environment. Therefore, when your environment changes, the model that you have used and relied on might no longer provide the same quality of results. Hence the need to retrain your model. Types of Models to be retrained: There are currently two types of models that are created with ThingWorx Analytics: Predictive models Anomaly Detection models Each of those models could require retraining based on the context in which they are created then used. When to retrain your Model: - Predictive models: For predictive analytics models, the main initiator for retraining would be a change in the production environment. resulting in the change of collected Dataset. This could nonetheless be caused by many factors: An overall change in the business objective: This could include a change in the granularity at which the Dataset is used. An Example, in a Company HR Dataset, could be moving from making predictions on a Department Level to making predictions on an Employee Level. The addition of either new features in the Dataset or even new values in the existing features which did not figure within the values of the training dataset. This type of change in the Dataset would require the retraining of the Model. The emergence of new trends in the marketplace: These new trends would appear in the generated Datasets. This could be detected by the degradation of the results that are provided by the existing Prediction Models. - Anomaly Detection models: In anomaly detection, the need to retrain the models originates mainly from a change in what is considered to be a Normal behavior of a certain monitored property. The could be caused by the following factors: A change in the context in which the Property values are measured then monitored: An example is monitoring the Traffic in a Street in the Working weekdays while excluding the weekends then adding the Weekend days to the monitored behavior. Here the change in the Traffic is normal however would be detected as an Anomaly unless the model is retrained. A change in the Thresholds of values accepted to be normal in a certain property. An example is the temperatures measured on a running device. The Device, previously,  never run at full power when the model was built but since it started running at full power the temperature increased beyond the usual threshold and thus the model needs to be retrained to include the new Normal temperatures. Another reason that could justify retraining the anomaly detection model is simply that when the model was trained the Property values that were used were not representing its normal state. For example, the temperature of an  Engine was being measured on a "Turned off" state when we are actually trying to build a model that would detect temperature anomalies on a running Engine. This might not be an exhaustive list of the reasons that would require either a Predictive or an Anomaly Detection Model to be retrained. As a general rule of thumb, if the model starts delivering results that are below expected or if the business context for the model is not valid, then it might be a wise decision to retrain the Analytics Model.
View full tip
Behavior of ThingWorx Analytics Anomaly Detection with Data Gaps In ThingWorx Analytics, Anomaly detection is performed through the ThingWatcher API framework. This is done by observing the Data from an Edge device, learning what the data stream should look like and then monitoring for any unexpected sequences of Data within the incoming Data stream. Ideally, for this process to work properly, there should be no Data Gaps. However, Data Gaps do occur, this blog describes how ThingWatcher deals with them in order to achieve high performance in anomaly detection. Data Gaps and phases affected: In anomaly detection, ThingWatcher goes through three consecutive phases which are Initializing, Calibrating and then Monitoring. Both the Initializing and Monitoring phases involve either collecting or monitoring streamed Data, so these two phases are sensitive to Data streaming Gaps. The Calibrating phase involves the use of already collected data to create the Anomaly Detection Model. Thus this phase is not directly affected by Data gaps. Dealing with Long and Short Data Gaps: Initializing Phase: During this phase, Data is collected and as part of the collection process, the sampling rate is imputed. So when short data gaps occur these are interpolated so that there are no missing values. However long Gaps might also occur. A Data gap is considered to be long if there is more than three missing Data points which should amount to three times the sampling rate. Basically, if the Timestamp on a data point is greater than the previous timestamp by more than three times the sampling rate that is considered to be a long gap. If a long gap occurs, ThingWatcher would restart the Data Collection process since long Data gaps are not acceptable. The data recollection process could be initiated three times when there are long gaps before failing if the gaps persist. The Data source would then no longer be considered reliable Monitoring Phase: In this phase, the Data stream is monitored to detect any unexpected behavior. In that case, if a short time gap occurs between the previous and the current TimedValue data points, the lookback buffer would be cleared. ThingWatcher will re-enter the Buffering state and will remain in this state until the lookback window buffer is completely filled. For more information on The functionalities of ThingWatcher, Please refer to the ThingWatcher Deployment Guide https://support.ptc.com/WCMS/files/173109/en/ThingWatcher-Deployment-Guide-8.0.pdf However, if the gaps are long and exceed three times the sampling rate, data filling could no longer be a valid solution and Data collection restarts. It is important to note that these imputed values decrease the accuracy of the Anomaly Detection Model. Therefore data monitored by ThingWatcher should be incremented in regular intervals. In general, persistent data gaps should be avoided by ensuring that data is streamed such that the timestamps increase in regular increments and any gaps that exist are generally incidental and small.
View full tip
In this video we show: - how to deploy the microservices via jar files - how to setup ThingWorx to use those microservices for anomaly detection   Updated Link for access to this video:  ThingWorx Analytics: Deploying Training and Result Microservices via jar files for Anomaly Detection
View full tip
This video is the 1 st part of a series of 3 videos walking you through how to setup ThingWatcher for Anomaly Detection. In this first video you will learn the basics of how to create connectivity between KEPServer and ThingWorx Platform.   Updated Link for access to this video:  Anomaly Detection 8.0 - Part 1: Connecting KEPServer to ThingWorx: Part 1 of 3
View full tip
In this video we go through the steps to install ThingWorx Analytics Server 8.0   Updated Link for access to this video:  ThingWorx Analytics Server 8.0
View full tip
In this video we show a simple mashup and services in order to display the ThingPredictor's real time scoring results. This video applies to ThingWorx Analytics 7.4 to 8.1   Updated Link for access to this video:  Showing ThingWorx Analytics Manager's results (ThingPredictor) in a Mashup
View full tip
This is part 3 out of 3 videos on Getting Started with ThingWorx Analytics During this video you will learn:   Executing a “Signals” Job Getting the results of the “Signals” Job Executing a “Training Model” Job Getting the results of the “Training Model” Job   Updated Link for access to this video:  Getting Started with ThingWorx Analytics: Part 3 of 3
View full tip
This blog addresses a few points that are related to scoring with ThingWorx Analytics. It, particularly, brings a clearer understanding of the concepts behind the values of the scores that are generated when performing a scoring job.   Scoring Outputs:   It is important to note that when training an analytics model, the method is to create a generalizable model from a relatively small training dataset.   By its nature, we expect the training process to see a limited subset and not an exhaustive list of all possible values for many constraints, especially for time and practicality.   As such, these generalized models will be expected to handle unseen data in the form of new combinations or values outside of previously observed ranges (more on this below).   One common way to see scores that exceed the observed ranges in training, under the assumption that the goals are continuous, is to use prescriptive scoring.   Prescriptive scoring attempts to find optimal values for a lever, meaning tunable, features in order to maximize or minimize score values. See the prescriptive scoring documentation and functionality for more information.   min/max constraints: these are constraints that are placed upon the inputs for training and expected inputs for scoring.   •          For training: If theses ranges were provided as part of the upload process, then training will raise exceptions regarding invalid data. However, if the ranges are not provided - they will be inferred from the data and, as such, training will not see values outside of observed ranges.   •          For scoring: Validation of the ranges will only be performed on the inputs - not the outputs. It is very important to note that the handling of these "constraints" is dependent upon the data type.   For categorical (e.g. colors) and ordinal data (e.g. shirt sizes), the constraints are strict and data that was not observed in training will raise exceptions during scoring.   However, for continuous values (e.g. temperature ranges) these constraints are more informational in nature. For predictive scoring, our code will accept records with values outside of those ranges.   The rule of thumb is that values slightly outside these ranges are acceptable and that as the values stray farther from the ranges, the accuracy of the model degrades very quickly.   For prescriptive scoring, these constraints are used to determine the acceptable ranges of values to try when determining the optimal values. Values outside of these constraints will NOT be tried.
View full tip
A Feature - a piece of information that is potentially useful for prediction. Any attribute could be a feature, as long as it is useful to the model. Feature engineering – Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data. It’s a vaguely agreed space of tasks related to designing feature sets for Machine Learning applications. Components: First, understanding the properties of the task you’re trying to solve and how they might interact with the strengths and limitations of the model you are going to use. Second, experimental work were you test your expectations and find out what actually works and what doesn’t. Feature engineering as a technique, has three sub categories of techniques: Feature selection, Dimension reduction and Feature generation. Feature Selection: Sometimes called feature ranking or feature importance, this is the process of ranking the attributes by their value to predictive ability of a model. Algorithms such as decision trees automatically rank the attributes in the data set. The top few nodes in a decision tree are considered the most important features from a predictive stand point. As a part of a process, feature selection using entropy based methods like decision trees can be employed to filter out less valuable attributes before feeding the reduced dataset to another modeling algorithm. Regression type models usually employ methods such as forward selection or backward elimination to select the final set of attributes for a model. For example: Project Development decision-tree:                                                  Dimension Reduction: This is sometimes called feature extraction. The most classic example of dimension reduction is principle component analysis or PCA. PCA allows us to combine existing attributes into a new data frame consisting of a much reduced number of attributes by utilizing the variance in the data. The attributes which "explain" the highest amount of variance in the data form the first few principal components and we can ignore the rest of the attributes if data dimensionality is a problem from a computational standpoint. Feature Generation or Feature Construction: Quite simply, this is the process of manually constructing new attributes from raw data. It involves intelligently combining or splitting existing raw attributes into new one which have a higher predictive power. For example a date stamp may be used to generate 2 new attributes such as AM and PM which may be useful in discriminating whether day or night has a higher propensity to influence the response variable. Feature construction is essentially a data transformation process. Tips for Better Feature Engineering Tip 1: Think about inputs you can create by rolling up existing data fields to a higher/broader level or category. As an example, a person’s title can be categorized into strategic or tactical. Those with titles of “VP” and above can be coded as strategic. Those with titles “Director” and below become tactical. Strategic contacts are those that make high-level budgeting and strategic decisions for a company. Tactical are those in the trenches doing day-to-day work.  Other roll-up examples include: Collating several industries into a higher-level industry: Collate oil and gas companies with utility companies, for instance, and call it the energy industry, or fold high tech and telecommunications industries into a single area called “technology.” Defining “large” companies as those that make $1 billion or more and “small” companies as those that make less than $1 billion.   Tip 2: Think about ways to drill down into more detail in a single field. As an example, a contact within a company may respond to marketing campaigns, and you may have information about his or her number of responses. Drilling down, we can ask how many of these responses occurred in the past two weeks, one to three months, or more than six months in the past. This creates three additional binary (yes=1/no=0) data fields for a model. Other drill-down examples include: Cadence: Number of days between consecutive marketing responses by a contact: 1–7, 8–14, 15–21, 21+ Multiple responses on same day flag (multiple responses = 1, otherwise =0) Tip 3: Split data into separate categories also called bins. For example, annual revenue for companies in your database may range from $50 million (M) to over $1 billion (B). Split the revenue into sequential bins: $50–$200M, $201–$500M, $501M–$1B, and $1B+. Whenever a company falls with the revenue bin it receives a one; otherwise the value is zero. There are now four new data fields created from the annual revenue field. Other examples are: Number of marketing responses by contact: 1–5, 6–10, 10+ Number of employees in company: 1–100, 101–500, 502–1,000, 1,001–5,000, 5,000+ Tip 4: Think about ways to combine existing data fields into new ones. As an example, you may want to create a flag (0/1) that identifies whether someone is a VP or higher and has more than 10 years of experience. Other examples of combining fields include: Title of director or below and in a company with less than 500 employees Public company and located in the Midwestern United States You can even multiply, divide, add, or subtract one data field by another to create a new input. Tip 5: Don’t reinvent the wheel – use variables that others have already fashioned. Tip 6: Think about the problem at hand and be creative. Don’t worry about creating too many variables at first, just let the brainstorming flow.
View full tip
This video is the 2nd part, of a series of two videos, walking you through the configuration of Analysis Event which is applied for Real-Time Scoring. This part 2 video will walk you through the configuration of Analysis Event for Real-Time Scoring, and validate that a predictions job has been executed based on new input data.   Updated Link for access to this video:  Analytics Manger 7.4: Configure Analysis Event & Real Time Scoring Part 2 of 2
View full tip
This video walks you through the use of Analysis Replay to execute analysis events on historic data. This video applies to ThingWorx Analytics 7.4 to 8.1   Updated Link for access to this video:  Analytics Manager : Using Analysis Replay
View full tip
This video is the 1st part of a series of two videos walking you through the configuration of Analysis Event which is applied for Real-Time Scoring. This 1st video demonstrate how to create a Template and Thing which allows the prediction model to score in real-time. Note that this video is just for demo purposes, customers who have ThingWorx, they of course already have their properties set-up. They just need to configure Analysis Event which is demonstrated in the part 2 video.   Updated Link for access to this video:  Analytics Manger 7.4: Create a Template & Thing for Real-Time Scoring Part 1 of 2
View full tip
This video will walk you through the first steps of how to set-up Analytics Manager for Real-Time Scoring. More specifically this video demonstrates how to create an Analysis Provider and start ThingPredictor Agent. NOTE: For version 8.1 the startup command for the Agent has changed view the command in PTC Help center.   Updated Link for access to this video:  ThingWorx Analytics Manager: Create an Analysis Provider & Start the ThingPredictor Agent                                  
View full tip
In this part of the Troubleshooting blog series, we will review the process on how to restart individual services essential to the ThingWorx Analytics Application within the Virtual Machine Appliance.   Services have stopped, and I cannot run my Analytics jobs! In some cases, we have had users encounter issues where a system or process has halted and they are unable to proceed with their tasks. This can be due to a myriad of reasons, ranging from OS hanging issues to memory issues with certain components.   As we covered earlier in Part II, the ThingWorx Analytics Application is installed in a CentOS (Linux) Operating System. As with most Linux Operating Systems, you have the ability to manually check and restart processes as needed.   Steps to Restart Services   With how the Application is installed and configured, the services for functionality should auto start when you boot up the VM. You will have to verify that the Appliance is functional by running your desired API call.   If a system is not functioning as expected, you will receive an error in your output when you POST an API call. Some errors are very specific and you can search the Knowledge Database for any existing Knowledge Articles that may solve the issue.   For error messages that do not have an existing article, you may want to attempted the following   Method 1:   If you are encountering issues, and are unsure what process is not working correctly, we would recommend a full Application restart. This involves restarting the Virtual Machine Appliance via the command line terminal.   We would recommend that you use the following command, as root user or using SUDO, as this is known as a “Graceful restart” ​sudo reboot -h now   This will restart the virtual machine safely, and once you are back up and running you can run your API calls to verify functionality. This should resolve any incremental issues you may have faced.   Method 2:   If you want to restart an individual service, there is a particular start order that needs to be followed to make sure the Application is operating as expected.   The first step is check what services are not running, you can use the following command to check what is running and its current status: service –status-all   The services you are looking for are the following: Zookeeper PostgreSQL Server GridWorker(s) Tomcat   If a particular service is not on the running list, you will have to manually start them by using the service start command. service [name of service] start e.g. service tomcat start You may be prompted for the root password   You can verify that the services are operating by running the status check command as described above.   If you need to restart all services, we have a specific start order, this is important to do as there are some dependencies such as Postgres for the GridWorker(s) and Tomcat to use.   The start order is as follows: Zookeeper PostgreSQL Server GridWorker(s) Tomcat   After completing the restart, and verifying that the services are running, run your desired API call.
View full tip
  PLEASE NOTE DataConnect has now been deprecated and no longer in use and supported.   We are regularly asked in the community how to send data from ThingWorx platform to ThingWorx Analytics in order to perform some analytics computation. There are different ways to achieve this and it will depend on the business needs. If the analytics need is about anomaly detection, the best way forward is to use ThingWatcher without the use of ThingWorx Analytics server. The ThingWatcher Help Center is an excellent place to start, and a quick start up program can be found in this blog. If the requirement is to perform a full blown analytics computation, then sending data to ThingWorx Analytics is required. This can be achieved by Using ThingWorx DataConnect, and this is what this blog will cover Using custom implementation. I will be very pleased to get your feedback on your experience in implementing custom solution as this could give some good ideas to others too. In this blog we will use the example of a smart Tractor in ThingWorx where we collect data points on: Speed Distance covered since last tyre change Tyre pressure Amount of gum left on the tyre Tyre size. From an Analytics perspective the gum left on the tyre would be the goal we want to analyse in order to know when the tyre needs changing.   We are going to cover the following: Background Workflow DataConnect configuration ThingWorx Configuration Data Analysis Definition Configuration Data Analysis Definition Execution Demo files   Background For people not familiar with ThingWorx Analytics, it is important to know that ThingWorx Analytics only accepts a single datafile in a .csv format. Each columns of the .csv file represents a feature that may have an impact on the goal to analyse. For example, in the case of the tyre wear, the distance covered, the speed, the pressure and tyre size will be our features. The goal is also included as a column in this .csv file. So any solution sending data to ThingWorx Analytics will need to prepare such a .csv file. DataConnect will perform this activity, in addition to some transformation too.   Workflow   Decide on the properties of the Thing to be collected, that are relevant to the analysis. Create service(s) that collect those properties. Define a Data Analysis Definition (DAD) object in ThingWorx. The DAD uses a Data Shape to define each feature that is to be collected and sends them to ThingWorx Analytics. Part of the collection process requires the use of the services created in point 2. Upon execution, the DAD will create one skinny csv file per feature and send those skinny .csv files to DataConnect. In the case of our example the DAD will create a speed.csv, distance.csv, pressure.csv, gumleft.csv, tyresize.csv and id.csv. DataConnect processes those skinny csv files to create a final single .csv file that contains all these features. During the processing, DataConnect will perform some transformation and synchronisation of the different skinny .csv files. The resulting dataset csv file is sent to ThingWorx Analytics Server where it can then be used as any other dataset file.     DataConnect configuration   As seen in this workflow a ThingWorx server, DataConnect server and a ThingWorx Analytics server will need to be installed and configured. Thankfully, the installation of DataConnect is rather simple and well described in the ThingWorx DataConnect User’s guide. Below I have provided a sample of a working dataconnect.conf file for reference, as this is one place where syntax can cause a problem:   ThingWorx Configuration The platform Subsystem needs to be configured to point to the DataConnect server . This is done under SYSTEM > Subsystems > PlatformSubsystem:     DAD Configuration The most critical part of the process is to properly configure the DAD as this is what will dictate the format and values filled in the skinny csv files for the specific features. The first step is to create a data shape with as many fields as feature(s)/properties collected.   Note that one field must be defined as the primary key. This field is the one that uniquely identifies the Thing (more on this later). We can then create the DAD using this data shape as shown below   For each feature, a datasource needs to be defined to tell the DAD how to collect the values for the skinny csv files. This is where custom services are usually needed. Indeed, the Out Of The Box (OOTB) services, such as QueryNumberPropertyHistory, help to collect logged values but the id returned by those services is continuously incremented. This does not work for the DAD. The id returned by each services needs to be what uniquely identifies the Thing. It needs to be the same for all records for this Thing amongst the different skinny csv files. It is indeed this field that is then used by DataConnect to merge all the skinny files into one master dataset csv file. A custom service can make use of the OOTB services, however it will need to override the id value. For example the below service uses QueryNumberPropertyHistory to retrieve the logged values and timestamp but then overrides the id with the Thing’s name.     The returned values of the service then needs to be mapped in the DAD to indicate which output corresponds to the actual property’s value, the Thing id and the timestamp (if needed). This is done through the Edit Datasource window (by clicking on Add Datasource link or the Datasource itself if already defined in the Define Feature window).   On the left hand side, we define the origin of the datasource. Here we have selected the service GetHistory from the Thing Template smartTractor. On the right hand side, we define the mapping between the service’s output and the skinny .csv file columns. Circled in grey are the output from the service. Circled in green are what define the columns in the .csv file. A skinny csv file will have 1 to 3 columns, as follow: One column for the ID. Simply tick the radio button corresponding to the service output that represents the ID One column representing the value of the Thing property. This is indicated by selecting the link icon on the left hand side in front of the returned data which represent the value (in our example the output data from the service is named value) One column representing the Timestamp. This is only needed when a property is time dependant (for example, time series dataset). On the example the property is Distance, the distance covered by the tyre does depend on the time, however we would not have a timestamp for the TyreSize property as the wheel size will remain the same. How many columns should we have (and therefore how many output should our service has)? The .csv file representing the ID will have one column, and therefore the service collecting the ID returns only one output (Thing name in our smartTractor example – not shown here but is available in the download) Properties that are not time bound will have a csv file with 2 columns, one for the ID and one for the value of the property. Properties that are time bound will have 3 columns: 1 for the ID, one for the value and one for the timestamp. Therefore the service will have 3 outputs.   Additionally the input for the service may need to be configured, by clicking on the icon.   Once the datasources are configured, you should configure the Time Sampling Interval in the General Information tab. This sampling interval will be used by DataConnect to synchronize all the skinny csv files. See the Help Center for a good explanation on this.   DAD Execution Once the above configuration is done, the DAD can be executed to collect properties’ value already logged on the ThingWorx platform. Select Execution Settings in the DAD and enter the time range for property collection:   A dataset with the same name as the DAD is then created in DataConnect as well as in ThingWorx Analytics Server Dataconnect:     ThingWorx Analytics:   The dataset can then be processed as any other dataset inside ThingWorx Analytics.   Demo files   For convenience I have also attached a ThingWorx entities export that can be imported into a ThingWorx platform for you to take a closer look at the setup discussed in this blog. Attached is also a small simulator to populate the properties of the Tractor_1 Thing. The usage is : java -jar TWXTyreSimulatorClient.jar  hostname_or_IP port AppKey For example: java -jar TWXTyreSimulatorClient.jar 192.168.56.106 8080 d82510b7-5538-449c-af13-0bb60e01a129   Again feel free to share your experience in the comments below as they will be very interesting for all of us. Thank you
View full tip
Announcements