This blog is intended to help diagnose and fix the most common issues that may be encountered when working with ThingWatcher. It cannot be stressed strongly enough that you should be familiar with your data including the average time interval between data points, and the collection duration and certainty threshold you specified.
Before you start troubleshooting ThingWatcher, check that result and training microservices is running. For testing result microservices open a web browser and paste result URL;
http://<IP of microservices>:<Port of results microservices>/results/models
(e.g., http://localhost:8096/results/models)
For testing training microservices open a web browser and paste training URL;
http://<IP of microservices>:<Port of training microservices>/training
(e.g., http://localhost:8091/training)
If you see either: {"values":[],"total":0,"next":null,"previous":null} or a list of training jobs in JSON format, this means the result and training microservice service is available.
1. Question. I haven't seen an anomaly but I believe that my 'property' is anomalous?
This can be caused by different reasons, here are the most common causes:
- The certainty is too high. If the certainty is too high ThingWatcher is conservative in its categorization of "true positives" and therefore may emit more "false negatives". Reducing the certainty will change this behavior but note that ThingWatcher may now categorize too many "false positives" as a result. In other words, ThingWatcher may detect the desired anomalies but also some non-anomalies.
- The 'property' is anomalous during training data collection. If ThingWatcher creates a predictive model from anomalous data, it may not be able to detect the desired anomalies during MONITORING because the data does not really appear to be anomalous. So ThingWatcher treats this pattern as 'normal'. Therefore, ensure that 'property' values are also non-anomalous during training.
- There are long time gaps during the monitoring state so ThingWatcher stays in Buffering and categorizes these data points as non-anomalous.
2. Question. ThingWatcher detects an anomaly but my 'property' is non-anomalous?
- The certainty might be too low. In this case, ThingWatcher reports anomalies when the incoming data pattern looks even slightly different from the expected data pattern.
- ThingWatcher might need more training data. If the 'property' data has a pattern that occurs over a long time span, ThingWatcher needs to collect multiple cycles of all these patterns in order to detect a true anomaly without emitting too many false positives.
3. Question. ThingWatcher is in FAILED State, why?
There are many possible reasons for a failed state, here are the most likely problems that can cause a failed state.
- ThingWatcher emits a FAILED ThingWatcher State because the training service has not been setup or is down. similarly, the result service is not available. NotemessageText=Unexpected exception. {Throwable=[ConnectException: Operation timed out}]]messageText=Unexpected exception. {Throwable=[ConnectException: Connection refused}]]. Note that ThingWatcher is still able to collect all training data and you will only begin to see these failed states after ThingWatcher tried to post the training request.
- ThingWatcher emits a FAILED ThingWatcher State because time gaps prevent the data collection for training.You will see this warning in the log messages : "A long time gap was detected in the data that is greater than the threshold of {n}". This means you have a long gap in the training data and ThingWatcher will recollect the data. If there are more than 3 recollections due to a long time gap, ThingWatcher transitions to a failed state and will not be able to recover. In this case you can either instruct ThingWatcher to retrain and try again or check the data source to make sure it does not have long gaps.
4. Question. Why does ThingWatcher remain in Buffering?
- There are many possible reasons for ThingWatcher to remain in Buffering, but the most likely issue is time gaps which cause ThingWatcher to remain stuck in Buffering. If the incoming data regularly contains long time gaps, you will notice that ThingWatcher keeps alternating between the monitoring and buffering states. You may need to provide better quality data i.e. more evenly spaced data.
Source: Alex Meng, Specialist Software Engineer