Hi
Does anyone have any idea how to resolve the following error in Anomaly training:
Returning a FAILED state for TimedValue [timestamp=1551249937059, value=29.32]. ThingWatcherMessage [timestamp=2019-02-27T08:45:34.678, severity=ERROR, state=ThingWatcherInternalState [internal=OBTAINING_MODEL, external=TRAINING], messageCode=WAT1001E, messageText=Operation exception. {Throwable=[Trainer.TrainingJobErrorException: Training job with id null entered into an incomplete state [UNKNOWN] unable to process models. Error message: [Job was queued but is no longer being considered.]}]]
com.thingworx.analytics.thingwatcher.exceptions.ThingWatcherOperationException: Error accessing PMML Model URI, cannot download model
    at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.isModelAvailableAfterTraining(ThingWatcherInternal.java:699)
    at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.runStateMachineFor(ThingWatcherInternal.java:460)
    at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.bareMonitor(ThingWatcherInternal.java:352)
    at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.monitor(ThingWatcherInternal.java:279)
    at com.thingworx.analytics.thingwatcher.ThingWatcher.monitor(ThingWatcher.java:53)
    at com.thingworx.system.subsystems.alerts.anomalyalert.AnomalyMonitor.monitor(AnomalyMonitor.java:258)
    at com.thingworx.system.subsystems.alerts.data.Alert.doAnomalyEvaluation(Alert.java:570)
    at com.thingworx.system.subsystems.alerts.handlers.NumberAlert.evaluateAlarm(NumberAlert.java:138)
    at com.thingworx.system.subsystems.alerts.data.AlertList.checkAlerts(AlertList.java:92)
    at com.thingworx.things.Thing.rawSetPropertyVTQ(Thing.java:4241)
    at com.thingworx.things.Thing.rawSetPropertyVTQ(Thing.java:4063)
    at com.thingworx.things.Thing.handleSetPropertyWithoutUpdate(Thing.java:4307)
    at com.thingworx.things.Thing.internalSetPropertyVTQ(Thing.java:4298)
    at com.thingworx.things.Thing.setPropertyVTQ(Thing.java:4432)
    at com.thingworx.things.Thing.forcePropertiesVTQ(Thing.java:4322)
    at com.thingworx.things.Thing.UpdatePropertyValues(Thing.java:5573)
    at com.thingworx.system.subsystems.federation.FederationSubsystem.ProcessRemotePropertyUpdates(FederationSubsystem.java:656)
    at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.thingworx.common.processors.ReflectionProcessor.processService(ReflectionProcessor.java:261)
    at com.thingworx.handlers.ReflectionServiceHandler.processService(ReflectionServiceHandler.java:50)
    at com.thingworx.handlers.ServiceHandlerBase.processServiceWithMetrics(ServiceHandlerBase.java:48)
    at com.thingworx.entities.helpers.InternalServiceHelper.processServiceRequestDirect(InternalServiceHelper.java:117)
    at com.thingworx.entities.helpers.InternalServiceHelper.processAPIServiceRequest(InternalServiceHelper.java:88)
    at com.thingworx.entities.ServiceProviderEntity.processAPIServiceRequest(ServiceProviderEntity.java:66)
    at com.thingworx.webservices.processors.APIProcessor.executeService(APIProcessor.java:331)
    at com.thingworx.webservices.processors.APIProcessor.dispatchRequest(APIProcessor.java:105)
    at com.thingworx.system.subsystems.wsexecution.processor.WSExecutionInstance.run(WSExecutionInstance.java:49)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.thingworx.analytics.thingwatcher.Trainer$TrainingJobErrorException: Training job with id null entered into an incomplete state [UNKNOWN] unable to process models. Error message: [Job was queued but is no longer being considered.]
    at com.thingworx.analytics.thingwatcher.Trainer.lambda$getPMMLModelResultId$1(Trainer.java:132)
    at java.util.Optional.orElseThrow(Optional.java:290)
    at com.thingworx.analytics.thingwatcher.Trainer.getPMMLModelResultId(Trainer.java:132)
    at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.isModelAvailableAfterTraining(ThingWatcherInternal.java:680)
    ... 31 more
Analytics 8.4 is installed on a separate Ubuntu server.
When accessing:
http://<IPAddress>:9400/training,
I get the following response:
{"values":[{"status":{"state":"UNKNOWN","message":"Job was queued but is no longer being considered.","startDateTime":null,"endDateTime":null,"runTime":null,"queuedStartDateTime":null,"queuedDuration":null,"id":"a4a6b9a1-65a0-423e-b1f1-5a32fb9c73bd","results":{"resultUri":null,"validationUri":null}},"config":{"goalName":"Goal","dataSource":{"dataRef":{"uri":"body:///","format":"csv","hasHeader":true},"schemaRef":{"uri":"body:///","format":"json","metadata":[{"fieldName":"EntityId","values":[],"range":null,"dataType":"STRING","opType":"ENTITY_ID","timeSamplingInterval":null,"isStatic":true},{"fieldName":"Time","values":[],"range":null,"dataType":"LONG","opType":"TEMPORAL","timeSamplingInterval":1078,"isStatic":true},{"fieldName":"Goal","values":[],"range":null,"dataType":"DOUBLE","opType":"CONTINUOUS","timeSamplingInterval":null,"isStatic":false}]},"filter":null,"exclusions":[]},"learners":[{"learningTechnique":"NEURAL_NET","args":{"maxDepth":null,"maxAllowedFields":null,"useRedundancyFilter":false,"removeDuplicatesAndUniformColumns":null,"numberOfIterations":null,"layerCount":null,"hiddenUnitPercentage":null,"treeCount":null}}],"modelName":"AnomalyDetectionModelFor_ce7c75c6-12e0-4bb0-8148-16a2b18c0f19_1551249921892","validationHoldoutPercentage":0.0,"ensembleTechnique":"SOLOIST","comparisonMetric":"PEARSONS","description":"ThingWatcher Training Statistics: {trainingDataCollectionStartTime=1551249295904, trainingDataCollectionEndTime=1551249742196}","timeSeries":{"lookbackSize":0,"lookahead":1,"useGoalHistory":true},"samplingParams":null,"anomalyDetectionParams":{"numberOfDataPointsForTraining":415,"numberOfDataPointsPerCycle":53},"maxAllowedFields":25,"useRedundancyFilter":false,"tags":[]}},{"status":{"state":"UNKNOWN","message":"Job was queued but is no longer being 
considered.","startDateTime":null,"endDateTime":null,"runTime":null,"queuedStartDateTime":null,"queuedDuration":null,"id":"1326152f-ce0f-4084-ab27-e1507fad1295","results":{"resultUri":null,"validationUri":null}},"config":{"goalName":"Goal","dataSource":{"dataRef":{"uri":"body:///","format":"csv","hasHeader":true},"schemaRef":{"uri":"body:///","format":"json","metadata":[{"fieldName":"EntityId","values":[],"range":null,"dataType":"STRING","opType":"ENTITY_ID","timeSamplingInterval":null,"isStatic":true},{"fieldName":"Time","values":[],"range":null,"dataType":"LONG","opType":"TEMPORAL","timeSamplingInterval":1063,"isStatic":true},{"fieldName":"Goal","values":[],"range":null,"dataType":"DOUBLE","opType":"CONTINUOUS","timeSamplingInterval":null,"isStatic":false}]},"filter":null,"exclusions":[]},"learners":[{"learningTechnique":"NEURAL_NET","args":{"maxDepth":null,"maxAllowedFields":null,"useRedundancyFilter":false,"removeDuplicatesAndUniformColumns":null,"numberOfIterations":null,"layerCount":null,"hiddenUnitPercentage":null,"treeCount":null}}],"modelName":"AnomalyDetectionModelFor_2347889a-7921-4f25-9d3a-603f7e60ca37_1551248819082","validationHoldoutPercentage":0.0,"ensembleTechnique":"SOLOIST","comparisonMetric":"PEARSONS","description":"ThingWatcher Training Statistics: {trainingDataCollectionStartTime=1551248195721, trainingDataCollectionEndTime=1551248678323}","timeSeries":{"lookbackSize":0,"lookahead":1,"useGoalHistory":true},"samplingParams":null,"anomalyDetectionParams":{"numberOfDataPointsForTraining":455,"numberOfDataPointsPerCycle":42},"maxAllowedFields":25,"useRedundancyFilter":false,"tags":[]}},{"status":{"state":"UNKNOWN","message":"Job was queued but is no longer being 
considered.","startDateTime":null,"endDateTime":null,"runTime":null,"queuedStartDateTime":null,"queuedDuration":null,"id":"bcae59ac-97fc-439e-a930-468d74dc133b","results":{"resultUri":null,"validationUri":null}},"config":{"goalName":"Goal","dataSource":{"dataRef":{"uri":"body:///","format":"csv","hasHeader":true},"schemaRef":{"uri":"body:///","format":"json","metadata":[{"fieldName":"EntityId","values":[],"range":null,"dataType":"STRING","opType":"ENTITY_ID","timeSamplingInterval":null,"isStatic":true},{"fieldName":"Time","values":[],"range":null,"dataType":"LONG","opType":"TEMPORAL","timeSamplingInterval":1063,"isStatic":true},{"fieldName":"Goal","values":[],"range":null,"dataType":"DOUBLE","opType":"CONTINUOUS","timeSamplingInterval":null,"isStatic":false}]},"filter":null,"exclusions":[]},"learners":[{"learningTechnique":"NEURAL_NET","args":{"maxDepth":null,"maxAllowedFields":null,"useRedundancyFilter":false,"removeDuplicatesAndUniformColumns":null,"numberOfIterations":null,"layerCount":null,"hiddenUnitPercentage":null,"treeCount":null}}],"modelName":"AnomalyDetectionModelFor_1e70e3c0-790d-4610-b6f0-dab86b85c6d2_1551190010924","validationHoldoutPercentage":0.0,"ensembleTechnique":"SOLOIST","comparisonMetric":"PEARSONS","description":"ThingWatcher Training Statistics: {trainingDataCollectionStartTime=1551189383120, trainingDataCollectionEndTime=1551189814698}","timeSeries":{"lookbackSize":0,"lookahead":1,"useGoalHistory":true},"samplingParams":null,"anomalyDetectionParams":{"numberOfDataPointsForTraining":407,"numberOfDataPointsPerCycle":56},"maxAllowedFields":25,"useRedundancyFilter":false,"tags":[]}},{"status":{"state":"FAILED","message":"java.io.FileNotFoundException: /tmp/blockmgr-977a7b0a-6b4f-49e1-96da-63871bb028d6/13/shuffle_0_21_0.data.aee1c6cb-f8d4-48c5-8774-3ba9b9b56b09 (No space left on 
device)","startDateTime":"2019-02-26T12:56:24.761Z","endDateTime":"2019-02-26T12:56:31.075Z","runTime":"0:00:06.314","queuedStartDateTime":"2019-02-26T12:50:10.491Z","queuedDuration":"0:06:14.270","id":"b0541a28-95d1-4bb9-a3a4-f59e3b240838","results":{"resultUri":null,"validationUri":null}},"config":{"goalName":"Goal","dataSource":{"dataRef":{"uri":"body:///","format":"csv","hasHeader":true},"schemaRef":{"uri":"body:///","format":"json","metadata":[{"fieldName":"EntityId","values":[],"range":null,"dataType":"STRING","opType":"ENTITY_ID","timeSamplingInterval":null,"isStatic":true},{"fieldName":"Time","values":[],"range":null,"dataType":"LONG","opType":"TEMPORAL","timeSamplingInterval":1061,"isStatic":true},{"fieldName":"Goal","values":[],"range":null,"dataType":"DOUBLE","opType":"CONTINUOUS","timeSamplingInterval":null,"isStatic":false}]},"filter":null,"exclusions":[]},"learners":[{"learningTechnique":"NEURAL_NET","args":{"maxDepth":null,"maxAllowedFields":null,"useRedundancyFilter":false,"removeDuplicatesAndUniformColumns":null,"numberOfIterations":null,"layerCount":null,"hiddenUnitPercentage":null,"treeCount":null}}],"modelName":"AnomalyDetectionModelFor_f284a5cb-f266-4089-85e5-4c5fc7235b97_1551185390979","validationHoldoutPercentage":0.0,"ensembleTechnique":"SOLOIST","comparisonMetric":"PEARSONS","description":"ThingWatcher Training Statistics: {trainingDataCollectionStartTime=1551184771332, trainingDataCollectionEndTime=1551185239233}","timeSeries":{"lookbackSize":0,"lookahead":1,"useGoalHistory":true},"samplingParams":null,"anomalyDetectionParams":{"numberOfDataPointsForTraining":442,"numberOfDataPointsPerCycle":68},"maxAllowedFields":25,"useRedundancyFilter":false,"tags":[]}}],"total":4,"next":null,"previous":null}
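A response this long is hard to scan by eye, so here is a minimal Python sketch that lists each job's id, state, and message. It assumes only the JSON shape visible in the response above (a top-level "values" list whose entries carry a "status" object); the function names and the shortened sample payload are illustrative, not part of any ThingWorx tooling.

```python
import json


def summarize_jobs(response_text):
    """Return (id, state, message) for every job in a /training response body."""
    payload = json.loads(response_text)
    return [
        (job["status"]["id"], job["status"]["state"], job["status"]["message"])
        for job in payload.get("values", [])
    ]


def failed_jobs(response_text):
    """Keep only the jobs whose state is FAILED."""
    return [row for row in summarize_jobs(response_text) if row[1] == "FAILED"]


# Shortened sample with two of the jobs from the response above.
# The FAILED message is abbreviated here for readability.
SAMPLE_RESPONSE = json.dumps({
    "values": [
        {"status": {
            "id": "a4a6b9a1-65a0-423e-b1f1-5a32fb9c73bd",
            "state": "UNKNOWN",
            "message": "Job was queued but is no longer being considered.",
        }},
        {"status": {
            "id": "b0541a28-95d1-4bb9-a3a4-f59e3b240838",
            "state": "FAILED",
            "message": "java.io.FileNotFoundException: (No space left on device)",
        }},
    ],
    "total": 2,
})

for job_id, state, message in summarize_jobs(SAMPLE_RESPONSE):
    print(f"{job_id}  {state:<8}  {message}")
```

In practice the text passed to summarize_jobs would be the body returned by http://<IPAddress>:9400/training; the FAILED entries are usually the ones carrying the real cause.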
Regards
Brad
Hi @BradC
The training output shows one critical error for one of the attempts:
java.io.FileNotFoundException: /tmp/blockmgr-977a7b0a-6b4f-49e1-96da-63871bb028d6/13/shuffle_0_21_0.data.aee1c6cb-f8d4-48c5-8774-3ba9b9b56b09 (No space left on device)
Could you make sure you have enough disk space, especially on the /tmp partition of the Analytics server?
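A quick way to check this from the Analytics server is a small Python script like the sketch below. It only uses the standard library call shutil.disk_usage; the function name and the 1 GiB threshold are assumptions for illustration, not a documented requirement of ThingWorx Analytics.

```python
import shutil

GIB = 1024 ** 3


def free_space_report(path="/tmp", min_free_bytes=1 * GIB):
    """Report total and free space for the filesystem that holds `path`.

    min_free_bytes is an arbitrary illustrative threshold (1 GiB by default),
    not a value taken from the ThingWorx Analytics documentation.
    """
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "total_gib": usage.total / GIB,
        "free_gib": usage.free / GIB,
        "enough": usage.free >= min_free_bytes,
    }


if __name__ == "__main__":
    report = free_space_report("/tmp")
    print(
        f"{report['path']}: {report['free_gib']:.2f} GiB free "
        f"of {report['total_gib']:.2f} GiB, enough={report['enough']}"
    )
```

Running it before and during a training job shows whether the shuffle files written under /tmp are what fills the partition.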
If you still have the issue, could you please:
- try training a model directly in Analytics Builder, for example with the beanpro demo dataset, to confirm that this works.
If this works, then reproduce the error in the Anomaly alert and
- upload the directory <ThingWorxAnalyticsServer>/data/logs
- send the output of the QueryNumberPropertyHistory service for the property you are monitoring, covering the time span of the test.
Thank you
Kind regards
Christophe
Hi @cmorfin
Thanks for spotting the memory issue.
It appears that the Analytics instance was not configured with enough memory. After I allocated more memory, the anomaly model trained successfully.
Regards,
Brad