
Anomaly Training Failure Analytics 8.4

BradC
7-Bedrock


Hi 

 

Does anyone have any idea how to resolve the following error in Anomaly training:

 

Returning a FAILED state for TimedValue [timestamp=1551249937059, value=29.32]. ThingWatcherMessage [timestamp=2019-02-27T08:45:34.678, severity=ERROR, state=ThingWatcherInternalState [internal=OBTAINING_MODEL, external=TRAINING], messageCode=WAT1001E, messageText=Operation exception. {Throwable=[Trainer.TrainingJobErrorException: Training job with id null entered into an incomplete state [UNKNOWN] unable to process models. Error message: [Job was queued but is no longer being considered.]}]]_com.thingworx.analytics.thingwatcher.exceptions.ThingWatcherOperationException: Error accessing PMML Model URI, cannot download model_ at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.isModelAvailableAfterTraining(ThingWatcherInternal.java:699)_ at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.runStateMachineFor(ThingWatcherInternal.java:460)_ at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.bareMonitor(ThingWatcherInternal.java:352)_ at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.monitor(ThingWatcherInternal.java:279)_ at com.thingworx.analytics.thingwatcher.ThingWatcher.monitor(ThingWatcher.java:53)_ at com.thingworx.system.subsystems.alerts.anomalyalert.AnomalyMonitor.monitor(AnomalyMonitor.java:258)_ at com.thingworx.system.subsystems.alerts.data.Alert.doAnomalyEvaluation(Alert.java:570)_ at com.thingworx.system.subsystems.alerts.handlers.NumberAlert.evaluateAlarm(NumberAlert.java:138)_ at com.thingworx.system.subsystems.alerts.data.AlertList.checkAlerts(AlertList.java:92)_ at com.thingworx.things.Thing.rawSetPropertyVTQ(Thing.java:4241)_ at com.thingworx.things.Thing.rawSetPropertyVTQ(Thing.java:4063)_ at com.thingworx.things.Thing.handleSetPropertyWithoutUpdate(Thing.java:4307)_ at com.thingworx.things.Thing.internalSetPropertyVTQ(Thing.java:4298)_ at com.thingworx.things.Thing.setPropertyVTQ(Thing.java:4432)_ at com.thingworx.things.Thing.forcePropertiesVTQ(Thing.java:4322)_ at 
com.thingworx.things.Thing.UpdatePropertyValues(Thing.java:5573)_ at com.thingworx.system.subsystems.federation.FederationSubsystem.ProcessRemotePropertyUpdates(FederationSubsystem.java:656)_ at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)_ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)_ at java.lang.reflect.Method.invoke(Method.java:498)_ at com.thingworx.common.processors.ReflectionProcessor.processService(ReflectionProcessor.java:261)_ at com.thingworx.handlers.ReflectionServiceHandler.processService(ReflectionServiceHandler.java:50)_ at com.thingworx.handlers.ServiceHandlerBase.processServiceWithMetrics(ServiceHandlerBase.java:48)_ at com.thingworx.entities.helpers.InternalServiceHelper.processServiceRequestDirect(InternalServiceHelper.java:117)_ at com.thingworx.entities.helpers.InternalServiceHelper.processAPIServiceRequest(InternalServiceHelper.java:88)_ at com.thingworx.entities.ServiceProviderEntity.processAPIServiceRequest(ServiceProviderEntity.java:66)_ at com.thingworx.webservices.processors.APIProcessor.executeService(APIProcessor.java:331)_ at com.thingworx.webservices.processors.APIProcessor.dispatchRequest(APIProcessor.java:105)_ at com.thingworx.system.subsystems.wsexecution.processor.WSExecutionInstance.run(WSExecutionInstance.java:49)_ at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)_ at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)_ at java.lang.Thread.run(Thread.java:748)_Caused by: com.thingworx.analytics.thingwatcher.Trainer$TrainingJobErrorException: Training job with id null entered into an incomplete state [UNKNOWN] unable to process models. 
Error message: [Job was queued but is no longer being considered.]_ at com.thingworx.analytics.thingwatcher.Trainer.lambda$getPMMLModelResultId$1(Trainer.java:132)_ at java.util.Optional.orElseThrow(Optional.java:290)_ at com.thingworx.analytics.thingwatcher.Trainer.getPMMLModelResultId(Trainer.java:132)_ at com.thingworx.analytics.thingwatcher.ThingWatcherInternal.isModelAvailableAfterTraining(ThingWatcherInternal.java:680)_ ... 31 more_

Analytics 8.4 is installed on a separate Ubuntu server.

 

When accessing http://<IPAddress>:9400/training, I get the following response:

 

{"values":[{"status":{"state":"UNKNOWN","message":"Job was queued but is no longer being considered.","startDateTime":null,"endDateTime":null,"runTime":null,"queuedStartDateTime":null,"queuedDuration":null,"id":"a4a6b9a1-65a0-423e-b1f1-5a32fb9c73bd","results":{"resultUri":null,"validationUri":null}},"config":{"goalName":"Goal","dataSource":{"dataRef":{"uri":"body:///","format":"csv","hasHeader":true},"schemaRef":{"uri":"body:///","format":"json","metadata":[{"fieldName":"EntityId","values":[],"range":null,"dataType":"STRING","opType":"ENTITY_ID","timeSamplingInterval":null,"isStatic":true},{"fieldName":"Time","values":[],"range":null,"dataType":"LONG","opType":"TEMPORAL","timeSamplingInterval":1078,"isStatic":true},{"fieldName":"Goal","values":[],"range":null,"dataType":"DOUBLE","opType":"CONTINUOUS","timeSamplingInterval":null,"isStatic":false}]},"filter":null,"exclusions":[]},"learners":[{"learningTechnique":"NEURAL_NET","args":{"maxDepth":null,"maxAllowedFields":null,"useRedundancyFilter":false,"removeDuplicatesAndUniformColumns":null,"numberOfIterations":null,"layerCount":null,"hiddenUnitPercentage":null,"treeCount":null}}],"modelName":"AnomalyDetectionModelFor_ce7c75c6-12e0-4bb0-8148-16a2b18c0f19_1551249921892","validationHoldoutPercentage":0.0,"ensembleTechnique":"SOLOIST","comparisonMetric":"PEARSONS","description":"ThingWatcher Training Statistics: {trainingDataCollectionStartTime=1551249295904, trainingDataCollectionEndTime=1551249742196}","timeSeries":{"lookbackSize":0,"lookahead":1,"useGoalHistory":true},"samplingParams":null,"anomalyDetectionParams":{"numberOfDataPointsForTraining":415,"numberOfDataPointsPerCycle":53},"maxAllowedFields":25,"useRedundancyFilter":false,"tags":[]}},{"status":{"state":"UNKNOWN","message":"Job was queued but is no longer being 
considered.","startDateTime":null,"endDateTime":null,"runTime":null,"queuedStartDateTime":null,"queuedDuration":null,"id":"1326152f-ce0f-4084-ab27-e1507fad1295","results":{"resultUri":null,"validationUri":null}},"config":{"goalName":"Goal","dataSource":{"dataRef":{"uri":"body:///","format":"csv","hasHeader":true},"schemaRef":{"uri":"body:///","format":"json","metadata":[{"fieldName":"EntityId","values":[],"range":null,"dataType":"STRING","opType":"ENTITY_ID","timeSamplingInterval":null,"isStatic":true},{"fieldName":"Time","values":[],"range":null,"dataType":"LONG","opType":"TEMPORAL","timeSamplingInterval":1063,"isStatic":true},{"fieldName":"Goal","values":[],"range":null,"dataType":"DOUBLE","opType":"CONTINUOUS","timeSamplingInterval":null,"isStatic":false}]},"filter":null,"exclusions":[]},"learners":[{"learningTechnique":"NEURAL_NET","args":{"maxDepth":null,"maxAllowedFields":null,"useRedundancyFilter":false,"removeDuplicatesAndUniformColumns":null,"numberOfIterations":null,"layerCount":null,"hiddenUnitPercentage":null,"treeCount":null}}],"modelName":"AnomalyDetectionModelFor_2347889a-7921-4f25-9d3a-603f7e60ca37_1551248819082","validationHoldoutPercentage":0.0,"ensembleTechnique":"SOLOIST","comparisonMetric":"PEARSONS","description":"ThingWatcher Training Statistics: {trainingDataCollectionStartTime=1551248195721, trainingDataCollectionEndTime=1551248678323}","timeSeries":{"lookbackSize":0,"lookahead":1,"useGoalHistory":true},"samplingParams":null,"anomalyDetectionParams":{"numberOfDataPointsForTraining":455,"numberOfDataPointsPerCycle":42},"maxAllowedFields":25,"useRedundancyFilter":false,"tags":[]}},{"status":{"state":"UNKNOWN","message":"Job was queued but is no longer being 
considered.","startDateTime":null,"endDateTime":null,"runTime":null,"queuedStartDateTime":null,"queuedDuration":null,"id":"bcae59ac-97fc-439e-a930-468d74dc133b","results":{"resultUri":null,"validationUri":null}},"config":{"goalName":"Goal","dataSource":{"dataRef":{"uri":"body:///","format":"csv","hasHeader":true},"schemaRef":{"uri":"body:///","format":"json","metadata":[{"fieldName":"EntityId","values":[],"range":null,"dataType":"STRING","opType":"ENTITY_ID","timeSamplingInterval":null,"isStatic":true},{"fieldName":"Time","values":[],"range":null,"dataType":"LONG","opType":"TEMPORAL","timeSamplingInterval":1063,"isStatic":true},{"fieldName":"Goal","values":[],"range":null,"dataType":"DOUBLE","opType":"CONTINUOUS","timeSamplingInterval":null,"isStatic":false}]},"filter":null,"exclusions":[]},"learners":[{"learningTechnique":"NEURAL_NET","args":{"maxDepth":null,"maxAllowedFields":null,"useRedundancyFilter":false,"removeDuplicatesAndUniformColumns":null,"numberOfIterations":null,"layerCount":null,"hiddenUnitPercentage":null,"treeCount":null}}],"modelName":"AnomalyDetectionModelFor_1e70e3c0-790d-4610-b6f0-dab86b85c6d2_1551190010924","validationHoldoutPercentage":0.0,"ensembleTechnique":"SOLOIST","comparisonMetric":"PEARSONS","description":"ThingWatcher Training Statistics: {trainingDataCollectionStartTime=1551189383120, trainingDataCollectionEndTime=1551189814698}","timeSeries":{"lookbackSize":0,"lookahead":1,"useGoalHistory":true},"samplingParams":null,"anomalyDetectionParams":{"numberOfDataPointsForTraining":407,"numberOfDataPointsPerCycle":56},"maxAllowedFields":25,"useRedundancyFilter":false,"tags":[]}},{"status":{"state":"FAILED","message":"java.io.FileNotFoundException: /tmp/blockmgr-977a7b0a-6b4f-49e1-96da-63871bb028d6/13/shuffle_0_21_0.data.aee1c6cb-f8d4-48c5-8774-3ba9b9b56b09 (No space left on 
device)","startDateTime":"2019-02-26T12:56:24.761Z","endDateTime":"2019-02-26T12:56:31.075Z","runTime":"0:00:06.314","queuedStartDateTime":"2019-02-26T12:50:10.491Z","queuedDuration":"0:06:14.270","id":"b0541a28-95d1-4bb9-a3a4-f59e3b240838","results":{"resultUri":null,"validationUri":null}},"config":{"goalName":"Goal","dataSource":{"dataRef":{"uri":"body:///","format":"csv","hasHeader":true},"schemaRef":{"uri":"body:///","format":"json","metadata":[{"fieldName":"EntityId","values":[],"range":null,"dataType":"STRING","opType":"ENTITY_ID","timeSamplingInterval":null,"isStatic":true},{"fieldName":"Time","values":[],"range":null,"dataType":"LONG","opType":"TEMPORAL","timeSamplingInterval":1061,"isStatic":true},{"fieldName":"Goal","values":[],"range":null,"dataType":"DOUBLE","opType":"CONTINUOUS","timeSamplingInterval":null,"isStatic":false}]},"filter":null,"exclusions":[]},"learners":[{"learningTechnique":"NEURAL_NET","args":{"maxDepth":null,"maxAllowedFields":null,"useRedundancyFilter":false,"removeDuplicatesAndUniformColumns":null,"numberOfIterations":null,"layerCount":null,"hiddenUnitPercentage":null,"treeCount":null}}],"modelName":"AnomalyDetectionModelFor_f284a5cb-f266-4089-85e5-4c5fc7235b97_1551185390979","validationHoldoutPercentage":0.0,"ensembleTechnique":"SOLOIST","comparisonMetric":"PEARSONS","description":"ThingWatcher Training Statistics: {trainingDataCollectionStartTime=1551184771332, trainingDataCollectionEndTime=1551185239233}","timeSeries":{"lookbackSize":0,"lookahead":1,"useGoalHistory":true},"samplingParams":null,"anomalyDetectionParams":{"numberOfDataPointsForTraining":442,"numberOfDataPointsPerCycle":68},"maxAllowedFields":25,"useRedundancyFilter":false,"tags":[]}}],"total":4,"next":null,"previous":null}

Regards

Brad

1 ACCEPTED SOLUTION
cmorfin
19-Tanzanite
(To:BradC)

Hi @BradC 

 

The training output shows one critical error for one of the attempts:

java.io.FileNotFoundException: /tmp/blockmgr-977a7b0a-6b4f-49e1-96da-63871bb028d6/13/shuffle_0_21_0.data.aee1c6cb-f8d4-48c5-8774-3ba9b9b56b09 (No space left on device)
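Since the /training response is long, a small script can help pick out the state and message of each job. The following is only a minimal sketch in Python: it assumes the response body has the shape posted above (a "values" list whose items carry a "status" object with "id", "state" and "message"), and the SAMPLE text is a trimmed-down stand-in for the real response, not the full output.

```python
import json

# Trimmed-down stand-in for the /training response posted above
# (only the fields used here are kept; the real response has many more).
SAMPLE = json.dumps({
    "values": [
        {"status": {
            "id": "a4a6b9a1-65a0-423e-b1f1-5a32fb9c73bd",
            "state": "UNKNOWN",
            "message": "Job was queued but is no longer being considered.",
        }},
        {"status": {
            "id": "b0541a28-95d1-4bb9-a3a4-f59e3b240838",
            "state": "FAILED",
            "message": "java.io.FileNotFoundException: /tmp/blockmgr-977a7b0a-6b4f-49e1-96da-63871bb028d6/13/shuffle_0_21_0.data.aee1c6cb-f8d4-48c5-8774-3ba9b9b56b09 (No space left on device)",
        }},
    ]
})


def summarize_jobs(response_text):
    """Return a list of (id, state, message) tuples, one per training job."""
    payload = json.loads(response_text)
    return [
        (job["status"]["id"], job["status"]["state"], job["status"]["message"])
        for job in payload.get("values", [])
    ]


def failed_jobs(response_text):
    """Keep only the jobs whose state is FAILED."""
    return [row for row in summarize_jobs(response_text) if row[1] == "FAILED"]


for job_id, state, message in summarize_jobs(SAMPLE):
    print(state, job_id, message)
```

The FAILED entries are usually the ones worth reading first, because their message carries the underlying exception, whereas the UNKNOWN entries only report that the job was dropped from the queue.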

 

Could you make sure you have enough disk space, especially on the /tmp partition of the Analytics server?
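One quick way to check this on the Ubuntu server is sketched below, using Python's standard shutil.disk_usage. The function name free_space_report and the 5 GiB threshold are hypothetical choices for illustration, not values from the ThingWorx Analytics documentation; adjust the threshold to your own training data size.

```python
import shutil


def free_space_report(path="/tmp", min_free_gib=5.0):
    """Report free space on the filesystem holding `path`.

    min_free_gib is an illustrative threshold, not an official requirement.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_gib = usage.free / 2**30
    return {
        "path": path,
        "free_gib": round(free_gib, 2),
        "enough": free_gib >= min_free_gib,
    }


report = free_space_report("/tmp")
print(report)
```

If "enough" comes back False, freeing space on that partition (or on whichever filesystem /tmp lives on) before retrying the training is the first thing to try.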

 

If you still have issues, could you please:

- try creating a training model directly in Analytics Builder, using for example the beanpro demo dataset, to confirm that this works.

If this works, then reproduce the error with the anomaly alert and

- upload the directory <ThingWorxAnalyticsServer>/data/logs

- send the output of the QueryNumberPropertyHistory service for the property you are monitoring, covering the time span of the test.

 

Thank you

Kind regards

Christophe

 


2 REPLIES 2

BradC
7-Bedrock
(To:cmorfin)

Hi @cmorfin 

 

Thanks for spotting the memory issue.

It appears that the Analytics instance was not configured with the correct amount of memory. After allocating enough memory, the Anomaly was able to train successfully.

 

Regards,

Brad 
