BGMS Stalls - Quits processing queue tasks, publishing, anything
I've been tracking a vexing issue for a few weeks now. I would get a note from the users that they are not seeing emails, publishing has stopped or workflows are not working. Quick check on System monitor shows red lights on BGMS:
Threshold of average active method contexts exceeded; average active contexts: 135.0001
It appears as if no queue jobs are transacting. This will persist until I whack the BGMS process and it auto-restarts. Queue jobs start processing normally and there is an email flood. What is strange is this often occurs at night or weekends. I am checking with IT to make sure this is not an external job that is running. I would think if it was DB related, it would settle back down once issue is resolved. I am still investigating issues on that front. Here is the status messages before the event:
2023-05-27 10:05:33,376 INFO [wt.jmx.core.SharedScheduledExecutor.worker] wt.method.MethodContextMonitor.stats.summary - ElapsedSeconds=600.001357667, CompletedContexts=2844, ErrorCount=0, RedirectCount=0, ContextSecondsAverage=0.0010882164483122362, ContextSecondsMax=0.170925781, ContextCpuSecondsAverage=3.7317970253164556E-4, ContextUserSecondsAverage=3.023909985935302E-4, ContextJDBCConnWaitSecondsAverage=9.443747187060478E-6, ActiveContextsAverage=33.00515865339878, ActiveContextsMax=36, ActiveContextsStart=33, ActiveContextsEnd=33, AverageJDBCCallsPerContext=2.2988748241912798, AverageJNDICallsPerContext=0.0, AverageRemoteCacheCallsPerContext=0.015119549929676512, PercentageOfContextTimeInJDBCCalls=69.4537942697918, PercentageOfContextTimeInJDBCConnWait=0.8678188242520327, PercentageOfContextTimeInJNDICalls=0.0, PercentageOfContextTimeInRemoteCacheCalls=0.46653119479917626
Here is the messages when it occurs:
2023-05-27 10:31:23,396 WARN [JMX Monitor ThreadGroup<main> Executor Pool [Thread-32910]] wt.jmx.notif.methodContextGauge - Time=2023-05-27 10:31:23.393 -0400, Name=MethodContextsGaugeNotifier, SourceObjectName=com.ptc:wt.subsystem=Monitors,wt.monitorType=UserSpecified,monitorName=NumberOfMethodContexts, class=class javax.management.monitor.MonitorNotification, type=jmx.monitor.gauge.high, message=, JVM Name=734509@<hostname>, Database Connections=15, RecentStatistics=[activeContextsAverage=43.69575698079789,activeContextsEnd=61,activeContextsMax=61,activeContextsStart=33,averageBlockedCountPerContext=0.0,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=2.088888888888889,averageJNDICallsPerContext=0.0,averageRemoteCacheCallsPerContext=0.022222222222222223,averageWaitedCountPerContext=0.0,completedContexts=45,contextCpuSecondsAverage=7.938385555555555E-4,contextJDBCConnWaitSecondsAverage=1.38434E-5,contextSecondsAverage=0.0035897418444444445,contextSecondsMax=432060.292405267,contextUserSecondsAverage=6.666666666666666E-4,contextsPerSecond=0.5998349031447896,elapsedSeconds=75.020642787,endTime=2023-05-27 10:31:23.395 -0400,errorCount=0,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=1.3454552408663482E-5,percentageOfContextTimeInJDBCConnWait=2.0362712675106688E-7,percentageOfContextTimeInJNDICalls=0.0,percentageOfContextTimeInRemoteCacheCalls=2.472260154022281E-8,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-27 10:30:08.374 -0400], BaselineStatistics=[activeContextsAverage=17.039334284186697,activeContextsEnd=61,activeContextsMax=79,activeContextsStart=3,averageBlockedCountPerContext=0.02292217978357173,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=4.403481477633508,averageJNDICallsPerContext=0.001559566688444229,averageRemoteCacheCallsPerContext=0.08460450215393109,averageWaitedCountPerContext=0.00505408810850307,completedContexts=2637271,contextCpuSecondsAverage=0.0014892850589446438,contextJDBCConnWaitSecondsAverage=1.1264928033144869E-4,contextSecondsAverage=0.046587539902467744,contextSecondsMax=1800.673565344,contextUserSecondsAverage=0.001307286964441652,contextsPerSecond=5.336972950282546,elapsedSeconds=494151.089872093,endTime=2023-05-27 10:31:23.395 -0400,errorCount=12,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=21.863126897034316,percentageOfContextTimeInJDBCConnWait=0.2418013068886723,percentageOfContextTimeInJNDICalls=0.002312341661670789,percentageOfContextTimeInRemoteCacheCalls=0.04387963380984237,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-21 17:15:32.305 -0400]
2023-05-27 10:34:09,071 INFO [wt.jmx.core.SharedScheduledExecutor.worker] wt.summary.general - JVMName=734509@<hostname>, HeapMemoryUsage=637741752, NonHeapMemoryUsage=325782624, Uptime=494404081, ProcessCpuTime=7944010000000, JITCTime=270947, GarbageCollectors=[G1 Old Generation,G1 Young Generation], GCTime=[577956,10256], GCCount=[928,283], Classes=32564, Threads=182, TotalStartedThreadCount=138596, CodeCachePercentUsage=NaN
2023-05-27 10:35:33,381 WARN [wt.jmx.core.SharedScheduledExecutor.worker] wt.method.MethodContextMonitor.stats.summary - ElapsedSeconds=600.001357878, CompletedContexts=1793, ErrorCount=0, RedirectCount=0, ContextSecondsAverage=9.379451115448968E-4, ContextSecondsMax=0.119802725, ContextCpuSecondsAverage=3.3825659453430004E-4, ContextUserSecondsAverage=2.732849972113776E-4, ContextJDBCConnWaitSecondsAverage=9.276237032905745E-6, ActiveContextsAverage=54.95284711645644, ActiveContextsMax=128, ActiveContextsStart=33, ActiveContextsEnd=128, AverageJDBCCallsPerContext=2.036252091466815, AverageJNDICallsPerContext=0.0, AverageRemoteCacheCallsPerContext=0.015616285554935862, PercentageOfContextTimeInJDBCCalls=68.06158638785061, PercentageOfContextTimeInJDBCConnWait=0.9889957225350618, PercentageOfContextTimeInJNDICalls=0.0, PercentageOfContextTimeInRemoteCacheCalls=0.4693038590843637
2023-05-27 10:35:33,382 WARN [wt.jmx.core.SharedScheduledExecutor.worker] wt.jmx.notif.methodContext - Time=2023-05-27 10:35:33.381 -0400, Name=MethodContextsNotifier, SourceObjectName=com.ptc:wt.subsystem=Monitors,wt.monitorType=MethodContexts, class=class javax.management.Notification, type=wt.method.MethodContextMonitorMBean.maxAverageActiveContextsThreshold.exceeded, userData=54.95284711645644, message=Threshold of average active method contexts exceeded; average active contexts: 54.9528, JVM Name=734509@<hostname>, Database Connections=15, MaxAverageActiveContextsThreshold=37, RecentStatistics=[activeContextsAverage=119.98866890627697,activeContextsEnd=128,activeContextsMax=128,activeContextsStart=66,averageBlockedCountPerContext=0.0,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=0.0,averageJNDICallsPerContext=0.0,averageRemoteCacheCallsPerContext=0.0,averageWaitedCountPerContext=0.0,completedContexts=17,contextCpuSecondsAverage=3.0073935294117646E-4,contextJDBCConnWaitSecondsAverage=0.0,contextSecondsAverage=5.642147647058823E-4,contextSecondsMax=432310.279012334,contextUserSecondsAverage=0.0,contextsPerSecond=0.20163700157121983,elapsedSeconds=84.309922621,endTime=2023-05-27 10:35:33.382 -0400,errorCount=0,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=1.1907240740019307E-5,percentageOfContextTimeInJDBCConnWait=1.956330099324393E-7,percentageOfContextTimeInJNDICalls=0.0,percentageOfContextTimeInRemoteCacheCalls=0.0,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-27 10:34:09.072 -0400], BaselineStatistics=[activeContextsAverage=17.072420586985455,activeContextsEnd=128,activeContextsMax=128,activeContextsStart=3,averageBlockedCountPerContext=0.022922005952320013,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=4.4034480836585725,averageJNDICallsPerContext=0.001559554861408923,averageRemoteCacheCallsPerContext=0.08460386055236226,averageWaitedCountPerContext=0.005054049780627166,completedContexts=2637291,contextCpuSecondsAverage=0.001489276805779112,contextJDBCConnWaitSecondsAverage=1.1264842605120177E-4,contextSecondsAverage=0.04658719249507581,contextSecondsMax=1800.673565344,contextUserSecondsAverage=0.001307277050579553,contextsPerSecond=5.334314841123434,elapsedSeconds=494401.076529741,endTime=2023-05-27 10:35:33.382 -0400,errorCount=12,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=21.86312413239578,percentageOfContextTimeInJDBCConnWait=0.24180127631238674,percentageOfContextTimeInJNDICalls=0.002312341369270306,percentageOfContextTimeInRemoteCacheCalls=0.04387962826117007,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-21 17:15:32.305 -0400]
2023-05-27 10:35:33,382 INFO [wt.jmx.core.SharedScheduledExecutor.worker] com.infoengine.util.IeContextMonitor.stats.summary - ElapsedSeconds=600.002, CompletedContexts=0, ErrorCount=0, ContextSecondsAverage=NaN, ContextSecondsMax=0.0, ContextCpuSecondsAverage=NaN, ContextUserSecondsAverage=NaN, ActiveContextsAverage=0.0, ActiveContextsMax=0, ActiveContextsStart=0, ActiveContextsEnd=0
2023-05-27 10:35:33,383 INFO [wt.jmx.core.SharedScheduledExecutor.worker] com.infoengine.jndi.DirContextMonitor.stats.summary - ServiceName=com.utcaus.EnterpriseLdap, IntervalSeconds=600.002, TotalContexts=0, ErrorCount=0, ContextsPerSecond=0.0, AverageContextSeconds=0.0, LongestContextSeconds=0.0, AverageActiveContexts=0.0, MaxActiveContexts=0
2023-05-27 10:35:33,383 INFO [wt.jmx.core.SharedScheduledExecutor.worker] com.infoengine.jndi.DirContextMonitor.stats.summary - ServiceName=com.utcaus.Ldap, IntervalSeconds=600.003, TotalContexts=0, ErrorCount=0, ContextsPerSecond=0.0, AverageContextSeconds=0.0, LongestContextSeconds=0.0, AverageActiveContexts=0.0, MaxActiveContexts=0
2023-05-27 10:44:09,781 INFO [wt.jmx.core.SharedScheduledExecutor.worker] wt.summary.general - JVMName=734509@<hostname>, HeapMemoryUsage=643160336, NonHeapMemoryUsage=325782496, Uptime=495004791, ProcessCpuTime=7949140000000, JITCTime=270956, GarbageCollectors=[G1 Old Generation,G1 Young Generation], GCTime=[578664,10256], GCCount=[929,283], Classes=32564, Threads=245, TotalStartedThreadCount=138771, CodeCachePercentUsage=NaN
2023-05-27 10:45:33,382 WARN [wt.jmx.core.SharedScheduledExecutor.worker] wt.method.MethodContextMonitor.stats.summary - ElapsedSeconds=600.001896003, CompletedContexts=8, ErrorCount=0, RedirectCount=0, ContextSecondsAverage=0.00578046275, ContextSecondsMax=0.008206355, ContextCpuSecondsAverage=0.0023634275, ContextUserSecondsAverage=0.0025, ContextJDBCConnWaitSecondsAverage=0.0, ActiveContextsAverage=130.0123936803109, ActiveContextsMax=132, ActiveContextsStart=128, ActiveContextsEnd=132, AverageJDBCCallsPerContext=0.0, AverageJNDICallsPerContext=0.0, AverageRemoteCacheCallsPerContext=0.0, PercentageOfContextTimeInJDBCCalls=0.0, PercentageOfContextTimeInJDBCConnWait=0.0, PercentageOfContextTimeInJNDICalls=0.0, PercentageOfContextTimeInRemoteCacheCalls=0.0
2023-05-27 10:45:33,384 WARN [wt.jmx.core.SharedScheduledExecutor.worker] wt.jmx.notif.methodContext - Time=2023-05-27 10:45:33.383 -0400, Name=MethodContextsNotifier, SourceObjectName=com.ptc:wt.subsystem=Monitors,wt.monitorType=MethodContexts, class=class javax.management.Notification, type=wt.method.MethodContextMonitorMBean.maxAverageActiveContextsThreshold.exceeded, userData=130.0123936803109, message=Threshold of average active method contexts exceeded; average active contexts: 130.0124, JVM Name=734509@<hostname>, Database Connections=15, MaxAverageActiveContextsThreshold=37, RecentStatistics=[activeContextsAverage=131.38045394969262,activeContextsEnd=132,activeContextsMax=132,activeContextsStart=131,averageBlockedCountPerContext=0.0,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=0.0,averageJNDICallsPerContext=0.0,averageRemoteCacheCallsPerContext=0.0,averageWaitedCountPerContext=0.0,completedContexts=1,contextCpuSecondsAverage=0.002965126,contextJDBCConnWaitSecondsAverage=0.0,contextSecondsAverage=0.007209526,contextSecondsMax=432910.280733025,contextUserSecondsAverage=0.01,contextsPerSecond=0.011961477921314188,elapsedSeconds=83.601709302,endTime=2023-05-27 10:45:33.383 -0400,errorCount=0,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=1.1796607632116384E-5,percentageOfContextTimeInJDBCConnWait=1.9381533542918624E-7,percentageOfContextTimeInJNDICalls=0.0,percentageOfContextTimeInRemoteCacheCalls=0.0,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-27 10:44:09.782 -0400], BaselineStatistics=[activeContextsAverage=17.20931762561805,activeContextsEnd=132,activeContextsMax=132,activeContextsStart=3,averageBlockedCountPerContext=0.022921936420557548,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=4.403434726210414,averageJNDICallsPerContext=0.0015595501306450274,averageRemoteCacheCallsPerContext=0.08460360391445945,averageWaitedCountPerContext=0.0050540344496395745,completedContexts=2637299,contextCpuSecondsAverage=0.0014892794574335331,contextJDBCConnWaitSecondsAverage=1.1264808434273096E-4,contextSecondsAverage=0.04658706871167547,contextSecondsMax=1800.673565344,contextUserSecondsAverage=0.0013072806685931325,contextsPerSecond=5.327865162195883,elapsedSeconds=495001.078239194,endTime=2023-05-27 10:45:33.384 -0400,errorCount=12,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=21.863115903528474,percentageOfContextTimeInJDBCConnWait=0.2418011853029498,percentageOfContextTimeInJNDICalls=0.002312340498948655,percentageOfContextTimeInRemoteCacheCalls=0.04387961174570612,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-21 17:15:32.305 -0400]
Any idea what might cause this behavior as described?


