BGMS Stalls - Quits processing queue tasks, publishing, anything

Question

I've been tracking a vexing issue for a few weeks now. I would get a note from the users that they are not seeing emails, publishing has stopped or workflows are not working. Quick check on System monitor shows red lights on BGMS: Threshold of average active method contexts exceeded; average active contexts: 135.0001 It appears as if no queue jobs are transacting. This will persist until I whack the BGMS process and it auto-restarts. Queue jobs start processing normally and there is an email flood. What is strange is this often occurs at night or weekends. I am checking with IT to make sure this is not an external job that is running. I would think if it was DB related, it would settle back down once issue is resolved. I am still investigating issues on that front. Here is the status messages before the event: 2023-05-27 10:05:33,376 INFO [wt.jmx.core.SharedScheduledExecutor.worker] wt.method.MethodContextMonitor.stats.summary - ElapsedSeconds=600.001357667, CompletedContexts=2844, ErrorCount=0, RedirectCount=0, ContextSecondsAverage=0.0010882164483122362, ContextSecondsMax=0.170925781, ContextCpuSecondsAverage=3.7317970253164556E-4, ContextUserSecondsAverage=3.023909985935302E-4, ContextJDBCConnWaitSecondsAverage=9.443747187060478E-6, ActiveContextsAverage=33.00515865339878, ActiveContextsMax=36, ActiveContextsStart=33, ActiveContextsEnd=33, AverageJDBCCallsPerContext=2.2988748241912798, AverageJNDICallsPerContext=0.0, AverageRemoteCacheCallsPerContext=0.015119549929676512, PercentageOfContextTimeInJDBCCalls=69.4537942697918, PercentageOfContextTimeInJDBCConnWait=0.8678188242520327, PercentageOfContextTimeInJNDICalls=0.0, PercentageOfContextTimeInRemoteCacheCalls=0.46653119479917626 Here is the messages when it occurs: 2023-05-27 10:31:23,396 WARN [JMX Monitor ThreadGroup

Executor Pool [Thread-32910]] wt.jmx.notif.methodContextGauge - Time=2023-05-27 10:31:23.393 -0400, Name=MethodContextsGaugeNotifier, SourceObjectName=com.ptc:wt.subsystem=Monitors,wt.monitorType=UserSpecified,monitorName=NumberOfMethodContexts, class=class javax.management.monitor.MonitorNotification, type=jmx.monitor.gauge.high, message=, JVM Name=734509@, Database Connections=15, RecentStatistics=[activeContextsAverage=43.69575698079789,activeContextsEnd=61,activeContextsMax=61,activeContextsStart=33,averageBlockedCountPerContext=0.0,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=2.088888888888889,averageJNDICallsPerContext=0.0,averageRemoteCacheCallsPerContext=0.022222222222222223,averageWaitedCountPerContext=0.0,completedContexts=45,contextCpuSecondsAverage=7.938385555555555E-4,contextJDBCConnWaitSecondsAverage=1.38434E-5,contextSecondsAverage=0.0035897418444444445,contextSecondsMax=432060.292405267,contextUserSecondsAverage=6.666666666666666E-4,contextsPerSecond=0.5998349031447896,elapsedSeconds=75.020642787,endTime=2023-05-27 10:31:23.395 -0400,errorCount=0,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=1.3454552408663482E-5,percentageOfContextTimeInJDBCConnWait=2.0362712675106688E-7,percentageOfContextTimeInJNDICalls=0.0,percentageOfContextTimeInRemoteCacheCalls=2.472260154022281E-8,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-27 10:30:08.374 -0400], BaselineStatistics=[activeContextsAverage=17.039334284186697,activeContextsEnd=61,activeContextsMax=79,activeContextsStart=3,averageBlockedCountPerContext=0.02292217978357173,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=4.403481477633508,averageJNDICallsPerContext=0.001559566688444229,averageRemoteCacheCallsPerContext=0.08460450215393109,averageWaitedCountPerContext=0.00505408810850307,completedContexts=2637271,contextCpuSecondsAverage=0.0014892850589446438,contextJDBCConnWaitSecondsAverage=1.1264928033144869E-4,contextSecondsAverage=0.046587539902467744,contextSecondsMax=1800.673565344,contextUserSecondsAverage=0.001307286964441652,contextsPerSecond=5.336972950282546,elapsedSeconds=494151.089872093,endTime=2023-05-27 10:31:23.395 -0400,errorCount=12,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=21.863126897034316,percentageOfContextTimeInJDBCConnWait=0.2418013068886723,percentageOfContextTimeInJNDICalls=0.002312341661670789,percentageOfContextTimeInRemoteCacheCalls=0.04387963380984237,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-21 17:15:32.305 -0400] 2023-05-27 10:34:09,071 INFO [wt.jmx.core.SharedScheduledExecutor.worker] wt.summary.general - JVMName=734509@, HeapMemoryUsage=637741752, NonHeapMemoryUsage=325782624, Uptime=494404081, ProcessCpuTime=7944010000000, JITCTime=270947, GarbageCollectors=[G1 Old Generation,G1 Young Generation], GCTime=[577956,10256], GCCount=[928,283], Classes=32564, Threads=182, TotalStartedThreadCount=138596, CodeCachePercentUsage=NaN 2023-05-27 10:35:33,381 WARN [wt.jmx.core.SharedScheduledExecutor.worker] wt.method.MethodContextMonitor.stats.summary - ElapsedSeconds=600.001357878, CompletedContexts=1793, ErrorCount=0, RedirectCount=0, ContextSecondsAverage=9.379451115448968E-4, ContextSecondsMax=0.119802725, ContextCpuSecondsAverage=3.3825659453430004E-4, ContextUserSecondsAverage=2.732849972113776E-4, ContextJDBCConnWaitSecondsAverage=9.276237032905745E-6, ActiveContextsAverage=54.95284711645644, ActiveContextsMax=128, ActiveContextsStart=33, ActiveContextsEnd=128, AverageJDBCCallsPerContext=2.036252091466815, AverageJNDICallsPerContext=0.0, AverageRemoteCacheCallsPerContext=0.015616285554935862, PercentageOfContextTimeInJDBCCalls=68.06158638785061, PercentageOfContextTimeInJDBCConnWait=0.9889957225350618, PercentageOfContextTimeInJNDICalls=0.0, PercentageOfContextTimeInRemoteCacheCalls=0.4693038590843637 2023-05-27 10:35:33,382 WARN [wt.jmx.core.SharedScheduledExecutor.worker] wt.jmx.notif.methodContext - Time=2023-05-27 10:35:33.381 -0400, Name=MethodContextsNotifier, SourceObjectName=com.ptc:wt.subsystem=Monitors,wt.monitorType=MethodContexts, class=class javax.management.Notification, type=wt.method.MethodContextMonitorMBean.maxAverageActiveContextsThreshold.exceeded, userData=54.95284711645644, message=Threshold of average active method contexts exceeded; average active contexts: 54.9528, JVM Name=734509@, Database Connections=15, MaxAverageActiveContextsThreshold=37, RecentStatistics=[activeContextsAverage=119.98866890627697,activeContextsEnd=128,activeContextsMax=128,activeContextsStart=66,averageBlockedCountPerContext=0.0,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=0.0,averageJNDICallsPerContext=0.0,averageRemoteCacheCallsPerContext=0.0,averageWaitedCountPerContext=0.0,completedContexts=17,contextCpuSecondsAverage=3.0073935294117646E-4,contextJDBCConnWaitSecondsAverage=0.0,contextSecondsAverage=5.642147647058823E-4,contextSecondsMax=432310.279012334,contextUserSecondsAverage=0.0,contextsPerSecond=0.20163700157121983,elapsedSeconds=84.309922621,endTime=2023-05-27 10:35:33.382 -0400,errorCount=0,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=1.1907240740019307E-5,percentageOfContextTimeInJDBCConnWait=1.956330099324393E-7,percentageOfContextTimeInJNDICalls=0.0,percentageOfContextTimeInRemoteCacheCalls=0.0,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-27 10:34:09.072 -0400], BaselineStatistics=[activeContextsAverage=17.072420586985455,activeContextsEnd=128,activeContextsMax=128,activeContextsStart=3,averageBlockedCountPerContext=0.022922005952320013,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=4.4034480836585725,averageJNDICallsPerContext=0.001559554861408923,averageRemoteCacheCallsPerContext=0.08460386055236226,averageWaitedCountPerContext=0.005054049780627166,completedContexts=2637291,contextCpuSecondsAverage=0.001489276805779112,contextJDBCConnWaitSecondsAverage=1.1264842605120177E-4,contextSecondsAverage=0.04658719249507581,contextSecondsMax=1800.673565344,contextUserSecondsAverage=0.001307277050579553,contextsPerSecond=5.334314841123434,elapsedSeconds=494401.076529741,endTime=2023-05-27 10:35:33.382 -0400,errorCount=12,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=21.86312413239578,percentageOfContextTimeInJDBCConnWait=0.24180127631238674,percentageOfContextTimeInJNDICalls=0.002312341369270306,percentageOfContextTimeInRemoteCacheCalls=0.04387962826117007,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-21 17:15:32.305 -0400] 2023-05-27 10:35:33,382 INFO [wt.jmx.core.SharedScheduledExecutor.worker] com.infoengine.util.IeContextMonitor.stats.summary - ElapsedSeconds=600.002, CompletedContexts=0, ErrorCount=0, ContextSecondsAverage=NaN, ContextSecondsMax=0.0, ContextCpuSecondsAverage=NaN, ContextUserSecondsAverage=NaN, ActiveContextsAverage=0.0, ActiveContextsMax=0, ActiveContextsStart=0, ActiveContextsEnd=0 2023-05-27 10:35:33,383 INFO [wt.jmx.core.SharedScheduledExecutor.worker] com.infoengine.jndi.DirContextMonitor.stats.summary - ServiceName=com.utcaus.EnterpriseLdap, IntervalSeconds=600.002, TotalContexts=0, ErrorCount=0, ContextsPerSecond=0.0, AverageContextSeconds=0.0, LongestContextSeconds=0.0, AverageActiveContexts=0.0, MaxActiveContexts=0 2023-05-27 10:35:33,383 INFO [wt.jmx.core.SharedScheduledExecutor.worker] com.infoengine.jndi.DirContextMonitor.stats.summary - ServiceName=com.utcaus.Ldap, IntervalSeconds=600.003, TotalContexts=0, ErrorCount=0, ContextsPerSecond=0.0, AverageContextSeconds=0.0, LongestContextSeconds=0.0, AverageActiveContexts=0.0, MaxActiveContexts=0 2023-05-27 10:44:09,781 INFO [wt.jmx.core.SharedScheduledExecutor.worker] wt.summary.general - JVMName=734509@, HeapMemoryUsage=643160336, NonHeapMemoryUsage=325782496, Uptime=495004791, ProcessCpuTime=7949140000000, JITCTime=270956, GarbageCollectors=[G1 Old Generation,G1 Young Generation], GCTime=[578664,10256], GCCount=[929,283], Classes=32564, Threads=245, TotalStartedThreadCount=138771, CodeCachePercentUsage=NaN 2023-05-27 10:45:33,382 WARN [wt.jmx.core.SharedScheduledExecutor.worker] wt.method.MethodContextMonitor.stats.summary - ElapsedSeconds=600.001896003, CompletedContexts=8, ErrorCount=0, RedirectCount=0, ContextSecondsAverage=0.00578046275, ContextSecondsMax=0.008206355, ContextCpuSecondsAverage=0.0023634275, ContextUserSecondsAverage=0.0025, ContextJDBCConnWaitSecondsAverage=0.0, ActiveContextsAverage=130.0123936803109, ActiveContextsMax=132, ActiveContextsStart=128, ActiveContextsEnd=132, AverageJDBCCallsPerContext=0.0, AverageJNDICallsPerContext=0.0, AverageRemoteCacheCallsPerContext=0.0, PercentageOfContextTimeInJDBCCalls=0.0, PercentageOfContextTimeInJDBCConnWait=0.0, PercentageOfContextTimeInJNDICalls=0.0, PercentageOfContextTimeInRemoteCacheCalls=0.0 2023-05-27 10:45:33,384 WARN [wt.jmx.core.SharedScheduledExecutor.worker] wt.jmx.notif.methodContext - Time=2023-05-27 10:45:33.383 -0400, Name=MethodContextsNotifier, SourceObjectName=com.ptc:wt.subsystem=Monitors,wt.monitorType=MethodContexts, class=class javax.management.Notification, type=wt.method.MethodContextMonitorMBean.maxAverageActiveContextsThreshold.exceeded, userData=130.0123936803109, message=Threshold of average active method contexts exceeded; average active contexts: 130.0124, JVM Name=734509@, Database Connections=15, MaxAverageActiveContextsThreshold=37, RecentStatistics=[activeContextsAverage=131.38045394969262,activeContextsEnd=132,activeContextsMax=132,activeContextsStart=131,averageBlockedCountPerContext=0.0,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=0.0,averageJNDICallsPerContext=0.0,averageRemoteCacheCallsPerContext=0.0,averageWaitedCountPerContext=0.0,completedContexts=1,contextCpuSecondsAverage=0.002965126,contextJDBCConnWaitSecondsAverage=0.0,contextSecondsAverage=0.007209526,contextSecondsMax=432910.280733025,contextUserSecondsAverage=0.01,contextsPerSecond=0.011961477921314188,elapsedSeconds=83.601709302,endTime=2023-05-27 10:45:33.383 -0400,errorCount=0,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=1.1796607632116384E-5,percentageOfContextTimeInJDBCConnWait=1.9381533542918624E-7,percentageOfContextTimeInJNDICalls=0.0,percentageOfContextTimeInRemoteCacheCalls=0.0,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-27 10:44:09.782 -0400], BaselineStatistics=[activeContextsAverage=17.20931762561805,activeContextsEnd=132,activeContextsMax=132,activeContextsStart=3,averageBlockedCountPerContext=0.022921936420557548,averageCORBACallsPerContext=0.0,averageJDBCCallsPerContext=4.403434726210414,averageJNDICallsPerContext=0.0015595501306450274,averageRemoteCacheCallsPerContext=0.08460360391445945,averageWaitedCountPerContext=0.0050540344496395745,completedContexts=2637299,contextCpuSecondsAverage=0.0014892794574335331,contextJDBCConnWaitSecondsAverage=1.1264808434273096E-4,contextSecondsAverage=0.04658706871167547,contextSecondsMax=1800.673565344,contextUserSecondsAverage=0.0013072806685931325,contextsPerSecond=5.327865162195883,elapsedSeconds=495001.078239194,endTime=2023-05-27 10:45:33.384 -0400,errorCount=12,percentageOfContextTimeBlocked=0.0,percentageOfContextTimeInCORBACalls=0.0,percentageOfContextTimeInJDBCCalls=21.863115903528474,percentageOfContextTimeInJDBCConnWait=0.2418011853029498,percentageOfContextTimeInJNDICalls=0.002312340498948655,percentageOfContextTimeInRemoteCacheCalls=0.04387961174570612,percentageOfContextTimeWaited=0.0,redirectCount=0,startTime=2023-05-21 17:15:32.305 -0400] Any idea what might cause this behavior as described?

avillanueva · Accepted Answer

Success? I did some analysis on which queue jobs were running and my attention was drawn to this job which runs daily:

NAME	IDA2A2	SCHEDULETIME	ENTRY_TARGETCLASS	ENTRY_TARGETMETHOD	SCHEDULEITEM	TARGETCLASS	TARGETMETHOD	ITEMDESCRIPTION	ITEMNAME
commonScheduleQueue	1630151366	08/16/2023 10:30	wt.scheduler.StandardSchedulingService	executeScheduledItem	1630151365	com.ptc.windchill.enterprise.datamonitor.WBRDataMonitorHelper	processFailedAsyncExecutions	WBR_DATAMONITOR_PURGE_SCHEDULE	WBR_DATAMONITOR_PURGE_SCHEDULE

Previously, I put in a call about some Jasper report error messages. This is documented in the CS articles below.

https://www.ptc.com/en/support/article/CS378753?source=search

https://www.ptc.com/en/support/article/CS380347?source=search

This matches us exactly as we are employing an alias hostname. I was going to fix this with a CPS update.

I then came across this article which I think was the real cause:

https://www.ptc.com/en/support/article/CS358954?source=search

My theory is that there were some daily Jasper reports related to Change Object Summary reports that were failing to execute, likely due to the issues noted above. This job hangs and does not finish resulting in the step up we are seeing in the logs above for contexts. After a few days, things get maxed out and BGMS halts. Following the steps in CS358954, I found the entries in the WBRDataMonitor table and marked them as FAILED. When 10:30 rolled around, no step up was observed. I think I fixed the issue but I will wait for a few days to close this thread.

What I do not understand is that we do not use these reports but I can see how a user might trigger them to run. How do I see what has been requested and clear them? If I do not, I can see this event occurring again until I can get the CPS updated.

buenosroas · Answer

Have you by the way considered to use the On-demand System Scan from PTC? Gather sufficient data and run it through a solid scan for suggestions on what to alter in your system and see what output you get.

Sign up

Please use your PTC eSupport account.

Welcome to the PTC Community

Please use your PTC eSupport account.