Hi All,
I mentioned this in Brian's other thread "Strange Windchill Server Behavior" with regard to having issues allocating memory to the Tomcat mod_jk workers/processes. Tomcat is now embedded in the method servers and is having memory issues. I can see in mod_jk.log that the Tomcat workers stop responding one by one until Tomcat stops working entirely, which results in a method server crash. As a result, users either cannot log in at all or find the system extremely slow.
My current production system is a Windows Server 2008 R2 64-bit VM with 48 GB of RAM. I have 4 foreground method servers and 3 background ones (1 main, 1 WVS, and 1 for replication/routing).
Here are the errors I see in mod_jk.log:
[Tue Mar 11 12:40:19.691 2014] [160:5104] [error] jk_ajp_common.c
(2647): (tomcat9) connecting to tomcat failed.
[Thu Mar 13 06:45:42.108 2014] [160:4396] [error]
jk_ajp_common.c (2127): (tomcat1) Tomcat is down or refused connection. No
response has been sent to the client (yet)
[Thu Mar 13 06:58:00.565 2014] [160:4772] [error] jk_lb_worker.c
(1485): All tomcat instances failed, no more workers left
2014-03-13 06:32:04,969 WARN [Low Memory Detector]
wt.method.MemoryUsageRedirectStrategy - Entering low memory state
2014-03-13 06:38:31,881 ERROR [ajp-bio-8010-exec-148]
wt.method.MethodContextMonitor.contexts.servletRequest plmadmin - 2014-03-13
10:38:07.516 +0000, 2t251m;hsnenth4;5780;3vjb90;441211, -, -,
2t251m;hsnenth4;5780;3vjb90;441206, plmadmin, 10.30.11.10, -, -, , 0, 4,
0.008136441, 0, 0.0, 0.0312002, 24.365867659
at org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:912)
I don't think modifying the Tomcat startup memory allocation helps anymore, since Tomcat is now embedded in the Windchill method servers.
Oh yes, my Windows server reports physical memory (MB) of roughly:
total 49151
cached 25000
Available 24000
Free 151
and my method server heaps are:
<property name="wt.method.maxHeap" overridable="true" targetfile="codebase/wt.properties"<br"/> value="5120"/>
<property name="wt.method.minHeap" overridable="true" targetfile="codebase/wt.properties"<br"/> value="2048"/>
Any suggestions would be great.
Thanks Dave,
We currently have this provided from PTC:
<property name="wt.manager.cmd.MethodServer.java.extra.args" overridable="true" targetfile="codebase/wt.properties"</p">
value="-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000$(wt.manager.cmd.MethodServer.gc.log.args)"/>
<property name="wt.manager.cmd.ServerManager.java.extra.args" overridable="true"targetFile="codebase/wt.properties"</p">
value="-XX:+DisableExplicitGC$(wt.manager.cmd.ServerManager.gc.log.args)"/>
<property name="wt.manager.cmd.MethodServer.platform.java.args" overridable="true"" targetfile="codebase/wt.properties"value="-XX:PermSize=256m" -xx:maxpermsize="512m"/">
<property name="wt.manager.rmi.maxSockets" overridable="true"targetFile="codebase/wt.properties"value="500"/">
<property name="wt.method.gcInterval" overridable="true"targetFile="codebase/wt.properties"value="3600"/">
I noticed garbage collection gets into contention after the server hits a critical condition. There is a cascading effect, with more and more errors from replication, the EPMDocument helper, and so forth. Then I see the memory warnings only when it is already too late or the server has reached a critical condition.
Thanks,
Patrick
In Reply to David DeMay:
Spending too much time in non-heap garbage collection? What garbage collection tuning parameters do you have set up, and what are your heap/non-heap memory ratios?
Sent from my Verizon Wireless 4G LTE Smartphone
Hi Jesse,
I could not have said it any better and I completely agree with you.
Thank you very much.
Patrick
In Reply to Jess Holle:
To be clear, anything within the same JVM process sees/experiences
memory issues as one -- irrespective of the cause. The whole process
uses a single memory heap, so with 10.x's embedded Tomcat there's no
such thing as adjusting "Tomcat" memory -- one adjusts memory for the
overall method server process and everything therein shares that memory.
The OutOfMemoryError could have been caused by /anything/ running within
that JVM, e.g. any Windchill transaction or servlet/JSP code. There's
no information below to indicate what caused the OutOfMemoryError --
except that it occurred because of "GC overhead limit exceeded", which
means that the JVM ran so low on memory that it was spending almost all
of its time scrounging for more memory via garbage collection and yet
not freeing any meaningful amount of memory.
Once the OutOfMemoryError occurred that should have resulted in a quick
death/euthanization of the now useless method server and thus a
replacement by a new one. If not that's a rather separate issue from
whatever caused the OutOfMemoryError.
As for what caused the OutOfMemoryError, further analysis would be
required to understand that.
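(One generic aid for that analysis -- offered only as a sketch, not as PTC guidance: a Sun/Oracle HotSpot JVM can be told to write a heap dump whenever an OutOfMemoryError occurs by adding the standard -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath flags. Assuming the wt.manager.cmd.MethodServer.java.extra.args property shown earlier in this thread, and keeping its existing flags, that might look like:
<property name="wt.manager.cmd.MethodServer.java.extra.args" overridable="true" targetFile="codebase/wt.properties"
          value="-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=D:\temp\heapdumps $(wt.manager.cmd.MethodServer.gc.log.args)"/>
The D:\temp\heapdumps path is just a placeholder. The resulting .hprof file can then be opened in a heap analyzer such as Eclipse MAT to see what was actually filling the heap when the error occurred.)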
One can use wtcore/jsp/jmx/javaProcesses.jsp to find links to stored
performance data on this method server, entering the JVM name from the
logs below (5780@WHQPMAS01) into the "JVM" field (and adjusting or
clearing the time period fields as necessary). This can be used to
obtain 10 minute heap and GC data samples -- though the heap information
will be rather meaningless if you have disabled explicit GC in
your method servers. You can also examine charts of request concurrency
and the like for this process. All of this should give a feel for the
duration/shape of the problem, e.g. whether the problem slowly built
over a long time or occurred suddenly.
You can also use wtcore/jsp/jmx/listSamples.jsp and
wtcore/jsp/jmx/logEvents.jsp to examine in-flight samples and completion
data for servlet requests and method contexts around the time in
question. With recent Hot Spot (Sun/Oracle) JVMs (Java 6 Update 25 or
newer), you can see the cumulative memory allocation for each thread or
context. Note that samples occur every 15 seconds and both sample and
completion data is only collected for requests that have been running
for 8 seconds or more (or where the request ends in an error, in which
case completion data is always collected).
I'd guess that either: (1) there was higher concurrency than the system
was sized for or (2) one or at most a few requests used huge amounts of
memory, huge at least when compared to what the system was sized for --
and further guess that it's case #2, as this is usually the case.
Whether the system was simply not sized large enough or whether a
software issue led to high concurrency or wasteful memory usage is yet
another question.
--
Jess Holle
A few months ago, after we upgraded to Windchill 10.1, we saw a couple of times that the foreground method server(s) were not responding, and we found many "Low Memory Detector" warnings that eventually led to a "java.lang.OutOfMemoryError: GC overhead limit exceeded" error in the method server logs.
We made a property change, com.ptc.core.collectionsrv.engine.collected_result_limit=5000 (the default is -1, unlimited), and we haven't seen it happen again in the past 2 months. This setting does seem to help for us.
Hope it helps for you too if you are seeing similar strange behavior on your Windchill server and don't know what is causing your JVMs to frequently go into garbage collection and hit OutOfMemoryError.
Detailed explanation:
- Property com.ptc.core.collectionsrv.engine.collected_result_limit (defaults to -1) should be set so that "Open in Pro/ENGINEER" adds dependencies in chunks.
- PTC recommends 5000 as an initial value. In a Windchill shell, execute the command below and restart Windchill: xconfmanager -p -s com.ptc.core.collectionsrv.engine.collected_result_limit=5000 -t codebase/wt.properties
- Customers should adjust this limit using test results from their own environment.
- On systems with many users working with huge assemblies, this value should be reduced so that add to workspace is more likely to collect dependencies in chunks. This causes less memory usage and decreases the likelihood of an OutOfMemoryError.
- If this limit is raised, the number of chunks decreases and there is more chance of encountering an OutOfMemoryError while opening a large structure.
- This value must not be set to 100 or lower.
- Example of chunking depending on the value of com.ptc.core.collectionsrv.engine.collected_result_limit, for a structure with 3 layers of dependencies:
- 1st layer: 200 dependents
- 2nd layer: 500 dependents
- 3rd layer: 5000 dependents
- If com.ptc.core.collectionsrv.engine.collected_result_limit=100, 3 chunks are used
- If com.ptc.core.collectionsrv.engine.collected_result_limit=5000, 1 chunk is used
In Reply to Patrick Chin: (see the original post above)
Thanks Liu,
I'll try that. If our designers accidentally select regenerate on a drawing, our regeneration process reports that there are 120,000 features to complete in some of our sub-assemblies, and if an item is used multiple times, that is multiplied by the number of occurrences. We really only have a maximum of about 300 individual items that are used multiple times in the total structure. Thinking about why there are 120,000 features to regenerate: if there is a direct or indirect circular reference (multi-sublevel, then back up to the top), Windchill would behave as if it were a large assembly, listing out the dependencies repeatedly. This makes sense for companies that have smaller assemblies but heavy complexity in their dependencies. This looks like something I can relate to:
https://www.ptc.com/appserver/cs/view/solution.jsp?n=CS16012
I wonder what the value should be for all my dependencies. I believe we have the default value.
In Reply to Liu Liang: (see the post above)
Hi Guys,
After now looking at swap, this property may not apply, because we have it at the unlimited default of -1, and our background method servers were also dying (we were getting queue-problem emails). We only have 11 JVM processes running:
- PartsLink 32-bit JVM
- Windchill DS
- Server Manager
- 7 method servers
- PTC System Monitor
There are no other major/minor processes running. No index server or Cognos running either. Heap settings are:
- wt.manager.maxHeap=373
- wt.manager.minHeap=373
- wt.method.maxHeap=5120
- wt.method.minHeap=2024
Our method servers never peak beyond 4 GB of RAM each and always hover around 2.5 GB.
As a result, the 7 method servers should take at most 35 GB from the system, which leaves 13 GB for everything else.
There is always about 25 GB available. One of our method servers is getting maxed out at times on CPU core 1, and I don't know which process ID it is. But you know Java JVMs, they can jump to another CPU core. If 50% of our memory is available (24 GB of 48 GB), why is Windchill using so much swap on our Windows VM? When our system dies, the memory allocation doesn't change.
Hmm.
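(Back-of-envelope, using only numbers already posted in this thread: 7 method servers x 5,120 MB max heap is roughly 35 GB of heap that could be committed in the worst case, plus up to 512 MB of PermGen per method server from the -XX:MaxPermSize=512m setting (roughly 3.5 GB more), plus the server manager, Windchill DS, PartsLink and PSM JVMs on top of that. So although observed usage sits around 2.5 GB per method server, the configured ceiling is much closer to the 48 GB of physical RAM than day-to-day usage suggests.)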
In Reply to Patrick Chin: (see the earlier reply above)
Hi Patrick,
Did you try setting both maxHeap and minHeap of the method server to the same value, 5120?
The PTC performance tuning expert recommended that we set them to the same value to avoid swapping overhead.
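If it helps, a minimal sketch of making that change from a Windchill shell, using the same xconfmanager pattern as the collected_result_limit command earlier in this thread (treat 5120 as an example value to validate against your own sizing, and restart Windchill afterwards):
xconfmanager -s wt.method.minHeap=5120 -s wt.method.maxHeap=5120 -t codebase/wt.properties -p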
Liu
In Reply to Patrick Chin: (see the post above)