

Java process for method server exceeds limit during large search

jfrankovich


All,

We have been experiencing seemingly random runaway Java processes on our Windchill system over the last few weeks and are trying to determine a root cause and preventative measures. At this point, we have been able to force what appears to be a similar situation by running searches that result in very large result sets. (Though we question the likelihood of user searches actually being the historical cause...)

For example, if we run a basic search for EPMdocs using a keyword like *.asm, it returns the first 15 of 36,124 assembly objects in our system in about 25 seconds (no FAST indexed search) and the rest of the system is fine. BUT, if we click on "Full List" (an admittedly bad idea), the memory consumption of one methodserver Java process quickly jumps past its max heap setting and proceeds to slow down the entire WC system. (We have 2 methodservers and 1 background methodserver, incidentally.) It then takes 15 minutes or more for the process to recover (if it recovers at all), and in most cases we have to kill the process or restart the system.

PTC's recommendations so far have been focused on using query limits to reduce any possible burden on the JVMs. But this has the undesirable consequence of yielding an error message (and no results at all) whenever a search exceeds this arbitrary limit - surely a frustrating user experience. A much more palatable and less Draconian measure would be to simply tell users "don't click on Full List" and leave the limits reasonably high.
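
To illustrate the trade-off, here is a hypothetical Python sketch (not Windchill's search code): a hard limit fails the whole search once the match count passes the limit, while a paged fetch keeps only the first page (15 rows, like the basic search) in memory. The function names, the exception, and the generator standing in for the database cursor are all assumptions for illustration.

```python
from itertools import islice
from typing import Iterable, List


class QueryLimitExceeded(Exception):
    """Hypothetical error raised when a search matches more rows than the hard limit."""


def search_with_hard_limit(rows: Iterable[str], limit: int) -> List[str]:
    """Hard limit: return every match, or raise and return nothing at all."""
    results: List[str] = []
    for row in rows:
        if len(results) >= limit:
            # The user gets an error message instead of a partial result set.
            raise QueryLimitExceeded(f"search matched more than {limit} objects")
        results.append(row)
    return results


def search_first_page(rows: Iterable[str], page_size: int = 15) -> List[str]:
    """Paged fetch: materialize only one page, however many rows match."""
    return list(islice(rows, page_size))


# Hypothetical stand-in for the 36,124 matching assemblies.
matches = (f"model{i}.asm" for i in range(36124))
print(len(search_first_page(matches)))  # 15
```

Clicking "Full List" roughly corresponds to the unbounded case: every matching object has to be held in one method server's heap at the same time.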

And while this all addresses search issues very directly, we can't help but wonder if it fully addresses the issue of the runaway JVM.

So now to the questions. Have you seen such behavior from your method servers and how did you resolve it? (properties, added methodservers, etc.) Does search create a similar burden in your system and how have you addressed it? (query limits, user training, etc.)

Thanks,
John Frankovich
GSI Group LLC
www.gsiag.com


On Fri, 2011-02-18 at 17:47 -0600, John Frankovich wrote:
> All,
>
I think I can answer part of your problem. Does the potential memory
for your method servers exceed the physical memory of your machine? If
so, you are hard swapping. If not, it may be doing full GCs (full garbage
collections). If that does not make sense to you, I can try to provide a more detailed answer.
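
A rough version of that check, sketched in Python under stated assumptions: multiply the number of method server JVMs by their maximum heap (-Xmx), add a per-JVM allowance for memory the JVM uses outside the heap (thread stacks, PermGen, native buffers), and compare the total with the physical RAM left after the OS. The 512 MB overhead and 2 GB OS reservation are hypothetical placeholders, not PTC figures; measure your own processes.

```python
def potential_jvm_memory_mb(jvm_count: int, max_heap_mb: int, overhead_mb: int = 512) -> int:
    """Worst case if every JVM grows to its -Xmx plus a (hypothetical) non-heap allowance."""
    return jvm_count * (max_heap_mb + overhead_mb)


def will_hard_swap(jvm_count: int, max_heap_mb: int, physical_mb: int,
                   os_reserve_mb: int = 2048, overhead_mb: int = 512) -> bool:
    """True if the method servers together could outgrow the RAM left for them."""
    available_mb = physical_mb - os_reserve_mb
    return potential_jvm_memory_mb(jvm_count, max_heap_mb, overhead_mb) > available_mb


# 2 MethodServers + 1 BackgroundMethodServer; heap and RAM values are hypothetical.
print(will_hard_swap(3, 2048, physical_mb=8192))   # True: 7680 MB > 6144 MB
print(will_hard_swap(3, 2048, physical_mb=16384))  # False: 7680 MB < 14336 MB
```

If the check comes out False and the process still stalls, the full GC case is the more likely one; verbose GC logging (-verbose:gc) can show whether the JVM is collecting back to back.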


----------

We had similar problems and accepted that we just needed to increase the
heap capacity of the MethodServers to handle these contingencies. We
have very large assemblies with lots of dependencies so it wasn't just
manual searches that were likely triggering the big hits.

We increased RAM on the server to allow the heap of each MS and BGMS (3
MethodServers and 1 BackgroundMethodServer) to be about 4 GB. This definitely worked to resolve most of our
issues. Oh, we also set the initial heap size equal to the max heap
size. We just reserve all the possible memory space on the server right
away. Any other apps that come along just have to work around this.
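
For reference, a small Python sketch of that layout, assuming 3 MethodServers and 1 BackgroundMethodServer at about 4 GB each: the initial heap (-Xms) equals the maximum (-Xmx), so the full amount is committed when each JVM starts, and the server has to reserve count × heap up front. The helper names are hypothetical, and where these flags live in your Windchill configuration is not covered here.

```python
def heap_flags(heap_mb: int) -> str:
    """JVM options with the initial heap set equal to the maximum heap."""
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


def reserved_heap_mb(method_servers: int, background_servers: int, heap_mb: int) -> int:
    """Heap committed at startup when every JVM begins at its maximum size."""
    return (method_servers + background_servers) * heap_mb


print(heap_flags(4096))              # -Xms4096m -Xmx4096m
print(reserved_heap_mb(3, 1, 4096))  # 16384
```

That 16 GB covers heap only; each JVM also uses some memory outside the heap, so the RAM actually consumed is higher.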

When we have gone to swap, we have found that it doesn't come back out
of swap. We're looking into that a bit but are just making sure we can
stay out in the first place.

That said, we had a recurrence of MethodServers dropping starting about
3 weeks ago. It wasn't nearly as severe because the system always
recovered on its own (nice to have 3 big MS's running). We could never
catch it in the act with verbose logging turned on. It happened once or
twice a day, about every other day, for two weeks. It hasn't happened
this week at all. I didn't change anything on the server. Who knows
what our IT department might have done, or perhaps someone doing
something extra massive in the system was on vacation this week.

One thing we have not done is significantly limit our search returns.
We just needed the system to be able to handle big queries in search and
elsewhere. Our larger assemblies (master assemblies) can have 40k+
objects or related objects in them.

If you have large assemblies, lots of dependencies and relationships
in your assemblies, or just lots of models, the hardware sizing guide
does not go far enough, at least not in our case, even though it
mentions increasing resources when you have these things.

Daniel Reid
Kenworth Truck Company

BenLoosli (To: jfrankovich)

Some additional information on your system would help.
32- or 64-bit OS?
Windows or Unix?
How much memory in the system?
What is your JVM heap size set to, min & max?
What version of Windchill?


Thank you,

Ben H. Loosli
USEC, INC.

We've had similar issues in the past. This memory issue has reared its ugly head in several different instances. We currently have 3 foreground MethodServers running, with 1.5 GB allocated to each. For some reason we can't allocate more than 1.5 GB per MethodServer. We also have 1 background MethodServer running. This has definitely helped us; however, the memory issue remains if multiple users hit the system with these queries at once.

One other thing: watch out for circular references within the model. These can cause Windchill to exceed the query limit very quickly.
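
Why a circular reference hurts: a structure walk that does not remember what it has already visited keeps going around the loop until it hits the query limit (or the heap). The Python sketch below assumes the model structure is available as a plain model → referenced-models mapping (a hypothetical stand-in for the real dependency links) and uses a depth-first search that tracks the current path to report the first loop it finds.

```python
from typing import Dict, List, Optional, Set


def find_cycle(structure: Dict[str, List[str]], root: str) -> Optional[List[str]]:
    """Return one circular reference reachable from root as a path, or None.

    structure maps each model to the models it references (hypothetical data).
    """
    path: List[str] = []    # models on the current depth-first path, in order
    on_path: Set[str] = set()
    done: Set[str] = set()  # models whose references are known to be loop-free

    def visit(model: str) -> Optional[List[str]]:
        if model in on_path:
            # Reached a model that is already on the path: that is the loop.
            return path[path.index(model):] + [model]
        if model in done:
            return None
        path.append(model)
        on_path.add(model)
        for child in structure.get(model, []):
            cycle = visit(child)
            if cycle is not None:
                return cycle
        path.pop()
        on_path.remove(model)
        done.add(model)
        return None

    return visit(root)


structure = {
    "top.asm": ["sub.asm", "bolt.prt"],
    "sub.asm": ["bracket.prt", "top.asm"],  # sub.asm refers back to its parent
    "bracket.prt": [],
    "bolt.prt": [],
}
print(find_cycle(structure, "top.asm"))  # ['top.asm', 'sub.asm', 'top.asm']
```

A recursive walk is fine for a sketch, but very deep structures would exceed Python's default recursion limit (about 1000 frames) and need an explicit stack instead.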

Here's a brief description of a few wt.properties and db.properties settings:

wt.manager.monitor.start.MethodServer - The number of foreground method servers to be started. The optimal number of method servers must take into account the number of CPUs and the amount of physical memory installed. The OOTB default is fine for single-user development environments but is generally insufficient for multi-user production use.

wt.manager.monitor.start.BackgroundMethodServer - The number of background Method Servers to be started.

wt.manager.monitor.services - Defines the set of services instantiated and monitored by the StandardServerMonitor class. If wt.manager.monitor.start.BackgroundMethodServer > 0, set it to "MethodServer, BackgroundMethodServer"; otherwise set it to "MethodServer".

wt.queue.executeQueues - Property used in an environment with multiple method servers to establish the default behavior of a method server.

wt.method.loadbalance.maxRedirects - The maximum number of times a single client method call will be redirected to another server. The default setting is 1. A setting of 0 causes method calls to be redirected until a server that falls below the threshold has been identified. Should be set to the number of MethodServers minus 1 if wt.manager.monitor.start.MethodServer > 2, or to 2 if wt.manager.monitor.start.MethodServer = 2.

wt.method.loadbalance.activeContext - Defines a threshold for load balancing multiple method servers on a single host. Should be 60% of db connections.

wt.pom.maxDbConnections - Depends on method server heap size. Please consult WCA documentation WCConfigAssistant.pdf for more information.

wt.pom.minDbConnections - Depends on method server heap size. Please consult WCA documentation WCConfigAssistant.pdf for more information.
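
The rules of thumb above can be collected into one helper that proposes values for a given layout. This is a Python sketch of this post's recommendations, not PTC defaults: the function name is hypothetical, the database connection count is an input you supply (which connection setting the 60% refers to isn't stated, so treat that as an assumption), and the output is meant for review, not for pasting straight into wt.properties.

```python
from typing import Dict


def suggested_properties(method_servers: int, background_servers: int,
                         db_connections: int) -> Dict[str, str]:
    """Candidate wt.properties values from the rules of thumb quoted above."""
    if method_servers > 2:
        max_redirects = method_servers - 1
    elif method_servers == 2:
        max_redirects = 2
    else:
        max_redirects = 1  # stated default; no rule was given for a single server

    if background_servers > 0:
        services = "MethodServer, BackgroundMethodServer"
    else:
        services = "MethodServer"

    return {
        "wt.manager.monitor.start.MethodServer": str(method_servers),
        "wt.manager.monitor.start.BackgroundMethodServer": str(background_servers),
        "wt.manager.monitor.services": services,
        "wt.method.loadbalance.maxRedirects": str(max_redirects),
        # About 60% of the database connections, rounded down.
        "wt.method.loadbalance.activeContext": str(db_connections * 60 // 100),
    }


# Hypothetical example: 3 foreground, 1 background, 50 database connections.
for key, value in suggested_properties(3, 1, 50).items():
    print(f"{key}={value}")
```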

You had better add another background method server to handle publishing on your already heavily loaded system. I would allocate a minimum of 768 MB to each and a maximum of 3 to 4 GB. It all depends on your OS (it had better be 64-bit) and amount of RAM (hopefully 24 GB to 48 GB). 4 to 6 GB per core is good, so a 2X quad-core machine gets 48 GB and runs 3 to 4 MethodServers and 2 background MethodServers. I advise having at least 1 CAD worker per 10 users; thus, 60 concurrent users = 6 CAD workers. You can place all the workers on one 2X quad-core machine with 48 GB of RAM running Windows 2008 R2 64-bit.
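
The sizing arithmetic above, written out as a Python sketch: RAM at 4 to 6 GB per core (8 cores × 6 GB = 48 GB at the upper end) and one CAD worker per 10 concurrent users. The function names are hypothetical, the ratios are this post's rules of thumb rather than sizing-guide values, and rounding the worker count up is an assumption based on "at least".

```python
import math
from typing import Tuple


def ram_range_gb(sockets: int, cores_per_socket: int,
                 low_gb_per_core: int = 4, high_gb_per_core: int = 6) -> Tuple[int, int]:
    """RAM range at 4 to 6 GB per core."""
    cores = sockets * cores_per_socket
    return cores * low_gb_per_core, cores * high_gb_per_core


def cad_workers(concurrent_users: int, users_per_worker: int = 10) -> int:
    """At least one CAD worker per 10 concurrent users, rounded up."""
    return math.ceil(concurrent_users / users_per_worker)


print(ram_range_gb(2, 4))  # (32, 48) for a 2X quad core
print(cad_workers(60))     # 6
```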

You can upgrade to JDK 1.6.24 to fix the Windchill security issue with Java and, hopefully, have fewer problems with hanging MethodServers.

As for Oracle, make sure you have at least 24 GB of RAM, with 40% of it allocated to Oracle's memory. We have 48 GB of RAM with a 40% memory allocation (SGA and PGA) and 2X quad cores. I have no issues with my systems in terms of performance.
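
The database memory guideline is simple arithmetic; a Python sketch, assuming the 40% covers SGA and PGA together as described. The post doesn't say how to split it between the two, so the helper returns only the combined figure, and the names are hypothetical.

```python
MIN_DB_SERVER_RAM_GB = 24   # "at least 24 GB of RAM"
ORACLE_MEMORY_PERCENT = 40  # SGA + PGA combined


def oracle_memory_target_gb(total_ram_gb: int) -> float:
    """Combined SGA + PGA target at 40% of RAM; rejects servers below 24 GB."""
    if total_ram_gb < MIN_DB_SERVER_RAM_GB:
        raise ValueError(f"{total_ram_gb} GB is below the suggested "
                         f"{MIN_DB_SERVER_RAM_GB} GB minimum")
    return total_ram_gb * ORACLE_MEMORY_PERCENT / 100


print(oracle_memory_target_gb(48))  # 19.2
```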

We are also on WC9.1M050 and Oracle 11.02.01, running on 2 separate HP 460CG6 blades with 48 GB of RAM and RedHat 5.4 ES.

Check this out:

https://www.ptc.com/appserver/cs/view/solution.jsp?n=135487

Hope this helps,

Patrick Chin
