Re: Windchill Server Status - Site Not Responding?

ptc-1047216 · ‎Jan 07, 2014

Hi all,

I've recently installed a monolithic instance of Windchill 10.1 M040 (SQL Server) and have spent several weeks tailoring it to our requirements, testing workflows & lifecycles, document uploads, check ins & outs, and so far everything's working well. However, I've just noticed that when I interrogate "Server Status", the report states the following:

File Servers: 0 available; 1 Unavailable

and the bottom of the report states:

https://<our winchill domain name>/Windchill/servlet/WindchillGW

master

SITE NOT RESPONDING

2014-01-07 14:23:59.570 +0000

0%

?

I can ping the server without any issues.

Can anyone shed any light on what might be wrong?

Many thanks!

tswett · ‎Jan 07, 2014

Are you using HTTP or HTTPS for your environment? Try setting the wt.fv logger to Debug and you should see a verbose error message on why it may be having an issue.

Follow this PTC TS document on how to change Log4J Logging quickly:
https://www.ptc.com/appserver/cs/view/solution.jsp?n=141146

ptc-1047216 · ‎Jan 07, 2014

Thanks Tim,

We're using HTTPS.

I've set the logger to Debug (I used wt.util.jmxSetLogLevel -all Log4j Debug) and then waited for the next Server Status ping to occur, but trawling through the logs reveals no clues (at least to me).

That said, I'm not sure what it is I'm actually looking for. All I can say is that the logs look no different to me than they normally do, with no obvious errors jumping out.

tswett · ‎Jan 07, 2014

Try this (it has happened before with us on our Apache installation):

In $APACHE_HOME/conf/extra/, there is a file "modjk.conf"

Add the following line to the bottom of this file and restart Apache:
JkMountCopy All

This is something about multiple Tomcat processes.

ptc-1047216 · ‎Jan 07, 2014

I got all excited there for a moment, only to discover that that entry (JkMountCopy All) is already in the modjk.conf file.

Grrrr...

tswett · ‎Jan 07, 2014

I would open a Sev-2 TS call then. They should have some more recommendations.

ptc-1047216 · ‎Jan 07, 2014

At the moment this WC instance does not have a connection to an SMTP host, but looking through the log reveals that it is trying to send me an email with the subject line "Site Status Change Notification".

I'll request access to our SMTP host so as to allow this email to be sent - hopefully this may shed some light on the issue. Until then, I'll hang tight!

Many thanks all.

MikeLockwood · ‎Jan 07, 2014

We always see this as well - it says SITE NOT RESPONDING in red as you show for some time. A bit later (maybe 2 minutes), it correctly shows connected. At about that time we get a JConsole email that says all is well with the vault. Meanwhile, users can work normally including all content operations. Somehow it appears to be a delay in how the server status page updates.

jessh · ‎Jan 07, 2014

The basic reason for this is really, really simple:

When the method server pings https://<our winchill domain name>/Windchill/servlet/WindchillGW/wt.httpgw.HTTPServer/ping, it cannot successfully ping it, i.e. it does not receive a 200 response code to this URL request prior to the request timing out.

Which method server is doing this ping is largely indeterminate, I believe -- as this responsibility gets handed between method servers. I am pretty sure, however, that only foreground method servers do such pings.

If the foreground method servers cannot successfully do such a ping (e.g. you don't have a web server directly on each of your cluster nodes or you have improperly configured authentication to be required for this URL), then a failure will always be indicated.

Looking further into the code, "SITE NOT RESPONDING" indicates that other probing requests against https://<our winchill domain name>/Windchill also fail -- leading to the conclusion that the overall site (including simple anonymous static pages usually served by the web server) cannot respond (vs. the method server not being responsive, for instance).
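
To narrow down why such a ping fails even though an ICMP ping to the host succeeds, the same request can be reproduced by hand from the server itself. The following is a minimal Python sketch, not Windchill's actual code: the function name `classify_ping`, the result labels, and the 10-second timeout are all assumptions made for illustration. It reports whether the URL returned 200, returned some other HTTP status (for example 401 if authentication is required), failed on TLS, timed out, or was otherwise unreachable. Note that Python uses its own certificate trust store, not the JVM's, so a "tls" result here is only a hint.

```python
import http.client
import socket
import ssl
import urllib.error
import urllib.request


def _classify_error(exc):
    """Map a low-level exception to a coarse failure label (labels are illustrative)."""
    if isinstance(exc, ssl.SSLError):
        return "tls"  # e.g. the server certificate is not trusted
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"  # no answer before the timeout expired
    return "unreachable"  # DNS failure, refused connection, etc.


def classify_ping(url, timeout=10.0, opener=urllib.request.urlopen):
    """Request `url` once and say why it did or did not answer with HTTP 200.

    `opener` is injectable so the logic can be exercised without a network.
    The timeout value is an assumption, not Windchill's real setting.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return "ok" if response.status == 200 else f"http {response.status}"
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers: the site responded, but not with 200.
        return f"http {err.code}"
    except urllib.error.URLError as err:
        # urlopen wraps connection-level failures; inspect the wrapped reason.
        return _classify_error(err.reason)
    except (OSError, http.client.HTTPException) as err:
        return _classify_error(err)


# Example call (replace the placeholder host with your own):
# print(classify_ping("https://<host>/Windchill/servlet/WindchillGW/wt.httpgw.HTTPServer/ping"))
```

Any result other than "ok" would correspond to what the method server treats as a failed ping in the description above.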

TshepoMokhere · ‎Feb 13, 2014

Hi Jesse,

Thanks for your response. However, I'm left puzzled as to whether there is a solution for this momentary unresponsiveness of the file server's method server. The reason I'm puzzled is that during this period I can ping the server, remotely log onto the file server, and even log onto Windchill from the file server workstation. With my servers it lasts exactly 5 minutes, and then the "Site Status Change Notification" saying the site is available again is received.

Thanks and Kind regards,

Tshepo

jessh · ‎Feb 13, 2014

Well, the SITE NOT RESPONDING status indicates that pings to all of the following failed to respond with a 200 response in a (nearly) timely manner:

https://<our winchill domain name>/Windchill/servlet/WindchillGW/wt.httpgw.HTTPServer/ping

https://<our winchill domain name>/Windchill/servlet/WindchillGW/wtcore/test/dynAnon.jsp

https://<our winchill domain name>/Windchill/servlet/WindchillGW/wtcore/test/staticAnon.html

If you've routed requests for static pages through Tomcat, however, then all of these would be routed to a method server -- and thus one sufficiently unresponsive method server will result in SITE NOT RESPONDING.

Beyond this, my only guess would be that a background method server on a cluster node without any foreground method servers somehow decided to execute the ping -- which it shouldn't. That would either be a bug or a configuration error, but in either case that would be something to confer with technical support about so they can confer with the appropriate development team.
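
The rule described above (SITE NOT RESPONDING only when all three probes fail) can be sketched in Python as follows. This is an illustration under assumptions, not Windchill's implementation: the names `PROBE_PATHS`, `http_200` and `probe_site` are hypothetical, the timeout is arbitrary, and the paths are simply copied from the three URLs listed in this post.

```python
import http.client
import urllib.request

# Paths taken verbatim from the three URLs listed above, relative to .../Windchill
PROBE_PATHS = (
    "/servlet/WindchillGW/wt.httpgw.HTTPServer/ping",
    "/servlet/WindchillGW/wtcore/test/dynAnon.jsp",
    "/servlet/WindchillGW/wtcore/test/staticAnon.html",
)


def http_200(url, timeout=10.0):
    """Return True only if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers HTTP errors, TLS failures, timeouts and refused connections.
        return False


def probe_site(base_url, probe=http_200):
    """Probe every path and report per-path results plus the overall verdict.

    Returns (results, site_not_responding); the second value is True only
    when none of the probes succeeded.
    """
    base = base_url.rstrip("/")
    results = {path: probe(base + path) for path in PROBE_PATHS}
    site_not_responding = not any(results.values())
    return results, site_not_responding


# Example call (replace the placeholder host with your own):
# results, down = probe_site("https://<host>/Windchill")
```

Running this from each node that hosts a foreground method server shows which of the three probes fail there. If the static page succeeds while the other two fail, the web server is up and the problem is more likely on the method server side.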

TshepoMokhere · ‎Feb 14, 2014

Thanks Jess, I will investigate whether the BGMS is the culprit.

shussaini · ‎Feb 14, 2014

You can also have a look at this article CS49869.

Hope that helps.

Regards
~Syed

TshepoMokhere · ‎Feb 14, 2014

Thanks Syed, I will have a look.

Regards,

Tshepo

jessh · ‎Feb 14, 2014

That's certainly worth checking -- though that shouldn't be a problem with 10.1 M040 (the original version noted), as this setting is proper out-of-the-box there. In fact, it looks like it should be proper out-of-the-box in 10.1 F000 and even in 10.0 F000 and R9.1 (M070 at least -- I didn't check all MOR levels). I can only assume the article is addressing cases where this setting was inadvertently changed on site.

TshepoMokhere · ‎Feb 14, 2014

What is the name of the setting, Jess? We are using R10.1 M020.

jessh · ‎Feb 14, 2014

See CS49869 for details. Essentially these pings should only come from foreground method servers and need to be disabled from background method servers.

mkohn · ‎Apr 13, 2016

I had the same thing showing up after rehosting the server with a different URL. It also answered all of the pings and testing. I ended up importing the certificate into the jssecacerts keystore with keytool's import command, and that fixed it for me.