This is a nebulous problem I've been dealing with for months. It could be totally unrelated to the CAD Worker and related to some other program or security suite that was installed. The server runs Windows Server 2019. I have all my CAD workers, Worker Daemons, etc. set up on different ports, different virtual hostnames and different folders, with 1 worker per instance. Production has 3 Creo Workers, Allegro Cadence, a Doc Worker and a Thumbnail worker.
Like clockwork, every 6-7 days publishing just halts. I've cleared logs, .wf and related folders to reset to a clean state. I've stopped the worker daemons and manually stopped and restarted the workers. I am pretty sure this is not related to the Windchill side. What I do know is that rebooting the Win Server 2019 VM fixes the issue and, without any other changes, publishing resumes normally for the next 6-7 days.
The worker logs go as far as to state Starting PROE and that's it. The workers eventually move into a failed state. I checked Event Viewer but could not find anything that jumped out there. It obviously screams something in memory, but I am at a loss. Are there any other places to look or things to try to diagnose the issue?
Perhaps you could create a scheduled task to shut down and restart the VM every night.
Solves the problem but does not get to the root of it.
Do you use the property to limit the number of publish jobs a Creo session can run before it is closed and restarted? That was a game changer for the CAD Workers here. I constantly had runaway Creo sessions and endless monitoring warnings until we dialed it back to 10 publish jobs per Creo session. Big thanks to @TomU for that tip!
I have it set to 20. I can dial it back, but in my case I can confirm that xtop is not running. Everything I know to be related to the CAD worker processes and Creo is shut down, and it fails to start back up again. I've certainly thrown hundreds of jobs at it in a day and it just smiled and asked for more. I would think that setting would kick in when there was a large number of continuous jobs to process.
Is UAC disabled?
Is the transfer folder getting cleared after successful jobs? I have seen that a few times, where the job folders don't get cleared and new jobs can't be created because the folder name string has reached its limit. Restarting the worker then clears the folder.
I do see leftover folders in my transfer folders, but I'm not sure that is it. I am putting a script in place to clear them. But if this were the issue, a restart would fail immediately with the same cause, and I have not cleared this folder.
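For anyone wanting a starting point for that kind of cleanup script, here is a minimal Python sketch. It assumes the leftovers are plain sub-folders of the transfer folder and that anything untouched for a day is safe to delete. The path, the 24-hour threshold and the function name are all hypothetical, so adjust them to the worker's actual configuration and run in dry-run mode first.

```python
import shutil
import time
from pathlib import Path


def clear_stale_job_folders(transfer_dir, max_age_hours=24.0, dry_run=True):
    """Remove sub-folders of transfer_dir not modified within max_age_hours.

    Returns the folders that were removed (or would be, in dry-run mode).
    """
    cutoff = time.time() - max_age_hours * 3600
    removed = []
    for entry in sorted(Path(transfer_dir).iterdir()):
        # Only touch directories; loose files are left alone.
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            if not dry_run:
                shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry)
    return removed


if __name__ == "__main__":
    # Hypothetical path; substitute the worker's real transfer folder.
    transfer = Path(r"D:\ptc\worker1\transfer")
    if transfer.is_dir():
        for folder in clear_stale_job_folders(transfer, dry_run=True):
            print("would remove", folder)
```

Scheduling it while the worker is idle avoids deleting a folder that belongs to a job still in progress.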
What version of Creo View are you using? Does your organization happen to use NESSUS as a security scanner? We have been seeing the same issue with Creo View because of how our security scanners are hitting the services.
Version is 9.1 for the adapters and Creo 9 for publishing. I can check on NESSUS. What behavior are you seeing? In my last battle before crying uncle and restarting, I shut down and cleared everything. Worker Daemons were restarted, workspaces removed, temp files cleared. Some things started, other stuff was just unreliable. Restart and everything works. Strange. I know these might be services related, but Task Manager shows a ton of conhost.exe, powershell and svchost.exe processes running. I should get a count on restart to see how many are there. Perhaps something builds up all week and hits some limit a week later, which is cleared on restart.
Nessus scans the machine and hits the port, then we get reports of the services being stopped. Sometimes you can start them back up from Worker Agent Admin, and sometimes it shows that they failed to start. Then you have to start the service manually.
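One way to get that count is sketched below in Python. It assumes the standard Windows `tasklist /FO CSV /NH` command is available and simply tallies processes per image name; the function names are made up for illustration. Running it right after a reboot and again a few days later would show whether something is piling up.

```python
import csv
import io
import subprocess
import sys
from collections import Counter


def count_processes(tasklist_csv):
    """Count processes per image name from `tasklist /FO CSV /NH` output."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row:  # skip blank lines
            counts[row[0].lower()] += 1
    return counts


def snapshot():
    """Run tasklist (Windows only) and return per-image process counts."""
    out = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_processes(out)


if __name__ == "__main__":
    if sys.platform == "win32":
        counts = snapshot()
        for name in ("conhost.exe", "powershell.exe", "svchost.exe"):
            print(name, counts[name])
```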
Seems related to this:
https://www.ptc.com/en/support/article/CS355444?source=search
Well, it was not the Nessus scans. We paused those, but we do believe we found the cause. It was a PowerShell script trying to apply a GPO for Defender. It never completed, but it kept retrying over and over again. It turns out it was not needed at all, so we are getting IT to remove it and stop it. There were dozens and dozens of them queued up in Task Manager, building over the week until things started failing. That is why restarts cleared them out and fixed the issue until the next week. Thanks everyone for the suggestions.
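For anyone chasing a similar weekly build-up, a rough Python sketch of how to spot it: take a per-process-name count shortly after a reboot and another a few days later, then list the names whose count grew noticeably. The function name, the threshold of 10 and the numbers in the example are made up for illustration and are not measurements from this server.

```python
def growing_processes(earlier, later, min_increase=10):
    """Return {name: increase} for names whose count rose by at least min_increase."""
    growth = {}
    for name, count in later.items():
        increase = count - earlier.get(name, 0)
        if increase >= min_increase:
            growth[name] = increase
    return growth


if __name__ == "__main__":
    # Made-up counts for illustration only.
    after_reboot = {"powershell.exe": 2, "conhost.exe": 3, "svchost.exe": 60}
    day_five = {"powershell.exe": 48, "conhost.exe": 49, "svchost.exe": 62}
    print(growing_processes(after_reboot, day_five))
```

Names that keep climbing between snapshots are the ones worth tracing back to a scheduled task, GPO or script.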