avillanueva
23-Emerald I
July 10, 2013
Question

Resolving Cad Agent Timeout Issues

  • 9 replies
  • 11221 views
I am sure we've all seen this from time to time:
Jul 10, 2013 11:33:59 AM:Timeout exceeded waiting for a reply from the CadAgent - Time out 60 seconds
Jul 10, 2013 11:34:59 AM:Timeout exceeded waiting for a reply from the CadAgent - Time out 60 seconds
Jul 10, 2013 11:35:59 AM:Timeout exceeded waiting for a reply from the CadAgent - Time out 60 seconds
Jul 10, 2013 11:36:59 AM:Timeout exceeded waiting for a reply from the CadAgent - Time out 60 seconds
Jul 10, 2013 11:37:59 AM:Timeout exceeded waiting for a reply from the CadAgent - Time out 60 seconds
Jul 10, 2013 11:38:59 AM:Timeout exceeded waiting for a reply from the CadAgent - Time out 60 seconds
Jul 10, 2013 11:39:59 AM:Timeout exceeded waiting for a reply from the CadAgent - Time out 60 seconds
Jul 10, 2013 11:40:59 AM:Timeout exceeded waiting for a reply from the CadAgent - Time out 60 seconds

Tech support says to adjust the timeouts in wvs.properties:
publish.cadtimeout.assembly=7200
publish.cadtimeout.component=600
publish.cadtimeout.drawing=7200
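
To confirm which values are actually in the file, a minimal Python sketch like the following can list them. It assumes only that wvs.properties is plain `key=value` text; the function name `read_cad_timeouts` is made up for illustration and is not part of Windchill.

```python
def read_cad_timeouts(properties_text):
    """Collect publish.cadtimeout.* entries from properties-style text.

    Hypothetical helper: it only parses simple key=value lines and
    ignores blank lines and comment lines starting with '#' or '!'.
    """
    prefix = "publish.cadtimeout."
    timeouts = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith(prefix):
            # Store under the short name, e.g. "assembly"
            timeouts[key[len(prefix):]] = int(value.strip())
    return timeouts
```

Reading the file and passing its text to `read_cad_timeouts` returns a dictionary keyed by `assembly`, `component` and `drawing` for the three lines above.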

Been there, done that. So, with all your publisher queues displaying this scrolling message, is there any hope of a fix? I tried deleting the executing jobs to move on to the next ones. Those, too, end up in this loop. I am looking for something I can do without having to restart the system.
Antonio Villanueva - Sr. Software Engineer - ISR Systems
UTC AEROSPACE SYSTEMS
100 Wooster Heights Road, Danbury, CT 06804
Tel: +1 203 797 5682
antonio.villanueva@utas.utc.com www.utcaerospacesystems.com

9 replies

10-Marble
July 10, 2013
There are also timeout settings in the recipe file. I think that could be part of the issue. Try increasing these beyond 60 seconds.


Steve D.
1-Visitor
July 10, 2013
Antonio,

Is there a firewall enabled on the CADWorker machine? Is the user who starts the GSWorker Daemon service a local admin? Shut down the GS Worker Daemon from Services, start workerdaemon.exe manually, and see if there are any pop-ups.

Thanks,

Kiran Lakshminarayanan


avillanueva
23-Emerald I
July 10, 2013
Both of those are the values I have.
12-Amethyst
July 10, 2013
These are good suggestions from Kiran.

The other suggestion I would make is to bring up the model in question in Pro/E on that machine and see if it takes longer to open than your timeout, or if it shows any prompts that require human intervention. In the latter case you would need to fix the model before it can be published.

--Bob
10-Marble
July 10, 2013
Does this happen for all jobs, or just some jobs? I know we had an issue with timeouts for drawings while trying to create the thumbnail. We noticed it a few weeks after we upgraded the adapters from Wildfire 5.0 to support Creo 1. Are there any additional error messages in the worker/helper logs that may be of use? The timeout message can mean the worker has already failed and is not communicating back. That's what we found in this case.

You may want to look at these two articles.

12-Amethyst
July 11, 2013

Hi Antonio,



We had a similar experience, but our problem was that large assemblies were taking longer than the timeout set in the recipe file. We didn't want to set the timeout too high because it made the jobs behind them wait too long. The solution was to split the long-running jobs (large assemblies) out to their own queue set and have a dedicated worker process all of those jobs. Then the timeout in the dedicated worker's recipe file could be as high as it needed to be without blocking smaller jobs.



We also put the publishing queues on their own background method server. That way, when something happened (all got stuck in the executing state) we could just kill that bgms and not hurt anything happening on the ootb bgms.



Let me know if you want more information on either solution.



~Jamie



19-Tanzanite
July 11, 2013
I started having a timeout issue yesterday, but only on one assembly. I happened to check the trail file on the worker and it gives me this wonderful error:
! Message Dialog: Warning
!mem_use INCREASE Blocks 2338088, AppSize 259071815, SysSize 294674976
! : Fatal error encountered. A traceback has been written to
! : D:\ptc\creoelements_viewadapters\proe_setup\traceback.log
! : Please send it to Technical Support.

Any ideas? It pulls up quickly on the user's machine. This doesn't seem to be an actual timeout, so hopefully someone else has had this issue.

Brian Toussaint
CAD Administrator

Hoshizaki America, Inc.
"A Superior Degree Of Reliability"
618 Hwy. 74 S., Peachtree City, GA 30269
Phone: (770) 487-2331 ext. 1216
Fax: (770) 487-3359
www.hoshizaki.com
avillanueva
23-Emerald I
July 11, 2013
Thanks to those who are responding. This has been a great thread. My biggest frustration is that it's not anything simple. The system (I believe) is configured fine. Timeout values are OK. It's not related to a bad assembly. We ended up bouncing the system and it all came back to life. The jobs that had been hanging then processed to completion with no issues. During the issue, we also got calls that workflow tasks had stopped moving, and there was a high number of active contexts on the background method server. It did not show high CPU or GC %. My technical term for this is that it was "out to lunch," and it was in fact lunch time.

That explained why the CadAgent was timing out. What I need is a deeper understanding of where the CadAgent is running. I think some people on the thread were confusing the publisher, queues, agents, workers and helpers. I do not believe this was an issue on the CAD worker machine (daemon, helpers or workers). No jobs were getting to them. I could start and stop them with no issues.

My point in starting the thread was to see if there was a way, short of a reboot, to return the background MS to normal operation. To unstick what was stuck. If there was some stuck thread or process within the system, it would sure be nice to reach in and diagnose it without bringing the whole system down. That, in the end, is what we were forced to do.
10-Marble
July 11, 2013
We've had times where I've caused issues with the publishing queue, and we found that we could kill the background method server process and the server manager would fire up a new one, after which operations returned to normal. This lets us avoid bringing down the whole application. This is probably not a best practice, but it has worked for us when restarting the application would not be ideal.

Steve D.
avillanueva
23-Emerald I
September 13, 2018

After all these years, this issue still confounds me. Tech Support has been useless (yes, you have) on this issue. No one has yet gotten to the root cause. What's frustrating is that there are so many links in the chain that it's tough to diagnose. The only thing I do know is that restarting the Background MS, as suggested, is the only thing that remedies the issue, but it does not solve it. Users are typically unaware of the restart.

I now run multiple PDMLink 10.2 and 10.1 installations and all have the issue to some degree. I cannot queue up a scheduled republish job since it will never complete without babysitting the queue. I am OK with it skipping a job and moving on to the next one, but the queue stalling altogether needs to be solved.

Here is what I know:

  • Issue seems to be in the Background MS since restarting it resolved it. Any changes on the CAD worker have no effect. 
  • It does not appear to be data dependent but I have not 100% ruled that out.
  • Typically it starts after a job has failed for some reason but some thread does not end, blocking the entire queue. Typically the job fails at the "generating thumbnail" step, but I have seen other jobs fail there and not hold up the queue.
  • Killing the executing queue jobs does not resolve the issue. No trick appears to work.
  • Configuration - 3MS-1BGMS-3 queues-3 Creo workers on single workstation (using virtual host callouts). Linux to Windows over SFTP transfer.

At this point, since I am seeing it all over, I am open to all options. I have thought about splitting publishing off to its own BGMS and creating a worker used exclusively for thumbnails. I have played with the timeout settings but the issue remains. Is anyone else still seeing this, or has anyone solved it?

  • No alert I know of can tell me when it's occurring, other than users reporting that jobs are not processing.
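
In the absence of a built-in alert, one option is to watch for the repeating timeout message itself. The following is a minimal Python sketch, assuming the queue messages quoted at the top of this thread can be read as text lines (where that text is stored is not stated here). The function names and the threshold of 5 consecutive timeouts are arbitrary illustrative choices, not Windchill settings.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Jul 10, 2013 11:33:59 AM:Timeout exceeded waiting for a reply from the CadAgent - ..."
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M):"
    r"Timeout exceeded waiting for a reply from the CadAgent"
)

def longest_timeout_run(lines):
    """Return (count, first_ts, last_ts) for the longest run of
    consecutive CadAgent timeout lines. Blank lines are ignored;
    any other non-matching line ends the current run."""
    best = (0, None, None)
    count, first = 0, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = TIMEOUT_RE.match(line)
        if match is None:
            count, first = 0, None
            continue
        ts = datetime.strptime(match.group("ts"), "%b %d, %Y %I:%M:%S %p")
        if count == 0:
            first = ts
        count += 1
        if count > best[0]:
            best = (count, first, ts)
    return best

def queue_looks_stuck(lines, threshold=5):
    """True when at least `threshold` timeouts appear back to back."""
    return longest_timeout_run(lines)[0] >= threshold
```

A scheduled task could call `queue_looks_stuck` on the latest messages and send an email when it returns True. This only detects the symptom; it does not identify the blocked thread.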
7-Bedrock
September 14, 2018
  • Configuration - 3MS-1BGMS-3 queues-3 Creo workers on single workstation (using virtual host callouts). Linux to Windows over SFTP transfer.

On your worker, have you updated your local hosts file with your 3 aliases in windows\system32\drivers\etc\hosts?  Sounds like you have it on the server side in /etc/hosts, but the worker needs to know about these too.
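
A quick way to check the worker's hosts file is sketched below in Python. It assumes the standard hosts format (an IP address followed by one or more names, with `#` starting a comment); the function name and the alias names in the usage note are placeholders, not your real aliases.

```python
def missing_host_aliases(hosts_text, aliases):
    """Return the aliases that appear on no active line of a hosts file.

    Hypothetical helper: comparison is case-insensitive, and anything
    after '#' on a line is treated as a comment.
    """
    known = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # fields[0] is the IP address; the rest are host names/aliases
        known.update(name.lower() for name in fields[1:])
    return [alias for alias in aliases if alias.lower() not in known]
```

On the worker, read the text of windows\system32\drivers\etc\hosts and call, for example, `missing_host_aliases(text, ["alias1", "alias2", "alias3"])`. An empty list means every alias is present.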

 

Did you manually add the correct alias to proeworker.bat as -DA <alias>?  Keep in mind that the proe2pv GUI tool overwrites this.

 

For troubleshooting hung creo processes, Resource Monitor is a great tool because you can filter on your different worker paths.  Next time try identifying the hung job, killing the xtop.exe process, and see if it's able to continue.
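
If you would rather script that check than use Resource Monitor, here is a minimal Python sketch that lists running xtop.exe processes through the Windows `tasklist` command. The function names are made up for illustration. Note that `tasklist` does not report the executable path, so this cannot filter by worker path the way Resource Monitor can.

```python
import csv
import subprocess

def parse_tasklist_csv(output):
    """Return [(image_name, pid, mem_usage)] from the output of
    `tasklist /FO CSV /NH`. Lines that are not process rows (such as
    the "no tasks are running" message) are skipped."""
    rows = []
    for fields in csv.reader(output.splitlines()):
        if len(fields) >= 5 and fields[1].isdigit():
            rows.append((fields[0], int(fields[1]), fields[4]))
    return rows

def list_xtop_processes():
    """Windows only: ask tasklist for every running xtop.exe."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq xtop.exe", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Once you have identified the PID of the hung job, `taskkill /PID <pid> /F` ends that one process so you can see whether the queue continues.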