Often, when we think about monitoring an application's health, we look to performance metrics and observe how they change over time.
However, when it comes to critical issues that need immediate attention, alerts set up on relevant ThingWorx logs are the way to notify Ops teams of events. Logs provide contextualised detail about an event that has occurred, which supports triage and directs troubleshooting.
Let me illustrate with an example: ThingWorx is a database application and requires that DB to function properly. A log message indicating that the DB connection has been severed, together with another indicating that a connection to the database cannot be established, immediately tells you that your problem is with the DB, right when it occurs, with no analysis required.
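To make the idea concrete, here is a minimal Python sketch of substring-based alerting on log lines. In practice your log management system does this matching for you; this only shows the logic. The substrings, the `Alert` class, and the `find_alerts` function are hypothetical placeholders for illustration, not actual ThingWorx log messages or APIs.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Mapping, Optional

# Hypothetical substrings for illustration only.
# These are NOT real ThingWorx log messages; substitute your own.
EXAMPLE_SUBSTRINGS = {
    "database connection lost": "DB connection severed",
    "unable to connect to database": "DB connection cannot be established",
}


@dataclass(frozen=True)
class Alert:
    reason: str  # short description of what the match suggests
    line: str    # the log line that triggered the alert


def find_alerts(
    lines: Iterable[str],
    patterns: Optional[Mapping[str, str]] = None,
) -> Iterator[Alert]:
    """Yield one Alert for each log line containing a known substring."""
    if patterns is None:
        patterns = EXAMPLE_SUBSTRINGS
    for line in lines:
        lowered = line.lower()  # case-insensitive match
        for substring, reason in patterns.items():
            if substring in lowered:
                yield Alert(reason=reason, line=line.rstrip("\n"))
                break  # at most one alert per line


if __name__ == "__main__":
    sample = [
        "2024-01-01 10:00:00 ERROR Database connection lost\n",
        "2024-01-01 10:00:01 INFO Routine housekeeping finished\n",
    ]
    for alert in find_alerts(sample):
        print(f"ALERT [{alert.reason}]: {alert.line}")
```

Matching on a substring rather than the full message keeps the rule robust to timestamps, thread names, and other fields that vary between occurrences.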
Given this, here is a list of log message substrings to use as examples when building out your own production system monitoring. They are aimed at detecting common critical or high-severity issues using your log management system (Splunk, Loki, Datadog, Elasticsearch, etc.).
ThingWorx Platform
Connection Servers
Have you found any log messages that could be added here? Post them in the comments and I'll add them to the list.