The natively exposed ThingWorx Platform performance metrics can be extremely valuable for understanding overall platform performance and the operation of certain core subsystems. However, since ThingWorx is a development platform, these metrics give no visibility into what your built solution is or is not doing.
Here is an amazing little trick that you can use to embed custom performance metrics into your application so that they show up automatically in your Prometheus monitoring system. What you do with these metrics is up to your creativity (with some constraints of course). Imagine a request counter for specific services which may be incredibly important or costly to run, an exception metric that is incremented each time you catch an exception, or a query result size metric that tells you how much data is being queried from the database.
Refer to Resources > MetricsServices:
You'll need to give your metric a name - identified by key - which is meant to use dotted notation; the dots will be converted to underscores when the metric is exposed on the OpenMetrics endpoint. Use sections/domains in the dotted notation to structure your metrics in line with your application design.
COUNTER type metrics are the most commonly used and relate to things happening over time. They are a running total which gets timestamped as it is collected by Prometheus, so that you can look back in time to analyse and investigate what happened when and what the scale or impact was. After-the-fact functions and queries will need to be applied to make these metrics most useful (delta over time, increase, rate per second).
Common examples of counter type metrics are: requests, executions, bytes transferred, rows queried, seconds elapsed, execution time.
Resources["MetricServices"].IncrementCounterMetric({
basetype: "LONG",
value: 1,
key: "__PTC_Reported.integration.mes.requests",
aggregate: false
});
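To make the exception-counter idea mentioned earlier a bit more concrete, here is a minimal sketch of how this call might sit inside a service script. The Thing name, service name and the .exceptions key are placeholders for illustration; only the IncrementCounterMetric call itself comes from the example above.

// Hypothetical service script: call a downstream system and count both
// requests and caught exceptions. "MES.Gateway" and GetWorkOrders() are
// placeholder names - substitute your own Things and services.
try {
    var result = Things["MES.Gateway"].GetWorkOrders();
    Resources["MetricServices"].IncrementCounterMetric({
        basetype: "LONG",
        value: 1,
        key: "__PTC_Reported.integration.mes.requests",
        aggregate: false
    });
} catch (err) {
    // Count every caught exception so it can be graphed and alerted on in Prometheus.
    Resources["MetricServices"].IncrementCounterMetric({
        basetype: "LONG",
        value: 1,
        key: "__PTC_Reported.integration.mes.exceptions",
        aggregate: false
    });
    logger.error("MES request failed: " + err);
}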
GAUGE type metrics are a point-in-time status of something being measured.
Common gauge type metrics are: CPU load/utilization, memory utilization, free disk space, used disk space, busy/active threads.
Resources["MetricServices"].SetGaugeMetric({
basetype: "NUMBER",
value: 12,
key: "__PTC_Reported.Users.ConnectedOperatorCount",
aggregate: true
});
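A common pattern for gauges is to refresh the value on a schedule rather than on every user action. Here is a rough sketch of what a Timer- or Scheduler-triggered subscription script could look like; the Operator.SessionManager Thing and its GetConnectedOperatorCount() service are made-up placeholders for however your application determines the count - only the SetGaugeMetric call is from the example above.

// Hypothetical subscription script fired by a Timer/Scheduler event.
// GetConnectedOperatorCount() is a placeholder for your own logic that
// determines how many operators are currently connected.
var operatorCount = Things["Operator.SessionManager"].GetConnectedOperatorCount();

Resources["MetricServices"].SetGaugeMetric({
    basetype: "NUMBER",
    value: operatorCount,
    key: "__PTC_Reported.Users.ConnectedOperatorCount",
    aggregate: true    // cluster-wide value - see the note on the aggregate flag below
});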
Be aware of the aggregate flag, as it makes the custom metric cluster-level, which can have some unintended consequences. Normally you want performance metrics for the specific node, so that you can see what work is happening where and confirm that it is being properly distributed within the cluster. There are some situations, however, where you do want the cluster aggregation, as with this concurrently connected operator count.
Happy Monitoring!
Hi @geva,
This is very good information, thanks for sharing.
I'm assuming the metric itself is created automatically once a call to Increment/Decrement/Set is executed with a new name?
You're welcome Vlad. You're correct about the automatic creation of the metric name.
Hi @geva ,
KB Article - CS421138: Custom Metrics in ThingWorx
(https://www.ptc.com/en/support/article/CS421138)
States the following:
Is there a chance that the capability you described above will be removed by PTC, because it was not intended to work?
Regards,
Tom
Hi Tom - It's funny that you are calling out one of the two things I was concerned about not having mentioned in the article.
You are however correct that this approach is not officially supported, so it can and likely will change in the future. In line with the CS article and the associated R&D ticket mentioned within, we are working on a specific endpoint for custom metrics using a new metrics library (OpenTelemetry). There will certainly be a period of overlapping support between the metrics currently provided by DropWizard and OpenTelemetry as we transition, so I don't foresee this option going away any time soon.
Now you're probably wondering about the second thing that worries me. This mechanism, built for internal metrics, is likely persisted in the database, meaning that it should be used parsimoniously, as each metric will account for a row in property_vtq as well as the associated UPDATEs as the metrics change. Normally performance metrics should only be in memory, but exploiting this approach doesn't allow for that choice.
I was recently thinking about a solution for the use case where you have very quickly updating metrics that you'd like to expose, but don't want to incur the DB impact associated with updating a persisted InfoTable property (which is how these internal metrics are stored). The idea is to use two sets of metrics: one updated very frequently, in memory only - choose your own name - and the other using the above mentioned nomenclature, which you populate from the in-memory metric every few minutes. This safeguards against excessive DB UPDATEs while still allowing for high-volume metrics.
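To sketch what that could look like (the names and timings are hypothetical, and this is just one way to implement the idea): keep a non-persisted LONG property on a helper Thing as the fast in-memory counter, and flush it into the Prometheus-exposed metric from a Timer/Scheduler subscription.

// Fast path - called from your high-volume services. "Metrics.Buffer" and its
// non-persisted fastRequestCount property are placeholder names; incrementing
// an in-memory property costs no DB UPDATE.
Things["Metrics.Buffer"].fastRequestCount += 1;

// Slow path - a subscription on a Timer/Scheduler event (e.g. every few minutes)
// flushes the buffered count into the persisted, exposed metric and resets the buffer.
var buffered = Things["Metrics.Buffer"].fastRequestCount;
if (buffered > 0) {
    Resources["MetricServices"].IncrementCounterMetric({
        basetype: "LONG",
        value: buffered,
        key: "__PTC_Reported.integration.mes.requests",
        aggregate: false
    });
    Things["Metrics.Buffer"].fastRequestCount = 0;
}

Note that this simple sketch can lose a few increments if the fast path fires between the read and the reset; for most monitoring purposes that is an acceptable trade-off against the reduced database load.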