I have a customer running ThingWorx 8.4 as a backend with PostgreSQL as the database.
Their instance is deployed on a server that follows the PTC sizing recommendations for hardware.
However, I haven't found any documentation on tuning the persistence provider configuration (PostgresPersistenceProviderPackage) in platform-settings.json for production.
In addition, I'm looking to tune the Tomcat configuration for production as well.
For example, what number of threads would be recommended in the connector:
<Connector port="443" protocol="org.apache.coyote.http11.Http11NioProtocol" maxThreads="200"
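For context, a complete NIO connector element might look like the following. The attribute values are illustrative starting points I'd want validated against our own load tests, not PTC recommendations:

```xml
<!-- HTTPS NIO connector; thread and queue sizes are illustrative, not PTC guidance -->
<Connector port="443"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="200"
           minSpareThreads="25"
           acceptCount="100"
           connectionTimeout="20000"
           SSLEnabled="true"
           scheme="https"
           secure="true" />
```

Here maxThreads caps the number of concurrent request-processing threads, and acceptCount is the queue length for connections that arrive when all threads are busy; requests beyond both are refused.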
Right now the configuration of the production environment is the same as the development environment, which sometimes causes timeouts in production.
I've read the first link, but there are no recommendations for a production environment.
I wasn't aware of the second link; even though it is more PostgreSQL-oriented than ThingWorx-side tuning, it looks like good material and I'll study it.
I'm still looking for production documentation on the ThingWorx side.
The sizing guide for the ThingWorx version you are running can be found here.
Note that ThingWorx 8.4 is no longer supported so it's recommended that you upgrade to one of the 9.x releases. There have been many performance-related fixes and improvements in the newer releases.
I acknowledge that I also have not seen a tuning guide for the actual Persistence Provider configuration fields.
The one I usually modified in the past was the Max Connection Pool Size (this must always be smaller than the number of connections configured in PostgreSQL; otherwise you'd essentially instruct ThingWorx to use more connections than PostgreSQL offers).
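For reference, that pool size lives under the persistence provider block in platform-settings.json. The exact key names can vary between ThingWorx versions, so the fragment below is only a sketch of where the setting sits, not an authoritative schema — check it against your own installation's file:

```json
{
  "PersistenceProviderPackageConfigs": {
    "PostgresPersistenceProviderPackage": {
      "ConnectionInformation": {
        "jdbcUrl": "jdbc:postgresql://db-host:5432/thingworx",
        "userName": "twadmin",
        "password": "<keystore or encrypted value>",
        "maxConnections": 100
      }
    }
  }
}
```

Whatever value you set here should stay below max_connections in postgresql.conf, leaving headroom for superuser and maintenance sessions.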
But in your situation I do not believe such a guide would be useful, because you still need to understand why the timeouts happen.
Are they HTTP timeouts or DB-level timeouts? Typically in ThingWorx applications I see a lot of DB-level stress due to various factors. Did you monitor the PostgreSQL server to see what exactly causes the timeouts? In many cases, code improvements actually provided the most benefit.
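One quick way to check whether the timeouts correlate with long-running statements is to inspect pg_stat_activity on the PostgreSQL server (a standard system view, available out of the box):

```sql
-- list non-idle sessions, longest-running first
SELECT pid,
       now() - query_start AS duration,
       state,
       left(query, 120)    AS query_snippet
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY duration DESC NULLS LAST;
```

If the same statement keeps appearing at the top during a timeout, that query (rather than server sizing) is usually the thing to fix first.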
The timeout is on the ThingWorx side but is caused by a very long-running SQL query. We identified that the problem was a missing index on the PostgreSQL side, and we have fixed that.
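For anyone hitting the same issue, the fix was of this general form — a plain B-tree index on the column the slow query filters on (the table and column names below are hypothetical placeholders, not our real schema):

```sql
-- hypothetical names; CONCURRENTLY avoids blocking writes while the index builds
CREATE INDEX CONCURRENTLY idx_mytable_entry_time
    ON mytable (entry_time);
```

CREATE INDEX CONCURRENTLY takes longer than a plain CREATE INDEX but lets the application keep writing to the table during the build, which matters on a live production database.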
However, while the DB server has been sized based on the documentation, it sees very little usage relative to the load the application generates.
CPU and RAM usage peak at 10-15%, which is why I was wondering whether raising or tweaking some values would help make the most of the machine's power.
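For reference, these are the kinds of PostgreSQL memory settings I had in mind. The values below are generic starting points for a dedicated 16 GB database server, taken from common community guidance rather than any PTC document, so they would need validating against our workload:

```ini
# postgresql.conf -- illustrative values for a dedicated 16 GB server
shared_buffers = 4GB            # commonly ~25% of RAM
effective_cache_size = 12GB     # planner hint, ~75% of RAM
work_mem = 32MB                 # per sort/hash node, per query
maintenance_work_mem = 512MB    # index builds, VACUUM
max_connections = 150           # must stay above the ThingWorx pool size
```

Note that work_mem is allocated per sort or hash operation, so a single complex query can use several multiples of it; raising it too far on a busy server can exhaust RAM.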
Thank you for the additional insight.
You need to keep in mind that our standard sizing documentation (not the sizing tests) contains recommendations that are far higher than what some real-world applications actually require. Simply put, this is because we cannot estimate what your application will do.
I know people, for example, who are running ThingWorx on a Raspberry Pi (to be clear, this is NOT supported).
What I would do in your situation: if the application works normally, downsize the DB machine as much as possible until you see usage of 60-70%.
I would do the same for the ThingWorx machine. You need some monitoring to see the hourly usage over one week, checking for spikes, etc.
Note: everything will be fine under normal usage; what will impact you are the types of failures you encounter AND the actions that third-party systems perform on ThingWorx, which you cannot control. For example, if you use store-and-forward on your Edge devices and all of them disconnect for, say, a day, you'll be hit with a spike of property writes when they all reconnect, which will impact you (you need to simulate these failure types to understand what can be tuned).
Also, other systems might trigger a service call on ThingWorx with a huge JSON payload, and again, a lot of processing will happen.
Not many modifications can be made to the Persistence Provider for these situations (except maxConnections). Let us know if this happens, though, and we can chime in with some ideas.