The Downsides of Depending on Multi User Load Testing to Ensure Windchill Performance

vaillan
5-Regular Member

I'm all for performance testing Windchill, so long as it's the right type of testing. Unfortunately, I often feel that testing time and resources could be better spent on areas other than the maximum concurrency the system can handle from a given set of load scripts.

In many cases, load testing Windchill means either creating custom load scripts with a tool like JMeter or SilkPerformer, or using a benchmark test provided by PTC to put a synthetic load on the system (PTC also provides a single-user test for CAD data management operations).

In my opinion, there are several problems with using a multi-user synthetic load to test the performance of a Windchill system:

  1. If the load closely mimics or represents what the ACTUAL load will be in the future, I have no issue with the tests being run. Unfortunately, what I see happening more often is that someone well-intentioned makes assumptions about a future load which aren't based on real-world data, resulting in tests that either don't test the right things or don't come close to what will actually happen when the system goes live.
  2. Often the concurrency of the test is unrealistically high. The tester wants to know what happens when 300, 500, or 1,000 concurrent users are "on the system". Having an artificially high number of users on the system, all doing things that don't represent reality, will either give a false sense that performance will be OK or, more likely, run into a limitation the tester is unaware of. Which brings me to #3.
  3. The reality is that most of these high-concurrency tests become limited by one of two things:
    • One or a small number of resource-intensive operations the test contains, in which case tuning and improving those operations is critical to the benchmark results.
    • Or the MethodServers aren't configured to handle a high concurrent load. Adjusting the DBConnections and load balancing parameters is of primary importance for load performance; adjusting the JVM garbage collection parameters and the db.properties statement cache and fetch sizes can tweak a little more throughput and performance out of the system. The unfortunate thing is that the optimal set of parameters for good benchmark performance is not the same set a production Windchill system should run with. Which leads to the question of what is REALLY being tested, and what the purpose of the test is.
      • For those interested, the properties which most impact load performance are:
        • wt.properties: wt.method.loadbalance.activeContext --> needs to be set high
        • wt.properties: wt.method.loadbalance.maxRedirects --> needs to be set low, or perhaps to 0
        • wt.properties: JVM settings; specifics will vary
        • db.properties: wt.pom.maxDBConnections --> needs to be set high
        • db.properties: wt.pom.rowPrefetchCount --> needs to be set lower
        • db.properties: wt.pom.queryLimit --> needs to be set to a lower value than a production setting, but higher than 0
        • db.properties: wt.pom.traceTiming.enable --> needs to be set to false, but doing so will stop all JDBC-related logging
  4. There are many large production Windchill systems around the world running the same code (assuming the release being tested is not a first release). If there were problems with the out-of-the-box code, other sites would have discovered them on these systems, and the fixes would have been incorporated into maintenance releases (MORs).
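To make the tuning discussion in #3 concrete, here is a sketch of what a benchmark-oriented configuration might look like. The property names are the ones listed above; the values are purely illustrative placeholders of my own, not recommendations, and as noted, they are not the values a production system should run with:

```properties
# wt.properties -- illustrative benchmark values only, NOT recommendations
wt.method.loadbalance.activeContext=20   # set high for load tests (placeholder value)
wt.method.loadbalance.maxRedirects=0     # set low, possibly 0

# db.properties -- illustrative benchmark values only
wt.pom.maxDBConnections=30               # set high (placeholder value)
wt.pom.rowPrefetchCount=10               # set lower than usual
wt.pom.queryLimit=50000                  # lower than production, but above 0
wt.pom.traceTiming.enable=false          # disables all JDBC-related logging
```

Whatever values are chosen, record them and restore the production settings afterward; a configuration tuned for benchmark throughput is not the one to go live with.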

But by far my biggest issue with load tests is that they don't stress or test the operations that are most likely to cause problems on a Windchill system when it goes live, and spending time on load tests takes time away from those things. I have looked at hundreds, maybe thousands, of systems over the years which were having either performance or stability problems, and I can only remember one case where sheer peak load from many users performing a small operation caused problems. In that case (I think it was an R8 system), an e-mail was sent to 800+ people asking them to look at something, and many of them did so within a short period of time. The code has improved since then, heaps are bigger, and today, in a properly sized system, I think this action would be less likely to cause problems. Maybe a brief slowdown, but nothing too serious.

By far the most common cause of stability problems is one big, or a few relatively large, operations running concurrently and consuming much of the available resources (MethodServer memory or Oracle capacity being among the most commonly affected), starving the other, normal operations of the ability to run quickly, if at all. When this starts to happen in a production system, Windchill will report a high number of concurrent users, giving the appearance of a load-related problem, when it's really a problem brought on by one or a small number of large transactions. Identifying these transactions and correcting their root causes are the next steps.
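The starvation pattern described above is easy to demonstrate with a toy model. The sketch below is plain Python, not Windchill code: it treats the database connection pool as a semaphore, loosely in the spirit of wt.pom.maxDBConnections. When one big operation grabs most of the pool, small operations that would normally finish almost instantly are forced to queue behind the single free connection:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Toy model: a fixed pool of "database connections". The size and the
# operation durations below are arbitrary illustrative numbers.
DB_CONNECTIONS = 5
pool = threading.Semaphore(DB_CONNECTIONS)

def operation(duration, connections=1):
    """Hold `connections` slots from the pool for `duration` seconds."""
    for _ in range(connections):
        pool.acquire()
    try:
        time.sleep(duration)  # stand-in for real work
    finally:
        for _ in range(connections):
            pool.release()

def run_small_ops(count=8, duration=0.05):
    """Run many small operations concurrently; return total wall time."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=count) as ex:
        for _ in range(count):
            ex.submit(operation, duration)
    return time.monotonic() - start

# Baseline: small operations with the whole pool available.
baseline = run_small_ops()

# Now one "big" operation grabs most of the pool first...
big = threading.Thread(target=operation, args=(1.0, DB_CONNECTIONS - 1))
big.start()
time.sleep(0.1)  # let the big operation acquire its connections

# ...and the same small operations must squeeze through one free slot.
starved = run_small_ops()
big.join()

print(f"baseline: {baseline:.2f}s, starved: {starved:.2f}s")
```

In the toy run, the same batch of small operations takes noticeably longer while the big operation is holding connections, which is exactly the signature described above: many "concurrent users" apparently stuck, caused by a single large transaction rather than by genuine load.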

Below is by no means an exhaustive list, but some of the more common patterns in which performance problems tend to appear are:

  1. BIG operations on BIG datasets. Think of a set-state, BOM report, or structure expansion request on the largest top-level structure; any operation that needs to access or touch a large amount of data is at risk of a performance problem
  2. Customized code is a notorious place to find problems relating to performance and stability
  3. Use of edge or 'new' functionality in a heavy or extreme way
  4. Searching, particularly with advanced criteria and/or across many object types, or from dialog searches such as when adding affected data to a change object
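To illustrate why pattern #1 bites: operations whose database round trips grow with the size of the dataset degrade rapidly on large structures. The toy sketch below is generic Python, not Windchill's actual implementation; it contrasts a naive structure expansion that issues one "query" per part with a batched expansion that fetches a whole level at a time:

```python
# Toy BOM: parent -> list of children. Purely illustrative data.
BOM = {
    "top": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}

queries = 0  # counts simulated database round trips

def fetch_children(parts):
    """Pretend database call: one round trip per invocation."""
    global queries
    queries += 1
    return [c for p in parts for c in BOM.get(p, [])]

def expand_naive(part):
    """One 'query' per part: round trips grow with structure size."""
    result = [part]
    for child in fetch_children([part]):
        result.extend(expand_naive(child))
    return result

def expand_batched(part):
    """One 'query' per BOM level: round trips grow with depth instead."""
    result, level = [], [part]
    while level:
        result.extend(level)
        level = fetch_children(level)
    return result

queries = 0
naive_result = expand_naive("top")
naive_queries = queries

queries = 0
batched_result = expand_batched("top")
batched_queries = queries

print(f"naive: {naive_queries} queries, batched: {batched_queries} queries")
```

On this six-part toy BOM the difference is trivial, but the naive version's round trips grow with the number of parts while the batched version's grow only with depth, which is why the largest top-level structures are where these operations first show trouble.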

My advice is to keep everything in balance. Understand that load tests have significant shortcomings in ensuring acceptable end-user performance, and that identifying and concentrating on the larger, slower operations should be a significant part of a performance assurance testing plan. Lastly, arranging for a performance engineer to analyze the data generated during a performance test (either multi-user or single-operation) will help ensure that problems exposed during the tests are identified and dealt with appropriately.
