
IoT Tips

ThingWorx DevOps
By Victoria Firewind, IoT EDC

This presentation accompanies a recent Expert Session, with video content (including demos of the topics below) found here.

DevOps is a process for taking planned changes through development, through testing, and into production, where they can be accessed by end users. One test instance typically has automated tests (integration testing) which ensure application logic is preserved in spite of whatever changes the developers are making, and often there is another test instance to ensure the application is usable (UAT testing) and able to handle a production load (load testing).

A DevOps pipeline starts with a task manager tasking out planned changes, where each task will become a branch in the repository. Each time a new branch is created, a new pipe is needed, which in this case is produced by Docker Hub. Developers then make changes within that pipe, which flow along the pipe into testing. In this diagram, testing is shown as the valve which, when open (i.e. when all tests pass), allows the changes to flow along the pipe into production. A good DevOps process has good flow along the various pipelines, with as much as possible automated or scripted to reduce the chances for errors in deployments. In order to create a seamless pipeline, whether or not it winds up automated, several third-party tools are useful.

Container software is a very good way to improve the maintainability and updatability of a ThingWorx instance, while minimizing the amount of resources needed to host each component.
1. Create the Docker image. Consult the Help Center if need be, and update your YML file with everything you need before starting the image (see the example in the PTC Community). License the instance using the license management website, and follow the instructions from Docker for installing the necessary tools: Docker itself (docker) and Docker Compose (docker-compose).
2. Save the Docker image in a Docker repository. Docker Hub has some free options, and if a license is purchased, it can host more than a single Docker image and tag. It is also possible to set up your own Docker registry.
3. Access the image in Docker Desktop. Download Docker Desktop and sign in to the Docker account which hosts the repository. Create some folders for storing the h2.env file and the ThingworxStorage and ThingworxPlatform mounted folders. Remember to license these containers as well: developers log in to the license management site themselves and put the resulting file ("license_capability_response.bin") into the ThingworxPlatform mounted folder.

Git is a very versatile tool that can be used through many different mediums, like Azure DevOps or GitHub Desktop. To get started as a totally new Git user, try downloading GitHub Desktop on your local machine and creating a local repository with the provided sample code. This can then be cloned on a Linux machine, presumably whichever server hosts the integration ThingWorx instance, using the provided scripts (once they are configured). Remember to install Git on the Linux machine, if necessary (sudo apt-get install git).

A sample ThingWorx application (which is not officially supported, and is provided just as an example of how to do DevOps-related tasks in ThingWorx) is attached to this post in a zip file, containing two directories, one for scripts and one for ThingWorx entities.
Copy the Git scripts and config file into the top level, above the repository folder, and update the GitConfig.sh file with the URL for your Git repository and your login credentials. These scripts can then be used to sync your Linux server with your Git repository, which any developer can easily update from their local machine. This also keeps changes secure and enables the use of other DevOps procedures like tasks, epics, and corresponding branches of code.

Steps to DevOps using the provided code as an example:
1. Clone the repository into the SystemRepository (or any other created repository) using the provided scripts in a Linux environment.
2. Import the DeploymentUtilities entity, which again is scripted for Linux or for use with a development IDE with bash support.
3. Import the ThingWorx application from source control, or use the script (which itself makes use of that DeploymentUtilities entity).
4. Now create some local changes, add things, etc., and try out the UpdateApplication script, or export to source control and then push to the Git repo. Data and localization table exports are also possible.
5. Run the tests using the provided IntegrationTester thing, create your own by overriding the IntegrationTestTS thing shape, or use the TestTwxApplication script from a Linux terminal.
6. Design a process for your application which allows for easy application exports and updates to and from a repository, so that developers can easily send in their changes, which can then be easily loaded and tested in another environment.

In conclusion: DevOps is a complex topic, and every PTC customer will have their own process built around their unique requirements and applications. In the future, more mature pipeline solutions will be covered, including ones that also publish to Solution Central for easier deployment between the various testing instances and production.
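As an illustration of how one of these steps could be automated from a pipeline runner rather than a bash shell, here is a minimal Python sketch that invokes a service on the DeploymentUtilities thing through the standard ThingWorx REST endpoint. The server URL, application key, service name, and input parameters are placeholders (the attached sample defines its own scripts and services), so treat this as a pattern to adapt rather than a drop-in script.

import requests

TWX_URL = "https://twx-test.example.com/Thingworx"   # placeholder test server
APP_KEY = "your-application-key"                     # placeholder application key

# POST /Things/<thing>/Services/<service> is the standard ThingWorx service endpoint;
# "ImportApplication" and its inputs are hypothetical names for this example.
resp = requests.post(
    f"{TWX_URL}/Things/DeploymentUtilities/Services/ImportApplication",
    headers={
        "appKey": APP_KEY,
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json={"repositoryName": "SystemRepository", "path": "/twx-source-control"},
)
resp.raise_for_status()
print("Import triggered, status:", resp.status_code)

A call like this can be dropped into whatever task runner triggers on new branches or merges, so that imports and test runs happen without anyone logging into Composer.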
ThingWorx Performance Monitoring with Grafana
Authored by EDC team member Desheng Xu (@xudesheng)

Monitoring ThingWorx performance is crucially important, both during the load testing of a newly completed application and after the deployment of new code in an existing application. Monitoring performance ensures that everything works as expected at the enterprise level. This tutorial steps you through installing and configuring a tool which runs on the same network as the ThingWorx instance, collects data from the Platform, and translates it into something visual and easy to understand via Grafana.

tsample is small and customizable, and it plays a similar role to telegraf. Its focus is on gathering ThingWorx performance metrics. Historically, this tool also supported collecting OS-level performance metrics, but this is no longer supported. It is highly recommended to collect OS-level performance metrics using telegraf, a tool designed specifically for that purpose (and not discussed here). This is not the only way to go about monitoring ThingWorx performance, but this tool uses a very good approach that has been proven effective both at customer sites and internally by PTC to monitor scale tests. Find the most recent release here.

Recommended Deployment Architecture
tsample can be deployed on the same box where ThingWorx Tomcat is running, but it is recommended to deploy it on a separate box to minimize any performance impact caused by the collector. tsample supports export to InfluxDB and/or a local file. In this document, it is assumed that InfluxDB will be used for monitoring purposes. Please note that this is not the same instance of InfluxDB being used by ThingWorx (if configured). This article will not cover setting up InfluxDB or NGINX (if necessary), so please configure these before beginning this tutorial.

Supported Platforms
tsample has been tested on Windows 2016, MacOS 10.15, Ubuntu 16.04, and Red Hat 7.x. It is anticipated to work on more general Ubuntu/Red Hat/Mac/Windows releases as well. Please leave a comment or contact the author, @xudesheng, if Raspberry Pi support is needed.

Configuration File
Where to Store the Configuration File
tsample will pick up the configuration file in the following sequence:
From the command line: ./tsample -c <path to configuration file>
From the environment, on Linux: export TSAMPLE_CONFIG=<path to configuration file> followed by ./tsample; on Windows: set TSAMPLE_CONFIG=<path to configuration file> followed by tsample.exe
From a default location: tsample will try to find a file named "config.toml" in the same folder in which it starts.

How to Craft a Configuration File
You can use the following command to generate a sample file: ./tsample -c config.toml -e (or: ./tsample -c config.toml --export). A file named "config.toml" will be generated with a sample configuration. You can then adjust its content in accordance with the following.

Configuration File Content Format
The configuration file must be in TOML format.
title and owner sections: Both sections are optional. The intention of these two sections is to support a documentation tool in the future.
TestMachine section: This section is required; it defines where this tool will run.
thingworx_servers section: This section is where you define the targeted ThingWorx applications. Multiple ThingWorx servers can be defined, with the same or different metrics to be collected.
thingworx_servers.metrics sections
Underneath each thingworx_servers section, there are several metrics. In the default example, the following metrics are included: ValueStreamProcessingSubsystem, DataTableProcessingSubsystem, EventProcessingSubsystem, PlatformSubsystem, StreamProcessingSubsystem, WSCommunicationsSubsystem, WSExecutionProcessingSubsystem, TunnelSubsystem, AlertProcessingSubsystem, and FederationSubsystem. You can add your own customized metrics, as long as the result follows the same Data Shape. The default Data Shape has 3 columns; if the output Data Shape exceeds this limit, the tool will likely not work properly.

result_export_to_db section: This section defines the target InfluxDB instance as a sink for the collected performance metrics.
result_export_to_file section: This section defines the target file storage for the collected performance metrics.

Grafana Configuration Example: Monitor a Value Stream
Step 1. Connect Grafana to InfluxDB.
Step 2. Create a New Dashboard.
Step 3. Create a New Query. Depending on which metrics you chose to collect in the tsample configuration file, you will see a different choice of measurements in Grafana. Here, we will use ValueStreamProcessingSubsystem as an example.
Step 4. Choose the Right Platform and Storage Provider. Some metrics depend on the database storage provider, like Value Stream and Stream.
Step 5. Choose the Metrics Figures. Select "remove" to get rid of the default 'mean' calculation, and select "non_negative_difference" from Transformations. Using this transformation, Grafana can show us the speed of writes. Then remove the default GROUP BY "time" clause and assign a meaningful alias to this query.
Step 6. Add Another Query. You can add another query as 'Value Stream Queued Speed' by following the same steps.
Step 7. Assign a Panel Title.
Step 8. Review the Result. Go back to the dashboard page and select "last 15 minutes" or "last 5 minutes" from the top right corner. It should show a result similar to the chart below.
Step 9. Save the Dashboard. Don't forget to save your dashboard before adding more panels.
Step 10. Refine the Panel. It's difficult to figure out the high-level write speed from the above panel, so let's enhance it. Add a new query with the following configuration. In the above query, there are two additional figures, 20s and 1m; how do you choose? The 20s should be the same as sampling_cycle_inseconds in your tsample configuration file; if you choose a different value, you could end up with misleading results. Larger values such as "1m" may give you a smoother result, but they could also hide system instability. Going larger than 1m is not recommended in most cases. With this new query, it's much easier to figure out what the average write speed is in the current test. Tip: if your sampling_cycle_inseconds is 30s, then you may not need this additional query. The first image below is a sample at the 30s interval; no additional average query is needed to get a smooth write speed. The next example is a sample at the 10s interval; without additional queries, you may not be able to get a meaningful understanding of the write speed. Based on these three examples, it's recommended to configure the sampling interval at 30s, or anything larger than 20s, and then decide whether you need additional queries based on the visualization result.
Step 11. Further Refinement. The above charts illustrate the queuing and writing speed.
However, it is possible for the Value Stream to perform at a reasonable speed while the Value Stream queue keeps growing and eventually exceeds its capacity, so let's add another query to monitor this. This chart is difficult to read at first, since the new query has a different value range; moving it to a second y-axis on the right makes the view much easier to read. The current queue size (or remaining queue size) will always move up and down; it is healthy as long as it does not continue to grow to a high level.

What Else Can Be Monitored?
The following metrics are monitored by most customers: Value Stream write and queue speed, Value Stream queue size, Stream write and queue speed, Stream queue size, Event performed speed (completedTaskCount), Event submitted speed (submittedTaskCount), Event queue size, WebSocket communication, and WebSocket connections.

ThingWorx Memory Usage Monitoring
Create a new panel and add a new query. In a running system, memory usage will always move up and down, at times sharply or quickly, when the system is busy. The system is healthy as long as memory doesn't go up continuously or stay at a maximum for a long period of time.

Conclusion
Setting up monitoring is absolutely crucial to managing the performance of an enterprise ThingWorx application. Using Grafana makes tracking and visualizing the performance much easier. Stay tuned to the EDC tag for more monitoring tips to come!
Remote Monitoring of Assets Benchmark

As @ttielebein introduced previously, one of the missions of the IOT Enterprise Deployment Center (EDC) is to publish benchmarks that showcase the ThingWorx Platform deployed to solve real-world IOT business problems. Our goal is that these benchmarks can be used as a reference or baseline for architects working on their own implementations, showing not only a successful at-scale implementation, but also what happens when that same implementation is pushed to, or even past, its limits.

Please find the first installment attached: a reference benchmark demonstrating ThingWorx deployed to monitor 15,000 assets with a high volume of data properties per asset. Over 250 hours of simulations were conducted as part of producing this benchmark.

The IOT EDC team will be monitoring this post (as well as our other posts in the IOT Tech Tips forum) to answer any questions we can about the approaches taken in designing, deploying, and simulating this implementation. As the team publishes more benchmarks like this in the future, we also greatly value any feedback you have that can help us improve the content of future documents.
Developing Great IoT Solutions Brought to you once again by your EDC team, find attached here a brand-new, comprehensive overview of ThingWorx best practices! This guide was crafted by combining all available feedback, from support cases to PTC Community threads, and tapping all internal resources. Let this guide serve to bridge the knowledge gaps ThingWorx developers most commonly see.    The Developing Great IoT Solutions (DGIS) Guide is a great way to inform both business and technically minded folks about the capabilities of the ThingWorx Platform. Learn how to design good solutions from a high-level, an overview designed specifically with the business audience in mind. Or, learn how to implement good IoT designs through a series of technical examples. Start from very little knowledge of the Platform and end up understanding data structures and aggregation, how to use the collection widget, and how to build a fully functional rules engine for sending and acknowledging alerts in ThingWorx.   For the more advanced among us, check out the Appendix. Find here a handy list of do's and don'ts surrounding ThingWorx best practice in development, with links to KCS, Help Center, and Community content.   Reinforce your understanding of the capabilities of the ThingWorx Platform with this guide, today!   A big thanks to all who were involved on this project! Happy developing!
Hi All,

Our expert session, ThingWorx Flow Overview, is tomorrow! Click the link below to register, and remember to mention it to colleagues who might benefit from its content.

Expert Session: ThingWorx Flow Overview
Date and Time: December 10th, 8h00 EST
Duration: 1 hour
Hosts: Antony Moffa and Vinay Vaidya, ThingWorx IoT Platform Senior Directors
Registration Here: https://www.ptc.com/en/customer-success/expert-sessions-for-thingworx-foundation-webcasts

See you there!

Here are other upcoming sessions that might be of interest:
Upgrade to ThingWorx 9 – How to Plan / Evaluate Impacts: This session will highlight the key points you should evaluate to properly plan your upgrade to ThingWorx 9. Register Here.
Active-Active Clustering: This session will cover the main aspects of the High Availability Clustering feature launched with the ThingWorx 9.0 release. Register Here.
Persistent vs. Logged Properties By Mike Jasperson, VP of IoT EDC   Executive Summary ThingWorx provides several different “aspects” (or storage options) for how property values are saved.  These options each have different implications for performance and scalability.  Understanding those implications is important for designing a scalable IOT solution.   Persistent Properties are best used for non-telemetry data which will change infrequently (for example only a few times in a day) and where historical values are not required.  When overused, Persistent properties can put significant pressure on the database layer of your ThingWorx implementation, leading to poor performance of your IOT application.  As the number of Things in your IOT application scales up, the quantity or frequency of persistent properties per Thing needs to be carefully considered.   Logged Properties are best used for telemetry data where historical values need to be retained, but also for any other value that is expected to change frequently.  Logged properties can create some additional requirements: a process for handling null/default values after restarts, more disk space, and a data retention policy. There are benefits as well, though, like more flexibility and scalability for the ingestion of larger volumes of data.   Persistent + Logged Properties perform database operations of both aspects.  Combined use should be very limited – only properties that update infrequently (a few times a day), and that must be in-memory in the event of a ThingWorx restart.   In-Memory Only Properties are neither persistent nor logged – they are not stored to the database.  These properties can greatly improve scale for values that need to be available for the application to drive UIs or compute other derived values that will be stored.  However, high-frequency updates of in-memory properties can create scale challenges in HA (high availability) ThingWorx configurations where memory state needs to be constantly shared between multiple ThingWorx nodes.     Find a complete summary as well as example cases in the document attached.
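One practical way to see the difference between these aspects from outside Composer is through the ThingWorx REST API: the current (in-memory) value of a property can always be read directly, while historical values only exist if the property is logged to a Value Stream. Below is a minimal Python sketch of that comparison; the server URL, application key, thing name, and property name are placeholders, and the exact service parameters may vary by ThingWorx version.

import requests

TWX_URL = "https://twx.example.com/Thingworx"        # placeholder server
HEADERS = {
    "appKey": "your-application-key",                # placeholder application key
    "Accept": "application/json",
    "Content-Type": "application/json",
}
THING = "MyAsset"        # placeholder thing name
PROP = "Pressure"        # placeholder property name

# Current value: available regardless of the persistent/logged aspects chosen
current = requests.get(f"{TWX_URL}/Things/{THING}/Properties/{PROP}", headers=HEADERS)
print("current value:", current.json())

# Historical values: only returns rows if the property is logged to a Value Stream
history = requests.post(
    f"{TWX_URL}/Things/{THING}/Services/QueryPropertyHistory",
    headers=HEADERS,
    json={"maxItems": 10},
)
print("recent history:", history.json())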
When to Include InfluxDB in the ThingWorx Development Lifecycle (this article is also available for download as a PDF attached)   The Short Answer InfluxDB is a time series database designed specifically for data ingestion. Historically, InfluxDB has been viewed as a high-scale expansion option for ThingWorx: a way to ensure the application works as intended, even when scaled up to the enterprise level. This is certainly one way to view it, because when there are many, many remote things, each with a lot of properties writing to the Platform at short intervals, then InfluxDB is a sure choice. However, what about in smaller applications? Is there still a benefit to using an optimized data ingestion tool in any case? The short answer is: yes, there is!   Using InfluxDB for optimized data ingestion is a good idea even in smaller-sized applications, especially if there are plans to scale the application up in the future. It is far better to design the application around InfluxDB from the start than to adjust the data model of the application later on when an optimized data ingestion process is required. PostgreSQL and InfluxDB simply handle the storage of data in different ways, with the former functioning better with many Value Streams, and the latter with fewer Value Streams. Switching the way data is retained and referenced later, when the application is already on the larger side, causes delays in growing the application larger and adding more devices. Likewise, if the Platform reaches its ingestion limits in a production environment, there can be costly downtime and data loss while a proper solution (which likely involves reworking the application to work optimally with InfluxDB) is implemented.   Don’t think that InfluxDB is for expansion only; it is an optimized ingestion database that has benefits at every level of the application development lifecycle. From the end to end, InfluxDB can ensure reliable data ingestion, reduced risk of data loss, and reduced memory and CPU used by the deployment overall. Preliminary sizing and benchmark data is provided in this article to explain these recommendations. Consider how ThingWorx is ingesting data now, how much CPU and other resources are used just for acquiring the data, and perhaps InfluxDB would seem a benefit to improve application performance.   The Long Answer In order to uncover just how beneficial InfluxDB can be in any size application, the IoT Enterprise Deployment Center has run some simulations with small and medium sized applications. The use case in the simulation is simple with user requests coming from a collection of basic mashups and data ingestion coming from various numbers of things, each with a collection of “fast” and “slow” properties which update at different rates. This synthetic load of data does not include a more complete application scenario, so the memory and CPU usage shown here should not be used as sizing recommendations. For those types of recommendations, stay tuned for the soon-to-release ThingWorx 9.0 Sizing (or check out the current 8.5 Sizing Guide).   Comparing Runs When determining the health of the ThingWorx Platform, there are several categories to inspect: Value Stream Queue Rate and Queue Size, HTTP Requests, and the overall Memory and CPU use for each server. Using Grafana to store the metrics results in charts like those below which can easily be compared and contrasted, and used to evaluate which hardware configuration results in the best performance. 
The size of the numbers on the vertical axis indicates the total resources used for that metric, while the slope or trend of each chart indicates bottlenecks and inadequate resource allocation for the use case.

In this case, all darker charts represent data from PostgreSQL-only configurations, while the lighter charts represent the InfluxDB instances. Because this is not a sizing guide, whether each of these charts comes from the small or medium run is unimportant as long as they match (for valid comparisons between configurations with Influx and without it). The smaller run had something like 20k Things, and the larger closer to 60k, both with 275 total Platform users (25 Admins) and 3 mashups, which were each called at various refresh rates over the course of the 1-hour testing period. Note that in the PostgreSQL-only instances, there were more Thing Templates and corresponding Value Streams. This change is necessary between runs because only with fewer Value Streams does InfluxDB begin to demonstrate notable improvements.

The most important thing to note is that the lighter charts clearly demonstrate better performance for both size runs. Each section below will break down what the improvement looks like in the charts to show how to use Grafana to verify the best performance.

Value Stream Queue
The vertical axis on the Value Stream Queue Rate chart shows how many total writes per second (WPS) the Platform can handle. The average is 10 WPS higher using InfluxDB in both scenarios, and InfluxDB is also much more stable, meaning that the writes happen more reliably. The Value Stream Queue Size chart demonstrates how well the writes within the queue are processed. Both of these are necessary to determine the health of data ingestion.

If the queue size were to increase and trend upward in the lighter Queue Size chart, then that would mean the Platform couldn't handle the higher ingestion rate. However, since the Queue Size is stable and close to 0 the entire time, it is clear that the Platform is capable of clearing out the Value Stream Queue immediately and reliably throughout the entire test.

Figure 1 – These represent the data getting stored into the data provider. Note: the former is much lower than the latter.
Figure 2 – Note the data loss in the non-Influx instance (the queue in green reaches the max in yellow). The Influx instance has less trouble clearing out the queue, as demonstrated by the consistently low queue size.

HTTP Requests
Taking the strain of ingestion off of the Platform's primary database frees its resources up for other activities. This in turn improves the performance and reliability of the Platform when responding to HTTP requests, which in a typical application are used to aggregate data into smaller data stores (depending on the use case) and to render the mashups for the end users. The business logic and mashups can be more complex when there is one database designated for ingestion (InfluxDB) and one for everything else (PostgreSQL).

Figure 3 – The darker chart shows a lot of choppiness, meaning that while the Platform was responding the whole time, it was not doing so reliably. The smoother second chart shows how much more easily the Platform can handle these requests when the load is distributed intelligently across multiple servers, each optimized for the type of data they receive. The "staircase" shape occurs because the simulator increases the workload every 10 minutes until it breaks.
Likewise, the nature of Postgres lends itself well to this differentiation, given that there are many more database tables required for supporting the HTTP requests, something Postgres does well. That leaves Influx to handle the time-series data and ingestion, which are the primary strengths of that software as well. So, splitting the load across multiple servers in this way results in smaller server sizes overall, each of which is streamlined and optimized to handle exactly what it is given by the Platform.

Note that in both of these charts, there are no bad requests, so both would seem to be successful runs. However, as later charts demonstrate more clearly, there is a catastrophic failure when the load is increased around 12:30p. The simulation ends before the server begins to show any real symptoms of the issue, which is why there are no bad requests. The maximum Operations Per Second (OPS) in the Hardware Specifications and Performance section is taken from before the failure begins.

Clearly the InfluxDB instance has better performance, given that the average Operations Per Second (OPS) is substantially higher, nearly 4 times what is seen in the PostgreSQL-only instance. Of course, how well the Platform manages the business logic and mashup loading will depend on a lot of factors. In this test scenario, the OPS was increased by increasing the mashup refresh rate on the InfluxDB instances (which could handle over double the operations). Likewise, the number of Stream writes to the PostgreSQL database could be double what it was when PostgreSQL was the only database. Therefore, configuring InfluxDB for the data ingestion and leaving Postgres for the rest of the application certainly makes the load much easier on the Platform, and the same would be true even in a much more complex scenario.

Memory and CPU
The important thing here is to keep the memory use low enough that any spikes in usage won't cause a server malfunction. CPU usage should stay at or below around 75%, and memory should never exceed around 80% of the total allocated to the server. The sizing guides can help determine what this allocation of memory needs to be.

Of note in these charts is the slight upward slope of the CPU usage in the darker chart, indicating the start of a catastrophic failure, and the difference in the total memory needed for the ThingWorx Platform and Postgres servers when Influx is used or not. As is apparent, the servers use much less memory when the database load is split up intelligently across multiple servers.

Figure 4 – The ThingWorx CPU is about the same here as in the InfluxDB configuration below, but look at how much more memory both the Platform and the Postgres database need allocated to them in this configuration (64 GB apiece). Also note the jump in CPU and memory usage after 12:30p. This is referenced in the previous section, and the upward slope of the usage after that point indicates the start of a catastrophic failure. The test ends too soon to see any symptoms of failure, but it is a sure thing after the increase in load around 12:30p.
Figure 5 – Influx needs an extra server, but the size of the Influx and Postgres servers together is less than half the size of that required for the single Postgres database in the Postgres-only configuration (8 GB). ThingWorx is smaller too (32 GB).

Hardware Specifications and Performance
These are the exact specifications for each simulated instance, broken down by size and by whether InfluxDB is configured or not.
Note that some of the hardware specifications may be more than is necessary, depending on the real-world use case. As stated previously, this document is not a sizing guide (use the official ThingWorx Sizing Guide). Note that the maximum numbers of WPS and OPS are shown here. The maximums are higher in the InfluxDB scenarios, meaning that even with smaller-sized servers, the InfluxDB configurations can handle much greater loads.

Summary
In conclusion, if InfluxDB may at some point be needed in the lifecycle of an application, because the expected number of things or the number of properties on each thing is large enough that it would otherwise max out the limits of the Platform, then InfluxDB should be used from the very start. There are benefits to using InfluxDB for data ingestion at every size, from performance to reliability, and of course improved scalability as well.

Reworking the application for use with InfluxDB later on can be costly and cause delays. This is why the benefits and costs associated with an InfluxDB-centric hardware configuration should be considered from the start. More servers are required for InfluxDB, but each of these servers can be sized smaller (depending on the use case), and all of this will affect the overall cost of hosting the ThingWorx application. The benefits of InfluxDB are especially pronounced when used in conjunction with clusters, which will be demonstrated fully in the 9.0 Sizing Guide (soon to be released). If InfluxDB is used to interface with the clusters, then there are even more resources to spare for user requests.

It is considered ThingWorx best practice for high-ingestion customers to make use of InfluxDB in applications of any size. Note, though, that this means the number of Value Streams per Influx database will need to be limited to single digits. We hope this helps, and from everyone here at the EDC, happy developing!
ThingWorx DevOps with Jenkins
DevOps as a topic is vast and has been addressed many times throughout the history of the PTC Community. Previous posts address what DevOps is, teach how to make use of DevOps like a pro, announce updates to the PTC Git Extension, and explain why this extension is so helpful for achieving continuous Git integration with ThingWorx.

This post provides a PDF guide on Jenkins integration with ThingWorx, including tutorials with detailed information on how to set up your ThingWorx instance and how to configure your Jenkins Pipeline. The PDF is listed for download separately, but it is also included in the zip with the other required files for the tutorial. The Jenkins Pipeline provided here is intended as an example and starting point for managing your DevOps in ThingWorx and can easily be extended. Please note that this Pipeline is not officially supported by PTC.
JMeter for ThingWorx

Overview
Apache JMeter is an open-source tool designed for load testing and measuring the performance of a web application. JMeter has a wide range of features to facilitate this testing, including support for a variety of server and protocol types, a full-featured testing IDE with the ability to record the test steps from either a browser or a native application, and built-in debugging tools. Information about JMeter can be found on Apache's website.

Working with JMeter is not always intuitive, but it also isn't that much harder than regular software development. Take some time to explore the official Apache JMeter documentation and figure out where things go and how to mechanically make use of the JMeter IDE. Then step through this tutorial to create a basic test that logs in to ThingWorx, accesses a mashup, and clicks on a few widgets. This is the first in a series to come, courtesy of IoT EDC Engineer Tim Atwood (@atwood) and the whole EDC team.

Installation
Download JMeter from Apache's website. Unpack the archive and copy the files to a desired location. Run the application by double-clicking on the "ApacheJMeter.jar" file within the bin directory. JMeter is now installed and ready to use.

Creating a Test
Set up a proxy in your browser of choice (or on the OS in settings). Select the green "templates" icon in JMeter, and then select "Recording" for the template. Configure the recording template to point towards your ThingWorx Navigate or Foundation server, then click "Create". Hit "Start" under the "HTTP(S) Test Script Recorder" tab of the new JMeter project. Make sure the port is set correctly under Global Settings.

A pop-up box will appear that always stays visible on top of the active browser window, so that the recording can be controlled and stopped at any time. Leave the "Transaction name" field empty so that each transaction recorded by the software is automatically named after the web request (this helps differentiate one from another, and they can each be renamed later).

Open your browser, and navigate (via direct URL if possible, to keep things simple) to the mashup you wish to test. Log in and let the page load. Click on anything you'd like on the mashup to capture the activity of that test. Then click "Stop" on the pop-up recorder window to stop the recording. Each transaction will be assigned an index as well, and the source code behind each of these transactions can be reviewed and manually modified in the main JMeter window. Here is the login request, for instance:

The HTTP Authorization Manager is used to automatically authorize a defined user login for the thread to any of the Base URLs listed. In this case, though, there are two separate servers being accessed during the test, and one may need to be added manually:

Save the project before continuing, as manual modifications come next.

Within the task page, as you do the recording, a set of parameters or body data will be recorded. Modifying this is how you parametrize the test scenario with variables like the username and password. To simulate logging in as other users, you have to parameterize this rather than relying on the administrator account name and password entered into the browser.
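Since the user credentials will be pulled from a CSV file in the later steps, it can be convenient to generate that file with a short script instead of typing it by hand. Here is a minimal Python sketch assuming a simple two-column layout (username, password); the users listed are placeholders, and the column order must match the Variable Names configured in the JMeter CSV Data Set Config element.

import csv

# Placeholder test users; replace with accounts that exist on your test server.
users = [
    ("Administrator", "changeme"),
    ("test.user1", "password1"),
    ("test.user2", "password2"),
]

with open("users.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(users)   # no header row: JMeter reads the columns by position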
Rename the task controller to "MyTasks" or something more easily identified than the long string it has now:

Some recorded items, like static images and stylesheets, will be non-essential: things the browser processes for better graphical representation, but which are often cached and do not greatly affect the scalability results of the test. These can be highlighted and disabled all at once:

Also ensure that any cascading stylesheets have been disabled. Enable the "View Results Tree" to ensure you can review the results of the test script during the editing phase. However, this "Listener" element has a high memory footprint during test execution, so it should be disabled before running an actual scale test.

Next we need to parametrize the user login information and pull it from a CSV file. The colon means that "Administrator" is the default user to use for login. You can add other properties as well, like ramp-up time, run time, number of users, and protocols to use. The ramp-up time determines how quickly the threads are allocated for the test, which, if done slowly enough, prevents a thundering herd scenario. In more complex scenarios, logic controllers can be inserted to control the flow of the test. This allows for options such as if-then conditions for different user permissions, or parameter-based routes for better randomization of actions in different threads. This will be covered in more detail in a future article.

Pre- and Post-Processors can be used as well, with the latter being used here much more than the former, to extract information from a response and then use it in the variables going into one of the follow-up requests. For example, see the script in this image: this one extracts a variable from the object number property, defined in the CSV file, and converts it into another variable that is used in subsequent scripts. The script uses the object number reference to pull the name out of the body data and make the request, which is then post-processed by several extractors. One is a JSON extractor which pulls an ID out of the JSON response; there is also a regular expression extractor and a BeanShell post-processor, which populates some variables based on the response. Once it extracts all of the variables from the response to this particular request (GetSearchResults in this case), it then tailors the additional requests based on these.

Customize the script according to the needs of your own application. Alternate between recording and manually modifying the recorded code to ensure the test performs exactly as required, and from the perspective of different users with different permissions. Also vary the type of activity performed on the mashup. Highlight the "View Results Tree" tab and click the green start button at the top of the window to see the results appear.

If you are getting an unauthorized message, ensure that the scope is right for the login information, which may require moving the "HTTP Authorization Manager" component around in the project. Be sure to check the URLs and credentials entered for each type of user. Occasionally the recorder will insert a long authentication string into the URL, so manually set the URL for the credentials to the most generic URL possible for the server.
This can be parametrized too, referencing the CSV file defined here, which looks like this for a more complicated scenario (covered in a future article). The columns here represent the username, password, object number in Windchill, and object name in Windchill, as well as the wait time used to vary the way the logic is executed, and some extra variables which tell the switches what to do in order to create a more varied and realistic test.

Conclusion
Following these steps again and again on the various mashups throughout an application can ensure that a script for each web page, and each type of user on each web page, is created and added to the testing suite. This results in a load test that is closely representative of the real-world user load placed on an application. Load testing is a critical part of the development lifecycle in any application, and ThingWorx is no exception. Any further questions about the capabilities of JMeter not covered here can be answered by the full JMeter user manual, found on the Apache website. Future articles will include some basic scripts that test basic things, which can serve as examples for more complex ThingWorx JMeter script development. Here is an example of one tool PTC uses for internal QA of ThingWorx, designed to load test a Navigate application (specifically its built-in mashups):

Something similar to this tool may be available for public use later this summer. In the meantime, feel free to use the tutorial above to create scripts of your own. Any issues building your custom load tests in JMeter can be discussed right here on this thread with our JMeter experts. Happy developing!
Load Testing through C SDK Remote Device Simulation in ThingWorx   As discussed in the EDC's previous article, load or stress testing a ThingWorx application is very important to the application development process and comes highly recommended by PTC best practices. This article will show how to do stress testing using the ThingWorx C SDK at the Edge side. Attached to this article is a download containing a generic C SDK application and accompanying simulator software written in python. This article will discuss how to unpack everything and move it to the right location on a Linux machine (Ubuntu 16.04 was used in this tutorial and sudo privileges will be necessary). To make this a true test of the Edge software, modify the C SDK code provided or substitute in any custom code used in the Edge devices which connect to the actual application.   It is assumed that ThingWorx is already installed and configured correctly. Anaconda will be downloaded and installed as a part of this tutorial. Note that the simulator only logs at the "error" level on the SDK side, and the data log has been disabled entirely to save resources. For any questions on this tutorial, reach out to the author Desheng Xu from the EDC team (@DeShengXu).   Background: Within ThingWorx, most things represent remote devices located at the Edge. These are pieces of physical equipment which are out in the field and which connect and transmit information to the ThingWorx Platform. Each remote device can have many properties, which can be bound to local properties. In the image below, the example property "Pressure" is bound to the local property "Pressure". The last column indicates whether the property value should be stored in a time series database when the value changes. Only "Pressure" and "TotalFlow" are stored in this way.  A good stress test will have many properties receiving updates simultaneously, so for this test, more properties will be added. An example shown here has 5 integers, 3 numbers, 2 strings, and 1 sin signal property.   Installation: Download Python 3 if it isn't already installed Download Anaconda version 5.2 Sometimes managing multiple Python environments is hard on Linux, especially in Ubuntu and when using an Azure VM. Anaconda is a very convenient way to manage it. Some commands which may help to download Anaconda are provided here, but this is not a comprehensive tutorial for Anaconda installation and configuration. Download Anaconda curl -O https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh  Install Anaconda (this may take 10+ minutes, depending on the hardware and network specifications) bash Anaconda3-5.2.0-Linux-x86_64.sh​ To activate the Anaconda installation, load the new PATH environment variable which was added by the Anaconda installer into the current shell session with the following command: source ~/.bashrc​ Create an environment for stress testing. Let's name this environment as "stress" conda create -n stress python=3.7​ Activate "stress" environment every time you need to use simulator.py source activate stress​  Install the required Python modules Certain modules are needed in the Python environment in order to run the simulator.py  file: psutil, requests. 
Use the following commands to install these (if using Anaconda as installed above):
conda install -n stress -c anaconda psutil
conda install -n stress -c anaconda requests
Unpack the download attached here, called csim.zip
Unzip csim.zip and move it into the /opt folder (if another folder is used, remember to change the path in the simulator.json file later)
Assign your current user full access to this folder (this command assumes the current user is called ubuntu): sudo chown -R ubuntu:ubuntu /opt/csim
Move the C SDK library to the lib folder
Use the following command: sudo mv /opt/csim/csdkbuild/libtwCSdk.so.2.2.4 /usr/lib
You may also have to grant a+x permissions to all files in this folder
Update the configuration file for the simulator
Open /opt/csim/simulator.json (or whatever path is used instead)
Edit this file to meet your environment needs, based on the information below
Familiarize yourself with the simulator.py file and its options
Use the following command to get option information: python simulator.py --help

Set-Up Test Scenario:
Plan your test
Each simulator instance will have 8 remote properties by default (as shown in the picture in the Background section). More properties can be added for stress test purposes in the simulator.json file. For the simulator to run 1k writes per second to a time series database, use the following configuration information (note that for this test, a machine with 4 cores and 16G of memory was used; greater hardware specifications may be required for a larger test):
Forget about the default 8 properties, which have random update patterns and produce results that are difficult to check later. Instead, create "canary properties" for each thing (where canary refers to the nature of a thing to notify others of danger, in the same way canaries were used in mine shafts)
Add 25 properties for each thing: 10 integer properties, 5 number properties, 5 string properties, 5 sin properties (signals)
Set the scan rate to 5000 ms, making it so that each of these 25 properties will update every 5 seconds.
To get a writes-per-second rate of 1k, we therefore need 200 devices in total, which is specified by the start and end number lines of the configuration file. The simulator.json file should look like this:
Canary_Int: 10
Canary_Num: 5
Canary_Str: 5
Canary_Sin: 5
Start_Number: 1
End_Number: 200
Run the simulator
Enter the /opt/csim folder, and execute the following command: python simulator.py ./simulator.json -i
You should be able to see a screen like this:
Go to ThingWorx to check if there is a dummy thing (under Remote Things in the Monitoring section). This indicates that the simulator is running correctly and connected to ThingWorx
Create a Value Stream and point it at the target database
Create a new thing and call it "SimulatorDummyThing"
Once this is created successfully and saved, a message should pop up to say that the device was successfully connected
Bind the remote properties to the new thing
Click the "Properties and Alerts" tab
Click "Manage Bindings"
Click "Add all properties"
Click "Done" and then "Save"
The properties should begin updating immediately (every 5 seconds), so click "Refresh" to check
Create a Thing Template from this thing
Click the "More" drop-down and select "Create ThingTemplate"
Give the template a name (ensure it matches what is defined in the simulator.json file) and save it
Go back and delete the dummy thing created in Step 4, as we no longer need it
Clean up the simulator
Use the following command: python simulator.py ./simulator.json -k
Output will look like this:
Create 200 things in ThingWorx for the stress test
Verify the information in the simulator.json file (especially the start and end numbers) is correct
Use the following command to create all things: python simulator.py ./simulator.json -c
The output will look like this:
Verify the things have also been created in ThingWorx. Now you are ready for the stress test

Run Stress Test:
Use the following command to start your test: python simulator.py ./simulator.json -l or python simulator.py ./simulator.json --launch
The output in the simulator will look like this:
Monitor the Value Stream writing status in the Monitoring section of ThingWorx:

Stop and Clean Up:
Use the following command to stop running all instances: python simulator.py ./simulator.json -k
If you want to clean up all created dummy things, then use this command: python simulator.py ./simulator.json -d
To re-initiate the test at a later date, just repeat the steps in the "Run Stress Test" section above, or re-configure the test by reviewing the steps in the "Set-Up Test Scenario" section.

That concludes the tutorial on how to use the C SDK in a stress or load test of a ThingWorx application. Be sure to modify the Thing Template (created in step 6 of the "Set-Up Test Scenario" section) with any business logic required, for instance events and alerts, to ensure a proper test of the application.
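When adapting the canary configuration above to hit a different target write rate, a quick calculation helps confirm the plan before any things are created. The sketch below is plain arithmetic using the values already shown in simulator.json and the 5000 ms scan rate; swap in your own numbers as needed.

# Mirrors the canary keys from simulator.json above; adjust to your own plan.
config = {"Canary_Int": 10, "Canary_Num": 5, "Canary_Str": 5, "Canary_Sin": 5,
          "Start_Number": 1, "End_Number": 200}
scan_rate_seconds = 5   # the 5000 ms scan rate from the example

properties_per_thing = sum(config[k] for k in ("Canary_Int", "Canary_Num", "Canary_Str", "Canary_Sin"))
things = config["End_Number"] - config["Start_Number"] + 1
writes_per_second = things * properties_per_thing / scan_rate_seconds
print(f"{things} things x {properties_per_thing} properties every {scan_rate_seconds}s "
      f"= {writes_per_second:.0f} writes/sec")   # 200 x 25 / 5 = 1000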
By Tim Atwood and Dave Bernbeck, edited by Tori Firewind. Adapted from the March 2021 Expert Session produced by the IoT Enterprise Deployment Center.

The primary purpose of monitoring is to determine when your application may be exhausting the available resources. Knowledge of the infrastructure limits helps establish these monitoring boundaries, determining straightforward thresholds that indicate an app has gone too far. The four main areas to monitor in this way are CPU, Memory, Networking, and Disk.

For the CPU, we want to know how many cores are available to the application and potentially what the temperature is for each, or other indicators of overtaxation. For Memory, we want to know how much RAM is available for the application. For Networking, we want to know the network throughput, the available bandwidth, and how capable the network cards are in general. For Disk, we keep track of the read and write rates of the disks used by the application, as well as how much space remains on them.

There are several major infrastructure categories which reflect common modes of operation for ThingWorx applications. One is Bare Metal, which relies upon the traditional connection between the operating system and the hardware, with no intermediary. Limits of the hardware in this case can be found in manufacturing specifications, within the operating system settings, and usually listed somewhere within the IT department. The IT team is a great resource for obtaining these limits in general, and also keeps track of such things in VMware and virtualized infrastructure models.

VMware is an intermediary between the operating system and the hardware, and often its limits are determined based on the sizing of the application and set by the IT team when the infrastructure is established. These can often be resized as needed, and the IT team will be well aware of the limits here, often monitoring some of the performance themselves already. This is especially so if Cloud Providers are used, given that these are scaled-up virtualizations which are configured in easy-to-use cloud portals. These two infrastructure models can also be resized as needed.

Lastly, Containers can be used to designate operating system resources as needed, in a much more specific way that better supports the sharing of resources across multiple systems. Here the limits are defined in the configuration files or charts that define the container.

The difficulties here center around learning what the limits are, especially in the case of network and disk usage. Network bandwidth can fluctuate, and increased latency and network congestion can occur at random times for seemingly no reason. Most monitoring scenarios can therefore make do with collecting network send and receive rates, as well as disk read and write rates, measured on the server.

Cloud Providers like Azure provide VM and disk sizing options that allow you to select exactly what you need, but for network throughput or network IO, the choices are not as varied. Network IO tends to increase with the size of the VM, proportional to the number of CPU cores and the amount of Memory, so this may mean that a VM has to be oversized for the user load and the bulk of the application in order to accommodate a large or noisy edge fleet. The next few sections list the operating metrics and common thresholds used for each.
We often use these thresholds in our own simulations here at PTC, but note that each use case is different, and each situation should be analyzed individually before determining set limits of performance.

Generally, you will want to monitor: % utilization of all CPU cores, leaving plenty of room for spikes in activity; and total and used memory, ensuring total memory remains constant throughout and used memory remains below a reasonable percentage of the total, which for smaller systems (16 GB and lower) means leaving around 20% of memory for the OS, and for larger systems, usually around 3-4 GB. For disks, monitor the read and write rates, and ensure there is ample free space for spikes, to avoid any situation that might result in system downtime. For networking, monitor the send and receive rates, which should stay below 70% or so, again to leave room for spikes.

In any monitoring situation, consistently high utilization should trigger concern and an investigation into what's happening. Were new assets added? Has any recent change caused regression or other issues? Any recent changes should be inspected, and the infrastructure sizing should be considered as well. For ThingWorx-specific monitoring, we look at max queue sizes, entries performed, pool sizes, alerts, submitted task counts, and anything that might indicate some kind of data loss. We want the queues to be consistently cleared out to reduce the risk of losing data in the case of an interruption, and to ensure there is no reason for resource use to build up and cause issues over time.

In order for a monitoring set-up to be truly helpful, it needs to make certain information easily accessible to administrative users of the application. Any metrics that are applicable to performance need to be processed and recorded in a location that can be accessed quickly and easily from wherever the admins are. They should know the health of the application at a glance, without needing to drill down a lot to be made aware of issues. Likewise, the alerts that happen should be meaningful, with minimal false alarms, and it is best if this is configurable by the admins from within the application via some sort of rules engine (see the DGIS guide, soon to be released for version 9.1). The monitoring tool should also be able to save the system history and export it for further analysis, all in the name of reducing future downtime and creating a stable, enterprise system.

This dashboard (above) is a good example of how to roll up a number of performance criteria into health indicators for various aspects of the application. Here there is a Green-Yellow-Red color-coding system for issues like web requests taking longer than 30 seconds, 3 minutes, or more to respond.

Grafana is another application our team uses internally for monitoring. The easy dashboard creation feature and built-in chart modes make this tool very easy to get started with, and certainly easy to refer to from a central location over time. Setting this up is helpful for load testing and readying an application, but it is also beneficial for continued monitoring post-go-live, which is why it is a worthy investment. Our team usually builds a link based on the start and end time of each simulation performed, with all of the various servers being monitored by one Grafana server, one reference point.

Consider using PTC Performance Advisor (also called DynaTrace) to help monitor these kinds of things more easily.
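For a quick, stand-alone check of the OS-level thresholds described above, a small script can be run on each server in between looks at the full dashboards. This is a minimal sketch using the Python psutil package (the same module the EDC simulator depends on); the disk threshold and the note about network rates are assumptions to adjust for your own environment.

import psutil

CPU_LIMIT = 75.0    # % utilization threshold from the article
MEM_LIMIT = 80.0    # % of allocated memory threshold from the article
DISK_LIMIT = 85.0   # assumption: a reasonable used-space threshold, not stated above

cpu = psutil.cpu_percent(interval=1)      # % across all cores over a 1-second sample
mem = psutil.virtual_memory().percent     # % of total RAM currently in use
disk = psutil.disk_usage("/").percent     # % of the root volume in use
net = psutil.net_io_counters()            # cumulative bytes sent/received since boot

print(f"CPU {cpu:.1f}% | Memory {mem:.1f}% | Disk {disk:.1f}%")
print(f"Network: sent={net.bytes_sent} recv={net.bytes_recv} (sample twice and take the difference for a rate)")

for name, value, limit in (("CPU", cpu, CPU_LIMIT), ("Memory", mem, MEM_LIMIT), ("Disk", disk, DISK_LIMIT)):
    if value > limit:
        print(f"WARNING: {name} at {value:.1f}% exceeds the {limit:.0f}% threshold")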
When most administrators think of monitoring, they think of reading and reacting to dashboards, alerts, and reports. Rarely does the idea of benchmarking come to mind as a monitoring activity, and yet having successful benchmarks of system performance can be a crucial part of knowing whether an application is functioning as expected before there are major issues. Benchmarks also look at the response time of the server and better enable tracking of the actual end-user experience. The best option is to automate such tests using JMeter or other applications, producing a daily snapshot of user performance that can anticipate future issues and create a more reliable experience for end users over time.

JMeter also has the option to build custom reports. JMeter is good for simulating the user load, which often makes up most of the server load of a ThingWorx application, especially considering that ingestion is typically optimized independently and given the most thought. The most unexpected issues tend to pop up within the application itself, after the project has gone live.

Shown here (right) is an example benchmark from a Windchill application, one which is published by PTC to facilitate comparison between optimized test systems and real-life performance. Likewise, DynaTrace is depicted here, showing an automated baseline (using Smart URL Detection) on Response Time (median and 90th percentile) as well as Failure Rate. We can also look at Throughput and compare it with the expected value range based on historical throughput data.

Monitoring typically increases system performance and availability, but its other advantage is to provide faster, more effective troubleshooting. Establish a systematic process or checklist to step through when problems occur, something that is organized to be done quickly, but still takes the time to find and fix the underlying problems. This will prevent issues from happening again and again and will polish the system periodically as problems occur, so that the stability and integrity of the system only improves over time. Push for real solutions if possible, not band-aids, even if more downtime is needed up front; it is always better to have planned downtime up front than unplanned downtime down the line. Close any monitoring gaps when issues do occur, which is the valid RCA response if not enough information was captured to actually diagnose or resolve the issue.

PTC Tech Support developed a diagnostic data gathering query for Oracle that customers can use, found in our knowledge base. This is an example of RCA troubleshooting that looks at different database factors, reporting on which queries perform the worst based on the inputted criteria. Another example of troubleshooting is for the Java JVM, where we look at all of the things listed here (below) in an automated, documented process that then generates a report for easy end-user consumption.

Don't hesitate to reach out to PTC Technical Support in advance to go over your RCA processes, to review benchmark discrepancies between what PTC publishes and what your real-life systems show, and to ensure your monitoring is adequate to maintain system stability and availability at all times.
Load Testing through Remote Device Simulation

Designing an enterprise-ready application requires extensive testing and quality assurance. This includes all sorts of tests, of course, from examining the user interface for flaws to verifying there is correct logic in all background services. However, no area of testing is more important than scalability. Load testing is how to test the application to ensure it still functions as desired when remote things are connected and streaming information to the Platform.

Load testing is considered a critical component of the change management process. It is mentioned numerous times throughout PTC best practice documentation. This tutorial will step you through designing a load test using Kepware as a simulator. Kepware is free to download and use in short demos, making it the perfect tool for this type of test.

Start by acquiring the latest version of Kepware from the download site. Click “Download Free Demo” if a license was not included in your PTC product package. The installation of Kepware is simple, and for details, see the Kepware Installation Guide. The tutorial shown here uses Kepware version 6.7 and ThingWorx version 8.4.4. Given that we are testing a ThingWorx application, this tutorial assumes ThingWorx is already installed and configured correctly.

Once Kepware is installed, follow these steps. (This tutorial was developed by Desheng Xu and edited by Victoria Tielebein. Exact specifications of the equipment used in both large scale and local tests are given in step 6, which discusses the size of the simulation.)

1. Understand how to configure Kepware as a simulator
   - Go to the Help menu within Kepware, and click on “Driver Help”
   - Select “Simulator” in the pop-up window, and click “OK”
   - Expand “Address Descriptions” and then “Simulation Functions”
   - Select “Ramp Function” to review details about the function needed for this tutorial, as well as information about function syntax
   - Close the window once this information has been reviewed
2. Create a new project in Kepware
   - Click “File” > “New”
   - In case you are connected to runtime, Kepware will allow you to choose to edit this project offline
3. Add a channel in Kepware
   - Channels represent threads which Kepware will use to contact ThingWorx
   - Under “Connectivity”, click “Click to add a channel.”
   - From the drop-down list, select “Simulator”
   - Use all the default settings, selecting “Next” all the way down to “Finish”
4. Add one device to the channel
   - Highlight the new channel and click “Click to add a device” (which will appear in the center of the screen)
   - Once again, use the default settings, selecting “Next” all the way down to “Finish”
5. Add a tag to this device
   - Within Kepware, tags represent properties which bind to remote things on the Platform and update with new information over time. Each device will need several tags to simulate remote property updates. The easiest way to add many tags for testing is to create one, and then copy and paste it.
   - Highlight the device created in the previous step and click “Click to add static tag”, which appears in the center of the screen
   - For “Name”, type “tag1”
   - For Address, enter the Ramp function: RAMP(1000,1,2000000,1)
     - The first parameter is the update rate, given in milliseconds
     - The next two parameters are the range of values which can be sent
     - The last parameter is the increment or step
     - Together this means that every 1 second, this tag will send a new value that is 1 higher than the previous value to the Platform, starting at 1 and ending at 2 million
   - Ensure the Data Type is given as “DWord” or any type which will be read as a “Number” (and NOT an “Integer”) on the Platform
   - Change the Scan Rate to 250
   - Then click “OK”
6. Add more devices to the test
   - The most basic set-up is now done: if this project connected to the Platform, one remote thing with one remote property could be used to simulate property updates. That is not very useful for load testing, however. We need many more things than this, and many more properties.
   - The number of tags on each device should match the expected number of remote properties in the application itself. The number of devices in each channel should be large enough that when more channels are created, the number of total devices is close to the target for the application. For example, to simulate 10,000 things, each with 25 remote properties, we need 25 tags per device, 200 devices per channel, and 50 channels. This would require a lot of memory to run and should not be attempted on a local machine.
   - A full test of 40 channels, each with 10 devices, was performed as shown in the screenshots here. This simulates 10,000 writes per second to the Platform total, or about 400 remote device connections. This test used the following hardware specifications:
     - Kepware machine running Windows 2016 64-bit, 2 cores, 8 GB
     - ThingWorx Platform machine running Ubuntu 16.04, 4 cores, 16 GB
     - PostgreSQL 9.6 machine running Ubuntu 16.04, 4 cores, 16 GB
     - Influx 1.6.3 machine running Ubuntu 16.04, 4 cores, 16 GB
   - A local test was also run on Windows 10 (64-bit), using the H2 database, with Kepware and ThingWorx running side by side on the machine, 4 cores, 16 GB. This test made use of only 2 channels, with 10 devices each. For local tests to see how the simulation works, this is fine, but a more robust set-up like the above will be needed in a true load test.
   - If there is not enough memory on the machine hosting Kepware, errors like this will appear in the Kepware logs: One or more value change updates lost due to insufficient space in the connection buffer.
   - Once you decide on the number of tags and devices needed, follow the steps below to add them.
7. Add the remaining tags, devices, and channels
   - To add more tags, copy and paste the existing tag (Ctrl+C and Ctrl+V work in Kepware for convenience) until there are as many tags as desired
   - To add more devices, highlight the device in Kepware and copy and paste it as well (click on the channel before pasting)
   - Then, copy and paste the entire channel until the number of channels, devices, and tags totals the desired load (be sure to click on “Connectivity” before hitting paste this time)
8. Configure the ThingWorx connection
   - Right-click on Project in the left-hand navigation bar, and in the pop-up window that appears, highlight ThingWorx
   - Change the “Enable” field to “Yes” to activate the other fields
   - Fill in the details for “Host”, “Port”, “Application Key”, and “Thing name”
   - Note that the application key will need to be created in ThingWorx and then the value copied in here
   - The certificate and encryption settings may also need to be adjusted to match your environment
   - For local set-ups, it is likely that self-signed and all certificates will need to be accepted, so both of those fields will likely need to be set to “Yes” (encryption may need to be disabled as well). In production systems, this should not be the case
9. Save the project
   - It doesn’t matter too much whether this project is saved as encrypted or not, so either enter a password to encrypt the save or select “No encryption”
10. Connect to ThingWorx
   - Click “Runtime” > “Connect…”
   - A pop-up will appear asking if you want to load this project; click “Yes”
   - The connection status should then appear in the bottom portion of the window where the logs are displayed
11. Configure in ThingWorx
   - Log in to the ThingWorx Platform
   - Under “Industrial Connections”, a thing should appear which is named as indicated in the Kepware configuration step above
   - Click to open this thing and save it
   - Also create a new thing, a value stream for ingesting data from Kepware
12. Create remote things in ThingWorx
   - Import the provided entity into ThingWorx (it should appear as a downloadable attachment to this post)
   - Open the KepwareUtil thing and go to the Services tab
   - Run the AutoKepwareCreate service to generate remote things on the Platform
   - Give the name of the stream created above so each thing has a place to store property information
   - The IgnoreTemplate flag should be set to false. This allows the service to create a thing template first, which is then passed to the remote devices. The only reason this would be set to true is if the devices need to be deleted and recreated, but the template does not (then set the flag to true). To delete the devices, use the AutoKepwareDelete service, also provided on the KepwareUtil thing
   - Note that the AutoKepwareCreate service is asynchronous, so once it is executed, close the window and check the script logs to see when it completes. The logs will look like: KepwareUtil AutoKepwareCreate task finished!!!
13. Check the status of the remote things
   - Once the things are created, they should automatically connect to the Platform
   - Run the TotalDeviceByTemplateWithTemplate service to see if the things are connected
   - The template given here could be the one created by the AutoKepwareCreate service, or just give it RemoteThings if this is a small local set-up without many remote things on it
   - The number of devices will equal the number of devices per channel times the total number of channels, which in the test shown here is 400
   - isConnected will be checked if all of the devices are connected without issue
   - If some of them are not connected, check the logs for any errors and resolve those before moving on
14. View the ingestion rate
   - Once the devices are created, their tags should show as numbers (NOT integers), and they should already be updating with new values every second
   - To view the ingestion rate, run the KepwareUtil service AutoKepwareRateSummary
   - Give the thing template name that is created by the AutoKepwareCreate service, which will look like the name of the Kepware thing itself with a “T-” in front
   - The start time should be close to the current time, and the periodInMinutes should be large enough to include some of the test (periodInMinutes is used to calculate the end time within the service)
   - Note in the results here that the Average Write Per Second is only 9975 wps, which is close to but not exactly what we would expect. This means that there are properties not updating correctly, which requires us to look at the logs and restart some things
   - If nothing shows up here, despite the Total Connected Things showing correctly, then look at the type of the tags on one of the remote devices. The type must be NUMBER for the query within this service to work, and not INTEGER. If the type of the tags is incorrect, then the type of the tags within Kepware was probably given as something which is not interpreted as a number in ThingWorx. Ensure DWord is used for the tags in Kepware
   - Within the script log, look for any devices which show errors, as seen in the image below, and restart them to get their properties updating correctly
   - Once the ingestion rate equals what is expected (in the case of the test here, 10,000 wps), use the AutoKepwareIngestionStat service on the KepwareUtil thing to see details about each remote device:
     - TimeGapAvg represents the gap between two ingestions in milliseconds, showing any lag that may be present between Kepware and ThingWorx
     - TimeGapSTD shows the standard deviation of the time gap between two ingestions on any given thing, also indicating lag (the lower this number, the better)
     - StartTime and EndTime show the first and last timestamp observed in the ThingWorx database during the given duration
     - totalCount shows the total number of ingested records during the sampling cycle
     - StartValue and EndValue show the first and last value ingested into the tag during the given duration
   - If the ingestion rate is working as expected, and the ramp function is actually sending an update on time (in this case, once each second), then the difference between the EndValue and the StartValue should equal the totalCount minus 1. If this doesn’t match up, then there may be data loss or something else wrong with the property updates, which will show as a checked box in the valueException column.
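To make that last check concrete, here is a small, purely hypothetical example (the numbers are invented, and the field names assume the AutoKepwareIngestionStat output described above): if StartValue is 100, EndValue is 700, and the ramp steps by 1 per record, then 700 - 100 + 1 = 601 records should have been stored, so totalCount should be 601. In code form:

// Illustrative only -- values and field names assume the service output described above
long startValue = 100;   // first value ingested during the sampling window
long endValue   = 700;   // last value ingested during the sampling window
long totalCount = 601;   // records actually stored during the window

long expectedCount = endValue - startValue + 1;  // the ramp increments by 1 per record
if (expectedCount != totalCount) {
    System.out.println("Possible data loss: expected " + expectedCount + " records, stored " + totalCount);
} else {
    System.out.println("No loss detected in this window");
}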
It is not enough to ensure that the ingestion rate is correct, as sometimes the rate may fluctuate by only 1 or 2 wps and appear perfect, even while some data is lost. That is why it is important to ensure that there are no valueException boxes showing as checked in the test of the application. If none of these are marked as having failed, then the test was successful, and this ingestion rate is acceptable for the application.

This tutorial is a very basic way to simulate many remote devices ingesting data into the Platform. For this to be a true test of the application, the remote things created in this test will need to be given business logic tasks as well. The AutoKepwareCreate service can be modified to apply any template (and not just RemoteThing) to the thing template which is created and subsequently passed into the demo devices. Likewise, the template itself can be created, and then manually modified to look like the actual remote device template in the application, before the rest of the things are created (using the IgnoreTemplate flag in the creation and deletion services, as discussed above).

Ensure that events are triggered as expected and that subscriptions to property updates are in place on the thing template before creating the demo things. Make use of the subsystem monitor to ensure that the event, value stream, and stream queues do not grow so large that the Platform cannot keep up with the requests (for details about tuning the stream and value stream processing subsystems, see PTC’s best practice documentation). Also be sure to load some of the mashups to see how they perform while the ingestion test is happening. This will test whether or not the ingestion rate and business logic of the application can function side by side without errors, data loss, or performance issues.
Thread Safe Coding, Part 1: The Java Extension Approach
Written by Desheng Xu and edited by @vtielebein

Overview
Time and again, customers report that one of their favorite ThingWorx features is using JavaScript to build out their services and business logic. However, the JavaScript language doesn't have a built-in semaphore locking mechanism, nothing to enable thread-safe concurrent processing like you find in the Java language. This article demonstrates why thread-safe coding is necessary and how to use the Java Extension for this purpose. Part 2 presents an alternative approach using database lockers.

Demo Use Case
Let's use a highly abstracted use case to demo thread-safe code practices: there are tens of machines in a factory, and a PLC will emit a signal to indicate when an issue happens during run-time. The customer expects to have a dashboard that shows today's total count of issues from all machines in real-time. The customer also expects that a timestamp for each issue can be logged (regardless of the machine). Similar use cases might be to:
- Show the total product counts from each sub-line in the current shift.
- Show the total rentals of bicycles from all remote sites.
- Show the total issues of distant cash machines across the country.

Modeling
Thing: DashboardCounter, which includes:
- 1 property: name: counter, type: integer, logged: true, default value: 0
- 3 services:
  - IncreaseCounter(): increase the counter value by 1
  - GetCounter(): return the current counter value
  - ResetCounter(): set the counter value to 0
- 1 subscription: a subscription to the data change event of the counter property, which will print the new value and timestamp to the log.

GetCounter:
var result = me.counter;

IncreaseCounter:
me.counter = me.counter + 1;
var result = me.counter;

ResetCounter:
me.counter = 0;
var result = 0;

Subscription MonitorCounter:
Logger.info(eventData.newValue.value+":"+eventData.newValue.time.getTime());

ValueStream
For simplicity, the value stream entity is not included in the attachment. Please go ahead and assign a value stream to this Thing to monitor the property values.

Test Tool
A small test tool, mulreqs, is attached here, along with some extensions and ThingWorx entities that are useful. The mulreqs tool reads its configuration file from the OS environment variable MULTI_REQUEST_CONFIG.

In Linux/MacOS: export MULTI_REQUEST_CONFIG="./config.json"

In the config.json file, you can use the following configuration:

{
    "host": "twx85.desheng.io",
    "port": 443,
    "protocol": "https",
    "endpoint": "/Thingworx/Things/DashboardCounter/services/IncreaseCounter",
    "headers": {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "AppKey": "5cafe6eb-adba-41df-a7d6-4fc8088125c1"
    },
    "payload": {},
    "round_break": 50000,
    "req_break": 0,
    "round_size": 50,
    "total_round": 20
}

The host, port, protocol, and headers identify the ThingWorx server. The endpoint defines which service is called during the test. The payload is not in use at this moment, but it has to remain in the file. total_round is how many rounds of the test you want to run. round_size defines how many requests will be sent simultaneously during each round. round_break is the pause time between rounds in microseconds, so 50000 in the above example means 50 ms. req_break is the delay between requests; "0" means requests to the server will happen simultaneously.

With the above configuration, the expectation is that the service executes a total of 20*50 = 1,000 times.
So, we can expect that if the initial value is 0, then the counter should be 1000 at the end, and if the value stream is clean initially, then the value stream should have a history from 1 to 1000.

Run Test
Use the following command to perform the test: .<your path>/mulreqs
Execution output will look like:

Check Result
You may be surprised to find that the final value is 926 instead of 1000. (Caution: this value will be different in different tests, and it can be any value between 1 and 1000.) Now, look at the value stream by using QueryPropertyHistory. There are many values missing here, and while the total count can vary in different tests, it is unlikely to be exactly the last value (926). Notice that the last few values are 926, 925, 921, and 918; the values 919, 920, 922, and 923 are all missing. So next we check whether there are any errors in the script log, and there are none; there are only the print statements we deliberately placed in the logs. So, we have observed two symptoms here:
- The final value of the counter property is not the expected value.
- The value stream doesn't have the expected history of the counter property changes.
What's the reason behind each symptom, and which one is a thread-safe issue?

Understanding Timestamp Granularity
ThingWorx facilitates the collection of time series data, and solutions centered around such data, by allowing the timestamp to be used as the primary key. However, a timestamp always has a minimal granularity when you process it. In ThingWorx, the minimal granularity, or unit, of a timestamp is one millisecond.

Looking at the log we generated from the subscription again, we see that several data points (922, 923, 924, 925) have the same timestamp (1596089891147), which is GMT Thursday, July 30, 2020, 6:18:11.147 AM. When each of these data points is flushed into the database, the later data points overwrite the earlier ones, since they all have the same timestamp. So, data point 922 went into the value stream first, and then was overwritten by data point 923, and then 924, and then 925. The next data point in the value stream is 926, which has a new timestamp (1596089891148), 1 ms behind the previous one. Therefore, data points 925 and 926 are stored, while 922, 923, and 924 are not. These lost data points are therefore NOT a thread-safe issue.

The reason some of these data points have the same timestamp in this example is that multiple machines write to the same value stream. The right approach is to log data points at the individual machine level, with a different value stream per machine.

However, what happens if one machine emits data too frequently? If data points from the same machine still have a timestamp clash issue, then the signal frequency is too high. The recommended approach would be to down-sample the update frequency, as any frequency higher than 1000 Hz will produce unexpected results like these.

Real Thread-Safe Issue from the Demo Use Case
The final value of the counter being an arbitrary random number is the real thread-safe coding issue. If we take a look at the code again:
me.counter = me.counter + 1;
This piece of code can be split into three steps:
- Step 1: read the current value of me.counter
- Step 2: increase this value
- Step 3: set me.counter to the new value
In a multi-threaded environment, not performing these three steps as a single atomic operation will lead to a race condition.
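To see that race in isolation, here is a minimal, plain-Java sketch (purely illustrative, not ThingWorx code) of the same read-increment-write pattern. Two threads each perform 100,000 unsynchronized increments; run it a few times and the final total will usually come up short of 200,000, for exactly the reason described above.

public class RaceDemo {
    static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter = counter + 1;   // read, increment, write -- not atomic
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("Expected 200000, got " + counter);   // usually less than 200000
    }
}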
The way to solve this issue is to use a locking mechanism to serialize access to the property: acquire the lock, perform the three steps, and then release the lock. This can be done using either the Java Extension or a database thing that leverages the database lock mechanism.

Use the Java Extension to Handle the Thread-Safe Challenge
This tutorial assumes that the Eclipse plug-in for ThingWorx extension development is already installed. The following will guide you through creating a simple Java extension step by step:
1. Create a Java Extension Project
   - Choose the minimal ThingWorx version to support and select the corresponding SDK. Let's name it JavaExtLocker, though it’s best to use lower-case in the project name.
2. Add a ThingWorx Template in the src Folder
   - Right-click the src folder and add a Thing Template.
3. Add a Thing Property
   - Right-click the Java source file created in the above step, click the menu option called ThingWorx Source, then select Add Property.
4. Add Three Services: IncreaseCounter, GetCounter, ResetCounter
   - Right-click the Java source file, select the menu option called ThingWorx Source, then select Add Service. See above for the IncreaseCounter service details. Repeat these same steps to add GetCounter and ResetCounter.
5. (Optionally) Add a Generated Serial ID
6. Add Code to the Three Services

@SuppressWarnings("deprecation")
@ThingworxServiceDefinition(name = "IncreaseCounter", description = "", category = "", isAllowOverride = false, aspects = { "isAsync:false" })
@ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {})
public synchronized Integer IncreaseCounter() throws Exception {
    _logger.trace("Entering Service: IncreaseCounter");
    int current_value = ((IntegerPrimitive) (this.getPropertyValue("Counter"))).getValue();
    current_value++;
    this.setPropertyValue("Counter", new IntegerPrimitive(current_value));
    _logger.trace("Exiting Service: IncreaseCounter");
    return current_value;
}

@ThingworxServiceDefinition(name = "GetCounter", description = "", category = "", isAllowOverride = false, aspects = { "isAsync:false" })
@ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {})
public synchronized Integer GetCounter() throws Exception {
    _logger.trace("Entering Service: GetCounter");
    int current_value = ((IntegerPrimitive) (this.getPropertyValue("Counter"))).getValue();
    _logger.trace("Exiting Service: GetCounter");
    return current_value;
}

@SuppressWarnings("deprecation")
@ThingworxServiceDefinition(name = "ResetCounter", description = "", category = "", isAllowOverride = false, aspects = { "isAsync:false" })
@ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {})
public synchronized Integer ResetCounter() throws Exception {
    _logger.trace("Entering Service: ResetCounter");
    this.setPropertyValue("Counter", new IntegerPrimitive(0));
    _logger.trace("Exiting Service: ResetCounter");
    return 0;
}

The key here is the synchronized modifier, which is what allows Java to serialize access across threads and prevent the lost updates.
7. Build the Application
   - Use 'gradle build' to generate a build of the extension.
8. Import the Extension into ThingWorx
9. Create a Thing Based on the New Thing Template
10. Check the New Thing's Property and Service Definitions
11. Use the Same Test Tool to Run the Test Again
   - Just change the endpoint to point to the new thing:

{
    "host": "twx85.desheng.io",
    "port": 443,
    "protocol": "https",
    "endpoint": "/Thingworx/Things/DeoLockerThing/services/IncreaseCounter",
    "headers": {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "AppKey": "5cafe6eb-adba-41df-a7d6-4fc8088125c1"
    },
    "payload": {},
    "round_break": 50000,
    "req_break": 0,
    "round_size": 50,
    "total_round": 20
}

12. Check the Test Result
   - Repeat the same test several times to ensure the results are consistent and expected (and don't forget to reset the counter between tests).

Summary of the Java Extension Approach
The Java extension approach shown here uses the synchronized keyword to make the sequence of operations thread-safe. Other options are to use a ReentrantLock or a Semaphore for the same purpose, but the synchronized keyword approach is much cleaner.

However, the Java extension locker will NOT work in the 9.0 horizontal (clustered) architecture, since Java doesn't have a distributed locker. IgniteLocker wouldn't work in the current horizontal architecture, either. So if a thread-safe counter is needed in a version 9.0+ horizontal architecture, leverage the database thing instead, as discussed in Part 2.
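For reference, the ReentrantLock alternative mentioned above would look roughly like the fragment below (a hedged sketch placed inside the same template class, with the annotations omitted and java.util.concurrent.locks.ReentrantLock imported); the try/finally block guarantees the lock is released even if the property update throws.

// Illustrative fragment only: the same increment guarded by an explicit lock instead of "synchronized"
private final ReentrantLock counterLock = new ReentrantLock();

public Integer IncreaseCounter() throws Exception {
    counterLock.lock();
    try {
        int current_value = ((IntegerPrimitive) (this.getPropertyValue("Counter"))).getValue();
        current_value++;
        this.setPropertyValue("Counter", new IntegerPrimitive(current_value));
        return current_value;
    } finally {
        counterLock.unlock();   // always release, even on error
    }
}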
Generating and Reviewing JMeter Results

Overview
The 4th in a series of articles on load testing with JMeter, this one covers pushing the limits of a test to see how much the application can handle, as well as generating and analyzing reports once the testing completes. This article rounds off the basics of JMeter, such that anyone should be able to perform enterprise-level load testing after reviewing the content here.

Multiple criteria can be used to evaluate results, including:
- response time (as monitored both by JMeter, and by some other tool on the system side)
- throughput
- number of errors
- resource saturation
- CPU, memory, disk, and network utilization
Depending on the use case, some of these may be considered more important than others. For instance, some customers don't care if users wait a while for results to appear on the page (response time), because they set their users' expectations and mitigate the experience with well-designed loading graphics. With response times secondary, the real issues center around data loss or system outages, with resource utilization and number of errors becoming the more important indicators of system health. Request and database timeout errors are more important indicators, as they occur most often when resources are saturated and there is data loss.

It is typical for many customers to find preventing data loss and/or promoting data integrity to be more important than preventing long response times. Consider which of these factors is most important to your use case as you determine what kind of information to gather and review in your reports.

How to Create Client-Side Reports in JMeter
Creating reports for the client-side data is very simple using JMeter, both from the command line and within the UI (as shown in the tutorial below). These reports have graphical displays of response times, information about the number and type of response errors, and other criteria of performance used to gauge the success or failure of a load test. Follow these steps to generate an index file, which, when opened in your browser of choice, will show all of the relevant JMeter data.

Tutorial:
1. Create an empty directory in which to store reports.
2. Start the JMeter test with these options, or run these commands after the fact, to generate the HTML report:
   - Once the test completes, use: jmeter -g <outputfile.jtl/csv> -o <path to output folder for html report>
   - To start a test with the correct command for report generation, use this command: jmeter -n -t <test JMX file> -l <outputfile.jtl/csv> -e -o <path to output folder>
3. Running the above commands will generate these files:
4. When the test is complete, the many JMeter client consoles will look like this:
5. Go ahead and close the windows to terminate once they are finished. Optionally, you can run multiple tests sequentially using the same jmeter-server windows.
6. Click on the “index.html” file to open the results viewing window:

At any time, modify the settings of this “HTML dashboard” using the details from the JMeter user manual. That manual describes many options for these dashboards, as well as recommendations on how to group and format the results in ways which best convey the success or failure of the test, based on the custom requirements of the application and how granular the view needs to be.
Most of the time, the default settings work fine, showing something similar to this: The charts aren’t labeled very well here, so click on the Response Times submenu. This page may take some time to render if there is a lot of data. Next, scroll down to see all the requests that occurred and sort them by how long they took to complete. Anything which took over 5 seconds (or more, depending on what is expected) should be investigated as part of the post-test analysis. Does something need to be tuned or optimized? This is how to tell which request is holding things up for your customers.

There is also a chart that shows the overview, grouping the response times by how long they took to demonstrate the health of the system more concretely. Typically, the bars look something like this: This represents expected behavior, where most of the requests are quite fast, and then there are a few that had errors or took a bit longer. This is pretty typical for web activity.

You can also generate the report through the main JMeter client: give it a results file and an output directory to generate the same index file. There are log files in each of the JMeter client directories called “jmeter-server.log”. These files may show the wrong timezone, but the elapsed times are correct, and they will show when the JMeter clients started, how many threads they ran, which servers were which, and whether there were any errors. Not all errors mean a failed test, so review anything that appears and determine what is expected. Consider designing a batch script to gather all of these logs together, or even to analyze them automatically and extract only the relevant information.

How to Create Server-Side Results in DynaTrace
Collecting data from the environment, including CPU usage, memory utilization (used vs. total), garbage collection times, and other metrics of system health on the server, will require the use of an external tool. PTC’s official tool for this is called DynaTrace (PTC System Monitor), shown here. PTC offers a runtime license for DynaTrace to anyone who buys certain products, including Kepware Server, ThingWorx Foundation and Navigate, Windchill, Integrity, and more. Read more information about DevOps on the PTC Community, and stay tuned for more articles on the subject to come from the EDC.

Another option would be something like telegraf and Grafana (from the previous blog post), which provide the option to create dashboards around the data output specific to the needs of the application, and which can still be used for monitoring even once the application goes live. It can certainly be worth it to use such a tool for monitoring the server side, but the set-up takes more time. Likewise, many VMs have monitoring facilities for CPU usage and memory utilization built in, but DynaTrace also has visualization, consolidation of system elements, and other features that make it easy to use right out of the box. See the screenshots below for some examples of how to use DynaTrace, and be sure to review PTC’s full documentation here.

The example shown here is a ThingWorx Navigate system, with Windchill and ThingWorx Foundation set up side by side. This chart shows the overall response times of the server side of the system. JMeter collects the statistics on what the client experience looks like, while another tool is required to collect the server-side metrics like CPU usage and memory utilization, things that indicate the health of the VM or computer hosting the application.
An older version of DynaTrace is depicted here, available for free to all ThingWorx customers from the PTC Downloads Site (under various product listings).

In DynaTrace, you can build new dashboards using PurePaths. You can also look at the response times for each service, but be sure to change the response limit to a large number so that all of the results are returned and show up in the PurePaths dashboard.

Highlighted here in DynaTrace is the longest service that ran, which in this case took 95 seconds to fully respond. More specific analysis of this service can now begin. Perhaps it needs to be tuned, or otherwise optimized to handle the number of threads, i.e. the number of users. Perhaps the system needs more resources, or the VM isn’t large enough for the test. Perhaps more JMeter clients and system resources are required. Something will explain this long response time, and that will inform what work might still remain before this system can scale up to the enterprise level.

How to Use the Test Results
Load testing often means scaling the test up a little more each time until the system eventually breaks, or the target performance is reached. Within JMeter, this won’t mean increasing the overall number of threads per JMeter client, but instead scaling horizontally to other JMeter clients (as covered in the previous blog post). Now that the remote or distributed clients are configured and the test is running, how do we know when the test is beginning to fail?

It turns out that this answer is not a simple one. Which results are considered desirable will vary from one customer to the next based on many factors, and analyzing the test results is a massive topic all on its own. However, there is one thing that any customer would care to review, and that is the response time overview chart found within the JMeter reports. This chart can be used to compare the performance of the majority of threads against a baseline, indicating the point at which the test begins to fail, i.e. the point at which the limits of the system are reached.

The easiest way to determine a good standard response time for a load test, a baseline, is to start with a single JMeter client and record the response times for just 1-5 threads. You can record the response times for individual requests, particularly queries and other services with expected long response times, or the average response times across all requests or groups of requests, if the performance of some mashups is more important than others.

This approach is better than relying on the response times seen in a browser, because HTML pages load differently when rendered in a browser, with different graphical resource requirements than what is requested in JMeter. Note that some customers will also manually record response times within a separate browser-based test scenario during load testing, either as a sanity check or as part of their overall benchmarking, in order to further validate the scalability of the application; but this wouldn’t involve JMeter, given that browsers load things differently and cross-comparison is a bad idea.

Once the baseline response times are established, start increasing the thread counts across the many JMeter clients until you see the response times go up on average.
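One way to take the guesswork out of that comparison is to compute the averages directly from the JMeter results file. The sketch below is a naive example: it assumes the results were saved as CSV with the default "elapsed" column, assumes no commas inside labels, and uses an invented baseline value; it is meant only to show the shape of the check, not to replace the HTML report.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class BaselineCheck {
    public static void main(String[] args) throws Exception {
        double baselineMs = 850.0;   // assumed average from the 1-5 thread baseline run
        List<String> lines = Files.readAllLines(Paths.get("results.csv"));
        int elapsedIdx = Arrays.asList(lines.get(0).split(",")).indexOf("elapsed");

        double total = 0;
        int count = 0;
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) continue;          // skip trailing blank lines
            String[] cols = line.split(",");       // naive split: assumes no commas inside labels
            total += Double.parseDouble(cols[elapsedIdx]);
            count++;
        }
        double averageMs = total / count;
        System.out.printf("Average response: %.0f ms (baseline %.0f ms)%n", averageMs, baselineMs);
        if (averageMs > 2 * baselineMs) {
            System.out.println("Average response time has roughly doubled -- treat this thread count as the limit.");
        }
    }
}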
PTC’s standard criterion for load testing is exceeded when the average response times roughly double, or when the system seems overwhelmed by the user load on the server side (which is what to look out for in DynaTrace or the external system monitor). At this point, the application is said to have reached a bottleneck, which could be a simple tuning problem, or it could be saturated by resource requirements. Either way, the bottleneck is proof that the system can’t take any more threads without users beginning to notice and the response times approaching an unreasonable delay.

Other criteria can be used as well, say if any one thread takes more than 5 seconds to respond. Also ensure there are no unexpected errors, as gateway errors represent failed tests too. Sometimes there will be errors even when the test is successful, though, so consider monitoring the error percentage, a column in the Summary Report tab of JMeter, to see what is normal. The throughput column may also be something to monitor. Many watch for increases in throughput as the thread count increases to ensure there is no degradation in performance (which may indicate hardware or sizing constraints).

The Summary Report will look something like this, with thread group results from all of the clients appearing side by side, differentiated from each other by the unique port:

Conclusions
Generating and reviewing reports within JMeter is straightforward and easily customizable. Be sure to also monitor the system itself using an external tool like DynaTrace, PTC’s official System Monitor, which has a lot of value considering how easy it is to use out of the box. If the system looks healthy on the server side and the response times are within an acceptable range on the client side, then the application is ready for enterprise use. Be sure to generate a baseline for response times within JMeter, remembering that browsers have different loading processes than JMeter, and not to cross-compare.

This article constitutes the end of the basics. The final article to come will talk about more advanced test design features and best practices, so stay tuned!
How to Scale Vertically and Horizontally, and When to Use Sharding
Written by Mike Jasperson, VP of IOT EDC

Deployment architecture describes the way in which an IOT application is deployed, or where each of the components is hosted on the network. There are deployment architecture considerations to make when scaling up an application. Each approach to deployment expansion can be described by the “eggs in a basket” analogy: vertical scale is like one person carrying a bigger basket, horizontal scale is like one person carrying more baskets, and sharding is like more than one person carrying the baskets (see below).

All of these approaches result in the eggs getting from point A to point B (they all satisfy the use case), but the simplest (vertical scale) is not necessarily the best. Sure, it makes sense on paper for one person to carry everything in one big basket, but that doesn’t ensure that all of the eggs arrive intact. Selecting the right deployment architecture is a way to ensure the use cases are satisfied in the best and most efficient ways possible, with the least amount of application downtime or data loss.

Vertical Scale - a.k.a. "one person carrying a bigger basket"
The most common scalability approach is to simply size the IOT server larger, or scale up the server. This might mean the server is given additional CPU cores, faster CPU clock speeds, more memory, faster disks, additional network bandwidth or improved network cards, and so on. This is a very good idea when the application logic is increased in complexity, when more data is therefore needed in memory at a time, or when the processing of said data has to occur as quickly as possible. For example, adding additional devices to the fleet increases the size of the “Thing Model” in the process and will require additional heap memory to be available to the Foundation server.

However, there are limitations to this approach. Only so many concurrent operations and threads can be performed at once by a single server. Operations trying to read and write to the disks at once can introduce bottlenecks and reduce server performance. Likewise, “one person, one basket” introduces a single-point-of-failure operating risk. If for some reason the server’s performance does degrade or cease altogether, then all of the “eggs” go down with it. Therefore, this approach is important, but usually not sufficient on its own for empowering an enterprise-level deployment.

Horizontal Scale - a.k.a. "one person, with two or more baskets"
As of ThingWorx version 9.0, Foundation servers can be deployed in clusters, meaning more baskets to carry the eggs. More baskets means that as long as even one of these servers is active, the application remains up in the event of an individual node failure or maintenance. So clustered deployments are those which facilitate High Availability.

Clustered servers save on some resources, but not others. For instance, every server in a cluster will need to have the same amount of memory, enough to store the entire Thing Model. Each of the multiple baskets in our analogy has to have the same type of eggs. One basket can’t have quail eggs if the rest have chicken eggs. So, each server has to have an identical version of the application, and therefore enough memory to store the entire application.

Also keep in mind that not all application business logic can scale horizontally.
Event queues are local to each ThingWorx node, so the events generated within each node are processed locally by that particular node, and not the entire network (examples are timer and scheduler-based activities). Likewise, data ingestion done through an extension or other background process, like MQTT, emits events within a node that therefore must be processed by that particular node, since that's where the events are visible. On the other hand, load distribution that happens external to ThingWorx in either the Connection Server (for AlwaysOn based data coming from ThingWorx SDKs, EMS, eMessage agents, or Kepware) or REST API calls through a load balancer (i.e. user activity) will be distributed across the cluster, facilitating greater scaling potential in terms of userbase and mashup complexity. Also note that batched data will be processed by the node that received it, but different batches coming through a connection server or load balancer will still be distributed.

Another consideration with clusters pertains to failure modes. While each node in the cluster shares a cache for many things, Stream and Value Stream queues are only stored locally. In the event of a node failure, other nodes will pick up subsequent requests, but any activity already queued on the failed node will be lost. For use cases where each and every data point is critical, it is important to size each node large enough (in other words, to vertically scale each node) such that queue sizes are constantly kept low and the data within them processed as quickly as possible. Ensuring sufficient network and database throughput to handle concurrent writes from the many clustered nodes is key as well.

Once each node has enough resources to handle local queues, the system is highly available with low risk for outages or data loss. However, when multiple use cases become necessary on single deployments, horizontal scaling may no longer be enough to ensure things run smoothly. If one use case is logic-heavy, something non-time-critical which processes data for later consumption, it can use too many resources and interfere with other, lighter but more time-critical use cases. Clustering alone does not provide the flexibility to prioritize specific operations or use cases over others, but sharding does.

Sharding - a.k.a. "more than one person carrying baskets"
“Sharding” generally refers to breaking up a larger IOT enterprise implementation into smaller ones, each with its own configuration and resources. More server maintenance and administration may be required for each ThingWorx implementation, but the reduction in risk is worth it. If each of the use cases mentioned above has its own implementation, then any unexpected issues with the more complex, analytical logic will not affect the reaction time of operators to time-sensitive matters in the other use case. In other words, “don’t keep all your eggs in one basket”.

The best places to break up an implementation lie along logical boundaries already accepted by the business. Breaking things down in other ways might look nice on paper, but encouraging widespread adoption in those cases tends to be an uphill battle.

In connected products use cases, options for boundaries could be regional, tied more towards business vertical, or centered around different products or models. These options can be especially beneficial when data needs to stay in particular countries or regions due to regulatory requirements.
In connected operations use cases, the most common logical boundaries would be site-based, with smaller IOT implementations serving just a smaller number of related factories in a particular area. Use-case or product-line boundaries can also make sense here, in-line with the above comments about keeping production-critical or time-sensitive use cases isolated from interference from business-support and analysis use cases.     Ideally, a shard model will put the IOT implementation “closer” to both the devices communicating with it and the users that interact with the data. This minimizes the amount of data to be sent or received over long distances, reducing the impact from bandwidth and latency on performance. When determining which approach is best, consider that smaller, more focused implementations offer more flexibility, but are harder to manage. Having different versions of the same applications deployed in multiple places can easily become a maintainability nightmare. It’s therefore best not to combine a regional model with a use case model when it comes to determining sharding boundaries. Also consider using deployment automation tools like Solution Central. These enable tracking and managing version-controlled deployments to multiple IOT implementations, whether they are deployed on-premise, or in the cloud, giving one central location of all source code.   Another benefit to sharding is the focused investment of server resources in a more targeted way. For instance, if one region is larger than another, it may need more CPU and memory. Or, perhaps only part of an application requires High Availability, the time-critical use cases which are best suited to small, clustered deployments. The larger, analysis-centered use cases can then remain non-clustered (but still vertically scaled of course). Sharding can also make access control simpler, as those who need access to only one region or use case can just be given a user account on that particular shard.   However, certain use cases need data from more than one shard in order to operate, turning the data storage and access control benefits into challenges. Luckily, ThingWorx has an excellent toolkit for overcoming such issues. For one thing, REST API calls are readily available in ThingWorx, allowing each shard to exchange information with each other, as well as other enterprise data systems, like ERP or Service Ticketing. Sometimes, lower-level data replication strategies are the way to go, say if downsampling or data transfer from one store to another is necessary, and built-in database tools can more easily handle the workload. Most of the time, however, REST API calls are used to define the business logic within the application layer so that copying data actively between shards is unnecessary, using fewer resources to control what information is shared overall.   There are several design approaches for REST API communication between shards, the two most common being the “peer” model and the “layered” model. In a peer model, one shard may call upon another shard using REST whenever it needs more information. In a layered model, there are “front-line” shards which handle most (if not all) of the device communication and time-critical use cases, things which require only the information in one shard to operate. Then there are also “back-line” shards that aggregate data from the many front-line shards, performing any operations that are less time-sensitive and more complex or analytical.   
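As a hypothetical illustration of the layered model, the sketch below shows a back-line process asking a front-line shard for a pre-computed daily summary through the standard ThingWorx REST endpoint. The shard URL, thing name, and service name are all invented for the example; in practice this call would more likely live inside a ThingWorx service on the back-line shard itself rather than in standalone Java.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ShardSummaryPull {
    public static void main(String[] args) throws Exception {
        String frontLineShard = "https://site-a.example.com/Thingworx";   // assumed front-line shard URL
        String appKey = "<app-key-with-read-access>";                     // assumed key provisioned on that shard

        // Thing and service names below are hypothetical placeholders
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(frontLineShard + "/Things/SiteA.SummaryThing/Services/GetDailySummary"))
            .header("appKey", appKey)
            .header("Accept", "application/json")
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{}"))   // service parameters would go in this JSON body
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());   // the result InfoTable, serialized as JSON
    }
}

The same endpoint pattern works in reverse for the peer model, with each shard granted a narrowly scoped application key on the shards it needs to query.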
For any of these approaches, it remains important to keep your data archival and purge strategy in mind. It is a best practice in ThingWorx to retain only as much data as is absolutely necessary, purging the rest periodically. If the front-line shards only ever need the last 7 days of raw data for 5 properties, plus the last 52 weeks of min/max/average data, then an ideal approach would be for each shard to compute the min/max/average values and then archive the older data to a shared “data lake” before purging it. This data lake then serves as the data store for all back-line shard operations.

There is also the option of sharing some infrastructure between ThingWorx instances when using sharding in a deployment, which can create more flexible, scalable architectures, but which can also introduce issues where more than one shard is affected when problems occur on only one shard. For instance, a common shared infrastructure piece is at the database level; each ThingWorx instance needs its own database, but a single database server instance (such as a PostgreSQL HA cluster) could serve separate database namespaces to multiple ThingWorx instances. This is an attractive option where an existing enterprise-scale database infrastructure with experienced DBAs is already in place.

Similarly, load balancers can often be configured to manage load for multiple servers or URLs. A properly configured load balancer could direct traffic for multiple applications, but it can also become a bottleneck for inbound connections if not properly sized. Load balancers designed for High Availability can also be considered. Apache ZooKeeper is another tool often deployed once for an entire cluster to monitor the health and availability of individual components, or to vote them in or out of operation if problems are detected. With all of these options, remember that sharing infrastructure reduces the overall infrastructure complexity, but at the cost of increased administrative complexity and an increased chance that issues spread from one ThingWorx system to the next.

Bringing It All Together
Vertical and horizontal scale are both effective ways to add more capacity and availability to software implementations, but there are typically diminishing returns on the investment in additional infrastructure. For example, consider two large, vertically-scaled implementations: one running on a VM with 64 vCPUs and 256 GiB of RAM, and another running on a VM with 96 vCPUs and 384 GiB of RAM. While the 96-core server has 1.5 times the compute capacity, in sizing tests with 1 million simulated assets, these two systems tend to fall behind on WebSocket execution at approximately the same point.

In a horizontal scale example, with two nodes each sized the same (64 vCPUs and 256 GiB of RAM), one would expect High Availability, where one node picks up the other's slack in a failover scenario. However, what if that single node can't handle the entire workload? Should both machines be sized vertically such that either can take on the full load, and if so, then what is the cost-benefit of that? It would be less expensive in this case to have a third server.

Optimizing the deployment architecture for a ThingWorx application will therefore usually involve a blended approach. With more than two nodes, High Availability is more readily obtained, as two servers can almost certainly share the load of the third, failed node.
Likewise, some workload aspects do not scale well until multiple additional nodes are added. For instance, spreading out the user load from mashup requests across multiple nodes, to give the singleton more resources for the tasks it alone can perform, doesn’t have much benefit if there are just two nodes.

However, with horizontal scaling alone, the servers may still need to be vertically scaled larger than is ideal in terms of cost. Each one has to hold the entire Thing Model in memory, which means that sometimes some of the nodes may be oversized for the tasks actually performed there. Sharding allows each node to have a different Thing Model as necessary, based around whichever boundaries are selected, which can mean saving on costs by sizing each server only as large as it really needs to be.

So, a combination of approaches is typically the best when it comes to deployment architecture. The key is to break things up as much as possible, but in ways that make sense. Determine where the boundaries of the shards will be such that each machine can be as light and focused as possible, while still not introducing more work in terms of user effort (having to access two systems to get the job done), application development (extra code used to maintain multiple systems or exchange information between them), and system administration (monitoring and maintaining multiple enterprise systems).

Find the right balance for your systems, and you’ll maximize your cost-benefit ratio and get the most out of your ThingWorx application. Happy developing!
ThingWorx DevOps with Azure: The Comprehensive DevOps Guide
Written by Tori Firewind, IoT EDC

As promised in a previous post, attached here is a comprehensive guide to DevOps in ThingWorx, including tutorials and instructions for creating a continuous integration, continuous deployment (CI/CD) process for application development. There are also updated scripts and entities attached, including an entire sample application for importing, exporting, and testing an application in ThingWorx. From Docker and GitHub to Azure DevOps and Solution Central, this guide has it all. Learn how to perform your role in the DevOps process, whether as an administrator or a developer; automate your deployments and testing; and create a more efficient process for publishing changes to production. A complete DevOps process like this really does facilitate faster and easier updates with fewer risks, fewer delays, and a better pathway to success.
Remote Monitoring of Assets in Connected Factories

As stated in the previous reference benchmark, one of the missions of the IOT Enterprise Deployment Center (EDC) is to showcase how real-world IOT business problems are solved. Our goal is that these benchmarks can be used as a reference or baseline for architects working on their own implementations, showing not only a successful at-scale implementation, but also what happens when that same implementation is pushed to, or even past, its limits.

The second in this series is attached here, this time reflecting a Connected Factory implementation. ThingWorx was deployed alongside Kepware Server, with the number of things, the number of properties, and the write rate for those properties varied to once again test the capabilities of a remote monitoring use case, but this time in a Connected Factory setting. The business logic was kept simple to ensure it was not the limiting factor as the throughput between Kepware Server and ThingWorx was pushed to the limit. See firsthand the capabilities of Kepware Server and ThingWorx Foundation to handle implementations centered around real-time data reporting.

More Connected Factory implementations will be added to this document in time, with multiple Kepware Server deployments and other scenarios to come. Please feel free to use this community post to ask any questions about our approach and discuss any design, deployment, and simulation factors.
Announcing the Final Installment
JMeter for ThingWorx: The Comprehensive Guide and Best Practice Tips

This is the final post on using JMeter for ThingWorx. Below are best practice tips for using JMeter and for load testing in general. Attached to this post is a comprehensive guide including all of the information from every post we've made on JMeter, including the tutorials. For a more central source, feel free to download the guide, or see the past posts here:
- JMeter for ThingWorx (original post)
- Building More Complex Tests in JMeter
- Distributed Testing with JMeter
- Generating and Reviewing JMeter Results

JMeter Best Practice Tips

Use Distributed Testing
As already mentioned in a previous post, each JMeter client can only handle about 150-250 threads depending on the complexity of the tests, and each client will need around 1 CPU and up to 8 GB of RAM for the Java heap. Some test plans will run with fewer host resources, so resizing the test client VM up or down is often required during test development. Create a batch or shell script to start the multiple JMeter clients for greater ease of use.

Use Non-Graphical Mode
Non-graphical mode allows the system to scale up higher; client processing uses up resources just to keep the simulation running, but with graphical mode turned off, there is less of an impact on the response times and other results. Graphical mode is essentially only used for debugging.

Turn off Embedded Resources
This setting reloads all of the typically cached requests over and over; there will be far more download requests than is helpful, to the exclusion of other requests. Ensure this box is not checked, especially in the HTTP Request Defaults element. Browser caching means that this setting doesn’t actually simulate a proper user load, given that many of the reloaded resources would not be reloaded by actual users. Use this incrementally, for one or two HTTP requests only, if there is a reason why those requests might need to download fresh images, scripts, or other resources with each call; for instance, simulate page timeouts using this once per hour or something similar. Using this across the whole project will prevent it from scaling well, while not actually simulating real-world conditions.

Avoid Using Listeners
For instance, avoid the “View Results Tree” listener, which uses additional resources that may skew the results in disingenuous ways, reflecting the needs of the clients themselves rather than the actual response times of the server. Many listeners are only for debugging a handful of threads while designing the tests. A list of recommended listeners for different purposes is in the JMeter documentation. Summary Report is the only one you want enabled, as that exports the results as a CSV or similarly formatted file, which can then be used to build reports.

JMeter CAN Handle SSO
JMeter can authenticate into and test an SSO-enabled system. Sometimes the SSO configuration is essential for customers, and they may therefore be quick to assume that they cannot use JMeter, but that's not entirely true. Some external tools that might help with this are BlazeMeter (mentioned again in just a moment) and Fiddler, a good tool for decoding what data a particular SSO setup is exchanging during the authentication process.
Use Logic Controllers for Parametrization
Parametrization is critical to mirroring a proper user load and allowing different data sets to be queried or created; the load should seem organic, random in the right ways, with actions occurring at random times rather than predictable ones, to prevent artificial peaks of usage that don't represent real usage of the Foundation server. Random Order Controllers direct the threads down different paths based on random dice rolls, allowing for a randomized collection of user activity each time, rather than something that has to be regenerated, such as a set of Boolean values specified in an input CSV and used to navigate a series of true-or-false switches. Switches just look for an environment variable to be either 1 or 0; when a thread hits a switch that is set to 1, it triggers the switch below, running the samplers in the order given under the transaction controller that goes with the switch. In this image, the 1's and 0's are given in the CSV input file; randomizing that input file therefore randomizes the execution of the switches too.

Use Commercial Add-Ons
There are many external, add-on tools and plugins which enhance JMeter's capabilities. One such tool is BlazeMeter, which has some free and some paid options to help create better reports, automatically removing much of the "garbage" REST calls (which would otherwise need to be manually deleted) and providing more consumable test reports right out of the box. Other tools and plugins include: Maven, NetBeans, SonarQube, Jenkins, Autometer, Gradle, Amazon EC2, Lightning, IntelliJ IDEA, Cassandra, and Grafana. For more best practice information, see the JMeter Best Practice Manual.

General Load Testing Guidelines

Concurrency Requirements – How to Properly Estimate the Size of the Load Test
Take a brand new ThingWorx-based app. How will people be accessing the system, and how often? How many are business users? How many are engineers? What do they do? Many assume that every named user in the corporate LDAP will need access to the server, often tens of thousands of users; this generally drastically oversizes the system. Load testing for many thousands of users is very hard and requires a lot of set-up, tuning, and optimization to get right, so if it seems that thousands of users are expected, then validating this claim is important: most customers don't really have that many concurrent users in an engineering system. Use estimates based on how many people work at which offices, which time zones those people are in, and what kinds of users they are. Do they need access to engineering data? Perhaps there are simpler mashups for them that use fewer resources. One tool PTC offers for these sorts of estimations is the Windchill Sizing Calculator (shown here), which factors in office time zone overlap. Other ways to estimate include:
Analyzing the business processes: things like how long workloads typically take to complete and how many workloads are generated per day, converted into the hour, minute, or second granularity desired for the peak duration, the length of the test.
"Day in the Life" modelling, or considering things like "what does user X do in a day?" Maybe user X checks out some drawings, edits them, and then checks them back in at 4:30. Maybe user Y digs into the underlying parts and assemblies, putting in change requests or orders throughout the day instead of waiting for the end. Models are built around the types of users. Also consider: What are worst case scenarios?
What are the longest running activities? What produces the largest data transfers? What activities have large, heavy database queries? When is the peak overlap of usage? Beginning-of-day and end-of-day downloads and check-ins? Reports that are generated regularly? How do these impact the foreground users?

For a simpler estimate, start with a percentage of the named user count; anywhere from 5-15% is a good ballpark percentage (a small worked example follows at the end of this post). Don't overestimate just to feel like the application has been financially worth it; even if everyone is logged in and using it all at once, which is unlikely, load testing for every single user doesn't take into account the fact that people pause in between clicking on things to think, type emails, get coffee, and so forth. Fewer people than expected are actually doing concurrent activities like loading web pages and updating data streams. Whenever possible, use concurrency data from existing customer systems to guide the estimate for the new system. Legacy systems are great places to start.

Use Grafana to monitor the system side throughout the load test; this is also required to know the test has been successful. Also set up Grafana to monitor the application once it goes live, both to prevent technical issues with the server and to mitigate them more rapidly when they occur. And remember that PTC Technical Support is here to help! Provide thread dumps with an open case to any TSE, and they will help troubleshoot the tests and review any errors in the ThingWorx or Tomcat logs.
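To illustrate the percentage-based concurrency estimate mentioned above, here is a small worked example. The user counts and percentages are hypothetical and should be replaced with numbers from your own business analysis.

# Hypothetical example: 8,000 named users in LDAP, assuming roughly 10% of them
# are active concurrently during the peak hour of the test.
NAMED_USERS=8000
CONCURRENCY_PCT=10
echo $(( NAMED_USERS * CONCURRENCY_PCT / 100 ))   # => 800 concurrent users to simulate
# At roughly 200 threads per JMeter client, simulating that peak would call for
# about 4 distributed test clients.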
View full tip
Setting Up the Azure Load Balancer with a ThingWorx High Availability Deployment

Purpose
In this post, one of PTC's most experienced ThingWorx deployment architects, Desheng Xu, explains the steps to configure Azure Load Balancer with ThingWorx when deployed in a High Availability architectural model.

This approach has been used successfully on customer implementations for several ThingWorx 7.x and 8.x versions. However, with some of the improvements planned for the ThingWorx High Availability architecture in the next major release, this best practice will likely change (so keep an eye out for updates to come).

Azure Load Balancer
The overview article What is Azure Load Balancer? from Microsoft will give you a high-level understanding of load balancers in general, as well as the capabilities and limitations of Azure Load Balancer itself. For our purposes, we will leverage Azure Load Balancer's capability to manage incoming internet traffic to ThingWorx Platform virtual machine (VM) instances. This configuration is known as a Public Load Balancer.

Important Note: Different load balancers operate at different "layers" of the OSI Model. Azure Load Balancer operates at Layer 4 (Transport Layer) – it is indifferent to the specific TCP payload. As a result, you must either configure both the front-end and back-end to work on SSL, or configure both of them to work on non-SSL communications. "SSL Termination" or "TLS Offload" is not supported by Azure Load Balancer.

Azure offers multiple different load balancing solutions. If you need some guidance on choosing the right one for you, I highly recommend reviewing the Microsoft DevBlog post Azure Load Balancing Solutions: A guide to help you choose the correct option.

High-Level Diagram: ThingWorx High Availability with Azure Load Balancer
To keep this article focused, we will not go into the setup of ThingWorx in a High Availability architecture. It will be assumed that ThingWorx is working correctly and the ZooKeeper cluster is managing failover for the Platform instances as expected. For more details on setting up this configuration, the best place to start would be the High Availability Administrator's Guide.

Planning
In this installation, let's assume we have the following plan (you will likely need to change these values for your own implementation):
Azure Load Balancer will have a public facing domain name: edc.ptc.io
Azure Load Balancer will have a public IP: 41.35.22.33
ThingWorx Platform VM instance 1 has a local computer name, like: vm1
ThingWorx Platform VM instance 2 has a local computer name, like: vm2

ThingWorx Preparation
By default, the ThingWorx Platform provides a healthcheck endpoint at /Thingworx/Admin/HA/LeaderCheck, which can only be accessed with a credential configured in platform-settings.json: "HASettings": { "LoadBalancerBase64EncodedCredentials":"QWRtaW5pc3RyYXRvcjphZG1pbg==" } However, Azure Load Balancer health probes cannot supply this credential with current versions of ThingWorx. As a workaround, you can create a pings.jsp (using the attached JSP example code) in the Tomcat folder $CATALINA_HOME/webapps/docs. This workaround will no longer be needed in ThingWorx 8.5 and newer releases.

There are two lines that likely need to be modified to meet your situation:
The hostname in final String probeURL (line 10) must match your endpoint domain name. It's edc.ptc.io in our example; don't forget to replace this with your real hostname!
You also need to add a line in your local hosts file to point this domain name to 127.0.0.1. For example: 127.0.0.1 edc.ptc.io
The credential in final String base64EncodedCredential (line 14) must match the credential configured in platform-settings.json.
Additionally:
Don't forget to make the JSP file accessible and executable by the user who starts the Tomcat service for ThingWorx.
These changes must be applied to both ThingWorx Platform VM instances.
Tomcat needs to be configured to support SSL on a specific port. In this example, SSL will be enabled on port 8443. Please make sure a similar configuration is included in $CATALINA_HOME/conf/server.xml: <Connector protocol="org.apache.coyote.http11.Http11NioProtocol" port="8443" maxThreads="200" scheme="https" secure="true" SSLEnabled="true" keystoreFile="/opt/yourcertificate.pfx" keystorePass="dontguess" clientAuth="false" sslProtocol="TLS" keystoreType="PKCS12"/>
The values in keystoreFile and keystorePass will need to be changed for your implementation. While the PKCS12 format is used in the above example, you can use a different certificate format, as long as it is supported by Tomcat (for example, the JKS format). All other parameters, like maxThreads, are just examples - you should adjust them to meet your requirements.

How to Verify
Before configuring the load balancer, verify that the health check workaround is working as expected on both ThingWorx Platform instances. You can use the following command to verify: curl -I https://edc.ptc.io:8443/docs/pings.jsp
The expected result from the active node should look like: HTTP/1.1 200
There will be three or more lines in the output, depending on your instance configuration, but you should be able to see the keyword: HTTP/1.1 200.
The expected result from the passive node should look like: HTTP/1.1 503

Load Balancer Configuration
Step 1: Select SKU
Search for "load balancer" in the Azure marketplace and select Load Balancer from Microsoft. Verify the correct vendor before you create a Load Balancer.
Step 2: Create load balancer
To create a proper load balancer, make sure to read Microsoft's What is Azure Load Balancer? overview to understand the differences between the "basic" and "standard" SKU offerings. If your IT policy only requires SSL communication to the outside but doesn't require SSL communication for the health probe, then the "basic" SKU should be adequate (not considering zone redundancy). You have to decide on the following parameters: Region, Type (public or internal), SKU (basic or standard), IP address, Public IP address name, and Availability zone. PTC cannot provide specific recommendations for these parameters – you will need to choose them based on your specific business needs, or consult Microsoft for available offerings in your region.

Step 3: Start to configure
Once the load balancer is successfully created by Azure, you should be able to see it in the portal.
Step 4: Confirm frontend IP
Click Frontend IP configuration on the left side and you should be able to see the public IP address configuration. Please make sure to register this IP with your domain name (edc.ptc.io in our example) in your Domain Name Server (DNS). If you are unfamiliar with DNS configuration, you should consult with the administrator of your DNS server. If you are using Azure DNS, the Quickstart article on creating Azure DNS zones and records may help.

Step 5: Configure Backend pools
Click Backend pools and click "Add" to add a backend pool definition. Select a name for your Backend pool (ThingworxBackend in our example). The next step is to choose the Virtual network.
Once you select the Virtual network, you can choose which VM (or VMs) you want to put behind this load balancer. The VM should be the ThingWorx VM instance. In a high availability architecture, you will typically need to choose two instances to put behind this load balancer.

Please Note: The "Virtual machine status" column in this table only shows VM status, not ThingWorx status. The ThingWorx running status will be determined by the health probe configured in the next step.

Step 6: Configure Health Probe
The Health Probe will be used to determine the ThingWorx Platform's running status. When a ThingWorx Platform instance is running as the leader, it will return HTTP status code 200 during a health probe. The Azure Load Balancer will rely on this status code to determine if the platform is running properly. When a ThingWorx Platform VM is not responding, offline, or not the leader in a High Availability setup, the health probe will return an HTTP status code other than 200.

For the health probe, select HTTPS for the protocol. In our example port 8443 is used, though another port can be selected if necessary. Then, provide the "/docs/pings.jsp" we created earlier as the probe's path. You may need to change this path value if you put this file in a different location.

Step 7: Configure Load balancing rules
Select "Load balancing rules" from the left side and click "Add". Select TCP as the protocol; in our example we are using 443 as the front-end port and 8443 as the back-end port. You can choose other port numbers if necessary. (A scripted Azure CLI alternative to these portal steps is sketched after the Q&A at the end of this post.)

Reminder: Azure Load Balancer is a Layer 4 (Transport Layer) router – it cannot differentiate between HTTP and HTTPS requests. It will simply forward requests from front-end to back-end based on the port-forwarding rules defined.

Session persistence is not critical for current versions of ThingWorx, as only one active node is currently permitted in a High Availability architecture. In the future, selecting Client IP may be required to support active-active architectures.

Step 8: Verify health probe
Once you complete this configuration, you can go to the $CATALINA_HOME/logs folder and monitor the latest local_access log. You should see entries similar to those pictured below - HTTP 200 responses should be observed from the ThingWorx leader node, and HTTP 503 responses should be observed from the ThingWorx passive node. In the example below, 168.63.129.16 is the internal IP address of the load balancer in the current region.

Step 9: Network Security Group rules to access Azure Load Balancer
On its own, Azure Load Balancer does not have a network access policy – it simply forwards all requests to the back-end pool. Therefore, the appropriate Network Security Group associated with the backend resources within the resource group should have a rule to allow access to the destination port of the backend ThingWorx Foundation server (shown as 8443 here, for example). The following image displays an inbound security rule that will accept traffic from any source and direct it to port 443 of the IP address for the Azure Load Balancer.

Enjoy!! With the above settings, you should be able to access ThingWorx via: https://edc.ptc.io/Thingworx (replacing edc.ptc.io with the hostname you have selected).

Q&A
Can I configure the health probe to run on a port other than the traffic port (8443 in this case)?
Yes – if desired, you can use a different port for the health probe configuration.

Can I use a protocol other than HTTPS for the health probe?
Yes – you can use a different protocol in the health probe configuration, but you will need to develop your own functional equivalent to the pings.jsp example in this article for the protocol you choose.

Can I configure ZooKeeper to support the health probe?
No – the purpose of the health probe is to inform the Load Balancer which node is providing service (the leader), not to select a leader. In a High Availability architecture, ZooKeeper determines which VM is the leader and is talking with the database. This approach will change in future releases where multiple ThingWorx instances are actively processing requests.

How well does Azure Load Balancer scale?
This question is best answered by Microsoft – as a starting point, we recommend reading the DevBlog post: Azure Load Balancing Solutions: A guide to help you choose the correct option.

How do I access logs for Azure Load Balancer?
This question is best answered by Microsoft – as a starting point, we recommend reviewing the Microsoft article Azure Monitor logs for public Basic Load Balancer.

Do I need to configure specifically for WebSocket and/or AlwaysOn communication?
No – Azure Load Balancer is a Layer 4 (Transport Layer) router - it only handles TCP traffic forwarding.

Can I leverage this load balancer to access all VMs behind it via SSH?
Yes – you could configure Inbound NAT rules for this. If you require specific help in configuring this, the question is best answered by Microsoft. As a starting point, we recommend reviewing the Microsoft tutorial Configure port forwarding in Azure Load Balancer using the portal.

Can I view the current health probe status on a portal?
No – unfortunately, there is currently no way to do this with Azure Load Balancer.
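For teams that prefer scripting over the portal, the sketch below shows how the load balancer, health probe, and load balancing rule described in Steps 2, 6, and 7 could be created with the Azure CLI instead. This is a minimal sketch only: the resource group, resource names, and SKU choice are assumptions (an HTTPS health probe requires the standard SKU), and the ThingWorx VM network interfaces still need to be added to the backend pool separately.

# Create the load balancer with a public frontend (names are placeholders):
az network lb create --resource-group MyResourceGroup --name ThingWorxLB \
  --sku Standard --public-ip-address ThingWorxPublicIP \
  --frontend-ip-name ThingWorxFrontEnd --backend-pool-name ThingworxBackend

# Health probe that calls the pings.jsp workaround described above:
az network lb probe create --resource-group MyResourceGroup --lb-name ThingWorxLB \
  --name ThingWorxProbe --protocol https --port 8443 --path /docs/pings.jsp

# Load balancing rule forwarding front-end port 443 to back-end port 8443:
az network lb rule create --resource-group MyResourceGroup --lb-name ThingWorxLB \
  --name ThingWorxRule --protocol Tcp --frontend-port 443 --backend-port 8443 \
  --frontend-ip-name ThingWorxFrontEnd --backend-pool-name ThingworxBackend \
  --probe-name ThingWorxProbe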
View full tip
Introduction to Digital Performance Management (DPM)
Written by: Tori Firewind, IoT EDC

"Digital Performance Management (DPM) is a closed-loop, problem solving solution that helps manufacturers identify, prioritize, and solve their biggest loss challenges, resulting in reduced cost, increased revenue, and improved service levels." – DPM Help Center

What is DPM?
Digital Performance Management (DPM) is an application which improves factory efficiency across a variety of different areas, namely "the four P's" of Digital Transformation: products, processes, places, and people. Each performance issue in a factory can be mapped to at least one of these improvement categories in a new strategy for Continuous Improvement (CI) founded by PTC.

Figure 1 – Each performance issue in a factory can be mapped to at least one of four fundamental improvement categories: products, processes, places, and people. PTC's new, industry-leading strategy for continuous improvement (CI) in factories is a "best practice" approach, taking the collective knowledge of many customers to form a focused, prescriptive path for success. (Closing the Loop Across Products, Processes, People, and Places, Manufacturing Leadership Journal)

At PTC, CI in factories is driven by a "best practice" approach: years of experience in manufacturing solutions combine with the collective knowledge of the many diverse use cases PTC has encountered to generate a focused, prescriptive path for improvement in any individual factory.

Figure 2 – DPM is a closed loop for continuous improvement, a strategy built around industry standard best practices and years of experience.

PTC is also defining new industry standards for OEE analysis by using time as a currency within DPM. This standardization technique improves intuitive impact assessment and allows for direct comparison of metrics (see the Help Center for details on how each metric is calculated).

DPM creates a closed loop for CI, from the monitoring phase, performed both automatically and through manual operator input, to the prioritization and analysis phases performed by plant managers. DPM helps plant managers by tracking metrics of factory performance that often go overlooked by other systems. With Analytics, DPM can also do much of the analysis automatically, finding root causes much more rapidly.

Figure 3 – All levels of the company are involved in solving the same problems effectively and efficiently with DPM. Instead of 100 people working on 100 different problems, some of which might not significantly improve OEE anyway, these same 100 people can tackle the top few problems one at a time, knocking out barriers to continuous improvement together.

Production supervisors who manage the entire production line then know which less-than-effective components on the line need help. They can quickly design and redesign solutions for specific production issues. Task management within DPM helps both the production manager and the maintenance engineer to complete the improvement process. Using other PTC tools like Creo and Vuforia makes the path to improvement even faster and easier, requiring less expert knowledge from the front-line workers and empowering every level of participation in the digital transformation process to make a direct, measurable impact on physical production.

How Does DPM Work?
DPM as an IoT application sits on top of the ThingWorx Foundation server, a platform for IoT development that is extensible and customizable.
Manufacturers therefore find they rarely have to rip and replace existing systems and assets to reap the benefits of DPM, which gathers, aggregates, and stores production data (both automatically and through manual input on the Production Dashboard) so that it can be analyzed using time as a currency. DPM also manages the process of implementing improvements (using the Action Tracker) based on the collected data, and provides an easy way to confirm that the improvements make a real difference in the overall OEE (through the Performance Analysis Dashboard). Because the analysis occurs both before and after the steps to improve are taken, manufacturers can rest assured that any resources invested in the improvements aren't invested in vain; DPM is a predictive and prescriptive analysis process.

DPM makes use of an external SQL Server to run queries against collected data and perform aggregation and analysis tasks in the background, in a separate server location from the thing model and ingestion database. This ensures that use cases involving real-time alerts and events, high-capacity ingestion, and others remain possible on the ThingWorx Foundation server.

The IoT EDC is focusing on DPM for a series of technical briefs which provide insight and expert-level recommendations regarding DPM usage and configuration. Stay tuned to the PTC Community for more updates to come.
View full tip
The IoT Enterprise Deployment Center's goal is to create and share knowledge around the best practices for architecting, designing, and deploying successful, enterprise-scale ThingWorx IoT solutions.

To accomplish this goal, the EDC team takes a "real world" approach, using simulated IoT assets and users to benchmark the capabilities of different ThingWorx deployment configurations. First, each implementation is pushed to its limit in an effort to establish real-world baselines, metrics which can be used to help customers determine which architecture choices will work for their custom needs. Then, each implementation is pushed beyond its limits, providing useful insight into where and why things fail, and illuminating potential implementation changes which could push the boundaries further.

Through the simulation testing to come, the EDC will be publishing the resulting benchmarks for all to see! These benchmarks will include details on implementation goals and performance metrics for different stages of deployment. Additionally, best-practice articles which illustrate how to deploy the different architectural components (those referenced within the benchmarks) will also be posted, highlighting the optimal approach to integrating everything into the ThingWorx platform.

Stay tuned to see more about just how versatile the ThingWorx Platform can be! We look forward to discussing these findings as they are published right here on the PTC Community.
View full tip