
IoT Tips

ThingWorx DevOps
By Victoria Firewind, IoT EDC

This presentation accompanies a recent Expert Session, with video content (including demos of the topics below) found here.

DevOps is a process for taking planned changes through development, through testing, and into production, where they can be accessed by end users. One test instance typically has automated tests (integration testing) which ensure application logic is preserved in spite of whatever changes the developers are making, and often there is another test instance to ensure the application is usable (UAT testing) and able to handle a production load (load testing).

A DevOps pipeline starts with a task manager tasking out planned changes, where each task will become a branch in the repository. Each time a new branch is created, a new pipe is needed, which in this case is produced by Docker Hub. Developers then make changes within that pipe, which flow along the pipe into testing. In this diagram, testing is shown as the valve which, when open (i.e. when all tests pass), allows the changes to flow along the pipe into production. A good DevOps process has good flow along the various pipelines, with as much as possible automated or scripted to reduce the chances for errors in deployments. In order to create a seamless pipeline, whether or not it winds up automated, several third-party tools are useful.

Container software is a very good way to improve the maintainability and updatability of a ThingWorx instance, while minimizing the amount of resources needed to host each component.
1. Create the Docker image. Consult the Help Center if need be, and update your YML file with everything you need before starting the image (see the example in the PTC Community). License the instance using the license management website, and follow the instructions from Docker for installing the necessary tools: Docker itself (docker) and Docker Compose (docker-compose).
2. Save the Docker image in a Docker repository. Docker Hub has some free options, and if a license is purchased, it can host more than a single Docker image and tag. It is also possible to set up your own Docker registry.
3. Access the image in Docker Desktop. Download Docker Desktop and sign in to the Docker account which hosts the repository. Create some folders for storing the h2.env file and the ThingworxStorage and ThingworxPlatform mounted folders. Remember to license these containers as well: developers log in to the license management site themselves and put the resulting file ("license_capability_response.bin") into the ThingworxPlatform mounted folder.

Git is a very versatile tool that can be used through many different mediums, like Azure DevOps or GitHub Desktop. To get started as a totally new Git user, try downloading GitHub Desktop on your local machine and creating a local repository with the provided sample code. This can then be cloned on a Linux machine, presumably whichever server hosts the integration ThingWorx instance, using the provided scripts (once they are configured). Remember to install Git on the Linux machine, if necessary (sudo apt-get install git).

A sample ThingWorx application (which is not officially supported, and is provided just as an example of how to do DevOps-related tasks in ThingWorx) is attached to this post in a zip file, containing two directories, one for scripts and one for ThingWorx entities.
Copy the Git scripts and config file into the top level, above the repository folder, and update the GitConfig.sh file with the URL for your Git repository and your login credentials. These scripts can then be used to sync your Linux server with your Git repository, which any developer can easily update from their local machine. This also keeps changes secure and enables the use of other DevOps procedures like tasks, epics, and corresponding branches of code.

Steps to DevOps using the provided code as an example:
1. Clone the repository into the SystemRepository (or any other created repository) using the provided scripts in a Linux environment.
2. Import the DeploymentUtilities entity, which again is scripted for Linux or for use with a development IDE with bash support.
3. Import the ThingWorx application from source control, or use the script (which itself makes use of that DeploymentUtilities entity).
4. Now create some local changes, add things, etc., and try out the UpdateApplication script, or export to source control and then push to the Git repo. Data and localization table exports are also possible.
5. Run the tests using the provided IntegrationTester thing, create your own by overriding the IntegrationTestTS thing shape, or use the TestTwxApplication script from a Linux terminal.
6. Design a process for your application which allows for easy application exports and updates to and from a repository, so that developers can easily send in their changes, which can then be easily loaded and tested in another environment.

In conclusion: DevOps is a complex topic, and every PTC customer will have their own process built around their unique requirements and applications. In the future, more mature pipeline solutions will be covered, including ones that also publish to Solution Central for easier deployment between the various testing instances and production.
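As an illustration of how one of these steps could be automated from a pipeline runner rather than a bash shell, here is a minimal Python sketch that invokes a service on the DeploymentUtilities thing through the standard ThingWorx REST endpoint. The server URL, application key, service name, and input parameters are placeholders (the attached sample defines its own scripts and services), so treat this as a pattern to adapt rather than a drop-in script.

import requests

TWX_URL = "https://twx-test.example.com/Thingworx"   # placeholder test server
APP_KEY = "your-application-key"                     # placeholder application key

# POST /Things/<thing>/Services/<service> is the standard ThingWorx service endpoint;
# "ImportApplication" and its inputs are hypothetical names for this example.
resp = requests.post(
    f"{TWX_URL}/Things/DeploymentUtilities/Services/ImportApplication",
    headers={
        "appKey": APP_KEY,
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json={"repositoryName": "SystemRepository", "path": "/twx-source-control"},
)
resp.raise_for_status()
print("Import triggered, status:", resp.status_code)

A call like this can be dropped into whatever task runner triggers on new branches or merges, so that imports and test runs happen without anyone logging into Composer.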
ThingWorx Performance Monitoring with Grafana
Authored by EDC team member Desheng Xu (@xudesheng)

Monitoring ThingWorx performance is crucially important, both during the load testing of a newly completed application and after the deployment of new code in an existing application. Monitoring performance ensures that everything works as expected at the enterprise level. This tutorial steps you through installing and configuring a tool which runs on the same network as the ThingWorx instance, collects data from the Platform, and translates it into something visual and easy to understand via Grafana.

tsample is small and customizable, and it plays a similar role to telegraf. Its focus is on gathering ThingWorx performance metrics. Historically, this tool also supported collecting OS-level performance metrics, but this is no longer supported. It is highly recommended to collect OS-level performance metrics using telegraf, a tool designed specifically for that purpose (and not discussed here). This is not the only way to go about monitoring ThingWorx performance, but this tool uses a very good approach that has been proven effective both at customer sites and internally by PTC to monitor scale tests. Find the most recent release here.

Recommended Deployment Architecture
tsample can be deployed on the same box where ThingWorx Tomcat is running, but it is recommended to deploy it on a separate box to minimize any performance impact caused by the collector. tsample supports export to InfluxDB and/or a local file. In this document, it is assumed that InfluxDB will be used for monitoring purposes. Please note that this is not the same instance of InfluxDB being used by ThingWorx (if configured). This article will not cover setting up InfluxDB or NGINX (if necessary), so please configure these before beginning this tutorial.

Supported Platforms
tsample has been tested on Windows 2016, MacOS 10.15, Ubuntu 16.04, and Red Hat 7.x. It is anticipated to work on more general Ubuntu/Red Hat/Mac/Windows releases as well. Please leave a comment or contact the author, @xudesheng, if Raspberry Pi support is needed.

Configuration File
Where to Store the Configuration File
tsample will pick up the configuration file in the following sequence:
From the command line: ./tsample -c <path to configuration file>
From the environment, on Linux: export TSAMPLE_CONFIG=<path to configuration file> followed by ./tsample; on Windows: set TSAMPLE_CONFIG=<path to configuration file> followed by tsample.exe
From a default location: tsample will try to find a file named "config.toml" in the same folder in which it starts.

How to Craft a Configuration File
You can use the following command to generate a sample file: ./tsample -c config.toml -e (or: ./tsample -c config.toml --export). A file named "config.toml" will be generated with a sample configuration. You can then adjust its content in accordance with the following.

Configuration File Content Format
The configuration file must be in TOML format.
title and owner sections: Both sections are optional. The intention of these two sections is to support a documentation tool in the future.
TestMachine section: This section is required; it defines where this tool will run.
thingworx_servers section: This section is where you define the targeted ThingWorx applications. Multiple ThingWorx servers can be defined, with the same or different metrics to be collected.
thingworx_servers.metrics sections
Underneath each thingworx_servers section, there are several metrics. In the default example, the following metrics are included: ValueStreamProcessingSubsystem, DataTableProcessingSubsystem, EventProcessingSubsystem, PlatformSubsystem, StreamProcessingSubsystem, WSCommunicationsSubsystem, WSExecutionProcessingSubsystem, TunnelSubsystem, AlertProcessingSubsystem, and FederationSubsystem. You can add your own customized metrics, as long as the result follows the same Data Shape. The default Data Shape has 3 columns; if the output Data Shape exceeds this limit, the tool will likely not work properly.

result_export_to_db section: This section defines the target InfluxDB instance as a sink for the collected performance metrics.
result_export_to_file section: This section defines the target file storage for the collected performance metrics.

Grafana Configuration Example: Monitor a Value Stream
Step 1. Connect Grafana to InfluxDB.
Step 2. Create a New Dashboard.
Step 3. Create a New Query. Depending on which metrics you chose to collect in the tsample configuration file, you will see a different choice of measurements in Grafana. Here, we will use ValueStreamProcessingSubsystem as an example.
Step 4. Choose the Right Platform and Storage Provider. Some metrics depend on the database storage provider, like Value Stream and Stream.
Step 5. Choose the Metrics Figures. Select "remove" to get rid of the default 'mean' calculation, and select "non_negative_difference" from Transformations. Using this transformation, Grafana can show us the speed of writes. Then remove the default GROUP BY "time" clause and assign a meaningful alias to this query.
Step 6. Add Another Query. You can add another query as 'Value Stream Queued Speed' by following the same steps.
Step 7. Assign a Panel Title.
Step 8. Review the Result. Go back to the dashboard page and select "last 15 minutes" or "last 5 minutes" from the top right corner. It should show a result similar to the chart below.
Step 9. Save the Dashboard. Don't forget to save your dashboard before adding more panels.
Step 10. Refine the Panel. It's difficult to figure out the high-level write speed from the above panel, so let's enhance it. Add a new query with the following configuration. In the above query, there are two additional figures, 20s and 1m; how do you choose? The 20s should be the same as sampling_cycle_inseconds in your tsample configuration file; if you choose a different value, you could end up with misleading results. Larger values such as "1m" may give you a smoother result, but they could also hide system instability. Going larger than 1m is not recommended in most cases. With this new query, it's much easier to figure out what the average write speed is in the current test. Tip: if your sampling_cycle_inseconds is 30s, then you may not need this additional query. The first image below is a sample at the 30s interval; no additional average query is needed to get a smooth write speed. The next example is a sample at the 10s interval; without additional queries, you may not be able to get a meaningful understanding of the write speed. Based on these three examples, it's recommended to configure the sampling interval at 30s, or anything larger than 20s, and then decide whether you need additional queries based on the visualization result.
Step 11. Further Refinement. The above charts illustrate the queuing and writing speed.
However, it is possible for the Value Stream to perform at a reasonable speed while the Value Stream queue keeps growing and eventually exceeds its capacity, so let's add another query to monitor this. This chart is difficult to read at first, since the new query has a different value range; moving it to a second y-axis on the right makes the view much easier to read. The current queue size (or remaining queue size) will always move up and down; it is healthy as long as it does not continue to grow to a high level.

What Else Can Be Monitored?
The following metrics are monitored by most customers: Value Stream write and queue speed, Value Stream queue size, Stream write and queue speed, Stream queue size, Event performed speed (completedTaskCount), Event submitted speed (submittedTaskCount), Event queue size, WebSocket communication, and WebSocket connections.

ThingWorx Memory Usage Monitoring
Create a new panel and add a new query. In a running system, memory usage will always move up and down, at times sharply or quickly, when the system is busy. The system is healthy as long as memory doesn't go up continuously or stay at a maximum for a long period of time.

Conclusion
Setting up monitoring is absolutely crucial to managing the performance of an enterprise ThingWorx application. Using Grafana makes tracking and visualizing the performance much easier. Stay tuned to the EDC tag for more monitoring tips to come!
Remote Monitoring of Assets Benchmark

As @ttielebein introduced previously, one of the missions of the IOT Enterprise Deployment Center (EDC) is to publish benchmarks that showcase the ThingWorx Platform deployed to solve real-world IOT business problems. Our goal is that these benchmarks can be used as a reference or baseline for architects working on their own implementations, showing not only a successful at-scale implementation, but also what happens when that same implementation is pushed to, or even past, its limits.

Please find the first installment attached: a reference benchmark demonstrating ThingWorx deployed to monitor 15,000 assets with a high volume of data properties per asset. Over 250 hours of simulations were conducted as part of producing this benchmark.

The IOT EDC team will be monitoring this post (as well as our other posts in the IOT Tech Tips forum) to answer any questions we can about the approaches taken in designing, deploying, and simulating this implementation. As the team publishes more benchmarks like this in the future, we also greatly value any feedback you have that can help us improve the content of future documents.
Developing Great IoT Solutions Brought to you once again by your EDC team, find attached here a brand-new, comprehensive overview of ThingWorx best practices! This guide was crafted by combining all available feedback, from support cases to PTC Community threads, and tapping all internal resources. Let this guide serve to bridge the knowledge gaps ThingWorx developers most commonly see.    The Developing Great IoT Solutions (DGIS) Guide is a great way to inform both business and technically minded folks about the capabilities of the ThingWorx Platform. Learn how to design good solutions from a high-level, an overview designed specifically with the business audience in mind. Or, learn how to implement good IoT designs through a series of technical examples. Start from very little knowledge of the Platform and end up understanding data structures and aggregation, how to use the collection widget, and how to build a fully functional rules engine for sending and acknowledging alerts in ThingWorx.   For the more advanced among us, check out the Appendix. Find here a handy list of do's and don'ts surrounding ThingWorx best practice in development, with links to KCS, Help Center, and Community content.   Reinforce your understanding of the capabilities of the ThingWorx Platform with this guide, today!   A big thanks to all who were involved on this project! Happy developing!
Hi All,

Our expert session, ThingWorx Flow Overview, is tomorrow! Click the link below to register, and remember to mention it to colleagues who might benefit from its content.

Expert Session: ThingWorx Flow Overview
Date and Time: December 10th, 8h00 EST
Duration: 1 hour
Hosts: Antony Moffa and Vinay Vaidya, ThingWorx IoT Platform Senior Directors
Registration Here: https://www.ptc.com/en/customer-success/expert-sessions-for-thingworx-foundation-webcasts

See you there!

Here are other upcoming sessions that might be of interest:
Upgrade to ThingWorx 9 – How to Plan / Evaluate Impacts: This session will highlight the key points you should evaluate to properly plan your upgrade to ThingWorx 9. Register Here.
Active-Active Clustering: This session will cover the main aspects of the High Availability Clustering feature launched with the ThingWorx 9.0 release. Register Here.
Persistent vs. Logged Properties By Mike Jasperson, VP of IoT EDC   Executive Summary ThingWorx provides several different “aspects” (or storage options) for how property values are saved.  These options each have different implications for performance and scalability.  Understanding those implications is important for designing a scalable IOT solution.   Persistent Properties are best used for non-telemetry data which will change infrequently (for example only a few times in a day) and where historical values are not required.  When overused, Persistent properties can put significant pressure on the database layer of your ThingWorx implementation, leading to poor performance of your IOT application.  As the number of Things in your IOT application scales up, the quantity or frequency of persistent properties per Thing needs to be carefully considered.   Logged Properties are best used for telemetry data where historical values need to be retained, but also for any other value that is expected to change frequently.  Logged properties can create some additional requirements: a process for handling null/default values after restarts, more disk space, and a data retention policy. There are benefits as well, though, like more flexibility and scalability for the ingestion of larger volumes of data.   Persistent + Logged Properties perform database operations of both aspects.  Combined use should be very limited – only properties that update infrequently (a few times a day), and that must be in-memory in the event of a ThingWorx restart.   In-Memory Only Properties are neither persistent nor logged – they are not stored to the database.  These properties can greatly improve scale for values that need to be available for the application to drive UIs or compute other derived values that will be stored.  However, high-frequency updates of in-memory properties can create scale challenges in HA (high availability) ThingWorx configurations where memory state needs to be constantly shared between multiple ThingWorx nodes.     Find a complete summary as well as example cases in the document attached.
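One practical way to see the difference between these aspects from outside Composer is through the ThingWorx REST API: the current (in-memory) value of a property can always be read directly, while historical values only exist if the property is logged to a Value Stream. Below is a minimal Python sketch of that comparison; the server URL, application key, thing name, and property name are placeholders, and the exact service parameters may vary by ThingWorx version.

import requests

TWX_URL = "https://twx.example.com/Thingworx"        # placeholder server
HEADERS = {
    "appKey": "your-application-key",                # placeholder application key
    "Accept": "application/json",
    "Content-Type": "application/json",
}
THING = "MyAsset"        # placeholder thing name
PROP = "Pressure"        # placeholder property name

# Current value: available regardless of the persistent/logged aspects chosen
current = requests.get(f"{TWX_URL}/Things/{THING}/Properties/{PROP}", headers=HEADERS)
print("current value:", current.json())

# Historical values: only returns rows if the property is logged to a Value Stream
history = requests.post(
    f"{TWX_URL}/Things/{THING}/Services/QueryPropertyHistory",
    headers=HEADERS,
    json={"maxItems": 10},
)
print("recent history:", history.json())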
When to Include InfluxDB in the ThingWorx Development Lifecycle (this article is also available for download as a PDF attached)   The Short Answer InfluxDB is a time series database designed specifically for data ingestion. Historically, InfluxDB has been viewed as a high-scale expansion option for ThingWorx: a way to ensure the application works as intended, even when scaled up to the enterprise level. This is certainly one way to view it, because when there are many, many remote things, each with a lot of properties writing to the Platform at short intervals, then InfluxDB is a sure choice. However, what about in smaller applications? Is there still a benefit to using an optimized data ingestion tool in any case? The short answer is: yes, there is!   Using InfluxDB for optimized data ingestion is a good idea even in smaller-sized applications, especially if there are plans to scale the application up in the future. It is far better to design the application around InfluxDB from the start than to adjust the data model of the application later on when an optimized data ingestion process is required. PostgreSQL and InfluxDB simply handle the storage of data in different ways, with the former functioning better with many Value Streams, and the latter with fewer Value Streams. Switching the way data is retained and referenced later, when the application is already on the larger side, causes delays in growing the application larger and adding more devices. Likewise, if the Platform reaches its ingestion limits in a production environment, there can be costly downtime and data loss while a proper solution (which likely involves reworking the application to work optimally with InfluxDB) is implemented.   Don’t think that InfluxDB is for expansion only; it is an optimized ingestion database that has benefits at every level of the application development lifecycle. From the end to end, InfluxDB can ensure reliable data ingestion, reduced risk of data loss, and reduced memory and CPU used by the deployment overall. Preliminary sizing and benchmark data is provided in this article to explain these recommendations. Consider how ThingWorx is ingesting data now, how much CPU and other resources are used just for acquiring the data, and perhaps InfluxDB would seem a benefit to improve application performance.   The Long Answer In order to uncover just how beneficial InfluxDB can be in any size application, the IoT Enterprise Deployment Center has run some simulations with small and medium sized applications. The use case in the simulation is simple with user requests coming from a collection of basic mashups and data ingestion coming from various numbers of things, each with a collection of “fast” and “slow” properties which update at different rates. This synthetic load of data does not include a more complete application scenario, so the memory and CPU usage shown here should not be used as sizing recommendations. For those types of recommendations, stay tuned for the soon-to-release ThingWorx 9.0 Sizing (or check out the current 8.5 Sizing Guide).   Comparing Runs When determining the health of the ThingWorx Platform, there are several categories to inspect: Value Stream Queue Rate and Queue Size, HTTP Requests, and the overall Memory and CPU use for each server. Using Grafana to store the metrics results in charts like those below which can easily be compared and contrasted, and used to evaluate which hardware configuration results in the best performance. 
The size of the numbers on the vertical axis indicates the total resources used for that metric, while the slope or trend of each chart indicates bottlenecks and inadequate resource allocation for the use case.

In this case, all darker charts represent data from PostgreSQL-only configurations, while the lighter charts represent the InfluxDB instances. Because this is not a sizing guide, whether each of these charts comes from the small or medium run is unimportant as long as they match (for valid comparisons between configurations with Influx and without it). The smaller run had something like 20k Things, and the larger closer to 60k, both with 275 total Platform users (25 Admins) and 3 mashups, which were each called at various refresh rates over the course of the 1-hour testing period. Note that in the PostgreSQL-only instances, there were more Thing Templates and corresponding Value Streams. This change is necessary between runs because only with fewer Value Streams does InfluxDB begin to demonstrate notable improvements.

The most important thing to note is that the lighter charts clearly demonstrate better performance for both size runs. Each section below will break down what the improvement looks like in the charts to show how to use Grafana to verify the best performance.

Value Stream Queue
The vertical axis on the Value Stream Queue Rate chart shows how many total writes per second (WPS) the Platform can handle. The average is 10 WPS higher using InfluxDB in both scenarios, and InfluxDB is also much more stable, meaning that the writes happen more reliably. The Value Stream Queue Size chart demonstrates how well the writes within the queue are processed. Both of these are necessary to determine the health of data ingestion.

If the queue size were to increase and trend upward in the lighter Queue Size chart, then that would mean the Platform couldn't handle the higher ingestion rate. However, since the Queue Size is stable and close to 0 the entire time, it is clear that the Platform is capable of clearing out the Value Stream Queue immediately and reliably throughout the entire test.

Figure 1 – These represent the data getting stored into the data provider. Note: the former is much lower than the latter.
Figure 2 – Note the data loss in the non-Influx instance (the queue in green reaches the max in yellow). The Influx instance has less trouble clearing out the queue, as demonstrated by the consistently low queue size.

HTTP Requests
Taking the strain of ingestion off of the Platform's primary database frees its resources up for other activities. This in turn improves the performance and reliability of the Platform when responding to HTTP requests, which in a typical application are used to aggregate data into smaller data stores (depending on the use case) and to render the mashups for the end users. The business logic and mashups can be more complex when there is one database designated for ingestion (InfluxDB) and one for everything else (PostgreSQL).

Figure 3 – The darker chart shows a lot of choppiness, meaning that while the Platform was responding the whole time, it was not doing so reliably. The smoother second chart shows how much more easily the Platform can handle these requests when the load is distributed intelligently across multiple servers, each optimized for the type of data they receive. The "staircase" shape occurs because the simulator increases the workload every 10 minutes until it breaks.
Likewise, the nature of Postgres lends itself well to this differentiation, given that there are many more database tables required for supporting the HTTP requests, something Postgres does well. That leaves Influx to handle the time-series data and ingestion, which are the primary strengths of that software as well. So, splitting the load across multiple servers in this way results in smaller server sizes overall, each of which is streamlined and optimized to handle exactly what it is given by the Platform.

Note that in both of these charts, there are no bad requests, so both would seem to be successful runs. However, as later charts demonstrate more clearly, there is a catastrophic failure when the load is increased around 12:30p. The simulation ends before the server begins to show any real symptoms of the issue, which is why there are no bad requests. The maximum Operations Per Second (OPS) in the Hardware Specifications and Performance section is taken from before the failure begins.

Clearly the InfluxDB instance has better performance, given that the average Operations Per Second (OPS) is substantially higher, nearly 4 times what is seen in the PostgreSQL-only instance. Of course, how well the Platform manages the business logic and mashup loading will depend on a lot of factors. In this test scenario, the OPS was increased by increasing the mashup refresh rate on the InfluxDB instances (which could handle over double the operations). Likewise, the number of Stream writes to the PostgreSQL database could be double what it was when PostgreSQL was the only database. Therefore, configuring InfluxDB for the data ingestion and leaving Postgres for the rest of the application certainly makes the load much easier on the Platform, and the same would be true even in a much more complex scenario.

Memory and CPU
The important thing here is to keep the memory use low enough that any spikes in usage won't cause a server malfunction. CPU usage should stay at or below around 75%, and memory should never exceed around 80% of the total allocated to the server. The sizing guides can help determine what this allocation of memory needs to be.

Of note in these charts is the slight upward slope of the CPU usage in the darker chart, indicating the start of a catastrophic failure, and the difference in the total memory needed for the ThingWorx Platform and Postgres servers when Influx is used or not. As is apparent, the servers use much less memory when the database load is split up intelligently across multiple servers.

Figure 4 – The ThingWorx CPU is about the same here as in the InfluxDB configuration below, but look at how much more memory both the Platform and the Postgres database need allocated to them in this configuration (64 GB apiece). Also note the jump in CPU and memory usage after 12:30p. This is referenced in the previous section, and the upward slope of the usage after that point indicates the start of a catastrophic failure. The test ends too soon to see any symptoms of failure, but it is a sure thing after the increase in load around 12:30p.
Figure 5 – Influx needs an extra server, but the size of the Influx and Postgres servers together is less than half the size of that required for the single Postgres database in the Postgres-only configuration (8 GB). ThingWorx is smaller too (32 GB).

Hardware Specifications and Performance
These are the exact specifications for each simulated instance, broken down by size and by whether InfluxDB is configured or not.
Note that some of the hardware specifications may be more than is necessary, depending on the real-world use case. As stated previously, this document is not a sizing guide (use the official ThingWorx Sizing Guide). Note that the maximum numbers of WPS and OPS are shown here. The maximums are higher in the InfluxDB scenarios, meaning that even with smaller-sized servers, the InfluxDB configurations can handle much greater loads.

Summary
In conclusion, if InfluxDB may at some point be needed in the lifecycle of an application, because the expected number of things or the number of properties on each thing is large enough that it would otherwise max out the limits of the Platform, then InfluxDB should be used from the very start. There are benefits to using InfluxDB for data ingestion at every size, from performance to reliability, and of course improved scalability as well.

Reworking the application for use with InfluxDB later on can be costly and cause delays. This is why the benefits and costs associated with an InfluxDB-centric hardware configuration should be considered from the start. More servers are required for InfluxDB, but each of these servers can be sized smaller (depending on the use case), and all of this will affect the overall cost of hosting the ThingWorx application. The benefits of InfluxDB are especially pronounced when used in conjunction with clusters, which will be demonstrated fully in the 9.0 Sizing Guide (soon to be released). If InfluxDB is used to interface with the clusters, then there are even more resources to spare for user requests.

It is considered ThingWorx best practice for high-ingestion customers to make use of InfluxDB in applications of any size. Note, though, that this means the number of Value Streams per Influx database will need to be limited to single digits. We hope this helps, and from everyone here at the EDC, happy developing!
ThingWorx DevOps with Jenkins
DevOps as a topic is vast and has been addressed many times throughout the history of the PTC Community. Previous posts address what DevOps is, teach how to make use of DevOps like a pro, announce updates to the PTC Git Extension, and explain why this extension is so helpful for achieving continuous Git integration with ThingWorx.

This post provides a PDF guide on Jenkins integration with ThingWorx, including tutorials with detailed information on how to set up your ThingWorx instance and how to configure your Jenkins Pipeline. The PDF is listed for download separately, but it is also included in the zip with the other required files for the tutorial. The Jenkins Pipeline provided here is intended as an example and starting point for managing your DevOps in ThingWorx and can easily be extended. Please note that this Pipeline is not officially supported by PTC.
JMeter for ThingWorx

Overview
Apache JMeter is an open-source tool designed for load testing and measuring the performance of a web application. JMeter has a wide range of features to facilitate this testing, including support for a variety of server and protocol types, a full-featured testing IDE with the ability to record the test steps from either a browser or a native application, and built-in debugging tools. Information about JMeter can be found on Apache's website.

Working with JMeter is not always intuitive, but it also isn't that much harder than regular software development. Take some time to explore the official Apache JMeter documentation and figure out where things go and how to mechanically make use of the JMeter IDE. Then step through this tutorial to create a basic test that logs in to ThingWorx, accesses a mashup, and clicks on a few widgets. This is the first in a series to come, courtesy of IoT EDC Engineer Tim Atwood (@atwood) and the whole EDC team.

Installation
Download JMeter from Apache's website. Unpack the archive and copy the files to a desired location. Run the application by double-clicking on the "ApacheJMeter.jar" file within the bin directory. JMeter is now installed and ready to use.

Creating a Test
Set up a proxy in your browser of choice (or on the OS in settings). Select the green "templates" icon in JMeter, and then select "Recording" for the template. Configure the recording template to point towards your ThingWorx Navigate or Foundation server, then click "Create". Hit "Start" under the "HTTP(S) Test Script Recorder" tab of the new JMeter project. Make sure the port is set correctly under Global Settings.

A pop-up box will appear that always stays visible on top of the active browser window, so that the recording can be controlled and stopped at any time. Leave the "Transaction name" field empty so that each transaction recorded by the software is automatically named after the web request (this helps differentiate one from another, and they can each be renamed later).

Open your browser, and navigate (via direct URL if possible, to keep things simple) to the mashup you wish to test. Log in and let the page load. Click on anything you'd like on the mashup to capture the activity of that test. Then click "Stop" on the pop-up recorder window to stop the recording. Each transaction will be assigned an index as well, and the source code behind each of these transactions can be reviewed and manually modified in the main JMeter window. Here is the login request, for instance:

The HTTP Authorization Manager is used to automatically authorize a defined user login for the thread to any of the Base URLs listed. In this case, though, there are two separate servers being accessed during the test, and one may need to be added manually:

Save the project before continuing, as manual modifications come next.

Within the task page, as you do the recording, a set of parameters or body data will be recorded. Modifying this is how you parametrize the test scenario with variables like the username and password. To simulate logging in as other users, you have to parameterize this rather than relying on the administrator account name and password entered into the browser.
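Since the user credentials will be pulled from a CSV file in the later steps, it can be convenient to generate that file with a short script instead of typing it by hand. Here is a minimal Python sketch assuming a simple two-column layout (username, password); the users listed are placeholders, and the column order must match the Variable Names configured in the JMeter CSV Data Set Config element.

import csv

# Placeholder test users; replace with accounts that exist on your test server.
users = [
    ("Administrator", "changeme"),
    ("test.user1", "password1"),
    ("test.user2", "password2"),
]

with open("users.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(users)   # no header row: JMeter reads the columns by position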
Rename the task controller to "MyTasks" or something more easily identified than the long string it has now:

Some recorded items, like static images and stylesheets, will be non-essential: things the browser processes for better graphical representation, but which are often cached and do not greatly affect the scalability results of the test. These can be highlighted and disabled all at once:

Also ensure that any cascading stylesheets have been disabled. Enable the "View Results Tree" to ensure you can review the results of the test script during the editing phase. However, this "Listener" element has a high memory footprint during test execution, so it should be disabled before running an actual scale test.

Next we need to parametrize the user login information and pull it from a CSV file. The colon means that "Administrator" is the default user to use for login. You can add other properties as well, like ramp-up time, run time, number of users, and protocols to use. The ramp-up time determines how quickly the threads are allocated for the test, which, if done slowly enough, prevents a thundering herd scenario. In more complex scenarios, logic controllers can be inserted to control the flow of the test. This allows for options such as if-then conditions for different user permissions, or parameter-based routes for better randomization of actions in different threads. This will be covered in more detail in a future article.

Pre- and Post-Processors can be used as well, with the latter being used here much more than the former, to extract information from a response and then use it in the variables going into one of the follow-up requests. For example, see the script in this image: this one extracts a variable from the object number property, defined in the CSV file, and converts it into another variable that is used in subsequent scripts. The script uses the object number reference to pull the name out of the body data and make the request, which is then post-processed by several extractors. One is a JSON extractor which pulls an ID out of the JSON response; there is also a regular expression extractor and a BeanShell post-processor, which populates some variables based on the response. Once it extracts all of the variables from the response to this particular request (GetSearchResults in this case), it then tailors the additional requests based on these.

Customize the script according to the needs of your own application. Alternate between recording and manually modifying the recorded code to ensure the test performs exactly as required, and from the perspective of different users with different permissions. Also vary the type of activity performed on the mashup. Highlight the "View Results Tree" tab and click the green start button at the top of the window to see the results appear.

If you are getting an unauthorized message, ensure that the scope is right for the login information, which may require moving the "HTTP Authorization Manager" component around in the project. Be sure to check the URLs and credentials entered for each type of user. Occasionally the recorder will insert a long authentication string into the URL, so manually set the URL for the credentials to the most generic URL possible for the server.
This can be parametrized too, referencing the CSV file defined here, which looks like this for a more complicated scenario (covered in a future article). The columns here represent the username, password, object number in Windchill, and object name in Windchill, as well as the wait time used to vary the way the logic is executed, and some extra variables which tell the switches what to do in order to create a more varied and realistic test.

Conclusion
Following these steps again and again on the various mashups throughout an application can ensure that a script for each web page, and each type of user on each web page, is created and added to the testing suite. This results in a load test that is closely representative of the real-world user load placed on an application. Load testing is a critical part of the development lifecycle in any application, and ThingWorx is no exception. Any further questions about the capabilities of JMeter not covered here can be answered by the full JMeter user manual, found on the Apache website. Future articles will include some basic scripts that test basic things, which can serve as examples for more complex ThingWorx JMeter script development. Here is an example of one tool PTC uses for internal QA of ThingWorx, designed to load test a Navigate application (specifically its built-in mashups):

Something similar to this tool may be available for public use later this summer. In the meantime, feel free to use the tutorial above to create scripts of your own. Any issues building your custom load tests in JMeter can be discussed right here on this thread with our JMeter experts. Happy developing!
Load Testing through C SDK Remote Device Simulation in ThingWorx   As discussed in the EDC's previous article, load or stress testing a ThingWorx application is very important to the application development process and comes highly recommended by PTC best practices. This article will show how to do stress testing using the ThingWorx C SDK at the Edge side. Attached to this article is a download containing a generic C SDK application and accompanying simulator software written in python. This article will discuss how to unpack everything and move it to the right location on a Linux machine (Ubuntu 16.04 was used in this tutorial and sudo privileges will be necessary). To make this a true test of the Edge software, modify the C SDK code provided or substitute in any custom code used in the Edge devices which connect to the actual application.   It is assumed that ThingWorx is already installed and configured correctly. Anaconda will be downloaded and installed as a part of this tutorial. Note that the simulator only logs at the "error" level on the SDK side, and the data log has been disabled entirely to save resources. For any questions on this tutorial, reach out to the author Desheng Xu from the EDC team (@DeShengXu).   Background: Within ThingWorx, most things represent remote devices located at the Edge. These are pieces of physical equipment which are out in the field and which connect and transmit information to the ThingWorx Platform. Each remote device can have many properties, which can be bound to local properties. In the image below, the example property "Pressure" is bound to the local property "Pressure". The last column indicates whether the property value should be stored in a time series database when the value changes. Only "Pressure" and "TotalFlow" are stored in this way.  A good stress test will have many properties receiving updates simultaneously, so for this test, more properties will be added. An example shown here has 5 integers, 3 numbers, 2 strings, and 1 sin signal property.   Installation: Download Python 3 if it isn't already installed Download Anaconda version 5.2 Sometimes managing multiple Python environments is hard on Linux, especially in Ubuntu and when using an Azure VM. Anaconda is a very convenient way to manage it. Some commands which may help to download Anaconda are provided here, but this is not a comprehensive tutorial for Anaconda installation and configuration. Download Anaconda curl -O https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh  Install Anaconda (this may take 10+ minutes, depending on the hardware and network specifications) bash Anaconda3-5.2.0-Linux-x86_64.sh​ To activate the Anaconda installation, load the new PATH environment variable which was added by the Anaconda installer into the current shell session with the following command: source ~/.bashrc​ Create an environment for stress testing. Let's name this environment as "stress" conda create -n stress python=3.7​ Activate "stress" environment every time you need to use simulator.py source activate stress​  Install the required Python modules Certain modules are needed in the Python environment in order to run the simulator.py  file: psutil, requests. 
Use the following commands to install these (if using Anaconda as installed above):
conda install -n stress -c anaconda psutil
conda install -n stress -c anaconda requests
Unpack the download attached here, called csim.zip
Unzip csim.zip and move it into the /opt folder (if another folder is used, remember to change the path in the simulator.json file later)
Assign your current user full access to this folder (this command assumes the current user is called ubuntu): sudo chown -R ubuntu:ubuntu /opt/csim
Move the C SDK library to the lib folder
Use the following command: sudo mv /opt/csim/csdkbuild/libtwCSdk.so.2.2.4 /usr/lib
You may also have to grant a+x permissions to all files in this folder
Update the configuration file for the simulator
Open /opt/csim/simulator.json (or whatever path is used instead)
Edit this file to meet your environment needs, based on the information below
Familiarize yourself with the simulator.py file and its options
Use the following command to get option information: python simulator.py --help

Set-Up Test Scenario:
Plan your test
Each simulator instance will have 8 remote properties by default (as shown in the picture in the Background section). More properties can be added for stress test purposes in the simulator.json file. For the simulator to run 1k writes per second to a time series database, use the following configuration information (note that for this test, a machine with 4 cores and 16G of memory was used; greater hardware specifications may be required for a larger test):
Forget about the default 8 properties, which have random update patterns and produce results that are difficult to check later. Instead, create "canary properties" for each thing (where canary refers to the nature of a thing to notify others of danger, in the same way canaries were used in mine shafts)
Add 25 properties for each thing: 10 integer properties, 5 number properties, 5 string properties, 5 sin properties (signals)
Set the scan rate to 5000 ms, making it so that each of these 25 properties will update every 5 seconds.
To get a writes-per-second rate of 1k, we therefore need 200 devices in total, which is specified by the start and end number lines of the configuration file. The simulator.json file should look like this:
Canary_Int: 10
Canary_Num: 5
Canary_Str: 5
Canary_Sin: 5
Start_Number: 1
End_Number: 200
Run the simulator
Enter the /opt/csim folder, and execute the following command: python simulator.py ./simulator.json -i
You should be able to see a screen like this:
Go to ThingWorx to check if there is a dummy thing (under Remote Things in the Monitoring section). This indicates that the simulator is running correctly and connected to ThingWorx
Create a Value Stream and point it at the target database
Create a new thing and call it "SimulatorDummyThing"
Once this is created successfully and saved, a message should pop up to say that the device was successfully connected
Bind the remote properties to the new thing
Click the "Properties and Alerts" tab
Click "Manage Bindings"
Click "Add all properties"
Click "Done" and then "Save"
The properties should begin updating immediately (every 5 seconds), so click "Refresh" to check
Create a Thing Template from this thing
Click the "More" drop-down and select "Create ThingTemplate"
Give the template a name (ensure it matches what is defined in the simulator.json file) and save it
Go back and delete the dummy thing created in Step 4, as we no longer need it
Clean up the simulator
Use the following command: python simulator.py ./simulator.json -k
Output will look like this:
Create 200 things in ThingWorx for the stress test
Verify the information in the simulator.json file (especially the start and end numbers) is correct
Use the following command to create all things: python simulator.py ./simulator.json -c
The output will look like this:
Verify the things have also been created in ThingWorx. Now you are ready for the stress test

Run Stress Test:
Use the following command to start your test: python simulator.py ./simulator.json -l or python simulator.py ./simulator.json --launch
The output in the simulator will look like this:
Monitor the Value Stream writing status in the Monitoring section of ThingWorx:

Stop and Clean Up:
Use the following command to stop running all instances: python simulator.py ./simulator.json -k
If you want to clean up all created dummy things, then use this command: python simulator.py ./simulator.json -d
To re-initiate the test at a later date, just repeat the steps in the "Run Stress Test" section above, or re-configure the test by reviewing the steps in the "Set-Up Test Scenario" section.

That concludes the tutorial on how to use the C SDK in a stress or load test of a ThingWorx application. Be sure to modify the Thing Template (created in step 6 of the "Set-Up Test Scenario" section) with any business logic required, for instance events and alerts, to ensure a proper test of the application.
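When adapting the canary configuration above to hit a different target write rate, a quick calculation helps confirm the plan before any things are created. The sketch below is plain arithmetic using the values already shown in simulator.json and the 5000 ms scan rate; swap in your own numbers as needed.

# Mirrors the canary keys from simulator.json above; adjust to your own plan.
config = {"Canary_Int": 10, "Canary_Num": 5, "Canary_Str": 5, "Canary_Sin": 5,
          "Start_Number": 1, "End_Number": 200}
scan_rate_seconds = 5   # the 5000 ms scan rate from the example

properties_per_thing = sum(config[k] for k in ("Canary_Int", "Canary_Num", "Canary_Str", "Canary_Sin"))
things = config["End_Number"] - config["Start_Number"] + 1
writes_per_second = things * properties_per_thing / scan_rate_seconds
print(f"{things} things x {properties_per_thing} properties every {scan_rate_seconds}s "
      f"= {writes_per_second:.0f} writes/sec")   # 200 x 25 / 5 = 1000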
By Tim Atwood and Dave Bernbeck, edited by Tori Firewind. Adapted from the March 2021 Expert Session produced by the IoT Enterprise Deployment Center.

The primary purpose of monitoring is to determine when your application may be exhausting the available resources. Knowledge of the infrastructure limits helps establish these monitoring boundaries, determining straightforward thresholds that indicate an app has gone too far. The four main areas to monitor in this way are CPU, Memory, Networking, and Disk.

For the CPU, we want to know how many cores are available to the application and potentially what the temperature is for each, or other indicators of overtaxation. For Memory, we want to know how much RAM is available for the application. For Networking, we want to know the network throughput, the available bandwidth, and how capable the network cards are in general. For Disk, we keep track of the read and write rates of the disks used by the application, as well as how much space remains on them.

There are several major infrastructure categories which reflect common modes of operation for ThingWorx applications. One is Bare Metal, which relies upon the traditional connection between the operating system and the hardware, with no intermediary. Limits of the hardware in this case can be found in manufacturing specifications, within the operating system settings, and usually listed somewhere within the IT department. The IT team is a great resource for obtaining these limits in general, and also keeps track of such things in VMware and virtualized infrastructure models.

VMware is an intermediary between the operating system and the hardware, and often its limits are determined based on the sizing of the application and set by the IT team when the infrastructure is established. These can often be resized as needed, and the IT team will be well aware of the limits here, often monitoring some of the performance themselves already. This is especially so if Cloud Providers are used, given that these are scaled-up virtualizations which are configured in easy-to-use cloud portals. These two infrastructure models can also be resized as needed.

Lastly, Containers can be used to designate operating system resources as needed, in a much more specific way that better supports the sharing of resources across multiple systems. Here the limits are defined in the configuration files or charts that define the container.

The difficulties here center around learning what the limits are, especially in the case of network and disk usage. Network bandwidth can fluctuate, and increased latency and network congestion can occur at random times for seemingly no reason. Most monitoring scenarios can therefore make do with collecting network send and receive rates, as well as disk read and write rates, measured on the server.

Cloud Providers like Azure provide VM and disk sizing options that allow you to select exactly what you need, but for network throughput or network IO, the choices are not as varied. Network IO tends to increase with the size of the VM, proportional to the number of CPU cores and the amount of Memory, so this may mean that a VM has to be oversized for the user load and the bulk of the application in order to accommodate a large or noisy edge fleet. The next few sections list the operating metrics and common thresholds used for each.
We often use these thresholds in our own simulations here at PTC, but note that each use case is different, and each situation should be analyzed individually before determining set limits of performance.

Generally, you will want to monitor: % utilization of all CPU cores, leaving plenty of room for spikes in activity; and total and used memory, ensuring total memory remains constant throughout and used memory remains below a reasonable percentage of the total, which for smaller systems (16 GB and lower) means leaving around 20% of memory for the OS, and for larger systems, usually around 3-4 GB. For disks, monitor the read and write rates, and ensure there is ample free space for spikes, to avoid any situation that might result in system downtime. For networking, monitor the send and receive rates, which should stay below 70% or so, again to leave room for spikes.

In any monitoring situation, consistently high utilization should trigger concern and an investigation into what's happening. Were new assets added? Has any recent change caused regression or other issues? Any recent changes should be inspected, and the infrastructure sizing should be considered as well. For ThingWorx-specific monitoring, we look at max queue sizes, entries performed, pool sizes, alerts, submitted task counts, and anything that might indicate some kind of data loss. We want the queues to be consistently cleared out to reduce the risk of losing data in the case of an interruption, and to ensure there is no reason for resource use to build up and cause issues over time.

In order for a monitoring set-up to be truly helpful, it needs to make certain information easily accessible to administrative users of the application. Any metrics that are applicable to performance need to be processed and recorded in a location that can be accessed quickly and easily from wherever the admins are. They should know the health of the application at a glance, without needing to drill down a lot to be made aware of issues. Likewise, the alerts that happen should be meaningful, with minimal false alarms, and it is best if this is configurable by the admins from within the application via some sort of rules engine (see the DGIS guide, soon to be released for version 9.1). The monitoring tool should also be able to save the system history and export it for further analysis, all in the name of reducing future downtime and creating a stable, enterprise system.

This dashboard (above) is a good example of how to roll up a number of performance criteria into health indicators for various aspects of the application. Here there is a Green-Yellow-Red color-coding system for issues like web requests taking longer than 30 seconds, 3 minutes, or more to respond.

Grafana is another application our team uses internally for monitoring. The easy dashboard creation feature and built-in chart modes make this tool very easy to get started with, and certainly easy to refer to from a central location over time. Setting this up is helpful for load testing and readying an application, but it is also beneficial for continued monitoring post-go-live, which is why it is a worthy investment. Our team usually builds a link based on the start and end time of each simulation performed, with all of the various servers being monitored by one Grafana server, one reference point.

Consider using PTC Performance Advisor (also called DynaTrace) to help monitor these kinds of things more easily.
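For a quick, stand-alone check of the OS-level thresholds described above, a small script can be run on each server in between looks at the full dashboards. This is a minimal sketch using the Python psutil package (the same module the EDC simulator depends on); the disk threshold and the note about network rates are assumptions to adjust for your own environment.

import psutil

CPU_LIMIT = 75.0    # % utilization threshold from the article
MEM_LIMIT = 80.0    # % of allocated memory threshold from the article
DISK_LIMIT = 85.0   # assumption: a reasonable used-space threshold, not stated above

cpu = psutil.cpu_percent(interval=1)      # % across all cores over a 1-second sample
mem = psutil.virtual_memory().percent     # % of total RAM currently in use
disk = psutil.disk_usage("/").percent     # % of the root volume in use
net = psutil.net_io_counters()            # cumulative bytes sent/received since boot

print(f"CPU {cpu:.1f}% | Memory {mem:.1f}% | Disk {disk:.1f}%")
print(f"Network: sent={net.bytes_sent} recv={net.bytes_recv} (sample twice and take the difference for a rate)")

for name, value, limit in (("CPU", cpu, CPU_LIMIT), ("Memory", mem, MEM_LIMIT), ("Disk", disk, DISK_LIMIT)):
    if value > limit:
        print(f"WARNING: {name} at {value:.1f}% exceeds the {limit:.0f}% threshold")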
When most administrators think of monitoring, they think of reading and reacting to dashboards, alerts, and reports. Rarely does the idea of benchmarking come to mind as a monitoring activity, and yet having successful benchmarks of system performance can be a crucial part of knowing whether an application is functioning as expected before there are major issues. Benchmarks also look at the response time of the server and better enable tracking of the actual end-user experience. The best option is to automate such tests using JMeter or other applications, producing a daily snapshot of user performance that can anticipate future issues and create a more reliable experience for end users over time.

JMeter also has the option to build custom reports. JMeter is good for simulating the user load, which often makes up most of the server load of a ThingWorx application, especially considering that ingestion is typically optimized independently and given the most thought. The most unexpected issues tend to pop up within the application itself, after the project has gone live.

Shown here (right) is an example benchmark from a Windchill application, one which is published by PTC to facilitate comparison between optimized test systems and real-life performance. Likewise, DynaTrace is depicted here, showing an automated baseline (using Smart URL Detection) on Response Time (median and 90th percentile) as well as Failure Rate. We can also look at Throughput and compare it with the expected value range based on historical throughput data.

Monitoring typically increases system performance and availability, but its other advantage is to provide faster, more effective troubleshooting. Establish a systematic process or checklist to step through when problems occur, something that is organized to be done quickly, but still takes the time to find and fix the underlying problems. This will prevent issues from happening again and again and will polish the system periodically as problems occur, so that the stability and integrity of the system only improves over time. Push for real solutions if possible, not band-aids, even if more downtime is needed up front; it is always better to have planned downtime up front than unplanned downtime down the line. Close any monitoring gaps when issues do occur, which is the valid RCA response if not enough information was captured to actually diagnose or resolve the issue.

PTC Tech Support developed a diagnostic data gathering query for Oracle that customers can use, found in our knowledge base. This is an example of RCA troubleshooting that looks at different database factors, reporting on which queries perform the worst based on the inputted criteria. Another example of troubleshooting is for the Java JVM, where we look at all of the things listed here (below) in an automated, documented process that then generates a report for easy end-user consumption.

Don't hesitate to reach out to PTC Technical Support in advance to go over your RCA processes, to review benchmark discrepancies between what PTC publishes and what your real-life systems show, and to ensure your monitoring is adequate to maintain system stability and availability at all times.
Load Testing through Remote Device Simulation

Designing an enterprise-ready application requires extensive testing and quality assurance. This includes all sorts of tests, of course, from examining the user interface for flaws to verifying there is correct logic in all background services. However, no area of testing is more important than scalability. Load testing is how to test the application to ensure it still functions as desired when remote things are connected and streaming information to the Platform.

Load testing is considered a critical component of the change management process. It is mentioned numerous times throughout PTC best practice documentation. This tutorial will step you through designing a load test using Kepware as a simulator. Kepware is free to download and use in short demos, making it the perfect tool for this type of test.

Start by acquiring the latest version of Kepware from the download site. Click “Download Free Demo” if a license was not included in your PTC product package. The installation of Kepware is simple, and for details, see the Kepware Installation Guide. The tutorial shown here uses Kepware version 6.7 and ThingWorx version 8.4.4. Given that we are testing a ThingWorx application, this tutorial assumes ThingWorx is already installed and configured correctly.

Once Kepware is installed, follow these steps. (This tutorial was developed by Desheng Xu and edited by Victoria Tielebein. Exact specifications of the equipment used in both large scale and local tests are given in step 6, which discusses the size of the simulation.)

1. Understand how to configure Kepware as a simulator
   - Go to the Help menu within Kepware, and click on “Driver Help”
   - Select “Simulator” in the pop-up window, and click “OK”
   - Expand “Address Descriptions” and then “Simulation Functions”
   - Select “Ramp Function” to review details about the function needed for this tutorial, as well as information about function syntax
   - Close the window once this information has been reviewed
2. Create a new project in Kepware
   - Click “File” > “New”
   - In case you are connected to runtime, Kepware will allow you to choose to edit this project offline
3. Add a channel in Kepware
   - Channels represent threads which Kepware will use to contact ThingWorx
   - Under “Connectivity”, click “Click to add a channel.”
   - From the drop-down list, select “Simulator”
   - Use all the default settings, selecting “Next” all the way down to “Finish”
4. Add one device to the channel
   - Highlight the new channel and click “Click to add a device” (which will appear in the center of the screen)
   - Once again, use the default settings, selecting “Next” all the way down to “Finish”
5. Add a tag to this device
   - Within Kepware, tags represent properties which bind to remote things on the Platform and update with new information over time. Each device will need several tags to simulate remote property updates. The easiest way to add many tags for testing is to create one, and then copy and paste it.
   - Highlight the device created in the previous step and click “Click to add static tag”, which appears in the center of the screen
   - For “Name”, type “tag1”
   - For Address, enter the Ramp function: RAMP(1000,1,2000000,1)
     - The first parameter is the update rate, given in milliseconds
     - The next two parameters are the range of values which can be sent
     - The last parameter is the increment or step
     - Together this means that every 1 second, this tag will send a new value that is 1 higher than the previous value to the Platform, starting at 1 and ending at 2 million
   - Ensure the Data Type is given as “DWord” or any type which will be read as a “Number” (and NOT an “Integer”) on the Platform
   - Change the Scan Rate to 250
   - Then click “OK”
6. Add more devices to the test
   - The most basic set-up is now done: if this project connected to the Platform, one remote thing with one remote property could be used to simulate property updates. That is not very useful for load testing, however. We need many more things than this, and many more properties.
   - The number of tags on each device should match the expected number of remote properties in the application itself. The number of devices in each channel should be large enough that when more channels are created, the number of total devices is close to the target for the application. For example, to simulate 10,000 things, each with 25 remote properties, we need 25 tags per device, 200 devices per channel, and 50 channels. This would require a lot of memory to run and should not be attempted on a local machine.
   - A full test of 40 channels, each with 10 devices, was performed as shown in the screenshots here. This simulates 10,000 writes per second to the Platform total, or about 400 remote device connections. This test used the following hardware specifications:
     - Kepware machine running Windows 2016 64-bit, 2 cores, 8 GB
     - ThingWorx Platform machine running Ubuntu 16.04, 4 cores, 16 GB
     - PostgreSQL 9.6 machine running Ubuntu 16.04, 4 cores, 16 GB
     - Influx 1.6.3 machine running Ubuntu 16.04, 4 cores, 16 GB
   - A local test was also run on Windows 10 (64-bit), using the H2 database, with Kepware and ThingWorx running side by side on the machine, 4 cores, 16 GB. This test made use of only 2 channels, with 10 devices each. For local tests to see how the simulation works, this is fine, but a more robust set-up like the above will be needed in a true load test.
   - If there is not enough memory on the machine hosting Kepware, errors like this will appear in the Kepware logs: One or more value change updates lost due to insufficient space in the connection buffer.
   - Once you decide on the number of tags and devices needed, follow the steps below to add them.
7. Add the remaining tags, devices, and channels
   - To add more tags, copy and paste the existing tag (Ctrl+C and Ctrl+V work in Kepware for convenience) until there are as many tags as desired
   - To add more devices, highlight the device in Kepware and copy and paste it as well (click on the channel before pasting)
   - Then, copy and paste the entire channel until the number of channels, devices, and tags totals the desired load (be sure to click on “Connectivity” before hitting paste this time)
8. Configure the ThingWorx connection
   - Right-click on Project in the left-hand navigation bar, and in the pop-up window that appears, highlight ThingWorx
   - Change the “Enable” field to “Yes” to activate the other fields
   - Fill in the details for “Host”, “Port”, “Application Key”, and “Thing name”
   - Note that the application key will need to be created in ThingWorx and then the value copied in here
   - The certificate and encryption settings may also need to be adjusted to match your environment
   - For local set-ups, it is likely that self-signed and all certificates will need to be accepted, so both of those fields will likely need to be set to “Yes” (encryption may need to be disabled as well). In production systems, this should not be the case
9. Save the project
   - It doesn’t matter too much whether this project is saved as encrypted or not, so either enter a password to encrypt the save or select “No encryption”
10. Connect to ThingWorx
   - Click “Runtime” > “Connect…”
   - A pop-up will appear asking if you want to load this project; click “Yes”
   - The connection status should then appear in the bottom portion of the window where the logs are displayed
11. Configure in ThingWorx
   - Log in to the ThingWorx Platform
   - Under “Industrial Connections”, a thing should appear which is named as indicated in the Kepware configuration step above
   - Click to open this thing and save it
   - Also create a new thing, a value stream for ingesting data from Kepware
12. Create remote things in ThingWorx
   - Import the provided entity into ThingWorx (it should appear as a downloadable attachment to this post)
   - Open the KepwareUtil thing and go to the Services tab
   - Run the AutoKepwareCreate service to generate remote things on the Platform
   - Give the name of the stream created above so each thing has a place to store property information
   - The IgnoreTemplate flag should be set to false. This allows the service to create a thing template first, which is then passed to the remote devices. The only reason this would be set to true is if the devices need to be deleted and recreated, but the template does not (then set the flag to true). To delete the devices, use the AutoKepwareDelete service, also provided on the KepwareUtil thing
   - Note that the AutoKepwareCreate service is asynchronous, so once it is executed, close the window and check the script logs to see when it completes. The logs will look like: KepwareUtil AutoKepwareCreate task finished!!!
13. Check the status of the remote things
   - Once the things are created, they should automatically connect to the Platform
   - Run the TotalDeviceByTemplateWithTemplate service to see if the things are connected
   - The template given here could be the one created by the AutoKepwareCreate service, or just give it RemoteThings if this is a small local set-up without many remote things on it
   - The number of devices will equal the number of devices per channel times the total number of channels, which in the test shown here is 400
   - isConnected will be checked if all of the devices are connected without issue
   - If some of them are not connected, check the logs for any errors and resolve those before moving on
14. View the ingestion rate
   - Once the devices are created, their tags should show as numbers (NOT integers), and they should already be updating with new values every second
   - To view the ingestion rate, run the KepwareUtil service AutoKepwareRateSummary
   - Give the thing template name that is created by the AutoKepwareCreate service, which will look like the name of the Kepware thing itself with a “T-” in front
   - The start time should be close to the current time, and the periodInMinutes should be large enough to include some of the test (periodInMinutes is used to calculate the end time within the service)
   - Note in the results here that the Average Write Per Second is only 9975 wps, which is close to but not exactly what we would expect. This means that there are properties not updating correctly, which requires us to look at the logs and restart some things
   - If nothing shows up here, despite the Total Connected Things showing correctly, then look at the type of the tags on one of the remote devices. The type must be NUMBER for the query within this service to work, and not INTEGER. If the type of the tags is incorrect, then the type of the tags within Kepware was probably given as something which is not interpreted as a number in ThingWorx. Ensure DWord is used for the tags in Kepware
   - Within the script log, look for any devices which show errors, as seen in the image below, and restart them to get their properties updating correctly
   - Once the ingestion rate equals what is expected (in the case of the test here, 10,000 wps), use the AutoKepwareIngestionStat service on the KepwareUtil thing to see details about each remote device:
     - TimeGapAvg represents the gap between two ingestions in milliseconds, showing any lag that may be present between Kepware and ThingWorx
     - TimeGapSTD shows the standard deviation of the time gap between two ingestions on any given thing, also indicating lag (the lower this number, the better)
     - StartTime and EndTime show the first and last timestamp observed in the ThingWorx database during the given duration
     - totalCount shows the total number of ingested records during the sampling cycle
     - StartValue and EndValue show the first and last value ingested into the tag during the given duration
   - If the ingestion rate is working as expected, and the ramp function is actually sending an update on time (in this case, once each second), then the difference between the EndValue and the StartValue should equal the totalCount minus 1. If this doesn’t match up, then there may be data loss or something else wrong with the property updates, which will show as a checked box in the valueException column.
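To make that last check concrete, here is a small, purely hypothetical example (the numbers are invented, and the field names assume the AutoKepwareIngestionStat output described above): if StartValue is 100, EndValue is 700, and the ramp steps by 1 per record, then 700 - 100 + 1 = 601 records should have been stored, so totalCount should be 601. In code form:

// Illustrative only -- values and field names assume the service output described above
long startValue = 100;   // first value ingested during the sampling window
long endValue   = 700;   // last value ingested during the sampling window
long totalCount = 601;   // records actually stored during the window

long expectedCount = endValue - startValue + 1;  // the ramp increments by 1 per record
if (expectedCount != totalCount) {
    System.out.println("Possible data loss: expected " + expectedCount + " records, stored " + totalCount);
} else {
    System.out.println("No loss detected in this window");
}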
It is not enough to ensure that the ingestion rate is correct, as sometimes the rate may fluctuate by only 1 or 2 wps and appear perfect, even while some data is lost. That is why it is important to ensure that there are no valueException boxes showing as checked in the test of the application. If none of these are marked as having failed, then the test was successful, and this ingestion rate is acceptable for the application.

This tutorial is a very basic way to simulate many remote devices ingesting data into the Platform. For this to be a true test of the application, the remote things created in this test will need to be given business logic tasks as well. The AutoKepwareCreate service can be modified to apply any template (and not just RemoteThing) to the thing template which is created and subsequently passed into the demo devices. Likewise, the template itself can be created, and then manually modified to look like the actual remote device template in the application, before the rest of the things are created (using the IgnoreTemplate flag in the creation and deletion services, as discussed above).

Ensure that events are triggered as expected and that subscriptions to property updates are in place on the thing template before creating the demo things. Make use of the subsystem monitor to ensure that the event, value stream, and stream queues do not grow so large that the Platform cannot keep up with the requests (for details about tuning the stream and value stream processing subsystems, see PTC’s best practice documentation). Also be sure to load some of the mashups to see how they perform while the ingestion test is happening. This will test whether or not the ingestion rate and business logic of the application can function side by side without errors, data loss, or performance issues.
Thread Safe Coding, Part 1: The Java Extension Approach
Written by Desheng Xu and edited by @vtielebein

Overview
Time and again, customers report that one of their favorite ThingWorx features is using JavaScript to build out their services and business logic. However, the JavaScript language doesn't have a built-in semaphore locking mechanism, nothing to enable thread-safe concurrent processing like you find in the Java language. This article demonstrates why thread-safe coding is necessary and how to use the Java Extension for this purpose. Part 2 presents an alternative approach using database lockers.

Demo Use Case
Let's use a highly abstracted use case to demo thread-safe code practices: there are tens of machines in a factory, and a PLC will emit a signal to indicate when an issue happens during run-time. The customer expects to have a dashboard that shows today's total count of issues from all machines in real-time. The customer also expects that a timestamp for each issue can be logged (regardless of the machine). Similar use cases might be to:
- Show the total product counts from each sub-line in the current shift.
- Show the total rentals of bicycles from all remote sites.
- Show the total issues of distant cash machines across the country.

Modeling
Thing: DashboardCounter, which includes:
- 1 property: name: counter, type: integer, logged: true, default value: 0
- 3 services:
  - IncreaseCounter(): increase the counter value by 1
  - GetCounter(): return the current counter value
  - ResetCounter(): set the counter value to 0
- 1 subscription: a subscription to the data change event of the counter property, which will print the new value and timestamp to the log.

GetCounter:
var result = me.counter;

IncreaseCounter:
me.counter = me.counter + 1;
var result = me.counter;

ResetCounter:
me.counter = 0;
var result = 0;

Subscription MonitorCounter:
Logger.info(eventData.newValue.value+":"+eventData.newValue.time.getTime());

ValueStream
For simplicity, the value stream entity is not included in the attachment. Please go ahead and assign a value stream to this Thing to monitor the property values.

Test Tool
A small test tool, mulreqs, is attached here, along with some extensions and ThingWorx entities that are useful. The mulreqs tool reads its configuration file from the OS environment variable MULTI_REQUEST_CONFIG.

In Linux/MacOS: export MULTI_REQUEST_CONFIG="./config.json"

In the config.json file, you can use the following configuration:

{
    "host": "twx85.desheng.io",
    "port": 443,
    "protocol": "https",
    "endpoint": "/Thingworx/Things/DashboardCounter/services/IncreaseCounter",
    "headers": {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "AppKey": "5cafe6eb-adba-41df-a7d6-4fc8088125c1"
    },
    "payload": {},
    "round_break": 50000,
    "req_break": 0,
    "round_size": 50,
    "total_round": 20
}

The host, port, protocol, and headers identify the ThingWorx server. The endpoint defines which service is called during the test. The payload is not in use at this moment, but it has to remain in the file. total_round is how many rounds of the test you want to run. round_size defines how many requests will be sent simultaneously during each round. round_break is the pause time between rounds in microseconds, so 50000 in the above example means 50 ms. req_break is the delay between requests; "0" means requests to the server will happen simultaneously.

With the above configuration, the expectation is that the service executes a total of 20*50 = 1,000 times.
So, we can expect that if the initial value is 0, then the counter should be 1000 at the end, and if the value stream is clean initially, then the value stream should have a history from 1 to 1000.

Run Test
Use the following command to perform the test: .<your path>/mulreqs
Execution output will look like:

Check Result
You may be surprised to find that the final value is 926 instead of 1000. (Caution: this value will be different in different tests, and it can be any value between 1 and 1000.) Now, look at the value stream by using QueryPropertyHistory. There are many values missing here, and while the total count can vary in different tests, it is unlikely to be exactly the last value (926). Notice that the last few values are 926, 925, 921, and 918; the values 919, 920, 922, and 923 are all missing. So next we check whether there are any errors in the script log, and there are none; there are only the print statements we deliberately placed in the logs. So, we have observed two symptoms here:
- The final value of the counter property is not the expected value.
- The value stream doesn't have the expected history of the counter property changes.
What's the reason behind each symptom, and which one is a thread-safe issue?

Understanding Timestamp Granularity
ThingWorx facilitates the collection of time series data, and solutions centered around such data, by allowing the timestamp to be used as the primary key. However, a timestamp always has a minimal granularity when you process it. In ThingWorx, the minimal granularity, or unit, of a timestamp is one millisecond.

Looking at the log we generated from the subscription again, we see that several data points (922, 923, 924, 925) have the same timestamp (1596089891147), which is GMT Thursday, July 30, 2020, 6:18:11.147 AM. When each of these data points is flushed into the database, the later data points overwrite the earlier ones, since they all have the same timestamp. So, data point 922 went into the value stream first, and then was overwritten by data point 923, and then 924, and then 925. The next data point in the value stream is 926, which has a new timestamp (1596089891148), 1 ms behind the previous one. Therefore, data points 925 and 926 are stored, while 922, 923, and 924 are not. These lost data points are therefore NOT a thread-safe issue.

The reason some of these data points have the same timestamp in this example is that multiple machines write to the same value stream. The right approach is to log data points at the individual machine level, with a different value stream per machine.

However, what happens if one machine emits data too frequently? If data points from the same machine still have a timestamp clash issue, then the signal frequency is too high. The recommended approach would be to down-sample the update frequency, as any frequency higher than 1000 Hz will produce unexpected results like these.

Real Thread-Safe Issue from the Demo Use Case
The final value of the counter being an arbitrary random number is the real thread-safe coding issue. If we take a look at the code again:
me.counter = me.counter + 1;
This piece of code can be split into three steps:
- Step 1: read the current value of me.counter
- Step 2: increase this value
- Step 3: set me.counter to the new value
In a multi-threaded environment, not performing these three steps as a single atomic operation will lead to a race condition.
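To see that race in isolation, here is a minimal, plain-Java sketch (purely illustrative, not ThingWorx code) of the same read-increment-write pattern. Two threads each perform 100,000 unsynchronized increments; run it a few times and the final total will usually come up short of 200,000, for exactly the reason described above.

public class RaceDemo {
    static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter = counter + 1;   // read, increment, write -- not atomic
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("Expected 200000, got " + counter);   // usually less than 200000
    }
}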
The way to solve this issue is to use a locking mechanism to serialize access to the property: acquire the lock, perform the three steps, and then release the lock. This can be done using either the Java Extension or a database thing that leverages the database lock mechanism.

Use the Java Extension to Handle the Thread-Safe Challenge
This tutorial assumes that the Eclipse plug-in for ThingWorx extension development is already installed. The following will guide you through creating a simple Java extension step by step:
1. Create a Java Extension Project
   - Choose the minimal ThingWorx version to support and select the corresponding SDK. Let's name it JavaExtLocker, though it’s best to use lower-case in the project name.
2. Add a ThingWorx Template in the src Folder
   - Right-click the src folder and add a Thing Template.
3. Add a Thing Property
   - Right-click the Java source file created in the above step, click the menu option called ThingWorx Source, then select Add Property.
4. Add Three Services: IncreaseCounter, GetCounter, ResetCounter
   - Right-click the Java source file, select the menu option called ThingWorx Source, then select Add Service. See above for the IncreaseCounter service details. Repeat these same steps to add GetCounter and ResetCounter.
5. (Optionally) Add a Generated Serial ID
6. Add Code to the Three Services

@SuppressWarnings("deprecation")
@ThingworxServiceDefinition(name = "IncreaseCounter", description = "", category = "", isAllowOverride = false, aspects = { "isAsync:false" })
@ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {})
public synchronized Integer IncreaseCounter() throws Exception {
    _logger.trace("Entering Service: IncreaseCounter");
    int current_value = ((IntegerPrimitive) (this.getPropertyValue("Counter"))).getValue();
    current_value++;
    this.setPropertyValue("Counter", new IntegerPrimitive(current_value));
    _logger.trace("Exiting Service: IncreaseCounter");
    return current_value;
}

@ThingworxServiceDefinition(name = "GetCounter", description = "", category = "", isAllowOverride = false, aspects = { "isAsync:false" })
@ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {})
public synchronized Integer GetCounter() throws Exception {
    _logger.trace("Entering Service: GetCounter");
    int current_value = ((IntegerPrimitive) (this.getPropertyValue("Counter"))).getValue();
    _logger.trace("Exiting Service: GetCounter");
    return current_value;
}

@SuppressWarnings("deprecation")
@ThingworxServiceDefinition(name = "ResetCounter", description = "", category = "", isAllowOverride = false, aspects = { "isAsync:false" })
@ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {})
public synchronized Integer ResetCounter() throws Exception {
    _logger.trace("Entering Service: ResetCounter");
    this.setPropertyValue("Counter", new IntegerPrimitive(0));
    _logger.trace("Exiting Service: ResetCounter");
    return 0;
}

The key here is the synchronized modifier, which is what allows Java to serialize access across threads and prevent the lost updates.
7. Build the Application
   - Use 'gradle build' to generate a build of the extension.
8. Import the Extension into ThingWorx
9. Create a Thing Based on the New Thing Template
10. Check the New Thing's Property and Service Definitions
11. Use the Same Test Tool to Run the Test Again
   - Just change the endpoint to point to the new thing:

{
    "host": "twx85.desheng.io",
    "port": 443,
    "protocol": "https",
    "endpoint": "/Thingworx/Things/DeoLockerThing/services/IncreaseCounter",
    "headers": {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "AppKey": "5cafe6eb-adba-41df-a7d6-4fc8088125c1"
    },
    "payload": {},
    "round_break": 50000,
    "req_break": 0,
    "round_size": 50,
    "total_round": 20
}

12. Check the Test Result
   - Repeat the same test several times to ensure the results are consistent and expected (and don't forget to reset the counter between tests).

Summary of the Java Extension Approach
The Java extension approach shown here uses the synchronized keyword to make the sequence of operations thread-safe. Other options are to use a ReentrantLock or a Semaphore for the same purpose, but the synchronized keyword approach is much cleaner.

However, the Java extension locker will NOT work in the 9.0 horizontal (clustered) architecture, since Java doesn't have a distributed locker. IgniteLocker wouldn't work in the current horizontal architecture, either. So if a thread-safe counter is needed in a version 9.0+ horizontal architecture, leverage the database thing instead, as discussed in Part 2.
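For reference, the ReentrantLock alternative mentioned above would look roughly like the fragment below (a hedged sketch placed inside the same template class, with the annotations omitted and java.util.concurrent.locks.ReentrantLock imported); the try/finally block guarantees the lock is released even if the property update throws.

// Illustrative fragment only: the same increment guarded by an explicit lock instead of "synchronized"
private final ReentrantLock counterLock = new ReentrantLock();

public Integer IncreaseCounter() throws Exception {
    counterLock.lock();
    try {
        int current_value = ((IntegerPrimitive) (this.getPropertyValue("Counter"))).getValue();
        current_value++;
        this.setPropertyValue("Counter", new IntegerPrimitive(current_value));
        return current_value;
    } finally {
        counterLock.unlock();   // always release, even on error
    }
}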
Generating and Reviewing JMeter Results

Overview
The 4th in a series of articles on load testing with JMeter, this one covers pushing the limits of a test to see how much the application can handle, as well as generating and analyzing reports once the testing completes. This article rounds off the basics of JMeter, such that anyone should be able to perform enterprise-level load testing after reviewing the content here.

Multiple criteria can be used to evaluate results, including:
- response time (as monitored both by JMeter, and by some other tool on the system side)
- throughput
- number of errors
- resource saturation
- CPU, memory, disk, and network utilization
Depending on the use case, some of these may be considered more important than others. For instance, some customers don't care if users wait a while for results to appear on the page (response time), because they set their users' expectations and mitigate the experience with well-designed loading graphics. With response times secondary, the real issues center around data loss or system outages, with resource utilization and number of errors becoming the more important indicators of system health. Request and database timeout errors are more important indicators, as they occur most often when resources are saturated and there is data loss.

It is typical for many customers to find preventing data loss and/or promoting data integrity to be more important than preventing long response times. Consider which of these factors is most important to your use case as you determine what kind of information to gather and review in your reports.

How to Create Client-Side Reports in JMeter
Creating reports for the client-side data is very simple using JMeter, both from the command line and within the UI (as shown in the tutorial below). These reports have graphical displays of response times, information about the number and type of response errors, and other criteria of performance used to gauge the success or failure of a load test. Follow these steps to generate an index file, which, when opened in your browser of choice, will show all of the relevant JMeter data.

Tutorial:
1. Create an empty directory in which to store reports.
2. Start the JMeter test with these options, or run these commands after the fact, to generate the HTML report:
   - Once the test completes, use: jmeter -g <outputfile.jtl/csv> -o <path to output folder for html report>
   - To start a test with the correct command for report generation, use this command: jmeter -n -t <test JMX file> -l <outputfile.jtl/csv> -e -o <path to output folder>
3. Running the above commands will generate these files:
4. When the test is complete, the many JMeter client consoles will look like this:
5. Go ahead and close the windows to terminate once they are finished. Optionally, you can run multiple tests sequentially using the same jmeter-server windows.
6. Click on the “index.html” file to open the results viewing window:

At any time, modify the settings of this “HTML dashboard” using the details from the JMeter user manual. That manual describes many options for these dashboards, as well as recommendations on how to group and format the results in ways which best convey the success or failure of the test, based on the custom requirements of the application and how granular the view needs to be.
Most of the time, the default settings work fine, showing something similar to this: The charts aren’t labeled very well here, so click on the Response Times submenu. This page may take some time to render if there is a lot of data. Next, scroll down to see all the requests that occurred and sort them by how long they took to complete. Anything which took over 5 seconds (or more, depending on what is expected) should be investigated as part of the post-test analysis. Does something need to be tuned or optimized? This is how to tell which request is holding things up for your customers.

There is also a chart that shows the overview, grouping the response times by how long they took to demonstrate the health of the system more concretely. Typically, the bars look something like this: This represents expected behavior, where most of the requests are quite fast, and then there are a few that had errors or took a bit longer. This is pretty typical for web activity.

You can also generate the report through the main JMeter client: give it a results file and an output directory to generate the same index file. There are log files in each of the JMeter client directories called “jmeter-server.log”. These files may show the wrong timezone, but the elapsed times are correct, and they will show when the JMeter clients started, how many threads they ran, which servers were which, and whether there were any errors. Not all errors mean a failed test, so review anything that appears and determine what is expected. Consider designing a batch script to gather all of these logs together, or even to analyze them automatically and extract only the relevant information.

How to Create Server-Side Results in DynaTrace
Collecting data from the environment, including CPU usage, memory utilization (used vs. total), garbage collection times, and other metrics of system health on the server, will require the use of an external tool. PTC’s official tool for this is called DynaTrace (PTC System Monitor), shown here. PTC offers a runtime license for DynaTrace to anyone who buys certain products, including Kepware Server, ThingWorx Foundation and Navigate, Windchill, Integrity, and more. Read more information about DevOps on the PTC Community, and stay tuned for more articles on the subject to come from the EDC.

Another option would be something like telegraf and Grafana (from the previous blog post), which provide the option to create dashboards around the data output specific to the needs of the application, and which can still be used for monitoring even once the application goes live. It can certainly be worth it to use such a tool for monitoring the server side, but the set-up takes more time. Likewise, many VMs have monitoring facilities for CPU usage and memory utilization built in, but DynaTrace also has visualization, consolidation of system elements, and other features that make it easy to use right out of the box. See the screenshots below for some examples of how to use DynaTrace, and be sure to review PTC’s full documentation here.

The example shown here is a ThingWorx Navigate system, with Windchill and ThingWorx Foundation set up side by side. This chart shows the overall response times of the server side of the system. JMeter collects the statistics on what the client experience looks like, while another tool is required to collect the server-side metrics like CPU usage and memory utilization, things that indicate the health of the VM or computer hosting the application.
An older version of DynaTrace is depicted here, available for free to all ThingWorx customers from the PTC Downloads Site (under various product listings).

In DynaTrace, you can build new dashboards using PurePaths. You can also look at the response times for each service, but be sure to change the response limit to a large number so that all of the results are returned and show up in the PurePaths dashboard.

Highlighted here in DynaTrace is the longest service that ran, which in this case took 95 seconds to fully respond. More specific analysis of this service can now begin. Perhaps it needs to be tuned, or otherwise optimized to handle the number of threads, i.e. the number of users. Perhaps the system needs more resources, or the VM isn’t large enough for the test. Perhaps more JMeter clients and system resources are required. Something will explain this long response time, and that will inform what work might still remain before this system can scale up to the enterprise level.

How to Use the Test Results
Load testing often means scaling the test up a little more each time until the system eventually breaks, or the target performance is reached. Within JMeter, this won’t mean increasing the overall number of threads per JMeter client, but instead scaling horizontally to other JMeter clients (as covered in the previous blog post). Now that the remote or distributed clients are configured and the test is running, how do we know when the test is beginning to fail?

It turns out that this answer is not a simple one. Which results are considered desirable will vary from one customer to the next based on many factors, and analyzing the test results is a massive topic all on its own. However, there is one thing that any customer would care to review, and that is the response time overview chart found within the JMeter reports. This chart can be used to compare the performance of the majority of threads against a baseline, indicating the point at which the test begins to fail, i.e. the point at which the limits of the system are reached.

The easiest way to determine a good standard response time for a load test, a baseline, is to start with a single JMeter client and record the response times for just 1-5 threads. You can record the response times for individual requests, particularly queries and other services with expected long response times, or the average response times across all requests or groups of requests, if the performance of some mashups is more important than others.

This approach is better than relying on the response times seen in a browser, because HTML pages load differently when rendered in a browser, with different graphical resource requirements than what is requested in JMeter. Note that some customers will also manually record response times within a separate browser-based test scenario during load testing, either as a sanity check or as part of their overall benchmarking, in order to further validate the scalability of the application; but this wouldn’t involve JMeter, given that browsers load things differently and cross-comparison is a bad idea.

Once the baseline response times are established, start increasing the thread counts across the many JMeter clients until you see the response times go up on average.
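One way to take the guesswork out of that comparison is to compute the averages directly from the JMeter results file. The sketch below is a naive example: it assumes the results were saved as CSV with the default "elapsed" column, assumes no commas inside labels, and uses an invented baseline value; it is meant only to show the shape of the check, not to replace the HTML report.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class BaselineCheck {
    public static void main(String[] args) throws Exception {
        double baselineMs = 850.0;   // assumed average from the 1-5 thread baseline run
        List<String> lines = Files.readAllLines(Paths.get("results.csv"));
        int elapsedIdx = Arrays.asList(lines.get(0).split(",")).indexOf("elapsed");

        double total = 0;
        int count = 0;
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) continue;          // skip trailing blank lines
            String[] cols = line.split(",");       // naive split: assumes no commas inside labels
            total += Double.parseDouble(cols[elapsedIdx]);
            count++;
        }
        double averageMs = total / count;
        System.out.printf("Average response: %.0f ms (baseline %.0f ms)%n", averageMs, baselineMs);
        if (averageMs > 2 * baselineMs) {
            System.out.println("Average response time has roughly doubled -- treat this thread count as the limit.");
        }
    }
}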
PTC’s standard criterion for load testing is exceeded when the average response times roughly double, or when the system seems overwhelmed by the user load on the server side (which is what to look out for in DynaTrace or the external system monitor). At this point, the application is said to have reached a bottleneck, which could be a simple tuning problem, or it could be saturated by resource requirements. Either way, the bottleneck is proof that the system can’t take any more threads without users beginning to notice and the response times approaching an unreasonable delay.

Other criteria can be used as well, say if any one thread takes more than 5 seconds to respond. Also ensure there are no unexpected errors, as gateway errors represent failed tests too. Sometimes there will be errors even when the test is successful, though, so consider monitoring the error percentage, a column in the Summary Report tab of JMeter, to see what is normal. The throughput column may also be something to monitor. Many watch for increases in throughput as the thread count increases to ensure there is no degradation in performance (which may indicate hardware or sizing constraints).

The Summary Report will look something like this, with thread group results from all of the clients appearing side by side, differentiated from each other by the unique port:

Conclusions
Generating and reviewing reports within JMeter is straightforward and easily customizable. Be sure to also monitor the system itself using an external tool like DynaTrace, PTC’s official System Monitor, which has a lot of value considering how easy it is to use out of the box. If the system looks healthy on the server side and the response times are within an acceptable range on the client side, then the application is ready for enterprise use. Be sure to generate a baseline for response times within JMeter, remembering that browsers have different loading processes than JMeter, and not to cross-compare.

This article constitutes the end of the basics. The final article to come will talk about more advanced test design features and best practices, so stay tuned!
How to Scale Vertically and Horizontally, and When to Use Sharding
Written by Mike Jasperson, VP of IOT EDC

Deployment architecture describes the way in which an IOT application is deployed, or where each of the components is hosted on the network. There are deployment architecture considerations to make when scaling up an application. Each approach to deployment expansion can be described by the “eggs in a basket” analogy: vertical scale is like one person carrying a bigger basket, horizontal scale is like one person carrying more baskets, and sharding is like more than one person carrying the baskets (see below).

All of these approaches result in the eggs getting from point A to point B (they all satisfy the use case), but the simplest (vertical scale) is not necessarily the best. Sure, it makes sense on paper for one person to carry everything in one big basket, but that doesn’t ensure that all of the eggs arrive intact. Selecting the right deployment architecture is a way to ensure the use cases are satisfied in the best and most efficient ways possible, with the least amount of application downtime or data loss.

Vertical Scale - a.k.a. "one person carrying a bigger basket"
The most common scalability approach is to simply size the IOT server larger, or scale up the server. This might mean the server is given additional CPU cores, faster CPU clock speeds, more memory, faster disks, additional network bandwidth or improved network cards, and so on. This is a very good idea when the application logic is increased in complexity, when more data is therefore needed in memory at a time, or when the processing of said data has to occur as quickly as possible. For example, adding additional devices to the fleet increases the size of the “Thing Model” in the process and will require additional heap memory to be available to the Foundation server.

However, there are limitations to this approach. Only so many concurrent operations and threads can be performed at once by a single server. Operations trying to read and write to the disks at once can introduce bottlenecks and reduce server performance. Likewise, “one person, one basket” introduces a single-point-of-failure operating risk. If for some reason the server’s performance does degrade or cease altogether, then all of the “eggs” go down with it. Therefore, this approach is important, but usually not sufficient on its own for empowering an enterprise-level deployment.

Horizontal Scale - a.k.a. "one person, with two or more baskets"
As of ThingWorx version 9.0, Foundation servers can be deployed in clusters, meaning more baskets to carry the eggs. More baskets means that as long as even one of these servers is active, the application remains up in the event of an individual node failure or maintenance. So clustered deployments are those which facilitate High Availability.

Clustered servers save on some resources, but not others. For instance, every server in a cluster will need to have the same amount of memory, enough to store the entire Thing Model. Each of the multiple baskets in our analogy has to have the same type of eggs. One basket can’t have quail eggs if the rest have chicken eggs. So, each server has to have an identical version of the application, and therefore enough memory to store the entire application.

Also keep in mind that not all application business logic can scale horizontally.
Event queues are local to each ThingWorx node, so the events generated within each node are processed locally by that particular node, and not the entire network (examples are timer and scheduler-based activities). Likewise, data ingestion done through an extension or other background process, like MQTT, emits events within a node that therefore must be processed by that particular node, since that's where the events are visible. On the other hand, load distribution that happens external to ThingWorx in either the Connection Server (for AlwaysOn based data coming from ThingWorx SDKs, EMS, eMessage agents, or Kepware) or REST API calls through a load balancer (i.e. user activity) will be distributed across the cluster, facilitating greater scaling potential in terms of userbase and mashup complexity. Also note that batched data will be processed by the node that received it, but different batches coming through a connection server or load balancer will still be distributed.

Another consideration with clusters pertains to failure modes. While each node in the cluster shares a cache for many things, Stream and Value Stream queues are only stored locally. In the event of a node failure, other nodes will pick up subsequent requests, but any activity already queued on the failed node will be lost. For use cases where each and every data point is critical, it is important to size each node large enough (in other words, to vertically scale each node) such that queue sizes are constantly kept low and the data within them processed as quickly as possible. Ensuring sufficient network and database throughput to handle concurrent writes from the many clustered nodes is key as well.

Once each node has enough resources to handle local queues, the system is highly available with low risk for outages or data loss. However, when multiple use cases become necessary on single deployments, horizontal scaling may no longer be enough to ensure things run smoothly. If one use case is logic-heavy, something non-time-critical which processes data for later consumption, it can use too many resources and interfere with other, lighter but more time-critical use cases. Clustering alone does not provide the flexibility to prioritize specific operations or use cases over others, but sharding does.

Sharding - a.k.a. "more than one person carrying baskets"
“Sharding” generally refers to breaking up a larger IOT enterprise implementation into smaller ones, each with its own configuration and resources. More server maintenance and administration may be required for each ThingWorx implementation, but the reduction in risk is worth it. If each of the use cases mentioned above has its own implementation, then any unexpected issues with the more complex, analytical logic will not affect the reaction time of operators to time-sensitive matters in the other use case. In other words, “don’t keep all your eggs in one basket”.

The best places to break up an implementation lie along logical boundaries already accepted by the business. Breaking things down in other ways might look nice on paper, but encouraging widespread adoption in those cases tends to be an uphill battle.

In connected products use cases, options for boundaries could be regional, tied more towards business vertical, or centered around different products or models. These options can be especially beneficial when data needs to stay in particular countries or regions due to regulatory requirements.
In connected operations use cases, the most common logical boundaries would be site-based, with smaller IOT implementations serving just a smaller number of related factories in a particular area. Use-case or product-line boundaries can also make sense here, in-line with the above comments about keeping production-critical or time-sensitive use cases isolated from interference from business-support and analysis use cases.     Ideally, a shard model will put the IOT implementation “closer” to both the devices communicating with it and the users that interact with the data. This minimizes the amount of data to be sent or received over long distances, reducing the impact from bandwidth and latency on performance. When determining which approach is best, consider that smaller, more focused implementations offer more flexibility, but are harder to manage. Having different versions of the same applications deployed in multiple places can easily become a maintainability nightmare. It’s therefore best not to combine a regional model with a use case model when it comes to determining sharding boundaries. Also consider using deployment automation tools like Solution Central. These enable tracking and managing version-controlled deployments to multiple IOT implementations, whether they are deployed on-premise, or in the cloud, giving one central location of all source code.   Another benefit to sharding is the focused investment of server resources in a more targeted way. For instance, if one region is larger than another, it may need more CPU and memory. Or, perhaps only part of an application requires High Availability, the time-critical use cases which are best suited to small, clustered deployments. The larger, analysis-centered use cases can then remain non-clustered (but still vertically scaled of course). Sharding can also make access control simpler, as those who need access to only one region or use case can just be given a user account on that particular shard.   However, certain use cases need data from more than one shard in order to operate, turning the data storage and access control benefits into challenges. Luckily, ThingWorx has an excellent toolkit for overcoming such issues. For one thing, REST API calls are readily available in ThingWorx, allowing each shard to exchange information with each other, as well as other enterprise data systems, like ERP or Service Ticketing. Sometimes, lower-level data replication strategies are the way to go, say if downsampling or data transfer from one store to another is necessary, and built-in database tools can more easily handle the workload. Most of the time, however, REST API calls are used to define the business logic within the application layer so that copying data actively between shards is unnecessary, using fewer resources to control what information is shared overall.   There are several design approaches for REST API communication between shards, the two most common being the “peer” model and the “layered” model. In a peer model, one shard may call upon another shard using REST whenever it needs more information. In a layered model, there are “front-line” shards which handle most (if not all) of the device communication and time-critical use cases, things which require only the information in one shard to operate. Then there are also “back-line” shards that aggregate data from the many front-line shards, performing any operations that are less time-sensitive and more complex or analytical.   
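As a hypothetical illustration of the layered model, the sketch below shows a back-line process asking a front-line shard for a pre-computed daily summary through the standard ThingWorx REST endpoint. The shard URL, thing name, and service name are all invented for the example; in practice this call would more likely live inside a ThingWorx service on the back-line shard itself rather than in standalone Java.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ShardSummaryPull {
    public static void main(String[] args) throws Exception {
        String frontLineShard = "https://site-a.example.com/Thingworx";   // assumed front-line shard URL
        String appKey = "<app-key-with-read-access>";                     // assumed key provisioned on that shard

        // Thing and service names below are hypothetical placeholders
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(frontLineShard + "/Things/SiteA.SummaryThing/Services/GetDailySummary"))
            .header("appKey", appKey)
            .header("Accept", "application/json")
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{}"))   // service parameters would go in this JSON body
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());   // the result InfoTable, serialized as JSON
    }
}

The same endpoint pattern works in reverse for the peer model, with each shard granted a narrowly scoped application key on the shards it needs to query.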
For any of these approaches, it remains important to keep your data archival and purge strategy in mind. It is a best practice in ThingWorx to retain only as much data as is absolutely necessary, purging the rest periodically. If the front-line shards only ever need the last 7 days of raw data for 5 properties, plus the last 52 weeks of min/max/average data, then an ideal approach would be for each shard to compute the min/max/average values and then archive the older data to a shared “data lake” before purging it. This data lake then serves as the data store for all back-line shard operations.

There is also the option of sharing some infrastructure between ThingWorx instances when using sharding in a deployment, which can create more flexible, scalable architectures, but which can also introduce issues where more than one shard is affected when problems occur on only one shard. For instance, a common shared infrastructure piece is at the database level; each ThingWorx instance needs its own database, but a single database server instance (such as a PostgreSQL HA cluster) could serve separate database namespaces to multiple ThingWorx instances. This is an attractive option where an existing enterprise-scale database infrastructure with experienced DBAs is already in place.

Similarly, load balancers can often be configured to manage load for multiple servers or URLs. A properly configured load balancer could direct traffic for multiple applications, but it can also become a bottleneck for inbound connections if not properly sized. Load balancers designed for High Availability can also be considered. Apache ZooKeeper is another tool often deployed once for an entire cluster to monitor the health and availability of individual components, or to vote them in or out of operation if problems are detected. With all of these options, remember that sharing infrastructure reduces the overall infrastructure complexity, but at the cost of increased administrative complexity and an increased chance that issues spread from one ThingWorx system to the next.

Bringing It All Together
Vertical and horizontal scale are both effective ways to add more capacity and availability to software implementations, but there are typically diminishing returns on the investment in additional infrastructure. For example, consider two large, vertically-scaled implementations: one running on a VM with 64 vCPUs and 256 GiB of RAM, and another running on a VM with 96 vCPUs and 384 GiB of RAM. While the 96-core server has 1.5 times the compute capacity, in sizing tests with 1 million simulated assets, these two systems tend to fall behind on WebSocket execution at approximately the same point.

In a horizontal scale example, with two nodes each sized the same (64 vCPUs and 256 GiB of RAM), one would expect High Availability, where one node picks up the other's slack in a failover scenario. However, what if that single node can't handle the entire workload? Should both machines be sized vertically such that either can take on the full load, and if so, then what is the cost-benefit of that? It would be less expensive in this case to have a third server.

Optimizing the deployment architecture for a ThingWorx application will therefore usually involve a blended approach. With more than two nodes, High Availability is more readily obtained, as two servers can almost certainly share the load of the third, failed node.
Likewise, some workload aspects do not scale well until multiple additional nodes are added. For instance, spreading out the user load from mashup requests across multiple nodes, to give the singleton more resources for the tasks it alone can perform, doesn’t have much benefit if there are just two nodes.

However, with horizontal scaling alone, the servers may still need to be vertically scaled larger than is ideal in terms of cost. Each one has to hold the entire Thing Model in memory, which means that sometimes some of the nodes may be oversized for the tasks actually performed there. Sharding allows each node to have a different Thing Model as necessary, based around whichever boundaries are selected, which can mean saving on costs by sizing each server only as large as it really needs to be.

So, a combination of approaches is typically the best when it comes to deployment architecture. The key is to break things up as much as possible, but in ways that make sense. Determine where the boundaries of the shards will be such that each machine can be as light and focused as possible, while still not introducing more work in terms of user effort (having to access two systems to get the job done), application development (extra code used to maintain multiple systems or exchange information between them), and system administration (monitoring and maintaining multiple enterprise systems).

Find the right balance for your systems, and you’ll maximize your cost-benefit ratio and get the most out of your ThingWorx application. Happy developing!
ThingWorx DevOps with Azure: The Comprehensive DevOps Guide
Written by Tori Firewind, IoT EDC

As promised in a previous post, attached here is a comprehensive guide to DevOps in ThingWorx, including tutorials and instructions for creating a continuous integration, continuous deployment (CI/CD) process for application development. There are also updated scripts and entities attached, including an entire sample application for importing, exporting, and testing an application in ThingWorx. From Docker and GitHub to Azure DevOps and Solution Central, this guide has it all. Learn how to perform your role in the DevOps process, whether as an administrator or a developer; automate your deployments and testing; and create a more efficient process for publishing changes to production. A complete DevOps process like this really does facilitate faster and easier updates with fewer risks, fewer delays, and a better pathway to success.
Remote Monitoring of Assets in Connected Factories

As stated in the previous reference benchmark, one of the missions of the IOT Enterprise Deployment Center (EDC) is to showcase how real-world IOT business problems are solved. Our goal is that these benchmarks can be used as a reference or baseline for architects working on their own implementations, showing not only a successful at-scale implementation, but also what happens when that same implementation is pushed to, or even past, its limits.

The second in this series is attached here, this time reflecting a Connected Factory implementation. ThingWorx was deployed alongside Kepware Server, with the number of things, the number of properties, and the write rate for those properties varied to once again test the capabilities of a remote monitoring use case, but this time in a Connected Factory setting. The business logic was kept simple to ensure it was not the limiting factor as the throughput between Kepware Server and ThingWorx was pushed to the limit. See firsthand the capabilities of Kepware Server and ThingWorx Foundation to handle implementations centered around real-time data reporting.

More Connected Factory implementations will be added to this document in time, with multiple Kepware Server deployments and other scenarios to come. Please feel free to use this community post to ask any questions about our approach and discuss any design, deployment, and simulation factors.
Announcing the Final Installment
JMeter for ThingWorx: The Comprehensive Guide and Best Practice Tips

This is the final post on using JMeter for ThingWorx. Below are best practice tips for using JMeter and for load testing in general. Attached to this post is a comprehensive guide including all of the information from every post we've made on JMeter, including the tutorials. For a more central source, feel free to download the guide, or see the past posts here:
- JMeter for ThingWorx (original post)
- Building More Complex Tests in JMeter
- Distributed Testing with JMeter
- Generating and Reviewing JMeter Results

JMeter Best Practice Tips

Use Distributed Testing
As already mentioned in a previous post, each JMeter client can only handle about 150-250 threads depending on the complexity of the tests, and each client will need around 1 CPU and up to 8 GB of RAM for the Java heap. Some test plans will run with fewer host resources, so resizing the test client VM up or down is often required during test development. Create a batch or shell script to start the multiple JMeter clients for greater ease of use.

Use Non-Graphical Mode
Non-graphical mode allows the system to scale up higher; client processing uses up resources just to keep the simulation running, but with graphical mode turned off, there is less of an impact on the response times and other results. Graphical mode is essentially only used for debugging.

Turn off Embedded Resources
This setting reloads all of the typically cached requests over and over; there will be far more download requests than is helpful, to the exclusion of other requests. Ensure this box is not checked, especially in the HTTP Request Defaults element. Browser caching means that this setting doesn’t actually simulate a proper user load, given that many of the reloaded resources would not be reloaded by actual users. Use this incrementally, for one or two HTTP requests only, if there is a reason why those requests might need to download fresh images, scripts, or other resources with each call; for instance, simulate page timeouts using this once per hour or something similar. Using this across the whole project will prevent it from scaling well, while not actually simulating real-world conditions.

Avoid Using Listeners
For instance, avoid the “View Results Tree” listener, which uses additional resources that may skew the results in disingenuous ways, reflecting the needs of the clients themselves rather than the actual response times of the server. Many listeners are only for debugging a handful of threads while designing the tests. A list of recommended listeners for different purposes is in the JMeter documentation. Summary Report is the only one you want enabled, as that exports the results as a CSV or similarly formatted file, which can then be used to build reports.

JMeter CAN Handle SSO
JMeter can authenticate into and test an SSO-enabled system. Sometimes the SSO configuration is essential for customers, and they may therefore be quick to assume that they cannot use JMeter, but that's not entirely true. Some external tools that might help with this are BlazeMeter (mentioned again in just a moment) and Fiddler, a good tool for decoding what data a particular SSO setup is exchanging during the authentication process.
Use Logic Controllers for Parametrization
Parametrization is critical to mirroring a proper user load and allowing different data sets to be queried or created; the load should seem organic, random in the right ways, with actions occurring at random times rather than predictable ones, to prevent artificial peaks of usage that don't represent real usage of the Foundation server. Random Order Controllers direct the threads down different paths based on random dice rolls, allowing for a randomized collection of user activity each time, rather than something that has to be regenerated, such as a set of Boolean values specified in an input CSV and used to navigate a series of true-or-false switches. Switches just look for an environment variable to be either 1 or 0; when a thread hits a switch that is set to 1, it triggers the switch below, running the samplers in the order given under the transaction controller that goes with the switch. In this image, the 1's and 0's are given in the CSV input file; randomizing that input file therefore randomizes the execution of the switches too.

Use Commercial Add-Ons
There are many external, add-on tools and plugins which enhance JMeter's capabilities. One such tool is BlazeMeter, which has some free and some paid options to help create better reports, automatically removing much of the "garbage" REST calls (which would otherwise need to be manually deleted) and providing more consumable test reports right out of the box. Other tools and plugins include: Maven, NetBeans, SonarQube, Jenkins, Autometer, Gradle, Amazon EC2, Lightning, IntelliJ IDEA, Cassandra, and Grafana. For more best practice information, see the JMeter Best Practice Manual.

General Load Testing Guidelines

Concurrency Requirements – How to Properly Estimate the Size of the Load Test
Take a brand new ThingWorx-based app. How will people be accessing the system, and how often? How many are business users? How many are engineers? What do they do? Many assume that every named user in the corporate LDAP will need access to the server, often tens of thousands of users; this generally drastically oversizes the system. Load testing for many thousands of users is very hard and requires a lot of set-up, tuning, and optimization to get right, so if it seems that thousands of users are expected, then validating this claim is important: most customers don't really have that many concurrent users in an engineering system. Use estimates based on how many people work at which offices, which time zones those people are in, and what kinds of users they are. Do they need access to engineering data? Perhaps there are simpler mashups for them that use fewer resources. One tool PTC offers for these sorts of estimations is the Windchill Sizing Calculator (shown here), which factors in office time zone overlap. Other ways to estimate include:
Analyzing the business processes: things like how long workloads typically take to complete and how many workloads are generated per day, converted into the hour, minute, or second granularity desired for the peak duration, the length of the test.
"Day in the Life" modelling, or considering things like "what does user X do in a day?" Maybe user X checks out some drawings, edits them, and then checks them back in at 4:30. Maybe user Y digs into the underlying parts and assemblies, putting in change requests or orders throughout the day instead of waiting for the end. Models are built around the types of users. Also consider: What are worst case scenarios?
What are the longest running activities? What produces the largest data transfers? What activities have large, heavy database queries? When is the peak overlap of usage? Beginning-of-day and end-of-day downloads and check-ins? Reports that are generated regularly? How do these impact the foreground users?

For a simpler estimate, start with a percentage of the named user count; anywhere from 5-15% is a good ballpark percentage (a small worked example follows at the end of this post). Don't overestimate just to feel like the application has been financially worth it; even if everyone is logged in and using it all at once, which is unlikely, load testing for every single user doesn't take into account the fact that people pause in between clicking on things to think, type emails, get coffee, and so forth. Fewer people than expected are actually doing concurrent activities like loading web pages and updating data streams. Whenever possible, use concurrency data from existing customer systems to guide the estimate for the new system. Legacy systems are great places to start.

Use Grafana to monitor the system side throughout the load test; this is also required to know the test has been successful. Also set up Grafana to monitor the application once it goes live, both to prevent technical issues with the server and to mitigate them more rapidly when they occur. And remember that PTC Technical Support is here to help! Provide thread dumps with an open case to any TSE, and they will help troubleshoot the tests and review any errors in the ThingWorx or Tomcat logs.
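To illustrate the percentage-based concurrency estimate mentioned above, here is a small worked example. The user counts and percentages are hypothetical and should be replaced with numbers from your own business analysis.

# Hypothetical example: 8,000 named users in LDAP, assuming roughly 10% of them
# are active concurrently during the peak hour of the test.
NAMED_USERS=8000
CONCURRENCY_PCT=10
echo $(( NAMED_USERS * CONCURRENCY_PCT / 100 ))   # => 800 concurrent users to simulate
# At roughly 200 threads per JMeter client, simulating that peak would call for
# about 4 distributed test clients.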
View full tip
Setting Up the Azure Load Balancer with a ThingWorx High Availability Deployment

Purpose
In this post, one of PTC's most experienced ThingWorx deployment architects, Desheng Xu, explains the steps to configure Azure Load Balancer with ThingWorx when deployed in a High Availability architectural model.

This approach has been used successfully on customer implementations for several ThingWorx 7.x and 8.x versions. However, with some of the improvements planned for the ThingWorx High Availability architecture in the next major release, this best practice will likely change (so keep an eye out for updates to come).

Azure Load Balancer
The overview article What is Azure Load Balancer? from Microsoft will give you a high-level understanding of load balancers in general, as well as the capabilities and limitations of Azure Load Balancer itself. For our purposes, we will leverage Azure Load Balancer's capability to manage incoming internet traffic to ThingWorx Platform virtual machine (VM) instances. This configuration is known as a Public Load Balancer.

Important Note: Different load balancers operate at different "layers" of the OSI Model. Azure Load Balancer operates at Layer 4 (Transport Layer) – it is indifferent to the specific TCP payload. As a result, you must either configure both the front-end and back-end to work on SSL, or configure both of them to work on non-SSL communications. "SSL Termination" or "TLS Offload" is not supported by Azure Load Balancer.

Azure offers multiple different load balancing solutions. If you need some guidance on choosing the right one for you, I highly recommend reviewing the Microsoft DevBlog post Azure Load Balancing Solutions: A guide to help you choose the correct option.

High-Level Diagram: ThingWorx High Availability with Azure Load Balancer
To keep this article focused, we will not go into the setup of ThingWorx in a High Availability architecture. It will be assumed that ThingWorx is working correctly and the ZooKeeper cluster is managing failover for the Platform instances as expected. For more details on setting up this configuration, the best place to start would be the High Availability Administrator's Guide.

Planning
In this installation, let's assume we have the following plan (you will likely need to change these values for your own implementation):
Azure Load Balancer will have a public facing domain name: edc.ptc.io
Azure Load Balancer will have a public IP: 41.35.22.33
ThingWorx Platform VM instance 1 has a local computer name, like: vm1
ThingWorx Platform VM instance 2 has a local computer name, like: vm2

ThingWorx Preparation
By default, the ThingWorx Platform provides a healthcheck endpoint at /Thingworx/Admin/HA/LeaderCheck, which can only be accessed with a credential configured in platform-settings.json: "HASettings": { "LoadBalancerBase64EncodedCredentials":"QWRtaW5pc3RyYXRvcjphZG1pbg==" } However, Azure Load Balancer health probes cannot supply this credential with current versions of ThingWorx. As a workaround, you can create a pings.jsp (using the attached JSP example code) in the Tomcat folder $CATALINA_HOME/webapps/docs. This workaround will no longer be needed in ThingWorx 8.5 and newer releases.

There are two lines that likely need to be modified to meet your situation:
The hostname in final String probeURL (line 10) must match your endpoint domain name. It's edc.ptc.io in our example; don't forget to replace this with your real hostname!
You also need to add a line in your local hosts file to point this domain name to 127.0.0.1. For example: 127.0.0.1 edc.ptc.io
The credential in final String base64EncodedCredential (line 14) must match the credential configured in platform-settings.json.
Additionally:
Don't forget to make the JSP file accessible and executable by the user who starts the Tomcat service for ThingWorx.
These changes must be applied to both ThingWorx Platform VM instances.
Tomcat needs to be configured to support SSL on a specific port. In this example, SSL will be enabled on port 8443. Please make sure a similar configuration is included in $CATALINA_HOME/conf/server.xml: <Connector protocol="org.apache.coyote.http11.Http11NioProtocol" port="8443" maxThreads="200" scheme="https" secure="true" SSLEnabled="true" keystoreFile="/opt/yourcertificate.pfx" keystorePass="dontguess" clientAuth="false" sslProtocol="TLS" keystoreType="PKCS12"/>
The values in keystoreFile and keystorePass will need to be changed for your implementation. While the PKCS12 format is used in the above example, you can use a different certificate format, as long as it is supported by Tomcat (for example, the JKS format). All other parameters, like maxThreads, are just examples - you should adjust them to meet your requirements.

How to Verify
Before configuring the load balancer, verify that the health check workaround is working as expected on both ThingWorx Platform instances. You can use the following command to verify: curl -I https://edc.ptc.io:8443/docs/pings.jsp
The expected result from the active node should look like: HTTP/1.1 200
There will be three or more lines in the output, depending on your instance configuration, but you should be able to see the keyword: HTTP/1.1 200.
The expected result from the passive node should look like: HTTP/1.1 503

Load Balancer Configuration
Step 1: Select SKU
Search for "load balancer" in the Azure marketplace and select Load Balancer from Microsoft. Verify the correct vendor before you create a Load Balancer.
Step 2: Create load balancer
To create a proper load balancer, make sure to read Microsoft's What is Azure Load Balancer? overview to understand the differences between the "basic" and "standard" SKU offerings. If your IT policy only requires SSL communication to the outside but doesn't require SSL communication for the health probe, then the "basic" SKU should be adequate (not considering zone redundancy). You have to decide on the following parameters: Region, Type (public or internal), SKU (basic or standard), IP address, Public IP address name, and Availability zone. PTC cannot provide specific recommendations for these parameters – you will need to choose them based on your specific business needs, or consult Microsoft for available offerings in your region.

Step 3: Start to configure
Once the load balancer is successfully created by Azure, you should be able to see it in the portal.
Step 4: Confirm frontend IP
Click Frontend IP configuration on the left side and you should be able to see the public IP address configuration. Please make sure to register this IP with your domain name (edc.ptc.io in our example) in your Domain Name Server (DNS). If you are unfamiliar with DNS configuration, you should consult with the administrator of your DNS server. If you are using Azure DNS, the Quickstart article on creating Azure DNS zones and records may help.

Step 5: Configure Backend pools
Click Backend pools and click "Add" to add a backend pool definition. Select a name for your Backend pool (ThingworxBackend in our example). The next step is to choose the Virtual network.
Once you select the Virtual network, you can choose which VM (or VMs) you want to put behind this load balancer. The VM should be the ThingWorx VM instance. In a high availability architecture, you will typically need to choose two instances to put behind this load balancer.

Please Note: The "Virtual machine status" column in this table only shows VM status, not ThingWorx status. The ThingWorx running status will be determined by the health probe configured in the next step.

Step 6: Configure Health Probe
The Health Probe will be used to determine the ThingWorx Platform's running status. When a ThingWorx Platform instance is running as the leader, it will return HTTP status code 200 during a health probe. The Azure Load Balancer will rely on this status code to determine if the platform is running properly. When a ThingWorx Platform VM is not responding, offline, or not the leader in a High Availability setup, the health probe will return an HTTP status code other than 200.

For the health probe, select HTTPS for the protocol. In our example port 8443 is used, though another port can be selected if necessary. Then, provide the "/docs/pings.jsp" we created earlier as the probe's path. You may need to change this path value if you put this file in a different location.

Step 7: Configure Load balancing rules
Select "Load balancing rules" from the left side and click "Add". Select TCP as the protocol; in our example we are using 443 as the front-end port and 8443 as the back-end port. You can choose other port numbers if necessary. (A scripted Azure CLI alternative to these portal steps is sketched after the Q&A at the end of this post.)

Reminder: Azure Load Balancer is a Layer 4 (Transport Layer) router – it cannot differentiate between HTTP and HTTPS requests. It will simply forward requests from front-end to back-end based on the port-forwarding rules defined.

Session persistence is not critical for current versions of ThingWorx, as only one active node is currently permitted in a High Availability architecture. In the future, selecting Client IP may be required to support active-active architectures.

Step 8: Verify health probe
Once you complete this configuration, you can go to the $CATALINA_HOME/logs folder and monitor the latest local_access log. You should see entries similar to those pictured below - HTTP 200 responses should be observed from the ThingWorx leader node, and HTTP 503 responses should be observed from the ThingWorx passive node. In the example below, 168.63.129.16 is the internal IP address of the load balancer in the current region.

Step 9: Network Security Group rules to access Azure Load Balancer
On its own, Azure Load Balancer does not have a network access policy – it simply forwards all requests to the back-end pool. Therefore, the appropriate Network Security Group associated with the backend resources within the resource group should have a rule to allow access to the destination port of the backend ThingWorx Foundation server (shown as 8443 here, for example). The following image displays an inbound security rule that will accept traffic from any source and direct it to port 443 of the IP address for the Azure Load Balancer.

Enjoy!! With the above settings, you should be able to access ThingWorx via: https://edc.ptc.io/Thingworx (replacing edc.ptc.io with the hostname you have selected).

Q&A
Can I configure the health probe to run on a port other than the traffic port (8443 in this case)?
Yes – if desired, you can use a different port for the health probe configuration.

Can I use a protocol other than HTTPS for the health probe?
Yes – you can use a different protocol in the health probe configuration, but you will need to develop your own functional equivalent to the pings.jsp example in this article for the protocol you choose.

Can I configure ZooKeeper to support the health probe?
No – the purpose of the health probe is to inform the Load Balancer which node is providing service (the leader), not to select a leader. In a High Availability architecture, ZooKeeper determines which VM is the leader and is talking with the database. This approach will change in future releases where multiple ThingWorx instances are actively processing requests.

How well does Azure Load Balancer scale?
This question is best answered by Microsoft – as a starting point, we recommend reading the DevBlog post: Azure Load Balancing Solutions: A guide to help you choose the correct option.

How do I access logs for Azure Load Balancer?
This question is best answered by Microsoft – as a starting point, we recommend reviewing the Microsoft article Azure Monitor logs for public Basic Load Balancer.

Do I need to configure specifically for WebSocket and/or AlwaysOn communication?
No – Azure Load Balancer is a Layer 4 (Transport Layer) router - it only handles TCP traffic forwarding.

Can I leverage this load balancer to access all VMs behind it via SSH?
Yes – you could configure Inbound NAT rules for this. If you require specific help in configuring this, the question is best answered by Microsoft. As a starting point, we recommend reviewing the Microsoft tutorial Configure port forwarding in Azure Load Balancer using the portal.

Can I view the current health probe status on a portal?
No – unfortunately, there is currently no way to do this with Azure Load Balancer.
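For teams that prefer scripting over the portal, the sketch below shows how the load balancer, health probe, and load balancing rule described in Steps 2, 6, and 7 could be created with the Azure CLI instead. This is a minimal sketch only: the resource group, resource names, and SKU choice are assumptions (an HTTPS health probe requires the standard SKU), and the ThingWorx VM network interfaces still need to be added to the backend pool separately.

# Create the load balancer with a public frontend (names are placeholders):
az network lb create --resource-group MyResourceGroup --name ThingWorxLB \
  --sku Standard --public-ip-address ThingWorxPublicIP \
  --frontend-ip-name ThingWorxFrontEnd --backend-pool-name ThingworxBackend

# Health probe that calls the pings.jsp workaround described above:
az network lb probe create --resource-group MyResourceGroup --lb-name ThingWorxLB \
  --name ThingWorxProbe --protocol https --port 8443 --path /docs/pings.jsp

# Load balancing rule forwarding front-end port 443 to back-end port 8443:
az network lb rule create --resource-group MyResourceGroup --lb-name ThingWorxLB \
  --name ThingWorxRule --protocol Tcp --frontend-port 443 --backend-port 8443 \
  --frontend-ip-name ThingWorxFrontEnd --backend-pool-name ThingworxBackend \
  --probe-name ThingWorxProbe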
View full tip
Introduction to Digital Performance Management (DPM)
Written by: Tori Firewind, IoT EDC

"Digital Performance Management (DPM) is a closed-loop, problem solving solution that helps manufacturers identify, prioritize, and solve their biggest loss challenges, resulting in reduced cost, increased revenue, and improved service levels." – DPM Help Center

What is DPM?
Digital Performance Management (DPM) is an application which improves factory efficiency across a variety of different areas, namely "the four P's" of Digital Transformation: products, processes, places, and people. Each performance issue in a factory can be mapped to at least one of these improvement categories in a new strategy for Continuous Improvement (CI) founded by PTC.

Figure 1 – Each performance issue in a factory can be mapped to at least one of four fundamental improvement categories: products, processes, places, and people. PTC's new, industry-leading strategy for continuous improvement (CI) in factories is a "best practice" approach, taking the collective knowledge of many customers to form a focused, prescriptive path for success. (Closing the Loop Across Products, Processes, People, and Places, Manufacturing Leadership Journal)

At PTC, CI in factories is driven by a "best practice" approach: years of experience in manufacturing solutions combine with the collective knowledge of the many diverse use cases PTC has encountered to generate a focused, prescriptive path for improvement in any individual factory.

Figure 2 – DPM is a closed loop for continuous improvement, a strategy built around industry standard best practices and years of experience.

PTC is also defining new industry standards for OEE analysis by using time as a currency within DPM. This standardization technique improves intuitive impact assessment and allows for direct comparison of metrics (see the Help Center for details on how each metric is calculated).

DPM creates a closed loop for CI, from the monitoring phase, performed both automatically and through manual operator input, to the prioritization and analysis phases performed by plant managers. DPM helps plant managers by tracking metrics of factory performance that often go overlooked by other systems. With Analytics, DPM can also do much of the analysis automatically, finding root causes much more rapidly.

Figure 3 – All levels of the company are involved in solving the same problems effectively and efficiently with DPM. Instead of 100 people working on 100 different problems, some of which might not significantly improve OEE anyway, these same 100 people can tackle the top few problems one at a time, knocking out barriers to continuous improvement together.

Production supervisors who manage the entire production line then know which less-than-effective components on the line need help. They can quickly design and redesign solutions for specific production issues. Task management within DPM helps both the production manager and the maintenance engineer to complete the improvement process. Using other PTC tools like Creo and Vuforia makes the path to improvement even faster and easier, requiring less expert knowledge from the front-line workers and empowering every level of participation in the digital transformation process to make a direct, measurable impact on physical production.

How Does DPM Work?
DPM as an IoT application sits on top of the ThingWorx Foundation server, a platform for IoT development that is extensible and customizable.
Manufacturers therefore find they rarely have to rip and replace existing systems and assets to reap the benefits of DPM, which gathers, aggregates, and stores production data (both automatically and through manual input on the Production Dashboard) so that it can be analyzed using time as a currency. DPM also manages the process of implementing improvements (using the Action Tracker) based on the collected data, and provides an easy way to confirm that the improvements make a real difference in the overall OEE (through the Performance Analysis Dashboard). Because the analysis occurs both before and after the steps to improve are taken, manufacturers can rest assured that any resources invested in the improvements aren't invested in vain; DPM is a predictive and prescriptive analysis process.

DPM makes use of an external SQL Server to run queries against collected data and perform aggregation and analysis tasks in the background, in a separate server location from the thing model and ingestion database. This ensures that use cases involving real-time alerts and events, high-capacity ingestion, and others remain possible on the ThingWorx Foundation server.

The IoT EDC is focusing on DPM for a series of technical briefs which provide insight and expert-level recommendations regarding DPM usage and configuration. Stay tuned to the PTC Community for more updates to come.
View full tip
The IoT Enterprise Deployment Center's goal is to create and share knowledge around the best practices for architecting, designing, and deploying successful, enterprise-scale ThingWorx IoT solutions.

To accomplish this goal, the EDC team takes a "real world" approach, using simulated IoT assets and users to benchmark the capabilities of different ThingWorx deployment configurations. First, each implementation is pushed to its limit in an effort to establish real-world baselines, metrics which can be used to help customers determine which architecture choices will work for their custom needs. Then, each implementation is pushed beyond its limits, providing useful insight into where and why things fail, and illuminating potential implementation changes which could push the boundaries further.

Through the simulation testing to come, the EDC will be publishing the resulting benchmarks for all to see! These benchmarks will include details on implementation goals and performance metrics for different stages of deployment. Additionally, best-practice articles which illustrate how to deploy the different architectural components (those referenced within the benchmarks) will also be posted, highlighting the optimal approach to integrating everything into the ThingWorx platform.

Stay tuned to see more about just how versatile the ThingWorx Platform can be! We look forward to discussing these findings as they are published right here on the PTC Community.
View full tip