IoT Tips

Load Testing through C SDK Remote Device Simulation in ThingWorx

As discussed in the EDC's previous article, load or stress testing a ThingWorx application is very important to the application development process and comes highly recommended by PTC best practices. This article will show how to do stress testing using the ThingWorx C SDK at the Edge side. Attached to this article is a download containing a generic C SDK application and accompanying simulator software written in Python. This article will discuss how to unpack everything and move it to the right location on a Linux machine (Ubuntu 16.04 was used in this tutorial and sudo privileges will be necessary). To make this a true test of the Edge software, modify the C SDK code provided or substitute in any custom code used in the Edge devices which connect to the actual application.

It is assumed that ThingWorx is already installed and configured correctly. Anaconda will be downloaded and installed as a part of this tutorial. Note that the simulator only logs at the "error" level on the SDK side, and the data log has been disabled entirely to save resources. For any questions on this tutorial, reach out to the author Desheng Xu from the EDC team (@DeShengXu).

Background:

Within ThingWorx, most things represent remote devices located at the Edge. These are pieces of physical equipment which are out in the field and which connect and transmit information to the ThingWorx Platform. Each remote device can have many properties, which can be bound to local properties. In the image below, the example property "Pressure" is bound to the local property "Pressure". The last column indicates whether the property value should be stored in a time series database when the value changes. Only "Pressure" and "TotalFlow" are stored in this way. A good stress test will have many properties receiving updates simultaneously, so for this test, more properties will be added.
An example shown here has 5 integers, 3 numbers, 2 strings, and 1 sin signal property.

Installation:

Download Python 3 if it isn't already installed

Download Anaconda version 5.2
Sometimes managing multiple Python environments is hard on Linux, especially in Ubuntu and when using an Azure VM. Anaconda is a very convenient way to manage it. Some commands which may help to download Anaconda are provided here, but this is not a comprehensive tutorial for Anaconda installation and configuration.
Download Anaconda:
    curl -O https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
Install Anaconda (this may take 10+ minutes, depending on the hardware and network specifications):
    bash Anaconda3-5.2.0-Linux-x86_64.sh
To activate the Anaconda installation, load the new PATH environment variable which was added by the Anaconda installer into the current shell session with the following command:
    source ~/.bashrc
Create an environment for stress testing. Let's name this environment "stress":
    conda create -n stress python=3.7
Activate the "stress" environment every time you need to use simulator.py:
    source activate stress

Install the required Python modules
Certain modules are needed in the Python environment in order to run the simulator.py file: psutil, requests.
Use the following commands to install these (if using Anaconda as installed above):
    conda install -n stress -c anaconda psutil
    conda install -n stress -c anaconda requests

Unpack the download attached here called csim.zip
Unzip csim.zip and move it into the /opt folder (if another folder is used, remember to change the path in the simulator.json file later)
Assign your current user full access to this folder (this command assumes the current user is called ubuntu):
    sudo chown -R ubuntu:ubuntu /opt/csim

Move the C SDK library file to the lib folder
Use the following command:
    sudo mv /opt/csim/csdkbuild/libtwCSdk.so.2.2.4 /usr/lib
You may also have to grant a+x permissions to all files in this folder

Update the configuration file for the simulator
Open /opt/csim/simulator.json (or whatever path is used instead)
Edit this file to meet your environment needs, based on the information below

Familiarize yourself with the simulator.py file and its options
Use the following command to get option information:
    python simulator.py --help

Set-Up Test Scenario:

1. Plan your test
Each simulator instance will have 8 remote properties by default (as shown in the picture in the Background section). More properties can be added for stress test purposes in the simulator.json file.
For the simulator to run 1k writes per second to a time series database, use the following configuration information (note that for this test, a machine with 4 cores and 16 GB of memory was used. Greater hardware specifications may be required for a larger test):
Forget about the default 8 properties, which have random update patterns and produce results that are difficult to check later.
Instead, create "canary properties" for each thing (where canary refers to the nature of a thing to notify others of danger, in the same way canaries were used in mine shafts)
Add 25 properties for each thing:
    10 integer properties
    5 number properties
    5 string properties
    5 sin properties (signals)
Set the scan rate to 5000 ms, so that each of these 25 properties will update every 5 seconds.
To get a rate of 1k writes per second, we therefore need 200 devices in total (200 devices x 25 properties / 5 seconds = 1,000 writes per second), which is specified by the start and end number lines of the configuration file
The simulator.json file should look like this:
    Canary_Int: 10
    Canary_Num: 5
    Canary_Str: 5
    Canary_Sin: 5
    Start_Number: 1
    End_Number: 200

2. Run the simulator
Enter the /opt/csim folder, and execute the following command:
    python simulator.py ./simulator.json -i
You should be able to see a screen like this:
Go to ThingWorx to check if there is a dummy thing (under Remote Things in the Monitoring section):
This indicates that the simulator is running correctly and connected to ThingWorx

3. Create a Value Stream and point it at the target database

4. Create a new thing and call it "SimulatorDummyThing"
Once this is created successfully and saved, a message should pop up to say that the device was successfully connected

5. Bind the remote properties to the new thing
Click the "Properties and Alerts" tab
Click "Manage Bindings"
Click "Add all properties"
Click "Done" and then "Save"
The properties should begin updating immediately (every 5 seconds), so click "Refresh" to check

6. Create a Thing Template from this thing
Click the "More" drop-down and select "Create ThingTemplate"
Give the template a name (ensure it matches what is defined in the simulator.json file) and save it
Go back and delete the dummy thing created in Step 4, as it is no longer needed

7. Clean up the simulator
Use the following command:
    python simulator.py ./simulator.json -k
Output will look like this:

8. Create 200 things in ThingWorx for the stress test
Verify the
information in the simulator.json file (especially the start and end numbers) is correct
Use the following command to create all things:
    python simulator.py ./simulator.json -c
The output will look like this:
Verify the things have also been created in ThingWorx:
Now you are ready for the stress test

Run Stress Test:

Use the following command to start your test:
    python simulator.py ./simulator.json -l
or
    python simulator.py ./simulator.json --launch
The output in the simulator will look like this:
Monitor the Value Stream writing status in the Monitoring section of ThingWorx:

Stop and Clean Up:

Use the following command to stop running all instances:
    python simulator.py ./simulator.json -k
If you want to clean up all created dummy things, then use this command:
    python simulator.py ./simulator.json -d
To re-initiate the test at a later date, just repeat the steps in the "Run Stress Test" section above, or re-configure the test by reviewing the steps in the "Set-Up Test Scenario" section

That concludes the tutorial on how to use the C SDK in a stress or load test of a ThingWorx application. Be sure to modify the created Thing Template (created in step 6 of the "Set-Up Test Scenario" section) with any business logic required, for instance events and alerts, to ensure a proper test of the application.
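The sizing arithmetic from the "Plan your test" step can be checked with a few lines of Python before editing simulator.json. This is only a sketch: the function name devices_needed is hypothetical and is not part of simulator.py, and it assumes every property of every thing produces exactly one write per scan cycle.

```python
import math


def devices_needed(target_writes_per_sec, properties_per_thing, scan_rate_ms):
    """Estimate how many simulated things are needed for a target write rate.

    Assumption: each property is written once per scan cycle.
    """
    writes_per_thing_per_sec = properties_per_thing / (scan_rate_ms / 1000.0)
    return math.ceil(target_writes_per_sec / writes_per_thing_per_sec)


# Canary property counts from the tutorial: 10 int + 5 num + 5 str + 5 sin.
canary_properties = 10 + 5 + 5 + 5

# 1k writes per second with a 5000 ms scan rate.
end_number = devices_needed(1000, canary_properties, 5000)
print(end_number)
```

With Start_Number left at 1, the printed value is the End_Number to use for the target rate; halving the scan rate would halve the number of devices required.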

Nov 6, 2019

ThingWorx Performance Monitoring with Grafana

authored by EDC team member Desheng Xu (@xudesheng)

Monitoring ThingWorx performance is crucially important, both during the load testing of a newly completed application, and after the deployment of new code in an existing application. Monitoring performance ensures that everything works as expected at the Enterprise level. This tutorial steps you through configuring and installing a tool, tsample, which runs on the same network as the ThingWorx instance. This tool collects data from the Platform and translates it into something visual and easy to understand via Grafana.

tsample is small and customizable, and it plays a similar role to telegraf. Its focus is on gathering ThingWorx performance metrics. Historically, this tool also supported collecting OS level performance metrics, but this is no longer supported. It is highly recommended to collect OS level performance metrics by using telegraf, a tool designed specifically for that purpose (and not discussed here).

This is not the only way to go about monitoring ThingWorx performance, but this tool uses a very good approach that has been proven effective both at customer sites and internally by PTC to monitor scale tests. Find the most recent release here.

Recommended Deployment Architecture

tsample can be deployed on the same machine where ThingWorx Tomcat is running, but it's recommended to deploy it on a separate machine to minimize any performance impact caused by the collector.

tsample supports export to InfluxDB and/or a local file. In this document, it is assumed that InfluxDB will be used for monitoring purposes. Please note that this is not the same instance of InfluxDB being used by ThingWorx (if configured). This article will not cover setting up InfluxDB or NGINX (if necessary), so please configure these before beginning this tutorial.

Supported Platform

tsample has been tested on Windows 2016, MacOS 10.15, Ubuntu 16.04, and Redhat 7.x.
It's anticipated to work on more general Ubuntu/Redhat/Mac/Windows releases as well. Please leave a comment or contact the author, @xudesheng, if Raspberry Pi support is needed.

Configuration File

Where to Store the Configuration File

tsample will pick up the configuration file in the following sequence:
from the command line...
    ./tsample -c <path to configuration file>
from the environment...
    Linux:
        export TSAMPLE_CONFIG=<path to configuration file>
        ./tsample
    Windows:
        set TSAMPLE_CONFIG=<path to configuration file>
        tsample.exe
from a default location...
    tsample will try to find a file with the name "config.toml" in the same folder in which it starts.

How to Craft a Configuration File

You can use the following command to generate a sample file:
    ./tsample -c config.toml -e
or:
    ./tsample -c config.toml --export
A file with the name "config.toml" will be generated with a sample configuration. You can then adjust its content in accordance with the following.

Configuration File Content

Format
The configuration file must be in toml format.

title and owner sections
Both sections are optional. The intention of these two sections is to support a doc tool in the future.

TestMachine section
This section is required, and it defines where this tool will run.

thingworx_servers section
This section is where you define targeted ThingWorx applications. Multiple ThingWorx servers can be defined with the same or different metrics to be collected.

thingworx_servers.metrics sections
Underneath each thingworx_servers section, there are several metrics. In the default example, the following metrics have been included:
    ValueStreamProcessingSubsystem
    DataTableProcessingSubsystem
    EventProcessingSubsystem
    PlatformSubsystem
    StreamProcessingSubsystem
    WSCommunicationsSubsystem
    WSExecutionProcessingSubsystem
    TunnelSubsystem
    AlertProcessingSubsystem
    FederationSubsystem
You can add your own customized metrics, as long as the result follows the same Data Shape.
The default Data Shape has 3 columns. If the output Data Shape exceeds this limit, the tool will likely not work properly.

result_export_to_db section
This section defines the target InfluxDB as a sink of collected performance metrics.

result_export_to_file section
This section defines the target file storage for collected performance metrics.

Grafana Configuration Example

Monitor Value Stream

Step 1. Connect Grafana to InfluxDB

Step 2. Create a New Dashboard

Step 3. Create a New Query
Depending on which metrics you defined to collect in the tsample configuration file, you will see a different choice of measurement in Grafana. Here, we will use ValueStreamProcessingSubsystem as an example.

Step 4. Choose the Right Platform and Storage Provider
Some metrics depend on the database storage provider, like Value Stream and Stream.

Step 5. Choose the Metrics Figures
Select "remove" to get rid of the default 'mean' calculation. Select "non_negative_difference" from Transformations. Using this transformation, Grafana can show us the speed of writes. Then, remove the default GROUP BY "time" clause. Assign a meaningful alias to this query.

Step 6. Add Another Query
You can add another query as 'Value Stream Queued Speed' by following the same steps.

Step 7. Assign a Panel Title

Step 8. Review the Result
Let's go back to the dashboard page and select "last 15 minutes" or "last 5 minutes" from the top right corner. It should show a result similar to the chart below.

Step 9. Save the Dashboard
Don't forget to save your dashboard before we add more panels.

Step 10. Refine the Panel
It's difficult to figure out the high-level write speed from the above panel, so let's enhance it. Add a new query with the following configuration:
In the above query, there are two additional figures: 20s and 1m. How do you choose? 20s should be the same as sampling_cycle_inseconds in your tsample configuration file. If you choose a different value, then you could end up with misleading results.
Larger values such as "1m" may give you a smoother result, but they could also hide system instability. Going larger than 1m is not recommended in most cases. With this new query, it's much easier to figure out the average write speed in the current test.

Tips: if your sampling_cycle_inseconds is 30s, then you may not need this additional query. The following image is a sample at the 30s interval time. You would not need an additional average query to get a smooth write speed.
The next example is a sample at the 10s interval time. Without additional queries, you may not be able to get a meaningful understanding of the write speed.
Based on the above three examples, it's recommended to configure the sampling interval time at 30s, or anything larger than 20s. You can then choose whether you need additional queries based on the visualization result.

Step 11. Further Refinement
The above charts illustrate the queuing and writing speed. However, it is possible that the Value Stream may perform at a reasonable speed while the Value Stream queue keeps growing and could exceed its capacity. Let's add another query to monitor this:
However, it is difficult to read this chart, since it has a different value range on the y-axis:
Let's move this query to a second y-axis on the right:
This will make the view much easier to read:
The current queue size or remaining queue size will always move up and down; it is healthy as long as it does not continue to grow to a high level.

What Else Can Be Monitored?
The following metrics would be monitored by most customers:
    Value Stream write and queue speed
    Value Stream queue size
    Stream write and queue speed
    Stream queue size
    Event performed speed (completedTaskCount)
    Event submitted speed (submittedTaskCount)
    Event queue size
    Websocket communication
    Websocket connection

ThingWorx Memory Usage Monitoring

Create a new panel and add a new query:
In a running system, memory usage will always move up and down - at times sharply or quickly - when the system is busy. The system is healthy as long as memory doesn't go up continuously or stay at a maximum for a long period of time.

Conclusion

Setting up monitoring is absolutely crucial to managing the performance of an enterprise ThingWorx application. Using Grafana makes tracking and visualizing the performance much easier. Stay tuned to the EDC tag for more monitoring tips to come!
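To make the transformations from Step 5 and Step 10 more concrete, here is a small Python sketch of the idea behind them. It is not how Grafana or InfluxDB implement anything: it assumes the collected metric is a cumulative counter sampled once per sampling cycle, the sample values are made up, and the function names are hypothetical. Negative differences (for example after a counter reset) are dropped, which is the behavior the "non_negative_difference" name suggests, and the moving average plays the role of the smoother 1m query.

```python
def non_negative_difference(samples):
    """Differences between consecutive samples, dropping negative ones."""
    diffs = []
    for previous, current in zip(samples, samples[1:]):
        delta = current - previous
        if delta >= 0:
            diffs.append(delta)
    return diffs


def writes_per_second(samples, sampling_cycle_seconds):
    """Convert per-cycle differences into a per-second rate."""
    return [d / sampling_cycle_seconds for d in non_negative_difference(samples)]


def moving_average(values, window):
    """Average over the last `window` values (e.g. 3 x 20s = 1m)."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up cumulative write counts, sampled every 20 seconds.
# The drop from 16000 to 500 stands in for a counter reset.
samples = [10000, 12000, 14400, 16000, 500, 2900]

rates = writes_per_second(samples, 20)
smoothed = moving_average(rates, 3)
print(rates)
print(smoothed)
```

The raw rates jump between samples, while the three-sample average changes more slowly; this is the same trade-off described in Step 10, where a larger window is smoother but can hide instability.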

Mar 27, 2020

ThingWorx DevOps

By Victoria Firewind, IoT EDC

This presentation accompanies a recent Expert Session, with video content, including demos of the following topics, found here.

DevOps is a process for taking planned changes through development, through testing, and into production, where they can be accessed by end users. One test instance typically has automated tests (integration testing) which ensure application logic is preserved in spite of whatever changes the developers are making, and often there is another test instance to ensure the application is usable (UAT testing) and able to handle a production load (load testing).

So, a DevOps Pipeline starts with a task manager tasking out planned changes, where each task will become a branch in the repository. Each time a new branch is created, a new pipe is needed, which in this case is produced by Docker Hub. Developers then make changes within that pipe, which then flow along the pipe into testing. In this diagram, testing is shown as the valve which, when open (i.e. when all tests pass), allows the changes to flow along the pipe into production. A good DevOps process has good flow along the various pipelines, with as much automated or scripted as possible to reduce the chances for errors in deployments.

In order to create a seamless pipeline, whether or not it winds up automated, several third party tools are useful:

Container software is a very good way to improve the maintainability and updatability of a ThingWorx instance, while minimizing the amount of resources needed to host each component.

1. Create Docker Image
Consult the Help Center if need be. Update your YML file with everything you need before starting the image: see the example in the PTC community. License the instance using the license management website. Follow the instructions from Docker for installing those tools: Docker itself (docker) and Docker Compose (docker-compose).

2.
Save Docker Image in Docker Repo
Docker Hub has some free options and, if a license is purchased, can host more than a single Docker image and tag. It is also possible to set up your own Docker registry.

3. Access the image in Docker Desktop
Download Docker Desktop and sign in to the Docker account which hosts the repository. Create some folders for storing the h2.env file and the ThingworxStorage and ThingWorxPlatform mounted folders. Remember to license these containers as well. Developers log in to the license management site themselves and put the license files into the ThingWorxPlatform mounted folder ("license_capability_response.bin").

Git is a very versatile tool that can be used through many different mediums, like Azure DevOps or Github Desktop. To get started as a totally new Git user, try downloading Github Desktop on your local machine and create a local repository with the provided sample code. This can then be cloned on a Linux machine, presumably whichever machine hosts the integration ThingWorx instance, using the provided scripts (once they are configured). Remember to install Git on the Linux machine, if necessary (sudo apt-get install git).

A sample ThingWorx application (which is not officially supported, and is provided just as an example of how to do DevOps related tasks in ThingWorx) is attached to this post in a zip file, containing two directories, one for scripts and one for ThingWorx entities. Copy the Git scripts and config file into the top level, above the repository folder, and update the GitConfig.sh file with the URL for your Git repository and your login credentials. Then these scripts can be used to sync your Linux server with your Git repository, which any developer could easily update from their local machine. This also ensures changes are secure, and enables the potential use of other DevOps procedures like tasks, epics, and corresponding branches of code.
Steps to DevOps using the provided code as an example:

Clone the repository into the SystemRepository or any other created repository, using the provided scripts in a Linux environment.
Import the DeploymentUtilities entity, which again is scripted for Linux or for use with a development IDE with bash support.
Then import the ThingWorx application from source control or use the script (which itself makes use of that DeploymentUtilities entity).
Now create some local changes, add things, etc. and try out the UpdateApplication script, or export to source control and then push to the Git repo. Data and localization table exports are also possible.
Run the tests using the provided IntegrationTester thing, create your own by overriding the IntegrationTestTS thing shape, or use the TestTwxApplication script from a Linux terminal.
Design a process for your application which allows for easy application exports and updates to and from a repository, so that developers can easily send in their changes, which can then be easily loaded and tested in another environment.

In Conclusion:

DevOps is a complex topic and every PTC customer will have their own process based around their unique requirements and applications. In the future, more mature pipeline solutions will be covered, ones that also involve publishing to Solution Central for easier deployment between various testing instances and production.

Sep 30, 2021

Introduction to Digital Performance Management (DPM)

Written by: Tori Firewind, IoT EDC

“Digital Performance Management (DPM) is a closed-loop, problem solving solution that helps manufacturers identify, prioritize, and solve their biggest loss challenges, resulting in reduced cost, increased revenue, and improved service levels.” – DPM Help Center

What is DPM?

Digital Performance Management (DPM) is an application which improves factory efficiency across a variety of different areas, namely “the four P’s” of Digital Transformation: products, processes, places, and people. Each performance issue in a factory can be mapped to at least one of these improvement categories in a new strategy for Continuous Improvement (CI) founded by PTC.

Figure 1 – Each performance issue in a factory can be mapped to at least one of 4 fundamental improvement categories: products, processes, places, and people. PTC’s new, industry-leading strategy for continuous improvement (CI) in factories is a “best practice” approach, taking the collective knowledge of many customers to form a focused, prescriptive path for success. [1]

[1] Closing the Loop Across Products, Processes, People, and Places, Manufacturing Leadership Journal

At PTC, CI in factories is driven by a “best practice” approach, with years of experience in manufacturing solutions combining with the collective knowledge of the many diverse use cases PTC has encountered, to generate a focused, prescriptive path for improvement in any individual factory.

Figure 2 – DPM is a closed loop for continuous improvement, a strategy built around industry standard best practices and years of experience.

PTC is also defining new industry standards for OEE analysis by using time as a currency within DPM. This standardization technique improves intuitive impact assessment and allows for direct comparison of metrics (see the Help Center for details on how each metric is calculated).
DPM creates a closed loop for CI, from the monitoring phase performed both automatically and through manual operator input, to the prioritization and analysis phases performed by plant managers. DPM helps plant managers by tracking metrics of factory performance that often go overlooked by other systems. With Analytics, DPM can also do much of the analysis automatically, finding the root causes much more rapidly.

Figure 3 – All levels of the company are involved in solving the same problems effectively and efficiently with DPM. Instead of 100 people working on 100 different problems, some of which might not significantly improve OEE anyway, these same 100 people can tackle the top few problems one at a time, knocking out barriers to continuous improvement together.

Production supervisors who manage the entire production line then know which less-than-effective components on the line need help. They can quickly design and redesign solutions for specific production issues. Task management within DPM helps both the production manager and the maintenance engineer to complete the improvement process. Using other PTC tools like Creo and Vuforia makes the path to improvement even faster and easier, requiring less expert knowledge from the front-line workers and empowering every level of participation in the digital transformation process to make a direct, measurable impact on physical production.

How Does DPM Work?

DPM as an IoT application sits on top of the ThingWorx Foundation server, a platform for IoT development that is extensible and customizable. Manufacturers therefore find they rarely have to rip and replace existing systems and assets to reap the benefits of DPM, which gathers, aggregates, and stores production data (both automatically and through manual input on the Production Dashboard), so that it can be analyzed using time as a currency.
DPM also manages the process of implementing improvements (using the Action Tracker) based on the collected data, and provides an easy way to confirm that the improvements make a real difference in the overall OEE (through the Performance Analysis Dashboard). Because the analysis occurs before and after the steps to improve are taken, manufacturers can rest assured that any resources invested in the improvements are not spent in vain; DPM is a predictive and prescriptive analysis process.

DPM makes use of an external SQL Server to run queries against collected data and perform aggregation and analysis tasks in the background, on a server separate from the thing model and ingestion database. This ensures that use cases involving real-time alerts and events, high-capacity ingestion, or others are still possible on the ThingWorx Foundation server.

The IoT EDC is focusing on DPM alone for a series of technical briefs which provide insight and expert level recommendations regarding DPM usage and configuration. Stay tuned to the PTC Community for more updates to come.

Feb 28, 2022

Thread Safe Coding, Part 1: The Java Extension Approach

Written by Desheng Xu and edited by @vtielebein

Overview

Time and again, customers report that one of their favorite ThingWorx features is using JavaScript for service code. However, the JavaScript language doesn't have a built-in semaphore or lock mechanism, nothing to enable thread-safe concurrent processing like you find in the Java language. This article demonstrates why thread safe coding is necessary and how to use the Java Extension for this purpose. Part 2 presents an alternative approach using database locks.

Demo Use Case

Let's use a highly abstracted use case to demo thread-safe code practices:
There are tens of machines in a factory, and a PLC will emit a signal to indicate that an issue has happened during run-time.
The customer expects to have a dashboard that shows today's total count of issues from all machines in real-time.
The customer also expects that a timestamp of each issue can be logged (regardless of the machine).

Similar use cases might be to:
Show the total product counts from each sub-line in the current shift.
Show the total rentals of bicycles from all remote sites.
Show the total issues of distant cash machines across the country.

Modeling

Thing: DashboardCounter, which includes:
1 Property: name:counter, type:integer, logged:true, default value:0
3 services:
    IncreaseCounter(): increase counter value by 1
    GetCounter(): return current counter value
    ResetCounter(): set counter value to 0
1 Subscription: a subscription to the data change event of the property counter, which will print the new value and timestamp to the log.

GetCounter
    var result = me.counter;

IncreaseCounter
    me.counter = me.counter + 1;
    var result = me.counter;

ResetCounter
    me.counter = 0;
    var result = 0;

Subscription MonitorCounter
    Logger.info(eventData.newValue.value+":"+eventData.newValue.time.getTime());

ValueStream
For simplicity, the value stream entity is not included in the attachment.
Please go ahead and assign a value stream to this Thing to monitor the property values.

Test Tool

A small test tool, mulreqs, is attached here, along with some extensions and ThingWorx entities that are useful. The mulreqs tool reads its configuration file from the path defined in the OS environment variable MULTI_REQUEST_CONFIG. In Linux/MacOS:
    export MULTI_REQUEST_CONFIG="./config.json"
In the config.json file, you can use the following configuration:
    {
      "host":"twx85.desheng.io",
      "port":443,
      "protocol":"https",
      "endpoint":"/Thingworx/Things/DashboardCounter/services/IncreaseCounter",
      "headers":{
        "Content-Type":"application/json",
        "Accept": "application/json",
        "AppKey":"5cafe6eb-adba-41df-a7d6-4fc8088125c1"
      },
      "payload":{},
      "round_break":50000,
      "req_break":0,
      "round_size":50,
      "total_round":20
    }
host, port, protocol, and headers together define how to reach the ThingWorx server.
endpoint defines which service is called during the test.
payload is not in use at this moment, but it has to be kept in the file.
total_round is how many rounds of the test you want to run.
round_size defines how many requests will be sent simultaneously during each round.
round_break is the pause time for each round in microseconds, so 50000 in the above example means 50 ms.
req_break is the delay between requests; "0" means requests to the server will happen simultaneously.

The expectation from the above configuration is that the service executes a total of 20*50 = 1000 times. So, we can expect that if the initial value is 0, then counter should be 1000 at the end, and if the value stream is clean initially, then the value stream should have a history from 1 to 1000.

Run Test

Use the following command to perform the test:
    .<your path>/mulreqs
Execution output will look like:

Check Result

You will be surprised that the final value is 926 instead of 1000. (Caution: this value will be different in different tests, and it can be any value in the range of 1 to 1000.)
Now, look at the value stream by using QueryPropertyHistory. There are many values missing here; the number of stored records can vary between tests, and it is unlikely to match the final counter value (926). Notice that the most recent values include 926, 925, 921, and 918, while 919, 920, 922, 923, and 924 are all missing. So next we check if there are any errors in the script log, and there are none. There are only the print statements we deliberately placed in the logs.

So, we have observed two symptoms here:
The final value of the property counter doesn't match the expected value.
The value stream doesn't have the expected history of the counter property changes.
What's the reason behind each symptom, and which one is a thread-safe issue?

Understanding Timestamp Granularity

ThingWorx facilitates the collection of time series data and solutions centered around such data by allowing for use of the timestamp as the primary key. However, a timestamp will always have a minimal granularity definition when you process it. In ThingWorx, the minimal granularity or unit of a timestamp is one millisecond.

Looking at the log we generated from the subscription again, we see that several data points (922, 923, 924, 925) have the same timestamp (1596089891147), which is GMT Thursday, July 30, 2020, 6:18:11.147 AM. When each of these data points is flushed into the database, the later data points overwrite the earlier ones since they all have the same timestamp. So, data point 922 went into the value stream first, and then was overwritten by data point 923, and then 924, and then 925. The next data point in the value stream is 926, which has a new timestamp (1596089891148), 1 ms after the previous one. Therefore, data points 925 and 926 are stored while 922, 923, and 924 are not. These lost data points are therefore NOT a thread-safe issue. Some of these data points have the same timestamp in this example because multiple machines write to the same value stream.
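The overwrite behavior described above can be imitated in a few lines of Python. This is only an illustration of "timestamp as primary key", with a plain dictionary standing in for the value stream; it says nothing about how ThingWorx or its persistence provider actually store data. The values and timestamps are the ones quoted from the subscription log.

```python
from datetime import datetime, timedelta, timezone

# (counter value, timestamp in milliseconds) as seen in the subscription log.
events = [
    (922, 1596089891147),
    (923, 1596089891147),
    (924, 1596089891147),
    (925, 1596089891147),
    (926, 1596089891148),
]

# A dict keyed by timestamp: a later write with the same key replaces
# the earlier one, just like rows sharing one primary key.
value_stream = {}
for value, timestamp_ms in events:
    value_stream[timestamp_ms] = value

stored = sorted(value_stream.values())
print(stored)


def to_utc(timestamp_ms):
    """Convert a millisecond epoch timestamp to a UTC datetime."""
    seconds, millis = divmod(timestamp_ms, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(milliseconds=millis)


print(to_utc(1596089891147).isoformat())
```

Five writes arrive but only two entries remain, one per distinct millisecond, which is why no error appears in the log even though history is missing.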
The right approach is to log data points at the individual machine level, with a different value stream per machine. However, what happens if one machine emits data too frequently? If data points from the same machine still have timestamp clashes, then the signal frequency is too high. The recommended approach is to down-sample the update frequency, as any frequency higher than 1000 Hz will produce unexpected results like these.

Real Thread-Safety Issue from Demo Use Case

The final value of the counter being an arbitrary number is the real thread-safety coding issue. If we take a look at the code again:

    me.counter = me.counter + 1;

This piece of code can be split into three steps: Step 1: read the current value of me.counter. Step 2: increase this value. Step 3: set me.counter to the new value. In a multi-threaded environment, not performing these three steps as a single operation leads to a race condition. The way to solve this issue is to use a locking mechanism to serialize access to the property: acquire the lock, perform the three operations, and then release the lock. This can be done using either the Java Extension or the database thing to leverage the database lock mechanism.

Use Java Extension to Handle Thread-Safety Challenge

This tutorial assumes that the Eclipse plug-in for ThingWorx extension development is already installed. The following will guide you through creating a simple Java extension step by step:

Create a Java Extension Project

Choose the minimal ThingWorx version to support and select the corresponding SDK. Let's name it JavaExtLocker, though it’s best to use lower-case in the project name.

Add a ThingWorx Template in the src Folder

Right-click the src folder and add a Thing Template.

Add a Thing Property

Right-click the Java source file created in the above step and click the menu option called Thingworx Source, then select Add Property.
Add Three Services: IncreaseCounter, GetCounter, ResetCounter

Right-click the Java source file and select the menu option called Thingworx Source, then select Add Service. See above for the IncreaseCounter service details. Repeat these same steps to add GetCounter and ResetCounter:

(Optionally) Add a Generated Serial ID

Add Code to the Three Services

    @SuppressWarnings("deprecation")
    @ThingworxServiceDefinition(name = "IncreaseCounter", description = "", category = "", isAllowOverride = false, aspects = {"isAsync:false" })
    @ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {})
    public synchronized Integer IncreaseCounter() throws Exception {
        _logger.trace("Entering Service: IncreaseCounter");
        // Read, increase, and write back while holding the object's monitor.
        int current_value = ((IntegerPrimitive)(this.getPropertyValue("Counter"))).getValue();
        current_value++;
        this.setPropertyValue("Counter", new IntegerPrimitive(current_value));
        _logger.trace("Exiting Service: IncreaseCounter");
        return current_value;
    }

    @ThingworxServiceDefinition(name = "GetCounter", description = "", category = "", isAllowOverride = false, aspects = {"isAsync:false" })
    @ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {})
    public synchronized Integer GetCounter() throws Exception {
        _logger.trace("Entering Service: GetCounter");
        int current_value = ((IntegerPrimitive)(this.getPropertyValue("Counter"))).getValue();
        _logger.trace("Exiting Service: GetCounter");
        return current_value;
    }

    @SuppressWarnings("deprecation")
    @ThingworxServiceDefinition(name = "ResetCounter", description = "", category = "", isAllowOverride = false, aspects = {"isAsync:false" })
    @ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {})
    public synchronized Integer ResetCounter() throws Exception {
        _logger.trace("Entering Service: ResetCounter");
        this.setPropertyValue("Counter", new IntegerPrimitive(0));
        _logger.trace("Exiting Service: ResetCounter");
        return 0;
    }

The key here is
the synchronized modifier, which makes Java serialize calls coming from multiple threads so that no updates are lost.

Build the Application

Use 'gradle build' to generate a build of the extension.

Import the Extension into ThingWorx

Create Thing Based on New Thing Template

Check the New Thing Property and Service Definition

Use the Same Test Tool to Run the Test Again

    {
      "host": "twx85.desheng.io",
      "port": 443,
      "protocol": "https",
      "endpoint": "/Thingworx/Things/DeoLockerThing/services/IncreaseCounter",
      "headers": {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "AppKey": "5cafe6eb-adba-41df-a7d6-4fc8088125c1"
      },
      "payload": {},
      "round_break": 50000,
      "req_break": 0,
      "round_size": 50,
      "total_round": 20
    }

Just change the endpoint to point to the new thing.

Check the Test Result

Repeat the same test several times to ensure the results are consistent and expected (and don't forget to reset the counter between tests).

Summary of Java Extension Approach

The Java extension approach shown here uses the synchronized keyword to make a sequence of several actions thread-safe. Other options are to use a ReentrantLock or a Semaphore for the same purpose, but the synchronized keyword approach is much cleaner. However, the Java extension locker will NOT work in a 9.0 horizontal architecture, since Java doesn't have a distributed lock. IgniteLocker wouldn't work in the current horizontal architecture, either. So if a thread-safe counter is needed in a version 9.0+ horizontal architecture, leverage the database thing, as discussed below.
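The three-step race and the effect of serializing it can also be sketched outside ThingWorx. The Python below is a minimal illustration under stated assumptions, not the extension's implementation: threading.Lock stands in for Java's synchronized keyword, and LockedCounter and run_rounds are hypothetical names chosen for this example. The round sizes mirror the test configuration (20 rounds of 50 simultaneous requests).

```python
import threading


class LockedCounter:
    """Counter whose read-increase-write sequence runs under a single lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increase(self):
        with self._lock:             # acquire, perform all three steps, release
            current = self._value    # step 1: read the current value
            current += 1             # step 2: increase it
            self._value = current    # step 3: write the new value back
            return current

    def get(self):
        with self._lock:
            return self._value


def run_rounds(counter, total_round=20, round_size=50):
    """Fire round_size concurrent increments, total_round times."""
    for _ in range(total_round):
        threads = [threading.Thread(target=counter.increase)
                   for _ in range(round_size)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return counter.get()


# Deterministic illustration of the lost update: both callers read before
# either one writes, so one of the two increments disappears.
shared = 0
read_a = shared        # caller A, step 1
read_b = shared        # caller B, step 1 (A has not written yet)
shared = read_a + 1    # caller A, steps 2 and 3
shared = read_b + 1    # caller B, steps 2 and 3 overwrite A's result
lost_update_result = shared

final_value = run_rounds(LockedCounter())
print(lost_update_result, final_value)  # 1 1000
```

Two unsynchronized increments leave the value at 1 instead of 2, while the locked counter reaches the full 1000. A lock of this kind lives inside one process, which is the same limitation the summary notes for horizontal architectures.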

Nov 30, 2020

When to Include InfluxDB in the ThingWorx Development Lifecycle (this article is also available for download as a PDF attached) The Short Answer InfluxDB is a time series database designed specifically for data ingestion. Historically, InfluxDB has been viewed as a high-scale expansion option for ThingWorx: a way to ensure the application works as intended, even when scaled up to the enterprise level. This is certainly one way to view it, because when there are many, many remote things, each with a lot of properties writing to the Platform at short intervals, then InfluxDB is a sure choice. However, what about in smaller applications? Is there still a benefit to using an optimized data ingestion tool in any case? The short answer is: yes, there is! Using InfluxDB for optimized data ingestion is a good idea even in smaller-sized applications, especially if there are plans to scale the application up in the future. It is far better to design the application around InfluxDB from the start than to adjust the data model of the application later on when an optimized data ingestion process is required. PostgreSQL and InfluxDB simply handle the storage of data in different ways, with the former functioning better with many Value Streams, and the latter with fewer Value Streams. Switching the way data is retained and referenced later, when the application is already on the larger side, causes delays in growing the application larger and adding more devices. Likewise, if the Platform reaches its ingestion limits in a production environment, there can be costly downtime and data loss while a proper solution (which likely involves reworking the application to work optimally with InfluxDB) is implemented. Don’t think that InfluxDB is for expansion only; it is an optimized ingestion database that has benefits at every level of the application development lifecycle. 
From end to end, InfluxDB can ensure reliable data ingestion, reduced risk of data loss, and reduced memory and CPU used by the deployment overall. Preliminary sizing and benchmark data is provided in this article to explain these recommendations. Consider how ThingWorx is ingesting data now and how much CPU and other resources are used just for acquiring the data; InfluxDB may well be a way to improve application performance. The Long Answer In order to uncover just how beneficial InfluxDB can be in any size application, the IoT Enterprise Deployment Center has run some simulations with small and medium sized applications. The use case in the simulation is simple, with user requests coming from a collection of basic mashups and data ingestion coming from various numbers of things, each with a collection of “fast” and “slow” properties which update at different rates. This synthetic load of data does not include a more complete application scenario, so the memory and CPU usage shown here should not be used as sizing recommendations. For those types of recommendations, stay tuned for the soon-to-release ThingWorx 9.0 Sizing Guide (or check out the current 8.5 Sizing Guide). Comparing Runs When determining the health of the ThingWorx Platform, there are several categories to inspect: Value Stream Queue Rate and Queue Size, HTTP Requests, and the overall Memory and CPU use for each server. Using Grafana to store the metrics results in charts like those below, which can easily be compared and contrasted and used to evaluate which hardware configuration results in the best performance. The size of the numbers on the vertical axis indicates the total resources used for that metric, while the slope or trend of each chart indicates bottlenecks and inadequate resource allocation for the use case. In this case, all darker charts represent data from PostgreSQL ONLY configurations, while the lighter charts represent the InfluxDB instances. 
Because this is not a sizing guide, whether each of these charts comes from the small or medium run is unimportant as long as they match (so that comparisons with and without Influx are valid). The smaller run had roughly 20k Things, and the larger closer to 60k, both with 275 total Platform users (25 Admins) and 3 mashups, which were each called at various refresh rates over the course of the 1-hour testing period. Note that in the PostgreSQL ONLY instances, there were more Thing Templates and corresponding Value Streams. This change is necessary between runs because only with fewer Value Streams does InfluxDB begin to demonstrate notable improvements. The most important thing to note is that the lighter charts clearly demonstrate better performance for both size runs. Each section below will break down what the improvement looks like in the charts to show how to use Grafana to verify the best performance. Value Stream Queue The vertical axis on the Value Stream Queue Rate chart shows how many total writes per second (WPS) the Platform can handle. The average is 10 WPS higher using InfluxDB in both scenarios, and InfluxDB is also much more stable, meaning that the writes happen more reliably. The Value Stream Queue Size chart demonstrates how well the writes within the queue are processed. Both of these are necessary to determine the health of data ingestion. If the queue size were to increase and trend upward in the lighter Queue Size chart, then that would mean the Platform couldn’t handle the higher ingestion rate. However, since the Queue Size is stable and close to 0 the entire time, it is clear that the Platform is capable of clearing out the Value Stream Queue immediately and reliably throughout the entire test. FIGURE 1 – THESE REPRESENT THE DATA GETTING STORED INTO THE DATA PROVIDER. NOTE: THE FORMER IS MUCH LOWER THAN THE LATTER. FIGURE 2 – NOTE THE DATA LOSS IN THE NON-INFLUX INSTANCE (THE QUEUE IN GREEN REACHES THE MAX IN YELLOW). 
THE INFLUX INSTANCE HAS LESS TROUBLE CLEARING OUT THE QUEUE, AS DEMONSTRATED BY THE CONSISTENTLY LOW QUEUE SIZE. HTTP Requests Taking the strain of ingestion off of the Platform’s primary database frees its resources up for other activities. This in turn improves the performance and reliability with which the Platform responds to HTTP requests, which in a typical application are used to aggregate data into smaller data stores (depending on the use case) and to render the mashups for the end users. The business logic and mashups can be more complex when there is one database designated for ingestion (InfluxDB) and one for everything else (PostgreSQL). FIGURE 3 – THE DARKER CHART SHOWS A LOT OF CHOPPINESS, MEANING THAT WHILE THE PLATFORM WAS RESPONDING THE WHOLE TIME, IT WAS NOT DOING SO RELIABLY. THE SMOOTHER SECOND CHART SHOWS HOW MUCH MORE EASILY THE PLATFORM CAN HANDLE THESE REQUESTS WHEN THE LOAD IS DISTRIBUTED INTELLIGENTLY ACROSS MULTIPLE SERVERS, EACH OPTIMIZED FOR THE TYPE OF DATA IT RECEIVES. THE “STAIRCASE” SHAPE OCCURS BECAUSE THE SIMULATOR INCREASES THE WORK LOAD EVERY 10 MINUTES UNTIL IT BREAKS. Likewise, the nature of Postgres lends itself well to this differentiation, given that there are many more database tables required for supporting the HTTP requests, something Postgres does well. That leaves Influx to handle the time-series data and ingestion, and those are the primary strengths of that software as well. So, splitting the load across multiple servers in this way results in smaller server sizes overall, each of which is streamlined and optimized to handle exactly what it is given by the Platform. Note that in both of these charts, there are no bad requests, so both would seem to be successful runs. However, as future charts will demonstrate more clearly, there is a catastrophic failure when the load is increased around 12:30p. 
The simulation ends before the server begins to show any real symptoms of the issue, and that is why there are no bad requests. The maximum Operations Per Second (OPS) in the Hardware Specifications and Performance section is taken from before the failure begins. Clearly the InfluxDB instance has better performance given that the average Operations Per Second (OPS) is substantially higher, nearly 4 times what is seen in the PostgreSQL ONLY instance. Obviously how well the Platform manages the business logic and mashup loading will depend on a lot of factors. In this test scenario, the OPS was increased by increasing the mashup refresh rate on the InfluxDB instances (which could handle over double the operations). Likewise, the number of Stream writes to the PostgreSQL database could be double what it was when PostgreSQL was the only database. Therefore, configuring InfluxDB for the data ingestion and leaving Postgres for the rest of the application certainly makes the load much easier on the Platform, and the same would be true even in a much more complex scenario. Memory and CPU The important thing here is to keep the memory use low enough that any spikes in usage won’t cause a server malfunction. CPU Usage should stay at or below around 75%, and Memory should never exceed around 80% of the total allocated to the server. The sizing guides can help determine what this allocation of memory needs to be. Of note in these charts is the slight, upward slope of the CPU usage in the darker chart, indicating the start of a catastrophic failure, and the difference in the total memory needed for the ThingWorx Platform and Postgres servers when Influx is used or not. As is apparent, the servers use much less memory when the database load is split up intelligently across multiple servers. 
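The two checks described here, staying under the usage limits and watching for an upward slope, can be sketched as a small Python helper. This is only an illustration under stated assumptions: the 75% and 80% limits come from the text, while the function names, the sample data, and the trend tolerance of 0.5 percentage points per sample are hypothetical choices for the example.

```python
def slope_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def assess(cpu_pct, mem_pct, cpu_limit=75.0, mem_limit=80.0, trend_tolerance=0.5):
    """Return findings for CPU and memory usage series given in percent."""
    findings = []
    if max(cpu_pct) > cpu_limit:
        findings.append("cpu over limit")
    if max(mem_pct) > mem_limit:
        findings.append("memory over limit")
    # trend_tolerance is an assumed cut-off, in percentage points per sample.
    if slope_per_sample(cpu_pct) > trend_tolerance:
        findings.append("cpu trending up")
    return findings


steady = assess([40, 41, 39, 40], [60, 61, 60, 60])
climbing = assess([50, 55, 60, 65], [60, 60, 60, 60])
print(steady, climbing)  # [] ['cpu trending up']
```

The second series never crosses the CPU limit, yet its slope flags it, which is the kind of early signal the darker chart shows before any failure symptoms appear.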
FIGURE 4 – THE THINGWORX CPU IS ABOUT THE SAME HERE AS IN THE INFLUXDB CONFIGURATION BELOW, BUT LOOK AT HOW MUCH MORE MEMORY BOTH THE PLATFORM AND THE POSTGRES DATABASE NEED ALLOCATED TO THEM IN THIS CONFIGURATION (64 GB APIECE). ALSO NOTE THE JUMP IN CPU AND MEMORY USAGE AFTER 12:30P. THIS IS REFERENCED IN THE PREVIOUS SECTION, AND THE UPWARD SLOPE OF THE USAGE AFTER THAT POINT INDICATES THE START OF A CATASTROPHIC FAILURE. THE TEST ENDS TOO SOON TO SEE ANY SYMPTOMS OF FAILURE, BUT IT IS A SURE THING AFTER THE INCREASE IN LOAD AROUND 12:30P. FIGURE 5 – INFLUX NEEDS AN EXTRA SERVER, BUT THE SIZE OF THE INFLUX AND POSTGRES SERVERS TOGETHER IS LESS THAN HALF THE SIZE OF THAT REQUIRED FOR THE SINGLE POSTGRES DATABASE IN THE POSTGRES ONLY CONFIGURATION (8 GB). THINGWORX IS SMALLER TOO (32 GB). Hardware Specifications and Performance These are the exact specifications for each simulated instance, broken down by size and whether InfluxDB is configured or not. Note that some of the hardware specifications may be more than is necessary, depending on the real-world use case. As stated previously, this document is not a sizing guide (use the official ThingWorx Sizing Guide). Note that the maximum number of WPS and OPS are shown here. The maximums are higher in the InfluxDB scenarios, meaning that even with smaller-sized servers, the InfluxDB configurations can handle much greater loads. Summary In conclusion, if InfluxDB may at some point be needed in the lifecycle of an application, because the expected number of things or the number of properties on each thing is large enough that it will otherwise max out the limits of the Platform, then InfluxDB should be used from the very start. There are benefits to using InfluxDB for data ingestion at every size, from performance to reliability, and of course the obviously improved scalability as well. Reworking the application for use with InfluxDB later on can be costly and cause delays. 
This is why the benefits and costs associated with an InfluxDB-centric hardware configuration should be considered from the start. More servers are required for InfluxDB, but each of these servers can be sized smaller (depending on the use case), and all of this will affect the overall cost of hosting the ThingWorx application. The benefits of InfluxDB are especially pronounced when used in conjunction with clusters, which will be demonstrated fully in the 9.0 Sizing Guide (soon to be released). If InfluxDB is used to interface with the clusters, then there are even more resources to spare for user requests. It is considered ThingWorx best practice for high ingestion customers to make use of InfluxDB in applications of any size. Note, though, that this will mean the number of Value Streams per Influx Database will need to be limited to single digits. We hope this helps, and from everyone here at the EDC, happy developing!

May 31, 2020

By Tim Atwood and Dave Bernbeck, Edited by Tori Firewind Adapted from the March 2021 Expert Session Produced by the IoT Enterprise Deployment Center The primary purpose of monitoring is to determine when your application may be exhausting the available resources. Knowledge of the infrastructure limits helps establish these monitoring boundaries, determining straightforward thresholds that indicate an app has gone too far. The four main areas to monitor in this way are CPU, Memory, Networking, and Disk. For the CPU, we want to know how many cores are available to the application and potentially what the temperature is for each, or other indicators of overtaxation. For Memory, we want to know how much RAM is available for the application. For Networking, we want to know the network throughput, the available bandwidth, and how capable the network cards are in general. For Disk, we keep track of the read and write rates of the disks used by the application as well as how much space remains on them. There are several major infrastructure categories which reflect common modes of operation for ThingWorx applications. One is Bare Metal, the traditional model in which the operating system runs directly on the hardware, with no intermediary. Limits of the hardware in this case can be found in manufacturing specifications, within the operating system settings, and normally on record somewhere within the IT department. The IT team is a great resource for obtaining these limits in general, and it also keeps track of such things in VMware and virtualized infrastructure models. VMware is an intermediary between the operating system and the hardware, and often its limits are determined based on the sizing of the application and set by the IT team when the infrastructure is established. These can often be resized as needed, and the IT team will be well aware of the limits here, often monitoring some of the performance themselves already. 
This is especially so if Cloud Providers are used, given that these are scaled up virtualizations which are configured in easy-to-use cloud portals. These two infrastructure models can also be resized as needed. Lastly, Containers can be used to designate operating system resources as needed, in a much more specific way that better supports the sharing of resources across multiple systems. Here the limits are defined in configuration files or charts that define the container. The difficulties here center around learning what the limits are, especially in the case of network and disk usage. Network bandwidth can fluctuate, and increased latency and network congestion can occur at random times for seemingly no reason. Most monitoring scenarios can therefore make do with collecting network send and receive rates, as well as disk read and write rates, measured on the server. Cloud Providers like Azure provide VM and disk sizing options that allow you to select exactly what you need, but for network throughput or network IO, the choices are not as varied. Network IO tends to increase with the size of the VM, proportional to the number of CPU cores and the amount of Memory, so this may mean that a VM has to be oversized for the user load, for the bulk of the application, in order to accommodate a large or noisy edge fleet. The next few slides list the operating metrics and common thresholds used for each. We often use these thresholds in our own simulations here at PTC, but note that each use case is different, and each situation should be analyzed individually before determining set limits of performance. 
Generally, you will want to monitor: the % utilization of all CPU cores, leaving plenty of room for spikes in activity; total and used memory, ensuring that total memory remains constant throughout and that used memory stays below a reasonable percentage of the total, which for smaller systems (16 GB and lower) means leaving around 20% of memory for the OS, and for larger systems usually around 3-4 GB; for disks, the read and write rates as well as the free space, to ensure there is ample room for spikes and to avoid any situation that might result in system downtime; and for networking, the send and receive rates, which should stay below 70% or so, again to leave room for spikes. In any monitoring situation, consistently high utilization should trigger concern and an investigation into what’s happening. Were new assets added? Has any recent change caused regression or other issues? Any recent changes should be inspected, and the infrastructure sizing should be considered as well. For ThingWorx specific monitoring, we look at max queue sizes, entries performed, pool sizes, alerts, submitted task counts, and anything that might indicate some kind of data loss. We want the queues to be consistently cleared out to reduce the risk of losing data in the case of an interruption, and to ensure there is no reason for resource use to build up and cause issues over time. In order for a monitoring set-up to be truly helpful, it needs to make certain information easily accessible to administrative users of the application. Any metrics that are applicable to performance need to be processed and recorded in a location that can be accessed quickly and easily from wherever the admins are. They should be able to see the health of the application at a glance, without needing to drill down a lot to be made aware of issues. 
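As a rough illustration of the memory and network rules of thumb listed above, the Python sketch below turns them into a simple check. It is a sketch under stated assumptions rather than a recommended tool: the function names are hypothetical, the larger-system headroom uses 4 GB (the upper end of the quoted 3-4 GB), and network utilization is assumed to be expressed as a percentage of available bandwidth.

```python
def memory_headroom_gb(total_gb):
    """Memory to leave free for the OS, following the rule of thumb above."""
    if total_gb <= 16:
        return 0.20 * total_gb   # smaller systems: around 20% of the total
    return 4.0                   # larger systems: around 3-4 GB (upper end assumed)


def check_host(total_mem_gb, used_mem_gb, net_utilization_pct, net_limit_pct=70.0):
    """Return the names of the areas that exceed their threshold."""
    warnings = []
    if used_mem_gb > total_mem_gb - memory_headroom_gb(total_mem_gb):
        warnings.append("memory")
    if net_utilization_pct >= net_limit_pct:
        warnings.append("network")
    return warnings


small_host = check_host(total_mem_gb=16, used_mem_gb=13.5, net_utilization_pct=40)
large_host = check_host(total_mem_gb=64, used_mem_gb=58, net_utilization_pct=75)
print(small_host, large_host)  # ['memory'] ['network']
```

Each use case is different, so the limits passed in here would need to be tuned per system rather than taken as fixed values.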
Likewise, the alerts that happen should be meaningful, with minimal false alarms, and it is best if this is configurable by the admins from within the application via some sort of rules engine (see the DGIS guide, soon to be released in version 9.1). The monitoring tool should also be able to save the system history and export it for further analysis, all in the name of reducing future downtime and creating a stable, enterprise system. This dashboard (above) is a good example of how to roll up a number of performance criteria into health indicators for various aspects of the application. Here there is a Green-Yellow-Red color-coding system for issues like web requests taking longer than 30s, 3 minutes, or more to respond. Grafana is another application used for monitoring internally by our team. The easy dashboard creation feature and built-in chart modes make this tool very easy to get started with, and certainly easy to refer to from a central location over time. Setting this up is helpful for load testing and readying an application, but it is also beneficial for continued monitoring post-go-live, which is why it is a worthy investment. Our team usually builds a link based on the start and end time of tests for each simulation performed, with all of the various servers being monitored by one Grafana server, one reference point. Consider using PTC Performance Advisor (also called DynaTrace) to help monitor these kinds of things more easily. When most administrators think of monitoring, they think of reading and reacting to dashboards, alerts, and reports. Rarely does the idea of benchmarking come to mind as a monitoring activity, and yet, having successful benchmarks of system performance can be a crucial part of knowing if an application is functioning as expected before there are major issues. Benchmarks also look at the response time of the server and can better enable tracking of actual end user experience. 
The best option is to automate such tests using JMeter or other applications, producing a daily snapshot of user performance that can anticipate future issues and create a more reliable experience for end users over time. Another tool to make use of is JMeter, which has the option to build custom reports. JMeter is good for simulating the user load, which often makes up most of the server load of a ThingWorx application, especially considering that ingestion is typically optimized independently and given the most thought. The most unexpected issues tend to pop up within the application itself, after the project has gone live. Shown here (right) is an example benchmark from a Windchill application, one which is published by PTC to facilitate comparison between optimized test systems and real life performance. Likewise, DynaTrace is depicted here, showing an automated baseline (using Smart URL Detection) on Response Time (Median and 90th percentile) as well as Failure Rate. We can also look at Throughput and compare it with the expected value range based on historical throughput data. Monitoring typically increases system performance and availability, but its other advantage is to provide faster, more effective troubleshooting. Establish a systematic process or checklist to step through when problems occur, something that is organized to be done quickly, but still takes the time to find and fix the underlying problems. This will prevent issues from happening again and again and polish the system periodically as problems occur, so that the stability and integrity of the system only improves over time. Push for real solutions if possible, not band-aids, even if more downtime is needed up front; it is always better to have planned downtime up front than unplanned downtime down the line. Close any monitoring gaps when issues do occur, which is the valid RCA response if not enough information was captured to actually diagnose or resolve the issue. 
PTC Tech Support developed a diagnostic data gathering query for Oracle that customers can use, found in our knowledgebase. This is an example of RCA troubleshooting that looks at different database factors, reporting on which queries perform the worst based on inputted criteria. Another example of troubleshooting is for the Java JVM, where we look at all of the things listed here (below) in an automated, documented process that then generates a report for easy end user consumption. Don’t hesitate to reach out to PTC Technical Support in advance to go over your RCA processes, to review benchmark discrepancies between what PTC publishes and what your real-life systems show, and to ensure your monitoring is adequate to maintain system stability and availability at all times.

Mar 24, 2021

ThingWorx DevOps with Azure: The Comprehensive DevOps Guide Written by Tori Firewind, IoT EDC As promised in a previous post, attached here is a comprehensive guide to DevOps in ThingWorx, including tutorials and instructions for creating a continuous integration, continuous deployment (CI/CD) process for application development. There are also updated scripts and entities attached, including an entire sample application for importing, exporting, and testing an application in ThingWorx. From Docker and Github to Azure DevOps and Solution Central, this guide has it all. Learn how to perform your role in the DevOps process whether you are an administrator or a developer, automate your deployments and testing, and create a more efficient process for publishing changes to production. A complete DevOps process like this really does facilitate faster and easier updates with fewer risks, fewer delays, and a better pathway to success.

Nov 18, 2021

The natively exposed ThingWorx Platform performance metrics can be extremely valuable for understanding overall platform performance and certain core subsystem operations. However, because ThingWorx is a development platform, these metrics don't give any visibility into what your built solution is or is not doing. Here is an amazing little trick that you can use to embed custom performance metrics into your application so that they show up automatically in your Prometheus monitoring system. What you do with these metrics is up to your creativity (with some constraints, of course). Imagine a request counter for specific services which may be incredibly important or costly to run, or an exception metric that is incremented each time you catch an exception, or a query result size metric that informs you of how much data is being queried from the database. Refer to Resources > MetricsServices: GetCounterMetric GetGaugeMetric IncrementCounterMetric DecrementCounterMetric SetGaugeMetric You'll need to give your metric a name, identified by key, and this is meant to be in dotted notation, which will then be converted to underscores when the metric is exposed on the OpenMetrics endpoint. Use sections/domains in the dotted notation to structure your metrics in line with your application design. COUNTER type metrics are the most commonly used and relate to things happening through time. They are an index which gets timestamped as it is collected by Prometheus, so that you will be able to look back in time and analyse and investigate what happened when and what the scale or impact was. After-the-fact functions and queries will need to be applied to make these metrics most useful (delta over time, increase, rate per second). Common examples of counter type metrics are: requests, executions, bytes transferred, rows queried, seconds elapsed, execution time. 
    Resources["MetricServices"].IncrementCounterMetric({
        basetype: "LONG",
        value: 1,
        key: "__PTC_Reported.integration.mes.requests",
        aggregate: false
    });

GAUGE type metrics are the point-in-time status of something being measured. Common gauge type metrics are: CPU load/utilization, memory utilization, free disk space, used disk space, busy/active threads.

    Resources["MetricServices"].SetGaugeMetric({
        basetype: "NUMBER",
        value: 12,
        key: "__PTC_Reported.Users.ConnectedOperatorCount",
        aggregate: true
    });

Be aware of the aggregate flag, as it makes the custom metric cluster-level, which can have some unintended consequences. Normally you want performance metrics for the specific node, as you then see what work is happening where and can confirm that it is being properly distributed within the cluster. There are some situations, however, where you might want the cluster aggregation, as with this count of concurrently connected operators. Happy Monitoring!
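To make the naming and the after-the-fact processing of counters more concrete, here is a small Python sketch. It rests on assumptions: the exposed name is taken to be a plain dot-to-underscore replacement of the key (the actual endpoint may apply further rules), the scrape samples are invented for illustration, and the function names are hypothetical. In practice the increase and rate would normally be computed by queries in the monitoring system rather than by hand.

```python
def exposed_name(key):
    """Assumed conversion: each dot in the metric key becomes an underscore."""
    return key.replace(".", "_")


def rate_per_second(samples):
    """Average per-second rate between the first and last (seconds, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


name = exposed_name("__PTC_Reported.integration.mes.requests")

# Hypothetical counter values scraped every 15 seconds.
samples = [(0, 100), (15, 130), (30, 190)]
increase = samples[-1][1] - samples[0][1]
rate = rate_per_second(samples)
print(name, increase, rate)  # __PTC_Reported_integration_mes_requests 90 3.0
```

The raw counter value (190) says little on its own; the increase over the window and the per-second rate are what make it useful for spotting changes in load.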

Jul 4, 2024

Persistent vs. Logged Properties By Mike Jasperson, VP of IoT EDC Executive Summary ThingWorx provides several different “aspects” (or storage options) for how property values are saved. These options each have different implications for performance and scalability. Understanding those implications is important for designing a scalable IOT solution. Persistent Properties are best used for non-telemetry data which will change infrequently (for example only a few times in a day) and where historical values are not required. When overused, Persistent properties can put significant pressure on the database layer of your ThingWorx implementation, leading to poor performance of your IOT application. As the number of Things in your IOT application scales up, the quantity or frequency of persistent properties per Thing needs to be carefully considered. Logged Properties are best used for telemetry data where historical values need to be retained, but also for any other value that is expected to change frequently. Logged properties can create some additional requirements: a process for handling null/default values after restarts, more disk space, and a data retention policy. There are benefits as well, though, like more flexibility and scalability for the ingestion of larger volumes of data. Persistent + Logged Properties perform database operations of both aspects. Combined use should be very limited – only properties that update infrequently (a few times a day), and that must be in-memory in the event of a ThingWorx restart. In-Memory Only Properties are neither persistent nor logged – they are not stored to the database. These properties can greatly improve scale for values that need to be available for the application to drive UIs or compute other derived values that will be stored. 
However, high-frequency updates of in-memory properties can create scale challenges in HA (high availability) ThingWorx configurations where memory state needs to be constantly shared between multiple ThingWorx nodes. Find a complete summary as well as example cases in the document attached.
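The guidance in this summary can be condensed into a small decision helper. This is only a sketch of one way to read it: the function name `suggestAspects`, its input fields, and the threshold of five changes per day (standing in for "a few times a day") are assumptions made for illustration and are not part of ThingWorx or of the attached document.

```javascript
// Hypothetical helper, not a ThingWorx API: maps the expected usage of a
// property to the storage aspects suggested by the summary above.
const INFREQUENT_CHANGES_PER_DAY = 5; // assumed reading of "a few times a day"

function suggestAspects({ changesPerDay, needsHistory, neededAfterRestart }) {
  const infrequent = changesPerDay <= INFREQUENT_CHANGES_PER_DAY;

  // Persistence is only suggested for values that change infrequently
  // and must be in memory again after a restart.
  const persistent = infrequent && neededAfterRestart;

  // Logging is suggested when history must be retained, or when a frequently
  // changing value still has to be stored somewhere (assumed interpretation).
  const logged = needsHistory || (!infrequent && neededAfterRestart);

  return { persistent, logged, inMemoryOnly: !persistent && !logged };
}

// Example inputs (made-up values for illustration)
console.log(suggestAspects({ changesPerDay: 2, needsHistory: false, neededAfterRestart: true }));
console.log(suggestAspects({ changesPerDay: 86400, needsHistory: true, neededAfterRestart: false }));
console.log(suggestAspects({ changesPerDay: 1000, needsHistory: false, neededAfterRestart: false }));
```

A frequently changing value that is logged rather than persisted still needs the process for handling null/default values after restarts that the summary mentions.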

Oct 14, 2021

How to Scale Vertically and Horizontally, and When to Use Sharding Written by Mike Jasperson, VP of IOT EDC Deployment architecture describes the way in which an IOT application is deployed, or where each of the components is hosted on the network. There are deployment architecture considerations to make when scaling up an application. Each approach to deployment expansion can be described by the “eggs in a basket” analogy: vertical scale is like one person carrying a bigger basket, horizontal scale is like one person carrying more baskets, and sharding is like more than one person carrying the baskets (see below). All of these approaches result in the eggs getting from point A to point B (they all satisfy the use case), but the simplest (vertical scale) is not necessarily the best. Sure, it makes sense on paper for one person to carry everything in one big basket, but that doesn’t ensure that all of the eggs arrive intact. Selecting the right deployment architecture is a way to ensure the use cases are satisfied in the best and most efficient ways possible, with the least amount of application downtime or data loss. Vertical Scale - a.k.a. "one person carrying a bigger basket" The most common scalability approach is to simply size the IOT server larger, or scale up the server. This might mean the server is given additional CPU cores, faster CPU clock speeds, more memory, faster disks, additional network bandwidth or improved network cards, and so on. This is a very good idea when the application logic is increased in complexity, when more data is therefore needed in memory at a time, or when the processing of said data has to occur as quickly as possible. For example, adding additional devices to the fleet increases the size of the “Thing Model” in the process and will require additional heap memory be available to the Foundation server. However, there are limitations to this approach. 
Only so many concurrent operations and threads can be performed at once by a single server. Operations trying to read and write to the disks at once can introduce bottlenecks and reduce server performance. Likewise, “one person, one basket” introduces a single-point-of-failure operating risk. If for some reason the server’s performance does degrade or cease altogether, then all of the “eggs” go down with it. Therefore, this approach is important, but usually not sufficient on its own for empowering an enterprise level deployment. Horizontal Scale - a.k.a. "one person, with two or more baskets" As of ThingWorx version 9.0, Foundation servers can be deployed in clusters, meaning more baskets to carry the eggs. More baskets means that if even one of these servers is active, the application remains up in the event of an individual node failure or maintenance. So clustered deployments are those which facilitate High Availability. Clustered servers save on some resources, but not others. For instance, every server in a cluster will need to have the same amount of memory, enough to store the entire Thing Model. Each of the multiple baskets in our analogy has to have the same type of eggs. One basket can’t have quail eggs if the rest have chicken eggs. So, each server has to have an identical version of the application, and therefore enough memory to store the entire application. Also keep in mind that not all application business logic can scale horizontally. Event queues are local to each ThingWorx node, so the events generated within each node are processed locally by that particular node, and not the entire network (examples are timer and scheduler-based activities). Likewise, data ingestion done through an extension or other background process, like MQTT, emits events within a node that therefore must be processed by that particular node, since that's where the events are visible. 
On the other hand, load distribution that happens external to ThingWorx in either the Connection Server (for AlwaysOn based data coming from ThingWorx SDKs, EMS, eMessage agents, or Kepware) or REST API calls through a load balancer (i.e. user activity) will be distributed across the cluster, facilitating greater scaling potential in terms of userbase and mashup complexity. Also note that batched data will be processed by the node that received it, but different batches coming through a connection server or load balancer will still be distributed. Another consideration with clusters pertains to failure modes. While each node in the cluster shares a cache for many things, Stream and Value Stream queues are only stored locally. In the event of a node failure, other nodes will pick up subsequent requests, but any activity already queued on the failed node will be lost. For use cases where each and every data point is critical, it is important to size each node large enough (in other words, to vertically scale each node) such that queue sizes are constantly kept low and the data within them processed as quickly as possible. Ensuring sufficient network and database throughput to handle concurrent writes from the many clustered nodes is key as well. Once each node has enough resources to handle local queues, the system is highly available with low risk for outages or data loss. However, when multiple use cases become necessary on single deployments, horizontal scaling may no longer be enough to ensure things run smoothly. If one use case is logic-heavy, something non-time-critical which processes data for later consumption, it can use too many resources and interfere with other, lighter but more time-critical use cases. Clustering alone does not provide the flexibility to prioritize specific operations or use cases over others, but sharding does. Sharding - a.k.a. 
"more than one person carrying baskets" “Sharding” generally refers to breaking up a larger IOT enterprise implementation into smaller ones, each with its own configuration and resources. More server maintenance and administration may be required for each ThingWorx implementation, but the reduction in risk is worth it. If each of the use cases mentioned above has its own implementation, then any unexpected issues with the more complex, analytical logic will not affect the reaction time of operators to time-sensitive matters in the other use case. In other words, “don’t keep all your eggs in one basket”. The best places to break up an implementation lie along logical boundaries already accepted by the business. Breaking things down in other ways might look nice on paper, but encouraging widespread adoption in those cases tends to be an uphill battle. In connected products use cases, options for boundaries could be regional, tied more towards business vertical, or centered around different products or models. These options can be especially beneficial when data needs to stay in particular countries or regions due to regulatory requirements. In connected operations use cases, the most common logical boundaries would be site-based, with smaller IOT implementations serving just a smaller number of related factories in a particular area. Use-case or product-line boundaries can also make sense here, in-line with the above comments about keeping production-critical or time-sensitive use cases isolated from interference from business-support and analysis use cases. Ideally, a shard model will put the IOT implementation “closer” to both the devices communicating with it and the users that interact with the data. This minimizes the amount of data to be sent or received over long distances, reducing the impact from bandwidth and latency on performance. 
When determining which approach is best, consider that smaller, more focused implementations offer more flexibility, but are harder to manage. Having different versions of the same applications deployed in multiple places can easily become a maintainability nightmare. It’s therefore best not to combine a regional model with a use case model when it comes to determining sharding boundaries. Also consider using deployment automation tools like Solution Central. These enable tracking and managing version-controlled deployments to multiple IOT implementations, whether they are deployed on-premise, or in the cloud, giving one central location of all source code. Another benefit to sharding is the focused investment of server resources in a more targeted way. For instance, if one region is larger than another, it may need more CPU and memory. Or, perhaps only part of an application requires High Availability, the time-critical use cases which are best suited to small, clustered deployments. The larger, analysis-centered use cases can then remain non-clustered (but still vertically scaled of course). Sharding can also make access control simpler, as those who need access to only one region or use case can just be given a user account on that particular shard. However, certain use cases need data from more than one shard in order to operate, turning the data storage and access control benefits into challenges. Luckily, ThingWorx has an excellent toolkit for overcoming such issues. For one thing, REST API calls are readily available in ThingWorx, allowing each shard to exchange information with each other, as well as other enterprise data systems, like ERP or Service Ticketing. Sometimes, lower-level data replication strategies are the way to go, say if downsampling or data transfer from one store to another is necessary, and built-in database tools can more easily handle the workload. 
Most of the time, however, REST API calls are used to define the business logic within the application layer so that copying data actively between shards is unnecessary, using fewer resources to control what information is shared overall. There are several design approaches for REST API communication between shards, the two most common being the “peer” model and the “layered” model. In a peer model, one shard may call upon another shard using REST whenever it needs more information. In a layered model, there are “front-line” shards which handle most (if not all) of the device communication and time-critical use cases, things which require only the information in one shard to operate. Then there are also “back-line” shards that aggregate data from the many front-line shards, performing any operations that are less time-sensitive and more complex or analytical. For any of these approaches, it remains important to keep your data archival and purge strategy in mind. It is a best practice in ThingWorx to only retain as much data as is absolutely necessary, purging the rest periodically. If the front-line shards only ever need the last 7 days of raw data for 5 properties, plus the last 52 weeks of min/max/average data, then implementing an approach where each shard computes the min/max/average values and then archives the older data to a shared “data lake” before purging it would be ideal. This data lake then serves as the data store for all back-line shard operations. There is also the option to consider sharing some infrastructure between ThingWorx instances when using sharding in a deployment, which can create more flexible, scalable architectures, but can also introduce issues where more than one shard is affected when issues occur on only one shard. 
For instance, a common shared infrastructure piece is at the database level; each ThingWorx instance needs its own database, but a single database server instance (such as a PostgreSQL HA cluster) could serve separate database namespaces to multiple ThingWorx instances. This is an attractive option where an existing enterprise-scale database infrastructure with experienced DBAs is already in place. Similarly, load balancers can often be configured to manage load for multiple servers or URLs. If properly configured, an experienced load balancer could direct traffic for multiple applications, but it can also create a bottleneck for inbound connections if not properly sized. Load balancers designed for High Availability can also be considered. Apache Zookeeper is another tool often deployed once for an entire cluster to monitor the health and availability of individual components, or to vote them in or out of operations if problems are detected. With all of these options, remember that sharing infrastructure increases the chances of sharing issues from one ThingWorx system to the next, which can reduce the overall infrastructure complexity at the cost of increased administrative complexity. Bringing it All Together Vertical and Horizontal scale are both effective ways to add more capacity and availability to software implementations, but there are typically some diminishing returns in the investment of additional infrastructure. For example, consider two large, vertically-scaled implementations – one running on a VM with 64 vCPUs and 256 GiB RAM, and another running on a VM with 96 vCPUs and 384GiB of RAM. While the 96-core server has 1.5 times the compute capacity, in sizing tests with 1 million simulated assets, these two systems tend to fall behind on WebSocket execution at approximately the same point. 
In a horizontal scale example, with two nodes each sized the same (64 vCPU and 256GiB RAM), one would expect High Availability to occur, where one node picks up the other’s slack in a failover scenario. However, what if that singular node can’t handle the entire workload? Should both machines be sized vertically such that either can take on the full load, and if so, then what is the cost-benefit of that? It would be less expensive in this case to have a third server. Optimizing the deployment architecture for a ThingWorx application will therefore usually involve a blended approach. With more than two nodes, High Availability is more readily obtained, as two servers can almost certainly share the load of the third, failed node. Likewise, some workload aspects do not scale well until multiple additional nodes are added. For instance, spreading out the user load from mashup requests across multiple nodes to give the singleton more resources for the tasks it alone can perform doesn’t have much benefit if there are just two nodes. However, with horizontal scaling alone, the servers may still need to be vertically scaled larger than is ideal in terms of cost. Each one has to hold the entire Thing Model in memory, which means that sometimes, some of the nodes may be oversized for the tasks actually performed there. Sharding allows for each node to have a different Thing Model as necessary based around what boundaries are selected, which can mean saving on costs by sizing each server only as large as it really needs to be. So, a combination of approaches is typically the best when it comes to deployment architecture. The key is to break things up as much as possible, but in ways that make sense. 
Determine where the boundaries of the shards will be such that each machine can be as light and focused as possible, while still not introducing more work in terms of user effort (having to access two systems to get the job done), application development (extra code used to maintain multiple systems or exchange information between them), and system administration (monitoring and maintaining multiple enterprise systems). Find the right balance for your systems, and you’ll maximize your cost-benefit ratio and get the most out of your ThingWorx application. Happy developing!
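The failover question raised under "Bringing it All Together" can be worked through with a little arithmetic. The sketch below is a simplified model, not a sizing tool: it assumes the workload spreads evenly across nodes and that only one node fails at a time, and it ignores work that cannot be distributed (such as singleton tasks). The function names are hypothetical.

```javascript
// Fraction of the total workload each surviving node must be able to carry
// if one node of an N-node cluster fails (assumes even load distribution).
function perNodeCapacityForFailover(nodeCount) {
  if (!Number.isInteger(nodeCount) || nodeCount < 2) {
    throw new RangeError("a cluster needs at least two nodes to fail over");
  }
  return 1 / (nodeCount - 1);
}

// Total capacity provisioned across the whole cluster,
// expressed as a multiple of the workload.
function totalProvisionedCapacity(nodeCount) {
  return nodeCount * perNodeCapacityForFailover(nodeCount);
}

for (const n of [2, 3, 4]) {
  console.log(
    n + " nodes: each sized for " +
    (perNodeCapacityForFailover(n) * 100).toFixed(0) + "% of the load, " +
    (totalProvisionedCapacity(n) * 100).toFixed(0) + "% provisioned in total"
  );
}
```

Under these assumptions, two nodes must each be sized for the full workload (twice the workload provisioned in total), while three nodes each need to cover only half of it (1.5 times in total), which is the reasoning behind preferring a third server over two larger ones.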

Jan 31, 2021

JMeter for ThingWorx Overview Apache JMeter is an open-source tool designed for load testing and measuring the performance of a web application. JMeter has a wide range of features to facilitate this testing, including support for a variety of server and protocol types, a full-featured testing IDE with the ability to record the test steps from either a browser or a native application, and built-in debugging tools. Information about JMeter can be found on Apache’s website. Working with JMeter is not always intuitive, but it also isn’t that much harder than regular software development. Take some time to explore the official Apache JMeter Documentation and figure out where things go and how to mechanically make use of the JMeter IDE. Then step through this tutorial to create a basic test that logs in to ThingWorx, accesses a mashup, and clicks on a few widgets. This is the first in a series to come, courtesy of IoT EDC Engineer Tim Atwood (@atwood) and the whole EDC team. Installation Download JMeter from Apache’s website. Unpack the archive and copy the files to a desired location. Run the application by double clicking on the “ApacheJMeter.jar” file within the bin directory. JMeter is now installed and ready to use. Creating a Test Set up a proxy in your browser of choice (or on the OS in settings). Select the green “templates” icon in JMeter, and then select “Recording” for the template. Configure the recording template to point towards your ThingWorx Navigate or Foundation server, then click “Create”. Hit “Start” under the “HTTP(S) Test Script Recorder” tab of the new JMeter project. Make sure the port is set correctly under Global Settings. A pop-up box will appear that always stays visible on top of the active browser window, so that the recording can be controlled and stopped at any time. 
Leave the “Transaction name” field empty so that each transaction recorded by the software is automatically named after the web request (this helps differentiate one from the other, and they can each be renamed later). Open your browser, and navigate (via direct URL if possible, to keep things simple) to the mashup you wish to test. Login and let the page load. Click on anything you’d like on the mashup to capture the activity of that test. Then click “Stop” on the pop-up recorder window to stop the recording. Each transaction will be assigned an index as well, and the source code behind each of these transactions can be reviewed and manually modified in the main JMeter window. Here is the login request for instance: The HTTP Authorization Manager is used to automatically authorize a defined user login for the thread to any of the Base URLs listed. In this case, though, there are two separate servers being accessed during the test, and one may need to be added manually: Save the project before continuing, as manual modifications come next. Within the task page as you do the recording, a set of parameters or body data will be recorded. Modifying this is how you want to parametrize the test scenario, variables like the username and password. To simulate logging in as other users, you have to parameterize this, and not rely on the administrator account name and password entered into the browser. Rename the task controller to “MyTasks” or something more easily identified than the long string it has now: Some recorded items like static images and stylesheets will be non-essential, things the browser processes for better graphical representation, but which are often cached and do not greatly affect the scalability results of the test. These can be highlighted and disabled all at once: Also ensure that any cascading stylesheets have been disabled. Enable the “View Results Tree” to ensure you can review the results of the test script during the editing phase. 
However, this “Listener” element has a high memory footprint during test execution, so it should be disabled before running an actual scale test. Next we need to parametrize the user login information and pull it from a csv file. The colon means that “Administrator” is the default user to use for login. You can add other properties as well, like ramp up time, run time, number of users, and protocols to use. The ramp up time determines how quickly the threads are allocated for the test, which if done slowly enough, prevents the thundering herd scenario. In more complex scenarios, logic controllers can be inserted to control the flow of the test. This allows for options such as if-then conditions for different user permissions, or parameter-based routes for better randomization of actions in different threads. This will be covered in more detail in a future article. Pre- and Post-Processors can be used as well, with the latter being used here much more than the former, to extract information from the response, in order to then use that as part of the variables going into one of the follow up requests. For example, see the script in this image: This one has a variable that it extracts from the object number property, defined in the CSV file, and converts it into another variable that is used in subsequent scripts. This script uses the object number reference to pull the name out of the body data and make the request, which is then post-processed by a bunch of these extractors. One is a JSON extractor which is trying to get an ID out of the JSON response. There is a regular expression extractor and a bean shell post-processor, which populates some variables based on what it responded with. Once it extracts all of the variables from the response to this particular request (GetSearchResults in this case), it then tailors the additional requests based on these. - Customize the script according to the needs of your own application. 
Alternate between recording and manually modifying the recording code to ensure the test performs exactly as required and from the perspective of different users with different permissions. Also vary the type of activity performed on the mashup. Highlight the “View Results Tree” tab and click the green start button at the top of the window to see the results appear. If you are getting an unauthorized message, ensure that the scope is right for the login information, which may require moving the “HTTP Authorization Manager” component around in the project. Be sure to check the URLs and credentials entered for each type of user. Occasionally the recorder will insert a long authentication string into the URL, and you want to manually set the URL for the credentials to the most generic URL possible for the server. This can be parametrized too: Referencing the CSV file defined here: Which looks like this for a more complicated scenario (covered in the future): The columns here represent the username, password, object number in Windchill, and object name in Windchill, as well as the wait time used to vary the way the logic is executed and some extra variables which tell the switches what to do, creating a more varied and realistic test. Conclusion Following these steps again and again on the various mashups throughout an application can ensure that a script for each web page and each type of user on each web page is created and added to the testing suite. This results in a load test that is closely representative of the real-world user load placed on an application. Load testing is a critical part of the development lifecycle in any application, and ThingWorx is no exception. Any further questions about JMeter capabilities not covered here can be answered by the full JMeter user manual, found on the Apache website. 
Future articles will include some basic scripts that test basic things, which can serve as an example for more complex ThingWorx JMeter script development. Here is an example of one tool PTC uses for internal QA of ThingWorx, designed to load test a Navigate application (specifically its built-in mashups): Something similar to this tool may be available for public use later this summer. In the meantime, feel free to use the tutorial above to create scripts of your own. Any issues building your custom load tests in JMeter can be discussed right here on this thread with our JMeter experts. Happy developing!
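Two of the ideas in the tutorial above, ramp-up time and the CSV file of test users, can be sketched in a few lines of JavaScript. This is an illustration only: the function names, the column order, and the sample values are assumptions, and the CSV file used in the article may be laid out differently. The ramp-up arithmetic follows the JMeter user manual, where each thread starts (ramp-up period / number of threads) seconds after the previous one.

```javascript
// Start offset in seconds of each thread, given the thread count and ramp-up
// period configured on a JMeter Thread Group.
function threadStartOffsets(threadCount, rampUpSeconds) {
  const step = rampUpSeconds / threadCount;
  return Array.from({ length: threadCount }, (_, i) => i * step);
}

// Builds the text of a CSV file for a CSV Data Set Config.
// Assumed column order: username, password, object number, object name, wait time (ms).
function buildUserCsv(users) {
  return users
    .map((u) => {
      const fields = [u.username, u.password, u.objectNumber, u.objectName, u.waitMs];
      // Keep the sketch simple: reject values that would need CSV quoting.
      if (fields.some((f) => /[",\n]/.test(String(f)))) {
        throw new Error("field needs quoting, which this sketch does not handle");
      }
      return fields.join(",");
    })
    .join("\n");
}

// Hypothetical sample data, not taken from the article
const users = [
  { username: "operator1", password: "changeme1", objectNumber: "0000000001", objectName: "PartA", waitMs: 500 },
  { username: "operator2", password: "changeme2", objectNumber: "0000000002", objectName: "PartB", waitMs: 1500 },
];

console.log(threadStartOffsets(10, 100)); // one thread every 10 seconds
console.log(buildUserCsv(users));
```

A longer ramp-up spreads the thread starts further apart, which is what keeps the first logins from all arriving at once.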

Jun 30, 2020

Remote Monitoring of Assets Benchmark As @ttielebein introduced previously, one of the missions of the IOT Enterprise Deployment Center (EDC) is to publish benchmarks that showcase the ThingWorx Platform deployed to solve real-world IOT business problems. Our goal is that these benchmarks can be used as a reference or baseline for architects working on their own implementations, showing not only a successful at-scale implementation, but also what happens when that same implementation is pushed to, or even past, its limits. Please find the first installment attached: a reference benchmark demonstrating ThingWorx deployed to monitor 15,000 assets with a high volume of data properties per asset. Over 250 hours of simulations were conducted as part of producing this benchmark. The IOT EDC team will be monitoring this post (as well as our other posts in the IOT Tech Tips forum) to answer any questions we can about the approaches taken in designing, deploying and simulating this implementation. As more benchmarks like this will be published in the future, we also greatly value any feedback you have that can help us to improve the content for future documents.

Dec 16, 2019

Smoothing Large Data Sets

Purpose
In this post, learn how to smooth large data sources down into something that can be rendered and processed more easily on Mashups. Note that the Time Series Chart widget is limited to loading 8,000 points (hard-coded). This is because rendering more points than this is almost never necessary or beneficial, given that the human eye can only discern so many points and the average monitor can only render so many pixels. Reducing large data sources through smoothing is a recommended best practice for ThingWorx, and for data analysis in general. To show how this is done, there are sample entities provided which can be downloaded and imported into ThingWorx. These demonstrate the capacity of ThingWorx to reduce tens of thousands of data points based on a "smooth factor" live on Mashups, without much added load time required. The tutorial below steps through setting these entities up, including the code used to generate the dummy data.

Smoothing the Data on Mashups
1. Create a Value Stream for storing the historical data.
2. Create a Data Shape for use in the queries. The fields should be:
   TestProperty - NUMBER
   timestamp - DATETIME
3. Create a Thing (TestChartCapacityThing) for simulating property updates and therefore Value Stream updates. There is one property:
   TestProperty - NUMBER - not persistent - logged
4. The custom query service on this Thing (QueryNamedPropertyHistory) will have the logic for smoothing the data. Essentially, many points are averaged into one point, reducing the overall size, before the data is returned to the mashup. Unfortunately, there is no built-in (OOTB) service to do this. The code is here (input parameters are to - DATETIME; from - DATETIME; SmoothFactor - INTEGER):

   // This is just for passing the property name into the query
   var infotable = Resources["InfoTableFunctions"].CreateInfoTable({infotableName: "NamedProperties"});
   infotable.AddField({name: "name", baseType: "STRING"});
   infotable.AddRow({name: "TestProperty"});

   var queryResults = me.QueryNamedPropertyHistory({
       maxItems: 9999999,
       endDate: to,
       propertyNames: infotable,
       startDate: from
   });

   // This will be filled in below, based on the smoothing calculation
   var result = Resources["InfoTableFunctions"].CreateInfoTable({infotableName: "SmoothedQueryResults"});
   result.AddField({name: "TestProperty", baseType: "NUMBER"});
   result.AddField({name: "timestamp", baseType: "DATETIME"});

   // If there is no usable smooth factor (0, empty, undefined, null, or negative),
   // then just return everything; this also avoids an endless loop below
   if(!SmoothFactor || SmoothFactor < 1)
       result = queryResults;
   else {
       // Increment by smooth factor
       for(var i = 0; i < queryResults.rows.length; i += SmoothFactor) {
           var sum = 0;
           var count = 0;
           // Increment by one to average all points in this interval
           for(var j = i; j < (i+SmoothFactor); j++) {
               if(j < queryResults.rows.length) {
                   sum += queryResults.getRow(j).TestProperty;
                   count++;
               }
           }
           // Use count because the last interval may not equal smooth factor
           var average = sum / count;
           result.AddRow({TestProperty: average, timestamp: queryResults.getRow(i).timestamp});
       }
   }

5. Create a Timer for updating the property values on the Thing. The Timer should subscribe to itself, containing this code (ensure it is enabled as well):

   var now = new Date();
   if(now.getMilliseconds() % 3 === 0)
       // Randomly reset the number to simulate outliers
       Things["TestChartCapacityThing"].TestProperty = Math.random()*100;
   else if(Things["TestChartCapacityThing"].TestProperty > 100)
       Things["TestChartCapacityThing"].TestProperty -= Math.random()*10;
   else
       Things["TestChartCapacityThing"].TestProperty += Math.random()*10;

   Don't forget to set the runAsUser in the Timer configuration. To generate many property values, set the updateRate to a small value, like 10 milliseconds. Disable the Timer after many thousands of values are logged in the Value Stream.
6. Create a Mashup for displaying the property data and the capacity of the query to smooth the data. The Mashup should run the service created in step 4 on load. The service input comes from widgets on the mashup: Bindings:
7. Place a Time Series Chart widget in the bottom of the Mashup layout. Bind the data from the query to the chart.
8. View the Mashup. Note the difference in the data... All points in one minute: And a smooth factor of 10 in one minute:

Note that the outliers still appear, and the peaks are much easier to see. With fewer points, trends become easier to spot and data is easier to understand. For monitoring the specific nature of the outliers, utilize alerts and other types of displays. Alternative forms of data reduction could involve using the median of each interval (given by the smoothing factor) or the min or max, as needed for the specific use case. Display multiple types of these options for an even more detailed view. Remember, though, the more data needs to be processed, the slower the Mashup will load. As usual, ensure all mashups are load tested and that the number of end users per Mashup is considered during application design.
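For experimenting with the smoothing logic outside of ThingWorx, the same idea can be written against a plain array. This is a minimal sketch rather than the service from the tutorial: it uses ordinary JavaScript objects instead of InfoTables, and the function name `smooth`, the `summarize` parameter, and the sample data are assumptions added for illustration.

```javascript
// Ways to collapse the values of one interval into a single number.
const mean = (values) => values.reduce((a, b) => a + b, 0) / values.length;
const min = (values) => Math.min(...values);
const max = (values) => Math.max(...values);

// Combines every `smoothFactor` consecutive points into one point.
// Each output point keeps the timestamp of the first point in its interval,
// mirroring the service above. An unusable factor returns a copy of the input.
function smooth(points, smoothFactor, summarize = mean) {
  if (!Number.isInteger(smoothFactor) || smoothFactor < 1) return points.slice();
  const smoothed = [];
  for (let i = 0; i < points.length; i += smoothFactor) {
    // slice() stops at the end of the array, so the last interval may be shorter
    const interval = points.slice(i, i + smoothFactor);
    smoothed.push({
      timestamp: interval[0].timestamp,
      value: summarize(interval.map((p) => p.value)),
    });
  }
  return smoothed;
}

// Made-up sample: five points, one per second
const points = [1, 2, 3, 4, 5].map((value, i) => ({ timestamp: i * 1000, value }));

console.log(smooth(points, 2));      // averages of [1,2], [3,4], [5]
console.log(smooth(points, 2, max)); // maxima of the same intervals
```

Passing a different `summarize` function is one way to try the alternative reductions (such as min or max) without touching the loop itself.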

Sep 3, 2019

Developing Great IoT Solutions Brought to you once again by your EDC team, find attached here a brand-new, comprehensive overview of ThingWorx best practices! This guide was crafted by combining all available feedback, from support cases to PTC Community threads, and tapping all internal resources. Let this guide serve to bridge the knowledge gaps ThingWorx developers most commonly see. The Developing Great IoT Solutions (DGIS) Guide is a great way to inform both business and technically minded folks about the capabilities of the ThingWorx Platform. Learn how to design good solutions from a high-level, an overview designed specifically with the business audience in mind. Or, learn how to implement good IoT designs through a series of technical examples. Start from very little knowledge of the Platform and end up understanding data structures and aggregation, how to use the collection widget, and how to build a fully functional rules engine for sending and acknowledging alerts in ThingWorx. For the more advanced among us, check out the Appendix. Find here a handy list of do's and don'ts surrounding ThingWorx best practice in development, with links to KCS, Help Center, and Community content. Reinforce your understanding of the capabilities of the ThingWorx Platform with this guide, today! A big thanks to all who were involved on this project! Happy developing!

Jul 31, 2019

Is your team operating an effective DevOps pipeline? DevOps is an important part of a mature, enterprise-ready application, but the process isn’t simple. This expert session will focus on what a full DevOps pipeline looks like and how PTC can help to build a seamless pipeline. Join us for our upcoming Expert Session to learn how to create a Docker image, integrate Azure with Docker and Git, and set up a seamless DevOps pipeline. When? Thursday, September 30th 2021 | 11 AM EST Host: Tori Firewind, Senior Engineer in PTC IOT Enterprise Deployment Center Registration link: https://www.ptc.com/en/resources/iiot/webcast/devops-pipeline-thingworx

Sep 22, 2021

The New and Improved DGIS Guide to ThingWorx Development

Written by Victoria Firewind of the IoT EDC

The classic Developing Great IoT Solutions guide has been reskinned and revamped for newer versions of ThingWorx! The same information on how to build a quality IoT application is now available for versions of ThingWorx 9.1+, and now, a complete sample application is included to demonstrate these ideas.

Find within the attached archive a PDF with high-level overview information on development and application design geared towards managers and business users, so that everyone can understand the necessary requirements, common terms, and key tips on how to ensure an application is scalable and maintainable right from the very start. Reduce your chances of running into issues between PoC and Go Live by reviewing this information today!

Also find within this PDF a series of tutorials which teach not just how to use the ThingWorx software, but which also educate on how to make good application design choices. A basic rules engine for sending real-time notifications is included here, as well as a complete demo application which illustrates each concept in a real-world use case. This Coffee Machine Demo App relies upon the tutorial entities, which can also now be imported directly using the other XML files provided here. This ensures that anyone can review these concepts, regardless of how much time one can commit or how much knowledge one already has on the subject.

This is a complex guide, and any issues, questions, or bugs found within can be reported right here on this thread. Happy developing from the IoT EDC!

Apr 13, 2021

Hi all,

Our expert session, ThingWorx Flow Overview, is tomorrow! Click the link below to register, and remember to share it with colleagues who might benefit from its content.

Expert Session: ThingWorx Flow Overview
Date and Time: December 10th, 8:00 AM EST
Duration: 1 hour
Hosts: Antony Moffa and Vinay Vaidya, ThingWorx IoT Platform Senior Directors
Registration: https://www.ptc.com/en/customer-success/expert-sessions-for-thingworx-foundation-webcasts

See you there!

Here are other upcoming sessions that might be of interest to you:

Upgrade to ThingWorx 9 – How to Plan / Evaluate Impacts: This session will highlight the key points you should evaluate to properly plan your upgrade to ThingWorx 9.
Active Active Clustering: This session will cover the main aspects of the High Availability Clustering feature launched with the ThingWorx 9.0 release.

Dec 9, 2020

ThingWorx DevOps with Jenkins

DevOps as a topic is vast and has been addressed many times throughout the history of the PTC Community. Previous posts address what DevOps is, teach how to make use of DevOps like a pro, announce updates to the PTC Git Extension, and explain why this extension is so helpful to achieving continuous Git integration with ThingWorx. This post provides a PDF guide on Jenkins integration with ThingWorx, including tutorials with detailed information on how to set up your ThingWorx instance and how to configure your Jenkins Pipeline. The PDF is listed for download separately, but it is also included in the zip with the other required files for the tutorial. The Jenkins Pipeline provided here is intended as an example and starting point for managing your DevOps in ThingWorx, and it can easily be extended. Please note that this Pipeline is not officially supported by PTC.

Aug 26, 2020

May 15, 2020

Load Testing through Remote Device Simulation

Designing an enterprise-ready application requires extensive testing and quality assurance. This includes all sorts of tests, of course, from examining the user interface for flaws to verifying there is correct logic in all background services. However, no area of testing is more important than scalability. Load testing verifies that the application still functions as desired when many remote things are connected and streaming information to the Platform. It is considered a critical component of the change management process and is mentioned numerous times throughout PTC best practice documentation.

This tutorial will step you through designing a load test using Kepware as a simulator. Kepware is free to download and use in short demos, making it the perfect tool for this type of test. Start by acquiring the latest version of Kepware from the download site. Click “Download Free Demo” if a license was not included in your PTC product package. The installation of Kepware is simple, and for details, see the Kepware Installation Guide. The tutorial shown here uses Kepware version 6.7 and ThingWorx version 8.4.4. Given that we are testing a ThingWorx application, this tutorial assumes ThingWorx is already installed and configured correctly.

(This tutorial was developed by Desheng Xu and edited by Victoria Tielebein. Exact specifications of the equipment used in both large scale and local tests are given in step VI, which discusses the size of the simulation.)

Once Kepware is installed, follow these steps:

I. Understand how to configure Kepware as a simulator
- Go to the Help menu within Kepware, and click on “Driver Help”
- Select “Simulator” in the pop-up window, and click “OK”
- Expand “Address Descriptions” and then “Simulation Functions”
- Select “Ramp Function” to review details about the function needed for this tutorial, as well as information about function syntax
- Close the window once this information has been reviewed

II. Create a new project in Kepware
- Click “File” > “New”
- If you are connected to the runtime, Kepware will allow you to choose to edit this project offline

III. Add a channel in Kepware
- Channels represent threads which Kepware will use to contact ThingWorx
- Under “Connectivity”, click “Click to add a channel.”
- From the drop-down list, select “Simulator”
- Use all the default settings, selecting “Next” all the way down to “Finish”

IV. Next, add one device to the channel
- Highlight the new channel and click “Click to add a device” (which will appear in the center of the screen)
- Once again, use the default settings, selecting “Next” all the way down to “Finish”

V. Add a tag to this device
- Within Kepware, tags represent properties which bind to remote things on the Platform and update with new information over time
- Each device will need several tags to simulate remote property updates
- The easiest way to add many tags for testing is to create one, and then copy and paste it
- Highlight the device created in the previous step and click “Click to add static tag”, which appears in the center of the screen
- For “name” type “tag1”
- For Address, enter the Ramp function: RAMP(1000,1,2000000,1)
  - The first parameter is the update rate given in milliseconds
  - The next two parameters are the range of values which can be sent
  - The last parameter is the increment or step
  - Together this means that every 1 second, this tag will send a new value that is 1 higher than the previous value to the Platform, starting at 1 and ending at 2 million
- Ensure the Data Type is given as “DWord” or any type which will be read as a “Number” (and NOT an “Integer”) on the Platform
- Change the Scan Rate to 250
- Then click “OK”

VI. Add more devices to the test

The most basic set-up is now done: if this project were connected to the Platform, one remote thing with one remote property could be used to simulate property updates. That is not very useful for load testing, however. We need many more things than this, and many more properties. The number of tags on each device should match the expected number of remote properties in the application itself. The number of devices in each channel should be large enough that when more channels are created, the total number of devices is close to the target for the application.

For example, to simulate 10,000 things, each with 25 remote properties, we need 25 tags per device, 200 devices per channel, and 50 channels. This would require a lot of memory to run and should not be attempted on a local machine. A full test of 40 channels, each with 10 devices, was performed as shown in the screenshots here. This simulates 10,000 writes per second to the Platform in total, or about 400 remote device connections.
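The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the 25 tags per device used for the 40-channel test is an assumption inferred from the 10,000 writes per second quoted above (400 devices, one update per tag per second).

```python
import math


def channels_needed(target_things: int, devices_per_channel: int) -> int:
    """Number of Kepware channels required to reach the target device count."""
    return math.ceil(target_things / devices_per_channel)


def writes_per_second(channels: int, devices_per_channel: int,
                      tags_per_device: int, update_ms: int = 1000) -> float:
    """Expected property writes per second if every tag updates once per update_ms."""
    total_tags = channels * devices_per_channel * tags_per_device
    return total_tags * 1000 / update_ms


# Target from the text: 10,000 things, 25 properties each, 200 devices per channel.
big_channels = channels_needed(10_000, 200)
print(big_channels)                               # 50
print(writes_per_second(big_channels, 200, 25))   # 250000.0

# Full test from the text: 40 channels x 10 devices.
# 25 tags per device is an assumption inferred from the 10,000 wps figure.
print(40 * 10)                                    # 400
print(writes_per_second(40, 10, 25))              # 10000.0
```

The default update_ms of 1000 mirrors the first RAMP parameter used in this tutorial; a different update rate changes the expected write rate proportionally.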
This test used the following hardware specifications:
- Kepware machine running Windows 2016 64-bit, 2 cores, 8 GB
- ThingWorx Platform machine running Ubuntu 16.04, 4 cores, 16 GB
- PostgreSQL 9.6 machine running Ubuntu 16.04, 4 cores, 16 GB
- Influx 1.6.3 machine running Ubuntu 16.04, 4 cores, 16 GB

A local test was also run on Windows 10 (64-bit), using the H2 database, with Kepware and ThingWorx running side by side on the same machine (4 cores, 16 GB). This test made use of only 2 channels, with 10 devices each. For local tests to see how the simulation works, this is fine, but a more robust set-up like the above will be needed in a true load test.

If there is not enough memory on the machine hosting Kepware, errors like this will appear in the Kepware logs: One or more value change updates lost due to insufficient space in the connection buffer.

Once you decide on the number of tags and devices needed, follow the steps below to add them.
- To add more tags, copy and paste the existing tag (ctrl+c and ctrl+v work in Kepware for convenience) until there are as many tags as desired
- To add more devices, highlight the device in Kepware and copy and paste it as well (click on the channel before pasting)
- Then, copy and paste the entire channel until the number of channels, devices, and tags totals the desired load (be sure to click on “Connectivity” before hitting paste this time)

VII. Configure the ThingWorx connection
- Right click on Project in the left-hand navigation bar and, in the pop-up window that appears, highlight ThingWorx
- Change the “Enable” field to “Yes” to activate the other fields
- Fill in the details for “Host”, “Port”, “Application Key”, and “Thing name”
  - Note that the application key will need to be created in ThingWorx and then the value copied in here
- The certificate and encryption settings may also need to be adjusted to match your environment
  - For local set-ups, it is likely that self-signed and all certificates will need to be accepted, so both of those fields will likely need to be set to “Yes” (Encryption may need to be disabled as well). In production systems, this should not be the case

VIII. Save the project
- It doesn’t matter too much if this project is saved as encrypted or not, so either enter a password to encrypt the save or select “No encryption”

IX. Connect to ThingWorx
- Click “Runtime” > “Connect…”
- A pop-up will appear asking if you want to load this project; click “Yes”
- The connection status should then appear in the bottom portion of the window where the logs are displayed

X. Configure in ThingWorx
- Log in to the ThingWorx Platform
- Under “Industrial Connections”, a thing should appear which is named as indicated in the Kepware configuration step above
- Click to open this thing and save it
- Also create a new thing, a value stream for ingesting data from Kepware

XI. Create remote things in ThingWorx
- Import the provided entity into ThingWorx (should appear as a downloadable attachment to this post)
- Open the KepwareUtil thing and go to the services tab
- Run the AutoKepwareCreate service to generate remote things on the Platform
  - Give the name of the stream created above so each thing has a place to store property information
  - The IgnoreTemplate flag should be set to false. This allows the service to create a thing template first, which is then passed to the remote devices. Set the flag to true only when the devices need to be deleted and recreated but the template does not
  - To delete the devices, use the AutoKepwareDelete service also provided on the KepwareUtil thing
- Note that the AutoKepwareCreate service is asynchronous, so once it is executed, close the window and check the script logs to see when it completes. The logs will look like: KepwareUtil AutoKepwareCreate task finished!!!
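The tutorial runs AutoKepwareCreate from the services tab, but the same call could also be scripted through the ThingWorx REST API, which exposes services at /Thingworx/Things/<thing>/Services/<service> and accepts an application key in the appKey header. The sketch below only builds the request and does not send it. The host name, the application key, and the StreamName input are placeholders and assumptions; check the actual input names of the service in Composer before using anything like this.

```python
import json
import urllib.request


def build_service_request(base_url, thing, service, app_key, params):
    """Build (but do not send) a POST request that invokes a ThingWorx service."""
    url = f"{base_url.rstrip('/')}/Thingworx/Things/{thing}/Services/{service}"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "appKey": app_key,  # urllib normalizes header case; HTTP header names are case-insensitive
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


request = build_service_request(
    "https://thingworx.example.com",  # placeholder host
    "KepwareUtil",
    "AutoKepwareCreate",
    "YOUR-APP-KEY",                   # placeholder application key
    {
        "IgnoreTemplate": False,
        "StreamName": "KepwareValueStream",  # hypothetical input name and value
    },
)
print(request.get_method(), request.full_url)

# To actually send it (not executed here):
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status)
```

Because the service is asynchronous, a successful response would only mean the task was started; the script log still has to be checked for the completion message.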
XII. Check status of remote things
- Once the things are created, they should automatically connect to the Platform
- Run the TotalDeviceByTemplateWithTemplate service to see if the things are connected
  - The template given here could be the one created by the AutoKepwareCreate service, or just give it RemoteThings if this is a small local set-up without many remote things on it
  - The number of devices will equal the number of devices per channel times the total number of channels, which in the test shown here is 400
  - isConnected will be checked if all of the devices are connected without issue
- If some of them are not connected, check the logs for errors and resolve those before moving on

XIII. View Ingestion Rate
- Once the devices are created, their tags should show as numbers (NOT integers), and they should already be updating with new values every second
- To view the ingestion rate, run the KepwareUtil service AutoKepwareRateSummary
  - Give the thing template name that is created by the AutoKepwareCreate service, which will look like the name of the Kepware thing itself with a “T-” in the front
  - The start time should be close to the current time, and the periodInMinutes should be large enough to include some of the test (periodInMinutes is used to calculate the end time within the service)
- Note in the results here that the Average Write Per Second is only 9975 wps, which is close but not exactly what we would expect. This means that there are properties not updating correctly, which requires us to look at the logs and restart some things
- If nothing shows up here, despite the Total Connected Things showing correctly, then look at the type of the tags on one of the remote devices. The type must be NUMBER for the query within this service to work, and not INTEGER. If the type of the tags is incorrect, then the type of the tags within Kepware was probably given as something which is not interpreted as a number in ThingWorx
  - Ensure DWord is used for the tags in Kepware
- Within the script log, look for any devices which show errors as seen in the image below and restart them to get their properties updating correctly
- Once the ingestion rate equals what is expected (in the case of the test here, 10,000 wps), use the AutoKepwareIngestionStat service on the KepwareUtil thing to see details about each remote device
  - The TimeGapAvg in this service represents the gap between two ingestions in milliseconds, showing any lag that may be present between Kepware and ThingWorx
  - The TimeGapSTD shows the standard deviation of the time gap between two ingestions on any given thing, also indicating lag (the lower this number, the better)
  - The StartTime and EndTime show the first and last timestamp observed in the ThingWorx database during the given duration
  - The totalCount shows the total number of ingested records during the sampling cycle
  - The StartValue and EndValue fields show the first and last value ingested into the tag during the given duration
- If the ingestion rate is working as expected, and the ramp function is actually sending an update on time (in this case, once each second), then the difference between the EndValue and StartValue should always be equal to the totalCount plus 1. If this doesn’t match up, then there may be data loss or something else wrong with the property updates, which will show as a checked box in the valueException column
- It is not enough to ensure that the ingestion rate is correct, as sometimes the rate may fluctuate only by 1 or 2 wps and appear perfect, even while some data is lost. That is why it is important to ensure that there are no valueException boxes showing as checked in the test of the application
- If none of these are marked as having failed, then the test was successful and this ingestion rate is acceptable for the application

This tutorial is a very basic way to simulate many remote devices ingesting data into the Platform.
For this to be a true test of the application, the remote things created in this test will need to be given business logic tasks as well. The AutoKepwareCreate service can be modified to give any template (and not just RemoteThing) to the thing template which is created and subsequently passed into the demo devices. Likewise, the template itself can be created, and then manually modified to look like the actual remote device template in the application, before the rest of the things are created (using the IgnoreTemplate flag in the creation and deletion services, as discussed above). Ensure that events are triggered as expected and that subscriptions to property updates are in place on the thing template before creating the demo things. Make use of the subsystem monitor to ensure that the event, value stream, and stream queues do not grow so large that the Platform cannot keep up with the requests (for details about tuning the stream and value stream processing subsystems, see PTC’s best practice documentation). Also be sure to load some of the mashups to see how they perform while the ingestion test is happening. This will test whether or not the ingestion rate and business logic of the application can function side by side without errors, data loss, or performance issues.
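To make the ingestion checks from the "View Ingestion Rate" step more concrete, here is a rough Python sketch of the same ideas applied to exported data: an average write rate over a period, the mean and standard deviation of the gaps between ingestion timestamps, and a scan for missing ramp values. It is not the implementation of the KepwareUtil services. All function names are hypothetical, the use of a population standard deviation is an assumption, and the gap scan assumes the sampled window does not cross the point where the ramp restarts.

```python
from statistics import mean, pstdev


def average_wps(total_writes: int, period_minutes: float) -> float:
    """Average writes per second over a sampling period given in minutes."""
    return total_writes / (period_minutes * 60)


def time_gap_stats(timestamps_ms):
    """Mean and population standard deviation of gaps between consecutive timestamps."""
    gaps = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    return mean(gaps), pstdev(gaps)


def missing_ramp_values(values, step=1):
    """Values a ramp with the given step should have produced but that never arrived."""
    missing = []
    for previous, current in zip(values, values[1:]):
        expected = previous + step
        while expected < current:
            missing.append(expected)
            expected += step
    return missing


# 2,992,500 writes in 5 minutes is 9975 wps, 25 wps short of the 10,000 target.
rate = average_wps(2_992_500, 5)
print(rate, 10_000 - rate)  # 9975.0 25.0

# Timestamps roughly one second apart, with one update arriving 50 ms late.
gap_avg, gap_std = time_gap_stats([0, 1000, 2000, 3050, 4000])
print(gap_avg, round(gap_std, 2))

# A ramp that lost the value 4 somewhere between Kepware and the Platform.
print(missing_ramp_values([1, 2, 3, 5, 6]))  # [4]
```

Scanning consecutive values catches a single lost update even when the average rate still looks correct, which is the situation the valueException column is meant to flag.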

Oct 28, 2019

IoT Tips

Load Testing through C SDK Remote Device Simulation in ThingWorx

ThingWorx Performance Monitoring with Grafana

ThingWorx DevOps

Introduction to Digital Performance Management (DPM)

Thread Safe Coding in ThingWorx, Part 1: The Java Extension Approach

Why Use InfluxDB in a Small ThingWorx Application

Top 5 Monitoring Best Practices

ThingWorx DevOps with Azure: The Comprehensive DevOps Guide

Custom Solution Performance Metrics, Right Inside ThingWorx!

Persistent vs. Logged Properties

How to Scale Vertically and Horizontally, and When to Use Sharding

JMeter for ThingWorx

IOT EDC Reference Benchmark - Remote Monitoring of Assets

Smoothing Large Data Sets

Developing Great IoT Solutions - A Best Practice Guide

Live Webinar: Setting up a DevOps pipeline in ThingWorx on September 30th

Introducing the New and Improved DGIS Guide to ThingWorx Development

Live Expert Session: Thingworx Flow Overview is tomorrow (December 10th)!!!

ThingWorx DevOps with Jenkins

ThingWorx 8.5 Architecture Deployment Guide Update

Load Testing through Remote Device Simulation using Kepware in ThingWorx

ThingWorx Learning Paths

Getting Started on the ThingWorx Platform Learning Path