IoT & Connectivity Tips

ThingWorx DevOps with Azure: The Comprehensive DevOps Guide Written by Tori Firewind, IoT EDC As promised in a previous post, attached here is a comprehensive guide to DevOps in ThingWorx, including tutorials and instructions for creating a continuous integration, continuous deployment (CI/CD) process for application development. There are also updated scripts and entities attached, including an entire sample application for importing, exporting, and testing an application in ThingWorx. From Docker and Github to Azure DevOps and Solution Central, this guide has it all. Learn how to perform your role in the DevOps process whether an administrator or a developer, automate your deployments and testing, and create a more efficient process for publication changes to production. A complete DevOps process like this really does facilitate faster and easier updates with fewer risks, fewer delays, and a better pathway to success.

Nov 18, 2021

ThingWorx Monitoring and Alerting, Part 2 Using Prometheus and Grafana By Tori Firewind, IoT EDC Building Dashboards To add a panel which monitors some component of the ThingWorx application to a dashboard in Grafana, click to add a new panel. Under “Metrics” in the box at the bottom of the screen, select what ThingWorx metrics you wish to monitor (type “thingworx” in the search box to see them all). For example, select the Platform Subsystem memory in use: Label filters aren’t necessary, though you may want to sort by instance if you are monitoring multiple ones with the same dashboard. You may also want to take some time to format the Y axis, which by default will show in bytes. Go to the formatting panel on the right side and scroll down to the section called “Standard options”. For the Unit dropdown, start typing “data” and then select “bytes (SI)”. This will automatically determine if the bytes you’ve provided should really print as MB or GB based on how large the numbers are. Rename the panel, modify it in any other way desired, and then click Apply (last 5 minutes): Once you add the panel, you can watch the memory usage as it is scraped by selecting the refresh option (10s or 30s, whatever makes sense based on your scrape interval). The viewing window is stored in the URL, so that you can generate a report for a specific interval (like when a test was occurring), and then store that result or share it in a more compact way: http://localhost:3000/d/nleucPv4k/thingworx-monitoring?orgId=1&from=1668528038732&to=1668536503953 (absolute timestamps): Dashboards are just collections of panels which report on all of the various metrics of performance and stability that exist for single components of a system. This is because there can be quite a few metrics worth watching for each individual component. Most of the third-party tools come with their own dashboards, but the ThingWorx component is one which for now, requires some thought and creativity. Consider your use case carefully and look over the various subsystems contained within ThingWorx. Each part of an application is localized to specific subsystems, and some are more business critical than others. What will go at the top of your dashboard? Add rows, add panels per row, and see what the many choices are for watching your system. Don’t forget that with Telegraf running, VM or machine usage metrics are also available for display on a dashboard. Things like overall CPU and Memory usage are critical to determining the health of a system, as we have demonstrated in our own reasoning in past benchmarks and scale tests. You can create a panel to monitor the mem_used versus mem_total, like so: Another metric from Telegraf worth adding is the CPU usage, which should be given “percentage” for the units and which needs a label filter of cpu = cpu-total. If we do some resizing and drag-and-dropping, then we now have the first row of a dashboard: See how the Platform usage climbs steadily and is purged in a cycle? That is the Java Garbage Collection mechanism, and it’s important to remember to leave room for spikes on top of those peaks. Data can also be calculated or processed in some way to make it more useful for determining system health and stability. The data in the picture below uses the formula submitted = completed + number queued + number failed. It shows the current queue on the left Y-axis and the max queue on the right (since the two numbers usually are drastically different). It looks pretty, but it doesn’t really tell us much about the system in this format, so let’s do some math and find a representation that is a bit more helpful. Performing a “non-negative derivative” calculation over the submitted and the completed queue counts over time allows for us to look at the status of the queue as a velocity. When the “complete” speed appears behind the “submitted” speed for too long at a time, then that means the queue is filling up and will eventually result in data loss. If we take this one step further and calculate the average of the submitted minus the completed over time, then we can actually predict approximately when the queue will fill up. This can then be displayed on a dashboard in Grafana, or used as the basis for an alert. What to Monitor In addition to monitoring the system which ThingWorx runs upon, ThingWorx itself can easily be monitored down to the subsystems level by Prometheus due to the Metrics endpoint. Many applications have support built into the way they format the data for scraping, including the JVM (which exposes Prometheus-formatted metrics with the JMX Exporter) and the OS (which can use the Node Exporter or Telegraf for the same purpose). For these more generic components, there are popular community dashboards which can be downloaded and used in Grafana for data analysis and review. For ThingWorx, there’s different kinds of data to track: subsystem data (see the list on the right) and non-subsystem data. There’s queue based versus non-queue data. These different metrics can collectively characterize the overall health of the application, depending on the use case. For instance, if this is a system with very many connected devices, one metric which may be important to track is the number of total devices defined on the Foundation server vs. the number of devices which are currently connected. If there are relays involved, then many devices suddenly going offline can mean a relay has failed. Another example is if the system sizing depends on an assumption that there will only ever be a fraction of the total number of devices connected at a time. Use cases like these could be monitored easily by keeping track of the total vs. the number of connected devices. Other common indicators of a healthy ThingWorx application might include the value stream and stream queues. These queues should fluctuate over time as the data is ingested and processed, but they should never be growing in size. If the stream queues are growing, then that means the data is writing to the queues faster than the queues can write to the database. Eventually, when the system runs out of resources to keep track of the queues, data will be lost. Having the stream information displayed in a chart can make it very easy to spot an upward trend in resource usage early on, which can catch a blockage or bottleneck that needs attention before it starts to affect the larger system in catastrophic ways later. Memory usage information from the various subsystems might be something worth tracking, as well as the event queue. These can indicate that the business logic is functioning with room to handle spikes, and that the server has enough memory to service all three dimensions of an IoT application: the ingestion, the business logic and thing-based alerting, and the user experience and UI. If file transfers are a key part of the use case, then the number of concurrent transfers, the average speed of them, the size of the files, all of this kind of stuff can be tracked and charted in Grafana by making use of the ThingWorx metrics which automatically show up there once you import the Prometheus data source. A mature dashboard used for a production environment might look a little like this: For further reading about subsystem monitoring, check out the Help Center. How to Alert The alerting mechanism built-in to Prometheus is incredibly easy to configure, so it might be tempting to generate tons of alert rules. However, remember that the more noise a system makes, the harder it is for those monitoring that system to know when action is really required. Playbooks which document how to respond to alerts, who to contact, how to act, and all the information necessary to handle an alert, should be created as an ongoing part of the DevOps process. Alerts should fire with the right severity in the subject line, as well as all of the information about the issue that is currently known, presented in a concise way, so that whoever receives the alert starts thinking about the root cause sooner and recovers the system faster. Those who receive the alert should have the ability to facilitate its resolution, and know who is expected to react to any alerts which come in. In the ThingWorx monitoring stack, Prometheus handles the alert rules and the generation of alerts, but alert filtering and delivery is managed in an external alerts manager. Generally, you want your alerts to follow a curve. If the current queue size exceeds 50% of the maximum, perhaps that isn’t a huge deal, if the application catches up quickly. How long are spikes in queue processing expected to last? Perhaps if the queue size is over half-full for 10 seconds, 30 seconds, then that means the queue is falling behind and not catching up. Ok, so this might be a warning level alert. When does this become an error? Well, let’s say the queue exceeds 90% of the max queue size. This might want to alert the moment it hits the mark. Now, farther along the curve, it may not take as long before data gets lost. As the severity of the situation increases, the threshold for alerting should increase as well. That way when errors do alert, it is a sure thing that they require a response immediately. The alerts are then pushed into the “Alerts Manager” for delivery based on your management rules. The Alerts Manager may decide to withhold warnings altogether, or send them to a much smaller mailing list, whatever filtering helps to ensure the right people receive the right alerts, right when they need them. In Conclusion, A Healthy Application... Has stable memory usage that fluctuates predictably and doesn’t grow over time. In a system experiencing mild issues, the memory starts to trend upward: If left unattended, systems like this may eventually experience outages. Finding the issue this early means there is even time to do some digging, debugging, taking of stack traces, and other such troubleshooting steps before the system must be restarted or recovered. That can really make the difference in identifying and resolving before there are real problems. One metric which makes for good alerting is the total number of failed stream entries, which can indicate there’s an issue writing to the database even before the queue has started to fill up. Other alerts may include warnings and errors based around percentages of memory used or queues filled, which depend on how long the queues take to fill up and how long the state has been at its increased usage. Prometheus has all of the tools necessary to make this possible across a variety of infrastructures and use cases. Set it up on a local machine and poke around at what ThingWorx metrics are available to meet your monitoring needs.

Dec 23, 2022

How to Scale Vertically and Horizontally, and When to Use Sharding Written by Mike Jasperson, VP of IOT EDC Deployment architecture describes the way in which an IOT application is deployed, or where each of the components are hosted on the network. There are deployment architecture considerations to make when scaling up an application. Each approach to deployment expansion can be described by the “eggs in a basket” analogy: vertical scale is like one person carrying a bigger basket, horizontal scale is like one person carrying more baskets, and sharding is like more than one person carrying the baskets (see below). All of these approaches result in the eggs getting from point A to point B (they all satisfy the use case), but the simplest (vertical scale) is not necessarily the best. Sure, it makes sense on paper for one person to carry everything in one big basket, but that doesn’t ensure that all of the eggs arrive intact. Selecting the right deployment architecture is a way to ensure the use cases are satisfied in the best and most efficient ways possible, with the least amount of application downtown or data loss. Vertical Scale - a.k.a. "one person carrying a bigger basket" The most common scalability approach is to simply size the IOT server larger, or scale up the server. This might mean the server is given additional CPU cores, faster CPU clock speeds, more memory, faster disks, additional network bandwidth or improved network cards, and so on. This is a very good idea when the application logic is increased in complexity, when more data is therefore needed in memory at a time, or when the processing of said data has to occur as quickly as possible. For example, adding additional devices to the fleet increases the size of the “Thing Model” in the process and will require additional heap memory be available to the Foundation server. However, there are limitations to this approach. Only so many concurrent operations and threads can be performed at once by a single server. Operations trying to read and write to the disks at once can introduce bottlenecks and reduce server performance. Likewise, “one person, one basket” introduces a single-point-of-failure operating risk. If for some reason the server’s performance does degrade or cease altogether, then all of the “eggs” go down with it. Therefore, this approach is important, but usually not sufficient on its own for empowering an enterprise level deployment. Horizontal Scale - a.k.a. "one person, with two or more baskets" As of ThingWorx version 9.0, Foundation servers can be deployed in clusters, meaning more baskets to carry the eggs. More baskets means that if even one of these servers is active, the application remains up in the event of an individual node failure or maintenance. So clustered deployments are those which facilitate High Availability. Clustered servers save on some resources, but not others. For instance, every server in a cluster will need to have the same amount of memory, enough to store the entire Thing Model. Each of the multiple baskets in our analogy has to have the same type of eggs. One basket can’t have quail eggs if the rest have chicken eggs. So, each server has to have an identical version of the application, and therefore enough memory to store the entire application. Also keep in mind that not all application business logic can scale horizontally. Event queues are local to each ThingWorx node, so the events generated within each node are processed locally by that particular node, and not the entire network (examples are timer and scheduler-based activities). Likewise, data ingestion done through an extension or other background process, like MQTT, emits events within a node that therefore must be processed by that particular node, since that's where the events are visible. On the other hand, load distribution that happens external to ThingWorx in either the Connection Server (for AlwaysOn based data coming from ThingWorx SDKs, EMS, eMessage agents, or Kepware) or REST API calls through a load balancer (i.e. user activity) will be distributed across the cluster, facilitating greater scaling potential in terms of userbase and mashup complexity. Also note that batched data will be processed by the node that received it, but different batches coming through a connection server or load balancer will still be distributed. Another consideration with clusters pertains to failure modes. While each node in the cluster shares a cache for many things, Stream and Value Stream queues are only stored locally. In the event of a node failure, other nodes will pick up subsequent requests, but any activity already queued on the failed node will be lost. For use cases where each and every data point is critical, it is important to size each node large enough (in other words, to vertically scale each node) such that queue sizes are constantly kept low and the data within them processed as quickly as possible. Ensuring sufficient network and database throughput to handle concurrent writes from the many clustered nodes is key as well. Once each node has enough resources to handle local queues, the system is highly available with low risk for outages or data loss. However, when multiple use cases become necessary on single deployments, horizontal scaling may no longer be enough to ensure things run smoothly. If one use case is logic-heavy, something non-time-critical which processes data for later consumption, it can use too many resources and interfere with other, lighter but more time-critical use cases. Clustering alone does not provide the flexibility to prioritize specific operations or use cases over others, but sharding does. Sharding - a.k.a. "more than one person carrying baskets" “Sharding” generally refers to breaking up a larger IOT enterprise implementation into smaller ones, each with its own configuration and resources. More server maintenance and administration may be required for each ThingWorx implementation, but the reduction in risk is worth it. If each of the use cases mentioned above has its own implementation, then any unexpected issues with the more complex, analytical logic will not affect the reaction time of operators to time-sensitive matters in the other use case. In other words, “don’t keep all your eggs in one basket”. The best places to break up an implementation lie along logical boundaries already accepted by the business. Breaking things down in other ways might look nice on paper, but encouraging widespread adoption in those cases tends to be an uphill battle. In connected products use cases, options for boundaries could be regional, tied more towards business vertical, or centered around different products or models. These options can be especially beneficial when data needs to stay in particular countries or regions due to regulatory requirements. In connected operations use cases, the most common logical boundaries would be site-based, with smaller IOT implementations serving just a smaller number of related factories in a particular area. Use-case or product-line boundaries can also make sense here, in-line with the above comments about keeping production-critical or time-sensitive use cases isolated from interference from business-support and analysis use cases. Ideally, a shard model will put the IOT implementation “closer” to both the devices communicating with it and the users that interact with the data. This minimizes the amount of data to be sent or received over long distances, reducing the impact from bandwidth and latency on performance. When determining which approach is best, consider that smaller, more focused implementations offer more flexibility, but are harder to manage. Having different versions of the same applications deployed in multiple places can easily become a maintainability nightmare. It’s therefore best not to combine a regional model with a use case model when it comes to determining sharding boundaries. Also consider using deployment automation tools like Solution Central. These enable tracking and managing version-controlled deployments to multiple IOT implementations, whether they are deployed on-premise, or in the cloud, giving one central location of all source code. Another benefit to sharding is the focused investment of server resources in a more targeted way. For instance, if one region is larger than another, it may need more CPU and memory. Or, perhaps only part of an application requires High Availability, the time-critical use cases which are best suited to small, clustered deployments. The larger, analysis-centered use cases can then remain non-clustered (but still vertically scaled of course). Sharding can also make access control simpler, as those who need access to only one region or use case can just be given a user account on that particular shard. However, certain use cases need data from more than one shard in order to operate, turning the data storage and access control benefits into challenges. Luckily, ThingWorx has an excellent toolkit for overcoming such issues. For one thing, REST API calls are readily available in ThingWorx, allowing each shard to exchange information with each other, as well as other enterprise data systems, like ERP or Service Ticketing. Sometimes, lower-level data replication strategies are the way to go, say if downsampling or data transfer from one store to another is necessary, and built-in database tools can more easily handle the workload. Most of the time, however, REST API calls are used to define the business logic within the application layer so that copying data actively between shards is unnecessary, using fewer resources to control what information is shared overall. There are several design approaches for REST API communication between shards, the two most common being the “peer” model and the “layered” model. In a peer model, one shard may call upon another shard using REST whenever it needs more information. In a layered model, there are “front-line” shards which handle most (if not all) of the device communication and time-critical use cases, things which require only the information in one shard to operate. Then there are also “back-line” shards that aggregate data from the many front-line shards, performing any operations that are less time-sensitive and more complex or analytical. For any of these approaches, it remains important to keep your data archival and purge strategy in mind. It is a best practice in ThingWorx to only retain as much data as is absolutely necessary, purging the rest periodically. If the front-line shards only ever need the last 7 days of raw data for 5 properties, plus the last 52 weeks of min/max/average data, then implementing an approach where each shard computes the min/max/average values and then archives the older data to a shared “data lake” before purging it would be ideal. This data lake then serves as the data store for all back-line shard operations. There is also the option to consider sharing some infrastructure between ThingWorx instances when using sharding in a deployment, which can create more flexible, scalable architectures, but can also introduce issues where more than one shard is affected when issues occur on only one shard. For instance, a common shared infrastructure piece is at the database level; each ThingWorx instance needs its own database, but a single database server instance (such as a PostgreSQL HA cluster) could serve separate database namespaces to multiple ThingWorx instances. This is an attractive option where an existing enterprise-scale database infrastructure with experienced DBAs is already in place. Similarly, load balancers can often be configured to manage load for multiple servers or URLs. If properly configured, an experienced load balancer could direct traffic for multiple applications, but it can also create a bottleneck for inbound connections if not properly sized. Load balancers designed for High Availability can also be considered. Apache Zookeeper is another tool often deployed once for an entire cluster to monitor the health and availability of individual components, or to vote them in or out of operations if problems are detected. With all of these options, remember that sharing infrastructure increases the chances of sharing issues from one ThingWorx system to the next, which can reduce the overall infrastructure complexity at the cost of increased administrative complexity. Bringing it All Together Vertical and Horizontal scale are both effective ways to add more capacity and availability to software implementations, but there are typically some diminishing returns in the investment of additional infrastructure. For example, consider two large, vertically-scaled implementations – one running on a VM with 64 vCPUs and 256 GiB RAM, and another running on a VM with 96 vCPUs and 384GiB of RAM. While the 96-core server has 1.5 times the compute capacity, in sizing tests with 1 million simulated assets, these two systems tend to fall behind on WebSocket execution at approximately the same point. In a horizontal scale example, with two nodes each sized the same (64 vCPU and 256GiB RAM), one would expect High Availability to occur, where one node picks up the other’s slack in a failover scenario. However, what if that singular node can’t handle the entire workload? Should both machines be sized vertically such that either can take on the full load, and if so, then what is the cost-benefit of that? It would be less expensive in this case to have a third server. Optimizing the deployment architecture for a ThingWorx application will therefore usually involve a blended approach. With more than two nodes, High Availability is more readily obtained, as two servers can almost certainly share the load of the third, failed node. Likewise, some workload aspects do not scale well until multiple additional nodes are added. For instance, spreading out the user load from mashup requests across multiple nodes to give the singleton more resources for the tasks it alone can perform doesn’t have much benefit if there are just two nodes. However, with horizontal scaling alone, the servers may still need to be vertically scaled larger than is ideal in terms of cost. Each one has to hold the entire Thing Model in memory, which means that sometimes, some of the nodes may be oversized for the tasks actually performed there. Sharding allows for each node to have a different Thing Model as necessary based around what boundaries are selected, which can mean saving on costs by sizing each server only as large as it really needs to be. So, a combination of approaches is typically the best when it comes to deployment architecture. The key is to break things up as much as possible, but in ways that make sense. Determine where the boundaries of the shards will be such that each machine can be as light and focused as possible, while still not introducing more work in terms of user effort (having to access two system to get the job done), application development (extra code used to maintain multiple systems or exchange information between them), and system administration (monitoring and maintaining multiple enterprise systems). Find the right balance for your systems, and you’ll maximize your cost-benefit ratio and get the most out of your ThingWorx application. Happy developing!

Jan 31, 2021

New Scenario Using Multi-Kepware for Asset Monitoring in Connected Factories A new scenario has been completed for Connected Factory implementations, furthering the IOT EDC's goal of providing a reference library of ThingWorx performance. This scenario builds upon the first, with additional tests being performed to demonstrate the capabilities of multiple Kepware Servers running side-by-side. Horizontal scaling is very common for multi-line factory implementations, so be sure to check out the new scenario in this ever-expanding benchmark document. Note that tests below 10,000 writes per second were not repeated with multiple Kepware Servers, since there is little reason to desire such a configuration in implementations that small. ThingWorx deployment sizing was also held constant throughout these tests to demonstrate the limits of a given configuration. Changes that may improve the results of a failed test (such as adding CPUs or Memory) will be mentioned but not validated as part of this benchmark. Let us know about your applications and how they compare with the data shared here. Happy developing!

Sep 30, 2020

Distributed Testing with JMeter Overview Running JMeter to the scale required by most customers is something that demands additional considerations than discussed in the previous two articles. At scale, a test may need to simulate thousands of users, which will require more than just one JMeter client be set-up on one or many hosts, as shown in the 3rd JMeter article here, in a tutorial on Distributed Testing. Distributed Testing Remote Testing configuration in which the main JMeter client is located at one IP address, controlling the rest as they step through their own copies of the JMeter tests, based on their own unique data files as necessary, to simulate a user load across a network, a series of regions, or simply across many machines if limited by the size of the physical hardware [JMeter link for this image in text body below] One key aspect of a proper JMeter load test is distributed or remote testing, i.e. making use of more than one JMeter client at a time to simulate the user load on the Application server. There are many reasons to make use of a network of clients such as this, like mimicking cross-region user access to the Foundation server, simulating different levels of latency for different users, and increasing the overall number of users which can contribute to the load test, while minimizing the performance cost of hosting that many threads on any single server. A single JMeter client has a practical limit of 150-250 threads across all groups and requires about 1 CPU and 8 Gb of RAM. After this point, the amount of garbage collection and other processing there is for each client to do is substantial. As the client processes its own data and sends requests to the Application server at the same time, there are diminishing returns, and the responses begin to take longer (or errors start occurring) simply because of resource starvation within the client process rather than on the Application server. Therefore, distributed testing is required for most customers doing larger load tests using JMeter. Many applications will have more than a few hundred users and/or will have users accessing the system from a variety of regions and networks, each of which could have significantly different network latency. So, in order to work with the limitations of the JMeter executable and address regional concerns, distributed or remote testing is typically required for almost all of PTC’s customers who scale test with JMeter. With a simple (monolithic) distributed test, all of the JMeter clients are located on the same host and share an IP address, but each must be configured with a unique RMI port to connect to the controlling process. If these are located on a VM, then the resource specifications can merely be increased and the VM sized larger as necessary to ensure the network of JMeter clients runs as expected. Each JMeter client requires around 8 GB for its heap size and 1 CPU (with some additional resources for the host operating system). Multi-hosted testing becomes the required option when limited by physical hardware (or a relatively small VM hardware host). If there are only 4-core, 32-GB machines, then plan for a machine per every 3 JMeter clients. If simulating thousands of users, this could mean half a dozen machines or more are required, which can still sometimes work out to be more cost efficient than one large, 256 GB, VM hosted in the cloud. Using many hosts in physical locations can also simulate regions with different network characteristics. A tutorial for distributed testing across one host is shown here. For more information, see the Apache web articles on each topic: Remote Testing and Distributed Testing Step by Step. Tutorial: Step Up Distributed Test on One Host Copy the source directory for the whole JMeter project and rename it however many times as required. Here there are 22 JMeter clients side-by-side on a single, 256-GB VM (3000+ users): Each directory (shown above) is identical, except that the “jmeter.properties” files (found in the bin directory in each project) have unique settings, namely the RMI port: Each JMeter client must contain a copy of the same test scripts found on the main server: In the “jmeter.properties” file for the main server, specify the IPs and ports for each remote/distributed client (under remote_hosts), as shown: In this image, the IPs are all the same, with just the port differing from client to client. Here only 4 clients are in use, with the rest commented out for future tests. This is how to scale up and test incrementally more users each time. Just add another server to add another 150-250 users, until eventually the target number of users is reached, or the server is saturated. These IPs will differ if doing a true remote test, with each being the server location of the JMeter client within the same network. The combination of IP address and port will all still need to be unique, and communication between the overall jmeter controller and the clients over the RMI ports needs to be allowed by the network/firewalls. Note that the number of users is set using the parameter under “Test Plan” which was set-up last time. This value represents the number of users by specifying the number of threads per thread group, and it can remain the same for every client or vary accordingly, if for instance one region is smaller than another. The “Test Plan” parameters are shown here: To optionally start all of the clients at once in preparation for test execution, create a basic batch or shell script which goes to the bin directory of each agent and calls the start command: “jmeter-server”. In this image from a Windows JMeter host, only the first few agents are in use, but removing the “rem” to uncomment the other start command lines in this file would add more servers to be started. Note how the Java parameter for java.rmi.server.hostname must match the main JMeter client network configuration here for them to connect (see Apache links above for more information). This will start each of them in their own CMD window, which once closed, will terminate the JMeter client processes. Parameter like rampUp time within the main test script will scale with the number of client processes. For example, 100 users and 300 seconds rampUp with 4 clients results in 400 overall user threads that are all logged in after 300 seconds. Once all clients are running, then click Remote Start All to start the test across every server from a GUI (usually for debugging) or execute the test using command line: jmeter -n -r -t <test.jmx> -l <results.jtl> The main server sends the actions to the remote clients to run, so all the clients need is input parameters. For instance, a CSV file may exist in each directory which has different data from client to client, to create pseudo-random user loads and represent different kinds of user activity. The file shown in this image is different, and unique, in each of the client directories: Conclusion Here, we learned how to horizontally scale the load test, setting up more JMeter clients to facilitate larger, more complete user loads. We also discussed the difference between distributed and remote testing, and how the former is easier to set up and use, especially on VMs, but the latter might be better for simulating region differences and the impact of network latency. The latter will likely also be required if there are hardware constraints to consider, since each JMeter client needs about 8 GB for its heap, and another 8 GBs, or a core or two of similar size, is needed per every 3 JMeter clients for the communication and processing of data. Stay tuned for the next article on generating and reviewing the results of the load tests.

Aug 26, 2020

ThingWorx and Azure IoT Hub Benchmark This Azure IoT Hub Reference Benchmark showcases the capabilities of ThingWorx and Microsoft Azure IoT Hub, a cloud-hosted solution backend that facilitates secure and reliable communication between an IoT application such as ThingWorx and the devices it manages. By making use of this third party tool, remote monitoring with ThingWorx has never been simpler. In this benchmark, PTC verifies the reliability and scalability of ingesting data through the Azure IoT Hub into the Azure IoT Hub Connector(s) and ThingWorx Foundation. The preliminary version of this document focuses primarily on how the Azure IoT Hub’s capabilities modify and/or enhance the data ingestion and device management capabilities of ThingWorx. Find the benchmark document attached here, and stay tuned for more reference benchmarks to come!

Aug 21, 2020

Smoothing Large Data Sets Purpose In this post, learn how to smooth large data sources down into what can be rendered and processed more easily on Mashups. Note that the Time Series Chart widget is limited to load 8,000 points (hard-coded). This is because rendering more points than this is almost never necessary or beneficial, given that the human eye can only discern so many points and the average monitor can only render so many pixels. Reducing large data sources through smoothing is a recommended best practice for ThingWorx, and for data analysis in general. To show how this is done, there are sample entities provided which can be downloaded and imported into ThingWorx. These demonstrate the capacity of ThingWorx to reduce tens of thousands of data points based on a "smooth factor" live on Mashups, without much added load time required. The tutorial below steps through setting these entities up, including the code used to generate the dummy data. Smoothing the Data on Mashups Create a Value Stream for storing the historical data. Create a Data Shape for use in the queries. The fields should be: TestProperty - NUMBER timestamp - DATETIME Create a Thing (TestChartCapacityThing) for simulating property updates and therefore Value Stream updates. There is one property: TestProperty - NUMBER - not persistent - logged The custom query service on this Thing (QueryNamedPropertyHistory) will have the logic for smoothing the data. Essentially, many points are averaged into one point, reducing the overall size, before the data is returned to the mashup. Unfortunately, there is no service built-in to do this (nothing OOTB service). The code is here (input parameters are to - DATETIME; from - DATETIME; SmoothFactor - INTEGER): // This is just for passing the property name into the query var infotable = Resources["InfoTableFunctions"].CreateInfoTable({infotableName: "NamedProperties"}); infotable.AddField({name: "name", baseType: "STRING"}); infotable.AddRow({name: "TestProperty"}); var queryResults = me.QueryNamedPropertyHistory({ maxItems: 9999999, endDate: to, propertyNames: infotable, startDate: from }); // This will be filled in below, based on the smoothing calculation var result = Resources["InfoTableFunctions"].CreateInfoTable({infotableName: "SmoothedQueryResults"}); result.AddField({name: "TestProperty", baseType: "NUMBER"}); result.AddField({name: "timestamp", baseType: "DATETIME"}); // If there is no smooth factor, then just return everything if(SmoothFactor === 0 || SmoothFactor === undefined || SmoothFactor === "") result = queryResults; else { // Increment by smooth factor for(var i = 0; i < queryResults.rows.length; i += SmoothFactor) { var sum = 0; var count = 0; // Increment by one to average all points in this interval for(var j = i; j < (i+SmoothFactor); j++) { if(j < queryResults.rows.length) if(j === i) { // First time set sum equal to first property value sum = queryResults.getRow(j).TestProperty; count++; } else { // All other times, add property values to first value sum += queryResults.getRow(j).TestProperty; count++; } } var average = sum / count; // Use count because the last interval may not equal smooth factor result.AddRow({TestProperty: average, timestamp: queryResults.getRow(i).timestamp}); } } Create a Timer for updating the property values on the Thing. The Timer should subscribe to itself, containing this code (ensure it is enabled as well): var now = new Date(); if(now.getMilliseconds() % 3 === 0) // Randomly reset the number to simulate outliers Things["TestChartCapacityThing"].TestProperty = Math.random()*100; else if(Things["TestChartCapacityThing"].TestProperty > 100) Things["TestChartCapacityThing"].TestProperty -= Math.random()*10; else Things["TestChartCapacityThing"].TestProperty += Math.random()*10; Don't forget to set the runAsUser in the Timer configuration. To generate many properties, set the updateRate to a small value, like 10 milliseconds. Disable the Timer after many thousands of properties are logged in the Value Stream. Create a Mashup for displaying the property data and capacity of the query to smooth the data. The Mashup should run the service created in step 4 on load. The service input comes from widgets on the mashup: Bindings: Place a Time Series Chart widget in the bottom of the Mashup layout. Bind the data from the query to the chart. View the Mashup. Note the difference in the data... All points in one minute: And a smooth factor of 10 in one minute: Note that the outliers still appear, and the peaks are much easier to see. With fewer points, trends become easier to spot and data is easier to understand. For monitoring the specific nature of the outliers, utilize alerts and other types of displays. Alternative forms of data reduction could involve using the mean of each interval (given by the smoothing factor) or the min or max, as needed for the specific use case. Display multiple types of these options for an even more detailed view. Remember, though, the more data needs to be processed, the slower the Mashup will load. As usual, ensure all mashups are load tested and that the number of end users per Mashup is considered during application design.

Sep 3, 2019

Building More Complex Tests in JMeter Overview This is the second in a series of articles which help inform how to do user load testing in ThingWorx. This article picks up where the previous left off, continuing with the project created there. The screenshots do appear a little differently here because a new “Look and Feel” was selected for the JMeter application (switched from “Metal” to “Windows Classic”) to provide more readable screenshots. In this guide, we are going to make the very simple project more complex, working towards a better representation of a real load test. The steps below walk you through how to create and configure thread groups and parameterize the processes and procedures defined by each thread group. Adding More Thread Groups Within JMeter, thread groups are used to organize the HTTP requests in a test into various processes or procedures, such that different mashups (and all of the HTTP requests required on each) or processes can be executed simultaneously by different thread groups throughout the test. Varying the number of threads in a group is how to vary the number of users accessing that mashup during the test, a number which increases over time in accordance with the ramp up time. The thread group name will also show up in the Summary Report tab at the end of the test, making it easier to parse through and graph the results. Start by renaming the existing thread groups so that their process or procedure names are recognizable at the end of the test: Highlight the line which reads “HTTP(S) Test Script Recorder”. (Optional) Add an Include filter to only capture the URLs relevant to your application using the Requests Filtering tab. For example, with the escape character \ necessary for ‘.’, myhost.mycompany.mydomain becomes: myhost\.mycompany\.mydomain Now record a new thread group clicking the “Start” button: Once the control box shows up in the top left corner, click to open a browser and access the ThingWorx Navigate application. Then click on “View Parts List” or some other mashup: Once the mashup loads, search using a string and/or wildcard, or click on one of the recent results if any exist: Wait for the mashup to fully load with the details on that part or assembly, and then click “Stop” in the recording controller window: All of the HTTP requests performed in the process of loading and using this mashup will be added to the JMeter project here: Next, add a new thread group manually to the project: Highlight the newly created “Thread Group” (default name) and rename it to something that relates to the nature of that process: Drag and drop the new collection of requests so that it is considered a part of the new thread group: Then drag the whole group up so that it is next to the other thread groups in the test: In more complex projects, different thread groups may be added at different times, and each time, the service calls are all assigned an index (at the end of the request URL, for example: <request>-344). These indexes may not be unique depending on how and when the thread groups were created, especially in more complex tests. The easiest way to fix this issue is save the test from the JMeter GUI, then open the JMX file in a text editor and perform a find and replace within the relevant section of text. This is usually done using a regular expression for the number. For example, if the request name indexes are numbered -500 through -525, a regular expression to increase them to -700 through -725 would be (in Notepad++): Find: -5([0-9])([0-9]) Replace: -7\1\2 Note that if you do not use a Request Filter, sometimes the recorder will log URLs that are not part of the target application, like these “generate” samplers. These URLs are typically happening in the background of the browser to track performance, security and errors. These can be deleted: At other times, you will be repeating steps that are already part of another thread group, for example: logging in. This genidkey is a part of the login, as you can see if you look back at the login thread group. Because logging in is only necessary once, and it is assumed to be complete by the time the test starts on the second thread group, this entire section can be deleted: To see for sure if a request can be removed because it is called in a previous thread group, do a non-case-sensitive search for the name of the request: All of the requests found in this particular instance were performed in the previous thread group, so therefore this entire category can also be deleted: Another odd thing you may see (if you do not use a Request Filter with the recording feature) are “blank” requests like these: The recorder isn’t sure what to call these “non-requests”, so anything like this that isn’t an actual URL within the target application should be deleted. Static downloads should be disabled or deleted from scale testing since they are usually cached by the user browser client. In this ThingWorx example, there are static “MediaEntites” which can be deleted or disabled: Within the JMeter client there is no good way to highlight and reset them all at once, unfortunately. The easiest way to remove all of these at once is to open the JMX file in a text editor and use regex expressions for search and replace “enabled=true” with “enabled=false”. Most text editors have examples on how to use regular expressions within their Help topics. The above example is for Notepad++. Parameterize Thread Groups Parameterization is usually the part of creating a JMeter test that takes the most effort and knowledge. Some requests will require the same information for every thread, information which can therefore be defined statically within the JMeter element rather than being parameterized. Some values used within the JMeter test script can be parameterized as inputs in the top level of the test controller, for example: Duration, RampUp time, ApplicationHost, ApplicationPort. Other values may be unique to only one thread group and could be defined in a User Defined Variables element within that group controller. The value(s) used within a request can also be determined on the fly by the results of earlier requests within a thread group. These request results typically must be post-processed and parameterized for later thread elements to function correctly. The highest level values that are unique to each thread should be inputs from a CSV file that are passed into the threads as parameters, for example Username and Password. Data used within the test is usually parameterized in order to better emulate real world application use by multiple users. In the following example, we will parameterize the number of users for each thread group by adding a user- defined variable. Start by selecting the new thread group and parametrizing the number of threads (i.e. the number of users accessing this mashup at a time during the test). The way to enter a variable is with syntax like this: $(searchandviewpartstructure_threads) In this case, make this a user defined variable: or a variable for the whole project, by highlighting “Test Plan” and adding the information there. Begin looking at the samplers to see what types of things need to be parameterized in your test. Consider such things as: thread count (as shown above), ramp up time (also depicted above), duration, timings, roles, URL arguments, info table information, search result information, etc. Another example here parameterizes the search parameters for a query by adding an overall search string column to a CSV file (which can then be randomly generated by some other script): First, parameterize the body data of the request by highlighting the request, and changing the value of the desired field to something like this: $(searchString) Next, define the parameter under the Test Plan and set a default value: Now define the searchString column again as part of the CSV Data Set: Now it can be varied simply by providing different pseudo-random values with wildcards and/or known values in the CSV file. Post Processors and Extractors Most JMeter load tests become more complex when the results of one request are sent as parameters into later requests. This is done in JMeter by using Post Processors (Extractors), tools which facilitate extracting information out of the request results so it can be assigned to JMeter parameters. There are many different types of extractors which can process the results of previous requests: CSS Selector Extractor – commonly used extractor for values returned as html attributes JSON Extractor – processes JSON objects using regular expressions BeanShell Post Processor –facilitates using code scripts to process return text when needed Regular Expression Extractor – JMeter supports use of regular expressions on request results The JSON Extractor can be used to find and store information like the partOID number for a Windchill part as a parameter in JMeter, which can then be used to build more realistic workflows within the JMeter test. The example below steps you through setting up a JSON Post Processor. Start by right-clicking the request that contains the results of our search. Then click “Add” > “Post Processors”> “JSON Extractor”, as shown in the image below: The extractor will now show up under that request as a sub-menu item. Select it, and name the variable something easy to reference. For the JSON Path expressions, pull the object number or some other identifying characteristic out of the search results: $.rows[0].objNumber for example. Another option would be to take information like the partOID number send that into the search string field, by defining both as properties and having one refer to the other. To pull the partOID out, use a Regular Expression Extractor: Another thing to parameterize is the summary report result file name. Adding in the number of users and ramp up time can result in files that are easier to reference later being stored on your machine. We will cover generating and reviewing Summary Reports in full in the next article in this series. Conclusion In this article we saw how to create new thread groups, removing extraneous requests from those groups, and reduce the overall ambiguity of which thread groups are representing which processes or mashup calls. We also covered how to parameterize the individual requests as well as the summary report. Note that things like Windchill URL and hostname, search parameters and part IDs, timings, durations, offsets, anything at all that influence the results of the test, should not be hard-coded. It is better to create variables for these things to ensure that all of the various simulated activities are configured in the exact same way every time. That way, the system can be tested again and again under various strains and loads until the capabilities of the application are verified.

Jul 30, 2020

Setting Up the Azure Load Balancer with a ThingWorx High Availability Deployment Purpose In this post, one of PTC’s most experienced ThingWorx deployment architects, Desheng Xu, explains the steps to configure Azure Load Balancer with ThingWorx when deployed in a High Availability architectural model. This approach has been used successfully on customer implementations for several ThingWorx 7.x and 8.x versions. However, with some of the improvements planned for ThingWorx High Availability architecture in the next major release, this best practice will likely change (so keep an eye out for updates to come). Azure Load Balancer The overview article What is Azure Load Balancer? from Microsoft will give you a high-level understanding of load balancers in general, as well as the capabilities and limitations of Azure Load Balancer itself. For our purposes, we will leverage Azure Load Balancer's capability to manage incoming internet traffic to ThingWorx Platform virtual machine (VM) instances. This configuration is known as a Public Load Balancer. Important Note: Different load balancers operate at different “layers” of the OSI Model. Azure Load Balancer operates at Layer 4 (Transport Layer) – it is indifferent to the specific TCP Payload. As a result, you must either configure both the front-end and back-end to work on SSL, or configure both of them to work on non-SSL communications. “SSL Termination” or “TLS Offload” is not supported by Azure Load Balancer. Azure offers multiple different load balancing solutions. If you need some guidance on choosing the right one for you, I highly recommending reviewing the Microsoft DevBlog post Azure Load Balancing Solutions: A guide to help you choose the correct option. High-Level Diagram: ThingWorx High Availability with Azure Load Balancer To keep this article focused, we will not go into the setup of ThingWorx in a High Availability architecture. It will be assumed that ThingWorx is working correctly and the ZooKeeper cluster is managing failover for the Platform instances as expected. For more details on setting up this configuration, the best place to start would be the High Availability Administrator’s Guide. Planning In this installation, let's assume we have following plan (you will likely need to change these values for your own implementation): Azure Load Balancer will have a public facing domain name: edc.ptc.io Azure Load Balancer will have a public IP: 41.35.22.33 ThingWorx Platform VM instance 1 has a local computer name, like: vm1 ThingWorx Platform VM instance 2 has a local computer name, like: vm2 ThingWorx Preparation By default, the ThingWorx Platform provides a healthcheck end point at /Thingworx/Admin/HA/LeaderCheck , which can only be accessed with a credential configured in platform-settings.json : "HASettings": { "LoadBalancerBase64EncodedCredentials":"QWRtaW5pc3RyYXRvcjphZG1pbg==" } However, Azure Load Balancer does not permit this Health Check with a credential with current versions of ThingWorx. As a workaround, you can create a pings.jsp (using the attached JSP example code) in the Tomcat folder $CATALINA_HOME/webapps/docs . This workaround will no longer be needed in ThingWorx 8.5 and newer releases. There are two lines that likely need to be modified to meet your situation: The hostname in final String probeURL (line 10) must match your end point domain name. It's edc.ptc.io in our example, don’t forget to replace this with your real hostname! You also need to add a line in your local hosts file and point this domain name to 127.0.0.1 . For example: 127.0.0.1 edc.ptc.io The credential in final String base64EncodedCredential (line 14) must match the credential configured in platform-settings.json. Additionally: Don't forget to make the JSP file accessible and executable by the user who starts Tomcat service for ThingWorx. These changes must be applied to both ThingWorx Platform VM instances. Tomcat needs to be configured to support SSL on a specific port. In this example, SSL will be enabled on port 8443. Please make sure similar configuration is included in $CATALINA_HOME/conf/server.xml <Connector protocol="org.apache.coyote.http11.Http11NioProtocol" port="8443" maxThreads="200" scheme="https" secure="true" SSLEnabled="true" keystoreFile="/opt/yourcertificate.pfx" keystorePass="dontguess" clientAuth="false" sslProtocol="TLS" keystoreType="PKCS12"/> The values in keystoreFile and keyStorePass will need to be changed for your implementation. While pkcs12 format is used in above example, you can use a different certificate formats, as long as it is supported by Tomcat (example: jks format). All other parameters, like maxThreads , are just examples - you should adjust them to meet your requirements. How to Verify Before configuring the load balancer, verify that health check workaround is working as expected on both ThingWorx Platform instances. You can use following command to verify: curl -I https://edc.ptc.io:8443/docs/pings.jsp The expected result from active node should look like: HTTP/1.1 200 There will be three or more lines in output, depending on your instance configuration but you should be able to see the keyword: HTTP/1.1 200. Expected result from passive node should look like: HTTP/1.1 503 Load Balancer Configuration Step 1: Select SKU Search for “load balancer” in the Azure market and select Load Balancer from Microsoft Verify the correct vendor before you create a Load Balancer. Step 2: Create load balancer To create a proper load balancer, make sure to read Microsoft’s What is Azure Load Balancer? overview to understand the differences between “basic” and “standard” SKU offerings. If your IT policy only requires SSL communication to the outside but doesn't require a SSL communication in a health probe, then the “basic” SKU should be adequate (not considering zone redundancy). You have to decide following parameters: Region Type (public or Internal) SKU (basic or standard) IP address Public IP address name Availability zone PTC cannot provide specific recommendations for these parameters – you will need to choose them based on your specific business needs, or consult Microsoft for available offerings in your region. Step 3: Start to configure Once a load balancer is successfully created by Azure, You should be able to see: Step 4: Confirm frontend IP Click frontend IP configuration at left side and you should be able to see public IP address configuration. Please make sure to register this IP with your domain name ( edc.ptc.io in our example) in your Domain Name Server (DNS). If you unfamiliar with DNS configuration, you should consult with the administrator of your DNS server. If you are using Azure DNS, this Quickstart article on creating Azure DNS Zones and records may help. Step 5: Configure Backend pools Click Backend pools and click “Add” to add a backend pool definition. Select a name for your Backend pool (using ThingworxBackend in our example). Next step is to choose Virtual network. Once you select Virtual network, then you can choose which VM (or VMs) you want to put behind this load balancer. The VM should be the ThingWorx VM instance. In a high availability architecture, you will typically need to choose two instances to put behind this load balancer. Please Note: The “Virtual machine status” column in this table only shows VM status, but not ThingWorx status. ThingWorx running status will be determined by the health probe configured in the next step. Step 6: Configure Health Probe Health Probe will be used to determine the ThingWorx Platform’s running status. When a ThingWorx Platform instance is running as the leader, then it will give HTTP status code 200 during a health probe. The Azure Load Balancer will rely on this status code to determine if the platform is running properly. When a ThingWorx platform VM is not responding, offline, or not the leader in a High Availability setup, then this health probe will provide response with a different HTTP status code other than 200. For the health probe, select HTTPS for the protocol. In our example port 8443 is used, though another port can be selected if necessary. Then, provide the “/docs/pings.jsp” we created earlier as the probe’s path. You may need to change this path value if you put this file in a different location. Step 7: Configure Load balancing rules. Select “Load balancing rules” from left side and click “Add” Select TCP as protocol, in our example we are using 443 as front-end port and 8443 as back-end port. You can choose other port numbers if necessary. Reminder: Azure Load Balancer is a layer 4 (Transport Layer) router – it cannot differentiate between HTTP or HTTPS requests. It will simply forward requests from front-end to back-end, based on port-forwarding rules defined. Session persistence is not critical for current versions of ThingWorx as only one active node is currently permitted in a High Availability architecture. In the future, selecting Client IP may be required to support active-active architectures. Step 8: Verify health probe Once you complete this configuration, you can go to the $CATALINA_HOME/logs folder and monitor latest local_access log. You should see similar entries as pictured below - HTTP 200 responses should be observed from the ThingWorx leader node, and HTTP 503 responses should be observed from the ThingWorx passive node. In the example below, 168.63.129.16 is the internal IP Address of the load balancer in the current region. Step 9: Network Security Group rules to access Azure Load Balancer On its own, Azure Load Balancer does not have a network access policy – it simply forwards all requests to the back-end pool. Therefore, the appropriate Network Security Group associated with the backend resources within the resource group should have a policy to allow access to the destination port of the backend ThingWorx Foundation server (shown as 8443 here, for example). The following image displays an inbound security rule that will accept traffic from any source, and direct it to port 443 of the IP Address for the Azure Load Balancer. Enjoy!! With the above settings, you should be able to access ThingWorx via: https://edc.ptc.io/Thingworx (replacing edc.ptc.io with the hostname you have selected). Q&A Can I configure the health probe running on a port other than the traffic port (8443) in this case? Yes – if desired you can use a different port for the health probe configuration. Can I use different protocol other than HTTPS for health probe? Yes – you can use different protocol in the health probe configuration, but you will need to develop your own functional equivalent to the pings.jsp example in this article for the protocol you choose. Can I configure ZooKeeper to support the health probe? No – the purpose of the health probe is to inform the Load Balancer which node is providing service (the leader), not to select a leader. In a High Availability architecture, ZooKeeper determines which VM is the leader and talking with the database. This approach will change in future releases where multiple ThingWorx instances are actively processing requests. How well does Azure Load Balancer scale? This question is best answered by Microsoft – as a starting point, we recommend reading the DevBlog post: Azure Load Balancing Solutions: A guide to help you choose the correct option. How do I access logs for Azure Load Balancer? This question is best answered by Microsoft – as a starting point, we recommend reviewing the Microsoft article Azure Monitor logs for public Basic Load Balancer. Do I need to configure specifically for Websocket and/or AlwaysOn communication? No – Azure Load Balancer is a Layer 4 (Transport Protocol) router - it only handles TCP traffic forwarding. Can I leverage this load balancer to access all VMs behind it via ssh? Yes – you could configure Inbound NAT rules for this. If you require specific help in configuring this, the question is best answered by Microsoft. As a starting point, we recommend reviewing the Microsoft tutorial Configure port forwarding in Azure Load Balancer using the portal. Can I view current health probe status on a portal? No – Unfortunately there is no current approach to do this with Azure Load Balancer.

Jul 17, 2019

The IoT Enterprise Deployment Center’s goal is to create and share knowledge around the best practices for architecting, designing, and deploying successful, enterprise-scale Thingworx IoT Solutions. To accomplish this goal, the EDC team takes a “real world” approach, using simulated IoT assets and users to benchmark the capabilities of different Thingworx deployment configurations. First, each implementation is pushed to its limit in an effort to establish real-world baselines, metrics which can be used to help customers determine which architecture choices will work for their custom needs. Then, each implementation is pushed beyond its limits, providing useful insight into where and why things fail, and illuminating potential implementation changes which could push the boundaries further. Through the simulations testing to come, the EDC will be publishing the resulting benchmarks for all to see! These benchmarks will include details on implementation goals and performance metrics for different stages of deployment. Additionally, best-practice articles which illustrate how to deploy the different architectural components (those referenced within the benchmarks) will also be posted, highlighting the optimal approach to integrating everything into the Thingworx platform. Stay tuned to see more about just how versatile the ThingWorx Platform can be! We look forward to discussing these findings as they are published right here on the PTC Community.

Jul 17, 2019

The Property Set Approach This article details an approach developed by Prachi Rath and Roy Clarke, refined by the EDC team in the December 2019's Remote Monitoring of Assets Reference Benchmark , and used to handle multi-property business rules in an Enterprise ThingWorx application. Introduction If there are logic rules which depend upon multiple properties, and each property receives its updates one at a time, then each property will need to have an identical subscription, because there is no way for any one subscription to know the most up-to-date values for the other properties. This inefficient approach would create redundancy and sizing constraints, reducing the capacity of the application to scale up to the Enterprise level. The Property Set Approach resolves this issue by sending in all property updates as one Info Table or JSON property (called the “property set”), which can then have a single subscription. The property set is assembled on the Edge when an update needs to be sent, and then the Platform dissects, processes, and stores the data within this property set as required by the business logic. This approach also involves caching the last property value into a runtime variable so that it can be referenced within the business logic subscription without having to be retrieved from the database. This can significantly improve the runtime of the subscription, reducing the number of resources required to sustain the business logic and ensuring that any alerts or events resulting from the business logic occur as soon as possible. It also reduces the load on the database, ensuring that data ingestion can complete unhindered. So, while there are many benefits to this approach, it is also more complicated. It tightly couples the development of the Edge and Platform code and increases the application complexity, making it slightly less easy to maintain the application in the long run. The property set also requires a little more bandwidth and a more stable internet connection between Edge devices and the Platform since there is more metadata in an Info Table property, and therefore every update is slightly larger than it would be otherwise. So this approach is only recommended when multi-property rules are a requirement of the application and a stable internet connection exists between the Edge and Platform. Platform Implementation I. Create an Info Table (or JSON) Property This tutorial uses the out-of-the-box Data Shape called NamedVTQ for the Info Table property, which is defined on a Thing Template as a remote property. It is important that this is not marked as persistent or logged, as the purpose is to reduce the amount of database writes and reads required by the Platform. The Info Table property has the following property definition: <PropertyDefinition aspect.dataChangeType="ALWAYS" aspect.dataShape="NamedVTQ" aspect.isPersistent="false" baseType="INFOTABLE" isLocalOnly="false" name="numberPropertySetAsInfotable"/> II. Create a Data Change Event Subscription for the Info Table Property The subscription has three parts: Cache the last value for the property in a runtime variable Start off the business rules processing, sending in the whole Info Table Send the Info Table to be logged as individual local properties in the database // First step caches the last Value, refer to the next step… // Second step sets off the business rules processing with the Info Table me.ScaleTestBusinessRuleForPropertySetAsInfotable({ PropertySetAsInfotable: eventData.newValue.value }); // Third step sends the Info Table as one property into a service which parses it into the // individual properties, updating both the runtime properties on the remote thing and the database me.UpdatePropertyValues({ values: eventData.newValue.value /* INFOTABLE */ }); III. Set-Up Caching Each property which needs to be cached should be created on the Thing Template level and named in a similar way, say by placing the word “Last” at the end, such as “Property1” => “Property1Last”, “Property2” => “Property2Last”, etc. This property should NOT be logged or persistent, as the point of this is to store the most recent value in memory, removing any superfluous dependency on database queries in the process. Note that while storing the property in runtime memory makes it much more accessible, it also means that the property needs to be rewritten manually upon Platform restart. Additional code (not provided here) must be written to populate these properties from the database upon application start-up. The following code should be placed in the data change event subscription (option 1 in the case where only a few properties need caching, or option 2 if every property value needs to be cached): Option 1: Some but Not All Properties Need Caching // Names of properties for which you want to cache the last value var propertyNames = ['number1', 'number2']; // Loop through the properties and cache their time if they are found in the property set propertyNames.map(assignLast); // This function can be split into two functions for Age and Last separately if need be function assignLast(propertyName) { logger.debug("Looping for property -> "+ propertyName); var searchprop = new Object(); searchprop.name = propertyName; property = eventData.newValue.value.Find(searchprop); if(property){ logger.debug("Found Row. Name= " + property.name); var lastPropertyName = propertyName+"Last"; if(property.value) { // Set the cache property on me, this entity, to the current property value me[lastPropertyName] = me[propertyName]; } } else { logger.debug("Property Not Found in property set -> " + propertyName); } } Option 2: All Properties Need Caching var rowCount = eventData.newValue.value.getRowCount(); for(var i=0; i<rowCount; i++){ logger.warn("property name->" + eventData.newValue.value[i].name + "----- property new value->" + eventData.newValue.value[i].value.value); var propertyName = eventData.newValue.value[i].name; var lastPropertyName = propertyName+"Last"; me[lastPropertyName] = me[propertyName]; logger.warn("done last subscription, last property value for lastPropertyName" + me[lastPropertyName]); } Useful Platform Code Snippets I. Age Calculation var date1 = new Date(); var date2 = me.GetPropertyTime({ propertyName: propertyName /* STRING */ }); var result = millisToMinutesAndSeconds (dateDifference(date1, date2) ); // This function converts from an unintelligibly large number in milliseconds to something formatted in minutes and seconds function millisToMinutesAndSeconds(millis) { var minutes = Math.floor(millis / 60000); var seconds = ((millis % 60000) / 1000).toFixed(0); return (seconds == 60 ? (minutes+1) + ":00" : minutes + ":" + (seconds < 10 ? "0" : "") + seconds); } II. Sort the Info Table by Time var params = { sortColumn: "time" /* STRING */, t: me.propertySet/* INFOTABLE */, ascending: ascending /* BOOLEAN */ }; var result = Resources["InfoTableFunctions"].Sort(params); III. Search the Info Table for a Property var searchprop = new Object(); searchprop.name = propertyName; property = PropertySetAsInfotable.Find(searchprop); if(property === null){ logger.info("Property Not Found -> " + propertyNumber1); } else { logger.info("Found Row. Name= [" + property.name + "], value= " + property.value.value); } Edge Implementation This example implementation uses the .NET Edge SDK to build a property set Info Table at the Edge. I. Define the Data Shape A standard Data Shape is used (NamedVTQ), but because this Data Shape is not exposed in the Edge SDK code, it has to be created manually. // Data Shape definition for NamedVTQ FieldDefinitionCollection namedVTQFields = new FieldDefinitionCollection(); namedVTQFields.addFieldDefinition(new FieldDefinition(CommonPropertyNames.PROP_NAME, BaseTypes.STRING)); namedVTQFields.addFieldDefinition(new FieldDefinition(CommonPropertyNames.PROP_VALUE, BaseTypes.VARIANT)); namedVTQFields.addFieldDefinition(new FieldDefinition(CommonPropertyNames.PROP_TIME, BaseTypes.DATETIME)); namedVTQFields.addFieldDefinition(new FieldDefinition(CommonPropertyNames.PROP_QUALITY, BaseTypes.STRING)); base.defineDataShapeDefinition("NamedVTQ", namedVTQFields); II. Define the Info Table Property The property defined should NOT be logged or persistent, and it can be read-only, since data is always pushed from the Edge and read from the server cache when accessed on the Platform. Note that the push type of the info table property MUST be set to "ALWAYS" (if set to "VALUE", the data change event will only fire if the number of rows changes). // Property Set Definitions [ThingworxPropertyDefinition( name = "DevicePropertySet", description = "Alternative representation of properties as an Info Table for rules processing", baseType = "INFOTABLE", category = "Status", aspects = new string[] { "isReadOnly:true", "isPersistent:false", "isLogged:false", "dataShape:NamedVTQ", "cacheTime:0", "pushType:ALWAYS" } ) ] III. Define a Property to Store the GOOD Quality Status private static String QUALITY_STATUS_GOOD = QualityStatus.GOOD.name(); IV. Define Functions to Populate the Value Collections An Info Table is really just made up of many Value Collections, where each Value Collection is considered a row. These services take in the name and value of a property and return a Value Collection object which can be added to the property set Info Table. public ValueCollection createNumberValueCollection(String name, double value) { ValueCollection vc = new ValueCollection(); // Add quality and time entries to the Value Collection vc.SetStringValue(CommonPropertyNames.PROP_QUALITY, QUALITY_STATUS_GOOD); vc.SetDateTimeValue(CommonPropertyNames.PROP_TIME, new DatetimePrimitive(DateTime.UtcNow)); vc.SetStringValue(CommonPropertyNames.PROP_NAME, name); vc.SetNumberValue(CommonPropertyNames.PROP_VALUE, value); return vc; } public ValueCollection createBooleanValueCollection(String name, Boolean value) { ValueCollection vc = new ValueCollection(); // Add quality and time entries to the Value Collection vc.SetStringValue(CommonPropertyNames.PROP_QUALITY, QUALITY_STATUS_GOOD); vc.SetDateTimeValue(CommonPropertyNames.PROP_TIME, new DatetimePrimitive(DateTime.UtcNow)); vc.SetStringValue(CommonPropertyNames.PROP_NAME, name); vc.SetBooleanValue(CommonPropertyNames.PROP_VALUE, value); return vc; } V. Build the Property Set Call this code from the processScanRequest method to build the property set. // Create an instance of a new Info Table using the standard "NamedVTQ" Data Shape InfoTable propertySet = new InfoTable(getDataShapeDefinition("NamedVTQ")); // Set name/value for Temperature using convenience function propertySet.addRow(createNumberValueCollection("Temperature", temperature)); // Set name/value for Pressure using convenience function propertySet.addRow(createNumberValueCollection("Pressure", pressure)); // Set name/value for TotalFlow using convenience function propertySet.addRow(createNumberValueCollection("TotalFlow", this._totalFlow)); // Set name/value for InletValve using convenience function propertySet.addRow(createBooleanValueCollection("InletValve", inletValveStatus)); // Set name/value for FaultStatus using convenience function propertySet.addRow(createBooleanValueCollection("FaultStatus", faultStatus)); // Set the property set Info Table property base.setProperty("DevicePropertySet", propertySet); VI. Update the subscribed properties These two lines of code update the properties and events, actually sending the property set (containing all property updates) to the Platform. base.updateSubscribedProperties(15000); base.updateSubscribedEvents(60000); Conclusion Following these steps will enable the Edge to build a property set before sending any property updates to the Platform. The Platform can then rely on caching to process the business logic with no database dependency, which is faster and more efficient than any other approach. Finally the updates are still written to the database, so in the end, there is no functional difference between using a property set and binding each property individually. Please don't hesitate to comment here with any questions about this approach.

Feb 21, 2020

ThingWorx 8.5 Sizing Guide Sizing is a very important part of the application design process, answering such questions as: how much hardware is required? What specifications does this hardware need to handle the expected load? And therefore, what is the overall cost of setting up and maintaining the ThingWorx environment? Properly sizing the environment before development begins ensures that there are no unexpected costs or limitations to application functionality later on down the road. "Hardware sizing is driven by many factors - some more easily calculated than others", as stated in the new ThingWorx 8.5 Sizing Guide. "Measures like data streaming frequency (the data ingestion component) and HTTP request volume (the data visualization component) are easily calculated... However, sizing considerations for the data processing component of the application can depend largely on business use cases and application design." Enterprise-Ready applications have the capacity to handle all aspects of an IoT application, from data ingestion and processing to data visualization, as detailed in the friendly infographic above, which many will recognize from LiveWorx. Inside the ThingWorx 8.5 Sizing Guide, there are formulas designed to help size the more analytical aspects of the application, as well as descriptions of other factors and how they (conceptually) play a role in sizing. There are also two application design examples which step the reader through the calculations, the comparisons, and the selections of hardware for each use case. New in this version, these examples have been simulated in the real world to prove that the theory behind these calculations is sound, and to demonstrate the full process of designing, sizing, and testing an application. One of the examples (shown here) sizes a Connected Product Solution, something which has many, many remote things in the field, each writing to the Platform at a slower rate, for consumption by a large number of general users, who don't access the same mashups many times nor refresh their view very often. The second example is much more complex, modeling an industrial use-case, where there are many different kinds of users each accessing the mashups many times, fewer things, and more variations in the types of properties each thing possesses. These examples are designed to help anyone with any use case step through the sizing of their application properly. Please check out the new ThingWorx 8.5 Sizing Guide, especially because each version of ThingWorx is different and must be sized accordingly. Comments and questions about the guide are very welcome right here on this thread!

Jan 30, 2020

Leveraging Dell and VMWare for Asset Monitoring in Connected Factories As an extension of the Connected Factory Reference Benchmark performed on Microsoft Azure , PTC partnered with Dell Technologies in producing this document, a baseline which illustrates the effectiveness of ThingWorx and Kepware when combined with Dell and VMWare technologies to create solutions for on-premises and hybrid Connected Factory implementations. Please join us in thanking Bhagyashree Angadi, Brian Anzaldua, Todd Edmunds, Mike Hayes, and the Dell Customer Solution Center team in Limerick, Ireland for working with the IOT Enterprise Deployment Center on this benchmark! This benchmark is of a very similar design to a previous publication, but this time designed specifically with Dell Technologies in mind. In a Dell/VMWare architecture, the close proximity of Kepware Server and ThingWorx Foundation provides ideal conditions for network throughput between these components. Combined with the ability to easily monitor and resize virtual machines as your business needs evolve, these hardware configurations can be very effective in on-premises or hybrid deployment scenarios.

Oct 8, 2020

May 15, 2020

The DPM User Experience Written by Tori Firewind, IoT EDC Team As discussed in a previous post, DPM is a tool designed to be beneficial at all levels of a company, from the operators monitoring automated data on production events from the factory machines themselves, to the production supervisors who need to establish, task out, and track machine maintenance and improvement measures. DPM also engages the continuous improvement and plant leadership, by providing a standardized way to monitor performance that ultimately rolls up to the executive level. The end users of DPM are therefore diverse both in how they access DPM, and how they make use of its various features. One of the perks to building DPM on top of the ThingWorx Foundation is that many of the webpages (called “mashups”) within ThingWorx are already responsive, and any which aren’t responsive OOTB can be modified and custom designed for different size viewing screens to ensure that if necessary, end users can access DPM from a variety of locations and devices. Most of the time, end users will be accessing mashups from hard-wired dashboards mounted on the actual devices, or from wireless laptops which have standard size screens with standard resolutions. For use cases involving phones or tablets, however, it may be necessary to see how DPM will perform across a variety of bandwidth and latency conditions. Often, cellular or satellite connection is a must to facilitate field team cooperation, and 5G networks often result in worsened performance. So, to demonstrate the influence of bandwidth and latency on the responsiveness of DPM, the Production Dashboard was loaded in the Google Chrome browser repeatedly under varying conditions. This dashboard is the webpage most operators and field users would access to log event information and production details (so it is widely used by end users). This provides a sort of benchmark of the DPM solution, something which indicates what can be expected and tells us a few things about how DPM should be deployed and configured. Latency was introduced by hosting the servers involved in the test in different regions (all Azure cloud hosted servers, one in US East, one US West, and one in Japan East). Bandwidth was introduced using a tool on the PC with either no bandwidth or 4 megabits/second. Browser caching was turned on and off as well, to simulate the difference between new and return users; new users would not have the webpage cached, so their load times are expected to be longer. Tomcat compression was also configured in half of the runs to demonstrate the importance of compression for optimal performance. Each of these 24 scenarios was then tested 10 times from each location, and the actual data can be found in the attached benchmark document (a working solution benchmark, which is not designed to be referenced directly, as matters of infrastructure may influence the exact performance of the solution). Even with bandwidth, every region sees better performance for return users versus new users, which may be important to note. However, because DPM field users most commonly access DPM often, the return user time is a better indicator of adoption, and those numbers look great in our simulations. Notice the top line which shows the very worst of mobile performance, what happens over networks with bandwidth when Tomcat Compression is not enabled. Load times vary only slightly for regular networks when Tomcat Compression is enabled, and they vastly improve performance across regions and on mobile networks, so it is highly recommended (instructions on how to enable are below). Key Takeaways Latency and bandwidth impact DPM performance in exactly the way one would expect of a web application. While any DPM server can be accessed from any region, regions with more latency will experience delays proportional to the amount of latency. In the chart here, find the three regions represented three times by three different colors (different from the charts above): The three different shades of each color represent the different regions Green represents the optimal configuration settings (Tomcat compression enabled, caching turned on) for returning users with bandwidth limitations (i.e. mobile networks like 5G) Blue shows first-time page visitors with no bandwidth limitations Purple shows first-time visitors that do have bandwidth limitations The uncompressed first-time load for mobile users (those with bandwidth limitations imposed) within the same region is also given to demonstrate the importance of enabling Tomcat Compression (load times only get worse without compression the farther the region) Notice how the green series has lower load times across the board than the blue one, meaning that return users even with bandwidth limitations have better performance across every region than new users. Also notice how the gap is larger between lighter colors and darker colors, where the darker the color, the farther the region from the DPM servers. This indicates that network latency has a more significant influence on performance versus bandwidth, with only longer running transactions like file uploads seeing a significant performance hit when on a network with bandwidth limitations. Find out how to enable tomcat compression and review the full solution benchmark in the document attached.

Apr 25, 2022

5 Common Mistakes to Developing Scalable IoT Applications by Tori Firewind and the IoT EDC Team Introduction To build scalable applications, it’s necessary to identify common mistakes and avoid them at the early stages of development. In an expert session this past month, the PTC Enterprise Deployment Team elaborated on why scalability is important and how to avoid the common development pitfalls in IoT. That video presentation has been adapted here for visual consumption of the content as well. What is Scalability and Why Does it Matter Enterprise ready applications can scale and easily be maintained, which is important even from day 1 because scalability concerns are the largest cause for delays to Go Lives. Applications balance many competing requirements, and performance testing is crucial to ensure an application is ready for Go Live. However, don't just test how many remote assets can connect at once, but also any metrics that are expected to increase in time, like the number of remote properties per thing, the frequency of reporting from those properties, or the number of users accessing the system at once. Also consider how connecting more assets will affect the user experience and business logic, and not just the ability to ingest data. Common Mistake 1: Edge Property Updates Because ThingWorx is always listening for updates pushed from the Edge and those resources are always in use, pulling updates from the Foundation side wastes resources. Fetch from remote every read is essentially a round trip, so it's slower and more memory intensive, but there are reasons to do it, like if the quality tag is needed since the cache doesn't store it. Say a property is pushed at 11:01, and then there's a network issue at 11:02. If the property is pulled from the cache, it will pull the value sent at 11:01 without any indication of there being a more recent value on the Edge device. Most people will use the default options here: read from server cache, which relies on the Edge to push updates, and the VALUE push type, and configuring a threshold is a good idea as well. This way, only those property updates which are truly necessary are sent to the Foundation server. Details on property aspects can be found in KCS Article 252792. This is well documented in another PTC Community post. This approach is necessary and considered a best practice if there is event logic which depends on multiple properties at once. Sending all of the necessary properties to determine if an event should fire in one Infotable ensures there is no need to query the database each time a property update comes in from the Edge, which ensures independent business logic and reduces the load on the database to improve ingestion performance. This is a very broad topic and future articles will address it more specifically. The When Disconnected property aspect is a good way to configure what happens with Edge property values in a mass disconnect scenario. If revenue depends on uptime, consider losing any data that changes while a device is disconnected. All of the updates can be folded into a single value if the changes themselves aren't needed but an updated value is needed to populate remote properties upon reconnect. Many customers will want to keep all of their data, even when a device is offline and use data stores. In this case, consider how much data each Edge device can store (due to memory limitations on the devices themselves), and therefore how long an outage can last before data is lost anyway. Also consider if Foundation can handle massive spikes in activity when this data comes streaming in. Usually, a Connection Server isn't enough. Remember that the more data needs to be kept, the greater the potential for a thundering herd scenario. Handling a thundering herd scenario goes beyond sizing considerations. It is absolutely crucial to randomize the delay each device will wait before attempting to reconnect. It should be considered a requirement to have the devices connect slowly and "ramp up" over time for multiple reasons. One is that too much data coming in too fast could overwhelm the ingestion queue and result in data loss. Another is that the business logic could demand so many system resources, that the Foundation server crashes again and again and cannot be recovered. Turning off the business logic it isn't possible if the downtime is unexpected, so definitely rely instead on randomized reconnection times for Edge devices. Common Mistake 2: Overlooking Differences in HA To accommodate a shared thing model across many servers, changes had to be made in how the thing model is stored and the model tree is walked by the Foundation servers. Model information is no longer cached at the Thing level, and the model tree is therefore walked every time model information is needed, so the number of times a Thing is directly referenced within each service should be limited (see the Help Center for details). It's best to store whatever information is needed from a Thing in an Infotable, making the Things[thingName] reference a single time, outside of any loops. Storing the property definitions outside of the loop prevents the repetitious Thing references within the service, which otherwise would have occurred twice for each property (for both the name and the description), and then again for every single property on the Thing, a runtime nightmare. Certain states previously held in memory are now shared across the cluster, like property values, Thing states, and connection statuses. Improvements have been made to minimize the effects of latency on queries, like how they now only return property values on associated Thing Shapes or Thing Templates. Filtering for properties on implementing Things is still possible, but now there is a specific service to do it, called GetThingPropertyValues (covered in detail in the Help Center). In the script shown above, the first step is a query to get the names of all implementing things of a particular Thing Shape. This is done outside of any loops or queries, so once per service call. Then, an Infotable is built to store what would have been a direct reference to each thing in a traditional loop. This is a very quick loop that doesn't add much by way of runtime since it is all in memory, with no references to the thing model or the database, instead using the results of the first query to build the Infotable. Finally, this thing reference Infotable is passed into the new service GetThingPropertyValues to retrieve all of the property info for all of these things at once, thereby only walking the thing model once. The easiest mistake people would make here is to do a direct thing reference inside of a loop, using code like Things[thingName].Get() over and over again, thereby traversing the thing model repeatedly and adding a lot of runtime. QueryImplementingThingsOptimized is another new service with new parameters for advanced configuration. Searches can now be done on particular networks or to particular depths, and there's an offset parameter that allows for a maximum number of items to be returned starting at any place in the list of Things, where previously if you needed the Things at the end of the list, you had to return all of the Things. All of these options are detailed in the Help Center, as well as the restrictions listed in the image above. Common Mistake 3: Async Service Misuse Async services are sometimes required, say if a user has to trigger many updates on many remote things at once by the click of a button on a mashup that should not be locked up waiting for service completion. Too many async service calls, though, result in spikes in activity and competition for resources. To avoid this mistake, do not use async unless strictly necessary, and avoid launching too many async threads in parallel. A thread dump will show how many threads there are and what they are doing. Common Mistake 4: Thread Pool Overload Adding more threads to the pool may be beneficial in certain circumstances, like if the threads are waiting on other resources to complete their tasks, look stuff up in the database (I/O), or unlock data that can only be accessed one thread at a time (property writes). In this case, threads are waiting on other resources, and not the CPU, so adding more threads to the pool can improve performance. However, too many threads and performance degradation will occur due to increased contention, wasted CPU cycles, and context switches. To check if there are too many or not enough threads in the pool, take thread dumps and time the completion of requests in the system. Also watch the subsystem memory usage, and note that the side of the queue should never approach the max. Also consider monitoring the overall performance of the system (CPU and Memory) with a tool like Grafana, and remember that a good performance test properly exercises all of the business logic and induces threads in a similar way to real world expectations. Common Mistake 5: Stream Etiquette Upserts, or updates to database tables, are expensive operations that can interfere with ingestion if they are performed on the wrong tables. This is why Value Stream and Stream data should never be updated by end users of the application. As described in the DGIS document on best practices, aggregation is the key to unlocking optimal performance because it reduces the size of database tables that require upserts. Each data structure shown here has an optimal use in a well-designed ThingWorx application. Data Tables are great for storing overview information on all of the Things in one view, and queries on this data source are the fastest. Update this data source as often as possible (by timer), allowing enough time for updates to be gathered and any necessary calculations made. Data Tables can also be updated by end users directly because each row locks one at a time during updates. Data Tables should be kept as small as possible to improve performance on mashups, so for instance, consider using one to show all Things per region if there are millions of Things. Roll up information is best stored here to avoid calculations upon mashup load, and while a real-time view of many thousands of things at once is practically impossible, this option allows for a frequently updates overview of many things, which can also drill down to other mashup views that are real-time for one Thing at a time. Value Streams are best used for data ingestion, and queries to these should be kept to a minimum, largely performed by the roll up logic that populates the Data Tables mentioned above. Queries that chart all of the data coming in are best utilized on individual Thing views so that only a handful of users are querying the same data sources at a time. Also be sure to use start and end dates and make use of the "source" field to improve query performance and create a better user experience. Due to the massive size of the corresponding database tables, it's best to avoid updating Value Streams outside of the data ingestion process altogether. Streams are similar, but better for storing aggregated, historical data. Usually once per day or per week (outside of business hours if possible), Value Stream data will be smoothed or reduced into less data points and then stored into Streams. This allows for data to be stored for longer periods of time on the server without using up as much memory or hurting query performance. Then the high volume ingested data sources can be purged frequently, as discussed below. Infotables are the most memory intensive, and are really designed to hold only a small number of rows at a time, usually to facilitate the business logic. Sometimes they will be stored in Streams or Data Tables if they aren't expected to grow larger (see the DGIS Coffee Machine App for an example). Infotables should never be logged; if they are used to transmit Edge property updates (like in the Property Set Approach), they should be processed into other logged (usually local) properties. Referring to the properties themselves is how to get real-time information on a mashup, say by using the GetProperties service and its auto-update option, which relies on internal websockets. This should be done on individual Thing views only, and sizing considerations need to be made if there will be many of these websockets open at once, say if there are many end users all viewing real-time data at a time. In the newer versions of ThingWorx, these cannot be updated directly, so find the system object called ThingWorxPersistenceProvider and use the service UpdateStreamDataProcessingSettings. ThingWorx Foundation processes data received from remote devices in batches in order to manage the data flow and reduce database churn. All of these settings configure how large those batches are and how frequently they are flushed to the database (detailed in full in KCS Article 240607). This is very advanced configuration that heavily depends on use case and infrastructure, but some info applies to most people: adjusting the scan rate is usually not beneficial; a healthy queue should never approach the max limit; and defaults differ by database because they function differently. InfluxDB generally works better when there are less processing threads and higher numbers of things per thread, while PostgresDB can have a lot of threads, preferably with less things per thread. That's why the default values shown here are given as the same number of threads (and this can be changed), but Influx has a larger block size and size threshold because it can handle more items per thread. Value Streams ingest all data into the Foundation server, and so the database tables that correspond with these data sources grow very large, very quickly and need to be purged often and outside of business hours, usually once a day or once per week. That's why it's important to reduce the data down to less points and push them into Streams for historical reference. For a span of years, consider a single point a day might be enough, for a span of hours, consider a data point a minute. Push aggregated data into Streams and then purge the rest as soon as it is no longer needed. In Conclusion

Jun 29, 2021

Thread Safe Coding, Part 2: The Database Locker Approach and Comparison Written by Desheng Xu and edited by @vtielebein Overview This is the second on this topic, describing an alternate approach to thread safe coding than one which requires the Java extension. The demo use case here is the same as in the previous post, and there is a section at the end comparing the two approaches. Database Locker for Thread Safe Coding The database locker is an advanced topic, so some experience with the database thing is assumed. The following steps demonstrate how to be thread safe with a database thing. Create New Database Instance, and New Table for counter It is strongly recommended that a new database instance be created outside of the ThingWorx database schema. This guide will NOT include instructions to create the new database instance. Use the following SQL commands to create a new table: DROP table IF EXISTS counters; CREATE TABLE counters ( name VARCHAR(100) unique , value integer NULL, PRIMARY KEY(name) ); INSERT INTO counters values('DemoCounter',0); This will create a new table called counters, initializing the first counter, called DemoCounter with the value 0. Create a Function to Increase and Return the New counter Value Use the following sample code to create a table lock function: CREATE OR REPLACE FUNCTION IncreaseCounter(coutner_name VARCHAR(100), OUT newvalue INTEGER) AS $$ BEGIN LOCK TABLE counters IN ACCESS EXCLUSIVE MODE; SELECT(SELECT value FROM counters WHERE name = $1) + 1 INTO newvalue; UPDATE counters SET value = newvalue WHERE name = $1; END; $$ language plpgsql; Or use the following SQL command to create a new row level locker function: CREATE OR REPLACE FUNCTION IncreaseCounter(counter_name VARCHAR(100), OUT newvalue INTEGER) AS $$ BEGIN SELECT value FROM counters WHERE name = $1 FOR UPDATE INTO newvalue; newvalue := newvalue + 1; UPDATE counters SET value = newvalue WHERE name = $1; END; $$ language plpgsql; Create a Database Thing Create a thing with the template "database" within ThingWorx, and use the PostgreSQL Driver to connect to the new database instance created above. Create New Services in the Database Thing The service IncreaseCounterDB would be a SQL Query service: SELECT * FROM public.IncreaseCounter([[counter_name]); counter_name would be the input parameter, a STRING which is marked as required. The service GetCounterDB would be another SQL Query service: SELECT value FROM public.counters WHERE name=[[counter_name]] LIMIT 1; counter_name would be another input parameter, a STRING which is also marked as required. The service ResetCounterDB would be a SQL Command service: UPDATE public.counters SET value = 0 WHERE name=[[counter_name]]; counter_name is yet another input parameter, also a STRING and also required. Wrap the Database Thing Service The above database thing service will return an InfoTable, but not an integer. If it's inconvenient to use an InfoTable, wrap the service up into a local Javascript service and return an integer value. The service IncreaseCounter is a wrap up of IncreaseCounterDB and returns an integer value: // result: INFOTABLE dataShape: "" var query_result = me.IncreaseCounterDB({ counter_name: 'DemoCounter' /* STRING */ }); var result = query_result.rows[0]["newvalue"]; Similarly wrap up GetCounter into GetCounterDB: // result: INFOTABLE dataShape: "SingleIntegerDatashape" var query_result = me.GetCounterDB({ counter_name: 'DemoCounter' /* STRING */ }); var result = query_result.rows[0]["value"]; And ResetCounter into ResetCounterDB: // result: NUMBER var query_result = me.ResetCounterDB({ counter_name: 'DemoCounter' /* STRING */ }); var result = 0; Run the Test Again If necessary, head back to the previous post to obtain the tool. Then just change the end point and run a new test: { "host":"twx85.desheng.io", "port":443, "protocol":"https", "endpoint":"/Thingworx/Things/DatabaseDemo/services/IncreaseCounter", "headers":{ "Content-Type":"application/json", "Accept": "application/json", "AppKey":"5cafe6eb-adba-41df-a7d6-4fc8088125c1" }, "payload":{}, "round_break":50000, "req_break":0, "round_size":50, "total_round":20 } Run: Validate the Result Execute the service GetCounter to validate the result: Overall Performance Comparison The Java Extension performance looks the best here, but the database row lock will perform better if there are multiple counters. InfoTable Type Property InfoTable properties have the same thread-safe challenges discussed previously, but they also have some additional challenges due to the way data change events are triggered. This is outside of the scope of this document, but it is worth a very brief mention here. In general, the data change event for an InfoTable fires when the reference to the table is updated, and not the contents of the table. If the values of an InfoTable are updated directly, say by adding or removing a row, then the data change event will not be triggered because the value has technically not changed. Instead, the InfoTable has to be cloned, then modified, and then assigned back to the Thing so that the reference changes as well. Such additional considerations must be made when using other property types than those shown here.

Dec 10, 2020

Is your team operating an effective DevOps pipeline? DevOps is an important part of a mature, enterprise ready application, but the process isn’t simple. This expert session will focus on how a full DevOps pipeline looks like and how PTC can help to build a seamless pipeline. Join us for our upcoming Expert Session to learn how to create a Docker image, integrate Azure with Docker and Git, and set up a seamless DevOps pipeline. When? Thursday, September 30th 2021 | 11 AM EST Host: Tori FIrewind, Senior Engineer in PTC IOT Enterprise Deployment Center Registration link: https://www.ptc.com/en/resources/iiot/webcast/devops-pipeline-thingworx

Sep 22, 2021

Update to Connected Factories Benchmark Scenario Three: One Kepware Server in ThingWorx 9.0 The goal of this scenario is to confirm the same performance in ThingWorx 9.0 as seen in scenario one, where one Kepware Server represented a single factory in version 8.5. Matrix 1 - Slow (15s slow properties, 1s fast) The lower frequency tests performed the same in 9.0. Even the 10k ingestion test, which lies very close to the boundary for a single Kepware Server, passed with no errors. Matrix 2 – Fast (5s slow properties, 500ms fast) These showed similar results, but the 500 thing, 50-10 property test had data loss in 9.0. However, the write rate is much higher than PTC recommends for a single Kepware Server anyway. Matrix 3 – Faster (1s slow properties, 200ms fast) The fastest tests had similar results as well. The larger tests ran with more success with two Kepware Servers (data not shown here). Conclusions ThingWorx 9.0 is similarly capable of ingesting data using Kepware Server. A single instance can still achieve up to 10k wps. Future scenarios will now make use of ThingWorx 9.0. Download the updated draft here!

Oct 22, 2020

ThingWorx Docker Overview and Pitfalls to Avoid by Tori Firewind of the IoT EDC Containers are isolated and can run side-by-side on the same machine, but they share the host OS, making them more efficient in terms of memory usage and scalability. Docker is a great tool for deploying ThingWorx instances because everything is pre-packaged within the Docker image and can be stored in a repository ready for deployment at any time with little configuration required. By using a different container for every component of an application, conflicting dependencies can be avoided. Containers also facilitate the dev ops process, providing consistent application deployments which can be set up, taken down, and tested automatically using scripts. Using containers is advantageous for many reasons: simplified configuration, easier dev ops management, continuous integration and deployment, cost savings, decreased delivery time for new application versions, and many versions of an application running side-by-side without any wasted resources setting them up or tearing them down. The ThingWorx Help Center is a great resource for setting up Docker and obtaining the ThingWorx Docker files from the PTC Software Downloads website. The files provided by PTC handle the creation of the image entirely, simplifying the process immensely. All one has to do is place the ThingWorx version and all of the required dependencies in the staging folder, configure the YML file, and run the build scripts. The Help Center has all of the detailed information required, but there are a few things worth noting here about the configuration process. For one thing, the platform-settings.json file is generated based on the options given in the YML file, so configuration changes made within this configuration file will not persist if the same options aren’t given in the YML file. If using Docker Desktop to run an image on a Windows machine, then the configuration options must be given in an ENV file that can be referenced from the command used to start the image. The names of the configuration parameters differ from the platform-settings.json file in ways that are not always obvious, and a full list can be found here. For example, if extension imports need to be enabled on a ThingWorx instance running in Docker, then the EXTPKG_IMPORT_POLICY_ENABLED option must be added to the environment section of the YML file like this: environment: - "CATALINA_OPTS=-Xms2g -Xmx4g" # NOTE: TWX_DATABASE_USERNAME and TWX_DATABASE_PASSWORD for H2 platform must # be set to create the initial database, or connect to a previous instance. - "TWX_DATABASE_USERNAME=dbadmin" - "TWX_DATABASE_PASSWORD=dbadmin" - "EXTPKG_IMPORT_POLICY_ENABLED=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JARRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JSRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_CSSRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JSONRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_WEBAPPRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_ENTITIES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_EXTENTITIES=true" - "EXTPKG_IMPORT_POLICY_HA_COMPATIBILITY_LEVEL=WARN" - "DOCKER_DEBUG=true" - "THINGWORX_INITIAL_ADMIN_PASSWORD=Pleasechangemenow" Note that if the container is started and then stopped in order for changes to the YML file to be made, the license file will need to be renamed from "successful_license_capability_response.bin" to "license_capability_response.bin" so that the Foundation server can rename it. Failing to rename this file may cause an error to appear in the Application Log, and the server to act as if no license was ever installed: "Error reading license feature info for twx_realtime_data_sub". In Docker Desktop on a Windows machine, create a file called whatever.env and list the parameters as shown here: Then, reference this environment file when bringing up the machine using the following command in Powershell: docker run -d --env-file h2.env -p 8080:8080 -v ${pwd}/ThingworxPlatform:/ThingworxPlatform -v ${pwd}/ThingworxStorage:/ThingworxStorage -it <image_id> Notice in this command that the volumes for the ThingworxPlatform and ThingworxStorage folders are specified with the “-v” options. When building the Docker image in Linux, these are given in the YML file under the volumes section like this (only change the path to local mount on the left side of the colon, as the container mount on the right side will never change): volumes: - ./ThingworxPlatform:/ThingworxPlatform - ./ThingworxStorage:/ThingworxStorage - ./tomcat-logs:/opt/apache-tomcat/logs Specifying the volumes this way allows for ThingWorx logs and configuration files to be accessed directly, a crucial requirement to debugging any issues within the Foundation instance. These volumes must be mapped to existing folders (which have write permissions of course) so that if the instance won’t come up or there are any other issues which require help from Tech Support, the logs can be copied out and shared. Otherwise, the Docker container is like a black box which obscures what is really going on. There may not be any errors in the Docker logs; the container may just quit without error with no sign of why it won’t stay up. Checking the ThingWorx and Tomcat logs is necessary to debugging, so be sure to map these volumes correctly. Once these volumes are mapped and ThingWorx is successfully making use of them, adding a license file to the Docker instance is simple. Use the output in the ThingworxPlatform folder to obtain the device ID, grab a valid license file, and put it right back into that ThingworxPlatform folder, exactly the same way as on a regular instance of ThingWorx. However, if the Docker image is being used for a dev ops process, a license may not be necessary. The ThingWorx instance will work and allow development for a time before the trial license expires, which normally will be enough time for developers to make their changes, push those changes to a repository, and tear the container down. Another thing worth noting about ThingWorx Docker image creation is that the version of Java supplied in the staging folder must match the compatibility requirements for each version of ThingWorx. This is the version of Java used by the container to run the Foundation server. In versions of ThingWorx 9.2+, this means using the Amazon Corretto version of Java. The image absolutely will not start ThingWorx successfully if older versions of Java are used, even if the scripts do successfully build the image. Also note that in the newer versions of ThingWorx Docker, the ThingWorx Foundation version within the build.env file is used throughout the Docker image creation process. Therefore, while the archive name can be hard-coded to whatever is desired, the version should be left as is, including any additional specifications beyond just the version number. For example, the name of the archive can be given as Thingworx-Platform-H2-9.2.0.zip (a prettier version of the archive name than is used by default), but the PLATFORM_VERSION should still be set to 9.2.0-b30 (which should be how it appears within the build.env file upon download of the ThingWorx Docker files). Paying attention to every note in the Help Center is critically important to using ThingWorx Docker, as the process is extensive and can become very complicated depending on how the image will be used. However, as long as the volumes are specified and the log files accessible, debugging any issues while bringing up a Docker-contained ThingWorx instance is fairly straightforward. Credits: Images borrowed from ThingWorx Docker Containerization Tech Talk by Adrian Petrescu

Jul 30, 2021

Dev Ops is a crucial process that exists in any software setting, whether you plan on it or not. Chaos in the dev ops process, say because less time is spent here than on the shiny new features that are easy to sell, results in bottlenecks in the dev ops process. Bottlenecks reduce efficiency, and leave you open to vulnerabilities as well. The faster you can get a change properly tested and safely into production, the safer and more stable the system is all along. Issues will arise, they always arise. Are you ready for them? Watch this video, see some of these additional links, and think about your dev ops process now, before the fires start! Useful Links: ThingWorx Monitoring and Alerting Using Prometheus and Grafana, Part 1 ThingWorx Monitoring and Alerting Using Prometheus and Grafana, Part 2 Overview of Monitoring Tools and Diagnostics The System Health Timer

Jan 31, 2023

IoT & Connectivity Tips

ThingWorx DevOps with Azure: The Comprehensive DevOps Guide

ThingWorx Monitoring and Alerting: Using Prometheus and Grafana, Part 2

How to Scale Vertically and Horizontally, and When to Use Sharding

IOT EDC Reference Benchmark - New Scenario Using Multi-Kepware for Connected Factories

Distributed Testing with JMeter

IOT EDC Reference Benchmark - ThingWorx and Azure IoT Hub

Smoothing Large Data Sets

Building More Complex Tests in JMeter

Setting Up Azure Load Balancer with a ThingWorx High Availability Deployment

Introducing the IoT Enterprise Deployment Center

The Property Set Approach

Announcing the Release of the ThingWorx 8.5 Sizing Guide!

IOT EDC Reference Benchmark - Leveraging Dell and VMWare for Asset Monitoring in Connected Factories

ThingWorx 8.5 Architecture Deployment Guide Update

The DPM User Experience

5 Common Mistakes for Developing Scalable IoT Applications

Thread Safe Coding in ThingWorx, Part 2: The Database Locker Approach and Comparison

Live Webinar: Setting up a DevOps pipeline in ThingWorx on September 30th

IOT EDC Reference Benchmark - Updated Connected Factories Benchmark with ThingWorx 9.0

ThingWorx Docker Overview and Pitfalls to Avoid

Dev Ops: The High Level Overview

ThingWorx Learning Paths

Transaction Handling in ThingWorx

Getting Started on the ThingWorx Platform Learning Path