IoT Tips

Architecting Reason Code Trees in DPM
Tori Firewind, IoT EDC

What are Machine Codes?

Factory hardware devices communicate status changes to their human operators and to other machines (IoT) via machine codes. The manufacturer usually determines the machine codes for each type of factory hardware, so these are typically fixed in advance. How the reason trees map these machine codes to the corresponding business logic in ThingWorx, however, is entirely customizable. Knowing the best way to design your reason trees for this purpose can be challenging, so this guide is here to help with the concepts. Technical detail on using the UI to create, edit, and configure reason codes can be found in the Help Center.

The Tree Trunk

At the highest level of the reason tree, the trunk, there are three categories: Availability (A), Performance or Productivity (P), and Quality (Q). These should look familiar; they are the three dimensions of OEE (Overall Equipment Effectiveness).

Fg 1. Calculation of OEE

Availability refers to long stops: events that stop planned production long enough that it makes sense to track a reason for being down (typically several minutes, but the threshold between a long stop and a short stop can vary depending on the ideal rate of production of materials).

                 Availability = Run Time / Planned Production Time

Productivity/Performance refers to short stops, things that cause the machine to run at less-than-optimal speeds. This can include stops caused by running out of materials for production, minor maintenance like switching out a single, easily changed part, or even frequent breaks due to the ill health of an operator. User error can be a cause as well: say the machine needs a certain heat to produce parts, and the heat keeps fluctuating (requiring the machine to take time to recalibrate before starting production) because operators are smoking out a back door or adjusting thermostat temperatures.
Fg 2. Levels of Runnable Time

Operator influence is often a factor in the conditions that permit optimal performance from machinery, and every factory may face different challenges. Stops like these are not really outages; the amount of downtime isn’t enough to consider the production block entirely unproductive. Production continued throughout most of the block despite the issues; the rate was just slower than ideal.

                 Performance/Productivity = (Total Count / Run Time) / Ideal Run Rate

Quality refers mostly to the number of items that are considered scrap or rework, and it can be split into two categories: startup scrap (that which is expected because the machine is in the process of warming up or being fine-tuned by the operator) and production scrap (things which come out wrong and must be tossed or reworked because the conditions under which they were produced weren’t ideal). Quality is measured as first-pass yield, meaning a product only counts as "good" if it passes inspection the first time.

                 Quality = Good Count / Total Count

The Branches and the Leaves of the Tree

The “leaves” are the reason codes which map directly to machine codes, and the “branches” are the method of categorization that connects them to the trunk. Both the leaves and the branches, the child and the parent nodes of the tree, are split into two states: planned versus unplanned downtime. Changeovers, maintenance, and even scrap can be broken down into this dichotomy.

For scrap, there are startup rejects (planned, because the machines have ramp-up periods) and production rejects (unplanned, because the conditions weren’t ideal). For maintenance, there is planned maintenance and unplanned maintenance: small changes that occur on the fly, result in productivity loss, and may also reduce availability in the long run.
Small, unplanned changes can occasionally shift into the availability loss category if a simple, quick repair winds up being complex and time-consuming. A good reason tree can differentiate easily between short and longer stops in order to respond to each in a deliberate way.   To start off in the process of architecting your reason tree, try writing the three categories on a board in a common room in an average factory (or several as a survey). Ask operators to stop in over the course of a few days and write various machine codes that they see often and find useful under one of these categories, or more than one if the machine code pops up under different circumstances and can mean different things. Have them write a 10 word justification, if the association isn’t obvious. Gather all of the “leaves” in this way, and then begin to associate them with the “trunk”, forming the “branches”.   An example tree can be seen in figure 3 here, with leaves like “Changeover” and “Maintenance” being semi-ambiguous; they could just as easily be seen as unplanned stops. Therefore, there may be multiple reason codes mapping up to the top of the tree in more than one branch, and these can have different categories, which controls how the business logic responds to the different codes. The Help Center has more details about how the events are mapped to types, and each type contains multiple categories, as configured by you when you set-up the DPM model. Fg 3: Different types of changeovers may have different codes, and can map up as either planned or unplanned, but all planned and unplanned stops (long stops) are under the Availability category of the trunk. Similarly, small stops can involve idling, like if there are not enough materials, reduced speed if the conditions are not ideal, or other small stops, usually caused by human error or unforeseeable circumstances. 
Quality loss then refers to the products which fail quality checks, either because the machine still has the wrong paint in the applicator and needs a few rounds to be ready for the next production item, or because the conditions are again, not ideal, and items wind up scrapped.   Example Reason Tree Fg 5 example tree with more specific tags (there may be dozens or hundreds in a full reason tree, though the fewer are needed to capture the events we care about, the better).   Theory of Constraints Fg 6 theory of constraint wheel: an industry process for gradual OEE improvement in factories that has been adapted into the PTC methodology as well. While architecting your reason tree, always remember the key purpose: gathering only as much data as necessary to analyze the efficiency of a factory and to identify the bottleneck, or the most limiting factor. The important point is to identify not just the bottleneck that seems the most troublesome, but the one that actually results in the greatest impact to OEE across the entire factory.   Without software like DPM, and a properly designed reason code tree, the process of improving a factory can be very challenging, involving a lot of guesswork, and sometimes solving one problem at the cost of another. The issue is that these machines produce a LOT of raw data, and humans are not the best tool available to gather and aggregate this data in a consumable way. A good reason tree ensures a smart application that can quickly prioritize the machine (bottleneck) that most impacts production, and not just the machine that functions in the least optimal way.   So, the theory of constraints is really a process for identifying small, incremental changes, which together can make a big difference, and fast, in factory OEE. The rate at which this cycle can be completed varies, however. 
The slower the process of identifying constraints and the less information that is gathered, the slower and less precise the first two steps of this process become. Conversely, in a traditional constraint identification process, too much information can be a problem as well, due to human limitations, as discussed above. DPM is a great benefit in this regard, because it aggregates the data into a consumable, comparable form every 5 minutes, freeing up your human analysts for problem solving and prioritization rather than data gathering and sorting.

Other Key Tips

Also remember that a good tree treats the trunk as a whole unit, with each category occupying a percentage of the overall OEE. After all, look back up at the three dimensions of OEE in the equations above. For example, the more you see issues with availability, the less you will see issues with scrap, for the machine simply doesn’t have as much time to produce scrap if it is constantly down. The more you see issues with quality loss, the less you should see of productivity loss, because these two tend to trade off against each other; to say it differently, if a machine is running quickly and seeing few minor maintenance stops, then it is likely to produce more scrap (along with more good product).

Another thing to remember is that even DPM is limited in its capacity to interpret raw data. While it is orders of magnitude more efficient than any human gathering and analysis could ever be, there is an upper limit to how much raw data DPM can ingest and analyze before the system gets very expensive. For this reason, you want to ensure your reason trees use only as many reason codes as are required to capture the OEE of a factory site. This will most likely mean using different codes for different types of things, which is easy to do maintainably across many sites using thing shapes. Keeping things tightly defined and organized is the easiest way to ensure a clean, efficient system for gathering and storing data.
  Also remember that data will not need to persist very long once DPM is fully operational and adopted by your factories. DPM ensures that the changes made to the production line to improve efficiency are the highest impact, and the least difficult to implement, meaning that there will be a very rapid return on investment, and a process to ensure future issues are identified and resolved quickly. Data from past issues in the factory won’t be as relevant, and historical data stores can be kept smaller than one might think. It is the power of ingesting data directly into the processing and aggregation process, the automatic reduction of data down into presentable, consumable webpages, that makes DPM and ThingWorx such a great factory solution for optimizing OEE.
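The three formulas from "The Tree Trunk" section above can be tried out with a few lines of code. The following is only a minimal sketch, not how DPM computes anything internally: the class name, field names, and sample numbers are all made up for illustration, and it assumes the standard definition of OEE as the product of the three dimensions.

```python
from dataclasses import dataclass


@dataclass
class ProductionBlock:
    """Counts and times for one production block (hypothetical structure)."""
    planned_production_time: float  # minutes the machine was scheduled to run
    run_time: float                 # minutes actually running (planned time minus long stops)
    total_count: int                # every part produced, good or not
    good_count: int                 # parts that passed inspection the first time
    ideal_run_rate: float           # parts per minute at ideal speed


def oee(block: ProductionBlock) -> dict:
    """Apply the Availability, Performance, and Quality formulas from the article."""
    availability = block.run_time / block.planned_production_time
    performance = (block.total_count / block.run_time) / block.ideal_run_rate
    quality = block.good_count / block.total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        # Assumption: OEE is taken as the product of the three dimensions.
        "oee": availability * performance * quality,
    }


if __name__ == "__main__":
    # Illustrative shift: 480 planned minutes, 80 of them lost to long stops.
    shift = ProductionBlock(480.0, 400.0, 18000, 17100, 50.0)
    for name, value in oee(shift).items():
        print(f"{name}: {value:.1%}")
```

With these made-up numbers, availability is the weakest of the three dimensions, which is the kind of comparison a reason tree is meant to make easy.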
ThingWorx Monitoring and Alerting, Part 2 Using Prometheus and Grafana By Tori Firewind, IoT EDC Building Dashboards     To add a panel which monitors some component of the ThingWorx application to a dashboard in Grafana, click to add a new panel. Under “Metrics” in the box at the bottom of the screen, select what ThingWorx metrics you wish to monitor (type “thingworx” in the search box to see them all). For example, select the Platform Subsystem memory in use:     Label filters aren’t necessary, though you may want to sort by instance if you are monitoring multiple ones with the same dashboard. You may also want to take some time to format the Y axis, which by default will show in bytes. Go to the formatting panel on the right side and scroll down to the section called “Standard options”. For the Unit dropdown, start typing “data” and then select “bytes (SI)”. This will automatically determine if the bytes you’ve provided should really print as MB or GB based on how large the numbers are.     Rename the panel, modify it in any other way desired, and then click Apply (last 5 minutes):     Once you add the panel, you can watch the memory usage as it is scraped by selecting the refresh option (10s or 30s, whatever makes sense based on your scrape interval).     The viewing window is stored in the URL, so that you can generate a report for a specific interval (like when a test was occurring), and then store that result or share it in a more compact way: http://localhost:3000/d/nleucPv4k/thingworx-monitoring?orgId=1&from=1668528038732&to=1668536503953  (absolute timestamps):     Dashboards are just collections of panels which report on all of the various metrics of performance and stability that exist for single components of a system. This is because there can be quite a few metrics worth watching for each individual component. 
Most of the third-party tools come with their own dashboards, but the ThingWorx component is one which for now, requires some thought and creativity.     Consider your use case carefully and look over the various subsystems contained within ThingWorx. Each part of an application is localized to specific subsystems, and some are more business critical than others. What will go at the top of your dashboard? Add rows, add panels per row, and see what the many choices are for watching your system.     Don’t forget that with Telegraf running, VM or machine usage metrics are also available for display on a dashboard. Things like overall CPU and Memory usage are critical to determining the health of a system, as we have demonstrated in our own reasoning in past benchmarks and scale tests. You can create a panel to monitor the mem_used versus mem_total, like so:     Another metric from Telegraf worth adding is the CPU usage, which should be given “percentage” for the units and which needs a label filter of cpu = cpu-total. If we do some resizing and drag-and-dropping, then we now have the first row of a dashboard:     See how the Platform usage climbs steadily and is purged in a cycle? That is the Java Garbage Collection mechanism, and it’s important to remember to leave room for spikes on top of those peaks. Data can also be calculated or processed in some way to make it more useful for determining system health and stability.     The data in the picture below uses the formula submitted = completed + number queued + number failed. It shows the current queue on the left Y-axis and the max queue on the right (since the two numbers usually are drastically different). It looks pretty, but it doesn’t really tell us much about the system in this format, so let’s do some math and find a representation that is a bit more helpful.     
Performing a “non-negative derivative” calculation over the submitted and the completed queue counts over time allows for us to look at the status of the queue as a velocity. When the “complete” speed appears behind the “submitted” speed for too long at a time, then that means the queue is filling up and will eventually result in data loss.     If we take this one step further and calculate the average of the submitted minus the completed over time, then we can actually predict approximately when the queue will fill up. This can then be displayed on a dashboard in Grafana, or used as the basis for an alert.   What to Monitor     In addition to monitoring the system which ThingWorx runs upon, ThingWorx itself can easily be monitored down to the subsystems level by Prometheus due to the Metrics endpoint. Many applications have support built into the way they format the data for scraping, including the JVM (which exposes Prometheus-formatted metrics with the JMX Exporter) and the OS (which can use the Node Exporter or Telegraf for the same purpose). For these more generic components, there are popular community dashboards which can be downloaded and used in Grafana for data analysis and review.     For ThingWorx, there’s different kinds of data to track: subsystem data (see the list on the right) and non-subsystem data. There’s queue based versus non-queue data. These different metrics can collectively characterize the overall health of the application, depending on the use case.     For instance, if this is a system with very many connected devices, one metric which may be important to track is the number of total devices defined on the Foundation server vs. the number of devices which are currently connected. If there are relays involved, then many devices suddenly going offline can mean a relay has failed. Another example is if the system sizing depends on an assumption that there will only ever be a fraction of the total number of devices connected at a time. 
Use cases like these could be monitored easily by keeping track of the total vs. the number of connected devices.     Other common indicators of a healthy ThingWorx application might include the value stream and stream queues. These queues should fluctuate over time as the data is ingested and processed, but they should never be growing in size. If the stream queues are growing, then that means the data is writing to the queues faster than the queues can write to the database. Eventually, when the system runs out of resources to keep track of the queues, data will be lost. Having the stream information displayed in a chart can make it very easy to spot an upward trend in resource usage early on, which can catch a blockage or bottleneck that needs attention before it starts to affect the larger system in catastrophic ways later.     Memory usage information from the various subsystems might be something worth tracking, as well as the event queue. These can indicate that the business logic is functioning with room to handle spikes, and that the server has enough memory to service all three dimensions of an IoT application: the ingestion, the business logic and thing-based alerting, and the user experience and UI. If file transfers are a key part of the use case, then the number of concurrent transfers, the average speed of them, the size of the files, all of this kind of stuff can be tracked and charted in Grafana by making use of the ThingWorx metrics which automatically show up there once you import the Prometheus data source.     A mature dashboard used for a production environment might look a little like this: For further reading about subsystem monitoring, check out the Help Center.   How to Alert     The alerting mechanism built-in to Prometheus is incredibly easy to configure, so it might be tempting to generate tons of alert rules. 
However, remember that the more noise a system makes, the harder it is for those monitoring that system to know when action is really required. Playbooks which document how to respond to alerts, who to contact, how to act, and all the information necessary to handle an alert, should be created as an ongoing part of the DevOps process.     Alerts should fire with the right severity in the subject line, as well as all of the information about the issue that is currently known, presented in a concise way, so that whoever receives the alert starts thinking about the root cause sooner and recovers the system faster. Those who receive the alert should have the ability to facilitate its resolution, and know who is expected to react to any alerts which come in.     In the ThingWorx monitoring stack, Prometheus handles the alert rules and the generation of alerts, but alert filtering and delivery is managed in an external alerts manager.     Generally, you want your alerts to follow a curve. If the current queue size exceeds 50% of the maximum, perhaps that isn’t a huge deal, if the application catches up quickly. How long are spikes in queue processing expected to last? Perhaps if the queue size is over half-full for 10 seconds, 30 seconds, then that means the queue is falling behind and not catching up. Ok, so this might be a warning level alert. When does this become an error? Well, let’s say the queue exceeds 90% of the max queue size. This might want to alert the moment it hits the mark. Now, farther along the curve, it may not take as long before data gets lost.     As the severity of the situation increases, the threshold for alerting should increase as well. That way when errors do alert, it is a sure thing that they require a response immediately. The alerts are then pushed into the “Alerts Manager” for delivery based on your management rules. 
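The alert curve described above can be sketched in a few lines. This is only an illustration of the logic, assuming the example thresholds from the text (warn above 50% of the maximum once it has persisted for 30 seconds, error the moment the queue exceeds 90%); the function and parameter names are hypothetical. In Prometheus itself, the hold time corresponds to the "for" field of an alert rule.

```python
def classify_queue(samples, max_size, warn_fraction=0.5,
                   warn_hold_s=30.0, error_fraction=0.9):
    """Return "ok", "warning", or "error" for a queue.

    samples is a list of (timestamp_seconds, queue_size) tuples ordered by time.
    """
    if not samples:
        return "ok"

    latest_ts, latest_size = samples[-1]

    # Far along the curve: alert the moment the mark is exceeded.
    if latest_size > error_fraction * max_size:
        return "error"

    # Lower on the curve: only warn if the queue has stayed above the
    # warning level continuously for the whole hold time.
    warn_level = warn_fraction * max_size
    run_start = None
    for ts, size in reversed(samples):
        if size <= warn_level:
            break
        run_start = ts

    if run_start is not None and latest_ts - run_start >= warn_hold_s:
        return "warning"
    return "ok"


if __name__ == "__main__":
    history = [(0, 100), (10, 600), (20, 650), (30, 700), (40, 620)]
    print(classify_queue(history, max_size=1000))
```

A short spike above the halfway mark stays "ok" here, which is the point of the curve: only sustained or severe conditions make noise.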
The Alerts Manager may decide to withhold warnings altogether, or send them to a much smaller mailing list, whatever filtering helps to ensure the right people receive the right alerts, right when they need them.   In Conclusion, A Healthy Application...     Has stable memory usage that fluctuates predictably and doesn’t grow over time. In a system experiencing mild issues, the memory starts to trend upward:     If left unattended, systems like this may eventually experience outages. Finding the issue this early means there is even time to do some digging, debugging, taking of stack traces, and other such troubleshooting steps before the system must be restarted or recovered. That can really make the difference in identifying and resolving before there are real problems.     One metric which makes for good alerting is the total number of failed stream entries, which can indicate there’s an issue writing to the database even before the queue has started to fill up. Other alerts may include warnings and errors based around percentages of memory used or queues filled, which depend on how long the queues take to fill up and how long the state has been at its increased usage.     Prometheus has all of the tools necessary to make this possible across a variety of infrastructures and use cases. Set it up on a local machine and poke around at what ThingWorx metrics are available to meet your monitoring needs.
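The queue velocity idea from the "Building Dashboards" section (a non-negative derivative of the submitted and completed counts, then the average difference to estimate when the queue fills) can be sketched outside of Grafana as well. This is a rough illustration under simple assumptions: the function names are hypothetical, the counters are sampled as (timestamp, value) pairs, and failed entries are ignored.

```python
def non_negative_rate(samples):
    """Per-second rates between consecutive (timestamp_s, counter) samples.

    Intervals where the counter went backwards (for example after a restart)
    are skipped, which is what makes the derivative "non-negative".
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        dv = v1 - v0
        if dt > 0 and dv >= 0:
            rates.append(dv / dt)
    return rates


def seconds_until_full(submitted, completed, current_queue, max_queue):
    """Estimate the time until the queue is full, or None if it is not growing."""
    submitted_rates = non_negative_rate(submitted)
    completed_rates = non_negative_rate(completed)
    if not submitted_rates or not completed_rates:
        return None

    # Average submitted speed minus average completed speed.
    net_rate = (sum(submitted_rates) / len(submitted_rates)
                - sum(completed_rates) / len(completed_rates))
    if net_rate <= 0:
        return None  # completion is keeping up with submission
    return (max_queue - current_queue) / net_rate


if __name__ == "__main__":
    submitted = [(0, 0), (10, 1000), (20, 2000)]  # 100 entries per second
    completed = [(0, 0), (10, 800), (20, 1600)]   # 80 entries per second
    print(seconds_until_full(submitted, completed, current_queue=400, max_queue=1000))
```

The estimate is only as good as the averaging window; a longer window smooths out spikes but reacts more slowly to a real change in load.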
ThingWorx Monitoring and Alerting, Part 1 Using Prometheus and Grafana By Tori Firewind, IoT EDC Introduction and Getting Started     As ThingWorx has become a more mature product during the lifetime of the IoT EDC, so too have our dev ops recommendations. As we’ve stated throughout many posts now, testing is a key part of ensuring enterprise readiness, and it occurs at every stage of the process: from unit testing to preserve individual service logic, to integration tests which preserve the functionality of the application as a whole, to user and edge load testing and user experience testing, which ensure enterprise readiness. So testing is a critical component, but the process of dev ops never stops. In order to effectively test the system, a comprehensive monitoring solution is also required.     Once the application is tested and the changes pushed into production, there is no knowing with certainty that everything will run smoothly indefinitely. Random spikes in usage, server bandwidth or availability, any unforeseeable factors like these can come along and cause issues for a system. If these issues aren’t detected and addressed early, then they can very rapidly morph into much larger problems: outages, data loss, inflated data tables which are hard to revert due to their size. It is critical to detect performance issues on a system as early as possible, to have as much information as is necessary to figure out where the problem is heading, and what may have started it. Monitoring is key to a healthy system. CI/CD stands for “Continuous Integration/Continuous Deployment”, a never-ending cycle of improvement. Testing just once before the initial go live isn’t enough. Each system should have automated tests that run continuously, as well as monitors and alerts which reveal problems sooner. Diagnostic tools play a role as well, being the bridge from the end of the dev ops process cycle back to the beginning (monitoring into planning). 
A good CI/CD dev ops process will ensure that problems are found earlier, fixed more rapidly, and fixed for everyone using the system.       In a fully mature dev ops pipeline, issues are anticipated, discovered and researched before they become production outages or critical issues. These investigations or testing follow-ups produce development tasks (usually bugs, but also features at times) which then start the dev ops cycle all over again. This is why a good, efficient dev ops pipeline is needed, one which allows changes to quickly and safely go from development to production.     This is also why diagnostic tools play a role in the monitoring piece of the dev ops process. They are the bridge between monitoring and planning. Tools like Dynatrace can be configured to provide call stacks and take thread dumps when issues start to occur, before the system is performing so poorly it needs a restart, which happens automatically in a cluster and can clear out any trace of the issue.     Thread dumps are often necessary to diagnosing the root cause of the issue (to permanently fix it), and doing so quickly ensures application stability and availability. That is, after all, the purpose of the dev ops process. Diagnostics is therefore an equally important piece of the dev ops Figure-8-shaped pie, and one which deserves its own spotlight in an article to come.     Every piece of the dev ops process must be viewed as equally important in its own way, lest the dev ops cycle get hung up on bottlenecks of its own. A safe and stable system is not one which never experiences issues, it is one which has a good, efficient plan in place to handle recovery and prevention of repetition. A wholesome dev ops process is a happy dev ops process.   
The Monitoring Stack     There are many monitoring options available, but in our experience one of the easiest and most effective monitoring stacks to use with ThingWorx is Prometheus for metrics gathering with Grafana for metrics analysis and review. In a mature monitoring stack, Telegraf is also commonly installed on each VM/host to gather the system metrics (like CPU and Memory usage, things we’ve stated are good metrics of system performance and stability in past articles on scale and size testing) and output them in Prometheus format.     Prometheus is a highly scalable open-source monitoring framework that contains out of the box monitoring and alert capabilities for Kubernetes-based deployments (not covered in this article). Using Prometheus is very simple because the ThingWorx application exposes a metrics endpoint which is formatted directly for use by Prometheus. There is also built-in alerting in Prometheus, but not the ability to form dashboards for reviewing data or screenshotting it for documentation purposes. That’s where Grafana comes into play. Grafana has a preconfigured Prometheus-type data source and many preconfigured dashboard templates for various applications and services. Telegraf is also easily imported into Grafana, as is shown in the section below. The Prometheus targets in the larger diagram are expanded out on the left. For each target, some tool exports the data in a syntax which Prometheus can scrape. For VMs, this can be Telegraf, for Kubernetes, the Node Exporter. JVM has a JMX Exporter, and other tools like CX Server use Graphite. Many apps already have a Prometheus endpoint built-in, like ThingWorx and Zookeeper. Telegraf is not strictly necessary; the node exporter can also be used on VMs, but Telegraf is the more common choice since it is a more mature dev ops tool.     
Once Prometheus is scraping the targets, alerting on them can be done with OOTB Prometheus functionality, and dashboards for monitoring can be made easily in Grafana (with built-in support as well). This stack does not include the diagnostics piece, something which triggers thread dumps or the like when issues do occur. There are too many ways to build a successful diagnostics piece to cover here.

How to Get Started

Getting started monitoring a ThingWorx application is incredibly easy in the latest versions. Simply open up a browser and type in the ThingWorx URL, followed by “/Metrics”. This endpoint returns a specially formatted response containing subsystem and service data, which the Prometheus monitoring software can read automatically. In addition to the application metrics, Prometheus can be configured to collect metrics from a node exporter at the (virtualized) operating system or container (Kubernetes) level as well.

If you haven’t already, install Grafana, install Telegraf as a service, and install Docker Desktop. These are the tools required (in addition to ThingWorx, of course) to set up a simple sandbox system for familiarization with the monitoring stack recommended by PTC. The easiest way to try Prometheus on a local Windows instance is to use Docker. The command for that will be found below, but first open up Docker Desktop to set contextual parameters that the command line will need. Then, modify the configuration file for Telegraf or create one (called telegraf.conf, in the same folder as the exe file), and put the following into the file (or uncomment it; the default config file has thousands of lines, so just search for “prometheus”):

            # Output plugin
            [[outputs.prometheus_client]]
              listen = "0.0.0.0:9125"

Alternatively, install the Prometheus Node Exporter tool, which will likely require some additions (not covered here) to the Prometheus config file which we are about to create.
Then, create a configuration file (called prom_config_localhost_scraper.yml in the command to come) and add the following (assuming a standard localhost installation of ThingWorx):

            # my global config
            global:
              scrape_interval: 45s
              evaluation_interval: 30s
              scrape_timeout: 30s  # overrides the Prometheus default (10s)

            rule_files:
              - prom_config_rules.yml

            scrape_configs:
              - job_name: thingworx
                static_configs:
                  - targets: ['host.docker.internal:8080']
                basic_auth:
                  username: "Administrator"
                  password: "admin!123456789"
                metrics_path: /Thingworx/Metrics
                scheme: http
                params:
                  x-thingworx-session:
                    - "false"
              - job_name: prometheus
                static_configs:
                  - targets: ['localhost:9090']
              - job_name: Telegraf
                # If telegraf is installed, grab stats about the local
                # machine by default.
                static_configs:
                  - targets: ['host.docker.internal:9125']

This example configuration uses host.docker.internal instead of localhost as the server target for ThingWorx, because ThingWorx is running outside of the Docker container which contains Prometheus. This yml file configures Prometheus to monitor both ThingWorx and itself, as well as the server metrics coming from Telegraf (as long as Telegraf is configured to expose them, as shown above). It’s really a sandbox-only configuration, as you wouldn’t want to use the Administrator user, or have the password in plain text in the config file, on a real system. Also note the need for the x-thingworx-session parameter: without it, a new session would be spawned on every scrape (every 45s with the interval above), and these runaway sessions would cause memory issues over time, so we don’t want to use sessions here.

The rules file referenced here (prom_config_rules.yml) needs to be created separately. This is where all of the alert rules will be defined. The rules determine whether an alert state is happening, but without configuring the alert manager, there won’t be any notification. That isn’t covered here but is covered extensively in the Grafana docs.
Here is an alert example:

            groups:
              - name: alert.rules
                rules:
                  # Alert when the mem_used metric exceeds the threshold for at least 1 second.
                  - alert: HighMemory
                    expr: mem_used > 14000000
                    for: 1s
                    labels:
                      severity: page
                    annotations:
                      summary: "High Memory"
                      description: "Localhost Memory Usage is High"

Now, save these files and use Powershell to run the Docker container. The rules file is mounted alongside the main configuration so that the relative path under rule_files resolves inside the container:

            docker run -p 9090:9090 -v C:\<path_to_document>\prom_config_localhost_scraper.yml:/etc/prometheus/prometheus.yml -v C:\<path_to_document>\prom_config_rules.yml:/etc/prometheus/prom_config_rules.yml prom/prometheus

This should download the Prometheus image (if this is the first time) and start it in a container, allowing you to very rapidly deploy it to an endpoint of localhost:9090 by default. If there is an error like the one shown below, this means that you forgot to start Docker Desktop (the application) before opening Powershell. Docker Desktop sets system parameters required for containers to run from a command line (in Linux, it should work as long as Docker is installed for use by the command line, simple as that).

The localhost endpoints are accessible in a browser. ThingWorx defaults to the localhost:8080 endpoint. Prometheus defaults to localhost:9090. Telegraf is on port 9125. Open any of these in a browser tab to see the full monitoring stack. You can easily see if Prometheus is working by clicking “Status” > “Targets” at localhost:9090:

If all of the targets appear as blue and say “last scrape” and a time stamp, then they’re working as expected. If they don’t, ensure you have the right ports, that there aren’t any firewall issues (if things aren’t all on localhost), and that everything is running without errors.

The last step in the process here is to install a dashboard tool like Grafana. Once this is installed and running on localhost:3000 (by default), you can display the data from Prometheus with a few configuration steps in the Grafana UI. Hover over the settings icon in the bottom left of the screen, and then click on “Data sources”.
Select the “Add data source” button, and then click on Prometheus. You have to type the URL again  (localhost:9090), but most of the defaults will be ok here, and all you have to do is click “Save and test”.     Now both targets should appear within Grafana, with their metrics showing up throughout the Grafana UI. This data source is what allows for the building of monitoring dashboards.    
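If a target does not come up in Prometheus, it can help to reproduce the scrape request by hand. The following is a minimal sketch, assuming the sandbox settings from the configuration above (basic auth, the /Thingworx/Metrics path, and x-thingworx-session=false passed as a URL parameter). The class and method names are hypothetical and are not part of ThingWorx or Prometheus.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the kind of request Prometheus sends when it
// scrapes the ThingWorx metrics endpoint with the configuration shown above.
class MetricsProbe {

    // Encodes "user:password" as an HTTP Basic Authorization header value.
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Builds a GET for /Thingworx/Metrics with sessions disabled via the query string.
    static HttpRequest buildRequest(String hostAndPort, String user, String password) {
        URI uri = URI.create("http://" + hostAndPort + "/Thingworx/Metrics?x-thingworx-session=false");
        return HttpRequest.newBuilder(uri)
                .header("Authorization", basicAuth(user, password))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Sandbox credentials from the example config; never hard-code these in a real system.
        HttpRequest request = buildRequest("localhost:8080", "Administrator", "admin!123456789");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it (requires a running ThingWorx instance):
        // java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

From the host machine the target is localhost:8080; host.docker.internal is only needed from inside the Prometheus container.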
The IoT Building Block Design Framework By Victoria Firewind and Ward Bowman, Sr. Director of the IoT EDC   Building Block Overview As detailed quite extensively on its own designated Help Center page, building blocks are the future of scalable and maintainable IoT architecture. They are a way to organize development and customization of ThingWorx solutions into modular, well-defined components or packages. Each building block serves a specific purpose and exists as independently as possible from other modules. Some blocks facilitate external data integration, some user interface features, and others the manipulation or management of different kinds of equipment. There are really no limits to how custom a ThingWorx solution can be, and customizations are often a major hurdle to a well-oiled dev ops pipeline. It’s therefore crucial for us all to use a standard framework, to ensure that each piece of customization is insular, easy-to-replace, and much more maintainable. This is the foundation of good IoT application design.   PTC’s Building Block Framework At PTC, building blocks are broken down in a couple different ways: categories and types. The category of a building block is primarily in reference to its visibility and availability for use by the greater ThingWorx community. We use our own framework here at PTC, so our solution offerings are based around solution-specific building blocks, things which we provide as complete, SaaS solutions. These solution-specific building blocks combine up into single solutions like DPM, offerings which require a license, but can then easily be deployed to a number of systems. PTC solutions provide many, complex, OOTB features, like the Production Dashboard of DPM, and the OperationKPI building block.   Anyone doing any sort of development with ThingWorx, however, does still have access to many other building blocks, included with the domain specific building block category. 
These are the pieces used to build the solution building blocks like DPM, and they can be used to build other more custom solutions as well. Take the OperationKPI block which can vary greatly from one customer to the next in how it is calculated or analyzed. The pieces used to build the version that ships with DPM are right there, for instance, Shift and ReasonCode. They are designed to have minimal dependencies themselves, meant to be used as dependencies by custom blocks which do custom versions of the module logic found within the PTC solution-specific blocks. Then there are the common blocks as well, and these are used even more widely for things like user management and database connectivity.   The type of the building block refers really to where that building block falls in the greater design. A UI building block consumes data and displays it, so that is the View of a classic MVC design pattern. However, sometimes user input is needed, so perhaps that UI building block will depend on another block. This other block could have a utility entity that fuels UI logic, and the benefit to having that be a separate block is then the ease with which it can be subbed out from one ThingWorx instance to the next.   Let’s say there are regional differences in how people make use of technologies. If the differences are largely driven by what data is available from physical devices at a particular site, then perhaps these differences require different services to process user input or queries on the same UI mashups used across every site. Well in this case, having the UI block stand alone is smart, because then the Model and Controller blocks can be abstracted out and instantiated differently at different sites.   
The two types that largely define the Model and Controller of the classic MVC are called Abstract and Implementation building blocks, and the purposes are intertwined, but distinct: abstract building blocks expose the common API endpoints which allow for the implementation blocks to vary so readily. The implementation blocks are then those which actually alter the data model, which is what happens when the InitializeSolution method is called. In that service, they are given everything they need to generate their data tables and data constraints, so that they are ready to be used once the devices are connected.   Occasionally, UI differences will also be necessary when factories or regions have different ways of doing things. When this happens, even mashups can be abstracted using abstract building blocks. Modular mashups which can be combined into larger ones can be provided in an abstract building block, and these can be used to form custom mashups from common parts across many sites in different implementation blocks all based on the same abstract one.   The last type of building block is the standard building block, which is the most generic. This one is not intended to be overridden, often serving as the combination of other building blocks into final solutions. It is also the most basic combination of components necessary to adhere to the building block design framework and interface with the shared deployment infrastructure. The necessary components include a project entity, which contains all of the entities for the building block, an entry point, which contains the metadata (name, description, version, list of dependencies, etc.) and overrides the service for automatic model generation (DeployComponent), and the building block manager. The dashed arrows indicate an entity implements another, and the solid arrows, extension. The manager is the primary service layer for the building block, and it makes most of the implementation decisions. 
It also has all of the information required to configure menus and select which contained mashups to use when combining modular mashups into larger views. The manager really consists of 3 entities: a thing template with the properties and configuration tables, a thing which makes use of these, and a thing shape which defines all of the services (which can often be overridden) which the manager thing may make use of.

Most building blocks will also contain security entities which handle user permissions, like groups which can be updated with users. Anyone requiring access to the contents of a particular building block then simply needs to be added to the right groups later on. This as well as model logic entities like thing shapes can be used concurrently to use organizational security for visibility controls on individual equipment, allowing some users to see some machines and not others, and so on.

All of the Managers must be registered in the DefaultGlobalManagerConfiguration table on the PTC.BaseManager thing, or in the ManagerConfiguration table of any entity that implements the PTC.Base.ConfigManagement_TS thing shape. Naming conventions should also be kept standard across blocks, and details on those best practices can be found in the Building Block Help Center.

Extending the Data Model Using Building Blocks

Customizing the data model using building blocks is pretty straight-forward, but there are some design considerations to be made. One way to customize the data model is to add custom properties to existing data model entities. For instance, let’s say you need an additional field to keep track of the location of a job order, so you add a City field to the PTC.JobOrder.JobOrder_AP data shape. However, doing this also requires substantial modification to the PTC.JobOrder.Manager thing, if the new data shape field is meant to interface with the database.
So this method is not as straight-forward as it may seem, but it is the easiest solution in an “object-oriented” design pattern, one where the data varies very little from site to site, but the logic that handles that data does vary. In this design pattern, there will be a thing shape for each implementing building block that handles the same data shape the abstract building block uses, just in different ways. A common use case for why this may happen is if the UI components vary from site to site or region to region, and the logic that powers them must also vary, but the data source is relatively consistent.   Another way to extend the data model is to add custom data shapes and custom managers to go with them. This requires you to create a new custom building block, one which extends the base entry point as needed for the type of block. This method may seem more complicated than just extending a data shape, but it is also easier to do programmatically: all you have to do is create a bunch of entities (thing templates, thing shapes, etc.) which implement base entities. A fresh data shape can be created with complete deliberation for the use case, and then services which already exist can be overridden to handle the new data shape instead. It is a cleaner, more automation friendly approach.   However, the database will still need to be updated, and this time CRUD services are necessary as well, those for creating and managing instances of the data shape. So, this option is not really less effort in the short term. In the long-term, though, scripts that automate much of the process for building block generation can be used to quickly and easily allow for development of new modules in a more complex ThingWorx solution. The ideal is to use abstract blocks defined by the nature of the data shape which each of their implementing blocks will use to hook the view into the data model.   
This “data-driven” design pattern is one which involves creating a brand-new block for each customization to the data model. The functionality of each of these blocks centers around what the data table and constraints must look like for that data store, and the logic to handle the different data types should vary within each implementation block, but override the common abstract block interface so that any data source can be plugged into the thing model, and ThingWorx will know what to do with it.   The requests can then contain whatever information they contain (like MQTT), and which logic is chosen to process that data will be selected based on what information is received by the ThingWorx solution for each message. Data shapes can be abstracted in this way, so that a single subscription need exist to the data shape used in the abstract building block, and its logic knows how to call whatever functions are necessary based on whatever data it receives. This allows for very maintainable API creation for both data ingestion and event processing.
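The dispatch idea behind this data-driven pattern can be sketched outside of ThingWorx in plain Java. This is only an analogy under stated assumptions: the interface stands in for the abstract building block, the two handler classes stand in for implementation blocks, and the router plays the role of the single subscription. All names here are hypothetical and none of them are ThingWorx entities or services.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in for the abstract building block: the common interface every implementation honors.
interface MessageHandler {
    // True if this implementation knows how to process the given payload.
    boolean accepts(Map<String, Object> payload);

    // Processes the payload and returns a short description of what was done.
    String process(Map<String, Object> payload);
}

// Stand-in for one implementation block, keyed on a "temperature" field.
class TemperatureHandler implements MessageHandler {
    public boolean accepts(Map<String, Object> payload) {
        return payload.containsKey("temperature");
    }

    public String process(Map<String, Object> payload) {
        return "temperature:" + payload.get("temperature");
    }
}

// Stand-in for another implementation block, keyed on a "city" field.
class JobOrderHandler implements MessageHandler {
    public boolean accepts(Map<String, Object> payload) {
        return payload.containsKey("city");
    }

    public String process(Map<String, Object> payload) {
        return "jobOrder:" + payload.get("city");
    }
}

// Stand-in for the single subscription: picks the logic based on what the message contains.
class BlockRouter {
    private final List<MessageHandler> handlers = new ArrayList<>();

    void register(MessageHandler handler) {
        handlers.add(handler);
    }

    String route(Map<String, Object> payload) {
        for (MessageHandler handler : handlers) {
            if (handler.accepts(payload)) {
                return handler.process(payload);
            }
        }
        throw new IllegalArgumentException("No handler registered for payload: " + payload.keySet());
    }
}
```

Adding support for a new data source then means registering one more handler, without touching the router or the existing handlers.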
Using the Solution Central API: Pitfalls to Avoid
by Victoria Firewind, IoT EDC

Introduction

The Solution Central API provides a new process for publishing ThingWorx solutions that are developed or modified outside of the ThingWorx Platform. For those building extensions, using third party libraries, or who are just more comfortable developing in an IDE external to ThingWorx, the SC API makes it simple to still utilize Solution Central for all solution management and deployment needs, according to ThingWorx dev ops best practices. This article homes in on some pitfalls that may arise while setting up the infrastructure to use the SC API and assumes that there is already AD integration and an oauth token fetcher application configured for these requests.

CURL

One of the easiest ways to interface with the SC API is via cURL. In this way, publishing solutions to Solution Central really involves a series of cURL requests which can be scripted and automated as part of a mature dev ops process. In previous posts, the process of acquiring an oauth token is demonstrated. This oauth token is good for a few moments, for any number of requests, so the easiest thing to do is to request a token once before each step of the process.

1. GET info about a solution (shown) or all solutions (by leaving off everything after "solutions" in the URL)

    $RESULT=$(curl -s -o test.zip --location --request GET "https://<your_sc_url>/sc/api/solutions/org.ptc:somethingoriginal12345:1.0.0/files/SampleTwxExtension.zip" `
      --header "Authorization: Bearer $ACCESS_TOKEN" `
      --header 'Content-Type: application/json' `
    )

Shown here in the URL is the GAV ID (Group:Artifact_ID:Version). This is shown throughout the Swagger UI (found under Help within your Solution Central portal) as {ID}, and it includes the colons.
To query for solutions, see the different parameter options available in the Swagger UI found under Help in the SC Portal (cURL syntax for providing such parameters is shown in the next example).

Potential Pitfall: if your solution is not published yet, then you can get the information about it, where it exists in the SC repo, and what files it contains, but none of the files will be downloadable until it is published. Any attempt to retrieve unpublished files will result in a 404.

2. Create a new solution using POST

    $RESULT=$(curl -s --location --request POST "https://<your_sc_url>/sc/api/solutions" `
      --header "Authorization: Bearer $ACCESS_TOKEN" `
      --header 'Content-Type: application/json' `
      -d '"{\"groupId\": \"org.ptc\", \"artifactId\": \"somethingelseoriginal12345\", \"version\": \"1.0.0\", \"displayName\": \"SampleExtProject\", \"packageType\": \"thingworx-extension\", \"packageMetadata\": {}, \"targetPlatform\": \"ThingWorx\", \"targetPlatformMinVersion\": \"9.3.1\", \"description\": \"\", \"createdBy\": \"vfirewind\"}"'
    )

It will depend on your Powershell or Bash settings whether or not the escape characters are needed for the double quotes, and exact syntax may vary. If you get a 201 response, this was successful.

Potential Pitfalls: the group ID and artifact ID syntax are very particular, and despite what other sources say, the artifact ID often cannot contain capital letters. The artifact ID has to be unique among previously published solutions, unless those solutions are first deleted in the SC portal. The created by field does not need to be a valid ThingWorx username, and most of the parameters given here are required fields.

3. PUT the files into the project

    $RESULT=$(curl -v --location --request PUT "https://<your_sc_url>/sc/api/solutions/org.ptc:somethingelseoriginal12345:1.0.0/files" `
      --header "Authorization: Bearer $ACCESS_TOKEN" `
      --header 'Accept: application/json' `
      --header 'x-sc-primary-file:true' `
      --header 'Content-MD5:08a0e49172859144cb61c57f0d844c93' `
      --header 'x-sc-filename:SampleTwxExtension.zip' `
      --data-binary "@SampleTwxExtension.zip"
    )

    $RESULT=$(curl --location --request PUT "https://<your_sc_url>/sc/api/solutions/org.ptc:somethingelseoriginal12345:1.0.0/files" `
      --header "Authorization: Bearer $ACCESS_TOKEN" `
      --header 'Accept: application/json' `
      --header 'Content-MD5:fa1269ea0d8c8723b5734305e48f7d46' `
      --header 'x-sc-filename:SampleTwxExtension.sha' `
      --data-binary "@SampleTwxExtension.sha"
    )

This is really TWO requests, because both the archive of source files and its hash have to be sent to Solution Central for verifying authenticity. In addition to the hash file being sent separately, the MD5 checksum on both the source file archive and the hash has to be provided, as shown here with the header parameter "Content-MD5". This will be a unique hex string that represents the contents of the file, and it will be calculated by Azure as well to ensure the file contains what it should. Note that --data-binary is used here rather than -d: when cURL reads from a file with -d, it strips carriage returns and newlines, which alters a binary archive so that its checksum no longer matches.
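For teams that would rather script these uploads from Java than from cURL, the two PUT requests above could be assembled with the JDK's built-in HTTP client (Java 11 or later). This is a minimal sketch, assuming the same URL layout and headers as the cURL commands. The class and method names are hypothetical, and the host name in the usage note is a placeholder.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: mirrors the headers used in the cURL upload commands above.
class ScUpload {

    // Builds a PUT to <baseUrl>/sc/api/solutions/<gavId>/files for one file.
    // md5Hex must be the hex-encoded MD5 of exactly the bytes supplied by `body`.
    static HttpRequest buildUpload(String baseUrl, String gavId, String token,
                                   String fileName, String md5Hex, boolean primary,
                                   HttpRequest.BodyPublisher body) {
        URI uri = URI.create(baseUrl + "/sc/api/solutions/" + gavId + "/files");
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/json")
                .header("Content-MD5", md5Hex)
                .header("x-sc-filename", fileName);
        if (primary) {
            // Only the source archive is flagged as the primary file, as in the first cURL command.
            builder.header("x-sc-primary-file", "true");
        }
        return builder.PUT(body).build();
    }
}
```

A real call would pass HttpRequest.BodyPublishers.ofFile(Path.of("SampleTwxExtension.zip")) as the body, which sends the file bytes unmodified, and then send the request with HttpClient.newHttpClient().send(...).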
There are a few ways to calculate the MD5 checksums and the hash. Scripts can be created which use built-in Windows tools like certutil to run a few commands and manually save the hash string to a file:

    certutil -hashfile SampleTwxExtension.zip MD5
    certutil -hashfile SampleTwxExtension.zip SHA256
    # By some means, save this SHA value to a file named SampleTwxExtension.sha
    certutil -hashfile SampleTwxExtension.sha MD5

Another way is to use Java to generate the SHA file and calculate the MD5 values:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.IOException;

    import org.apache.commons.codec.digest.DigestUtils;

    public class Main {
        private static String pathToProject = "C:\\Users\\vfirewind\\eclipse-workspace\\SampleTwxExtension\\build\\distributions";
        private static String fileName = "SampleTwxExtension";

        public static void main(String[] args) {
            String zip_filename = pathToProject + "\\" + fileName + ".zip";
            String sha_filename = pathToProject + "\\" + fileName + ".sha";
            File zip_file = new File(zip_filename);

            // Calculate the MD5 of the zip file
            try (FileInputStream zip_is = new FileInputStream(zip_file)) {
                String md5_zip = DigestUtils.md5Hex(zip_is);
                System.out.println("------------------------------------");
                System.out.println("Zip file MD5: " + md5_zip);
                System.out.println("------------------------------------");
            } catch (IOException e) {
                System.out.println("[ERROR] Could not calculate MD5 on zip file named: " + zip_filename + "; " + e.getMessage());
                e.printStackTrace();
            }

            // Calculate the hash of the zip and write it to a file.
            // A fresh stream is opened here because the MD5 pass above has
            // already consumed the first one.
            try (FileInputStream zip_is = new FileInputStream(zip_file)) {
                String sha = DigestUtils.sha256Hex(zip_is);
                File sha_output = new File(sha_filename);
                try (FileWriter fout = new FileWriter(sha_output)) {
                    fout.write(sha);
                }
                System.out.println("[INFO] SHA: " + sha + "; written to file: " + fileName + ".sha");

                // Now calculate MD5 on the hash file
                try (FileInputStream sha_is = new FileInputStream(sha_output)) {
                    String md5_sha = DigestUtils.md5Hex(sha_is);
                    System.out.println("------------------------------------");
                    System.out.println("SHA file MD5: " + md5_sha);
                    System.out.println("------------------------------------");
                }
            } catch (IOException e) {
                System.out.println("[ERROR] Could not calculate MD5 on file name: " + sha_filename + "; " + e.getMessage());
                e.printStackTrace();
            }
        }
    }

This method requires the use of a third party library called the commons codec. Be sure to add this not just to the class path for the Java project, but if building as a part of a ThingWorx extension, then to the build.gradle file as well:

    repositories {
        mavenCentral()
    }

    dependencies {
        compile fileTree(dir:'twx-lib', include:'*.jar')
        compile fileTree(dir:'lib', include:'*.jar')
        compile 'commons-codec:commons-codec:1.15'
    }

Potential Pitfalls: Solution Central will only accept MD5 values provided in hex, and not base64. The file paths are not shown here, as the archive file and associated hash file shown here were in the same folder as the cURL scripts. The @ syntax in Powershell is very particular, and refers to reading the contents of the file, in this case, or uploading it to SC (and not just the string value that is the name of the file). Every time the source files are rebuilt, the MD5 and SHA values need to be recalculated, which is why scripting this process is recommended.

4. Do another PUT request to publish the project

    $RESULT=$(curl --location --request PUT "https://<your_sc_url>/sc/api/solutions/org.ptc:somethingelseoriginal12345:1.0.0/publish" `
      --header "Authorization: Bearer $ACCESS_TOKEN" `
      --header 'Accept: application/json' `
      --header 'Content-Type: application/json' `
      -d '"{\"publishedBy\": \"vfirewind\"}"'
    )

The published by parameter is necessary here, but it does not have to be a valid ThingWorx user for the request to work. If this request is successful, then the solution will show up as published in the SC Portal.

Other Pitfalls

Remember that for this process to work, the extensions within the source file archive must contain certain identifiers.
The group ID, artifact ID, and version have to be consistent across a couple of files in each extension: the metadata.xml file for the extension and the project.xml file which specifies which projects the extensions belong to within ThingWorx. If any of this information is incorrect, the final PUT to publish the solution will fail.

Example Metadata File:

    <?xml version="1.0" encoding="UTF-8"?>
    <Entities>
      <ExtensionPackages>
        <ExtensionPackage artifactId="somethingoriginal12345" dependsOn="" description="" groupId="org.ptc" haCompatible="false" minimumThingWorxVersion="9.3.0" name="SampleTwxExtension" packageVersion="1.0.0" vendor="">
          <JarResources>
            <FileResource description="" file="sampletwxextension.jar" type="JAR"></FileResource>
          </JarResources>
        </ExtensionPackage>
      </ExtensionPackages>
      <ThingPackages>
        <ThingPackage className="SampleTT" description="" name="SampleTTPackage"></ThingPackage>
      </ThingPackages>
      <ThingTemplates>
        <ThingTemplate aspect.isEditableExtensionObject="false" description="" name="SampleTT" thingPackage="SampleTTPackage"></ThingTemplate>
      </ThingTemplates>
    </Entities>

Example Projects XML File:

    <?xml version="1.0" encoding="UTF-8"?>
    <Entities>
      <Projects>
        <Project artifactId="somethingoriginal12345" dependsOn="{&quot;extensions&quot;:&quot;&quot;,&quot;projects&quot;:&quot;&quot;}" description="" documentationContent="" groupId="org.ptc" homeMashup="" minPlatformVersion="" name="SampleExtProject" packageVersion="1.0.0" projectName="SampleExtProject" publishResult="" state="DRAFT" tags="">
        </Project>
      </Projects>
    </Entities>

Another large issue that may come up is that requests often fail with a 500 error and without any message. There are often more details in the server logs, which can be reviewed internally by PTC if a support case is opened. Common causes of 500 errors include missing values for required parameters, invalid characters in the parameter strings, and an API URL which is not the correct endpoint for the type of request. Another large cause of 500 errors is providing MD5 or hash values that are not valid (a mismatch will show differently).

Another common error is the 400 error, which happens if any of the code that SC uses to parse the request breaks. A 400 error will also occur if the files are not being opened or uploaded correctly due to some issue with the @ syntax (mentioned above). Another common 400 error is a mismatch between the provided MD5 value for the zip or SHA file and the one calculated by Azure ("message: Md5Mismatch"), which can indicate that there has been some corruption in the content of the upload, or simply that the MD5 values aren't being calculated correctly. cURL will often report that a file is 100% uploaded even when the upload is incomplete, errors appear in the console, or the uploaded file is smaller than it should be.

Conclusion

Debugging with cURL can be a challenge. Note that adding "-v" to a cURL command provides additional information, such as the number of bytes in each request and a reprint of the parameters to ensure they were read correctly. Even still, it isn't always possible for SC to indicate what the real cause of an issue is. There are many things that can go wrong in this process, but when it goes right, it goes very right. The SC API can be entirely scripted and automated, allowing for seamless inclusion of externally-developed tools into a mature dev ops process.
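Since the hex-versus-base64 distinction is an easy source of Md5Mismatch errors, here is a small sketch that uses only the JDK (no commons codec) to show both encodings of the same digest. The class and method names are hypothetical. Only the hex form is what the Content-MD5 header in the requests above expects, according to the pitfalls described earlier.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper: JDK-only digest utilities for comparing encodings.
class DigestForms {

    // Computes the raw digest bytes for the given algorithm name (e.g. "MD5", "SHA-256").
    static byte[] digest(String algorithm, byte[] data) {
        try {
            return MessageDigest.getInstance(algorithm).digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 and SHA-256 are required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Renders bytes as lowercase hex, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    static String md5Hex(byte[] data) {
        return toHex(digest("MD5", data));
    }

    // Same digest, base64-encoded: a different string that should not be sent to SC.
    static String md5Base64(byte[] data) {
        return Base64.getEncoder().encodeToString(digest("MD5", data));
    }

    static String sha256Hex(byte[] data) {
        return toHex(digest("SHA-256", data));
    }

    public static void main(String[] args) {
        byte[] sample = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("hex:    " + md5Hex(sample));
        System.out.println("base64: " + md5Base64(sample));
    }
}
```

For real files, the bytes could come from java.nio.file.Files.readAllBytes, which avoids the consumed-stream problem that arises when one InputStream is hashed twice.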
User Load Testing in ThingWorx: Java Client Tutorial
Written by Tori Firewind, IoT EDC

Introduction

As stated in previous posts, user load testing is a critical component of ensuring a ThingWorx solution is Enterprise-ready. Even a sturdy new feature that seems to function well in development can run into issues once larger loads are thrown into the mix. That's why no piece of code should be considered production-ready until it has undergone not just unit and integration testing (detailed in our Comprehensive DevOps Guide), but also load testing that ensures a positive user experience and an adequately sized server to facilitate the user load.

The EDC has spent quite a few posts detailing the process of setting up an accurate, real-world testing suite using JMeter for ThingWorx. In this piece, we detail an alternative approach that makes use of the Java Spring Boot Framework to make REST requests against the ThingWorx server and simulate the user load. This Java Client tutorial produces a very immature user load client, one which would still take a lot of development to function as flexibly as the JMeter tutorial counterpart. For Java developers, however, this is still a very attractive approach; it allows for more custom, robust testing suites that come only as an investment made in a solid testing tool.

For someone experienced in Java, the risk is smaller of overlooking some aspect of simulation that JMeter may have handled automatically. For example, JMeter automatically creates more than one HTTP session, and it's much easier to implement randomized user logins instead of one account. The Java Client could do it with some extra work (not demonstrated here), but it uses just the Administrator login by default for a quick and dirty sort of load test, one focused less on the customer experience and more on server and database performance under the strain of the user requests (the method used in our sizing guidance, for instance, to see if a server is sized correctly).

The amount of time required to develop a Java Client isn't so bad for a Java developer, and when compared with learning the JMeter Framework, might be a better investment. A tool like this can handle a greater number of threads on a single testing VM; JMeter caps out around 250 threads per client on an 8 GB VM (under ideal conditions), while a Java Client can have thousands of threads easily. Likewise, a Java Client has less memory overhead than JMeter, less concern for garbage collection, and less likelihood that influence from heap memory management will affect the test results.

However, remember that everything in a Java Client has to be built from scratch and maintained over time. That means that beyond the basic tutorial here, there needs to be some kind of metrics gathering and analysis tool implemented (JMeter has built-in reporting tools), the calls need to be randomized rather than made at set intervals like they are here (which is not a very accurate representation of real-world user load), and the number of users accessing the system at once should probably vary over time (to resemble peak usage hours). JMeter has a recording tool to ensure all the necessary REST requests to simulate a mashup load are made, so great care has to be taken to ensure all of the necessary REST calls for a mashup are made by the Java Client if a true simulation is called for by that approach.
Java Client Tutorial   Conclusion Neither a Java Client nor a JMeter testing suite is inherently better than the other, and both have their place within PTC's various testing processes. The best test of all is to stand up any sort of user load testing client, either of these approaches, at the same time as the UAT or QA user experience testing. QA testers who load and click about on mashups in true, user fashion can then see most accurately how the mashups will perform and what the users will experience in the Enterprise-ready, production application once the changes go out.
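To make the point about randomized call timing concrete, here is a minimal plain-Java sketch of a load loop in which each virtual user waits a random delay between calls instead of firing at fixed intervals. It is not the Spring Boot client from the tutorial. The class name, method name, and parameters are hypothetical, and the `call` argument stands in for whatever REST request a real client would make against ThingWorx.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical load loop: one thread per virtual user, random think time between calls.
class LoadSketch {

    // Runs `users` virtual users, each making `callsPerUser` calls with a random pause
    // of 0..maxDelayMillis between them. Returns the number of calls that completed.
    static int run(int users, int callsPerUser, long maxDelayMillis, Runnable call) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(users);
        for (int u = 0; u < users; u++) {
            pool.submit(() -> {
                try {
                    for (int i = 0; i < callsPerUser; i++) {
                        call.run();
                        completed.incrementAndGet();
                        // Randomized think time, so requests do not arrive in lockstep.
                        Thread.sleep(ThreadLocalRandom.current().nextLong(maxDelayMillis + 1));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return completed.get();
    }
}
```

A fuller client would also record response times per call and vary the number of active users over the course of the test, as described above.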
Solution Central and Azure Active Directory Written by: Tori Firewind, IoT EDC   As we’ve said in a previous post, Solution Central (SC) is a crucial part of any mature dev ops pipeline. In its latest version (3.1.0), it manages custom solutions even more easily due to the SC API. However, this comes with a few requirements that can be a little tricky.   One of the more complex configuration pieces for using this new API involves a cloud-hosted Active Directory (AD) application within Azure AD. In order to make use of the new API, your organization’s users must exist on an Azure AD tenant that is separate from the PTC tenant. Most PTC customers are placed on this PTC-owned tenant by default, so additional configuration may be required to set up the AD instance within Azure before the SC API can be used. The type of tenant has to be Azure, of course, as that is what PTC uses: a multi-tenancy Azure infrastructure for all authentication.   In this usage, “tenant” is a cloud-hosting, infrastructure term which essentially refers to an Azure VM hosting an Active Directory server, as well as managing the many AD components that would otherwise require a lot of oversight. Users within that AD server are grouped so that only those solutions published by their own organizations are visible within Solution Central. In this way, the term “tenant” can also refer to a partition of some kind of data; a Solution Central tenant includes the users, the solutions, and is sort of like a sub-tenant of the larger PTC infrastructure. PTC uses a global Solution Central tenant for deployment of its own solutions, like DPM (Digital Performance Management), which is therefore available to every user in the PTC Azure AD tenant or a tenant which has been connected using the tutorial below.   There are good reasons to want to use your own Azure AD integrated into Solution Central that do not involve using the SC API. 
For one thing, it allows for direct control over the users that have access; otherwise, a ticket to PTC support is necessary every time a user needs access granted or removed. The tutorial below is a great reference for anyone using Solution Central, with steps 1-4 offering an easy guide for integrating your own Azure AD and simplifying your user management.   To use the SC API, however, there are additional steps required to create an application for retrieving an oauth token. This application needs the “solution-publisher” role or else bad request/forbidden errors will pop up. This role is automatically available in your Azure AD tenant once it is linked to the PTC AD tenant, but it does have to be manually assigned to the application, whose sole function is to request this access token for authenticating requests against the SC server.   The Azure application you need to create is essentially a plugin with permission to publish solutions against the SC API. It functions a little like a “login” in database terminology, serving the function of authenticating to an endpoint (in this case not tied to a user or any kind of identity). This application must be able to request an access token successfully and return that to the scripts which call upon it in order to perform the project creation and publication requests. The tutorial below steps you through how to create this application and request all of your solutions via the SC API.   
Tutorial: Create Azure AD Application to Access SC via API

1. Create a new tenant in Azure AD for users who will have access to the SC API
2. Create a ticket with PTC to provide the tenant ID and begin the onboarding process
3. Once that process completes, you will receive an email with a custom link to login to the Solution Central portal for your organization, where only your solutions exist
4. Users will then need to be added as “Custom Global Administrators” on the SC enterprise application within Azure Portal to grant them access to login to Solution Central in a browser
5. In order for us to be able to use the SC API, however, more work needs to be done; an application to request an oauth token for requests must be created in Azure Portal
   - Navigate to “App Registrations” from the Azure AD page in Azure Portal and then click “New registration”
   - Enter the application name (ours is called “gradle-plugin-tokenfetcher” in the examples shown here), and select the “single tenant” radio button
   - Enter a URL for redirect URI and for “Select a platform” select “Web”
   - Click “Register”, and wait for the application created notification
6. Now we need to add permissions to see Solution Central
   - Open the app from under “App registrations” and select the “API permissions” tab
   - Click “Add a permission” and in the pop-up window that appears, select the “APIs my organization uses” tab, which should have PTC Solution Central listed, if this tenant has been linked to the PTC tenant per step 2 above
   - Select “Application Permissions” and then check the “solution-publisher” role
   - Click “Add permissions”
   - The application now has access to Solution Central as a publisher, able to send solution publications over the API and not just via the Platform interface
7. The application is ready now, so we can create a request to test that it has access to SC
   - Using Windows Powershell or something similar, create a script to make a couple of cURL requests against first the Azure AD and then the Solution Central API (attached as well):
   - The tenant ID and the directory ID are the same value (listed on the tenant under “Overview”), and the client ID and application ID are the same as well (listed on the application, shown here), so don’t be confused by terminology
   - The client app secret is the “Value” provided by the system when a new client secret is created under “Certificates & Secrets” (remember to copy this as soon as it is made and before clicking away, as after that, it will no longer be visible):
   - The Solution Central app ID can be found under “App registrations”, listed within the details for the “solution-publisher” role:
   - This access token request piece must be done before every request to the SC API, so this secret should be kept in some kind of password management tool (or a global environment variable in GitLab) so that it isn’t found anywhere in the source code
   - If the script pings the SC API successfully, then a list of solutions will be returned
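The token request described in the tutorial can be sketched in JavaScript (Node 18 or later, which provides a global fetch). This is only an illustration, not the attached script: it assumes the standard Azure AD v2.0 client-credentials flow, and the scope format (the Solution Central app ID followed by "/.default") is an assumption that may need adjusting for your tenant. All IDs are placeholders, and the secret should come from a password manager or CI variable rather than from source code.

```javascript
// Sketch: build and send an Azure AD client-credentials token request.
// Every ID passed in is a placeholder supplied by the caller.

function buildTokenRequest({ tenantId, clientId, clientSecret, scAppId }) {
  // Azure AD v2.0 token endpoint (tenant ID and directory ID are the same value).
  const url = `https://login.microsoftonline.com/${tenantId}/oauth2/v2.0/token`;
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
    // Assumption: ".default" requests the application permissions
    // (such as solution-publisher) already granted to this app.
    scope: `${scAppId}/.default`,
  });
  return { url, body };
}

async function fetchAccessToken(config) {
  const { url, body } = buildTokenRequest(config);
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  });
  if (!response.ok) {
    throw new Error(`Token request failed with HTTP ${response.status}`);
  }
  const json = await response.json();
  return json.access_token;
}
```

The returned token would then be sent as an "Authorization: Bearer" header on each Solution Central request; the Solution Central endpoint itself is not given in this article, so it is left out of the sketch.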
The DPM User Experience Written by Tori Firewind, IoT EDC Team   As discussed in a previous post, DPM is a tool designed to be beneficial at all levels of a company, from the operators monitoring automated data on production events from the factory machines themselves, to the production supervisors who need to establish, task out, and track machine maintenance and improvement measures. DPM also engages the continuous improvement and plant leadership, by providing a standardized way to monitor performance that ultimately rolls up to the executive level. The end users of DPM are therefore diverse both in how they access DPM, and how they make use of its various features.   One of the perks to building DPM on top of the ThingWorx Foundation is that many of the webpages (called “mashups”) within ThingWorx are already responsive, and any  which aren’t responsive OOTB can be modified and custom designed for different size viewing screens to ensure that if necessary, end users can access DPM   from a variety of locations and devices. Most of the time, end users will be accessing mashups from hard-wired dashboards mounted on the actual devices,    or from wireless laptops which have standard size screens with standard resolutions. For use cases involving phones or tablets, however, it may be necessary to see how DPM will perform across a variety of bandwidth and latency conditions. Often, cellular or satellite connection is a must to facilitate field team cooperation, and 5G networks often result in worsened performance.   So, to demonstrate the influence of bandwidth and latency on the responsiveness of DPM, the Production Dashboard was loaded in the Google Chrome browser repeatedly under varying conditions. This dashboard is the webpage most operators and field users would access to log event information and production details (so it is widely used by end users). 
This provides a sort of benchmark of the DPM solution, something which indicates what can be expected and tells us a few things about how DPM should be deployed and configured.

Latency was introduced by hosting the servers involved in the test in different regions (all Azure cloud hosted servers: one in US East, one in US West, and one in Japan East). Bandwidth limitations were introduced using a tool on the PC, with either no bandwidth limit or a limit of 4 megabits/second.

Browser caching was turned on and off as well, to simulate the difference between new and return users; new users would not have the webpage cached, so their load times are expected to be longer. Tomcat compression was also enabled in half of the runs to demonstrate the importance of compression for optimal performance.

Each of these 24 scenarios was then tested 10 times from each location, and the actual data can be found in the attached benchmark document (a working solution benchmark, which is not designed to be referenced directly, as matters of infrastructure may influence the exact performance of the solution).

Even with bandwidth limitations, every region sees better performance for return users versus new users, which may be important to note. However, because DPM field users typically access DPM often, the return user time is a better indicator of adoption, and those numbers look great in our simulations. Notice the top line, which shows the very worst of mobile performance: what happens over networks with bandwidth limitations when Tomcat Compression is not enabled. Enabling Tomcat Compression changes load times only slightly on regular networks, but it vastly improves performance across regions and on mobile networks, so it is highly recommended (instructions on how to enable are below).

Key Takeaways
Latency and bandwidth impact DPM performance in exactly the way one would expect of a web application.
While any DPM server can be accessed from any region, regions with more latency will experience delays proportional to the amount of latency. In the chart here, find the three regions represented three times by three different colors (different from the charts above):
- The three different shades of each color represent the different regions
- Green represents the optimal configuration settings (Tomcat compression enabled, caching turned on) for returning users with bandwidth limitations (i.e. mobile networks like 5G)
- Blue shows first-time page visitors with no bandwidth limitations
- Purple shows first-time visitors that do have bandwidth limitations
- The uncompressed first-time load for mobile users (those with bandwidth limitations imposed) within the same region is also given to demonstrate the importance of enabling Tomcat Compression (load times only get worse without compression the farther the region)

Notice how the green series has lower load times across the board than the blue one, meaning that return users even with bandwidth limitations have better performance across every region than new users. Also notice how the gap is larger between lighter colors and darker colors, where the darker the color, the farther the region from the DPM servers. This indicates that network latency has a more significant influence on performance versus bandwidth, with only longer running transactions like file uploads seeing a significant performance hit when on a network with bandwidth limitations.

Find out how to enable tomcat compression and review the full solution benchmark in the document attached.
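As a rough way to reason about why latency tends to matter more than bandwidth for a page made of many small requests, a common back-of-the-envelope model adds the time spent on round trips to the time spent transferring bytes. The sketch below is only that simplified model; the function name and the numbers in the usage note are hypothetical and are not taken from the benchmark.

```javascript
// Simplified model: load time = (round trips x round-trip time) + (payload / bandwidth).
// It ignores caching, parallel connections, and server processing time.
function estimateLoadSeconds({ roundTrips, rttMs, payloadBytes, bandwidthMbps }) {
  const latencySeconds = (roundTrips * rttMs) / 1000;
  // Pass Infinity as bandwidthMbps to model "no bandwidth limitation".
  const transferSeconds = (payloadBytes * 8) / (bandwidthMbps * 1e6);
  return latencySeconds + transferSeconds;
}
```

For example, with 20 sequential round trips at 150 ms each and a 500,000-byte payload over a 4 megabit/second link, the model attributes 3 seconds to latency and 1 second to transfer. Compression shrinks only the transfer term, and a warm browser cache reduces both.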
Announcing: ThingWorx Solution Central 3.1.0 and its New API Written by: Tori Firewind of the IoT EDC   Solution Central 3.1.0 ThingWorx Solution Central (SC) is the solution management tool for ThingWorx and Digital Performance Management (DPM), the latest version of which (DPM 1.1) can now be deployed directly from the PTC Solutions menu of SC. Streamlining packaging strategies and ensuring efficient solution deployment can now be done for all kinds of ThingWorx solutions, even those with heavy customization. The new API allows for updated building blocks from within the PTC Solutions menu to be easily discovered and deployed right alongside custom solutions. Even the most advanced developers can now house their deployment management process within SC.   As discussed in a previous article, Solution Central forms a necessary part of a mature DevOps pipeline, usually as a set of services within Foundation which finalize and publish the solution to the Solution Central servers. The recommendation to utilize Solution Central from within Foundation remains a best practice for ThingWorx DevOps because the vast majority of solutions benefit from using the ThingWorx APIs, which scan and check for dependencies and proper XML formatting on each included entity.   Packaging and publishing the solution from within ThingWorx is the easiest and most straightforward way recommended by PTC, but it is now possible to publish to Solution Central using a standard API for those who need to publish from Jenkins or other build jobs. If there are legacy extensions, 3rd party tool dependencies, or other customizations within the ThingWorx application, then it may be beneficial to use this new API instead.   The new API also allows for editable extensions and entities within a published solution, though PTC still recommends avoiding this as a general practice. 
It is usually better for purposes of maintainability and ease of upgrades to just publish the solution again (with an incremented version) each time any changes are made.

How to Use the API
Within Solution Central, a new menu option has been added to review the API, with information about the different types of requests and their parameters and responses. To access this within the Solution Central UI, open the help menu and navigate to “Public APIs” (see the image on the right). To see a sample response, select a request type and scroll down to the “Responses” section (shown below). Examples of error responses are also provided, and it’s important to ensure that whatever makes these requests can properly report or log any errors for troubleshooting and maintenance of the DevOps process.

The general steps for making use of this API are as follows:
1. Create the solution resource with a POST
2. Add some files to that solution with PUTs
   - First, create the solution archive with the right Solution Identifiers; this should contain at least one project XML and all of the entities belonging to that ThingWorx project
   - Next, compute MD5 using a tool like DigestUtils on the contents of the archive; this checksum is required for Solution Central
   - Compute the SHA hash on the archive and save it; this will need to be provided along with the archive in the PUT requests
   - Compute MD5 on the hash file also
   - Finally, make the two PUT requests, one for the archive and one for its hash; for example cURL requests, see the Help Center
3. Publish the solution using a PUT

So, with a little more work it is now possible to make use of SC in a more custom DevOps process. It is now possible to build JSON or XML solutions using development tools outside of ThingWorx and still publish these customizations to Solution Central.
The process of DevOps Management just became more versatile, and with the ease of deployment of DPM and other PTC building blocks as well, ThingWorx is now more accessible and easy to use than ever before.
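The checksum steps listed under "How to Use the API" can be sketched with Node's built-in crypto module. This is an illustration under stated assumptions rather than the documented procedure: the article only says "SHA", so SHA-256 is assumed here, and hex output is assumed as well (the API may expect a different encoding, such as base64). The function names are hypothetical.

```javascript
const crypto = require("crypto");

// Returns the digest of a Buffer or string using the named algorithm.
function digest(algorithm, data, encoding = "hex") {
  return crypto.createHash(algorithm).update(data).digest(encoding);
}

// Computes the three values the steps call for, given the archive bytes.
function describeArchive(archiveBytes) {
  // MD5 of the archive contents.
  const archiveMd5 = digest("md5", archiveBytes);
  // Assumption: SHA-256; confirm the required SHA variant in the Help Center.
  const archiveSha = digest("sha256", archiveBytes);
  // The hash is uploaded as its own file, so it needs an MD5 too.
  const shaFileMd5 = digest("md5", Buffer.from(archiveSha, "utf8"));
  return { archiveMd5, archiveSha, shaFileMd5 };
}
```

In a build job this might be called as describeArchive(fs.readFileSync(pathToArchive)), with the results supplied alongside the two PUT requests.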
MachNation  Podcast Replay: Enterprise-Specific Implementation Testing a podcast, by Mike Jasperson,  VP of the IoT EDC   MachNation, a company   exclusively dedicated to testing and benchmarking Internet of Things (IoT) platforms, end-to-end solutions, and services, has conducted a recent podcast series featuring our very own Mike Jasperson, Vice President of the IoT Enterprise Deployment Center here at PTC. Performance IoT    is a podcast series that brings together experts who make IoT performance testing and high-resiliency IoT part of their IoT journey. Mike Jasperson's podcast is episode 5 in the series, titled:   Enterprise-Specific Implementation Testing .  Enjoy!
Distributed Timer and Scheduler Execution in a ThingWorx High Availability (HA) Cluster Written by Desheng Xu and edited by Mike Jasperson    Overview Starting with the 9.0 release, ThingWorx supports an “active-active” high availability (or HA) configuration, with multiple nodes providing redundancy in the event of hardware failures as well as horizontal scalability for workloads that can be distributed across the cluster.   In this architecture, one of the ThingWorx nodes is elected as the “singleton” (or lead) node of the cluster.  This node is responsible for managing the execution of all events triggered by timers or schedulers – they are not distributed across the cluster.   This design has proved challenging for some implementations as it presents a potential for a ThingWorx application to generate imbalanced workload if complex timers and schedulers are needed.   However, your ThingWorx applications can overcome this limitation, and still use timers and schedulers to trigger workloads that will distribute across the cluster.  This article will demonstrate both how to reproduce this imbalanced workload scenario, and the approach you can take to overcome it.   Demonstration Setup   For purposes of this demonstration, a two-node ThingWorx cluster was used, similar to the deployment diagram below:   Demonstrating Event Workload on the Singleton Node   Imagine this simple scenario: You have a list of vendors, and you need to process some logic for one of them at random every few seconds.   First, we will create a timer in ThingWorx to trigger an event – in this example, every 5 seconds.     Next, we will create a helper utility that has a task that will randomly select one of the vendors and process some logic for it – in this case, we will simply log the selected vendor in the ThingWorx ScriptLog.     
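A helper utility of the kind described here might look roughly like the following. This is a hypothetical sketch, not the code from the original screenshots: the function names and the shape of the vendor list are assumptions, and inside a ThingWorx service the log callback would typically be the built-in logger.

```javascript
// Picks one vendor at random. `random` is injectable so the choice can be tested.
function pickRandomVendor(vendors, random = Math.random) {
  if (vendors.length === 0) {
    throw new Error("vendor list is empty");
  }
  const index = Math.floor(random() * vendors.length);
  return vendors[index];
}

// Hypothetical service body: select a vendor and log the selection.
function timerBackendService(vendors, log) {
  const vendor = pickRandomVendor(vendors);
  log(`Selected vendor: ${vendor}`);
  return vendor;
}
```

Keeping the selection logic separate from the logging makes it easy to replace the logging with real per-vendor processing later.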
Finally, we will subscribe to the timer event, and call the helper utility:

Now with that code in place, let's check where these services are being executed in the ScriptLog.

Look at the PlatformID column in the log: notice that the Timer and the helper utility are always running on the same node, in this case Platform2, which is the current singleton node in the cluster.

As the complexity of your helper utility increases, you can imagine how workload will become unbalanced, with the singleton node handling the bulk of this timer-driven workload in addition to the other workloads being spread across the cluster.

This workload can be distributed across multiple cluster nodes, but a little more effort is needed to make it happen.

Timers that Distribute Tasks Across Multiple ThingWorx HA Cluster Nodes

This time let’s update our subscription code, using the PostJSON service from the ContentLoader entity to send the service requests to the cluster entry point instead of running them locally.

    // Authenticate with an Application Key and exchange JSON with the service
    const headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "appKey": "INSERT-YOUR-APPKEY-HERE"
    };
    // Cluster entry point (the load balancer), not an individual node
    const url = "https://testcluster.edc.ptc.io/Thingworx/Things/DistributeTaskDemo_HelperThing/services/TimerBackend_Service";
    let result = Resources["ContentLoaderFunctions"].PostJSON({
        proxyScheme: undefined /* STRING */,
        headers: headers /* JSON */,
        ignoreSSLErrors: undefined /* BOOLEAN */,
        useNTLM: undefined /* BOOLEAN */,
        workstation: undefined /* STRING */,
        useProxy: undefined /* BOOLEAN */,
        withCookies: undefined /* BOOLEAN */,
        proxyHost: undefined /* STRING */,
        url: url /* STRING */,
        content: {} /* JSON */,
        timeout: undefined /* NUMBER */,
        proxyPort: undefined /* INTEGER */,
        password: undefined /* STRING */,
        domain: undefined /* STRING */,
        username: undefined /* STRING */
    });

Note that the URL used in this example - https://testcluster.edc.ptc.io/Thingworx - is the entry point of the ThingWorx cluster.
Replace this value to match your cluster’s entry point if you want to duplicate this in your own cluster.

Now, let's check the result again.

Notice that the helper utility TimerBackend_Service is now running on both cluster nodes, Platform1 and Platform2.

Is this Magic?  No!  What is Happening Here?

The timer or scheduler itself is still being executed on the singleton node, but now instead of triggering the helper utility locally, the PostJSON service call from the subscription is being routed back to the cluster entry point: the load balancer. As a result, the request is routed (usually round-robin) to any available cluster nodes that are behind the load balancer and reporting as healthy.

Usually, the load balancer will be configured to have a cookie-based affinity: the load balancer will route the request to the node that has the same cookie value as the request. Since this PostJSON service call is a RESTful call, any cookie value associated with the response will not be attached to the next request. As a result, the cookie-based affinity will not impact the round-robin routing in this case.

Considerations to Use this Approach

Authentication: As illustrated in the demo, make sure to use an Application Key with an appropriate user assigned in the header. You could alternatively use username/password or a token to authenticate the request, but this could be less ideal from a security perspective.

App Deployment: The hostname in the URL must match the hostname of the cluster entry point. As the URL of your implementation is now part of your code, if you deploy this code from one ThingWorx instance to another, you would need to modify the hostname/port/protocol in the URL. Consider creating a variable in the helper utility which holds the hostname/port/protocol value, making it easier to modify during deployment.
Firewall Rules: If your load balancer has firewall rules which limit the traffic to specific known IP addresses, you will need to determine which IP addresses will be used when a service is invoked from each of the ThingWorx cluster nodes, and then configure the load balancer to allow the traffic from each of these public IP addresses. Alternatively, you could configure an internal IP address endpoint for the load balancer and use the local /etc/hosts name resolution of each ThingWorx node to point to the internal load balancer IP, or register this internal IP in an internal DNS as the cluster entry point.
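To illustrate the App Deployment consideration above, the entry point can be kept in one place and the service URL assembled from it. This is a minimal sketch: the settings object and function name are hypothetical, and in ThingWorx the values could just as well come from a configuration table or a property on the helper Thing.

```javascript
// Hypothetical per-environment settings; only these change between deployments.
const clusterEntryPoint = {
  protocol: "https",
  host: "testcluster.edc.ptc.io",
  port: 443,
};

// Builds the REST URL for a service on a Thing behind the cluster entry point.
function buildServiceUrl(entryPoint, thingName, serviceName) {
  const { protocol, host, port } = entryPoint;
  return (
    `${protocol}://${host}:${port}/Thingworx/Things/` +
    `${encodeURIComponent(thingName)}/services/${encodeURIComponent(serviceName)}`
  );
}
```

The subscription code would then pass buildServiceUrl(clusterEntryPoint, "DistributeTaskDemo_HelperThing", "TimerBackend_Service") as the url argument to PostJSON instead of a hard-coded string.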
Thundering Herd Scenarios in ThingWorx
Written by Jim Klink, Edited by Tori Firewind

Introduction
The thundering herd topic is quite vast, but it can be broken down into two main categories: the “data flood” and the “reconnect storm”. One category involves what happens to the business logic (the “data flood” scenario) and affects both Factory and Connected Products use cases. The other category involves bringing many, many devices back online in a short time (the “reconnect storm” scenario), which largely influences Connected Products scenarios.

Citation: https://gfp.sd.gov/buffalo-roundup/

Think of Connected Products as a thundering stampede of many small buffalo, which then makes a Factory thundering herd scenario a stampede of a couple of massive brontosauruses, much fewer in number, but still with lots of persisted data to send back in. This article focuses on how to manage the “reconnect storm” scenario, by delaying the return of individual buffalo to reduce the intensity of the stampede. Find here the necessary insights on how to configure your ThingWorx edge applications to minimize the effect of a server down scenario.

The C-SDK will be used for examples, but the general principles will apply to any of the ThingWorx edge options (EMS, .Net SDK, Java SDK). This article also references the ExampleAgent application which is built using the C-SDK. The ExampleAgent is available for download as an attachment to this post. It offers an easily configurable edge solution for Windows and Linux that can be used for the following purposes:
- Foundation for rapid development of a robust custom edge application based on the ThingWorx C-SDK for use by customers and partners.
- Full featured, well documented, ‘C’ source code example of developing an application using the ThingWorx C-SDK.

A “local” issue is one which affects a single agent, a loss of connectivity due to hardware malfunction or local network issues.
Local issues are quite common in the IoT world, and recovery usually isn’t too much of a challenge. A “global” issue occurs when many agents disconnect simultaneously, usually because there is an issue with the ThingWorx server itself (though the Load Balancer, Connection Server, or web hosting software could also be the source). Perhaps it is a scheduled software update, perhaps it is unexpected downtime due to issues, but either way, it’s important to consider how the fleet of agents will respond if ThingWorx suddenly becomes unavailable.

There are two broad issues to consider in a situation like this. One is maintaining the agent’s data so that it can be sent when the connection becomes available again. This can be done in the C-SDK using an offline file storage system, which includes properties, events, and services. Offline storage is configured in the twConfig.h file in the C-SDK. The second issue is the number of agents seeking to reconnect to the server in a short period of time once the server is available again.

Of course, if revenue is based on uptime, perhaps persisting data is less critical and can be lost, making things simple. However, in most cases, this data will need to be stored on the edge device until reconnect. Then, once the server comes back up, suddenly all of this data comes streaming in from all of the many edge devices simultaneously.

This flood of both data and reconnection of a multitude of agents can create what is called a “thundering herd” scenario, in which ThingWorx can become backlogged with data processing requests, data can be lost if the queues are overwhelmed, or worst-case, the Foundation server can become unresponsive once again. This is when outages become costly and drag on longer than necessary.
Several factors can lead to a thundering herd scenario, including the number of agents in the fleet, the amount of stored data per agent, the amount of data ordinarily sent by these devices, which is sent side-by-side with the stored data upon reconnection, and how much processing occurs once all of this data is received on the Foundation server.

The easiest way to mitigate a potential thundering herd scenario, and this is considered a ThingWorx best practice as well, is to randomize the reconnection of devices. Each agent can be configured to delay itself by a random amount of time before attempting to reconnect after a loss of connectivity. This random delay then distributes the number of assets connecting at a time over a longer period, thus minimizing the impact of the reconnections on ThingWorx. There are several configuration settings that help in this regard.

Configuring the Herd (C-SDK)
The C-SDK is great at managing agent connectivity, having a lot of options for fine-tuning the connections. The web-socket connection is managed by the SDK layer of the edge device (which also manages the retry process). To review the source code for how connections are made, see the C-SDK file found here: src\api\twApi.c, specifically the function called twApi_Connect().

The ExampleAgent uses custom configuration files to manage this process from the application layer, a more robust and complete solution. Detailed here are the configuration options in the ExampleAgent attached to this post, most of which can be found in its ws_connection.json configuration file:
- connect_timeout is used throughout the C-SDK as the time to wait for a web-socket connection to be established (i.e. the ‘timeout’ value). This is the maximum delay for the socket to be established or to send and receive data. If it is established sooner, then a success code is returned. If a connection is not established in the configured timeout period, then an error is returned. Setting this value to 10 seconds is reasonable, for reference.
- connect_retries is the number of times the SDK will attempt to establish a connection before the twApi_Connect() function returns an error. Setting this to -1 will trigger the SDK to stay in the loop infinitely until a connection is established.
- connect_retry_interval is the delay between connection retries.
- max_connect_delay is used as a delay before even entering the loop that uses the connect_retries and connect_retry_interval parameters to establish the connection. The SDK function twAddConnectionDelay() is called, which delays by a random amount of time between 0 seconds and the value given by this parameter. This random delay is only used once per call to twApi_Connect(). This is therefore the parameter most critical to preventing thundering herd scenarios (as discussed above).

Configuring the SDK agents to reconnect in this way is critical, but there are also some drawbacks, namely that while the twApi_Connect() function is running, there is no clean way of shutting the agent down. Likewise, the agent only does ONE randomized delay per call of the twApi_Connect() function, meaning that if reconnection cannot occur immediately, it’s still possible for many agents to try to reconnect at once. Consider this when determining what values to assign to these parameters.

ExampleAgent Design
The ExampleAgent provided here is a fully implemented, configurable application, like the EMS in terms of functionality, but containing only simulated data. The data capture component is missing here and has to be custom developed. Attached alongside this source code is extensive documentation that explains how to get the application set up and configured. This isn’t meant to be used directly in a production environment.

Please note that the ExampleAgent is provided as-is; it is not an officially released product by PTC.
This disclaimer includes the ExampleAgent source code, build process, documentation, deliverables as well as any ExampleAgent modifications to the official releases of the C-SDK or the SCM extension product. Full and sole responsibility for the use, deployment, reliability, and accuracy of any ExampleAgent related code, documentation, etc. falls to the user, and any use of the ExampleAgent is an implicit agreement with this disclaimer.

The ExampleAgent was developed by PTC sales and services to help in the Edge application development process. For assistance, support, or additional development, an authorized statement of work is needed. Please Note: PTC support is not aware of the existence of the ExampleAgent and cannot provide assistance.

Because of the small downside to configuring the twApi_Connect() function directly as discussed above, there is an alternative approach given here as well. The ExampleAgent module ConnectionMgr.c controls the calling of twApi_Connect() on a dedicated connection thread. The ConnectionMgrThreadFunction() contains the source code necessary to understand this process.

The ConnectionMgr.c workflow and source code visualization via Microsoft Visual Studio are in the diagrams below. The ExampleAgent defines its own randomized delay to mitigate the thundering herd scenario while still deploying an edge system that responds to shut down requests cleanly. In this case, the randomized delay is configured by the parameter reconnect_random_delay_seconds in the agent_config.json file. Since the ConnectionMgrThreadFunction() controls the calling of twApi_Connect(), the ConnectionMgrThreadFunction() will delay the randomized value EVERY time before calling this reconnect function. A separate thread is created to call the reconnect function so that there are still resources available for data processing and to check for shutdown signals and other conditions.
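The idea behind this application-layer approach (a fresh random delay before every connection attempt, with shutdown requests still honored while waiting) can be illustrated independently of the C-SDK. The sketch below is written in JavaScript purely for illustration; it is not the ConnectionMgr.c implementation, and every name in it is hypothetical.

```javascript
// Uniform random delay in [0, maxDelaySeconds), expressed in milliseconds.
function randomDelayMs(maxDelaySeconds, random = Math.random) {
  return Math.floor(random() * maxDelaySeconds * 1000);
}

// Sleeps in short slices so a shutdown request is noticed quickly.
// Resolves to false if shutdown was requested, true otherwise.
async function interruptibleSleep(totalMs, isShutdownRequested, sliceMs = 100) {
  let remaining = totalMs;
  while (remaining > 0 && !isShutdownRequested()) {
    const step = Math.min(sliceMs, remaining);
    await new Promise((resolve) => setTimeout(resolve, step));
    remaining -= step;
  }
  return !isShutdownRequested();
}

// Calls `connect` repeatedly, waiting a new random delay before EVERY attempt.
async function reconnectLoop({ connect, maxDelaySeconds, isShutdownRequested, random }) {
  let attempts = 0;
  while (!isShutdownRequested()) {
    const delay = randomDelayMs(maxDelaySeconds, random);
    if (!(await interruptibleSleep(delay, isShutdownRequested))) break;
    attempts += 1;
    if (await connect()) return { connected: true, attempts };
  }
  return { connected: false, attempts };
}
```

Because the delay is redrawn on each pass through the loop, agents that fail their first attempt do not all retry at the same moment, which is the drawback noted earlier for a single delay per twApi_Connect() call.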
Recommended Values These recommendations are based around managing the reconnection process from the application layer. These may be different if the C-SDK is configured directly, but creating application layer management is recommended and provided in the ExampleAgent attached. The ExampleAgent is configured by default to simplify the SDK layer’s involvement.   These configuration options tell the SDK layer to try to connect just once, after just 1 second: There is no official recommendation for the above values due to the fact that every use case is different and will require different fine tuning to work well.   Then this setting here handles the retry process from within the application layer of the ExampleAgent: Conclusion To reduce the chances of a thundering herd scenario, configure the fleet to reconnect after differing random delays. The larger the random delay times, the longer it takes for the fleet to come back online and fleet data to be received. While more complex ThingWorx deployment architectures (such as container-based deployments like Kubernetes or Thingworx High Availability (HA) clusters) can also help to address the increased peak load during a thundering herd event, randomized reconnect delays can still be an effective tool.        
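The trade-off described in the conclusion can be made concrete with simple arithmetic: if every agent waits a uniformly random time between 0 and some maximum delay, the fleet's reconnects are spread evenly over that window on average. The sketch below only illustrates that relationship; the function names and the fleet size in the usage note are hypothetical, not recommended values.

```javascript
// Average reconnect attempts per second when `agentCount` agents each wait a
// uniformly random time between 0 and `maxDelaySeconds` before reconnecting.
function averageReconnectsPerSecond(agentCount, maxDelaySeconds) {
  if (maxDelaySeconds <= 0) return Infinity; // no delay: everyone arrives at once
  return agentCount / maxDelaySeconds;
}

// Smallest whole-second delay window that keeps the average rate at or below a target.
function delayWindowForRate(agentCount, targetPerSecond) {
  return Math.ceil(agentCount / targetPerSecond);
}
```

For example, a hypothetical fleet of 10,000 agents with a 200-second maximum delay averages 50 reconnects per second; halving that rate means doubling the window, and therefore doubling the time until the whole fleet is back online.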
ThingWorx Docker Overview and Pitfalls to Avoid    by Tori Firewind of the IoT EDC Containers are isolated and can run side-by-side on the same machine, but they share the host OS, making them more efficient in terms of memory usage and scalability.   Docker is a great tool for deploying ThingWorx instances because everything is pre-packaged within the Docker image and can be stored in a repository ready for deployment at any time with little configuration required.  By using a different container for every component of an application, conflicting dependencies can be avoided. Containers also facilitate the dev ops process, providing consistent application deployments which can be set up, taken down, and tested automatically using scripts.   Using containers is advantageous for many reasons: simplified configuration, easier dev ops management, continuous integration and deployment, cost savings, decreased delivery time for new application versions, and many versions of an application running side-by-side without any wasted resources setting them up or tearing them down. The ThingWorx Help Center is a great resource for setting up Docker and obtaining the ThingWorx Docker files from the PTC Software Downloads website. The files provided by PTC handle the creation of the image entirely, simplifying the process immensely. All one has to do is place the ThingWorx version and all of the required dependencies in the staging folder, configure the YML file, and run the build scripts. The Help Center has all of the detailed information required, but there are a few things worth noting here about the configuration process.   For one thing, the platform-settings.json file is generated based on the options given in the YML file, so configuration changes made within this configuration file will not persist if the same options aren’t given in the YML file. 
If using Docker Desktop to run an image on a Windows machine, then the configuration options must be given in an ENV file that can be referenced from the command used to start the image. The names of the configuration parameters differ from the platform-settings.json file in ways that are not always obvious, and a full list can be found here.

For example, if extension imports need to be enabled on a ThingWorx instance running in Docker, then the EXTPKG_IMPORT_POLICY_ENABLED option must be added to the environment section of the YML file like this:

    environment:
      - "CATALINA_OPTS=-Xms2g -Xmx4g"
      # NOTE: TWX_DATABASE_USERNAME and TWX_DATABASE_PASSWORD for H2 platform must
      # be set to create the initial database, or connect to a previous instance.
      - "TWX_DATABASE_USERNAME=dbadmin"
      - "TWX_DATABASE_PASSWORD=dbadmin"
      - "EXTPKG_IMPORT_POLICY_ENABLED=true"
      - "EXTPKG_IMPORT_POLICY_ALLOW_JARRES=true"
      - "EXTPKG_IMPORT_POLICY_ALLOW_JSRES=true"
      - "EXTPKG_IMPORT_POLICY_ALLOW_CSSRES=true"
      - "EXTPKG_IMPORT_POLICY_ALLOW_JSONRES=true"
      - "EXTPKG_IMPORT_POLICY_ALLOW_WEBAPPRES=true"
      - "EXTPKG_IMPORT_POLICY_ALLOW_ENTITIES=true"
      - "EXTPKG_IMPORT_POLICY_ALLOW_EXTENTITIES=true"
      - "EXTPKG_IMPORT_POLICY_HA_COMPATIBILITY_LEVEL=WARN"
      - "DOCKER_DEBUG=true"
      - "THINGWORX_INITIAL_ADMIN_PASSWORD=Pleasechangemenow"

Note that if the container is started and then stopped in order for changes to the YML file to be made, the license file will need to be renamed from "successful_license_capability_response.bin" to "license_capability_response.bin" so that the Foundation server can rename it. Failing to rename this file may cause an error to appear in the Application Log, and the server to act as if no license was ever installed: "Error reading license feature info for twx_realtime_data_sub".
In Docker Desktop on a Windows machine, create an environment file with any name ending in .env (h2.env in the command below) and list the parameters as shown here:

Then, reference this environment file when bringing up the machine using the following command in Powershell:

    docker run -d --env-file h2.env -p 8080:8080 -v ${pwd}/ThingworxPlatform:/ThingworxPlatform -v ${pwd}/ThingworxStorage:/ThingworxStorage -it <image_id>

Notice in this command that the volumes for the ThingworxPlatform and ThingworxStorage folders are specified with the “-v” options. When building the Docker image in Linux, these are given in the YML file under the volumes section like this (only change the path to local mount on the left side of the colon, as the container mount on the right side will never change):

    volumes:
      - ./ThingworxPlatform:/ThingworxPlatform
      - ./ThingworxStorage:/ThingworxStorage
      - ./tomcat-logs:/opt/apache-tomcat/logs

Specifying the volumes this way allows for ThingWorx logs and configuration files to be accessed directly, a crucial requirement for debugging any issues within the Foundation instance. These volumes must be mapped to existing folders (which have write permissions of course) so that if the instance won’t come up or there are any other issues which require help from Tech Support, the logs can be copied out and shared. Otherwise, the Docker container is like a black box which obscures what is really going on. There may not be any errors in the Docker logs; the container may just quit without error with no sign of why it won’t stay up. Checking the ThingWorx and Tomcat logs is necessary for debugging, so be sure to map these volumes correctly.

Once these volumes are mapped and ThingWorx is successfully making use of them, adding a license file to the Docker instance is simple. Use the output in the ThingworxPlatform folder to obtain the device ID, grab a valid license file, and put it right back into that ThingworxPlatform folder, exactly the same way as on a regular instance of ThingWorx.
However, if the Docker image is being used for a dev ops process, a license may not be necessary. The ThingWorx instance will work and allow development for a time before the trial license expires, which normally will be enough time for developers to make their changes, push those changes to a repository, and tear the container down.   Another thing worth noting about ThingWorx Docker image creation is that the version of Java supplied in the staging folder must match the compatibility requirements for each version of ThingWorx. This is the version of Java used by the container to run the Foundation server. In versions of ThingWorx 9.2+, this means using the Amazon Corretto version of Java. The image absolutely will not start ThingWorx successfully if older versions of Java are used, even if the scripts do successfully build the image.   Also note that in the newer versions of ThingWorx Docker, the ThingWorx Foundation version within the build.env file is used throughout the Docker image creation process. Therefore, while the archive name can be hard-coded to whatever is desired, the version should be left as is, including any additional specifications beyond just the version number. For example, the name of the archive can be given as Thingworx-Platform-H2-9.2.0.zip (a prettier version of the archive name than is used by default), but the PLATFORM_VERSION should still be set to 9.2.0-b30 (which should be how it appears within the build.env file upon download of the ThingWorx Docker files).   Paying attention to every note in the Help Center is critically important to using ThingWorx Docker, as the process is extensive and can become very complicated depending on how the image will be used. However, as long as the volumes are specified and the log files accessible, debugging any issues while bringing up a Docker-contained ThingWorx instance is fairly straightforward.     Credits: Images borrowed from ThingWorx Docker Containerization Tech Talk by Adrian Petrescu
5 Common Mistakes to Developing Scalable IoT Applications by Tori Firewind and the IoT EDC Team Introduction To build scalable applications, it’s necessary to identify common mistakes and avoid them at the early stages of development. In an expert session this past month, the PTC Enterprise Deployment Team elaborated on why scalability is important and how to avoid the common development pitfalls in IoT. That video presentation has been adapted here for visual consumption of the content as well.   What is Scalability and Why Does it Matter Enterprise ready applications can scale and easily be maintained, which is important even from day 1 because scalability concerns are the largest cause for delays to Go Lives.  Applications balance many competing requirements, and performance testing is crucial to ensure an application is ready for Go Live. However, don't just test how many remote assets can connect at once, but also any metrics that are expected to increase in time, like the number of remote properties per thing, the frequency of reporting from those properties, or the number of users accessing the system at once. Also consider how connecting more assets will affect the user experience and business logic, and not just the ability to ingest data.   Common Mistake 1: Edge Property Updates Because ThingWorx is always listening for updates pushed from the Edge and those resources are always in use, pulling updates from the Foundation side wastes resources. Fetch from remote every read is essentially a round trip, so it's slower and more memory intensive, but there are reasons to do it, like if the quality tag is needed since the cache doesn't store it. Say a property is pushed at 11:01, and then there's a network issue at 11:02. If the property is pulled from the cache, it will pull the value sent at 11:01 without any indication of there being a more recent value on the Edge device. 
Most people will use the default options here: read from server cache, which relies on the Edge to push updates, and the VALUE push type. Configuring a threshold is a good idea as well, so that only those property updates which are truly necessary are sent to the Foundation server. Details on property aspects can be found in KCS Article 252792.

The Property Set Approach, in which related Edge property updates are sent together in a single Infotable, is well documented in another PTC Community post. This approach is necessary and considered a best practice if there is event logic which depends on multiple properties at once. Sending all of the properties needed to determine if an event should fire in one Infotable ensures there is no need to query the database each time a property update comes in from the Edge, which keeps the business logic independent and reduces the load on the database to improve ingestion performance. This is a very broad topic and future articles will address it more specifically.

The When Disconnected property aspect is a good way to configure what happens with Edge property values in a mass disconnect scenario. If revenue depends on uptime, consider discarding any data that changes while a device is disconnected. All of the updates can be folded into a single value if the changes themselves aren't needed but an updated value is needed to populate remote properties upon reconnect. Many customers will want to keep all of their data, even when a device is offline, and use data stores. In this case, consider how much data each Edge device can store (due to memory limitations on the devices themselves), and therefore how long an outage can last before data is lost anyway. Also consider whether Foundation can handle massive spikes in activity when this data comes streaming in. Usually, a Connection Server alone isn't enough. Remember that the more data that needs to be kept, the greater the potential for a thundering herd scenario.

Handling a thundering herd scenario goes beyond sizing considerations. 
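One part of handling a thundering herd is spreading reconnect attempts out over time. The sketch below is a minimal illustration in plain JavaScript, not code from any ThingWorx Edge SDK: the function name reconnectDelayMs, its parameters, and the exponentially growing cap are all assumptions chosen for the example. Each device picks a uniformly random wait between zero and a cap, so devices that dropped at the same moment do not all return at the same moment.

```javascript
// Hypothetical helper: none of these names come from a ThingWorx Edge SDK.
// Returns how long a device should wait before its next reconnect attempt.
// The delay is uniformly random between 0 and a cap that doubles with each
// failed attempt (up to maxMs), so reconnects are spread out rather than simultaneous.
function reconnectDelayMs(attempt, baseMs, maxMs, random) {
  const rand = random || Math.random; // injectable for deterministic testing
  const cap = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(rand() * cap);
}

// Simulate 1000 devices that all lost their connection at the same moment
// and are each on their third retry (cap = 8 seconds).
const delays = [];
for (let device = 0; device < 1000; device++) {
  delays.push(reconnectDelayMs(3, 1000, 60000));
}

// Count how many devices would reconnect in each one-second window.
const buckets = new Array(8).fill(0);
for (const d of delays) {
  buckets[Math.floor(d / 1000)]++;
}
console.log("reconnects per second:", buckets.join(", "));
```

Instead of 1000 simultaneous connections, the simulated load arrives as roughly even groups over eight seconds. The base delay and the cap would need to be tuned to the size of the fleet and to how quickly the server can absorb the backlog.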
It is absolutely crucial to randomize the delay each device will wait before attempting to reconnect. It should be considered a requirement to have the devices connect slowly and "ramp up" over time, for multiple reasons. One is that too much data coming in too fast could overwhelm the ingestion queue and result in data loss. Another is that the business logic could demand so many system resources that the Foundation server crashes again and again and cannot be recovered. Turning off the business logic isn't possible if the downtime is unexpected, so rely instead on randomized reconnection times for Edge devices.

Common Mistake 2: Overlooking Differences in HA
To accommodate a shared thing model across many servers, changes had to be made in how the thing model is stored and how the model tree is walked by the Foundation servers. Model information is no longer cached at the Thing level, and the model tree is therefore walked every time model information is needed, so the number of times a Thing is directly referenced within each service should be limited (see the Help Center for details).

It's best to store whatever information is needed from a Thing in an Infotable, making the Things[thingName] reference a single time, outside of any loops. Storing the property definitions outside of the loop prevents repeated Thing references within the service, which otherwise would have occurred twice for each property (for both the name and the description), and then again for every single property on the Thing, a runtime nightmare.

Certain states previously held in memory are now shared across the cluster, like property values, Thing states, and connection statuses. Improvements have been made to minimize the effects of latency on queries, like how they now only return property values on associated Thing Shapes or Thing Templates. 
Filtering for properties on implementing Things is still possible, but now there is a specific service to do it, called GetThingPropertyValues (covered in detail in the Help Center).    In the script shown above, the first step is a query to get the names of all implementing things of a particular Thing Shape. This is done outside of any loops or queries, so once per service call. Then, an Infotable is built to store what would have been a direct reference to each thing in a traditional loop. This is a very quick loop that doesn't add much by way of runtime since it is all in memory, with no references to the thing model or the database, instead using the results of the first query to build the Infotable. Finally, this thing reference Infotable is passed into the new service GetThingPropertyValues to retrieve all of the property info for all of these things at once, thereby only walking the thing model once. The easiest mistake people would make here is to do a direct thing reference inside of a loop, using code like Things[thingName].Get() over and over again, thereby traversing the thing model repeatedly and adding a lot of runtime. QueryImplementingThingsOptimized is another new service with new parameters for advanced configuration. Searches can now be done on particular networks or to particular depths, and there's an offset parameter that allows for a maximum number of items to be returned starting at any place in the list of Things, where previously if you needed the Things at the end of the list, you had to return all of the Things. All of these options are detailed in the Help Center, as well as the restrictions listed in the image above.    Common Mistake 3: Async Service Misuse   Async services are sometimes required, say if a user has to trigger many updates on many remote things at once by the click of a button on a mashup that should not be locked up waiting for service completion. 
Too many async service calls, though, result in spikes in activity and competition for resources. To avoid this mistake, do not use async unless strictly necessary, and avoid launching too many async threads in parallel. A thread dump will show how many threads there are and what they are doing.

Common Mistake 4: Thread Pool Overload
Adding more threads to the pool may be beneficial in certain circumstances, like if the threads are waiting on other resources to complete their tasks, look things up in the database (I/O), or unlock data that can only be accessed one thread at a time (property writes). In these cases, threads are waiting on other resources, and not the CPU, so adding more threads to the pool can improve performance. However, with too many threads, performance will degrade due to increased contention, wasted CPU cycles, and context switches.

To check if there are too many or not enough threads in the pool, take thread dumps and time the completion of requests in the system. Also watch the subsystem memory usage, and note that the size of the queue should never approach the max. Also consider monitoring the overall performance of the system (CPU and memory) with a tool like Grafana, and remember that a good performance test properly exercises all of the business logic and exercises threads in a similar way to real-world expectations.

Common Mistake 5: Stream Etiquette
Upserts, or updates to database tables, are expensive operations that can interfere with ingestion if they are performed on the wrong tables. This is why Value Stream and Stream data should never be updated by end users of the application. As described in the DGIS document on best practices, aggregation is the key to unlocking optimal performance because it reduces the size of database tables that require upserts. Each data structure shown here has an optimal use in a well-designed ThingWorx application.   
Data Tables are great for storing overview information on all of the Things in one view, and queries on this data source are the fastest. Update this data source as often as possible (by timer), allowing enough time for updates to be gathered and any necessary calculations made. Data Tables can also be updated by end users directly because rows are locked one at a time during updates. Data Tables should be kept as small as possible to improve performance on mashups, so for instance, consider using one to show all Things per region if there are millions of Things. Roll-up information is best stored here to avoid calculations upon mashup load, and while a real-time view of many thousands of things at once is practically impossible, this option allows for a frequently updated overview of many things, which can also drill down to other mashup views that are real-time for one Thing at a time.

Value Streams are best used for data ingestion, and queries to these should be kept to a minimum, largely performed by the roll-up logic that populates the Data Tables mentioned above. Queries that chart all of the data coming in are best utilized on individual Thing views so that only a handful of users are querying the same data sources at a time. Also be sure to use start and end dates and make use of the "source" field to improve query performance and create a better user experience. Due to the massive size of the corresponding database tables, it's best to avoid updating Value Streams outside of the data ingestion process altogether.

Streams are similar, but better for storing aggregated, historical data. Usually once per day or per week (outside of business hours if possible), Value Stream data will be smoothed or reduced into fewer data points and then stored into Streams. This allows for data to be stored for longer periods of time on the server without using up as much memory or hurting query performance. 
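The reduction step described above can be sketched in a few lines. This is an illustration in plain JavaScript under stated assumptions: ordinary arrays of { timestamp, value } objects stand in for Value Stream and Stream entries, the function name downsample is hypothetical, and a simple average per time bucket is only one of several possible smoothing choices (min, max, or last value may suit other data better).

```javascript
// Illustrative only: plain arrays stand in for Value Stream and Stream entries.
// Reduces raw { timestamp, value } points to one averaged point per time bucket.
function downsample(points, bucketMs) {
  const buckets = new Map();
  for (const p of points) {
    // Align each point to the start of the bucket it falls into.
    const start = Math.floor(p.timestamp / bucketMs) * bucketMs;
    const b = buckets.get(start) || { sum: 0, count: 0 };
    b.sum += p.value;
    b.count += 1;
    buckets.set(start, b);
  }
  // Emit one aggregated point per bucket, oldest first.
  return Array.from(buckets.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([start, b]) => ({
      timestamp: start,
      value: b.sum / b.count,
      samples: b.count,
    }));
}

// One reading per second for two minutes, reduced to one point per minute.
const raw = [];
for (let s = 0; s < 120; s++) {
  raw.push({ timestamp: s * 1000, value: s < 60 ? 10 : 20 });
}
const perMinute = downsample(raw, 60 * 1000);
console.log(perMinute.length + " aggregated points from " + raw.length + " raw points");
```

Here 120 raw points become 2 stored points. The same idea applied with a bucket of one day turns a day of per-second readings into a single historical entry.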
Then the high-volume ingested data sources can be purged frequently, as discussed below.

Infotables are the most memory intensive, and are really designed to hold only a small number of rows at a time, usually to facilitate the business logic. Sometimes they will be stored in Streams or Data Tables if they aren't expected to grow larger (see the DGIS Coffee Machine App for an example). Infotables should never be logged; if they are used to transmit Edge property updates (like in the Property Set Approach), they should be processed into other logged (usually local) properties.

Referring to the properties themselves is how to get real-time information on a mashup, say by using the GetProperties service and its auto-update option, which relies on internal websockets. This should be done on individual Thing views only, and sizing considerations need to be made if there will be many of these websockets open at once, say if there are many end users all viewing real-time data at a time.

In the newer versions of ThingWorx, the stream data processing settings cannot be updated directly, so find the system object called ThingWorxPersistenceProvider and use the service UpdateStreamDataProcessingSettings. ThingWorx Foundation processes data received from remote devices in batches in order to manage the data flow and reduce database churn. All of these settings configure how large those batches are and how frequently they are flushed to the database (detailed in full in KCS Article 240607). This is very advanced configuration that heavily depends on use case and infrastructure, but some info applies to most people: adjusting the scan rate is usually not beneficial; a healthy queue should never approach the max limit; and defaults differ by database because the databases function differently. InfluxDB generally works better when there are fewer processing threads and higher numbers of things per thread, while PostgreSQL can have a lot of threads, preferably with fewer things per thread. 
That's why the default values shown here are given as the same number of threads (and this can be changed), but Influx has a larger block size and size threshold because it can handle more items per thread.

Value Streams ingest all data into the Foundation server, and so the database tables that correspond with these data sources grow very large, very quickly, and need to be purged often and outside of business hours, usually once a day or once per week. That's why it's important to reduce the data down to fewer points and push them into Streams for historical reference. For a span of years, a single point a day might be enough; for a span of hours, consider a data point a minute. Push aggregated data into Streams and then purge the rest as soon as it is no longer needed.

In Conclusion
Unlocking the Power of Industrial Data
Presentation by Mike Jasperson, VP of the IoT Enterprise Deployment Center

This video presentation was given at the Digital Transformations in Manufacturing conference of 2021, hosted by Enterprise Digital. In this presentation, Mike Jasperson goes over the benefits of modernizing and consolidating access to time-stamped data that is ingested from equipment and sensors into a central location like ThingWorx. Moving away from monolithic, legacy, and siloed systems, and towards more agile solutions, has never been more critical in order to increase machine, operational, and business efficiencies while also opening up visibility into data systems and infrastructure deployments.

This video partners with InfluxData to help customers extract value from IoT data systems, maximizing both performance and operational capabilities of their monitoring systems. To stay competitive in the IoT market, it's important to review the best practices for scaling and testing your industrial metrics solutions, as well as how to get the best performance out of your digital data solutions by using time-series optimized databases like InfluxDB. The open source technologies discussed here are a great way to create modular and upgradable solutions and accelerate IoT innovation.
Interested in learning how others using and/or hosting ThingWorx solutions can comply with various regulatory and compliance frameworks?

Based on inquiries regarding the ability of customers to meet a wide range of obligations (ranging from SOC 2 to ISO 27001 to the Department of Defense’s Cybersecurity Maturity Model Certification, or CMMC), PTC's IoT Product Management and EDC teams have collaborated on a set of detailed articles explaining how to do so.

Please check out the ThingWorx Compliance Hub (support.ptc.com login required) for more information!
Thread Safe Coding, Part 2: The Database Locker Approach and Comparison
Written by Desheng Xu and edited by @vtielebein

Overview
This is the second post on this topic, describing an alternative approach to thread safe coding that does not require the Java extension. The demo use case here is the same as in the previous post, and there is a section at the end comparing the two approaches.

Database Locker for Thread Safe Coding
The database locker is an advanced topic, so some experience with the database thing is assumed. The following steps demonstrate how to be thread safe with a database thing.

Create New Database Instance, and New Table for counter
It is strongly recommended that a new database instance be created outside of the ThingWorx database schema. This guide will NOT include instructions to create the new database instance. Use the following SQL commands to create a new table:

    DROP TABLE IF EXISTS counters;
    CREATE TABLE counters (
      name VARCHAR(100) UNIQUE,
      value INTEGER NULL,
      PRIMARY KEY (name)
    );
    INSERT INTO counters VALUES ('DemoCounter', 0);

This will create a new table called counters, initializing the first counter, called DemoCounter, with the value 0. 
Create a Function to Increase and Return the New counter Value
Use the following sample code to create a table lock function:

    CREATE OR REPLACE FUNCTION IncreaseCounter(counter_name VARCHAR(100), OUT newvalue INTEGER) AS $$
    BEGIN
      -- Block all other access to the table until this transaction ends.
      LOCK TABLE counters IN ACCESS EXCLUSIVE MODE;
      SELECT (SELECT value FROM counters WHERE name = $1) + 1 INTO newvalue;
      UPDATE counters SET value = newvalue WHERE name = $1;
    END;
    $$ LANGUAGE plpgsql;

Or use the following SQL command to create a new row level locker function:

    CREATE OR REPLACE FUNCTION IncreaseCounter(counter_name VARCHAR(100), OUT newvalue INTEGER) AS $$
    BEGIN
      -- Lock only the row for this counter; other counters stay available.
      SELECT value INTO newvalue FROM counters WHERE name = $1 FOR UPDATE;
      newvalue := newvalue + 1;
      UPDATE counters SET value = newvalue WHERE name = $1;
    END;
    $$ LANGUAGE plpgsql;

Create a Database Thing
Create a thing with the template "database" within ThingWorx, and use the PostgreSQL Driver to connect to the new database instance created above.

Create New Services in the Database Thing
The service IncreaseCounterDB would be a SQL Query service:

    SELECT * FROM public.IncreaseCounter([[counter_name]]);

counter_name would be the input parameter, a STRING which is marked as required.

The service GetCounterDB would be another SQL Query service:

    SELECT value FROM public.counters WHERE name=[[counter_name]] LIMIT 1;

counter_name would be another input parameter, a STRING which is also marked as required.

The service ResetCounterDB would be a SQL Command service:

    UPDATE public.counters SET value = 0 WHERE name=[[counter_name]];

counter_name is yet another input parameter, also a STRING and also required.

Wrap the Database Thing Service
The above database thing services return an InfoTable, not an integer. If it's inconvenient to use an InfoTable, wrap the service up into a local JavaScript service and return an integer value. 
The service IncreaseCounter wraps IncreaseCounterDB and returns an integer value:

    // result: INFOTABLE dataShape: ""
    var query_result = me.IncreaseCounterDB({
        counter_name: 'DemoCounter' /* STRING */
    });
    var result = query_result.rows[0]["newvalue"];

Similarly, wrap GetCounterDB up into GetCounter:

    // result: INFOTABLE dataShape: "SingleIntegerDatashape"
    var query_result = me.GetCounterDB({
        counter_name: 'DemoCounter' /* STRING */
    });
    var result = query_result.rows[0]["value"];

And ResetCounterDB into ResetCounter:

    // result: NUMBER
    var query_result = me.ResetCounterDB({
        counter_name: 'DemoCounter' /* STRING */
    });
    var result = 0;

Run the Test Again
If necessary, head back to the previous post to obtain the tool. Then just change the end point and run a new test:

    {
      "host": "twx85.desheng.io",
      "port": 443,
      "protocol": "https",
      "endpoint": "/Thingworx/Things/DatabaseDemo/services/IncreaseCounter",
      "headers": {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "AppKey": "5cafe6eb-adba-41df-a7d6-4fc8088125c1"
      },
      "payload": {},
      "round_break": 50000,
      "req_break": 0,
      "round_size": 50,
      "total_round": 20
    }

Validate the Result
Execute the service GetCounter to validate the result.

Overall Performance Comparison
The Java Extension performance looks the best here, but the database row lock will perform better if there are multiple counters.

InfoTable Type Property
InfoTable properties have the same thread-safe challenges discussed previously, but they also have some additional challenges due to the way data change events are triggered. This is outside of the scope of this document, but it is worth a very brief mention here.

In general, the data change event for an InfoTable fires when the reference to the table is updated, and not the contents of the table. If the values of an InfoTable are updated directly, say by adding or removing a row, then the data change event will not be triggered because the value has technically not changed. 
Instead, the InfoTable has to be cloned, then modified, and then assigned back to the Thing so that the reference changes as well. Such additional considerations must be made when using other property types than those shown here. 
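The clone, modify, and assign-back sequence can be illustrated with a small simulation. This is plain JavaScript and not ThingWorx code: SimulatedThing is a hypothetical stand-in whose setter counts a "data change" only when the assigned reference differs from the current one, mirroring the behavior described above. On the platform itself, the clone step would use whatever InfoTable cloning facility is available there.

```javascript
// Simulation only: SimulatedThing is a stand-in, not a ThingWorx class.
// Its setter counts a change event only when a different object reference
// is assigned, mirroring the reference-based behavior described in the text.
class SimulatedThing {
  constructor(table) {
    this._table = table;
    this.changeEvents = 0;
  }
  get table() {
    return this._table;
  }
  set table(next) {
    if (next !== this._table) {
      this.changeEvents++;
    }
    this._table = next;
  }
}

const thing = new SimulatedThing({ rows: [{ name: "DemoCounter", value: 0 }] });

// In-place edit: the contents change but the reference does not, so no event fires.
thing.table.rows.push({ name: "SecondCounter", value: 1 });
console.log(thing.changeEvents); // prints 0

// Clone, modify the copy, then assign it back: the reference changes, so the event fires.
const copy = { rows: thing.table.rows.map((row) => Object.assign({}, row)) };
copy.rows.push({ name: "ThirdCounter", value: 2 });
thing.table = copy;
console.log(thing.changeEvents); // prints 1
```

Note that the in-place edit still altered the stored data; it simply went unnoticed by anything subscribed to the change event, which is what makes this mistake easy to miss.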
Update to Connected Factories Benchmark   Scenario Three: One Kepware Server in ThingWorx 9.0 The goal of this scenario is to confirm the same performance in ThingWorx 9.0 as seen in scenario one, where one Kepware Server represented a single factory in version 8.5.   Matrix 1 - Slow (15s slow properties, 1s fast) The lower frequency tests performed the same in 9.0. Even the 10k ingestion test, which lies very close to the boundary for a single Kepware Server, passed with no errors. Matrix 2 – Fast (5s slow properties, 500ms fast) These showed similar results, but the 500 thing, 50-10 property test had data loss in 9.0. However, the write rate is much higher than PTC recommends for a single Kepware Server anyway.     Matrix 3 – Faster (1s slow properties, 200ms fast) The fastest tests had similar results as well. The larger tests ran with more success with two Kepware Servers (data not shown here).   Conclusions ThingWorx 9.0 is similarly capable of ingesting data using Kepware Server. A single instance can still achieve up to 10k wps. Future scenarios will now make use of ThingWorx 9.0.   Download the updated draft here!
Announcing the Final Installment

JMeter for ThingWorx, the Comprehensive Guide and Best Practice Tips
This is the final post on using JMeter for ThingWorx. Below are best practice tips for using JMeter and for load testing in general. Attached to this post is a comprehensive guide including all of the information from every post we've made on JMeter, including the tutorials. For a more central source, feel free to download the guide, or see the past posts here:
JMeter for ThingWorx (original post)
Building More Complex Tests in JMeter
Distributed Testing with JMeter
Generating and Reviewing JMeter Results

JMeter Best Practice Tips

Use Distributed Testing
As already mentioned in a previous post, each JMeter client can only handle about 150-250 threads depending on the complexity of the tests, and each client will need around 1 CPU and up to 8 GB of RAM for the Java heap. Some test plans will run with fewer host resources, so resizing the test client VM up or down is often required during test development. Create a batch or shell script to start the multiple JMeter clients for greater ease of use.

Use Non-Graphical Mode
Non-graphical mode allows the system to scale up higher; client processing uses up resources just to keep the simulation running, but with graphical mode turned off, there is less of an impact on the response times and other results. Graphical mode is essentially only used for debugging.

Turn off Embedded Resources
This setting reloads all of the typically cached resources over and over; there will be far more download requests than is helpful, crowding out the other requests. Ensure this box is not checked, especially in the HTTP Request Defaults element:

Browser caching means that this setting doesn’t actually simulate a proper user load, given that many of the reloaded resources would not be reloaded by actual users. 
Use this incrementally, for one or two HTTP requests only, if there is a reason why those requests might need to download fresh images, scripts, or other resources with each call; for instance, simulate page timeouts using this once per hour or something similar. Using this across the whole project will prevent it from scaling well, while not actually simulating real-world conditions.

Avoid Using Listeners
Avoid listeners such as the “View Results Tree”, which use additional resources that may skew the results, reflecting the needs of the clients themselves and not the actual response times of the server. Many listeners are only for debugging a handful of threads while designing the tests. A list of recommended listeners for different purposes is in the JMeter documentation. Summary Report is the only one you want enabled, as that exports the results as a CSV or similarly formatted file, which can then be used to build reports.

JMeter CAN handle SSO
JMeter can authenticate into and test an SSO-enabled system. Sometimes the SSO configuration is essential for customers, and they may be quick to assume therefore that they cannot use JMeter, but that's not true. Some external tools that might help with this are BlazeMeter (mentioned again in just a moment) and Fiddler, a good tool for decoding what data a particular SSO setup is exchanging during the authentication process.

Use Logic Controllers for Parametrization
Parametrization is critical to mirroring a proper user load and allowing different data sets to be queried or created; the load should seem organic, random in the right ways, with actions occurring at random times, not predictable times, to prevent seeing artificial peaks of usage that don’t represent real usage of the Foundation server. 
Random order controllers direct the threads down different paths based on random dice rolls, allowing for a randomized collection of user activity each time, not something that has to be regenerated like a set of Boolean values that is specified in an input CSV and used to navigate a series of true or false switches. Switches just look for an environment variable to be either 1 or 0, and when a thread hits a switch that’s a 1, it triggers the switch below, running them in the order given under the transaction controller that goes with the switch. In this image, the 1’s and 0’s are given in the CSV input file; randomizing that input file therefore randomizes the execution of the switches too:

Use Commercial Add-Ons
There are many external add-on tools and plugins which enhance JMeter’s capabilities. One of these is BlazeMeter, which has some free and some paid options to help create better reports, automatically remove much of the “garbage” REST calls (which would otherwise need to be manually deleted), and provide more consumable test reports right out of the box. Other tools and plugins include: Maven Netbeans SonarQube Jenkins Autometer Gradle Amazon EC2 Lightning IntelliJ IDEA Cassandra Grafana

For more best practice information, see the JMeter Best Practice Manual.

General Load Testing Guidelines

Concurrency Requirements – How to Properly Estimate the Size of the Load Test
Take a brand new ThingWorx-based app. How will people be accessing the system, and how often? How many are business users? How many are engineers? What do they do? Many assume that every named user in the corporate LDAP will need access to the server, often tens of thousands of users; this generally oversizes the system drastically. 
Load testing for many thousands of users is very hard and requires a lot of set-up, tuning, and optimization to get right; so if it seems that thousands of users are expected, then validating this claim is important: most customers don’t really have that many concurrent users in an engineering system. Use estimates based on how many people work at which offices, which time zones those people are in, and what kinds of users they are. Do they need access to engineering data? Perhaps there are simpler mashups for them that use fewer resources. One tool for these sorts of estimations that PTC offers is the office time zone overlap Windchill Sizing Calculator (shown here).

Other ways to estimate include:
Analyzing the business processes, things like how long workloads typically take to complete and how many workloads are generated per day, converted into hour, minute, or second as desired for the peak duration, the length of the test.
“Day in the Life” modelling, or considering things like “what does user X do in a day?” Maybe user X checks out some drawings, edits them, and then checks them back in at 4:30. Maybe user Y actually digs into the underlying parts and assemblies, putting in change requests or orders throughout the day, instead of waiting for the end. Models are made based around the types of users.

Also consider: What are the worst case scenarios? What are the longest running activities? What produces the largest data transfers? What activities have large, heavy database queries? When is the peak overlap of usage? Beginning and end of day downloads and check-ins? Reports that are generated regularly? How do these impact the foreground users?

For a simpler estimate, start with a percentage of the named user count; anywhere from 5-15% is a good ballpark. 
Don’t overestimate to feel like the application has been financially worth it; even if everyone is logged in and using it all at once, which is unlikely, load testing for every single user doesn’t take into account the fact that people pause in between clicking on things to think, type emails, get coffee, and so forth. Fewer people than expected are actually doing concurrent activities like loading web pages and updating data streams. Whenever possible, use concurrency data from existing customer systems to guide the estimate for the new system. Legacy systems are great places to start.

Use Grafana to monitor the system side throughout the load test, which is also required to know the test has been successful; also set up Grafana to monitor the application once it goes live, to both prevent and more rapidly mitigate any technical issues with the server. Also remember that PTC Technical Support is here to help! Provide thread dumps with an open case to any TSE, and they will help troubleshoot the tests and review any errors in the ThingWorx or Tomcat logs.
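As a rough worked example of the sizing arithmetic in this post, the sketch below combines the 5-15% ballpark with the 150-250 threads per JMeter client mentioned earlier. The function names and the 20,000 named users are made up for illustration, and treating one JMeter thread as one concurrent user is a simplifying assumption rather than a rule.

```javascript
// Rough sizing arithmetic; these function names are illustrative and are not
// part of JMeter or any PTC tool.
// Percentages are passed as whole numbers to keep the arithmetic exact.
function estimateConcurrentUsers(namedUsers, lowPercent, highPercent) {
  return {
    low: Math.ceil((namedUsers * lowPercent) / 100),
    high: Math.ceil((namedUsers * highPercent) / 100),
  };
}

// How many JMeter clients are needed to run a given number of threads.
function jmeterClientsNeeded(threads, threadsPerClient) {
  return Math.ceil(threads / threadsPerClient);
}

// Example: 20,000 named users in the corporate directory (a made-up figure).
const estimate = estimateConcurrentUsers(20000, 5, 15);
console.log(estimate.low, estimate.high); // prints 1000 3000

// Size for the upper estimate, assuming a conservative 150 threads per client.
console.log(jmeterClientsNeeded(estimate.high, 150)); // prints 20
```

Even the upper estimate here is far below the 20,000 named users, which is the point of the guidance above. Concurrency data from an existing system should replace the ballpark percentage whenever it is available.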
Leveraging Dell and VMware for Asset Monitoring in Connected Factories

As an extension of the Connected Factory Reference Benchmark performed on Microsoft Azure, PTC partnered with Dell Technologies to produce this document, a baseline that illustrates the effectiveness of ThingWorx and Kepware when combined with Dell and VMware technologies to create solutions for on-premises and hybrid Connected Factory implementations. Please join us in thanking Bhagyashree Angadi, Brian Anzaldua, Todd Edmunds, Mike Hayes, and the Dell Customer Solution Center team in Limerick, Ireland for working with the IOT Enterprise Deployment Center on this benchmark!

This benchmark is very similar in design to a previous publication, but this time it was designed specifically with Dell Technologies in mind. In a Dell/VMware architecture, the close proximity of Kepware Server and ThingWorx Foundation provides ideal conditions for network throughput between these components. Combined with the ability to easily monitor and resize virtual machines as your business needs evolve, these hardware configurations can be very effective in on-premises or hybrid deployment scenarios.
New Scenario Using Multi-Kepware for Asset Monitoring in Connected Factories   A new scenario has been completed for Connected Factory implementations, furthering the IOT EDC's goal of providing a reference library of ThingWorx performance. This scenario builds upon the first, with additional tests being performed to demonstrate the capabilities of multiple Kepware Servers running side-by-side. Horizontal scaling is very common for multi-line factory implementations, so be sure to check out the new scenario in this ever-expanding benchmark document.   Note that tests below 10,000 writes per second were not repeated with multiple Kepware Servers, since there is little reason to desire such a configuration in implementations that small. ThingWorx deployment sizing was also held constant throughout these tests to demonstrate the limits of a given configuration. Changes that may improve the results of a failed test (such as adding CPUs or Memory) will be mentioned but not validated as part of this benchmark.   Let us know about your applications and how they compare with the data shared here. Happy developing!