IoT Tips

Greetings, Community Members! PTC launched ThingWorx 9.6.0 in June, and it's now ready for your upgrade. Let's explore the key enhancements in this latest version of ThingWorx.

What are the top three areas of updates in ThingWorx 9.6?

Performance, Scalability, Reliability, and Security:
- A significant boost in file transfer performance between connected devices and the ThingWorx platform.
- A notable improvement in server startup performance, with some instances showing an 84% reduction in startup time.
- Numerous logging improvements, including limits on log verbosity, log filtering, and configurable log storage, which contribute to the stabilization of the ThingWorx system. Starting with ThingWorx 9.6, logs can also be exported to third-party software such as Sumologic, Datadog, Splunk, and Grafana using the industry-standard OpenTelemetry framework.
- Content Security Policy has been implemented, fortifying ThingWorx against script and data injection attacks, man-in-the-middle (MITM) attacks, and clickjacking.
- Several tech stack updates, including support for Azure B2C and TLS 1.3 (limited).

Developer Productivity:
- A new Collection widget with improved performance, enabling a transition away from the legacy Collection and Repeater widgets.
- Several other new widgets, such as the KPI dial widget, Tree selector widget, and Progress Tracker widget, plus other critical enhancements for ThingWorx WebComponents.
- Support for viewing mashup configurations such as layouts, bindings, and widget properties in read-only mode, allowing multiple users to review mashup designs simultaneously without making edits.

Solutions updates:
- Streamlined continuous improvement with the ability to view and create actions in one click from performance analysis screens in the Digital Performance Management (DPM) solution.
- Enhanced performance for the Asset Monitoring & Utilization (AMU) solution, with alarm event creation now handled asynchronously.
- Several fixes and improvements for Connected Work Cell (CWC), Real-Time Production Performance Monitoring (RTPPM), and DPM, such as limits evaluation, messages not displayed, incorrectly calculated KPIs, and issues with Running Time on the operator display, helping customers achieve continuous improvements in their manufacturing operations.

View the release notes here and be sure to upgrade to 9.6!

Dilanur Bayraktar
ThingWorx Product Management
View full tip
Check out this new framework for achieving digital manufacturing success, and learn about the top three areas teams need to consider:
- Identify a unified end goal
- Align it with the most impactful use cases
- Formulate a lasting strategy that resonates with their long-term vision
Discover More!
View full tip
Tune in to The Lean Manufacturer podcast where expert guests bring their outside view of the IIoT and discuss various aspects of manufacturing. Over the course of the series, we’ll cover some of the most important ways the IIoT can maximize manufacturing efficiency and bring value to your organization, including the need for reducing planned and unplanned downtime, enabling operational efficiency, ensuring digital continuous improvement, and so much more.      
View full tip
Are you ready for #WorldIoTDay?
View full tip
PTC was recognized as an outright leader in the global IoT market.
View full tip
DevOps is a crucial process that exists in any software setting, whether you plan for it or not. When less time is spent on DevOps than on the shiny new features that are easy to sell, chaos sets in and bottlenecks form. Bottlenecks reduce efficiency and leave you open to vulnerabilities as well. The faster you can get a change properly tested and safely into production, the safer and more stable the system is all along.

Issues will arise; they always do. Are you ready for them? Watch this video, see the additional links below, and think about your DevOps process now, before the fires start!

Useful Links:
ThingWorx Monitoring and Alerting Using Prometheus and Grafana, Part 1
ThingWorx Monitoring and Alerting Using Prometheus and Grafana, Part 2
Overview of Monitoring Tools and Diagnostics
The System Health Timer
View full tip
Architecting Reason Code Trees in DPM
Tori Firewind, IoT EDC

What are Machine Codes?
Factory hardware devices communicate status changes to their human operators and to other machines (IoT) via machine codes. Machine codes for each type of factory hardware are usually set by the manufacturer, so they are largely pre-determined. How the reason trees map those machine codes to corresponding business logic in ThingWorx, however, is entirely customizable. Knowing the best way to design your reason trees for this purpose can be challenging, so this guide is here to help with the conceptual knowledge. The technical details of using the UI to create, edit, and configure reason codes can be found in the Help Center.

The Tree Trunk
At the highest level of the reason tree, the trunk, there are really three categories: Availability (A), Performance or Productivity (P), and Quality (Q). These should look familiar; they are the three dimensions of OEE (Overall Equipment Effectiveness). (Fig. 1: Calculation of OEE)

Availability refers to long stops: events that stop planned production long enough that it makes sense to track a reason for being down (typically several minutes, though the threshold between a long stop and a short stop varies depending on the ideal rate of production of materials).

Availability = Run Time / Planned Production Time

Performance/Productivity really refers to short stops, things that cause the machine to run at less-than-optimal speeds. These can include stops caused by running out of materials, minor maintenance like switching out a single, easily-changed part, or even frequent breaks due to an operator's ill health. User error can be a cause as well, say if the machine needs a certain temperature to produce parts, and that temperature keeps fluctuating (requiring the machine to take time to recalibrate before resuming production) because operators are smoking out a back door or adjusting thermostat settings. (Fig. 2: Levels of Runnable Time)

Operator influence is often a factor in the conditions that permit optimal performance from machinery, and every factory may face different challenges. Stops like these are not really outages; the downtime isn't enough to consider the production block entirely unproductive. Production continues throughout most of the block despite the issues; the rate is just slower than ideal.

Performance/Productivity = (Total Count / Run Time) / Ideal Run Rate

Quality refers mostly to the number of items considered scrap or rework, and it can be split into two categories: start-up scrap (expected, because the machine is warming up or being fine-tuned by the operator) and production scrap (items which come out wrong and must be tossed or reworked because the conditions under which they were produced weren't ideal). This is first-pass yield: a product only counts as "good" if it passes inspection the first time.

Quality = Good Count / Total Count
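The three dimensions above combine multiplicatively into a single OEE figure. As a minimal sketch of that arithmetic only (the function and counter names below are hypothetical examples, not DPM property names), consider:

```python
# Minimal sketch of the OEE arithmetic described above.
# All inputs are hypothetical example counters, not DPM property names.

def oee(planned_production_time_min: float,
        run_time_min: float,
        total_count: int,
        good_count: int,
        ideal_run_rate_per_min: float) -> dict:
    """Return the three OEE dimensions and their product."""
    availability = run_time_min / planned_production_time_min
    performance = (total_count / run_time_min) / ideal_run_rate_per_min
    quality = good_count / total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }

# Example: an 8-hour shift (480 min planned), 400 min of run time,
# 1800 parts produced at an ideal rate of 5 parts/min, 1710 good parts.
print(oee(480, 400, 1800, 1710, 5.0))
# -> availability ~0.83, performance 0.90, quality 0.95, OEE ~0.71
```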
The Branches and the Leaves of the Tree
The "leaves" are the reason codes which map directly to machine codes, and the "branches" are the method of categorization that connects them to the trunk. Both the leaves and the branches, the children and the parent nodes of the tree, are split into two states: planned versus unplanned downtime. Changeovers, maintenance, and even scrap can be broken down into this dichotomy.

For scrap, there are startup rejects (planned, because the machines have ramp-up periods) and production rejects (unplanned, because the conditions weren't ideal). For maintenance there is planned and unplanned: small changes that occur on the fly result in productivity loss, and may also reduce availability in the long run. Small, unplanned changes can occasionally shift into the availability-loss category if a simple, quick repair winds up being complex and time-consuming. A good reason tree can easily differentiate between short and longer stops so that each can be responded to in a deliberate way.

To start the process of architecting your reason tree, try writing the three categories on a board in a common room of an average factory (or several, as a survey). Ask operators to stop by over the course of a few days and write the machine codes they see often and find useful under one of these categories, or under more than one if a machine code pops up under different circumstances and can mean different things. Have them write a ten-word justification if the association isn't obvious. Gather all of the "leaves" in this way, and then begin to associate them with the "trunk", forming the "branches".

An example tree can be seen in Figure 3, with leaves like "Changeover" and "Maintenance" being semi-ambiguous; they could just as easily be seen as unplanned stops. Therefore, there may be multiple reason codes mapping up the tree through more than one branch, and these can have different categories, which controls how the business logic responds to the different codes. The Help Center has more details about how events are mapped to types, and each type contains multiple categories, as configured by you when you set up the DPM model. (Fig. 3: Different types of changeovers may have different codes, and can map up as either planned or unplanned, but all planned and unplanned long stops fall under the Availability category of the trunk.)

Similarly, small stops can involve idling (for instance, when there are not enough materials), reduced speed when conditions are not ideal, or other brief interruptions, usually caused by human error or unforeseeable circumstances. Quality loss then refers to the products which fail quality checks, either because the machine still has the wrong paint in the applicator and needs a few rounds to be ready for the next production item, or because the conditions are, again, not ideal and items wind up scrapped.

Example Reason Tree
(Fig. 5: An example tree with more specific tags. There may be dozens or hundreds in a full reason tree, though the fewer needed to capture the events we care about, the better.)
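To make the leaf/branch/trunk structure concrete, here is a minimal sketch of how the machine-code-to-reason-code mapping described above might be modeled. The machine codes, node names, and categories are invented for illustration only; in practice the tree is configured through the DPM UI as described in the Help Center.

```python
# Hypothetical sketch of a reason tree: leaves are reason codes mapped to
# machine codes, branches categorize them, and the trunk is A / P / Q.
# All names and codes are invented for illustration only.

REASON_TREE = {
    "Availability": {                      # trunk category (long stops)
        "Planned Stop": {"CHG-01": "Planned changeover",
                         "MNT-02": "Scheduled maintenance"},
        "Unplanned Stop": {"ERR-17": "Unplanned breakdown",
                           "CHG-09": "Unplanned changeover"},
    },
    "Performance": {                       # trunk category (short stops)
        "Idling": {"MAT-03": "Waiting on materials"},
        "Reduced Speed": {"TMP-05": "Temperature recalibration"},
    },
    "Quality": {                           # trunk category (scrap / rework)
        "Startup Reject": {"QC-10": "Ramp-up scrap"},
        "Production Reject": {"QC-11": "First-pass inspection failure"},
    },
}

def classify(machine_code: str):
    """Walk the tree and return (trunk, branch, reason) for a machine code."""
    for trunk, branches in REASON_TREE.items():
        for branch, leaves in branches.items():
            if machine_code in leaves:
                return trunk, branch, leaves[machine_code]
    return None  # unmapped codes should be reviewed and added to the tree

print(classify("CHG-09"))  # -> ('Availability', 'Unplanned Stop', 'Unplanned changeover')
```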
Theory of Constraints
(Fig. 6: The theory of constraints wheel, an industry process for gradual OEE improvement in factories that has been adapted into the PTC methodology as well.)

While architecting your reason tree, always remember the key purpose: gathering only as much data as necessary to analyze the efficiency of a factory and to identify the bottleneck, the most limiting factor. The important point is to identify not just the bottleneck that seems the most troublesome, but the one that actually has the greatest impact on OEE across the entire factory.

Without software like DPM and a properly designed reason code tree, the process of improving a factory can be very challenging, involving a lot of guesswork and sometimes solving one problem at the cost of another. The issue is that these machines produce a LOT of raw data, and humans are not the best tool for gathering and aggregating that data in a consumable way. A good reason tree ensures a smart application that can quickly prioritize the machine (bottleneck) that most impacts production, and not just the machine that functions in the least optimal way.

So, the theory of constraints is really a process for identifying small, incremental changes which together can make a big difference, and fast, in factory OEE. The rate at which this cycle can be completed varies, however. The slower the process of identifying constraints and the less information that is gathered, the slower and less precise the first two steps of this process become. Alternatively, in a traditional constraint identification process, too much information can be a problem as well, due to human limitations, as discussed above. DPM is a great benefit in this regard, because it aggregates the data into a consumable, comparable form every 5 minutes, freeing up your human analysts for problem solving and prioritization rather than data gathering and sorting.

Other Key Tips
Also remember that a good tree treats the trunk as a whole unit, with each category occupying a percentage of the overall OEE; look back at the three dimensions of OEE in the equation above. For example, the more you see issues with availability, the less you will see issues with scrap, for the machine simply doesn't have as much time to produce scrap if it is constantly down. Likewise, the more you see quality loss, the less you should see productivity loss, because these modes are roughly inversely proportional; to say it differently, if a machine is running quickly and seeing few minor maintenance stops, then it is likely to produce more scrap (as well as more good product).

Another thing to remember is that even DPM is limited in its capacity to interpret raw data. Even though it is many magnitudes more efficient than any human gathering and analysis could ever be, there is an upper limit to how much raw data DPM can ingest and analyze before the system becomes very expensive. For this reason, you want to ensure your reason trees use only as many reason codes as are required to capture the OEE of a factory site. This will likely mean using different codes for different types of things, which is easy to do maintainably across many sites using Thing Shapes. Keeping things tightly defined and organized is the easiest way to ensure a clean, efficient system for gathering and storing data.

Also remember that data will not need to persist very long once DPM is fully operational and adopted by your factories. DPM ensures that the changes made to the production line to improve efficiency are the highest impact and the least difficult to implement, meaning there will be a very rapid return on investment and a process to ensure future issues are identified and resolved quickly. Data from past issues in the factory won't be as relevant, and historical data stores can be kept smaller than one might think. It is the power of ingesting data directly into the processing and aggregation pipeline, automatically reducing it into presentable, consumable webpages, that makes DPM and ThingWorx such a great factory solution for optimizing OEE.
View full tip
ThingWorx Monitoring and Alerting, Part 2: Using Prometheus and Grafana
By Tori Firewind, IoT EDC

Building Dashboards
To add a panel which monitors some component of the ThingWorx application to a dashboard in Grafana, click to add a new panel. Under "Metrics" in the box at the bottom of the screen, select which ThingWorx metrics you wish to monitor (type "thingworx" in the search box to see them all). For example, select the Platform Subsystem memory in use.

Label filters aren't necessary, though you may want to sort by instance if you are monitoring multiple instances with the same dashboard. You may also want to take some time to format the Y axis, which by default will show bytes. Go to the formatting panel on the right side and scroll down to the section called "Standard options". For the Unit dropdown, start typing "data" and then select "bytes (SI)". This will automatically determine whether the bytes you've provided should print as MB or GB based on how large the numbers are.

Rename the panel, modify it in any other way desired, and then click Apply. Once you add the panel, you can watch the memory usage as it is scraped by selecting the refresh option (10s or 30s, whatever makes sense based on your scrape interval).

The viewing window is stored in the URL, so you can generate a report for a specific interval (like when a test was occurring) and then store that result or share it in a more compact way, using absolute timestamps: http://localhost:3000/d/nleucPv4k/thingworx-monitoring?orgId=1&from=1668528038732&to=1668536503953

Dashboards are just collections of panels which report on the various metrics of performance and stability that exist for the components of a system, and there can be quite a few metrics worth watching for each individual component. Most of the third-party tools come with their own dashboards, but the ThingWorx component is one which, for now, requires some thought and creativity.

Consider your use case carefully and look over the various subsystems contained within ThingWorx. Each part of an application is localized to specific subsystems, and some are more business critical than others. What will go at the top of your dashboard? Add rows, add panels per row, and see what the many choices are for watching your system.

Don't forget that with Telegraf running, VM or machine usage metrics are also available for display on a dashboard. Metrics like overall CPU and memory usage are critical to determining the health of a system, as we have demonstrated in past benchmarks and scale tests. You can create a panel to monitor mem_used versus mem_total. Another metric from Telegraf worth adding is CPU usage, which should be given "percentage" for the units and which needs a label filter of cpu = cpu-total. With some resizing and drag-and-dropping, we now have the first row of a dashboard.

Notice how the Platform memory usage climbs steadily and is purged in a cycle: that is the Java garbage collection mechanism, and it's important to remember to leave room for spikes on top of those peaks. Data can also be calculated or processed in some way to make it more useful for determining system health and stability.

The data in the chart below uses the formula submitted = completed + number queued + number failed. It shows the current queue on the left Y-axis and the max queue on the right (since the two numbers are usually drastically different).
That chart looks pretty, but it doesn't really tell us much about the system in this format, so let's do some math and find a representation that is a bit more helpful. Performing a "non-negative derivative" calculation over the submitted and the completed queue counts over time allows us to look at the status of the queue as a velocity. When the "completed" speed stays behind the "submitted" speed for too long at a time, that means the queue is filling up and will eventually result in data loss. If we take this one step further and calculate the average of the submitted minus the completed over time, then we can actually predict approximately when the queue will fill up. This can then be displayed on a dashboard in Grafana, or used as the basis for an alert.
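The sketch below is only a rough illustration of the velocity math just described, not a PromQL query; the sample counter readings and function names are hypothetical. In practice the equivalent calculation would be done over the scraped Prometheus data inside Grafana.

```python
# Illustration of the queue-velocity math described above.
# 'samples' are hypothetical (timestamp_seconds, submitted_total, completed_total)
# counter readings; in practice these come from the Prometheus scrape data.

def non_negative_derivative(values, timestamps):
    """Per-second rate of change, clamped at zero (counters can reset)."""
    rates = []
    for i in range(1, len(values)):
        delta = values[i] - values[i - 1]
        dt = timestamps[i] - timestamps[i - 1]
        rates.append(max(delta / dt, 0.0))
    return rates

def seconds_until_full(samples, current_queue, max_queue):
    """Estimate time until the queue fills, given submitted vs. completed rates."""
    ts = [s[0] for s in samples]
    submitted_rate = non_negative_derivative([s[1] for s in samples], ts)
    completed_rate = non_negative_derivative([s[2] for s in samples], ts)
    # Average net growth of the queue (entries/second) over the window.
    net = sum(s - c for s, c in zip(submitted_rate, completed_rate)) / len(submitted_rate)
    if net <= 0:
        return None  # queue is draining or stable; no projected overflow
    return (max_queue - current_queue) / net

samples = [(0, 1000, 990), (10, 1300, 1200), (20, 1650, 1420)]
print(seconds_until_full(samples, current_queue=230, max_queue=250000))
```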
What to Monitor
In addition to monitoring the system that ThingWorx runs on, ThingWorx itself can easily be monitored down to the subsystem level by Prometheus, thanks to the Metrics endpoint. Many applications have built-in support for formatting their data for scraping, including the JVM (which exposes Prometheus-formatted metrics with the JMX Exporter) and the OS (which can use the Node Exporter or Telegraf for the same purpose). For these more generic components, there are popular community dashboards which can be downloaded and used in Grafana for data analysis and review.

For ThingWorx, there are different kinds of data to track: subsystem data and non-subsystem data, and queue-based versus non-queue data. These different metrics can collectively characterize the overall health of the application, depending on the use case.

For instance, if this is a system with very many connected devices, one metric which may be important to track is the number of total devices defined on the Foundation server versus the number of devices currently connected. If there are relays involved, then many devices suddenly going offline can mean a relay has failed. Another example is if the system sizing depends on an assumption that only a fraction of the total number of devices will ever be connected at a time. Use cases like these can be monitored easily by keeping track of the total versus the number of connected devices.

Other common indicators of a healthy ThingWorx application include the value stream and stream queues. These queues should fluctuate over time as data is ingested and processed, but they should never keep growing in size. If the stream queues are growing, then data is being written to the queues faster than the queues can write to the database; eventually, when the system runs out of resources to keep track of the queues, data will be lost. Having the stream information displayed in a chart makes it very easy to spot an upward trend in resource usage early on, which can catch a blockage or bottleneck that needs attention before it starts to affect the larger system in catastrophic ways later.

Memory usage information from the various subsystems may be worth tracking, as well as the event queue. These can indicate that the business logic is functioning with room to handle spikes, and that the server has enough memory to service all three dimensions of an IoT application: the ingestion, the business logic and thing-based alerting, and the user experience and UI.

If file transfers are a key part of the use case, then the number of concurrent transfers, their average speed, and the size of the files can all be tracked and charted in Grafana by making use of the ThingWorx metrics which automatically show up once you import the Prometheus data source. A mature dashboard used for a production environment brings many such panels together. For further reading about subsystem monitoring, check out the Help Center.

How to Alert
The alerting mechanism built into Prometheus is incredibly easy to configure, so it might be tempting to generate tons of alert rules. However, remember that the more noise a system makes, the harder it is for those monitoring that system to know when action is really required. Playbooks which document how to respond to alerts, who to contact, how to act, and all the information necessary to handle an alert should be created as an ongoing part of the DevOps process.

Alerts should fire with the right severity in the subject line, as well as all of the information about the issue that is currently known, presented concisely, so that whoever receives the alert starts thinking about the root cause sooner and recovers the system faster. Those who receive an alert should have the ability to facilitate its resolution and know who is expected to react to it.

In the ThingWorx monitoring stack, Prometheus handles the alert rules and the generation of alerts, but alert filtering and delivery is managed in an external alerts manager.

Generally, you want your alerts to follow a curve. If the current queue size exceeds 50% of the maximum, perhaps that isn't a huge deal if the application catches up quickly. How long are spikes in queue processing expected to last? If the queue is over half-full for 10 or 30 seconds, that may mean the queue is falling behind and not catching up; this might be a warning-level alert. When does this become an error? Say the queue exceeds 90% of the max queue size: this might warrant an alert the moment it hits that mark, because farther along the curve, it does not take as long before data gets lost.

As the severity of the situation increases, the threshold for alerting should increase as well. That way, when errors do alert, it is a sure thing that they require an immediate response. The alerts are then pushed into the alerts manager for delivery based on your management rules. The alerts manager may decide to withhold warnings altogether, or send them to a much smaller mailing list, whatever filtering helps ensure the right people receive the right alerts, right when they need them.
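The "alerts follow a curve" idea can be summarized as a small rule table: the higher the queue fill percentage, the shorter the sustained duration required before the alert fires. The sketch below is a hypothetical illustration of that pattern only; the thresholds are example values, not recommendations, and the real rules would live in Prometheus alerting rules with routing handled by the external alerts manager.

```python
# Hypothetical illustration of escalating alert thresholds for a queue metric.
# Thresholds and durations are example values only, not recommendations;
# real rules belong in Prometheus alerting rules plus alerts-manager routing.

ALERT_CURVE = [
    # (fill fraction of max queue, sustained seconds before firing, severity)
    (0.50, 30, "warning"),   # half full for 30s: falling behind
    (0.75, 10, "warning"),   # three-quarters full for 10s: not catching up
    (0.90, 0,  "error"),     # near full: fire immediately, data loss imminent
]

def evaluate(fill_fraction: float, sustained_seconds: float):
    """Return the highest severity whose threshold has been exceeded."""
    fired = None
    for threshold, duration, severity in ALERT_CURVE:
        if fill_fraction >= threshold and sustained_seconds >= duration:
            fired = severity
    return fired

print(evaluate(0.55, 45))   # -> 'warning'
print(evaluate(0.92, 0))    # -> 'error'
print(evaluate(0.40, 120))  # -> None (no alert)
```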
In Conclusion, A Healthy Application...
...has stable memory usage that fluctuates predictably and doesn't grow over time. In a system experiencing mild issues, the memory starts to trend upward instead. If left unattended, systems like this may eventually experience outages. Finding the issue this early means there is time to do some digging, debugging, taking of stack traces, and other troubleshooting before the system must be restarted or recovered. That can make the difference between identifying and resolving an issue before there are real problems.

One metric which makes for good alerting is the total number of failed stream entries, which can indicate there's an issue writing to the database even before the queue has started to fill up. Other alerts may include warnings and errors based around percentages of memory used or queues filled, which depend on how long the queues take to fill up and how long the state has remained at its increased usage.

Prometheus has all of the tools necessary to make this possible across a variety of infrastructures and use cases. Set it up on a local machine and poke around at what ThingWorx metrics are available to meet your monitoring needs.
View full tip
This video is Module 10: ThingWorx Foundation & Analytics Integration of the ThingWorx Analytics Training videos. It gives a brief review of core ThingWorx Platform functionality, and how the Analytics server works on top of the platform. It also describes the process of creating a simple application, complete with a mashup to display the information from a predictive model.
View full tip
This video continues Module 9: Anomaly Detection of the ThingWorx Analytics Training videos. It begins with a ThingWatcher exercise, and concludes by describing Statistical Process Control (SPC). The "SPC Accelerator" will be covered in Module 9 Part 3.
View full tip
This video begins Module 9: Anomaly Detection of the ThingWorx Analytics Training videos. It describes how ThingWatcher can be set up to monitor values streaming from connected assets and send an alert if an asset's behavior deviates from its 'normal' behavior.
View full tip
This video concludes Module 8: Time Series Modeling of the ThingWorx Analytics Training videos. 
View full tip
This video continues Module 8: Time Series Modeling of the ThingWorx Analytics Training videos. It continues to show how ThingWorx Analytics automatically transforms time series datasets into ones that are ready for machine learning. It also describes the concept of virtual sensors. It finishes by describing the time series dataset that will be used in the following modules.
View full tip
This video begins Module 8: Time Series Modeling of the ThingWorx Analytics Training videos. It describes the differences between time series and cross-sectional datasets. It begins to show how ThingWorx Analytics automatically transforms time series datasets into ones that are ready for machine learning. 
View full tip
This video concludes Module 7: Predictive & Prescriptive Scoring of the ThingWorx Analytics Training videos. It describes how ThingWorx Analytics automatically evaluates a range of values for chosen fields to produce prescriptive scores. 
View full tip
This video begins Module 7: Predictive & Prescriptive Scoring of the ThingWorx Analytics Training videos. It describes how a trained machine learning model takes inputs and makes predictions of different kinds, depending on the use case. It shows how scoring works in production, taking inputs from various sources and producing a score to help users make informed decisions. It also covers the concept of field importance in an individual score.
View full tip
This video concludes Module 6: Predictive Models & Model Validation of the ThingWorx Analytics Training videos. 
View full tip
This video continues Module 6: Predictive Models & Model Validation of the ThingWorx Analytics Training videos. It covers some modeling techniques to help build better predictive models. It discusses the dangers of models that overfit data, and how to avoid overfitting. 
View full tip
This video continues Module 6: Predictive Models & Model Validation of the ThingWorx Analytics Training videos. It then begins to describe some of the performance metrics used to evaluate predictive models. 
View full tip
This video continues Module 6: Predictive Models & Model Validation of the ThingWorx Analytics Training videos. It describes the remaining machine learning algorithms used by ThingWorx Analytics to build predictive models that weren't covered in Part 1. In addition, this video describes the different kinds of ensembles you can build that utilize multiple algorithms. 
View full tip
This video begins Module 6: Predictive Models & Model Validation of the ThingWorx Analytics Training videos. It gives examples of different types of goal variables and discusses data considerations in predictive modeling. It also begins describing the machine learning algorithms used by ThingWorx Analytics to build predictive models.
View full tip